The document describes a proposed Klessydra-T1 vector coprocessor architecture designed for multi-threaded edge computing cores. It achieves a 3x speedup over a baseline core through configurable SIMD and MIMD vector acceleration schemes. Benchmark results show cycle count reductions for workloads like convolution and matrix multiplication when using the coprocessor in various SISD, SIMD, and MIMD configurations. Resource utilization and maximum frequency are also analyzed.
This document discusses the challenges of building and optimizing open RAN systems for 5G networks. It describes Picocom's 5G baseband system-on-chip architecture using multiple RISC-V clusters and hardware accelerators. Maintaining performance and detecting problems is difficult due to the complex timing requirements across hundreds of users. Mentor's embedded analytics solution monitors the system non-intrusively using on-chip sensors to detect issues like timing overruns and help optimize performance both during development and over the lifetime of deployments.
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E... (RISC-V International)
The document summarizes the Klessydra-T architecture for designing vector coprocessors for multi-threaded edge computing cores. It describes the interleaved multi-threading baseline and the parameterized vector acceleration schemes that use the Klessydra vector intrinsic functions. Performance results show up to 3x speedup over a baseline core for benchmarks like convolution, FFT, and matrix multiplication on FPGA implementations with different configurations of vector lanes, functional units, and scratchpad memories.
Getting started with RISC-V verification: what's next after compliance testing (RISC-V International)
The document discusses the CPU design verification (DV) process for RISC-V processors and the challenges presented by RISC-V's open standard nature. It covers developing a verification plan, obtaining tests and models, running simulations, and verifying until coverage metrics are met. Key aspects include using a reference model for configuration and comparison, techniques like self-check, signature comparison, trace logging and step-and-compare, and test suites like riscv-compliance. The presenter demonstrates step-and-compare verification between an Imperas reference model and RISC-V RTL using open source tools and models.
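The step-and-compare technique mentioned above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyCore`, its two-instruction subset, and the injected `bug_on_sub` fault are hypothetical stand-ins, not the Imperas reference model or any real RTL simulator interface.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical stand-in for a reference model or an RTL simulation."""
    regs: list = field(default_factory=lambda: [0] * 32)
    pc: int = 0
    bug_on_sub: bool = False  # injected fault, so a mismatch can be shown

    def step(self, instr):
        """Execute one (op, rd, rs1, rs2) instruction and advance the PC."""
        op, rd, rs1, rs2 = instr
        a, b = self.regs[rs1], self.regs[rs2]
        if op == "add":
            result = a + b
        elif op == "sub":
            result = a + b if self.bug_on_sub else a - b
        else:
            raise ValueError(f"unsupported op: {op}")
        if rd != 0:  # x0 is hard-wired to zero in RISC-V
            self.regs[rd] = result & 0xFFFFFFFF
        self.pc += 4


def step_and_compare(program, reference, dut):
    """Return the index of the first instruction after which the
    architectural state diverges, or None if both cores agree throughout."""
    for i, instr in enumerate(program):
        reference.step(instr)
        dut.step(instr)
        if (reference.pc, reference.regs) != (dut.pc, dut.regs):
            return i
    return None


def make_core(**kwargs):
    """Build a core with x1 = 7 and x2 = 3 as the starting state."""
    core = ToyCore(**kwargs)
    core.regs[1], core.regs[2] = 7, 3
    return core


program = [("add", 3, 1, 2), ("sub", 4, 1, 2), ("add", 5, 3, 4)]
```

Comparing after every retired instruction, rather than only at the end of the test, pinpoints the first diverging instruction, which is the main appeal of this style over a final signature comparison.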
The document summarizes an online test program generator for RISC-V microprocessors called MicroTESK. It describes how the generator works offline by translating specifications into test programs, and online by executing directly on the device under test to generate and run tests. The generator uses techniques like combinatorial brute-force and randomization to generate diverse test cases and checks results with signatures and transformations to detect errors. Future work areas include supporting more RISC-V instruction subsets and advanced techniques like model-based generation, mutations, and equivalence checking.
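A rough sketch of how combinatorial and randomized test generation with a result signature might fit together is given below. The opcode subset, the boundary values, and the `signature` folding function are illustrative assumptions and do not reflect MicroTESK's actual templates or checking scheme.

```python
import itertools
import random

OPCODES = ["add", "sub", "xor"]  # illustrative subset only
BOUNDARY_VALUES = [0, 1, 0x7FFFFFFF, 0xFFFFFFFF]
MASK = 0xFFFFFFFF


def combinatorial_cases():
    """Brute force: every opcode with every ordered pair of boundary operands."""
    return list(itertools.product(OPCODES, BOUNDARY_VALUES, BOUNDARY_VALUES))


def random_cases(count, seed):
    """Randomized cases; a fixed seed keeps the run reproducible."""
    rng = random.Random(seed)
    return [(rng.choice(OPCODES), rng.getrandbits(32), rng.getrandbits(32))
            for _ in range(count)]


def expected(op, a, b):
    """Expected 32-bit result of one operation."""
    if op == "add":
        return (a + b) & MASK
    if op == "sub":
        return (a - b) & MASK
    if op == "xor":
        return a ^ b
    raise ValueError(f"unsupported op: {op}")


def signature(cases):
    """Fold all expected results into one 32-bit value (hypothetical checksum)
    that could be compared against the value computed on the device."""
    sig = 0
    for op, a, b in cases:
        sig = (sig * 31 + expected(op, a, b)) & MASK
    return sig
```

A single folded signature keeps the on-device check cheap, at the cost of not showing which case failed.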
RISC-V NOEL-V - A new high performance RISC-V Processor Family (RISC-V International)
This document summarizes the NOEL-V processor family from Cobham Gaisler. It describes the NOEL-V as a RISC-V compliant 64-bit processor with fault tolerance features. It provides details on the processor architecture, peripherals, software ecosystem, verification process, and commercial and open source availability. Examples of projects adopting the NOEL-V include the European H2020 funded De-RISC and SELENE projects for safety-critical computing.
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire (RISC-V International)
Join RISC-V BitManip industry leader Claire Xenia Wolf and Dr. James Cuff for an open and lively discussion with an interactive Q&A on RISC-V and BitManip including trends and comparisons with the existing architecture landscape including x86 and ARM and what specifically makes RISC-V unique.
The document proposes several extensions to the RISC-V ISA to improve code size efficiency. It analyzes benchmark programs to identify optimization opportunities where common instruction sequences can be fused into single instructions. New instructions proposed include TBLJAL for table-based function calls and jumps, PUSHPOP for saving/restoring multiple registers, and MULIADD for fusing load, multiply and add instructions. Evaluation shows the proposed instructions reduce code size by up to 10% on average across benchmarks when implemented in the compiler.
The document discusses optimizing deep neural networks (DNNs) for deployment on ultra-low power RISC-V cores. It describes the PULP-NN library which optimizes the computational backend for int8 arithmetic. PULP-NN maximizes data reuse, improves kernel regularity, and exploits parallelism to achieve high utilization. It also introduces DORY, a tool for tiling and code generation that formulates tiling as a constraint programming problem to maximize tile sizes while fitting memory constraints.
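The idea of choosing the largest tile that still fits in on-chip memory can be sketched as follows. This is a simplified illustration under stated assumptions: it uses exhaustive search instead of a constraint programming solver, models a stride-1 convolution without padding, and counts one byte per int8 element. The function names and the cost model are hypothetical, not DORY's.

```python
from itertools import product


def tile_bytes(h_out, w_out, c_out, c_in, k):
    """int8 footprint of one tile: input patch + weights + output
    (assumes stride 1 and no padding)."""
    h_in, w_in = h_out + k - 1, w_out + k - 1
    return h_in * w_in * c_in + c_out * c_in * k * k + h_out * w_out * c_out


def best_tile(height, width, channels_out, c_in, k, budget):
    """Exhaustively search output tile shapes and return
    (volume, h, w, c_out) for the largest tile that fits in `budget` bytes,
    or None if even a 1x1x1 output tile does not fit."""
    best = None
    for h, w, c in product(range(1, height + 1),
                           range(1, width + 1),
                           range(1, channels_out + 1)):
        if tile_bytes(h, w, c, c_in, k) <= budget:
            candidate = (h * w * c, h, w, c)
            if best is None or candidate > best:
                best = candidate
    return best
```

Exhaustive search is only practical for small layers; a constraint solver handles the same objective and memory constraint without enumerating every shape.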
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff (RISC-V International)
Join RISC-V BitManip industry leader Claire Xenia Wolf and Dr. James Cuff for an open and lively discussion with an interactive Q&A on RISC-V and BitManip including trends and comparisons with the existing architecture landscape including x86 and ARM and what specifically makes RISC-V unique.
This document discusses building cache-coherent scaleout systems using OmniXtend. It describes the OmniXtend architecture, which uses a fully open cache-coherence protocol that works over Ethernet. It then discusses the OmniXtend reference design, compute node architecture, address space, and hardware design. It also covers the single operating system and independent nodes system models, the unified boot process, and status of the current implementation. Lastly, it proposes ways to further develop the system through simulation, emulation, and future work.
This document summarizes a presentation on static partitioning virtualization for RISC-V. It discusses the motivation for embedded virtualization, an overview of static partitioning hypervisors like Jailhouse and Xen, and the Bao hypervisor. It then provides an overview of the RISC-V hypervisor specification and extensions, including implemented features. It evaluates the performance overhead and interrupt latency of a prototype RISC-V hypervisor implementation with and without interference mitigations like cache partitioning.
This document discusses OpenCL support for RISC-V cores. It provides an introduction to OpenCL and describes how it can be used for heterogeneous platforms with RISC-V cores. It outlines an OpenCL framework for RISC-V with the host on x86 and devices as RISC-V cores like the AndeSim NX27V. It also describes OpenCL C extensions for the RISC-V Vector extension and the compilation flow from OpenCL C to LLVM IR to target binaries. Current status includes passing most OpenCL conformance tests on QEMU and work ongoing for the x86+AndeSim platform.
AndesClarity is a pipeline visualizer and analyzer for Andes V5 vector processors. It graphically represents instruction execution and pipeline stages with performance information. It helps optimize algorithms by identifying bottlenecks and stalls. The document provides an example of using AndesClarity to optimize a fast discrete cosine transform algorithm through four iterations. Each optimization interleaves instructions to better utilize the vector processor's functional units and reduce dependencies between iterations.
This document discusses Andes Technology Corporation's RISC-V processor IP solutions. It summarizes that Andes is a leading RISC-V CPU IP vendor that provides a portfolio of RISC-V processor cores ranging from embedded control to application processing. It also develops tools like the AndeSight IDE and provides customization services through its AndeSentry security framework and scalable acceleration architecture.
RISC-V growth and successes in technology and industry - embedded world 2021 (RISC-V International)
RISC-V International has more than 1,000 members across over 50 countries who are working in hardware, software, services, and various industries for a strong and healthy RISC-V ecosystem. It is projected that by 2025 there will be over 62 billion RISC-V CPU cores and the total market for RISC-V IP and software is expected to grow to over $1b by 2025.
In 2020 alone, we saw successes with newly defined RISC-V accelerator architectures, affordable RISC-V open source small-board computers, development boards for personal computers, and an incredibly fast 64-bit RISC-V Core as the community also ratified key specifications and made advances in security.
As we see the growth of RISC-V into industries such as AI, machine learning, blockchain, 5G, medical, and industrial, we will see the ratifications of new extensions that enable this growth.
Join Kim McMahon, Director of Marketing and Stephano Cetola, Technical Program Manager as we take a look at where RISC-V is going in 2021.
This document summarizes a presentation on reverse engineering the Rocket-Chip SoC generator to develop a customized SoC called Aghaaz. The presentation covers deconstructing the Rocket-Chip software architecture, developing a Micro-Architecture and Software Specification (MASS) document, configuring an Aghaaz SoC using the MASS document, and generating the SoC from the Rocket-Chip generator. Key aspects included developing object-oriented representations of Rocket-Chip modules, flowcharts to explain the code, and configuring an RV32 core with caches and extensions.
RISC-V & SoC Architectural Exploration for AI and ML Accelerators (RISC-V International)
This document discusses architectural exploration for AI and ML accelerators using simulation tools. It notes that current AI/ML applications require custom hardware configurations to achieve performance goals. The Imperas simulation tools allow analyzing performance on different hardware designs by running software on virtual platforms months before RTL implementation. Imperas provides virtual platforms for heterogeneous systems running full operating systems along with detailed analysis, profiling and debugging tools. It also includes a RISC-V reference model that enables developing custom instructions for architectural exploration of AI/ML accelerators.
Educating the computer architects of tomorrow's critical systems with RISC-V (RISC-V International)
This document provides information about a virtual RISC-V summit event taking place from December 8-10. It then summarizes a presentation given by Leonidas Kosmidis on educating computer architects with RISC-V. The presentation discusses safety critical systems and why companies are interested in RISC-V for these applications. It also describes the computer architecture curriculum and RISC-V projects at the Polytechnic University of Catalonia and Barcelona Supercomputing Center. Specific projects from a processor design course are summarized, including dual/triple lockstep CPUs, a WCET support implementation, and vector extensions added to the Lagarto RISC-V core. The document concludes by acknowledging those involved
1. The document introduces RISC-V assembly, an open standard instruction set architecture based on reduced instruction set computer principles.
2. It provides an example of RISC-V assembly code that loads a byte into a register, loads an immediate value into another register, increments the second register, and stores the first register value at the address in the second register.
3. It also mentions the RARS simulator that can be used to debug RISC-V assembly code and provides a second example program.
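The four-instruction sequence described in item 2 can be traced with a small Python interpreter. This is only a sketch: the register names `t0` and `t1`, the addresses, and the values are assumptions chosen for illustration, and `lb` takes an absolute address here instead of the real `offset(base)` operand form.

```python
def run(program, memory):
    """Interpret a tiny subset of RISC-V-like instructions; return registers."""
    regs = {}
    for op, *args in program:
        if op == "lb":      # load byte, sign-extended (absolute address here)
            rd, addr = args
            byte = memory[addr]
            regs[rd] = byte - 256 if byte >= 128 else byte
        elif op == "li":    # load immediate
            rd, imm = args
            regs[rd] = imm
        elif op == "addi":  # add immediate
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "sb":    # store low byte of rs at the address held in rbase
            rs, rbase = args
            memory[regs[rbase]] = regs[rs] & 0xFF
        else:
            raise ValueError(f"unsupported op: {op}")
    return regs


memory = bytearray(16)
memory[0] = 0x2A  # the byte to be loaded

program = [
    ("lb", "t0", 0),          # load a byte into the first register
    ("li", "t1", 7),          # load an immediate into the second register
    ("addi", "t1", "t1", 1),  # increment the second register
    ("sb", "t0", "t1"),       # store the first register at the address in the second
]
regs = run(program, memory)
```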
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53 (KarthiSugumar)
This presentation focuses on the architectural exploration of a RISC-V ISA-based processor for networking applications such as a router, using the trade-off between power consumption and performance. The optimized architecture is compared against commercially available RISC processors from ARM. A model of a RISC-V-based solid-state drive is also proposed.
LAS16-403: GDB Linux Kernel Awareness
Speakers: Peter Griffin
Date: September 29, 2016
★ Session Description ★
The presentation will look at the ways in which GDB can be enhanced when debugging the Linux kernel to give it better knowledge of the underlying operating system to enable a better debugging experience. It will also provide a status of the current work being undertaken in this area by the ST landing team, a demo and potential future work.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-403
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-403/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
1. Logically split the work between those responsible for the device tree binding, any framework changes, the driver code, and DTS additions.
2. Create git commits for the device tree binding, driver implementation, and DTS changes in a logical series.
3. Post the commit series to the appropriate mailing lists after addressing any feedback, with cover letter, signatures, and CCing maintainers.
This document discusses the SDSoC development environment for designing systems using Xilinx Zynq devices. It provides:
- An overview of the Zynq architecture and its processing system and programmable logic.
- A description of the SDSoC environment which provides an Eclipse-based IDE, compiler toolchain, and infrastructure to develop applications combining a processing system with hardware accelerators.
- An explanation of the SDSoC development flow which allows software functions to be selected for hardware acceleration with automated generation of hardware systems, software stubs, and configuration.
Socionext is developing low power ARM server solutions including the SC2A11 multicore processor and SC2A20 SoC switch. They aim to build scalable small core systems with optimized performance and power efficiency compared to traditional servers. Socionext has integrated their solutions into a prototype low power scalable server and is developing the necessary software including UEFI, Linux, and applications to support various server workloads.
SemiDynamics introduced two new RISC-V cores, AVISPADO 220 and ATREVIDO 220, both supporting the upcoming RISC-V Vector spec version 1.0. AVISPADO 220 is an in-order core with a technique called "Gazzillion Misses" that allows a high number of outstanding memory requests. ATREVIDO 220 is an out-of-order core also utilizing Gazzillion Misses. SemiDynamics also provides a customizable RISC-V Vector Processing Unit that implements the vector spec and can be integrated with the cores. Both cores and the vector unit are available for licensing.
This document discusses using fuzzing to generate tests for RISC-V compliance testing. It proposes extending an LLVM-based fuzzer with custom mutators and coverage metrics tailored for RISC-V. Experimental results found bugs in several RISC-V simulators, demonstrating the effectiveness of fuzzing for negative compliance testing. The approach generates platform-independent assembly tests and filters invalid tests. It leverages an open-source RISC-V virtual prototype for test execution.
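A mutation-based generation loop with a validity filter might look roughly like the sketch below. It is a simplified illustration under stated assumptions: the opcode set, the three mutators, and the `is_valid` filter are hypothetical, no coverage feedback is modeled, and none of this is the LLVM-based fuzzer's actual interface.

```python
import random

VALID_OPS = {"add", "sub", "and", "or"}  # illustrative subset only


def mutate(test, rng):
    """Apply one random mutation: swap an opcode, change a destination
    register, or duplicate an instruction."""
    test = list(test)
    i = rng.randrange(len(test))
    op, rd, rs1, rs2 = test[i]
    choice = rng.choice(["opcode", "register", "duplicate"])
    if choice == "opcode":
        new_op = rng.choice(sorted(VALID_OPS | {"bogus"}))
        test[i] = (new_op, rd, rs1, rs2)
    elif choice == "register":
        # May deliberately exceed x31 so the filter has something to reject.
        test[i] = (op, rng.randrange(0, 40), rs1, rs2)
    else:
        test.insert(i, test[i])
    return test


def is_valid(test):
    """Accept only known opcodes and register indices in the range x0..x31."""
    return all(op in VALID_OPS and all(0 <= r < 32 for r in (rd, rs1, rs2))
               for op, rd, rs1, rs2 in test)


def fuzz(seed_test, rounds, seed):
    """Grow a corpus by mutating random corpus members, keeping valid new tests."""
    rng = random.Random(seed)
    corpus = [seed_test]
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        if is_valid(candidate) and candidate not in corpus:
            corpus.append(candidate)
    return corpus
```

In a coverage-guided setup, a candidate would be kept only if it also exercised new behavior in the simulator under test; this sketch keeps every valid, previously unseen test.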
BKK16-400A LuvOS and ACPI Compliance Testing (Linaro)
ARM server hardware will be shipping in 2016. An incredible amount of work has been done to get this far -- defining and implementing industry standards used by servers, development and testing of SoCs, and all sorts of Linux kernel work. So, how do we make sure we meet all these industry standards?
To a great extent, we've relied on magical thinking so far. That works, but only for so long. LuvOS and FWTS were created in order to catch many of the problems users have found; in this presentation, we describe how we have started extending FWTS to check for standards compliance, specifically ACPI and the SBBR, and how we can use LuvOS to run FWTS and other test suites so that we can rely on hard data, and not just wishful thinking.
Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a... (RISC-V International)
Esperanto has developed an AI processor chip called ET-SoC-1 that contains over 1000 custom RISC-V cores on a single 7nm chip. The chip is targeted at datacenter inferencing and provides superior performance and energy efficiency compared to incumbent solutions. Esperanto's solution is fully programmable and scalable from hundreds to thousands of CPU cores to handle future AI models. The document provides details on the RISC-V based CPU cores, memory hierarchy, software stack, and how the chips can be deployed in datacenters to meet the challenges of hyperscale AI inferencing.
Here are some useful GDB commands for debugging:
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints/i b - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout
Architecture Aware Algorithms and Software for Peta and Exascale (inside-BigData.com)
Jack Dongarra from the University of Tennessee presented these slides at Ken Kennedy Institute of Information Technology on Feb 13, 2014.
Listen to the podcast review of this talk: http://insidehpc.com/2014/02/13/week-hpc-jack-dongarra-talks-algorithms-exascale/
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints (info b) - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout to assembly view
Achitecture Aware Algorithms and Software for Peta and Exascaleinside-BigData.com
Jack Dongarra from the University of Tennessee presented these slides at Ken Kennedy Institute of Information Technology on Feb 13, 2014.
Listen to the podcast review of this talk: http://insidehpc.com/2014/02/13/week-hpc-jack-dongarra-talks-algorithms-exascale/
The document summarizes a thesis presentation on modeling and validating the performance of a centralized fault identification and location system for medium voltage direct current shipboard power systems. Key points discussed include developing models to analyze factors affecting the system's performance such as topology, noise, bandwidth. The system was demonstrated to identify faults within 300 microseconds through hardware-in-loop testing under different operating conditions. Future work proposed expanding the system to include ring topologies and exhaustive noise analysis.
The document discusses the development and testing of a centralized fault identification and location (CFL) system for a medium voltage direct current shipboard power system. Key points:
1) A CFL system was modeled to identify faults within 8 ms as required for the power system.
2) Testing of the CFL system demonstrated fault detection times of around 300 microseconds for different system configurations and fault conditions.
3) Performance models were developed to analyze how factors like topology, bandwidth, noise and others affect the CFL system and scaling.
2012 Techniques for Verification and Debugging of LPDDR3 Memory Designs.pdfssuser2a2430
This document discusses techniques for verifying and debugging LPDDR3 memory designs. It begins with a review of key LPDDR3 features like increased speed and bandwidth. It then covers considerations for signal integrity like recommended oscilloscope bandwidths. Debugging techniques discussed include using signal access probes, the DDRA analysis software for JEDEC tests, and visual trigger capabilities. The document emphasizes that LPDDR3 specifications push system power envelopes and present new measurement challenges.
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd Iaetsd
This document proposes an efficient CORDIC pipelined FFT algorithm for fingerprint recognition on FPGAs. The CORDIC algorithm uses only shift and add operations, making it suitable for replacing multipliers in the butterfly operations of an FFT. This reduces computational complexity. The proposed system takes a fingerprint, processes it with the CORDIC pipelined FFT, extracts features which are stored and then matched against a test fingerprint for recognition. The algorithm aims to provide an efficient hardware implementation of FFT and fingerprint recognition using minimal computations.
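The shift-and-add idea behind CORDIC can be illustrated with a minimal rotation-mode sketch. This is not the paper's pipelined FPGA design: the 16-bit fixed-point format, the iteration count, and the names `cordic_rotate`, `ATAN_TABLE`, and `GAIN_FIXED` are assumptions chosen for illustration. In an FFT butterfly, a rotation like this would stand in for the twiddle-factor multiplication.

```python
import math

FRAC_BITS = 16              # assumed fixed-point fractional bits
ITERATIONS = 16             # assumed number of micro-rotations
ONE = 1 << FRAC_BITS

# Lookup table of atan(2^-i) in fixed point (a small ROM in hardware).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The micro-rotations stretch the vector by a constant gain; starting
# from the reciprocal of that gain cancels it without a runtime multiply.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
GAIN_FIXED = round(_gain * ONE)


def cordic_rotate(angle):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    The loop body uses only shifts, additions, and subtractions.
    """
    x, y = GAIN_FIXED, 0
    z = round(angle * ONE)          # residual angle still to rotate
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i).
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2^-i).
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x / ONE, y / ONE


if __name__ == "__main__":
    c, s = cordic_rotate(math.pi / 6)
    print(f"cos ~ {c:.4f}, sin ~ {s:.4f}")
```

Each iteration roughly adds one bit of precision, which is why a hardware pipeline can dedicate one stage to each micro-rotation.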
The document discusses PG-Strom, an open source project that uses GPU acceleration for PostgreSQL. PG-Strom allows for automatic generation of GPU code from SQL queries, enabling transparent acceleration of operations like WHERE clauses, JOINs, and GROUP BY through thousands of GPU cores. It introduces PL/CUDA, which allows users to write custom CUDA kernels and integrate them with PostgreSQL for manual optimization of complex algorithms. A case study on k-nearest neighbor similarity search for drug discovery is presented to demonstrate PG-Strom's ability to accelerate computational workloads through GPU processing.
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com and http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
re:Invent 2019 BPF Performance Analysis at NetflixBrendan Gregg
This document provides an overview of Brendan Gregg's presentation on BPF performance analysis at Netflix. It discusses:
- Why BPF is changing the Linux OS model to become more event-based and microkernel-like.
- The internals of BPF including its origins, instruction set, execution model, and how it is integrated into the Linux kernel.
- How BPF enables a new class of custom, efficient, and safe performance analysis tools for analyzing various Linux subsystems like CPUs, memory, disks, networking, applications, and the kernel.
- Examples of specific BPF-based performance analysis tools developed by Netflix, AWS, and others for analyzing tasks, scheduling, page faults
The document describes the development and testing of a novel mathematical computing architecture called MaPU. Key highlights include a multi-granularity parallel storage system that enables simultaneous matrix row and column access, a high dimension data model, and a cascading pipeline with a state machine-based program model. The first MaPU chip was implemented on a 40nm process with 4 MaPU cores. Testing showed the MaPU core was up to 6.94 times faster than a similar TI C66x DSP core for various algorithms like FFT and matrix multiplication. Power analysis indicated tested power was within 8% of estimated power.
The document provides an overview of the architecture of Nexus 9000 series switches and techniques for troubleshooting them. It discusses the modular components of Nexus 9500 switches including supervisors, fabrics, I/O modules, and line cards. It also covers tools for monitoring system health and detailed troubleshooting techniques. The goal is to provide an understanding of the Nexus 9000 architecture and introduce system telemetry and troubleshooting case scenarios.
Iaetsd vlsi based implementation of a digitalIaetsd Iaetsd
This document describes the design and implementation of a digital oscilloscope using an FPGA development board. It provides a low-cost alternative to commercial oscilloscopes. The design has three main blocks - an ADC converter to digitize analog signals, an FPGA for processing and control, and a VGA display. It allows users to view and measure signals up to 80MHz. The FPGA handles tasks like buffering digital data, generating display signals for the monitor, and interfacing with a mouse for user input. The overall design aims to provide hobbyists and students with an affordable tool for circuit debugging and learning oscilloscope functionality.
This project aims to develop ubiquitous low-power image processing platforms. It has several objectives including defining a reference platform, instantiating it through use cases, and demonstrating performance improvements. Several partners from industry and academia are involved. Key tasks include selecting hardware components, developing interfaces and tools, and validating the platform using applications like medical imaging, automotive driver assistance, and unmanned aerial vehicles. An initial hardware instance was selected using the Sundance EMC2 board with an ARM CPU and FPGA. The UAV use case involves real-time stereo depth estimation for obstacle avoidance.
A continuous time adc and digital signal processing system for smart dust and...eSAT Journals
This document discusses a continuous-time (CT) analog-to-digital converter (ADC) and digital signal processing system suitable for applications like smart dust and wireless sensor networks. The key benefits of the CT system are lower noise and no need for a clock generator or anti-aliasing filter.
The paper proposes a clockless, event-driven CTADC based on delta modulation. An unbuffered, area-efficient segmented resistor string digital-to-analog converter is used. This architecture achieves an 87.5% reduction in resistors, switches and flip-flops for an 8-bit converter compared to prior designs.
The CTADC uses a level-crossing sampling technique where samples are generated when
A continuous time adc and digital signal processing system for smart dust and...eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Similar to Klessydra t - designing vector coprocessors for multi-threaded edge-computing cores (20)
One presenter discussed weaknesses found in the LLVM inliner's ability to find optimization opportunities for RISC-V code compared to other compilers, resulting in larger code size. A new approach called mutual inlining (MI) looks at the whole call graph to make inlining decisions and could provide more insights than the LLVM inliner. Replacing the current LLVM inliner with MI was suggested as a way to address these weaknesses.
This document summarizes a presentation given at the London Open Source Meetup for RISC-V on April 19, 2021. The presentation introduced the RISC-V Online Tutor, an online course for learning RISC-V fundamentals from digital logic to C programming. It provided an overview of the course structure and lessons, which take students through RISC-V assembly, processor design, and application development. It also demonstrated the online learning platform and its ability to interact with remote FPGA hardware during lessons. The goal is to invite community participation and collaboration to further develop the Online Tutor.
The document announces a London open source meetup for RISC-V on April 19th. RISC-V is a free and open instruction set architecture that enables new processor innovation through open collaboration. It offers freedom and extensibility across both software and hardware. RISC-V International is a nonprofit organization with over 1,000 members in more than 50 countries that was founded in 2015. The document also advertises upcoming events from the BCS Open Source Specialist Group, including an advocacy event on May 20th and an event on open source in space.
Ziptillion boosting RISC-V with an efficient and os transparent memory comp...RISC-V International
This document summarizes a presentation about ZeroPoint Technologies' memory compression technology called Ziptilion. Some key points:
1) Ziptilion uses hardware-accelerated memory compression algorithms to double effective memory capacity and bandwidth. This helps address challenges from the end of Moore's Law.
2) It provides a virtual compressed memory pool (VCP) that is transparent to the operating system. Benchmark results show it provides 20% higher performance than an uncompressed baseline.
3) An evolution called Ziptilion+ aims for over 2.5x compression of machine learning workloads.
4) ZeroPoint also develops ZSWAP+/ZRAM+, which accelerates the popular ZSWAP
GlobalPlatform provides standards for trusted execution environments (TEEs) that are deployed across billions of devices. The standards define hardware and software specifications for TEEs to securely deliver digital services. GlobalPlatform is working with RISC-V to define TEE configurations for lightweight IoT devices and leverage RISC-V's secure hardware enclave capabilities. The organization's protection profiles and security certification help service providers assess risks when using TEE technologies.
The De-RISC initiative aims to develop the first space-amenable RISC-V based computing platform. It involves:
1) Cobham Gaisler developing a fault-tolerant multicore MPSoC based on NOEL-V RISC-V cores.
2) FentISS developing a space-qualified hypervisor called XtratuM for RISC-V.
3) The Barcelona Supercomputing Center developing an extended statistics unit to help manage multicore interference.
4) Thales assessing the platform through benchmarking, executing a satellite software stack, and evaluating a command and data handling use case.
The goal is to have an integrated and validated platform ready
This document proposes a no-human-in-the-loop open-source "idea to manufacturing" SoC compiler. It consists of SoCGen, which generates RTL from a JSON description, and OpenLANE, which produces a clean GDSII layout from the RTL with no human intervention. SoCGen includes a library of open-source verified IP cores and supports multiple bus architectures. OpenLANE uses carefully-curated open-source EDA tools tuned for an open PDK. The goal is to streamline and automate the entire custom SoC design process from concept to silicon to enable more widespread adoption.
MultiZone IoT Firmware provides a trusted execution environment (TEE) that shields trusted applications from untrusted third party libraries. It works with any RISC-V processor and provides up to 4 separated hardware and software execution worlds. MultiZone includes pre-integrated security libraries, an RTOS, and connectivity standards to provide a complete and secure IoT stack.
1. Manuel Offenberg of Seagate discussed securing data at the edge using RISC-V and Keystone enclaves to protect data during creation and movement.
2. OpenTitan can provide another layer of trust by securing the root of trust.
3. Endpoint security is crucial for ensuring overall data integrity and trustworthiness when significant data is being generated at billions of sensors and IoT devices.
This document summarizes the evolution of the RISC-V software ecosystem from 2015 to 2020. It describes how initial ports of key software in 2015, like GCC and Linux, have expanded to include upstream support in most open source software projects today. It outlines remaining priorities like completing support for specifications and filling gaps in programming language and application software support. The document concludes by encouraging continued collaboration to further mature the RISC-V software ecosystem.
Ripes tracking computer architecture throught visual and interactive simula...RISC-V International
Ripes is a visual processor simulator and assembly editor for RISC-V that was created to teach computer architecture concepts. It allows interactive simulation and visualization of different processor models, including single-cycle, pipelined, and models with caching. Ripes uses the Visual Simulation of Register Transfer Logic (VSRTL) framework, which generates circuit visualizations from processor descriptions. This allows Ripes to simulate various RISC-V processors and visualize their data paths during execution. Ripes has been expanded over time to support cache simulation and integration with C toolchains.
This document discusses porting the Tock operating system to the OpenTitan project. It provides background on OpenTitan and Tock, describes the status of the porting work, and highlights a deep dive into implementing USB and CTAP support on Tock running on OpenTitan hardware. Key points covered include OpenTitan using the Ibex RISC-V core, Tock being designed for small platforms without MMUs and enforcing security through Rust, the interface for Tock applications, and modules already supported through the mainline Tock project.
The document discusses porting OpenJ9 JDK to RISC-V architecture. It involves preparing the software toolchain for cross-compilation to RISC-V, preparing hardware like the HiFive Unleashed development board, and developing OpenJ9 JDK through a mix of local and cross compilation. The status shows OpenJ9 JDK can execute in interpreter mode on the RISC-V emulator and HiFive board running Debian, with future work planned on JIT support, different GC strategies, and supporting other Java versions.
Open source manufacturable pdk for sky water 130nm process nodeRISC-V International
The document discusses an open source manufacturable process design kit (PDK) for the SkyWater 130nm process node developed by Google and SkyWater. It provides a link to slides about the PDK located at https://j.mp/rv20-sky130. The PDK will allow for open hardware design using the SkyWater 130nm fabrication process.
seL4 is an open source, high-assurance microkernel that has been ported to run on RISC-V processors. It is one of the most secure and fastest microkernels available. seL4 has formal mathematical proofs of functional correctness and security properties. It is now available to use on RISC-V, making it the only operating system kernel for RISC-V that has undergone this level of formal verification. The porting process revealed that RISC-V's regular architecture makes the kernel code simpler and similar in size to ARM-based kernels, compared to x86 kernels.
Fueling the datasphere how RISC-V enables the storage ecosystemRISC-V International
This document summarizes Seagate's work with RISC-V processors for storage applications. It discusses Seagate's history with custom CPUs and reasons for adopting RISC-V. Seagate has developed two RISC-V cores - a high-performance out-of-order core currently powering a hard drive demonstration, and an area-optimized in-order core for auxiliary workloads. RISC-V allows innovation for real-time processing and security at the edge by enabling domain-specific architectures. The talk promotes collaboration and involvement in the open RISC-V ecosystem.
This document discusses emulating systems-on-chip (SoCs) on Amazon Web Services (AWS) field-programmable gate arrays (FPGAs). It describes how the author validated a RISC-V CPU design by running Linux on it within an AWS F1 FPGA instance. It provides details on using Bluespec System Verilog (BSV) to model CPU cores, the AWS FPGA shell, Connectal for connecting hardware and software, and virtio device models to emulate I/O without custom drivers. The document shows how the approach allows running different RISC-V processor designs from the DARPA SSITH program securely in the cloud for security evaluation.
The document discusses developing applications for the PolarFire® System on Chip (SoC) field programmable gate array (FPGA). It notes that developing for the PolarFire SoC is less complicated than it seems. It covers the functionality of the application and monitor cores, how to develop bare metal applications using SoftConsole IDE, and how to develop FPGA applications using the Libero SoC Design Suite. Examples of driver code and build systems are also provided to simplify the development process for the PolarFire SoC.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
What do a Lego brick and the XZ backdoor have in common?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that both are building blocks, or dependencies of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several events, migrations, and training courses related to LibreOffice. She previously worked on LibreOffice migrations and training for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (hence her nickname, deneb_alpha).
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Klessydra-T: Designing Vector Coprocessors for Multi-Threaded Edge-Computing Cores
1. Information Classification: General
December 8-10 | Virtual Event
Klessydra-T: Designing Vector Coprocessors for Multi-Threaded Edge-Computing Cores
Mauro Olivieri
Professor
Sapienza University of Rome
#RISCVSUMMIT
2.
DIGITAL SYSTEM LAB @ SAPIENZA UNIVERSITY OF ROME
Francesco Lannutti, collaborator @Synopsys
Marcello Barbirotta, PhD candidate
Mauro Olivieri, Associate Professor
Francesco Menichelli, Assistant Professor
Antonio Mastrandrea, Research Fellow
Abdallah Cheikh, Research Fellow
Luigi Blasi, PhD cand. @DSI Gmbh
Francesco Vigli, PhD cand. @ ELT Spa
Stefano Sordillo, PhD candidate
3.
INTRODUCTION & MOTIVATION
THE KLESSYDRA-T ARCHITECTURE
• Interleaved Multi-Threading baseline
• Parameterized vector acceleration schemes
• Klessydra vector intrinsic functions
BENCHMARK WORKLOADS
• Convolution, Matmul, FFT
• Homogeneous and composite workload
RESULTS
• Cycle count and absolute execution time
• Maximum clock frequency and hardware resource utilization
• Energy efficiency
CONCLUSIONS
OUTLINE
4.
19/04/2021 Page 4
APPLICATION CONTEXT AND MOTIVATION
There are recognized drives towards (extreme) edge computing: availability, energy saving, security, etc., with implications for both SW design and HW design.
HW design challenges of extreme edge computing devices:
• Local energy budget
• Cost & size
• Computing power
General setting:
• Possibly taking advantage of inherently multi-threaded application routines
• Inevitability of hardware acceleration support
5.
THE PULPINO-COMPATIBLE KLESSYDRA CORE FAMILY
PULPino feat. Klessydra S0 core:
• Starting point
• M mode v1.10
• RV32I user ISA
• single hart
PULPino feat. Klessydra T0 cores:
• M mode v1.10
• RV32I user ISA
• Atomic ext. (partial)
• multiple PC & CSR
• multiple interleaved harts
PULPino feat. Klessydra F0 cores:
• “space-qualified” core
• T0 microarchitecture
• + configurable HW/SW fault-tolerance support
PULPino feat. Klessydra T1 cores:
• “edge computing” core
• extends T0 microarchitecture
• RV32IM
• + configurable multiple scratchpad memories
• + configurable vector unit
• extended ISA
6.
THE KLESSYDRA IMT MICROARCHITECTURE
Baseline Klessydra T03 core features:
• Thread context switch at each clock cycle
• in-order, single issue instruction execution
• feed-forward pipeline structure (no hardware support for pipeline hazard handling)
• bare metal execution (RISC-V M mode)
The vector-accelerated Klessydra-T13 core has been
designed as a superset of the basic Klessydra-T03
microarchitecture.
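The "thread context switch at each clock cycle" behaviour can be sketched in a few lines of C. This is only an illustration, not the core's actual scheduler: it assumes three harts (a, b, c, as drawn on the slide) issued in strict round-robin order, and the function names are hypothetical. Under that assumption, two consecutive instructions of the same hart enter the pipeline three cycles apart, which is the usual argument for why an interleaved multi-threading pipeline can do without hazard-handling hardware.

```c
#include <stdio.h>

#define NUM_HARTS 3 /* harts a, b, c; the count is an assumption taken from the diagram */

/* Hypothetical scheduler: index of the hart that fetches in a given cycle,
   assuming strict round-robin interleaving. */
int imt_hart_for_cycle(unsigned long cycle) {
    return (int)(cycle % NUM_HARTS);
}

/* Number of cycles separating two consecutive instructions of the same hart. */
int imt_same_hart_distance(void) {
    return NUM_HARTS;
}

/* Print which hart fetches in each of the first `cycles` clock cycles. */
void imt_print_schedule(unsigned long cycles) {
    for (unsigned long c = 0; c < cycles; c++) {
        printf("cycle %lu: hart %c fetches\n", c, 'a' + imt_hart_for_cycle(c));
    }
}
```

Calling `imt_print_schedule(6)` lists harts a, b, c, a, b, c in turn, one per cycle.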
[Figure: Klessydra-T03 pipeline block diagram. Blocks shown: Fetch, Decode, Execute, WB, Regfile, per-hart PC and CSR (hart a, hart b, hart c), harc Updater, Debug, Prg Mem (program memory), Data Mem (data memory).]
7.
THE KLESSYDRA-T1 MICROARCHITECTURE FAMILY
[Figure: Klessydra-T1 block diagrams. Blocks shown: the pipeline (Fetch, Decode, Execute, WB, Regfile, per-hart PC and CSR, harc Updater, Debug, Prg Mem, Data Mem); the execution-stage units EXEC, LSU and MFU (Add, Sub, Shft, Mul, Accum, Relu; replicated x F, with x D lanes); the SPMI with Input Mapping, Output Mapping, Data reorder and Bank Intrlv; scratchpad memories SPM0 to SPMN-1 (x N), each made of banks Bank0 to BankN (x D); Accl Init and Accl Exec stages; signals MAU_busy and MAU_req; units assigned to hart a, hart c, or shared among harts a, b, or c.]
Klessydra T13 core features multiple units in the execution stage:
• scalar execution unit (EXEC)
• vector-oriented multi-purpose functional unit (MFU) with Scratchpad Memory support
• Load/Store unit (LSU)
Instructions of different types can therefore execute concurrently.
8.
HARDWARE ACCELERATION PARAMETRIC SCHEMES
The parametric coprocessor architecture in T13 cores, comprising the MFU and the SPMIs, can be configured at synthesis time through the following parameters:
• D: the number of parallel lanes in the MFU, which defines the DLP degree and also corresponds to the number of SPM banks in each SPMI block
• F: the number of MFUs
• B: the SPM bank capacity
• N: the number of SPMs
• M: the number of SPMIs
• the sharing scheme of MFUs and SPMIs among the harts, i.e. heterogeneous or symmetric
M=1, F=1, D=1: SISD
M=1, F=1, D=2,4,8: Pure SIMD
M=3, F=3, D=1: Symmetric MIMD
M=3, F=3, D=2,4,8: Symmetric MIMD + SIMD
M=3, F=1, D=1: Heterogeneous MIMD
M=3, F=1, D=2,4,8: Heterogeneous MIMD + SIMD
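The mapping from (M, F, D) to a scheme name listed above can be written as a small lookup. The sketch below is only an illustration of that table in C: the enum and function names are hypothetical, and any combination the slide does not list is reported as unknown rather than guessed.

```c
/* Scheme names as listed on the slide; identifiers are hypothetical. */
typedef enum {
    SCHEME_SISD,
    SCHEME_PURE_SIMD,
    SCHEME_SYM_MIMD,
    SCHEME_SYM_MIMD_SIMD,
    SCHEME_HET_MIMD,
    SCHEME_HET_MIMD_SIMD,
    SCHEME_UNKNOWN
} kless_scheme_t;

/* Classify a configuration by number of SPMIs (M), number of MFUs (F)
   and number of MFU lanes (D). Only the combinations on the slide are mapped. */
kless_scheme_t kless_classify(int M, int F, int D) {
    int simd;

    if (D != 1 && D != 2 && D != 4 && D != 8) {
        return SCHEME_UNKNOWN; /* the slide only lists D = 1, 2, 4, 8 */
    }
    simd = (D > 1);

    if (M == 1 && F == 1) return simd ? SCHEME_PURE_SIMD : SCHEME_SISD;
    if (M == 3 && F == 3) return simd ? SCHEME_SYM_MIMD_SIMD : SCHEME_SYM_MIMD;
    if (M == 3 && F == 1) return simd ? SCHEME_HET_MIMD_SIMD : SCHEME_HET_MIMD;
    return SCHEME_UNKNOWN;
}

/* Human-readable name for a scheme. */
const char *kless_scheme_name(kless_scheme_t s) {
    switch (s) {
    case SCHEME_SISD:          return "SISD";
    case SCHEME_PURE_SIMD:     return "Pure SIMD";
    case SCHEME_SYM_MIMD:      return "Symmetric MIMD";
    case SCHEME_SYM_MIMD_SIMD: return "Symmetric MIMD + SIMD";
    case SCHEME_HET_MIMD:      return "Heterogeneous MIMD";
    case SCHEME_HET_MIMD_SIMD: return "Heterogeneous MIMD + SIMD";
    default:                   return "unknown";
    }
}
```

For example, `kless_scheme_name(kless_classify(3, 1, 4))` yields "Heterogeneous MIMD + SIMD": each hart has its own SPMI, while a single MFU with four lanes is shared.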
9.
KLESSYDRA VECTOR EXTENSION AND INTRINSIC FUNCTIONS
In the assembly syntax, (r) denotes memory addressing via register r.
Assembly syntax              Short description
kmemld (rd),(rs1),(rs2)      load vector into scratchpad region
kmemstr (rd),(rs1),(rs2)     store vector into main memory
kaddv (rd),(rs1),(rs2)       add vectors in scratchpad region
ksubv (rd),(rs1),(rs2)       subtract vectors in scratchpad region
kvmul (rd),(rs1),(rs2)       multiply vectors in scratchpad region
kvred (rd),(rs1),(rs2)       reduce vector by addition
kdotp (rd),(rs1),(rs2)       vector dot product into register
ksvaddsc (rd),(rs1),(rs2)    add vector + scalar into scratchpad
ksvaddrf (rd),(rs1),rs2      add vector + scalar into register
ksvmulsc (rd),(rs1),(rs2)    multiply vector by scalar into scratchpad
ksvmulrf (rd),(rs1),rs2      multiply vector by scalar into register
kdotpps (rd),(rs1),(rs2)     vector dot product and post scaling
ksrlv (rd),(rs1),rs2         vector logic shift within scratchpad
ksrav (rd),(rs1),rs2         vector arithmetic shift within scratchpad
krelu (rd),(rs1)             vector ReLU within scratchpad
kvslt (rd),(rs1),(rs2)       compare vectors and create mask vector
ksvslt (rd),(rs1),rs2        compare vector-scalar and create mask
kvcp (rd),(rs1)              copy vector within scratchpad region
The instructions supported by the coprocessor subsystem are exposed to the programmer in the form of very simple intrinsic functions, fully integrated in the RISC-V gcc compiler toolchain.
CSR_MVSIZE(Row_size); // set vector length
for (i = Zeropad_offset; i < Row_size - Zeropad_offset; i++) { // scan the Output Matrix rows
    k_element = 0;
    for (FM_row_pointer = -Zeropad_offset; FM_row_pointer <= Zeropad_offset; FM_row_pointer++) {
        for (column_offset = 0; column_offset < kernel_size; column_offset++) {
            FM_offset = (i + FM_row_pointer) * Row_size + column_offset; // set pointer in SPM space
            ksvmulsc(SPM_D, (SPM_A + FM_offset), (SPM_B + k_element++)); // temporary vector result
            ksrav(SPM_D, SPM_D, scaling_factor); // scaling for fixed point alignment
            OM_offset = (Row_size * i) + Zeropad_offset; // set pointer in SPM space
            kaddv((SPM_C + OM_offset), (SPM_C + OM_offset), SPM_D); // update Output Matrix row
        }
    }
}
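To make the inner-loop body easier to follow, the three intrinsics it uses can be emulated on ordinary arrays. The sketch below is a plain-C approximation based only on the short descriptions in the instruction table, not the toolchain's real definitions: the `emu_` functions are hypothetical, the vector length is passed explicitly instead of being set through `CSR_MVSIZE`, and the scalar is passed by value instead of being read from a scratchpad address.

```c
#include <stddef.h>
#include <stdint.h>

/* Arithmetic right shift that rounds towards minus infinity.
   Written out because >> on negative values is implementation-defined in C.
   Requires s < 32. */
static int32_t asr32(int32_t x, unsigned s) {
    return x < 0 ? ~(~x >> s) : x >> s;
}

/* Emulates ksvmulsc: dst[i] = src[i] * scalar (result truncated to 32 bits). */
void emu_ksvmulsc(int32_t *dst, const int32_t *src, int32_t scalar, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (int32_t)((int64_t)src[i] * scalar);
    }
}

/* Emulates ksrav: dst[i] = src[i] >> shift (arithmetic). dst may alias src. */
void emu_ksrav(int32_t *dst, const int32_t *src, unsigned shift, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = asr32(src[i], shift);
    }
}

/* Emulates kaddv: dst[i] = a[i] + b[i]. dst may alias a or b. */
void emu_kaddv(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}

/* One multiply-scale-accumulate step, mirroring the inner loop body above:
   acc += (row * coeff) >> shift, using tmp as the temporary vector. */
void emu_mac_step(int32_t *acc, const int32_t *row, int32_t coeff,
                  unsigned shift, int32_t *tmp, size_t n) {
    emu_ksvmulsc(tmp, row, coeff, n);
    emu_ksrav(tmp, tmp, shift, n);
    emu_kaddv(acc, acc, tmp, n);
}
```

Each `emu_mac_step` call corresponds to one filter coefficient applied to one shifted row of the input matrix, which is why the convolution loop above issues kernel_size by kernel_size such steps per output row.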
10.
BENCHMARK WORKLOADS AND EVALUATION SETUP
2D convolution
• 32-bit data elements in fixed-point representation
• 3x3 filter size
• matrix sizes of 4x4, 8x8, 16x16, and 32x32 elements
• additional analysis of larger than 3x3 filter sizes on 32x32 matrices
FFT
• 256 complex samples
Matmul
• Square matrices of 64x64 elements
• Homogeneous workload (3 harts running same program)
• Composite workload (3 harts running different programs)
ANALYZED PERFORMANCE FIGURES ON FPGA SOFT-CORE IMPLEMENTATION
• Average total cycle count per hart
• Maximum clock frequency
• Absolute execution time
• Hardware Resource Utilization
• Average energy per algorithmic operation
14.
• Assuming maximum clock frequency for each core
• ZeroRiscy core taken as common reference
• In pure SIMD configurations, the speed-up grows linearly with the DLP
• Going from SISD/SIMD to MIMD+SIMD improved the speed-up in all cases, despite the frequency drop associated with the MIMD hardware.
• The symmetric MIMD+SIMD schemes exhibit up to 17X speed-up over ZeroRiscy for Convolution 32x32 and up to 13X speed-up for the composite workload.
• Heterogeneous MIMD configurations maintain an almost perfect overlap with the symmetric MIMD.
• The non-accelerated Klessydra-T03 exhibits an absolute performance gain over RI5CY and ZeroRiscy.
ABSOLUTE EXECUTION TIME SPEED-UP
[Figure: bar chart of absolute execution time speed-up (vertical axis 0 to 18). Configurations: SISD, DLP 1; pure SIMD, DLP 2, 4, 8; Sym. MIMD, DLP 1; Sym. MIMD+SIMD, DLP 2, 4, 8; Het. MIMD, DLP 1; Het. MIMD+SIMD, DLP 2, 4, 8; Klessydra T03 (no accel.); RI5CY (DSP extension); ZeroRiscy (no accel.). Series: Conv.2D 4x4, Conv.2D 8x8, Conv.2D 16x16, Conv.2D 32x32, FFT 256, MatMul 64x64, Composite.]
15.
ENERGY EFFICIENCY
• The result of this analysis is expressed as energy per algorithmic operation for the FPGA soft-core implementations, normalized to ZeroRiscy, taken as reference.
• The most energy-efficient designs proved to be the T13 symmetric MIMD configurations.
• The heterogeneous MIMD approach exhibited an almost complete overlap in energy consumption with the symmetric MIMD.
• The pure SIMD schemes resulted in a larger energy consumption than the other schemes, due to the impossibility of efficiently exploiting TLP.
[Figure: bar chart of energy per algorithmic operation, normalized to ZeroRiscy (vertical axis 0 to 1.3). Configurations: SISD, DLP 1; pure SIMD, DLP 2, 4, 8; Sym. MIMD, DLP 1; Sym. MIMD+SIMD, DLP 2, 4, 8; Het. MIMD, DLP 1; Het. MIMD+SIMD, DLP 2, 4, 8; Klessydra T03 (no accel.); RI5CY (DSP extension); ZeroRiscy (no accel.). Series: Conv.2D 4x4, Conv.2D 8x8, Conv.2D 16x16, Conv.2D 32x32, FFT 256, MatMul 64x64, Composite.]
16.
LARGER CONVOLUTION FILTERS
Cycle counts are in thousands (x1000), execution time T in us, energy E in uJ.
Core             DLP | Filter 5x5             | Filter 7x7             | Filter 9x9             | Filter 11x11
                     | Cycles   T      E      | Cycles   T      E      | Cycles   T      E      | Cycles   T      E
T13 SIMD         2   | 52.7     362    50.6   | 101.2    694    97.1   | 165.8    1136   159.1  | 246.5    1689   236.6
T13 SIMD         8   | 24.6     179    34.4   | 46.1     335    64.5   | 74.7     543    104.7  | 110.6    803    154.8
T13 Sym MIMD     2   | 19.5     148    26.9   | 35.8     272    49.4   | 57.4     436    79.2   | 84.4     641    116.5
T13 Sym MIMD     8   | 11.8     113    28.9   | 19.2     183    46.9   | 29.8     284    72.7   | 42.9     408    104.7
T13 Het MIMD     2   | 20.5     159    28.3   | 37.5     291    51.8   | 60.2     467    83.1   | 88.5     687    122.1
T03 (no accel.)  -   | 247      1120   215.5  | 514.8    2328   447.9  | 881.2    3985   766.6  | 1369.1   6191   1191.1
RI5CY            -   | 180      1971   252.0  | 385.3    4218   539.4  | 662.5    7252   927.5  | 1000.2   10949  1400.3
ZeroRiscy        -   | 318.9    2721   226.4  | 674.5    5754   478.9  | 1129.7   9637   802.1  | 1697.8   14482  1205.4
• The matrix being convolved is 32x32 elements
• The speed-up and energy efficiency trends continue as the filter dimensions grow, reaching a 35X speed-up over the ZeroRiscy reference
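The 35X figure can be checked directly against the 11x11 column of the table. The C sketch below uses the ZeroRiscy and T13 Sym MIMD (DLP 8) rows; the helper names are hypothetical, and the clock estimate assumes that execution time is simply cycle count divided by clock frequency, which the slides imply but do not state as a formula.

```c
#include <stdio.h>

/* Speed-up of a core over the reference, from absolute execution times. */
double speedup(double t_ref_us, double t_us) {
    return t_ref_us / t_us;
}

/* Energy saved relative to the reference, in percent. */
double energy_saving_pct(double e_ref_uj, double e_uj) {
    return 100.0 * (1.0 - e_uj / e_ref_uj);
}

/* Clock frequency implied by a cycle count (in thousands) and a time in us,
   assuming T = cycles / f. Cycles per microsecond equals MHz. */
double implied_clock_mhz(double kcycles, double t_us) {
    return kcycles * 1000.0 / t_us;
}

/* Print the 11x11-filter comparison using values copied from the table. */
void print_11x11_summary(void) {
    const double zr_kcycles = 1697.8, zr_t = 14482.0, zr_e = 1205.4; /* ZeroRiscy */
    const double sm_kcycles = 42.9,   sm_t = 408.0,   sm_e = 104.7;  /* T13 Sym MIMD, DLP 8 */

    printf("speed-up over ZeroRiscy: %.1fX\n", speedup(zr_t, sm_t));
    printf("energy saving: %.1f%%\n", energy_saving_pct(zr_e, sm_e));
    printf("implied clock: ZeroRiscy %.1f MHz, T13 Sym MIMD %.1f MHz\n",
           implied_clock_mhz(zr_kcycles, zr_t), implied_clock_mhz(sm_kcycles, sm_t));
}
```

With these inputs the speed-up comes out at about 35.5X and the energy saving at about 91%, consistent with the trend stated in the bullets.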
17.
The MIMD-SIMD vector coprocessor schemes enable tuning the TLP and DLP
• >15X absolute time speed-up, -85% energy per operation.
Kernels that are less effectively vectorizable can still benefit from SPMs and TLP in an IMT core,
• 2X-3X speed-up.
Fully symmetric MIMD and heterogeneous MIMD give very similar results,
• functional unit contention has less impact than SPM contention.
• coprocessor contention can be effectively mitigated by functional unit heterogeneity
Pure DLP acceleration always gives inferior results compared with a balanced TLP/DLP acceleration.
• The IMT microarchitecture benefits from TLP and DLP acceleration in a single core.
In the absence of hardware acceleration, IMT still exhibits a performance advantage over single-thread execution
• Simplified hardware structure philosophy
CONCLUSIONS
18.
December 8-10 | Virtual Event
Thank you for joining
Contribute to the RISC-V conversation on social!
#RISCVSUMMIT #KLESSYDRA @mauro_olivieri_
https://github.com/klessydra
Mauro.Olivieri@uniroma1.it