Designing RISC-V-based Accelerators for next generation Computers (DRAC) is a 3-year project (2019-2022) funded by the ERDF Operational Program of Catalonia 2014-2020. DRAC will design, verify, implement and fabricate a high performance general purpose processor that will incorporate different accelerators based on the RISC-V technology, with specific applications in the field of post-quantum security, genomics and autonomous navigation. In this talk, we will provide an overview of the main achievements in the DRAC project, including the fabrication of Lagarto, the first RISC-V processor developed in Spain.
The document provides an overview of vector technology and the RISC-V Vector extension. It discusses SIMD and vector processor concepts, the evolution of vector instruction sets from Intel MMX/SSE/AVX to the RISC-V Vector specification. It also covers vector register file organization, instruction chaining, applications of vector processing, and challenges in implementing vector architectures. Andes Technology is introducing the AndesCoreTM NX27V processor core which implements the RISC-V Vector extension.
The document provides an introduction to ARMv8 Aarch64, a 64-bit instruction set introduced in ARMv8. Some key points:
- It uses 64-bit pointers and registers, with 32-bit fixed length instructions. It has 31 general purpose 64-bit registers.
- Many traditional ARM features are removed or modified, such as conditional execution, immediate shifts, load/store multiple. New features are added like large PC-relative addressing and branching.
- It has a load-acquire/store-release memory model and supports atomic operations. Advanced SIMD and floating point are now mandatory.
The document discusses the specifications and technologies of the Intel Core i5-3550S processor. It uses the Ivy Bridge microarchitecture with a 22nm process. It has 4 cores with 4 threads each, supports up to 32GB of RAM, and has cache memory including a shared 8MB L3 cache. It supports many Intel technologies like Turbo Boost, Hyper-Threading, Virtualization, and AES-NI.
The document discusses the next ten years of RISC-V development. It notes that while specialized ISAs come and go, general purpose ISAs can last for decades. In the next ten years, RISC-V aims to become the industry standard ISA across all computing devices. To manage diversity, RISC-V uses a modular design with bases and extensions. The Technical Steering Committee will focus development on key application domains and release architecture profiles to guide the ecosystem. New security threats and applications will emerge, requiring ongoing development and specialization of RISC-V.
ARM microprocessors are widely used in embedded systems. The document provides an overview of ARM processors including their history, features, product families, architecture, and development tools. Key points covered include ARM's role in licensing processor cores, common ARM-based products, the ARM instruction set architecture, and both open-source and proprietary development tools for ARM processors.
PCIe is a standard expansion card interface introduced in 2004 to replace PCI and PCI-X. It uses serial instead of parallel communication and is scalable, allowing for higher maximum system bandwidth. The presentation discusses the history of expansion card standards leading to PCIe, including ISA, EISA, VESA, PCI, and PCI-X. It also covers key aspects of PCIe such as the root complex, endpoints, switches, lanes, bus:device.function notation, enumeration, and address spaces such as configuration space.
The document provides an overview of vector technology and the RISC-V Vector extension. It discusses SIMD and vector processor concepts, the evolution of vector instruction sets from Intel MMX/SSE/AVX to the RISC-V Vector specification. It also covers vector register file organization, instruction chaining, applications of vector processing, and challenges in implementing vector architectures. Andes Technology is introducing the AndesCoreTM NX27V processor core which implements the RISC-V Vector extension.
The document provides an introduction to ARMv8 Aarch64, a 64-bit instruction set introduced in ARMv8. Some key points:
- It uses 64-bit pointers and registers, with 32-bit fixed length instructions. It has 31 general purpose 64-bit registers.
- Many traditional ARM features are removed or modified, such as conditional execution, immediate shifts, load/store multiple. New features are added like large PC-relative addressing and branching.
- It has a load-acquire/store-release memory model and supports atomic operations. Advanced SIMD and floating point are now mandatory.
The document discusses the specifications and technologies of the Intel Core i5-3550S processor. It uses the Ivy Bridge microarchitecture with a 22nm process. It has 4 cores with 4 threads each, supports up to 32GB of RAM, and has cache memory including a shared 8MB L3 cache. It supports many Intel technologies like Turbo Boost, Hyper-Threading, Virtualization, and AES-NI.
The document discusses the next ten years of RISC-V development. It notes that while specialized ISAs come and go, general purpose ISAs can last for decades. In the next ten years, RISC-V aims to become the industry standard ISA across all computing devices. To manage diversity, RISC-V uses a modular design with bases and extensions. The Technical Steering Committee will focus development on key application domains and release architecture profiles to guide the ecosystem. New security threats and applications will emerge, requiring ongoing development and specialization of RISC-V.
ARM microprocessors are widely used in embedded systems. The document provides an overview of ARM processors including their history, features, product families, architecture, and development tools. Key points covered include ARM's role in licensing processor cores, common ARM-based products, the ARM instruction set architecture, and both open-source and proprietary development tools for ARM processors.
PCIe is a standard expansion card interface introduced in 2004 to replace PCI and PCI-X. It uses serial instead of parallel communication and is scalable, allowing for higher maximum system bandwidth. The presentation discusses the history of expansion card standards leading to PCIe, including ISA, EISA, VESA, PCI, and PCI-X. It also covers key aspects of PCIe such as the root complex, endpoints, switches, lanes, bus:device.function notation, enumeration, and address spaces such as configuration space.
Linux on RISC-V with Open Hardware (ELC-E 2020)Drew Fustini
Want to run Linux on open hardware? This talk will explore how the RISC-V, an open instruction set (ISA), and open source FPGA tools can be leveraged to achieve that goal. I will explain how myself and others at Hackaday Supercon teamed up to get Linux running on a RISC-V soft-core in the ECP5 FPGA on the conference badge. I will introduce Migen, LiteX and Vexriscv, and explain how they enabled us to quickly implement an SoC in the FPGA capable of running Linux. I will also explore other Linux-capable open source RISC-V implementations, and how some are being used in industry. I will highlight that OpenHW Group has adopted the PULP Ariane from ETH Zurich for its Core-V CVA64 implementation. Finally, I will look at what Linux-capable "hard" RISC-V SoC's currently exist, and what is on the horizon for 2020 and 2021. This talk is should be relevant to people who are interested in building open hardware systems capable of running Linux. It should also be useful to people who are curious about RISC-V. Software engineers may find it exciting to learn how Python can be used to for chip-level design with Migen and LiteX, and simplify building a System-on-Chip (SoC) for an FPGA.
Concept explanation on Hardware Abstraction Layer in embedded system development.
Slides start with explanation of modular programming as introduction to show the importance of HAL.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2X8uz92.
Alex Bradbury gives an overview of the status and development of RISC-V as it relates to modern operating systems, highlighting major research strands, controversies, and opportunities to get involved. Filmed at qconlondon.com.
Alex Bradbury is co-founder of lowRISC CIC, aiming to bring the benefits of open source development to the hardware industry by producing a high quality, secure, and open source SoC and associated infrastructure. He is a well-known member of the LLVM community, and is code owner and primary author of the upstream RISC-V back-end.
The document provides an overview of the ARM architecture, including:
- ARM was founded in 1990 and licenses its processor core intellectual property to design partners.
- The ARM instruction set includes 32-bit ARM and 16-bit Thumb instructions. ARM supports different processor modes like user mode, IRQ mode, and FIQ mode.
- Popular ARM processors include ARM7 and Cortex-M series. ARM licenses its IP to semiconductor companies who integrate the cores into various end products.
1. The document provides an overview of the ARM architecture, including details on registers, exceptions, interrupts, memory management and instructions.
2. It describes the evolution of the ARM architecture from ARMv7 to AArch64 (ARMv8), noting changes in registers and exception handling between the versions.
3. The document also covers memory management features like MMU support, different memory types (normal vs device memory), and caching behavior in the ARM architecture.
The x86 instruction set architecture began with Intel's 16-bit processors in the 1980s and has since evolved through numerous extensions. It supports multiple execution modes including 16-bit real mode, 32-bit protected mode, and 64-bit long mode. The instruction format includes optional prefixes, opcode bytes, addressing fields, and immediate data. General purpose registers are used for operands along with memory addressing modes. Subsequent x86 architectures, such as AMD64, expanded register sizes and added new instructions while maintaining backwards compatibility.
The ARM processor architecture uses either reduced instruction set computing (RISC) or complex instruction set computing (CISC). RISC aims to improve performance by reducing the number of clock cycles per instruction through simpler instructions that execute in one cycle. CISC relies more on hardware for complex instructions. Memory in ARM systems is hierarchical, with cache memory closest to the processor core and secondary storage like hard drives further away. Peripherals allow input/output and are memory mapped through registers. Initialization code configures hardware and runs diagnostics before booting the operating system.
This document discusses the RISC-V instruction set architecture (ISA). It begins by providing background on how ISAs determine what processors can be used in different markets. It then describes the origins of RISC-V as a clean-slate open ISA developed at UC Berkeley to address limitations of prior proprietary ISAs. The document outlines key attributes of RISC-V that differentiate it, such as its modularity, extensibility, and support for a wide range of implementations through standard extensions.
RISC-V Boot Process: One Step at a TimeAtish Patra
- OpenSBI is an open-source implementation of the RISC-V Supervisor Binary Interface (SBI) specifications. It provides runtime services in M-mode to facilitate booting of operating systems.
- OpenSBI supports various RISC-V platforms including SiFive boards, QEMU, and is integrated with projects like U-Boot and EDK2. It provides a standardized way for operating systems to interface with the underlying hardware.
- Future work includes supporting more platforms, implementing the SBI v0.2 specification, and enabling features like sequential CPU boot and hypervisor support. OpenSBI aims to establish a stable boot ecosystem for RISC-V.
This document discusses various serial communication protocols used in embedded systems including RS-232, RS-485, I2C, SPI, CAN, and USB. It provides details on the voltage levels, maximum speeds, cable lengths, and other specifications of each protocol. It explains how differential signaling and twisted pair cables allow RS-485 to communicate over longer distances and faster speeds compared to RS-232.
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V International
This document discusses architectural exploration for AI and ML accelerators using simulation tools. It notes that current AI/ML applications require custom hardware configurations to achieve performance goals. The Imperas simulation tools allow analyzing performance on different hardware designs by running software on virtual platforms months before RTL implementation. Imperas provides virtual platforms for heterogeneous systems running full operating systems along with detailed analysis, profiling and debugging tools. It also includes a RISC-V reference model that enables developing custom instructions for architectural exploration of AI/ML accelerators.
The Pentium processor introduced in 1993 features a superscalar architecture that allows multiple instructions to be executed simultaneously. It has separate 8KB instruction and data caches and a 64-bit data bus. The Pentium uses dynamic branch prediction and out-of-order execution to further improve performance through superscalar design.
Simultaneously Leveraging Linux and Android in a GENIVI compliant IVI System mentoresd
Simultaneously Leveraging Linux and Android in a GENIVI compliant IVI System – Andrew Patterson
It is widely accepted that Linux is the operating system of choice when building a complex, in-vehicle infotainment (IVI) system. The ability to support and quickly integrate device drivers for features such as CAN, MOST, graphics accelerators, networking interfaces, and Bluetooth can result in key differentiators for any GENIVI compliant IVI-based system. But what if Android was introduced as a second operating system? This session multiple implementations integrating both Android and Linux on multicore SoCs sharing audio and video resources across both domains while maintaining GENIVI compliance. Implementations with and without hypervisor technology will also be presented.
The document introduces RISC-V, an open instruction set architecture originated at UC Berkeley, outlines its design goals of being freely available and suitable for direct hardware implementation, and describes aspects of its ISA design including its load-store architecture, lack of condition codes, and support for 32, 64, and 128-bit addressing as well as its calling convention for passing arguments in registers and on the stack.
The document discusses implementing PCIe Address Translation Services (ATS) in ARM-based systems-on-chips (SoCs). It describes an example ARM server system with various components like CPUs, memory controllers, and I/O devices. It then explains how ATS works to improve memory access performance by allowing devices to cache address translations locally instead of relying solely on the IOMMU. The document outlines the typical components involved in ATS like the address translation cache, translating agent, and address translation protection table. It also describes how the ARM System MMU (SMMU) implements ATS and supports distributed address translation caching by endpoints.
This document provides an introduction to the ARM processor architecture. It discusses key aspects of ARM including the ARM programming model, instruction set, memory hierarchy, and development tools. ARM is a popular reduced instruction set computing (RISC) architecture used in many portable electronic devices due to its low power consumption.
This document summarizes a presentation on reverse engineering the Rocket-Chip SoC generator to develop a customized SoC called Aghaaz. The presentation covers deconstructing the Rocket-Chip software architecture, developing a Micro-Architecture and Software Specification (MASS) document, configuring an Aghaaz SoC using the MASS document, and generating the SoC from the Rocket-Chip generator. Key aspects included developing object-oriented representations of Rocket-Chip modules, flowcharts to explain the code, and configuring an RV32 core with caches and extensions.
The document discusses different types of reusable components in system-on-chip (SoC) design including intellectual property (IP) cores. It describes synthesizable cores as having a high-level description but requiring synthesis and layout, soft cores as having a technology-dependent netlist but layout is required, and firm cores as having an encrypted netlist with layout and size/speed predictability. Hard cores are described as having a fixed, technology-specific layout with determined size and speed but lack portability between foundries. The document notes that hard cores provide benefits of high performance, low power consumption, and predictability for applications like processor cores, memories, and FPGAs.
The document discusses System on Chips (SoCs). It begins by outlining Moore's Law and how IC technology has scaled over time. This has enabled more system components to be integrated onto a single chip to create SoCs. The document then discusses trends in IC technology like technology scaling, system-on-a-chip, embedded systems, and time-to-market pressures. It provides examples of SoC applications and describes the SoC design process involving hardware-software co-design and reuse of intellectual property cores. In conclusion, the document defines an SoC as an integrated circuit that implements most or all functions of an electronic system on a single chip.
Arm device tree and linux device driversHoucheng Lin
This document discusses how the Linux kernel supports different ARM boards using a common source code base. It describes how device tree is used to describe hardware in a board-agnostic way. The kernel initializes machine-specific code via the device tree and initializes drivers by matching compatible strings. This allows a single kernel binary to support multiple boards by abstracting low-level hardware details into the device tree rather than the kernel source. The document also contrasts the ARM approach to the x86 approach, where BIOS abstraction and standardized buses allow one kernel to support most x86 hardware.
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systemsinside-BigData.com
In this deck from the 2018 Swiss HPC Conference, DK Panda from Ohio State University presents: Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems.
"This talk will focus on challenges in designing HPC, Deep Learning, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss about the challenges in designing runtime environments for MPI+X (PGAS - OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models by taking into account support for multi-core systems (KNL and OpenPower), high-performance networks, GPGPUs (including GPUDirect RDMA), and energy-awareness. Features, sample performance numbers and best practices of using MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu)will be presented.
For the Deep Learning domain, we will focus on popular Deep Learning frameworks (Caffe, CNTK, and TensorFlow) to extract performance and scalability with MVAPICH2-GDR MPI library. Finally, we will outline the challenges in moving these middleware to the Cloud environments."
Watch the video: https://wp.me/p3RLHQ-iyc
Learn more: http://www.cse.ohio-state.edu/~panda
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the 2018 Swiss HPC Conference, DK Panda from Ohio State University presents: Exploiting HPC Technologies for Accelerating Big Data Processing and Associated Deep Learning.
"This talk will provide an overview of challenges in accelerating Hadoop, Spark, and Memcached on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented. Enhanced designs for these components to exploit NVM-based in-memory technology and parallel file systems (such as Lustre) will also be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project will be shown. Benefits of these stacks to accelerate deep learning frameworks (such as CaffeOnSpark and TensorFlowOnSpark) will be presented."
Watch the video: https://wp.me/p3RLHQ-iko
Learn more: http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Linux on RISC-V with Open Hardware (ELC-E 2020)Drew Fustini
Want to run Linux on open hardware? This talk will explore how the RISC-V, an open instruction set (ISA), and open source FPGA tools can be leveraged to achieve that goal. I will explain how myself and others at Hackaday Supercon teamed up to get Linux running on a RISC-V soft-core in the ECP5 FPGA on the conference badge. I will introduce Migen, LiteX and Vexriscv, and explain how they enabled us to quickly implement an SoC in the FPGA capable of running Linux. I will also explore other Linux-capable open source RISC-V implementations, and how some are being used in industry. I will highlight that OpenHW Group has adopted the PULP Ariane from ETH Zurich for its Core-V CVA64 implementation. Finally, I will look at what Linux-capable "hard" RISC-V SoC's currently exist, and what is on the horizon for 2020 and 2021. This talk is should be relevant to people who are interested in building open hardware systems capable of running Linux. It should also be useful to people who are curious about RISC-V. Software engineers may find it exciting to learn how Python can be used to for chip-level design with Migen and LiteX, and simplify building a System-on-Chip (SoC) for an FPGA.
Concept explanation on Hardware Abstraction Layer in embedded system development.
Slides start with explanation of modular programming as introduction to show the importance of HAL.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2X8uz92.
Alex Bradbury gives an overview of the status and development of RISC-V as it relates to modern operating systems, highlighting major research strands, controversies, and opportunities to get involved. Filmed at qconlondon.com.
Alex Bradbury is co-founder of lowRISC CIC, aiming to bring the benefits of open source development to the hardware industry by producing a high quality, secure, and open source SoC and associated infrastructure. He is a well-known member of the LLVM community, and is code owner and primary author of the upstream RISC-V back-end.
The document provides an overview of the ARM architecture, including:
- ARM was founded in 1990 and licenses its processor core intellectual property to design partners.
- The ARM instruction set includes 32-bit ARM and 16-bit Thumb instructions. ARM supports different processor modes like user mode, IRQ mode, and FIQ mode.
- Popular ARM processors include ARM7 and Cortex-M series. ARM licenses its IP to semiconductor companies who integrate the cores into various end products.
1. The document provides an overview of the ARM architecture, including details on registers, exceptions, interrupts, memory management and instructions.
2. It describes the evolution of the ARM architecture from ARMv7 to AArch64 (ARMv8), noting changes in registers and exception handling between the versions.
3. The document also covers memory management features like MMU support, different memory types (normal vs device memory), and caching behavior in the ARM architecture.
The x86 instruction set architecture began with Intel's 16-bit processors in the 1980s and has since evolved through numerous extensions. It supports multiple execution modes including 16-bit real mode, 32-bit protected mode, and 64-bit long mode. The instruction format includes optional prefixes, opcode bytes, addressing fields, and immediate data. General purpose registers are used for operands along with memory addressing modes. Subsequent x86 architectures, such as AMD64, expanded register sizes and added new instructions while maintaining backwards compatibility.
The ARM processor architecture uses either reduced instruction set computing (RISC) or complex instruction set computing (CISC). RISC aims to improve performance by reducing the number of clock cycles per instruction through simpler instructions that execute in one cycle. CISC relies more on hardware for complex instructions. Memory in ARM systems is hierarchical, with cache memory closest to the processor core and secondary storage like hard drives further away. Peripherals allow input/output and are memory mapped through registers. Initialization code configures hardware and runs diagnostics before booting the operating system.
This document discusses the RISC-V instruction set architecture (ISA). It begins by providing background on how ISAs determine what processors can be used in different markets. It then describes the origins of RISC-V as a clean-slate open ISA developed at UC Berkeley to address limitations of prior proprietary ISAs. The document outlines key attributes of RISC-V that differentiate it, such as its modularity, extensibility, and support for a wide range of implementations through standard extensions.
RISC-V Boot Process: One Step at a TimeAtish Patra
- OpenSBI is an open-source implementation of the RISC-V Supervisor Binary Interface (SBI) specifications. It provides runtime services in M-mode to facilitate booting of operating systems.
- OpenSBI supports various RISC-V platforms including SiFive boards, QEMU, and is integrated with projects like U-Boot and EDK2. It provides a standardized way for operating systems to interface with the underlying hardware.
- Future work includes supporting more platforms, implementing the SBI v0.2 specification, and enabling features like sequential CPU boot and hypervisor support. OpenSBI aims to establish a stable boot ecosystem for RISC-V.
This document discusses various serial communication protocols used in embedded systems including RS-232, RS-485, I2C, SPI, CAN, and USB. It provides details on the voltage levels, maximum speeds, cable lengths, and other specifications of each protocol. It explains how differential signaling and twisted pair cables allow RS-485 to communicate over longer distances and faster speeds compared to RS-232.
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V International
This document discusses architectural exploration for AI and ML accelerators using simulation tools. It notes that current AI/ML applications require custom hardware configurations to achieve performance goals. The Imperas simulation tools allow analyzing performance on different hardware designs by running software on virtual platforms months before RTL implementation. Imperas provides virtual platforms for heterogeneous systems running full operating systems along with detailed analysis, profiling and debugging tools. It also includes a RISC-V reference model that enables developing custom instructions for architectural exploration of AI/ML accelerators.
The Pentium processor introduced in 1993 features a superscalar architecture that allows multiple instructions to be executed simultaneously. It has separate 8KB instruction and data caches and a 64-bit data bus. The Pentium uses dynamic branch prediction and out-of-order execution to further improve performance through superscalar design.
Simultaneously Leveraging Linux and Android in a GENIVI compliant IVI System mentoresd
Simultaneously Leveraging Linux and Android in a GENIVI compliant IVI System – Andrew Patterson
It is widely accepted that Linux is the operating system of choice when building a complex, in-vehicle infotainment (IVI) system. The ability to support and quickly integrate device drivers for features such as CAN, MOST, graphics accelerators, networking interfaces, and Bluetooth can result in key differentiators for any GENIVI compliant IVI-based system. But what if Android was introduced as a second operating system? This session multiple implementations integrating both Android and Linux on multicore SoCs sharing audio and video resources across both domains while maintaining GENIVI compliance. Implementations with and without hypervisor technology will also be presented.
The document introduces RISC-V, an open instruction set architecture originated at UC Berkeley, outlines its design goals of being freely available and suitable for direct hardware implementation, and describes aspects of its ISA design including its load-store architecture, lack of condition codes, and support for 32, 64, and 128-bit addressing as well as its calling convention for passing arguments in registers and on the stack.
The document discusses implementing PCIe Address Translation Services (ATS) in ARM-based systems-on-chips (SoCs). It describes an example ARM server system with various components like CPUs, memory controllers, and I/O devices. It then explains how ATS works to improve memory access performance by allowing devices to cache address translations locally instead of relying solely on the IOMMU. The document outlines the typical components involved in ATS like the address translation cache, translating agent, and address translation protection table. It also describes how the ARM System MMU (SMMU) implements ATS and supports distributed address translation caching by endpoints.
This document provides an introduction to the ARM processor architecture. It discusses key aspects of ARM including the ARM programming model, instruction set, memory hierarchy, and development tools. ARM is a popular reduced instruction set computing (RISC) architecture used in many portable electronic devices due to its low power consumption.
This document summarizes a presentation on reverse engineering the Rocket-Chip SoC generator to develop a customized SoC called Aghaaz. The presentation covers deconstructing the Rocket-Chip software architecture, developing a Micro-Architecture and Software Specification (MASS) document, configuring an Aghaaz SoC using the MASS document, and generating the SoC from the Rocket-Chip generator. Key aspects included developing object-oriented representations of Rocket-Chip modules, flowcharts to explain the code, and configuring an RV32 core with caches and extensions.
The document discusses different types of reusable components in system-on-chip (SoC) design including intellectual property (IP) cores. It describes synthesizable cores as having a high-level description but requiring synthesis and layout, soft cores as having a technology-dependent netlist but layout is required, and firm cores as having an encrypted netlist with layout and size/speed predictability. Hard cores are described as having a fixed, technology-specific layout with determined size and speed but lack portability between foundries. The document notes that hard cores provide benefits of high performance, low power consumption, and predictability for applications like processor cores, memories, and FPGAs.
The document discusses System on Chips (SoCs). It begins by outlining Moore's Law and how IC technology has scaled over time. This has enabled more system components to be integrated onto a single chip to create SoCs. The document then discusses trends in IC technology like technology scaling, system-on-a-chip, embedded systems, and time-to-market pressures. It provides examples of SoC applications and describes the SoC design process involving hardware-software co-design and reuse of intellectual property cores. In conclusion, the document defines an SoC as an integrated circuit that implements most or all functions of an electronic system on a single chip.
Arm device tree and linux device driversHoucheng Lin
This document discusses how the Linux kernel supports different ARM boards using a common source code base. It describes how device tree is used to describe hardware in a board-agnostic way. The kernel initializes machine-specific code via the device tree and initializes drivers by matching compatible strings. This allows a single kernel binary to support multiple boards by abstracting low-level hardware details into the device tree rather than the kernel source. The document also contrasts the ARM approach to the x86 approach, where BIOS abstraction and standardized buses allow one kernel to support most x86 hardware.
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systemsinside-BigData.com
In this deck from the 2018 Swiss HPC Conference, DK Panda from Ohio State University presents: Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems.
"This talk will focus on challenges in designing HPC, Deep Learning, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss about the challenges in designing runtime environments for MPI+X (PGAS - OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models by taking into account support for multi-core systems (KNL and OpenPower), high-performance networks, GPGPUs (including GPUDirect RDMA), and energy-awareness. Features, sample performance numbers and best practices of using MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu)will be presented.
For the Deep Learning domain, we will focus on popular Deep Learning frameworks (Caffe, CNTK, and TensorFlow) to extract performance and scalability with MVAPICH2-GDR MPI library. Finally, we will outline the challenges in moving these middleware to the Cloud environments."
Watch the video: https://wp.me/p3RLHQ-iyc
Learn more: http://www.cse.ohio-state.edu/~panda
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the 2018 Swiss HPC Conference, DK Panda from Ohio State University presents: Exploiting HPC Technologies for Accelerating Big Data Processing and Associated Deep Learning.
"This talk will provide an overview of challenges in accelerating Hadoop, Spark, and Memcached on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented. Enhanced designs for these components to exploit NVM-based in-memory technology and parallel file systems (such as Lustre) will also be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project will be shown. Benefits of these stacks to accelerate deep learning frameworks (such as CaffeOnSpark and TensorFlowOnSpark) will be presented."
Watch the video: https://wp.me/p3RLHQ-iko
Learn more: http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the HPC Advisory Council Spain Conference, DK Panda from Ohio State University presents: Communication Frameworks for HPC and Big Data.
Watch the video presentation: http://insidehpc.com/2015/09/video-communication-frameworks-for-hpc-and-big-data/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learn more: http://www.hpcadvisorycouncil.com/events/2015/spain-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
HiPEAC 2019 Workshop - Hardware Starter Kit Agri Tulipp. Eu
The TSK-Agri is an energy efficient embedded image processing board featuring a Xilinx Zynq UltraScale+ MPSoC processor. It has various I/O including digital, analog and USB ports, compatibility with common sensors and frameworks like ROS and OpenCV, and open source software/firmware. An example application involves computer vision and the board is optimized for tasks like deep learning on neural networks with low power consumption and high performance.
In this deck from the HPC User Forum in Detroit, Leonardo Flores from the European Commission presents: EuroHPC and European HPC Strategy.
EuroHPC is a joint collaboration between European countries and the European Union about developing and supporting exascale supercomputing by 2022/2023. The EuroHPC declaration, signed on March 23 2017 signed by 7 countries marked the beginning of EuroHPC. Meanwhile there are 21 countries that have joined EuroHPC. However it is not certain yet whether Cyprus will sign the EuroHPC JU from the start (News from 2018-05-01). Latvia was the 21nd country to commit to joining EuroHPC. Estonia was the 21nd to sign the declaration and the 22nd to commit."
Watch the video: https://insidehpc.com/2018/09/video-eurohpc-european-hpc-strategy/
Sign up for our insideHPC Newsletter: http://insidehpc.com
El nuevo superordenador Mare Nostrum y el futuro procesador europeoAMETIC
Presentación a cargo de Mateo Valero, del BSC-CNC, en el 33er Encuentro de la Economía Digital y las Telecomunicaciones organizado por AMETIC y Santander Empresas en colaboración con la UIMP
A Library for Emerging High-Performance Computing ClustersIntel® Software
This document discusses the challenges of developing communication libraries for exascale systems using hybrid MPI+X programming models. It describes how current MPI+PGAS approaches use separate runtimes, which can lead to issues like deadlock. The document advocates for a unified runtime that can support multiple programming models simultaneously to avoid such issues and enable better performance. It also outlines MVAPICH2's work on designs like multi-endpoint that integrate MPI and OpenMP to efficiently support emerging highly threaded systems.
Hi,
My name is Rohan Narula. I am a Fresh Graduate from The University of Texas at Arlington (MS Electrical Engineering) seeking full-time opportunities from June 2017. My specializations are in Embedded Systems / Firmware Development, Automation & Controls.
Educating the computer architects of tomorrow's critical systems with RISC-VRISC-V International
This document provides information about a virtual RISC-V summit event taking place from December 8-10. It then summarizes a presentation given by Leonidas Kosmidis on educating computer architects with RISC-V. The presentation discusses safety critical systems and why companies are interested in RISC-V for these applications. It also describes the computer architecture curriculum and RISC-V projects at the Polytechnic University of Catalonia and Barcelona Supercomputing Center. Specific projects from a processor design course are summarized, including dual/triple lockstep CPUs, a WCET support implementation, and vector extensions added to the Lagarto RISC-V core. The document concludes by acknowledging those involved
High-Performance and Scalable Designs of Programming Models for Exascale Systemsinside-BigData.com
- The document discusses programming models and challenges for exascale systems. It focuses on MPI and PGAS models like OpenSHMEM.
- Key challenges include supporting hybrid MPI+PGAS programming, efficient communication for multi-core and accelerator nodes, fault tolerance, and extreme low memory usage.
- The MVAPICH2 project aims to address these challenges through its high performance MPI and PGAS implementation and optimization of communication for technologies like InfiniBand.
EU-IoT Training Workshops Series: AIoT and Edge Machine Learning 2021_Jens Ha...VEDLIoT Project
IoT - Accelerated Deep Learning for Cognitive Edge Computing, Jens Hagemeyer, EU-IoT Training Workshops Series – “AIoT and Edge Machine Learning”, May 2021
Many HPC applications are massively parallel and can benefit from the spatial parallelism offered by reconfigurable logic. While modern memory technologies can offer high bandwidth, designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. Addressing these challenges requires to combine compiler optimizations, high-level synthesis, and hardware design.
In this talk, I will present challenges, solutions, and trends for generating massively parallel accelerators on FPGA for high-performance computing. These architectures can provide performance comparable to software implementations on high-end processors, and much higher energy efficiency thanks to logic customization.
This bilingual engineer has 19 years of experience designing hardware and software architectures for embedded systems in various industries. He has extensive skills in programming languages like C/C++ and software tools. He has experience working with processors like ARM and FPGAs from Altera and Xilinx. His work experience includes projects in medical devices, telecommunications, and research involving designing electronic boards, coding firmware, and developing software.
Real time machine learning proposers day v3mustafa sarac
This document discusses DARPA's Real Time Machine Learning (RTML) program. The objective is to develop hardware generators and compilers that can automatically create application-specific integrated circuits for machine learning from high-level code. This would allow no-human-in-the-loop creation of efficient neural network hardware. The program has two phases: phase 1 develops an ML hardware compiler, and phase 2 demonstrates RTML systems for applications like wireless communication and image processing. Key goals are high performance, low power consumption, and support for a variety of neural network architectures and machine learning techniques.
VEDLIoT – A heterogeneous hardware platform for next-gen AIoT applications, Jens Hagemeyer, EU-IoT Training Session on “Machine Learning at the Edge and the FarEdge”, IoT Week (online event), August 2021
Iirdem design and implementation of finger writing in air by using open cv (c...Iaetsd Iaetsd
The document describes a project to design a system for finger writing in air using an Open CV library on an ARM platform. The proposed system uses a webcam, ARM microcontroller and display unit to capture finger movements or handwriting in front of the camera and display it on the screen in real-time. It analyzes the finger trajectories using Open CV and recognizes the patterns for display. The system is aimed at providing a more accessible way of digital writing compared to conventional methods.
Cedric Philippe is a 31-year-old French software engineer with 4 years of experience in C++ development for energy companies. He has worked on projects involving protocols like InterSAS, TASE2, and ASN1. His experience includes roles in analysis, design, development, testing and deployment of software. He is currently seeking new challenges in Brussels and has experience working in multiple countries.
The document discusses the Re-Vision stack which provides implementations for computer vision applications using FPGAs. It includes designs for dense optical flow, stereo vision, and combining multiple algorithms. The stack utilizes OpenCV libraries optimized for FPGAs and supports various camera interfaces. It leverages SDSoC to integrate C/C++ code with FPGA logic and allows parallel processing. Deep learning is mentioned as a future integration to combine computer vision tasks with neural networks. Resource conflicts between video and neural network memory needs require additional programmable logic and connections.
UCX: An Open Source Framework for HPC Network APIs and BeyondEd Dodds
UCX is an open source framework for high performance computing (HPC) network APIs and beyond. It is a collaborative effort between industry, national laboratories, and academia to develop the next generation HPC communication framework. UCX aims to provide a unified communication API that supports multiple network architectures and HPC programming models through a performance-oriented and community-driven approach.
Similar to DRAC: Designing RISC-V-based Accelerators for next generation Computers (20)
"El álgebra lineal es una herramienta fundamental en muchos campos de la ciencia y la tecnología. Es particularmente importante en la física, la ingeniería, la informática y la estadística. La capacidad de manipular eficientemente grandes cantidades de datos y matrices complejas es esencial en estas áreas para la resolución de problemas y la toma de decisiones.
A priori, puede dar la sensación de que estamos muy lejos del uso del álgebra lineal en nuestro día a día. Sin embargo, algunas técnicas como la descomposición en valores singulares y la regresión lineal para entrenar modelos y hacer predicciones precisas están detrás de la inteligencia artificial y el aprendizaje automático. ¿Te suena ChatGPT? Puede no parecerlo, pero el álgebra lineal también está detrás en algunos de sus procesos. Por este motivo, debemos seguir trabajando en este campo, ya que su importancia seguirá creciendo a medida que se generen y analicen grandes cantidades de datos en el mundo actual.
"
La pandemia de COVID-19 ha supuesto una proliferación de mapas y contramapas. Por ello, organizaciones de la sociedad civil y movimientos sociales han generado sus propias interpretaciones y representaciones de los datos sobre la crisis. Estos también han contribuido a visibilizar aspectos, sujetos y temas que han sido desatendidos o infrarrepresentados en las visualizaciones hegemónicas y dominantes. En este contexto, la presente ponencia se centra en el análisis de los imaginarios sociales relacionados con la elaboración de mapas durante la pandemia. Es decir, trata de indagar en la importancia de los mapas para el activismo digital, las potencialidades que se extraen de esta tecnología y los valores asociados a las visualizaciones creadas con ellos. El objetivo último es reflexionar sobre la vía emergente del activismo de datos, así como sobre la intersección entre los imaginarios sociales y la geografía digital.
This talk will begin introducing the uElectronics section of ESA at ESTEC and the general activities the group is responsible for. Then, it will go through some of the R+D on-going activities that the group is involved with, hand in hand with universities and/or companies. One of the major ones is related to the European rad-hard FPGAs that have been partially founded by ESA for several years and that will be playing a major role in the sector in the upcoming years. It´s also worth talking about the RTL soft IPs that are currently under development and that will allow us to keep on providing the European ecosystem with some key capabilities. The latter will be an overview of RISC-V space hardened on-going activities that might be replacing the current SPARC based processors available for our missions.
El objetivo de esta charla es presentar las últimas novedades incorporadas en la arquitectura ARM y describir las tendencias en la microarquitectura de los procesadores con arquitectura ARM. ARM es una empresa relativamente pequeña en comparación con otros gigantes del sector tecnológico. Sin embargo, la amplia implantación de su arquitectura, siendo ampliamente dominante en algunos sectores, y sus microarquitecturas, hacen que la tecnología ARM ocupe un lugar central en el desarrollo tecnológico del mundo actual. La tecnología ARM está presente prácticamente en todo el espectro tecnológico, desde los dispositivos más sencillos hasta el HPC y Cloud computing, pasando por los smartphones, automoción electrónica de consumo, etc
"Formal verification has been used by computer scientists for decades to prevent
software bugs. However, with a few exceptions, it has not been used by researchers
working in most areas of mathematics (geometry, algebra, analysis, etc.). In this
talk, we will discuss how this has changed in the past few years, and the possible
implications to the future of mathematical research, teaching and communication.
We will focus on the theorem prover Lean and its mathematical library
mathlib, since this is currently the system most widely used by mathematicians.
Lean is a functional programming language and interactive theorem prover based
on dependent type theory, with proof irrelevance and non-cumulative universes.
The mathlib library, open-source and designed as a basis for research level
mathematics, is one of the largest collections of formalized mathematics. It allows
classical reasoning, uses large- and small-scale automation, and is characterized
by its decentralized nature with over 200 contributors, including both computer
scientists and mathematicians."
"Part of the research community thinks that it is still early to tackle the development of quantum software engineering techniques. The reason is that how the quantum computers of the future will look like is still unknown. However, there are some facts that we can affirm today: 1) quantum and classical computers will coexist, each dedicated to the tasks at which they are most efficient. 2) quantum computers will be part of the cloud infrastructure and will be accessible through the Internet. 3) complex software systems will be made up of smaller pieces that will collaborate with each other. 4) some of those pieces will be quantum, therefore the systems of the future will be hybrid. 5) the coexistence and interaction between the components of said hybrid systems will be supported by service composition: quantum services.
This talk analyzes the challenges that the integration of quantum services poses to Service Oriented Computing."
In this talk, after a brief overview of AI concepts in particular Machine Learning (ML) techniques, some of the well-known computer design concepts for high performance and power efficiency are presented. Subsequently, those techniques that have had a promising impact for computing ML algorithms are discussed. Deep learning has emerged as a game changer for many applications in various fields of engineering and medical sciences. Although the primary computation function is matrix vector multiplication, many competing efficient implementations of this primary function have been proposed and put into practice. This talk will review and compare some of those techniques that are used for ML computer design.
Tras una breve introducción a la informática médica y unas pinceladas sobre conceptos prácticos de Inteligencia Artificial (posible definición consensuada, strong VS weak AI y técnicas y métodos comúnmente empleados), el bloque central de la charla muestra ejemplos prácticos (en forma de casos de éxito) de distintos desarrollos llevados a cabo por el grupo de Sistemas Informáticos de Nueva Generación (SING: http//sing-group.org/) en los ámbitos de (i) Informática clínica (InNoCBR, PolyDeep), (ii) Informática para investigación clínica (PathJam, WhichGenes), (iii) bioinformática traslacional (Genómica: ALTER, Proteómica: DPD, BI, BS, Mlibrary, Mass-Up, e integración de datos ÓMICOS: PunDrugs) y (iv) Informática en salud pública (CURMIS4th). Finalmente, se comenta brevemente la importancia que se espera tenga en un futuro inmediato la IA interpretable (XAI, Explainable Artificial Intelligence) y la participación humana (HITL. Human-In-The-Loop). La charla termina con una breve reflexión sobre las lecciones aprendidas por el ponente después de más de 16 años de desarrollo de sistemas inteligentes en el ámbito de la informática médica.
Many emerging applications require methods tailored towards high-speed data acquisition and filtering of streaming data followed by offline event reconstruction and analysis. In this case, the main objective is to relieve the immense pressure on the storage and communication resources within the experimental infrastructure. In other applications, ultra low latency real time analysis is required for autonomous experimental systems and anomaly detection in acquired scientific data in the absence of any prior data model for unknown events. At these data rates, traditional computing approaches cannot carry out even cursory analyses in a time frame necessary to guide experimentation. In this talk, Prof. Ogrenci will present some examples of AI hardware architectures. She will discuss the concept of co-design, which makes the unique needs of an application domain transparent to the hardware design process and present examples from three applications: (1) An in-pixel AI chip built using the HLS methodology; (2) A radiation hardened ASIC chip for quantum systems; (3) An FPGA-based edge computing controller for real-time control of a High Energy Physics experiment.
En esta conferencia se presentará una revisión del concepto de autonomía para robots móviles de campo y la identificación de desafíos para lograr un verdadero sistema autónomo, además de sugerir posibles direcciones de investigación. Los sistemas robóticos inteligentes, por lo general, obtienen conocimiento de sus funciones y del entorno de trabajo en etapa de diseño y desarrollo. Este enfoque no siempre es eficiente, especialmente en entornos semiestructurados y complejos como puede ser el campo de cultivo. Un sistema robótico verdaderamente autónomo debería desarrollar habilidades que le permitan tener éxito en tales entornos sin la necesidad de tener a-priori un conocimiento ontológico del área de trabajo y la definición de un conjunto de tareas o comportamientos predefinidos. Por lo que en esta conferencia se presentarán posibles estrategias basadas en Inteligencia Artificial que permitan perfeccionar las capacidades de navegación de robots móviles y que sean capaces de ofrecer un nivel de autonomía lo suficientemente elevado para poder ejecutar todas las tareas dentro de una misión casa-a-casa (home-to-home).
Quantum computing has become a noteworthy topic in academia and industry. The multinational companies in the world have been obtaining impressive advances in all areas of quantum technology during the last two decades. These companies try to construct real quantum computers in order to exploit their theoretical preferences over today’s classical computers in practical applications. However, they are challenging to build a full-scale quantum computer because of their increased susceptibility to errors due to decoherence and other quantum noise. Therefore, quantum error correction (QEC) and fault-tolerance protocol will be essential for running quantum algorithms on large-scale quantum computers.
The overall effect of noise is modeled in terms of a set of Pauli operators and the identity acting on the physical qubits (bit flip, phase flip and a combination of bit and phase flips). In addition to Pauli errors, there is another error named leakage errors that occur when a qubit leaves the defined computational subspace. As the location of leakage errors is unknown, these can damage even more the quantum computations. Thus, this talk will briefly provide quantum error models.
Los chatbots son un elemento clave en la transformación digital de nuestra sociedad. Están por todas partes: eCommerce, salud digital, asistencia a clientes, turismo,... Pero si habéis usado alguno, probablemente os habrá decepcionado. Lo confieso, la mayoría de los chatbots que existen son muy malos. Y es que no es nada fácil hacer un chatbot que sea realmente útil e inteligente. Un chatbot combina toda la complejidad de la ingeniería de software con la del procesamiento de lenguaje natural. Pensad que muchos chatbots hay que desplegarlos en varios canales (web, telegram, slack,...) y a menudo tienen que utilizar APIs y servicios externos, acceder a bases de datos internas o integrar modelos de lenguaje preentrenados (por ej. detectores de toxicidad), etc. Y el problema no es sólo crear el bot, si no también probarlo y evolucionarlo. En esta charla veremos los mayores desafíos a los que hay que enfrentarse cuando nos encargan un proyecto de desarrollo que incluye un chatbot y qué técnicas y estrategias podemos ir aplicando en función de las necesidades del proyecto, para conseguir, esta vez sí un chatbot que sepa de lo que habla.
The main challenge of concurrent software verification has always been in achieving modularity, i.e., the ability to divide and conquer the correctness proofs with the goal of scaling the verification effort. Types are a formal method well-known for its ability to modularize programs, and in the case of dependent types, the ability to modularize and scale complex mathematical proofs.
In this talk I will present our recent work towards reconciling dependent types with shared memory concurrency, with the goal of achieving modular proofs for the latter. Applying the type-theoretic paradigm to concurrency has lead us to view separation logic as a type theory of state, and has motivated novel abstractions for expressing concurrency proofs based on the algebraic structure of a resource and on structure-preserving functions (i.e., morphisms) between resources.
Microarchitectural attacks, such as Spectre and Meltdown, are a class of
security threats that affect almost all modern processors. These attacks exploit the side-effects resulting from processor optimizations to leak sensitive information and compromise a system’s security.
Over the years, a large number of hardware and software mechanisms for
preventing microarchitectural leaks have been proposed. Intuitively, more
defensive mechanisms are less efficient, while more permissive mechanisms may offer more performance but require more defensive programming. Unfortunately, there are no
hardware-software contracts that would turn this intuition into a basis for
principled co-design.
In this talk, we present a framework for specifying hardware/software security
contracts, an abstraction that captures a processor’s security guarantees in a
simple, mechanism-independent manner by specifying which program executions a
microarchitectural attacker can distinguish.
La aparición de vulnerabilidades por la falta de controles de seguridad es una de las causas por las que se demandan nuevos marcos de trabajo que produzcan software seguro de forma predeterminada. En la conferencia se abordará cómo transformar el proceso de desarrollo de software dando la importancia que merece la seguridad desde el inicio del ciclo de vida. Para ello se propone un nuevo modelo de desarrollo – modelo Viewnext-UEx – que incorpora prácticas de seguridad de forma preventiva y sistemática en todas las fases del proceso de ciclo de vida del software. El propósito de este nuevo modelo es anticipar la detección de vulnerabilidades aplicando la seguridad desde las fases más tempranas, a la vez que se optimizan los procesos de construcción del software. Se exponen los resultados de un escenario preventivo, tras la aplicación del modelo Viewnext-UEx, frente al escenario reactivo tradicional de aplicar la seguridad a partir de la fase de testing.
This document discusses trusting artificial intelligence systems. It begins with an overview of trust in social and computing contexts. It then discusses artificial intelligence, including machine learning, deep learning, and natural language processing. It details how AI systems can be attacked, including adversarial inputs, data poisoning, and model stealing. It raises important discussions around using AI in contexts like cybersecurity, medicine, transportation, and sentiment analysis, and the challenges of ensuring systems can be trusted.
El uso de energías renovables es clave para cumplir los objetivos de desarrollo sostenible de la Agenda 2030. Entre estas energías, la eólica es la segunda más utilizada debido a su alta eficiencia. Algunos estudios sugieren que la energía eólica será la principal fuente de generación en 2050. Por ello es conveniente seguir investigando en la aplicación de técnicas de control avanzadas en estos sistemas.
Entre estas técnicas avanzadas cabe destacar las redes neuronales y el aprendizaje por refuerzo combinadas con estrategias clásicas de control. Estas técnicas ya se han empleado con éxito en el modelado y el control de sistemas complejos.
Esta conferencia presentará la aplicación de redes neuronales y aprendizaje por refuerzo al control de aerogeneradores, centrándolo especialmente en el control de pitch. Se detallarán diferentes configuraciones con redes neuronales y otras técnicas aplicadas al control de pitch. Finalmente se propondrán algunas técnicas híbridas que combinen lógica difusa, tablas de búsqueda y redes neuronales, mostrando resultados que han permitido probar su utilidad para mejorar la eficiencia de las turbinas eólicas.
As the world's energy demand rises, so does the amount of renewable energy, particularly wind energy, in the supply. The life cycle of wind farms starting from manufacturing the components to decommission stage involve significant involvement of cost and the application of AI and data analytics are on reducing these costs are limited. With this conference talk, the audience expected to know some of the interesting applications of AI and data analytics on offshore wind. And, also highlight the future challenges and opportunities. This conference could be useful for students, academics and researcher who want to make next career in offshore wind but yet know where to start.
This document discusses the evolution of edge AI systems and architectures for the Internet of Things (IoT) era. It describes how IoT has transitioned from simple wireless sensor networks to complex systems that converge digitized enterprise data with edge AI sensors and deep learning analytics. Edge AI moves intelligence closer to IoT devices by enabling real-time data processing and filtering at the network edge. This reduces data transmission costs and latency. The document outlines several examples of edge AI applications in healthcare, smart homes, and industry that analyze sensor data in real-time to provide personalized and energy efficient services. It also discusses how new edge AI hardware platforms and open-source systems are enabling more customized and affordable IoT solutions.
Embedded real-time software construction has usually posed interesting challenges due to the complexity of the tasks these systems have to execute. Most methods for developing these systems are either hard to scale up for large systems, or require a difficult testing effort with no guarantee for bug-free software products. Construction of system models and their analysis through simulation reduces both end costs and risks, while enhancing system capabilities and improving the quality of the final products. This is a useful approach, moreover considering that testing under actual operating conditions may be impractical and in some cases impossible. In this talk, we will present a Modeling and Simulation-based framework to develop embedded systems based on the DEVS (Discrete Event systems Specification) formalism. This approach combines the advantages of a simulation-based approach with the rigor of a formal methodology. We will discuss how to use this framework to incrementally develop embedded applications, and to integrate simulation models with hardware components seamlessly.
How to Manage Your Lost Opportunities in Odoo 17 CRMCeline George
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxEduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
How to Make a Field Mandatory in Odoo 17Celine George
In Odoo, making a field required can be done through both Python code and XML views. When you set the required attribute to True in Python code, it makes the field required across all views where it's used. Conversely, when you set the required attribute in XML views, it makes the field required only in the context of that particular view.
বাংলাদেশের অর্থনৈতিক সমীক্ষা ২০২৪ [Bangladesh Economic Review 2024 Bangla.pdf] কম্পিউটার , ট্যাব ও স্মার্ট ফোন ভার্সন সহ সম্পূর্ণ বাংলা ই-বুক বা pdf বই " সুচিপত্র ...বুকমার্ক মেনু 🔖 ও হাইপার লিংক মেনু 📝👆 যুক্ত ..
আমাদের সবার জন্য খুব খুব গুরুত্বপূর্ণ একটি বই ..বিসিএস, ব্যাংক, ইউনিভার্সিটি ভর্তি ও যে কোন প্রতিযোগিতা মূলক পরীক্ষার জন্য এর খুব ইম্পরট্যান্ট একটি বিষয় ...তাছাড়া বাংলাদেশের সাম্প্রতিক যে কোন ডাটা বা তথ্য এই বইতে পাবেন ...
তাই একজন নাগরিক হিসাবে এই তথ্য গুলো আপনার জানা প্রয়োজন ...।
বিসিএস ও ব্যাংক এর লিখিত পরীক্ষা ...+এছাড়া মাধ্যমিক ও উচ্চমাধ্যমিকের স্টুডেন্টদের জন্য অনেক কাজে আসবে ...
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
This slide is special for master students (MIBS & MIFB) in UUM. Also useful for readers who are interested in the topic of contemporary Islamic banking.
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
DRAC: Designing RISC-V-based Accelerators for next generation Computers
1. April 20 2023
DRAC: Designing RISC-V-
based Accelerators for Next
Generation Computers
Miquel Moretó
UPC and BSC
Universidad Complutense de Madrid (UCM), Madrid (Spain)
2. Barcelona Supercomputing Center
Centro Nacional de Supercomputación
BSC-CNS objectives
Supercomputing services
to Spanish and EU researchers
R&D in Computer, Life, Earth and
Engineering Sciences
PhD programme, technology
transfer, public engagement
Spanish Government 60%
Catalan Government 30%
Univ. Politècnica de Catalunya (UPC) 10%
BSC-CNS is
a consortium
that includes
3. MareNostrum 1
2004 – 42,3 Tflops
1st
Europe / 4th
World
New technologies
MareNostrum 2
2006 – 94,2 Tflops
1st
Europe / 5th
World
New technologies
MareNostrum 3
2012 – 1,1 Pflops
12th
Europe / 36th
World
MareNostrum 4
2017 – 11,1 Pflops
2nd
Europe / 13th
World
New technologies
Access: prace-ri.eu/hpc_acces Access: bsc.es/res-intranet
General Purpose Cluster: 11.15 Pflops
MN4 CTE-Power: 1.57 Pflops
MN4 CTE-ARM: 0.65 Pflops
MN4 CTE-AMD: 0.52 Pflops
MareNostrum 4
Total peak performance: 13,9 Pflops
4. GPP - General Purpose
Intel Sapphire Rapids
Peak performance: 45,4 Pflops
Sustained HPL: 35,4 Pflops
April 2023
ACC – Accelerated
Intel Sapphire Rapids
NVIDIA Hopper
Peak performance: 260 Pflops
Sustained HPL: 163 Pflops
June 2023
NGT GPP - Next Generation
NVIDIA Grace
Peak performance: 2,82 Pflops
Sustained HPL: 2 Pflops
June 2023
NGT ACC - Next Generation
Intel Emerald Rapids
Intel Rialto Bridge
Peak performance: 6 Pflops
Sustained HPL: 4,24 Pflops
December 2023
InfiniBand NDR 200
Fat Tree
Spectrum Scale File System
248 PB HDD
2,81 PB NVMe
402 PB tape
January 2023
MareNostrum5
8. Computer
Sciences
Earth
Sciences
CASE
Life
Sciences
To influence the way machines are built, programmed
and used: programming models, performance tools,
Big Data, Artificial Intelligence , computer architecture,
energy efficiency
To develop and implement global and
regional state-of-the-art models for short-
term air quality forecast and long-term
climate applications
To understand living organisms by means of
theoretical and computational methods
(molecular modeling, genomics, proteomics)
To develop scientific and engineering software to
efficiently exploit super-computing capabilities
(biomedical, geophysics, atmospheric, energy, social
and economic simulations)
Mission of BSC Scientific Departments
10. Today´s Technology Trends
Massive penetration of Open
Source Software
• IoT (Arduino),
• Mobile (Android),
• Enterprise (Linux),
• HPC (Linux,
OpenMP, etc.)
New Open Source Hardware
Momentum from IoT and the
Edge to HPC
• RISC-V
• OpenPOWER
Moore´s Law + Power =
Specialization (HW/SW Co-
Design)
• More cost effective
• More performant
• Less Power
11. HPC Today
• Europe has led the way in defining a
common open HPC software ecosystem
• Linux is the de facto standard OS despite
proprietary alternatives
• Software landscape from Cloud to IoT
already enjoys the benefit of open source
• Open source provides:
• A common platform, specification and
interface
• Accelerates building new functionality by
leveraging existing components
• Lowers the entry barrier for others to
contribute new components
• Crowd-sources solutions for small and larger
problems
• What about Hardware and in particular, the
CPU? CPUs/GPUs/ASICs
HW Systems
OS
Compiler/Toolchain
Schedulers
Libraries/Platforms
Applications
OPEN
CLOSED
Linux
LLVM
COMPS
OpenMP
GROMACS,NAMD,
WRF, VASP, etc.
OCP
12. HPC Tomorrow
• Europe can lead the way to a
completely open SW/HW stack for the
world
• RISC-V provides the open source
hardware alternative to dominating
proprietary non-EU solutions
• Europe can achieve complete
technology independence with these
foundational building blocks
• Currently at the same early stage in HW
as we were with SW when Linux was
adopted many years ago
• RISC-V can unify, focus, and build a
new microelectronics industry in
Europe. CPUs/GPUs/ASICs
HW Systems
OS
Compiler/Toolchain
Schedulers
Libraries/Platforms
Applications
OPEN
Linux
LLVM
COMPS
OpenMP
GROMACS,NAMD,
WRF, VASP, etc.
OCP
RISC-V,
OpenPower,
MIPS
14. The European Processor Initiative (EPI) under the SGA1 of the
Framework Partnership Agreement (FPA: 800928), to design and
implement a roadmap for a new family of low-power European
processors for extreme scale computing, high-performance Big-Data and
a range of emerging applications.
• History: Remember MontBlanc? BSC leads the RISC-V HPC accelerator
development
• Consortium (SGA1):
28 partners from 10 European countries to Coordinate: Bull SAS (France)
• Budget: €80M (100% funded)
• Duration: 36 months (01/12/2018-31/12/2021)
• 5 Streams (4 Technical and 1 Management/Exploitation/C&D)
The European Processor Initiative
Key contact point
Jesús Labarta jesus.labarta@bsc.es
👉
15. 15
EPI MAIN OBJECTIVE
To develop European microprocessor and accelerator technology
Strengthen competitiveness of EU industry and science
Rhea
Arm-based
general purpose
CPU
EPAC
RISC-V based
Accelerators
SiPearl BSC, SemiDynamics,
EXTOLL, FORTH, ETHZ,
UniBo, UniZG, Chalmers,
CEA, E4, Menta, ZPT, …
16. 16
Pilot based
on EPAC
EPI SGA2
Rhea1 Go-To-Market
EPAC Demonstrator
OVERALL TECHNOLOGY ROADMAP
EPI SGA1
EPAC test chip 1.0
Next project(s)
EU Chips act
related
Exascale
Systems
Centers of Excellence in HPC Applications
2019 – 2021 2022 - 2024 2025 - …
2015 – 2018
H2020 projects
HPC ecosystem exploration
Pilot based
on Rhea I
Rhea
Arm-based
general
purpose CPU
EPAC
RISC-V
based
Accelerators
17. 17
EPAC: A RISC-V ACCELERATOR
17
EPAC accelerator
Network on Chip (NoC)
L2 HN …
…
C
V
STX … VRP
AXI Lite
Peripherals Peripherals Peripherals
Bridge
Bridge
Host
CPU
L2 HN L2 HN
C
V
STX
18. 18
RISC-V core and VPU
RISC-V core: Avispado
• 2-way in-order core
• Full HW-support for unaligned accesses
• Cache: L1I$ =16KB, L1D$ = 32KB
VPU
• Long vectors: 256 DP elements
– #Functional Units (FUs) << Vector Length (VL)
– 1 vector instruction can take several cycles
• 8 Lanes per core
– FMA/lane: 2 DP Flop/cycle
• 40 physical registers, some out of order
F. Minervini, O. Palomar. RISC-V Summit 2021
“Vitruvius: An Area-Efficient Decoupled Vector
Accelerator for High Performance Computing”
Architecture Vector register size (1 cell = 1 double element)
Intel AVX512 D1 D2 D3 D4 D5 D6 D7 D8
Arm Neon D1 D2
A64FX D1 D2 D3 D4 D5 D6 D7 D8
NEC Aurora SX D1 D2 D3 D4 D5 D6 D7 D8 … D256
EPAC VPU D1 D2 D3 D4 D5 D6 D7 D8 … D256
Key contact point
Adrián Cristal adrian.cristal@bsc.es
👉
Key contact point
Roger Espasa (SemiDynamics)
👉
19. 19
EPAC
EPAC programming environment
• Offload of processes from host to accelerator
• Interoperability MPI + OpenMP
– Accelerator can run MPI and OpenPM processes
• Task-based models
– Taskify MPI calls
– Single mechanism
Concurrency
Locality and data management
• Long vectors (256 elements, 8 lanes per core)
– Decouple front-end from back-end
– Convey access pattern semantics to the architecture
– Vector length agnostic (VLA) programming and architecture
Applications
Libraries (FFTW, SpMV, ...)
Scheduler (Slurm)
Compiler (LLVM)
OS (Linux)
Hardware
(Host + Accelerator)
Programming Model
(OpenMP, MPI)
Memory
Host
CPU
Bridge L2
Vector
Core
STX
20. 20
How to use the V-extensions?
• Assembler: always a valid option but not the most pleasant
• C/C++ builtins (intrinsics)
– Low-level mapping to instructions
– Allows embedding it into an existing C/C++ codebase
– Allows relatively quick experimentation
• #pragma omp simd (aka “Semi automated vectorization”)
– Relies on vectorization capabilities of the compiler
Usually works but gets complicated if the code calls functions
– Also usable in Fortran
• Autovectorization: let compiler work for you ;-)
Interested in compiling your code in RVV?
Roger Ferrer roger.ferrer@bsc.es
👉
21. 21
RISC-V platforms (SDV)
• Hardware/Software infrastructure for
Continuous Integration and RTL check
– HiFive commercial hardware (scalar)
– EPAC RTL (1core) on FPGA
• Platform to:
– Demonstrate a full HPC software stack
Linux, compiler, libraries, job scheduler, MPI
– Test latest RTL with complex codes
Advanced performance analysis tools
Accurate timing available
Commercial and FPGA-based
RISC-V
Commercial
(only scalar)
EPAC FPGA-based
implementation
(vector support)
Full SDV
Interested in testing your code on EPAC?
Filippo Mantovani filippo.mantovani@bsc.es
👉
22. 22
• Chip fabrication Q2 2021
• Global Foundries 22nm (GF22FDX)
• Final Top level chip floorplan
• Total area:
• 5943 X 4593 um2
• (27.297 mm2
)
STX
Avispado
VPU
L2 HN
Avispado
VPU
L2 HN
Avispado
VPU
L2 HN
Avispado
VPU
L2 HN
STX
VRP
serdes serdes
EPAC 1.0 Test Chip (Tapeout Q2 2021)
23. 23
EPAC 1.5 Test Chip (Tapeout Q4 2022)
24/10/2022 ACAT 2022, Suarez
STX STX
XP XP XP
HN
L2$
HN
L2$
XP
HN
L2$
XP
HN
L2$
AVS
VPU
FPGA
Bridge
VRP
XP
I/O
micro-
tile
AVS
VPU
36. Genome Alignment Acceleration (WFA)
• Pairwise alignment/mapping DNA/RNA sequences with the
novel Wavefront Alignment (WFA) algorithm
• Target: provide specific hardware support for the most time-
consuming operations of the algorithm.
• ISA extensions:
- [vmax_vv] Vectorial maximum.
- [vmax3inc_vv] Vectorial “3-way” maximum fused with
increment.
- [vcnt] Scalar “count consecutive matches”
- [vcnt_vv] Vectorial “count consecutive matches”.
• Use narrow-integers 16-bit or 8-bit integers
• Monolythic accelerator integrated with Lagarto SoC
Single aligner (due to area limitations) capable of 64 ops/cycle
Current Place and Route (PnR) results: 1.1GHz (typical corner)
Area: 1.6mm2 in Global Foundries 22nm
Performance speedups: 515x with 10K reads, 10% error
37. • Main Goal: RISC-V acceleration of different PQC schemes
• Classic McEliece (CME) KEM acceleration:
• HW/SW co-design implementation of
CME KEM @ Zynq Ultrascale [FPL’21]
• Monolythic CME accelerator based on HLS
• Integration of the CME accelerator in
Lagarto SoC via AXI interface
• Accelerate other KEM and digital signature (DS) schemes:
• NRTU KEM and Crystals-Kyber KEM / Crystals-Dilithium DS
• Both algorithms rely on very different operations and will require different
acceleration techniques.
[FPL’21] V. Kostalampros et al., HLS-Based HW/SW Co-Design of the Post-
Quantum Classic McEliece Cryptosystem. FPL 2021.
PQC Acceleration
41. First Steps Towards a Lagarto Multicore
DVINO
L1 I L1 D
L2 cache
NoC
DVINO
L1 I L1 D
L2 cache
NoC
DVINO
L1 I L1 D
L2 cache
NoC
DVINO
L1 I L1 D
L2 cache
NoC
Mem Cntr
• Multicore design based on:
• DVINO processor
• RISC-V ISA support: I, M, A,
F, D, C, V
• 4-lane VPU
• OpenPiton 2-level cache
hierarchy (priv. L1, shared L2)
• Current status
• Linux boot (openSBI)
• RTL simulation of parallel
applications (up to 64 cores)
• Multiple memory controllers
• FPGA-ready
• Integrating Ka+VPU
42. Towards a RISC-V Heterogeneous Manycore
Ka
Core
VPU
DL1
IL1
L2
General Purpose
Processor Tile
Ka
Core DL1
IL1
L2
Ka
Core DL1
IL1
Sargan
-tana
DL1
IL1
L2
Sargan
-tana
DL1
IL1
HPC Accelerator Tile
Ka
Core
SAURIA
DL1
IL1
L2
Automotive, ML
Accelerator Tile
Sargan
-tana
WFA
DL1
IL1
L2
Genomics
Accelerator Tile
Other Domain-Specific Accelerator Tiles (sparse, security, safety, etc.)
45. PERTE Chip. Microelectronics &
Semiconductors
Recovery,
Transformation and
Resilience Plan
45
M€
COMPONENT I. BOLSTERING SCIENTIFIC CAP
ACITY 1.165
ACTION 1: Development of R&D&i on cutting-edge and alternative architecture microprocessors 475
ACTION 2: Development of R&D&i on integrated photonics 150
ACTION 3: Development of R&D&i on quantum chip development 40
ACTION 4: Budget line for the IPCEI on Microelectronics and Communication Technologies 500
COMPONENT II. DESIGN S
TRA
TE
GY 1.330
ACTION 5: Creation of cutting-edge alternative architecture microprocessor fabless companies 950
ACTION 6: Creation of pilot lines 300
ACTION 7: Creation of a network for education, training and skills-building in relation to semiconductors 80
3. EXECUTION
3. EXECUTION
ACTIONS & BUDGET (I)
46. PERTE Chip. Microelectronics &
Semiconductors
Recovery,
Transformation and
Resilience Plan
46
M€
COMPONENT III. CONSTRUCTION OF F
ABRICA
TION PLANTS IN SPAIN 9.350
ACTION 8: Creating fabrication capacity at sizes below 5 nm 7.250
ACTION 9: Creating fabrication capacity at sizes above 5 nm 2.100
COMPONENT IV
. STIMULATING THE ICT MANUF
ACTURING INDUSTRY IN SPAIN 400
ACTION 10: ICT manufacturing industry incentive scheme 200
ACTION 11: Creation of a chips fund 200
GOVERNANCE 5
Special Commissioner for the Microelectronic and Semiconductors Project 5
TOT
AL PUBLIC INVESTMENT 12.250
3. EXECUTION
3. EXECUTION
ACTIONS & BUDGET (II)
47. Intel Labs Barcelona are back!
● New joint Intel – BSC Laboratory to design HPC processors based on RISC-V technology
● Funding: 400M$ in the next 10 years. Headcount: ~200 (estimated)
Other companies will also come to Spain!
48. Research & Product
Developing European Hardware/Software
Technology
Full Stack Open Source HPC Ecosystem
Intel and BSC: Continuing to collaborate into
the Zettascale era
European & Global collaboration
Build Full System based on RISC-V: MN6 and
many others
Path to Zettascale
49. If your experience and/or motivation include any of the
disciplines and skills below, check out our QR:
Hardware (Processor architecture, micro-architecture, accelerators,
memory hierarchy, memory controllers, HBM, DRAM, non-volatile
memory, RTL design, VHDL, verilog, SystemC, System verilog, Synopsys,
Cadence, Mentor Graphics, synthesis, place and route, timing closure,
packaging, PCB design, verification, validation, CI, post-silicon debug,
DFT, gate-level simulation,…)
Software (programming models, MPI, compilers (LLVM), SYCL, OneAPI,
Tensorflow, PyTorch, Apache Spark, CI/CD, operating systems, managed
runtimes, OpenMP, task-based programming models, containers,
security, fault-tolerance, virtualization, C/C++, Tcl, Python, Perl/Csh,… )
Research Engineer/Researcher
R&D for Zettascale and beyond: Applications to Accelerators
Reference: 176_23_CS_Z_R0-4-RE1-4
BSC is building a New Lab!
100+ Job Opportunities