The document discusses Linux network drivers and provides information about:
- The Linux network subsystem and protocol stack, typically using TCP/IP.
- Network interface card (NIC) drivers which provide a uniform interface for the network layer to access physical network cards.
- Key data structures like struct sk_buff and struct net_device that network drivers interact with for packet handling and device operations.
- Functions for network device registration, open/close, interrupt handling, and flow control.
- Examples of simple network drivers and how to write one for a Realtek NIC.
TRex Realistic Traffic Generator - Stateless support - Hanoch Haim
New Stateless support in TRex provides:
- High performance packet generation of up to 22 million packets per second per core and support for interfaces from 1Gbps to 100Gbps.
- Flexible traffic profiles that can generate multiple streams of traffic with programmable fields using a field engine.
- Statistics on a per port, per stream, and per traffic profile basis including latency and jitter.
- Python API and interactive console for automation and control.
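The "field engine" mentioned above can be pictured as a set of per-stream rules that rewrite chosen packet fields on every transmitted packet. The following is a conceptual sketch of that idea in plain Python, not the actual TRex API; the function names and the choice of field (the low byte of a source IP) are illustrative.

```python
# Conceptual sketch of a field engine: a per-stream rule that cycles a
# packet field (here, the low byte of a source IP) through a range,
# advancing one step per generated packet. Not the real TRex API.

def make_field_engine(start, end):
    """Cycle a field value through [start, end], one step per packet."""
    value = start
    def next_value():
        nonlocal value
        current = value
        value = start if value >= end else value + 1
        return current
    return next_value

src_ip_low_byte = make_field_engine(1, 3)
packets = ["10.0.0.%d" % src_ip_low_byte() for _ in range(5)]
print(packets)  # field wraps: ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']
```

A real field engine applies many such rules in parallel (IPs, ports, VLAN tags) and does so in the fast path, but the per-packet mutate-and-wrap behavior is the same.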
DPDK greatly improves packet processing performance and throughput by allowing applications to access hardware directly, bypassing the kernel. It can improve performance by up to 10 times, allowing over 80 Mpps (million packets per second) of throughput on a single CPU, or double that with two CPUs. This enables telecom and networking equipment manufacturers to develop products faster and at lower cost. DPDK achieves these gains through techniques such as dedicated core affinity, userspace drivers, polling instead of interrupts, and lockless synchronization.
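The packet rates quoted for DPDK can be sanity-checked with simple line-rate arithmetic: on Ethernet, each frame carries 20 extra bytes on the wire (7 preamble + 1 start-of-frame delimiter + 12 bytes of inter-frame gap), so the link speed caps the packet rate. A minimal sketch:

```python
# Back-of-envelope packet-rate math: line rate divided by the on-wire
# size of a frame (payload frame + 20 bytes of preamble/SFD/IFG overhead)
# gives the theoretical maximum packets per second.

def max_pps(link_bps, frame_bytes, overhead_bytes=20):
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_frame

pps_10g = max_pps(10e9, 64)   # ~14.88 Mpps per 10GbE port at 64-byte frames
print(round(pps_10g))
```

This is why per-core rates in the tens of Mpps correspond to saturating one or more 10GbE ports with minimum-size frames.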
PCIe Gen 3.0 Presentation @ 4th FPGA Camp - FPGA Central
PCIe Gen3 presentation by PLDA at 4th FPGA Camp in Santa Clara, CA. For more details visit http://www.fpgacentral.com/fpgacamp or http://www.fpgacentral.com
PCI Express (Peripheral Component Interconnect Express), abbreviated PCIe or PCI-E, is designed to replace the older PCI, PCI-X, and AGP standards. We present a data communication system developed to transfer data between the host and peripheral devices via PCIe. Both the performance and the available area on the board are affected by the use of PCIe. PCIe is a serial expansion bus interconnect used for high-speed communication; it currently represents the fastest, and most expensive, way to connect peripheral devices to a general-purpose CPU, and it provides the highest-bandwidth connection in the PC platform. In this paper, we highlight the different types of bus architecture and describe how the PCIe architecture transfers data from the CPU to its destination.
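The bandwidth claims made for PCIe come directly from its per-lane signaling rates: the raw transfer rate in GT/s is reduced by the line encoding (8b/10b for Gen1/Gen2, 128b/130b for Gen3) before dividing by 8 to get bytes. A small sketch of that calculation:

```python
# Usable per-lane bandwidth by PCIe generation: raw rate (GT/s) times
# encoding efficiency (8b/10b for Gen1/2, 128b/130b for Gen3), then /8
# to convert bits to bytes.

ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}
RAW_GTPS = {1: 2.5, 2: 5.0, 3: 8.0}

def lane_gbytes_per_s(gen):
    return RAW_GTPS[gen] * ENCODING[gen] / 8

for gen in (1, 2, 3):
    print("Gen%d: %.3f GB/s per lane" % (gen, lane_gbytes_per_s(gen)))
```

Multiplying by the link width (x1, x4, x8, x16) gives the familiar totals, e.g. roughly 16 GB/s each way for a Gen3 x16 link.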
This document provides an overview of the verification strategy for PCI-Express. It discusses the PCI-Express protocol, including the physical, data link, transaction, and software layers. It outlines the verification paradigm, including functional verification using constrained random testing, assertions, asynchronous/power domain simulations, and performance verification. It also discusses compliance verification through electrical, data link, transaction, and system architecture checklists. Finally, it discusses design for verification through a modular and scalable architecture to promote reusability and reduce verification effort and complexity.
This document summarizes key aspects of GPU hardware and the SIMT (Single Instruction Multiple Thread) architecture used in NVIDIA GPUs. It describes the evolution of NVIDIA GPU hardware, the differences between latency-oriented CPUs and throughput-oriented GPUs, how SIMT combines SIMD and threading, warp scheduling, divergence and convergence, predicated and conditional execution.
The document provides step-by-step instructions for building and running Intel DPDK sample applications on a test environment with 3 virtual machines connected by 10G NICs. It describes compiling and running the helloworld, L2 forwarding, and L3 forwarding applications, as well as using the pktgen tool for packet generation between VMs to test forwarding performance. Key steps include preparing the Linux kernel for DPDK, compiling applications, configuring ports and MAC addresses, and observing packet drops to identify performance bottlenecks.
Building Open Data Lakes on AWS with Debezium and Apache Hudi - Gary Stafford
Build a simple open data lake on AWS using a combination of open-source software (OSS), including Red Hat’s Debezium, Apache Kafka, and Kafka Connect for change data capture (CDC), and Apache Hive, Apache Spark, Apache Hudi, and Hudi’s DeltaStreamer for managing our data lake. We will use fully-managed AWS services to host the open data lake components, including Amazon RDS, Amazon MSK, Amazon EKS, and EMR.
Link to the blog post and video: https://garystafford.medium.com/building-open-data-lakes-with-debezium-and-apache-hudi-c3370d3f86fb
DPDK is a set of drivers and libraries that allow applications to bypass the Linux kernel and access network interface cards directly for very high performance packet processing. It is commonly used for software routers, switches, and other network applications. DPDK can achieve over 11 times higher packet forwarding rates than applications using the Linux kernel network stack alone. While it provides best-in-class performance, DPDK also has disadvantages like reduced security and isolation from standard Linux services.
Creating Your Own PCI Express System Using FPGAs: Embedded World 2010 - Altera Corporation
This document discusses creating PCI Express systems using FPGA devices. It provides an overview of PCI Express, describing its key functional elements like the root complex and endpoints. It also outlines PCI Express support in Altera FPGAs, including both hard IP blocks and soft IP cores that enable PCI Express connectivity. The hard IP blocks perform the various PCI Express layers and reduce resource usage compared to soft cores.
The document provides an overview of the PCI Express system architecture. It discusses the architectural perspective of PCI Express including how it maintains backwards compatibility with PCI/PCI-X while improving performance through serial point-to-point connectivity and packet-based transactions. It also covers the PCI Express transaction model and types, including memory, I/O, configuration and message transactions, as well as posted and non-posted transaction types.
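The posted/non-posted distinction mentioned above can be summarized in a small table: posted transactions (memory writes, messages) expect no completion, while non-posted transactions (reads, I/O, and configuration accesses) make the requester wait for one. The sketch below encodes that taxonomy; the type names are illustrative labels, not spec mnemonics.

```python
# Sketch of the PCIe transaction taxonomy: posted requests are
# fire-and-forget, non-posted requests require a completion from the
# completer. Type names here are illustrative labels.

POSTED = {"memory_write", "message"}
NON_POSTED = {"memory_read", "io_read", "io_write", "config_read", "config_write"}

def needs_completion(txn_type):
    if txn_type in POSTED:
        return False
    if txn_type in NON_POSTED:
        return True
    raise ValueError("unknown transaction type: %s" % txn_type)

print(needs_completion("memory_write"))  # False: fire-and-forget
print(needs_completion("config_read"))   # True: completer returns data
```

Note that I/O and configuration *writes* are non-posted too: even though no data comes back, the requester still waits for a completion confirming the write.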
Analyzing 1.2 Million Network Packets per Second in Real-time - DataWorks Summit
The document describes Cisco's OpenSOC, an open source security operations center that can analyze 1.2 million network packets per second in real time. It discusses the business need for such a solution given how breaches often go undetected for months. The solution architecture utilizes big data technologies like Hadoop, Kafka and Storm to enable real-time processing of streaming data at large scale. It also provides lessons learned around optimizing the performance of components like Kafka, HBase and Storm topologies.
This document discusses adding support for PCI Express and new chipset emulation to Qemu. It introduces a new Q35 chipset emulator with support for 64-bit BAR, PCIe MMCONFIG, multiple PCI buses and slots. Future work includes improving PCIe hotplug, passthrough and power management as well as switching the BIOS to SeaBIOS and improving ACPI table support. The goal is to modernize Qemu's emulation of PCI features to match capabilities of newer hardware.
AMD Chiplet Architecture for High-Performance Server and Desktop Products - AMD
This document discusses AMD's chiplet architecture for high-performance server and desktop processors. Key points include:
- AMD partitions the system-on-a-chip design, using 7nm technology for CPU cores while leaving I/O interfaces in older process nodes. This improves performance and lowers costs.
- CPU dies ("chiplets") are connected using high-speed SerDes links both on-package and between dies. This allows for more chiplets and cores than traditional monolithic designs.
- Innovations in packaging, power distribution, and operating system scheduling were required to enable the multi-chiplet design and improve performance.
SAS vs SATA_ The Key Differences That You Should Know.pptx - calltutors
In this presentation, we have discussed SAS vs SATA. If you are interested in the differences between SAS and SATA, you will find it helpful.
The Architecture of 11th Generation Intel® Processor Graphics - Intel® Software
Scheduled for release this year, this next generation brings significant improvements over the widely used 9th generation of Intel® Processor Graphics. The talk begins with an overview of Intel® Graphics architecture, its building blocks, and their performance implications. Next, take an in-depth look at the new and innovative features of this latest generation of integrated graphics.
This document discusses SR-IOV (Single Root I/O Virtualization) in ACRN. It begins with an introduction to SR-IOV, describing how it allows PCIe devices to be isolated and have near bare-metal performance through the use of Physical Functions (PFs) and Virtual Functions (VFs). It then outlines the SR-IOV architecture in ACRN, including how it detects and initializes SR-IOV devices, assigns VFs to VMs, and manages the lifecycle of VFs. Finally, it provides an agenda for an SR-IOV demo using an Intel 82576 NIC and concludes with a Q&A section.
Cassandra was chosen over other NoSQL options like MongoDB for its scalability and ability to handle a projected 10x growth in data and shift to real-time updates. A proof-of-concept showed Cassandra and ActiveSpaces performing similarly for initial loads, writes and reads. Cassandra was selected due to its open source nature. The data model transitioned from lists to maps to a compound key with JSON to optimize for queries. Ongoing work includes upgrading Cassandra, integrating Spark, and improving JSON schema management and asynchronous operations.
This document discusses hardware offloading of VXLAN encapsulation and decapsulation in OVS-DPDK. It proposes representing virtual ports (vPorts) as tables to enable hardware offloading of VXLAN processing. Matching and actions on the vPort table would occur in hardware before decapsulation. Fallback processing using software would be used if full hardware offloading is not possible. The goal is to leverage intelligent NIC capabilities to accelerate VXLAN tunnel processing and improve performance for cloud, NFV, and storage workloads.
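The vPort-as-table idea described above can be modeled as a match/action lookup tried in hardware first, with a miss falling back to the software slow path. The sketch below shows that control flow; the key layout and action names are illustrative, not the actual OVS-DPDK data structures.

```python
# Conceptual model of vPort-as-table offload: hardware attempts an exact
# match before decapsulation; on a miss the packet takes the software
# fallback path. Field names and actions are illustrative only.

hw_vport_table = {
    # key: (outer_dst_ip, vxlan_vni) -> offloaded action
    ("10.0.0.1", 100): "decap_and_forward_port1",
}

def process(pkt, sw_fallback):
    key = (pkt["outer_dst_ip"], pkt["vni"])
    action = hw_vport_table.get(key)
    if action is not None:
        return "hw:" + action          # fully offloaded fast path
    return "sw:" + sw_fallback(pkt)    # software handles the miss

hit = process({"outer_dst_ip": "10.0.0.1", "vni": 100}, lambda p: "decap")
miss = process({"outer_dst_ip": "10.0.0.2", "vni": 200}, lambda p: "decap")
print(hit, miss)  # hw:decap_and_forward_port1 sw:decap
```

The design point is that matching happens on *outer* headers before decapsulation, so the NIC can both strip the VXLAN header and pick the output port without ever handing the packet to the CPU on a hit.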
The document discusses UART (Universal Asynchronous Receiver/Transmitter) communication. It describes how UARTs allow for asynchronous serial communication between devices using only 2 wires by converting parallel data to serial and vice versa. The UART communication process involves a transmitting UART adding start, stop and optionally parity bits to data before transmitting it serially bit-by-bit to a receiving UART which reconstructs the parallel data. It also discusses the TTL and RS-232 physical layer standards for UART.
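The framing process described above is small enough to simulate directly: the transmitter wraps each byte in a start bit (0), eight data bits LSB-first, an optional even-parity bit, and a stop bit (1), and the receiver reverses the process. A minimal sketch:

```python
# Simulation of UART framing: start bit, 8 data bits LSB-first,
# optional even-parity bit, stop bit; the receiver checks the framing
# bits and reassembles the byte.

def uart_frame(byte, parity=True):
    bits = [0]                                   # start bit
    bits += [(byte >> i) & 1 for i in range(8)]  # data bits, LSB first
    if parity:
        bits.append(sum(bits[1:9]) % 2)          # even parity over data bits
    bits.append(1)                               # stop bit
    return bits

def uart_deframe(bits, parity=True):
    assert bits[0] == 0 and bits[-1] == 1        # start/stop bits valid
    data = bits[1:9]
    if parity:
        assert sum(data) % 2 == bits[9]          # parity check
    return sum(b << i for i, b in enumerate(data))

frame = uart_frame(0x41)                         # ASCII 'A'
print(frame, "->", hex(uart_deframe(frame)))
```

Because both sides agree on this frame shape (and on the baud rate) in advance, no clock line is needed: the receiver resynchronizes on every start bit, which is what makes the link asynchronous.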
ExpEther is a technology that extends the PCI Express bus beyond the confines of a computer chassis via Ethernet, without any modification of existing hardware or software. Computing resources such as GPUs, NVMe SSDs, and video boards can be added to a standard Ethernet fabric as if they were installed directly in the chassis, providing scale-up flexibility. ExpEther can therefore build a new type of computing environment free of localized physical constraints, and it is cost-effective because it uses standard Ethernet equipment.
This document discusses techniques for offloading I/O transactions from the CPU to improve performance. It introduces iDMA which allows direct communication between I/O devices and system memory without CPU involvement. It also describes the "Hot Potato" approach which treats payload data as a "hot potato" passed directly between devices without CPU processing. Finally, it proposes "Device2Device" (D2D) communication which allows direct transfer of data between I/O devices like sending video data directly from a SSD to a NIC without using system memory or the CPU. Measurements show these approaches can significantly reduce latency and improve throughput and power efficiency compared to traditional CPU-managed I/O.
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis... - Slide_N
This document summarizes a presentation given at the 2005 IEEE Hot Chips conference about parallelism in modern processors and how it relates to programming models. It discusses different types of parallelism available at the processor, system, and application levels. It then examines approaches to parallelism used by general-purpose CPUs, special-purpose CPUs like the Cell processor, and GPUs. While parallelism is increasing in these devices, programming them effectively remains challenging due to the difficulty of parallel programming and lack of appropriate language and tooling support. The document calls for more research in parallel programming models and languages to make better use of emerging multi-core architectures.
The document discusses IBM's POWER8 technology, which features up to 12 cores per socket, 8 threads per core, larger caches, improved memory bandwidth and latency, integrated I/O subsystem and PCIe controller, and fine-grained power management. It provides details on IBM Power Systems such as the S814 and S824 servers that use POWER8, including their specifications, performance improvements over previous generations, and storage options.
The document summarizes the Cell processor architecture, which was developed as a collaboration between IBM, Sony, and Toshiba to address limitations in processor performance. The Cell consists of 9 cores - 1 PowerPC core called the PPE and 8 synergistic processor elements (SPEs) optimized for SIMD operations. It has a peak performance of over 200 GFLOPS and was used in the PlayStation 3 game console to enable graphics-intensive applications. The document outlines the Cell architecture and how it aims to overcome performance walls related to power, memory, and frequency limitations.
In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics - sandor szabo
The ability of Informix to combine the in-memory performance of Informix Warehouse Accelerator with the flexibility of TimeSeries and NoSQL analytics positions it to be ready for the IoT era.
The Oracle SPARC T4-1 system is a 1-socket, 2RU enterprise server featuring Oracle's SPARC T4 processor with 8 cores and 64 threads. It has up to 256GB of DDR3 RAM, 6 PCIe slots, SAS storage, and dual 10GbE ports. The T4-1 is the successor to the SPARC T3-1 and uses the same chassis and service processor, targeting database, middleware, virtualization, and security workloads.
This document discusses performance optimization for data centers on multi-core platforms and provides a case study analysis. It introduces Intel software tuning tools, describes a methodology for data center performance tuning involving system, application, and microarchitecture levels, and analyzes a case study where thread synchronization overhead was identified and reduced through the use of NPTL in Linux, improving CPU utilization and throughput.
Galil Ethernet or EtherCAT Motion Control Webinar January 26, 2016 - Electromate
This document discusses choosing between Ethernet and EtherCAT networks for motion control applications. It provides an overview of centralized vs distributed control systems, describes the key differences between Ethernet and EtherCAT at the hardware level, and gives examples of Galil motion controllers and I/O modules that support both network types. A decision tree is presented outlining considerations for determining whether EtherCAT or Ethernet is best for a given application.
The Nucleus RM Capture is a customizable 3U rackmount server platform designed for high-speed network traffic recording and analysis. It features 20 front-access removable hard drives, dual Intel Xeon processors, up to 512GB RAM, and multiple PCIe slots. The system is engineered to capture network traffic at 10Gbps or higher line rates while providing powerful processors and flexibility for future interface upgrades and application acceleration. It is fully customizable and supported by NextComputing.
CETH for XDP [Linux Meetup Santa Clara | July 2016] - IO Visor Project
This document discusses CETH (Common Ethernet Driver Framework), which aims to improve kernel networking performance for virtualization. CETH simplifies NIC drivers by consolidating common functions. It supports various NICs and accelerators. CETH features efficient memory and buffer management, flexible TX/RX scheduling, and a customizable metadata structure. It is being simplified to work with XDP for even higher performance network I/O processing in the kernel. Next steps include further optimizations and measuring performance gains when using CETH with XDP and virtualized environments.
Socionext is developing low power ARM server solutions including the SC2A11 multicore processor and SC2A20 SoC switch. They aim to build scalable small core systems with optimized performance and power efficiency compared to traditional servers. Socionext has integrated their solutions into a prototype low power scalable server and is developing the necessary software including UEFI, Linux, and applications to support various server workloads.
Heterogeneous Computing: The Future of Systems - Anand Haridass
Charts from NITK-IBM Computer Systems Research Group (NCSRG)
- Dennard Scaling,Moore's Law, OpenPOWER, Storage Class Memory, FPGA, GPU, CAPI, OpenCAPI, nVidia nvlink, Google Microsoft Heterogeneous system usage
The MYC-CZU3EG CPU Module is a powerful MPSoC System-on-Module (SoM) based on the Xilinx Zynq UltraScale+ ZU3EG, which features a 1.2 GHz quad-core ARM Cortex-A53 64-bit application processor.
The document provides information on the HPE ProLiant DL20 Gen10 Server, including:
- It is a 1U rack server powered by Intel Xeon E, Pentium, and Core i3 processors, offering flexibility and value.
- Standard features include Intel C242 chipset, up to 64GB memory, 1Gb Ethernet ports, and various storage options.
- It comes in various pre-configured models for entry, performance, and solution workloads.
EtherCAT as a Master Machine Control Tool - Design World
There is an increasing demand in the automation and motion control industries for a localized motion control solution that can coordinate motion between multiple remote components.
Previously, field bus protocols such as Modbus or Ethernet have been implemented to address this demand. Although successful in moving data across automation networks, these protocols lacked the real time performance necessary for a distributed motion control system.
The EtherCAT communication protocol provides a high speed, low overhead communication scheme that allows efficient, deterministic communication between motion controller and remote components. Based on Ethernet and streamlined specifically for point to point transmission of real time data, the EtherCAT standard is quickly becoming the preferred choice for centralized control of tightly coupled motion between remote components.
This presentation is aimed at designers of automation and motion control systems with a basic understanding of Ethernet communication.
The document describes a 5-day residency program hosted by the OpenPOWER Academic Discussion Group (ADG) at NIE Mysore from June 6-10, 2022. The program aims to bridge industry and academia knowledge in chip design by developing curriculum on OpenPOWER technology and training lab assistants. Engineers and academicians with 5+ years experience in chip design/verification are eligible to participate. They will collaborate on developing course materials and lab exercises to teach undergraduate students in fields like ECE and CSE. The program seeks to help fulfill India's goals in chip design manpower and self-reliance through initiatives like Make in India and the India Semiconductor Mission.
This document provides an overview of digital design and Verilog. It discusses binary numbers and boolean algebra as the foundation of digital systems. It also describes logic gates, combinational and sequential circuits, finite state machines, and datapath and control units. Finally, it introduces Verilog, describing different modeling types like gate level, behavioral, dataflow, and switch level modeling. It positions Verilog as a hardware description language used to more easily design digital circuits compared to manual drawing.
The Libre-SOC Project aims to create an entirely Libre-Licensed, transparently-developed fully auditable Hybrid 3D CPU-GPU-VPU, using the Supercomputer-class OpenPOWER ISA as the foundation.
Our first test ASIC is a 180nm "Fixed-Point" Power ISA v3.0B processor, 5.1mm x 5.9mm, as a proof-of-concept for the team, whose primary expertise is in Software Engineering. Software Engineering training brings a radically different approach to Hardware development: extensive unit tests, source code revision control, automated development tools are normal. Libre Project Management brings even more: bug trackers, mailing lists, auditable IRC logs and a wiki are standard fare for Libre Projects that are simply not normal Industry-Standard practice.
This talk therefore goes through the workflow, from the original HDL through to the GDS-II layout, showing how we were able to keep track of the development that led to the IMEC 180nm tape-out in July 2021. In particular, by following a parallel development process involving "Real" and "Symbolic" Cell Libraries developed by Chips4Makers, we will show how our developers did not need to sign a Foundry NDA, but were still able to work side-by-side with a University that did. With this parallel development process, the University upheld their NDA obligations, and Libre-SOC was simultaneously able to honour its Transparency Objectives.
Workload Transformation and Innovations in POWER Architecture (Ganesan Narayanasamy)
The IT industry is going through two major transformations. One is the adoption of AI and its tight integration into commercial applications and enterprise workflows. The other is the transformation of software architecture through concepts like microservices and cloud-native architecture. These transformations, alongside the aggressive adoption of IoT, mobile, and 5G in all our day-to-day activities, are making the world operate in a more real-time manner, which opens up a new challenge: improving hardware architecture to adapt to these requirements. These two major transformations push the boundary of the entire systems stack, making designers rethink hardware. This talk presents a picture of how the industry-leading enterprise POWER architecture is transforming to fulfill the performance demands of these newer-generation workloads, with a primary focus on on-chip AI acceleration.
Join us on Friday, July 16th 2021, for our newest workshop with DoMS, IIT Roorkee: Concept to Solutions using the OpenPOWER Stack. It's time to discover advances in #DeepLearning tools and techniques from the world's leading innovators across industries, research, and public speakers.
Register here:
https://lnkd.in/ggxMq2N
This presentation covers two use cases using OpenPOWER systems:
1. Diabetic Retinopathy using AI on NVIDIA Jetson Nano: the objective is to classify the diabetic level solely from a retina image in a remote area with minimal doctor involvement. The model uses the VGG16 network architecture and is trained from scratch on POWER9. The model was deployed on the Jetson Nano board.
2. Classifying Covid positivity using lung X-ray images: the idea is to build ML models to detect positive cases using X-ray images. The model was trained on POWER9, and the application was developed using Python.
IBM Bayesian Optimization Accelerator (BOA) is a do-it-yourself toolkit to apply state-of-the-art Bayesian inferencing techniques and obtain optimal solutions for complex, real-world design simulations without requiring deep machine learning skills. This talk will describe IBM BOA, its differentiation and ease of use, and how researchers can take advantage of it for optimizing any arbitrary HPC simulation.
This presentation covers the various partners and collaborators currently working with the OpenPOWER Foundation, use cases of OpenPOWER systems in multiple industries, OpenPOWER workgroups, and OpenCAPI features.
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
Everything is changing, from healthcare to the automotive markets, without forgetting the financial markets or any type of engineering: everything has stopped being created by an individual or, at best, a team, and is now being developed and perfected using AI and hundreds of computers. And even AI is something we can no longer run on a single computer, no matter how powerful it is. What drives everything today is HPC, or High-Performance Computing, heavily linked to AI. In this session we will discuss AI, HPC computing, and the IBM Power architecture, and how it can help develop better healthcare, better automobiles, better financials, and better everything that we run on them.
Macromolecular crystallography is an experimental technique for exploring the 3D atomic structure of proteins, used by academics for research in biology and by pharmaceutical companies in rational drug design. While development of the technique was previously limited by the performance of scientific instruments, computing performance has recently become a key limitation. In my presentation I will present the computing challenge of handling the 18 GB/s data stream coming from the new X-ray detector. I will show PSI's experiences in applying conventional hardware to the task and why this attempt failed. I will then present how the IC 922 server with OpenCAPI-enabled FPGA boards allowed us to build a sustainable and scalable solution for high-speed data acquisition. Finally, I will give a perspective on how this advancement in hardware development will enable better science by users of the Swiss Light Source.
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems (Ganesan Narayanasamy)
As the adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity, and integration with existing workflows. Governing enterprise data, scaling AI model development, and selecting a complete, collaborative hybrid platform and tools for rapid solution deployment are key focus areas for growing data-scientist teams tasked with responding to business challenges. This talk will cover the challenges and innovations for AI at scale in industries such as healthcare and automotive, the AI ladder and AI life cycle, and infrastructure architecture considerations.
This talk gives an introduction to healthcare use cases, the AI ladder and life cycle, and AI-at-scale themes. The iterative nature of the workflow and some of the important components to be aware of when developing AI healthcare solutions are discussed, along with the different types of algorithms and when machine learning might be more appropriate than deep learning, or the other way around. Example use cases are also shared as part of this presentation.
Healthcare has become one of the most important aspects of everyone's life. Its importance has surged due to the latest outbreaks, and with this latest pandemic it has become mandatory to collaborate to improve everyone's healthcare as soon as possible.
IBM has reacted quickly, sharing not only its knowledge but also its Artificial Intelligence supercomputers all around the world.
Those supercomputers are helping to prevail over this outbreak and also future ones.
They have completely different features compared to proposals from other players in this supercomputer market.
We will take a quick look at the differences between those AI-focused supercomputers and how they can help in the R&D of healthcare solutions for everyone, from those with access to a big IBM AI supercomputer to those with access to only one small IBM AI-focused server.
Moving object recognition (MOR) corresponds to the localization and classification of moving objects in videos. Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial site monitoring, detection-based tracking, autonomous vehicles, etc. In this session, Murari presented a poster on deep learning algorithms that identify both the locations and the corresponding categories of moving objects with a convolutional network. The challenges in developing such algorithms were also discussed.
The document discusses AI in the enterprise, including use cases, infrastructure considerations, and the AI lifecycle. It provides examples of how AI can be applied in various industries and common patterns of analytics using AI. It also outlines the data science model development workflow and considerations for AI infrastructure, software, and data management throughout the AI lifecycle.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. We also had a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Elizabeth Buie - Older adults: Are we really designing for our future selves?
PCI Express switch over Ethernet or Distributed IO Systems for Ubiquitous Computing and IoT Solutions
1. PCI Express switch over Ethernet or Distributed IO Systems for Ubiquitous Computing and IoT Solutions
Deepak Pathania, NEC
2. Challenge faced in Real-Time Data Analytics
Big Data of varying characteristics, such as live feeds, graphics, video, text, etc., comes into cloud computers, and this data must be processed and analyzed in real time to produce actionable information and real-time feedback.
To accelerate such processing (real-time analytics, deep learning, etc.), a large number of accelerators such as GPUs, FPGAs, and Xeon Phi, along with high-speed storage, are required.
However, instead of building servers with such accelerators, cloud vendors still prefer building homogeneous servers due to TCO and efficiency considerations.
3. So, What can be a Dynamic Accelerator Deployment Solution?
A technology that extends PCI Express beyond the confines of a computer chassis via Ethernet, WITHOUT any modification of existing hardware and software: PCIe switch over Ethernet (ExpEther or EE).
[Diagram: a server (CPU, memory) connects over PCI Express to an ExpEther NIC; standard Ethernet and an L2 switch carry the traffic to ExpEther engines in an IO expansion unit holding PCIe cards, which attach to the IO devices over PCI Express.]
4. Just another implementation of PCIe Switch
The ExpEther engine is seen as a PCIe switch from the CPU; the Ethernet region is invisible to the CPU.
[Diagram: on the left, a conventional PCIe switch with an upstream port (PCI bridge), an internal PCI bus, and downstream ports (PCI bridges) connecting the CPU to its IO devices. On the right, the equivalent ExpEther topology: ExpEther engines (PCI bridges) attached to the CPU and to each IO device, joined by an Ethernet switch over an Ethernet fabric that remains invisible to the CPU.]
5. Broad-Scale Single Computer
A PCI Express switch is equivalent to an Ethernet fabric; ExpEther can build a new type of computing environment without physical constraints.
[Diagram: two CPUs behind a PCIe switch reach IO devices through ExpEther engines and a hierarchy of Ethernet switches, with device pools located in the same rack, the next rack, another floor, and another building.]
6. ExpEther Architecture
• Achieves a "System on Network"
• Merges PCI Express technology into Ethernet technology
• Connects logically at the MAC layer
• No impact on the upper or lower layers of the PCIe and Ethernet standards, preserving future expansion
[Diagram: the ExpEther stack (Application, OS, PCI Driver, EFI/PCI BIOS in software; ExpEther Logic, MAC, PHY at 1G/10G/40G in hardware) shown side by side with the standard Ethernet stack (Application, OS, NDIS Driver; Ethernet Logic, MAC, PHY at 10M through 40G). Each stack is upward compatible: no modification is needed for future expansion of ExpEther or Ethernet.]
7. Resource Disaggregated Platform or ExpEther features
[Diagram: a CPU connects over PCI Express to an ExpEther engine; PCIe TLPs are carried inside Ethernet frames across an Ethernet fabric and switch to ExpEther engines fronting the I/O devices.]
1. Equivalent to direct connection (Ethernet is invisible from CPU/IO)
2. Low latency (L2 Ethernet without a software stack)
3. No packet loss (adding reliability to Ethernet)
4. I/O dynamic reconfiguration (hot-plug scheme)
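Slide 7's key idea is that PCIe TLPs travel inside plain L2 Ethernet frames, with reliability added on top. The actual ExpEther wire format is NEC-proprietary and not given in this deck; the following is a minimal sketch of the concept only, using a hypothetical frame layout with a sequence number for gap detection, and the IEEE "local experimental" EtherType 0x88B5 as a stand-in:

```python
import struct

# Stand-in EtherType: the real ExpEther EtherType is not given in the slides.
ETHERTYPE_EE = 0x88B5

def encapsulate_tlp(dst_mac, src_mac, seq, tlp):
    """Wrap a PCIe TLP in a raw L2 Ethernet frame.

    A 32-bit sequence number is prepended to the payload so the receiver
    can detect gaps and trigger retransmission -- the "adding reliability
    to Ethernet" idea, in a hypothetical layout.
    """
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_EE)
    return header + struct.pack("!I", seq) + tlp

def decapsulate_tlp(frame):
    """Recover the sequence number and TLP payload from such a frame."""
    (seq,) = struct.unpack("!I", frame[14:18])
    return seq, frame[18:]

# Round-trip a dummy 16-byte "TLP" from a host engine to an IO engine.
frame = encapsulate_tlp(b"\x02" * 6, b"\x04" * 6, seq=7, tlp=b"\xaa" * 16)
seq, tlp = decapsulate_tlp(frame)
```

Because the encapsulation happens below the PCI driver, the CPU and OS see only an ordinary PCIe switch, which is exactly why no software modification is needed.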
8. Dual Path for Throughput and Reliability
• Two Ethernet connections are established between the Host Chip and I/O Chip (dual-port 40G ExpEther NIC)
• Load balancing for performance: round-robin data packet transmission between the two redundant connections
• Path redundancy for failure recovery: path failures are quickly detected and traffic is switched to the surviving path
[Diagram: a CPU with a dual-port ExpEther host chip connects over Ethernet Fabric-I and Ethernet Fabric-II to two ExpEther IO chips, each fronting an I/O device.]
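The dual-path behaviour described above can be modelled in a few lines. This is an illustrative toy, not NEC's implementation: the path names and the failure-detection trigger are invented for the sketch.

```python
from itertools import cycle

class DualPathSender:
    """Toy model of dual-path transmission: round-robin frames across two
    Ethernet fabrics, and fall back to the surviving fabric on failure."""

    def __init__(self):
        self.healthy = {"fabric-I": True, "fabric-II": True}
        self._rr = cycle(self.healthy)       # round-robin over the two paths

    def fail(self, path):
        """Mark a path as failed (e.g. detected via missed keep-alives)."""
        self.healthy[path] = False

    def pick_path(self):
        """Next path in round-robin order, skipping failed ones."""
        for _ in range(len(self.healthy)):
            path = next(self._rr)
            if self.healthy[path]:
                return path
        raise RuntimeError("no healthy path left")

sender = DualPathSender()
balanced = [sender.pick_path() for _ in range(4)]   # alternates between fabrics
sender.fail("fabric-I")
recovered = [sender.pick_path() for _ in range(2)]  # only fabric-II remains
```

The same loop covers both slide bullets: in the healthy case it yields load balancing, and after a failure it degenerates into failover to the remaining fabric.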
9. Frame Rate Control
TCP/IP: rate control is triggered by packet loss (TCP Reno). [Graph: network bandwidth over time shows a slow-start ramp followed by a repeating congestion-avoidance sawtooth.] Packet loss causes significant performance degradation because of retransmission.
ExpEther: rate control is always performed by measuring network latency. [Graph: bandwidth ramps up during probing and then holds steady near link capacity during congestion avoidance.] Packet loss basically does not occur in ExpEther: the ExpEther engine continuously measures the frame arrival time on the receive side and finely adjusts the frame rate to avoid packet loss.
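The slides describe the principle (react to latency, not loss) but not the actual control law. The sketch below is an illustrative delay-based controller in that spirit; the thresholds, gains, and API are assumptions, not NEC's algorithm.

```python
class DelayBasedRateControl:
    """Illustrative delay-based pacing (not NEC's actual algorithm).

    Instead of waiting for packet loss like TCP Reno, the sender watches
    the frame-arrival latency reported by the receive side: latency above
    the empty-path baseline means a queue is building somewhere, so the
    sender backs off *before* any frame is dropped.
    """

    def __init__(self, base_latency_us, rate_mbps=100.0):
        self.base = base_latency_us   # latency of an uncongested path
        self.rate = rate_mbps

    def on_latency_sample(self, latency_us):
        if latency_us <= self.base * 1.1:
            self.rate *= 1.05         # path looks empty: probe for more bandwidth
        else:
            self.rate *= 0.8          # queueing detected: multiplicative back-off

rc = DelayBasedRateControl(base_latency_us=10.0)
rc.on_latency_sample(10.2)   # near baseline -> rate rises to 105.0
rc.on_latency_sample(25.0)   # queue building -> rate cut to 84.0
```

The key contrast with Reno is visible in the second sample: the rate is reduced while all frames are still arriving, so no retransmission penalty is ever paid.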
10. Multi-path IO with Resource Disaggregation or ExpEther
• Multi-Path IO (MPIO): MPIO is one technique for achieving high reliability. If the target IO device supports MPIO, it continues to support MPIO under ExpEther.
• Multi-Path Ethernet: supports high-speed network path failover.
[Diagram: a host with two active SAS HBAs attached directly to a SAS JBOD is equivalent to a host whose EE NIC reaches the same SAS HBAs and JBOD through two redundant Ethernet switches, combining MPIO with high-speed network failover.]
11. Dynamic Reconfiguration and Hot-Plug Capability
[Diagram: hosts and ten IO devices (A through J) attach to a shared Ethernet fabric through ExpEther engines, supervised by an EE Manager. Each engine carries a group number; the logical view shows each host with its own virtual PCIe switch holding exactly the devices in its group, e.g. Group #1 = {B, D, G, I}, Group #2 = {A, J}, Group #3 = {C, E, H}, Group #4 = {F}.]
12. Dynamic Reconfiguration and Hot-Plug Capability
• Group ID (GID: 1~4,095)
• A GID in the range 1 to 15 is set by a physical DIP switch residing on the card.
• Setting the GID to 0 allows the management software to program a soft GID.
ExpEther Manager:
• Configuration: Group ID configuration
• Monitoring: ExpEther network status, PCIe device status, new ExpEther detection, failure detection
Management frames are special Ethernet frames; the ExpEther hard-wired logic directly receives and sends them for configuration and management.
[Diagram: a management server programs Group IDs into the ExpEther engines of the hosts and IO devices and collects various status information over the same fabric.]
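The grouping rules above can be condensed into a small model. This is a toy sketch of the policy only (names and API invented for illustration): a non-zero DIP switch wins, a DIP switch at 0 accepts a software-programmed GID, and engines sharing a GID form one logical PCIe system.

```python
class ExpEtherManager:
    """Toy model of GID-based grouping (names and API are illustrative).

    A host only sees engines carrying its own Group ID, so re-assigning a
    soft GID is equivalent to hot-unplugging a device from one host's
    PCIe tree and hot-plugging it into another's.
    """

    def __init__(self):
        self.gid = {}                        # engine name -> effective GID

    def assign(self, engine, dip_switch, soft_gid=None):
        if dip_switch != 0:                  # hardware DIP switch (1-15) wins
            assert 1 <= dip_switch <= 15
            self.gid[engine] = dip_switch
        elif soft_gid is not None:           # DIP at 0: soft GID may be programmed
            assert 1 <= soft_gid <= 4095
            self.gid[engine] = soft_gid

    def group(self, gid):
        """All engines (host and IO) that form one logical PCIe system."""
        return sorted(e for e, g in self.gid.items() if g == gid)

mgr = ExpEtherManager()
mgr.assign("host-A", dip_switch=0, soft_gid=100)
mgr.assign("gpu-1", dip_switch=0, soft_gid=100)   # gpu-1 appears under host-A
mgr.assign("ssd-1", dip_switch=3)                 # hard-wired to group 3
members = mgr.group(100)
```

Moving `gpu-1` to another host would be a single `assign` call with a new soft GID; from the hosts' point of view it looks like an ordinary PCIe hot-unplug followed by a hot-plug.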
13. ExpEther Technology Architectural Possibilities
▐ Std-EE: Standard PCIe-over-Ethernet
• The foundation of ExpEther
▐ MR-EE: I/O sharing
• Multiple hosts can share an IO device by using an SR-IOV compliant device
▐ P2P-EE: I/O direct connection
• Supports peer-to-peer data transfer between I/O devices
▐ NTB-EE: Remote direct memory access by NTB
• High-speed data transfer between hosts
[Diagrams: Std-EE partitions hosts and I/O devices across an Ethernet switch; MR-EE lets several hosts share one SR-IOV device; P2P-EE routes data directly between two I/O devices without traversing the host; NTB-EE links multiple hosts through non-transparent bridges over Ethernet.]
15. Performance of EE vs Local with PCIe based SSDs

name/ssd                 | 1         | 2         | 4
local                    | 2728448.0 | 5133619.2 | 10321510.4
ExpEther(HBA1)           | 2728584.5 | 5004185.6 | 6648012.8
ExpEther(HBA2)           | -         | -         | 9974886.4
ExpEther(HBA1)/local (%) | 100.01    | 97.48     | 64.41
ExpEther(HBA2)/local (%) | -         | -         | 96.64
Theoretical Value        | 2700000   | 5400000   | 10800000

name/ssd                 | 1         | 2         | 4
local                    | 1032396.8 | 2044231.7 | 3913407.6
ExpEther(HBA1)           | 1035468.8 | 2049361.9 | 3870552.8
ExpEther(HBA2)           | -         | -         | 3901378.2
ExpEther(HBA1)/local (%) | 100.30    | 100.25    | 98.90
ExpEther(HBA2)/local (%) | -         | -         | 99.69
Theoretical Value        | 1080000   | 2160000   | 4320000

There is no impact on bandwidth in ExpEther, which can fully support PCIe Gen3 x8 (64 Gbps).
16. Performance of EE vs Local with PCIe based SSDs

name/ssd                 | 1         | 2         | 4
local                    | 455913    | 911963    | 1823617
ExpEther(HBA1)           | 455984    | 912167    | 1224985
ExpEther(HBA2)           | -         | -         | 1823856
ExpEther(HBA1)/local (%) | 100.02    | 100.02    | 67.17
ExpEther(HBA2)/local (%) | -         | -         | 100.01
Theoretical Value        | 450000.00 | 900000.00 | 1800000.00

name/ssd                 | 1         | 2         | 4
local                    | 65470     | 129356    | 259631
ExpEther(HBA1)           | 65365     | 128806    | 259631
ExpEther(HBA2)           | -         | -         | 259838
ExpEther(HBA1)/local (%) | 99.84     | 99.57     | 100.00
ExpEther(HBA2)/local (%) | -         | -         | 100.08
Theoretical Value        | 75000.00  | 150000.00 | 300000.00

ExpEther can achieve similar IOPS to local by increasing the IO depth parameter to hide the latency of Ethernet.
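The closing observation, that a deeper IO queue hides the Ethernet latency, is an instance of Little's law: sustained IOPS is roughly the number of outstanding requests divided by the per-request latency. A back-of-the-envelope sketch, with illustrative latency figures not taken from the measurement:

```python
def achievable_iops(queue_depth, latency_s):
    """Little's law: sustained throughput = outstanding requests / latency."""
    return queue_depth / latency_s

local_latency = 100e-6   # illustrative local NVMe latency (100 us)
ee_latency = 120e-6      # assumed extra ~20 us for the Ethernet hops

# Same queue depth: the added latency costs throughput...
iops_local = achievable_iops(32, local_latency)   # 320,000 IOPS
iops_ee = achievable_iops(32, ee_latency)         # ~266,667 IOPS
# ...but a slightly deeper queue hides the latency entirely:
iops_ee_deep = achievable_iops(39, ee_latency)    # 325,000 IOPS
```

This is why the tables above show near-100% ratios once the workload keeps enough requests in flight, while shallow-queue runs pay the latency directly.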
17. Service Acceleration Platform with RD or ExpEther
Accelerator Resource Pool: IO devices can be dynamically allocated to the appropriate host according to workload.
[Diagram: EE clients (KVM over USB/VGA) and host CPUs with ExpEther HBAs connect through Ethernet switches to pools of remote IO: racks of GPGPUs, FPGA accelerators, and NVMe SSDs behind ExpEther engines, plus sensors and USB controllers at the edge.]
18. Case: Resource Pool System for HPC (Osaka University)
▌64 servers and 70 IO devices for research at Osaka University
GPUs, Flash storage, and VDI accelerators (PCoIP, GRID K2) are available as IO devices.
The IO devices are dynamically connected to the servers through 10G ExpEther in accordance with each server's workload; a server system is configured according to user requirements by software provisioning.
[Diagram: six racks, each with ten servers and a top-of-rack switch, share pools of GPUs, SAS JBODs with SAS controllers, PCIe Flash, and NICs; software provisioning composes a server from CPU, GPUs, HDDs, and Flash on demand.]
19. Case: Easy Extension of Measurement Equipment (PXI)
PXI (PCI eXtensions for Instrumentation) is one of several modular electronic instrumentation platforms based on PCIe.
Current PXI products are typically extended by PCIe cable, so the measurement system is fixed and the installation location is very limited.
If an ExpEther engine is implemented in the PXI chassis, the system can host a large number of PXI modules and be configured dynamically: an Ethernet switch and optical cable can place modules more than a mile away, e.g. in a different room, and the ExpEther Manager software assigns an ID to each ExpEther module.
20. Case: Ultra-Fast Failover Recovery for a Database System with EE and ExpressCluster X
NVMe SSD is faster than Fibre Channel, so the NVMe SSD (attached via EE) is used as the journal for the DB, while the main DB stays on the FC SAN.
When the active server fails, the NVMe SSDs' connection is switched, allowing the DB journal to be restored on the standby server.
[Diagram: the legacy configuration pairs active and standby servers over FC to the main DB; the new configuration adds an EE40G board in each server, a 40G switch, an EE40G I/O expansion unit holding the NVMe journal SSDs, and EEM management on both primary and secondary servers.]
21. Living at the Edge for going Real-Time with ExpEther
IoT layers: L1 Device/Sensor (smart devices, data collection, local network), L3 Edge (abstraction and real-time processing), L5 Cloud (analytics, wide-area network).
• Cloud (L5): rack-scale resource pooling with dynamic reconfiguration allows low-cost, low-power, high-performance computing data centers, turning collected data into actionable information.
• Device (L1): ExpEther can connect devices directly to edge servers using a simple everything-in-hardware approach, with no complex software protocol stack for communication, which is high-speed and low-power, making devices smarter.
• Edge (L3): ExpEther helps bring analytics to the edge; in combination with low-power, high-performance hardware like FPGAs, one can achieve the abstraction required for real-time processing and real-time feedback.
23. Summary
• The EE or resource-disaggregated system enables next-generation computer hardware architectures thanks to the following features:
• Distance and reach, with dynamic switching capability.
• The same or similar performance for local and remotely located IOs.
• Moving in-chassis devices outside with plug-and-play ability (independent of OS, drivers, and applications).
• Making legacy devices useful, for cost-effective system realization.
• A resource-disaggregated system built on long-proven protocols like PCIe and Ethernet (EE) is simple, yet a revolutionary step towards next-generation computer hardware architectures, with the trust earned by both legacies.
26. Business Menu
• Product Sales Business
• Sales of the product, which was developed as an option for Express servers
• FPGA IP Core License Business
• Development of an FPGA IP core with ExpEther technology according to customer requirements, and release of the binary image file