MemVerge CEO Charles Fan describes why memory-hungry generative AI is a driver for CXL technology, the new computing model for AI, and MemVerge software for CXL and AI.
Q1 Memory Fabric Forum: Compute Express Link (CXL) 3.1 Update - Memory Fabric Forum
OCP Steering Committee member and ex-President of the CXL Consortium, Siamak Tavallaei, provides an update on the CXL specifications with a focus on the recently released 3.1 specification.
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL - Memory Fabric Forum
Thibault Grossi, Sr. Technology & Market Analyst, shares excerpts from the recently published report, Memory Processor Interface, Focus on CXL. The report provides a taxonomy of CXL market segments and revenue forecasts through 2028.
Torry Steed, Sr. Product Marketing Manager at SMART Modular, provides an overview of CXL PCIe Add-in Cards (AICs) and memory modules that can be used to expand capacity in servers or in external memory pooling systems.
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL) - Memory Fabric Forum
- Memory-intensive workloads are dominating computing, and increasing memory capacity with CPU-attached DRAM alone is getting expensive.
- CXL allows augmenting system memory footprint at lower cost by running over existing PCIe links to add memory outside of the CPU package.
- Intel Xeon roadmap fully supports CXL starting with 5th Gen Xeons, and Intel CPUs offer unique hardware-based tiering modes between native DRAM and CXL memory without depending on the operating system.
- CXL has full industry support as the standard for coherent input/output.
During the CXL Forum at OCP Global Summit, memory system architect Jungmin Choi of SK hynix talks about the need for memory bandwidth and capacity, and the SK hynix Niagara solution.
During the CXL Forum at OCP Global Summit, Mahesh Wagh, CXL Consortium TTF Co-chair and Senior Fellow at AMD, presented an update on the CXL Consortium mission and roadmap.
All Presentations during CXL Forum at Flash Memory Summit 22 - Memory Fabric Forum
The document summarizes a full-day forum hosted by the CXL Consortium and MemVerge on CXL. The morning agenda includes presentations on CXL from representatives of Google, Intel, PCI-SIG, Marvell, Samsung, and Micron. The afternoon agenda includes panels on CXL usage models from Meta, OCP, Anthropic, and MemVerge. A keynote presentation provides an update on the CXL Consortium and the recently released CXL 3.0 specification, including its expanded fabric capabilities and management features. The specification is aimed at enabling new usage models for memory sharing and expansion to address industry trends toward increased data processing demands.
During the CXL Forum at OCP Global Summit, MemVerge CEO Charles Fan presented accomplishments of the CXL industry since 2019, the development of concept cars occurring today, and his predictions for the future of CXL.
CXL Memory Expansion, Pooling, Sharing, FAM Enablement, and Switching - Memory Fabric Forum
The document discusses CXL, a new open standard protocol for efficient CPU and memory connectivity. CXL allows for memory disaggregation and pooling across devices by enabling high-bandwidth, low-latency connections between CPUs, GPUs, accelerators, and memory. This helps address the growing CPU-memory bottleneck by allowing expansion of memory capacity beyond what can physically connect to the CPU. CXL also enables memory tiering by providing different performance and cost options for "near" directly attached memory versus "far" switched or fabric attached memory.
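As background for the near/far distinction above: on Linux, CXL Type 3 memory expanders typically appear as CPU-less NUMA nodes, which is the hook that tiering software builds on. The sketch below is not from the presentation; it simply scans the standard sysfs node directories and flags CPU-less nodes as likely CXL "far" memory, and that CPU-less-equals-CXL rule is a heuristic assumption.

```python
# Rough sketch (not from the presentation): list NUMA nodes and flag CPU-less
# ones, which is how CXL Type 3 memory expanders commonly surface on Linux.
# Assumes a standard Linux sysfs layout; "CPU-less == CXL" is a heuristic.
import glob
import os

def numa_nodes():
    for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        node = os.path.basename(node_dir)
        with open(os.path.join(node_dir, "cpulist")) as f:
            cpulist = f.read().strip()
        total_kb = 0
        with open(os.path.join(node_dir, "meminfo")) as f:
            for line in f:
                parts = line.split()
                # Lines look like: "Node 0 MemTotal:  263921664 kB"
                if len(parts) >= 4 and parts[2] == "MemTotal:":
                    total_kb = int(parts[3])
        kind = "near (CPU-attached DRAM)" if cpulist else "far (CPU-less, likely CXL)"
        yield node, kind, total_kb

if __name__ == "__main__":
    for node, kind, total_kb in numa_nodes():
        print(f"{node}: {kind}, {total_kb // 1024} MiB")
```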
During the CXL Forum at OCP Global Summit, Enfabrica CEO Rochan Sankar described how to bridge the network and memory worlds with their accelerated compute fabric switch.
During the CXL Forum at OCP Global Summit, SMART Modular Director of Product Marketing Arthur Sainio provides an overview of the company's CXL memory cards and modules.
In the CXL Forum Theater at SC23 hosted by MemVerge, the Open Compute Project provided an overview of CXL, as well as CXL-related hardware and software projects at OCP.
During the CXL Forum at OCP Global Summit, Dharmesh Jani of Meta and Siamak Tavallaei of the CXL Consortium describe the extensive work being done by the Open Compute Project related to CXL.
Shared Memory Centric Computing with CXL & OMI - Allan Cantle
Discusses how CXL can be better utilized as a fabric cache domain separate from a processor's own local cache domain, by leveraging a shared-memory-centric architecture that uses both the Open Memory Interface (OMI) and Compute Express Link (CXL) for the memory ports.
During the CXL Forum at OCP Global Summit 23, Rick Kutcipal and Sreeni Bagalkote of Broadcom presented their PCIe/CXL Roadmap and announced their Atlas 4 CXL switch.
Arm: Enabling CXL devices within the Data Center with Arm Solutions - Memory Fabric Forum
During the CXL Forum at OCP Summit, Arm Director of Segment Marketing Parag Beeraka provides an overview of the Arm portfolio of CXL products for the data center.
Molex and Nvidia - Partnership to enable copper for the next generation artif... - Memory Fabric Forum
During the CXL Forum at OCP Global Summit, Eddy Hwang of Nvidia and Wai Kong Poon of Molex presented a next-gen architecture for enabling copper for AI computing.
Lightelligence: Optical CXL Interconnect for Large Scale Memory Pooling - Memory Fabric Forum
During the CXL Forum at OCP Global Summit, Lightelligence Director of Engineering Ron Swartzentruber provides an overview of the company's optical port expander products and test results.
In this video from the 2017 Argonne Training Program on Extreme-Scale Computing, Pavan Balaji from Argonne presents an overview of system interconnects for HPC.
Watch the video: https://wp.me/p3RLHQ-hA4
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Vertex AI: Pipelines for your MLOps workflows - Márton Kodok
The document discusses Vertex AI pipelines for MLOps workflows. It begins with an introduction of the speaker and their background. It then discusses what MLOps is, defining three levels of automation maturity. Vertex AI is introduced as Google Cloud's managed ML platform. Pipelines are described as orchestrating the entire ML workflow through components. Custom components and conditionals allow flexibility. Pipelines improve reproducibility and sharing. Changes can trigger pipelines through services like Cloud Build, Eventarc, and Cloud Scheduler to continuously adapt models to new data.
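To make the pipeline idea concrete, here is a minimal sketch using the Kubeflow Pipelines (KFP) v2 SDK, which Vertex AI Pipelines consumes; the component and pipeline names are illustrative and not taken from the talk.

```python
# Minimal sketch of a Vertex AI-compatible pipeline using the KFP v2 SDK.
# Component and pipeline names are illustrative, not from the talk.
from kfp import compiler, dsl

@dsl.component
def preprocess(text: str) -> str:
    # Stand-in for a real preprocessing step.
    return text.strip().lower()

@dsl.component
def train(dataset: str) -> str:
    # Stand-in for a training step; returns a fake model id.
    return f"model-trained-on-{len(dataset)}-chars"

@dsl.pipeline(name="demo-mlops-pipeline")
def demo_pipeline(raw_text: str = "Hello Vertex AI"):
    prep = preprocess(text=raw_text)
    train(dataset=prep.output)

if __name__ == "__main__":
    # The compiled spec can be submitted with google.cloud.aiplatform.PipelineJob,
    # or triggered from Cloud Build, Eventarc, or Cloud Scheduler as described above.
    compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```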
PCI Express Verification using Reference Modeling - DVClub
This document discusses the modeling techniques used for complete verification of a PCI Express switch using reference modeling. It presents the use of Specman eRM for modeling the ingress port logic and router of the PCI Express switch at the block and chip level. The reference models are cycle-accurate and packet-accurate models that are independent of the device under test implementation. They are integrated to enable prediction and checking of runtime behavior at the chip level. Debug messages and coverage from the individual reference models are used to verify functional correctness.
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi... - Databricks
In this talk, we will present how we analyze, predict, and visualize network quality data as a Spark AI use case in a telecommunications company. SK Telecom is the largest wireless telecommunications provider in South Korea, with 300,000 cells and 27 million subscribers. These 300,000 cells generate data every 10 seconds, amounting to 60 TB, or 120 billion records, per day.
To address previous problems with Spark based on HDFS, we have developed a new data store for SparkSQL, consisting of Redis and RocksDB, that allows us to distribute and store this data in real time and analyze it right away. Not satisfied with analyzing network quality in real time, we also set out to predict network quality in the near future in order to quickly detect and recover from network device failures, by designing a network-signal-pattern-aware DNN model and a new in-memory data pipeline from Spark to TensorFlow.
In addition, by integrating Apache Livy and MapboxGL with SparkSQL and our new store, we have built a geospatial visualization system that shows the current population and signal strength of 300,000 cells on a map in real time.
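As a rough illustration of the kind of SparkSQL aggregation described above, here is a minimal PySpark sketch; the input path and column names (cell_id, rsrp, ts) are hypothetical, and the production system reads from the Redis/RocksDB store rather than from JSON files.

```python
# Illustrative PySpark sketch of per-cell signal-quality aggregation.
# Column names (cell_id, rsrp, ts) and the input path are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("network-quality").getOrCreate()

signals = spark.read.json("s3://example-bucket/cell-signals/")

per_cell = (
    signals
    .withColumn("minute", F.date_trunc("minute", F.col("ts").cast("timestamp")))
    .groupBy("cell_id", "minute")
    .agg(
        F.avg("rsrp").alias("avg_rsrp"),
        F.count("*").alias("samples"),
    )
)

# Cells whose average signal strength drops below a threshold would be
# candidates for the failure-prediction model mentioned above.
per_cell.filter(F.col("avg_rsrp") < -110).show()
```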
In this deck, Yuichiro Ajima from Fujitsu presents: The Tofu Interconnect D.
"Through the development of post-K, which will be equipped with this CPU, Fujitsu will contribute to the resolution of social and scientific issues in such computer simulation fields as cutting-edge research, health and longevity, disaster prevention and mitigation, energy, as well as manufacturing, while enhancing industrial competitiveness and contributing to the creation of Society 5.0 by promoting applications in big data and AI fields."
Learn more: https://insidehpc.com/2018/08/fujitsu-unveils-details-post-k-supercomputer-processor-powered-arm/
and
http://www.fujitsu.com/jp/solutions/business-technology/tc/catalog/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Linux Memory Management with CMA (Contiguous Memory Allocator) - Pankaj Suryawanshi
Fundamentals of Linux memory management and CMA (Contiguous Memory Allocator) in Linux.
Topics: virtual memory, physical memory, swap space, DMA, IOMMU, paging, segmentation, TLB, hugepages, and the ION memory manager (Google).
How AI and ML are driving Memory Architecture changes - Danny Sabour
Artificial intelligence and machine learning are fundamentally changing compute workloads in the cloud, the edge, and the IoT node. Memory architectures have changed to require persistence as a primary need over speed and low power. MRAM with its inherent persistence, low power and speed, is destined to become the next generation memory of choice all the way from the IoT node to the edge and the cloud.
The document discusses IBM's POWER9 processor and OpenPOWER ecosystem. It provides an overview of the POWER9 features such as its new core microarchitecture, enhanced cache hierarchy, and acceleration capabilities through technologies like NVLink 2.0 and CAPI 2.0. It also discusses the OpenCAPI open standard and IBM's efforts to build supercomputers for the US Department of Energy using POWER, NVIDIA GPUs, and Mellanox networking technologies.
VMworld 2015: The Future of Software-Defined Storage - What Does it Look Like... - VMworld
The document discusses the future of software-defined storage in 3 years. It predicts that storage media will continue to advance with higher capacities and lower latencies using technologies like 3D NAND and NVDIMMs. Networking and interconnects like NVMe over Fabrics will allow disaggregated storage resources to be pooled and shared across servers. Software-defined storage platforms will evolve to provide common services for distributed data platforms beyond just block storage, with advanced data placement and policy controls to optimize different workloads.
Everything is changing, from healthcare to the automotive market, and from financial markets to every kind of engineering: everything has stopped being created by an individual or, at best, a team, and is now being developed and perfected using AI and hundreds of computers. And even AI is something we can no longer run on a single computer, no matter how powerful it is. What drives everything today is HPC, or High-Performance Computing, heavily linked to AI. In this session we will discuss AI, HPC, the IBM Power architecture, and how it can help deliver better healthcare, better automobiles, better financials, and better everything else that we run on it.
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server - Rebekah Rodriguez
In this webinar, members of the Server Solution Team, along with a member of Supermicro’s Product Office, will discuss Supermicro’s Universal GPU Server; the server’s modular, standards-based design; the important role of the OCP Accelerator Module (OAM) form factor and Universal Baseboard (UBB) in the system; and AMD's next-generation HPC accelerator. In addition, we will get some insights into trends in the HPC and AI/machine learning space, including the different software platforms and best practices that are driving innovation in our industry and daily lives. In particular:
• Tools to enable use of the high-performance hardware for HPC and deep learning applications
• Tools to enable use of multiple GPUs, including RDMA, to solve highly demanding HPC and deep learning models, such as BERT
• Running applications in containers with AMD’s next-generation GPU system
Optimized HPC/AI cloud with OpenStack acceleration service and composable har... - Shuquan Huang
Today, data scientists are turning to the cloud for AI and HPC workloads. However, AI/HPC applications require high computational throughput, where generic cloud resources would not suffice. There is a strong demand for OpenStack to support hardware-accelerated devices in a dynamic model.
In this session, we will introduce OpenStack Acceleration Service – Cyborg, which provides a management framework for accelerator devices (e.g. FPGA, GPU, NVMe SSD). We will also discuss Rack Scale Design (RSD) technology and explain how physical hardware resources can be dynamically aggregated to meet AI/HPC requirements. The ability to “compose on the fly” with workload-optimized hardware and accelerator devices through an API allows data center managers to manage these resources in an efficient, automated manner.
We will also introduce an enhanced telemetry solution with Gnocchi, bandwidth discovery, and smart scheduling, leveraging RSD technology for efficient workload management in an HPC/AI cloud.
This document provides an overview of HPE solutions for challenges in AI and big data. It discusses HPE storage solutions including aggregated storage-in-compute using NVMe devices, tiered storage using flash, disk, and object storage, and zero watt storage to reduce power usage. It also covers the Scality object storage platform and WekaIO parallel file system for all-flash environments. The document aims to illustrate how HPE technologies can provide efficient, scalable storage for challenging AI and big data workloads.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/xilinx/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nick Ni, Director of Product Marketing at Xilinx, presents the "Xilinx AI Engine: High Performance with Future-proof Architecture Adaptability" tutorial at the May 2019 Embedded Vision Summit.
AI inference demands orders-of-magnitude more compute capacity than what today’s SoCs offer. At the same time, neural network topologies are changing too quickly to be addressed by ASICs that take years to go from architecture to production. In this talk, Ni introduces the Xilinx AI Engine, which complements the dynamically programmable FPGA fabric to enable ASIC-like performance via custom data flows and a flexible memory hierarchy. This combination provides an orders-of-magnitude boost in AI performance along with the hardware architecture flexibility needed to quickly adapt to rapidly evolving neural network topologies.
The state of Hive and Spark in the Cloud (July 2017) - Nicolas Poggi
Originally presented at the BDOOP and Spark Barcelona meetup groups: http://meetu.ps/3bwCTM
Cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and upgrade management. Over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements on performance and the release of v2, making it challenging to keep up-to-date production services both on-premises and in the cloud for compatibility and stability. The talk compares:
• The performance of both v1 and v2 for Spark and Hive
• PaaS cloud services: Azure HDinsight, Amazon Web Services EMR, Google Cloud Dataproc
• Out-of-the-box support for Spark and Hive versions from providers
• PaaS reliability, scalability, and price-performance of the solutions
Using BigBench, the new Big Data benchmark standard. BigBench combines SQL queries, MapReduce, user code (UDF), and machine learning, which makes it ideal to stress Spark libraries (SparkSQL, DataFrames, MLlib, etc.).
Infrastructure optimization for seismic processing (eng) - Vsevolod Shabad
NetProject is a system integrator focused on building optimized IT infrastructure for seismic data processing applications. They aim to configure hardware and software to maximize application performance while minimizing costs. Key optimization strategies include choosing the right CPU, RAM, and server configurations; utilizing RDMA for efficient data transfer; offloading processing to GPUs; selecting high-performance file systems and storage; and optimizing resource scheduling and infrastructure management. NetProject leverages their expertise in oil and gas IT to help customers improve seismic processing performance.
A Dataflow Processing Chip for Training Deep Neural Networks - inside-BigData.com
In this deck from the Hot Chips conference, Chris Nicol from Wave Computing presents: A Dataflow Processing Chip for Training Deep Neural Networks.
Watch the video: https://wp.me/p3RLHQ-k6W
Learn more: https://wavecomp.ai/
and
http://www.hotchips.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This document provides a summary of the IBM POWER9 AC922 system with 6 GPUs. It includes details on the POWER9 processor which features 24 cores per die, an enhanced cache hierarchy up to 120MB, and on-chip accelerators. The AC922 system utilizes two POWER9 processors, supports up to 512GB memory via 16 DDR4 DIMMs, and has three Nvidia Volta GPUs per socket connected via NVLink 2.0. It also discusses the POWER ISA v3.0 instruction set and how POWER9 serves as a premier acceleration platform with technologies like CAPI, OpenCAPI, and NVLink.
Ecosystem Alliance Manager Michael Ocampo talks about the CXL industry's effort to break through the memory wall, memory bound use cases, CXL for modular shared infrastructure, and critical CXL collaboration that's happening now.
Heterogeneous Computing: The Future of Systems - Anand Haridass
Charts from NITK-IBM Computer Systems Research Group (NCSRG)
- Dennard Scaling, Moore's Law, OpenPOWER, Storage Class Memory, FPGA, GPU, CAPI, OpenCAPI, NVIDIA NVLink, and Google/Microsoft heterogeneous system usage
The document summarizes several AI accelerators for cloud datacenters including Google TPU, HabanaLabs Gaudi, Graphcore IPU, and Baidu Kunlun. It discusses their architectures, performance, and how they address challenges in datacenters like workload diversity and energy efficiency. The accelerators use specialized hardware like systolic arrays and FPGA/ASIC designs to achieve much higher performance and efficiency than CPUs and GPUs for AI tasks like training deep learning models.
Similar to Q1 Memory Fabric Forum: Big Memory Computing for AI (20)
Q1 Memory Fabric Forum: ZeroPoint. Remove the waste. Release the power. - Memory Fabric Forum
Nilesh Shah provides an overview of the ZeroPoint portable, hardware IP portfolio for lossless memory compression and compaction. The IP boosts memory capacity 2-4x, improves bandwidth and performance/watt by 50%, and is 1,000x faster than competitors.
Q1 Memory Fabric Forum: Building Fast and Secure Chips with CXL IP - Memory Fabric Forum
Gary Ruggles, Sr. Product Manager for PCIe and CXL Controller IP, provides example use cases for adoption of CXL, an introduction to Synopsys CXL IP solutions, and interop proof points.
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx - Memory Fabric Forum
MemVerge product manager and software architect Steve Scargall discusses key factors related to the use of CXL with AI apps, including memory expansion form factors, latency- and bandwidth-aware memory placement strategies, RDBMS investigation and results, vector database investigation and results, and understanding your application behavior.
Q1 Memory Fabric Forum: Memory expansion with CXL-Ready Systems and Devices - Memory Fabric Forum
Ravi Gummaluri, Director, CXL System Architecture at Micron describes use cases for memory expansion with tiered DRAM and CXL memory, along with performance data.
Q1 Memory Fabric Forum: CXL-Related Activities within OCP - Memory Fabric Forum
OCP steering committee member, and former President of the CXL Consortium, Siamak Tavallaei, provides an overview of CXL-related activities happening within the Open Compute Project.
Q1 Memory Fabric Forum: CXL Controller by Montage Technology - Memory Fabric Forum
For CXL AIC and memory module designers, Nilesh Shah of Montage provides an overview of their CXL memory controller product, technology, and performance.
Nick Kriczsky and Gorden Getty provide an overview of Teledyne LeCroy’s Austin Labs portfolio of products and services, including: 1) testing for protocol and electrical compliance, interoperability, data integrity, and performance; 2) in-depth protocol training (PCIe, USB, NVMe, NVMe-oF, Fibre Channel); and 3) automation (solutions for analysis, jamming, and generation).
Torry Steed, Sr. Staff Product Manager at SMART Modular, covers the changing shape of memory leading to new categories of CXL form factors. He dives deeper to address EDSFF and AIC variations, mechanical sizes, installation locations, capacity considerations, and power ratings.
Q1 Memory Fabric Forum: Memory Fabric in a Composable System - Memory Fabric Forum
Eddie McMorrow, Sr. Product Manager at GigaIO, defines composable infrastructure and memory fabrics, then provides an overview of the FabreX memory fabric.
Q1 Memory Fabric Forum: Micron CXL-Compatible Memory Modules - Memory Fabric Forum
Michael Abraham, Director of Product Management at Micron, discusses data center challenges, the memory and storage hierarchy, Micron CZ120 memory modules, database (TPC-H) improvements, AI inferencing improvements, and how to enable them in your company.
Q1 Memory Fabric Forum: Advantages of Optical CXL for Disaggregated Compute ... - Memory Fabric Forum
Ron Swartzentruber, Director of Engineering at Lightelligence, explains why optical connectivity is needed for CXL fabrics, and provides an overview of the Photowave line of port expander PCIe cards and active optical cables.
Arvind Jagannath of VMware makes the case for bridging the CPU-Memory imbalance with memory tiering, describes their vision for memory disaggregation, and explains that VMware will support CXL Expanders – Specific Configurations, Memory Tiering to reduce overall TCO, and Memory Accelerators to enable CXL-based use-cases.
MemVerge Field CTO Yong Tian shows what memory expansion costs with an analysis of various server configurations with up to 8TB of tiered DRAM and CXL memory.
In the CXL Forum Theater at SC23 hosted by MemVerge, Lightelligence describes CXL's need for optical connectivity and their portfolio of CXL optical expander cards and cables.
Synopsys: Achieve First Pass Silicon Success with Synopsys CXL IP Solutions - Memory Fabric Forum
This document discusses Synopsys' CXL IP solutions for enabling first pass silicon success. It provides an overview of:
- How large data sets are driving the need for CXL and larger, more efficient cache coherent storage.
- How CXL allows memory expansion by enabling one interface to connect to various memory types like DDR, LPDDR, and persistent memory.
- Synopsys' complete CXL IP solution which uses proven PCIe IP to provide a highly efficient 512-bit controller and 32GT/s PHY for maximum bandwidth and low latency.
- Synopsys' work with XConn to achieve first pass silicon success on a 256-lane CXL 2.0 switch SoC.
In the CXL Forum Theater at SC23 hosted by MemVerge, Samsung described the architecture and use cases of their hybrid drive that includes DRAM and flash memory.
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! - SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
20 Comprehensive Checklist of Designing and Developing a Website - Pixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
3. The Emergence of the AI Computer
• x86 Era: Compute (x86 CPUs, DDR Memory), Data (IP Storage), Connectivity (IP Networks)
• AI Era: Compute (GPUs, AI Processors, HBM), Data (Fabric-Attached Memory), Connectivity (AI Fabric: NVLink, CXL)
8. CXL Expansion vs. DDR DRAM Performance
[Chart: DRAM vs. CXL, consolidated latency (ns, 0-1,400) vs. throughput (GB/s, 0-250); series: 1 DIMM DDR5 4800, 4 DIMM DDR5 4800, 16 DIMM DDR5 4800, CXL Gen5 x8]
9. Intelligent Memory Placement Engine
• The QoS Policy Engine supports multiple policies to maximize bandwidth and minimize latency
• The auto-tiering policy exhibits superior performance to hardware interleaving or OS kernel tiering; a toy sketch of the idea follows
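The toy sketch below illustrates the basic idea behind hotness-based auto-tiering: keep the most frequently accessed pages in DRAM and demote the rest to the CXL tier. It is not MemVerge's placement engine, and it assumes per-page access counts are already available from some profiling source.

```python
# Toy illustration of hotness-based tiering (not MemVerge's actual engine):
# keep the most frequently accessed pages in DRAM, demote the rest to CXL.
def place_pages(access_counts: dict[int, int], dram_capacity_pages: int):
    """Return (dram_pages, cxl_pages) given {page_id: access_count}."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    dram = set(ranked[:dram_capacity_pages])   # hottest pages stay near the CPU
    cxl = set(ranked[dram_capacity_pages:])    # colder pages go to the CXL tier
    return dram, cxl

if __name__ == "__main__":
    counts = {0: 900, 1: 5, 2: 340, 3: 12, 4: 770}
    dram, cxl = place_pages(counts, dram_capacity_pages=2)
    print("DRAM:", sorted(dram), "CXL:", sorted(cxl))  # DRAM: [0, 4] CXL: [1, 2, 3]
```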
13. Memory Machine™
Available now! Contact mike.hoey@memverge.com to request a PoC.
• Memory Machine™ X: Server Memory Expansion (auto-tiering)
• Big Memory Appliance
Learn more on February 9 at 12:30 PT: "Using CXL with AI Applications" with Steve Scargall, MemVerge. Register to attend.
16. Transmitting and Sharing Data between Processes
• Single node: processes share data through sockets, queues, and pipes in DRAM, or through files.
• Multi-node, message passing: processes on different nodes exchange data over TCP/IP/Ethernet with MPI/NCCL.
• Multi-node, shared storage: processes on different nodes share data through networked storage.
• Multi-node, shared memory: processes on different nodes share data through CXL memory.
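For the single-node shared-memory case above, here is a minimal sketch using Python's standard multiprocessing.shared_memory module: two processes touch the same buffer with no serialization and no copy through a pipe or socket. It is only a single-node analogy; extending the same model across nodes is what CXL shared memory adds.

```python
# Single-node analogy for the shared-memory path: two processes use one buffer.
# CXL shared memory extends this idea across nodes (this sketch does not).
from multiprocessing import Process, shared_memory

def writer(name: str):
    shm = shared_memory.SharedMemory(name=name)  # attach to the existing segment
    shm.buf[:5] = b"hello"                       # write directly into shared memory
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=1024)
    p = Process(target=writer, args=(shm.name,))
    p.start()
    p.join()
    print(bytes(shm.buf[:5]))  # b'hello' -- no serialization, no copy over a pipe
    shm.close()
    shm.unlink()
```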
17. When & Why is Shared Memory Preferred
17
• Shared Memory Vs. Message Passing:
• Benefit: Take out the networking
• Cost: Cache coherence & synchronization
• Sweet Spot: when the R:W ratio is high
• CRUD -> CRAP
• Not general purpose SHM, so not fit for the kernel
Node
Proc.
Node
Proc.
CXL Memory
Node
Proc.
Node
Proc.
Local
Memory
Local
Memory
• Other potential considerations:
o 1 to N communication
o Data not easily shardable
o Saving memory capacity cost
18. When & Why is Shared Memory Preferred
18
• Vs. Shared Storage:
• Sweet Spot: When the performance requirement is high
• When the data does not need to be persisted permanently
Node
Proc.
Node
Proc.
Node
Proc.
Node
Proc.
Node
Proc.
Node
Proc.
Networked Storage CXL Memory
19. Introducing Gismo Software
Global I/O-free Shared Memory Objects
[Diagram: each node runs an App on the Gismo Library over its CPU and local DDR DRAM (NUMA 0); a Gismo Manager runs on one node, and all nodes attach to CXL Shared Memory]
20. Use Case 1: AI/ML
• Baseline Ray: each node runs a Raylet and Workers with an Object Store in local memory; objects are serialized and copied over the network between nodes.
• Ray + Gismo: each node runs a Raylet and Workers with Gismo; objects live in CXL shared memory accessible from every node, alongside each node's local memory.
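The baseline path can be sketched with Ray's object-store API (ray.put/ray.get); per the slides, Gismo backs that object store with CXL shared memory, so a remote get no longer serializes and copies the object over the network. The NumPy array below is just a stand-in payload.

```python
# Minimal Ray object-store sketch matching the "Baseline Ray" path above.
# With Gismo (per the slides), the store is backed by CXL shared memory, so a
# remote get avoids the serialize-and-copy-over-the-network step.
import numpy as np
import ray

ray.init()  # single-node demo; a cluster would use ray.init(address="auto")

@ray.remote
def checksum(arr: np.ndarray) -> float:
    # In baseline Ray, a worker on another node must fetch a copy of `arr`.
    return float(arr.sum())

obj = np.ones((1024, 1024), dtype=np.float32)  # stand-in for a ~4 MB object
ref = ray.put(obj)                              # place the object in the object store
print(ray.get(checksum.remote(ref)))            # 1048576.0
```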
21. Shuffle Benchmark Results*
• Local Get 1 GB object: Baseline Ray 0.4 sec; with Gismo 0.4 sec (CXL shared memory as fast as local memory)
• Remote Get 1 GB object: Baseline Ray 2.7 sec; with Gismo 0.4 sec (675% faster)
• Shuffle 50 GB (4 nodes, each with 4 cores and a 128 GB object store): Baseline Ray 515 sec; with Gismo 185 sec (280% faster)
* Running in emulation environment
22. Benefits of Ray + Gismo
• IO-free: eliminates object serialization and transfers over the network for remote object access
• Zero copy: no more duplicate object copies on different nodes
• No spilling: reduces object spilling and data skewing, with each node accessing the memory pool
23. Memory Machine X Sharing
Alpha availability on the Q1’24 - Q4’24 roadmap. Please contact cxl@memverge.com if interested.
24. Introducing MemVerge
Award-winning software for Big Memory Computing
• Memory Machine™ X: fabric-attached CXL memory for AI
• Memory Machine™ Cloud: hybrid cloud compute platform
Select Customers