Dustin Franklin (GPGPU Applications Engineer, GE Intelligent Platforms ) presents:
"GPUDirect support for RDMA provides low-latency interconnectivity between NVIDIA GPUs and various networking, storage, and FPGA devices. Discussion will include how the CUDA 5 technology increases GPU autonomy and promotes multi-GPU topologies with high GPU-to-CPU ratios. In addition to improved bandwidth and latency, the resulting increase in GFLOPS/watt poses a significant impact to both HPC and embedded applications. We will dig into scalable PCIe switch hierarchies, as well as software infrastructure to manage device interopability and GPUDirect streaming. Highlighting emerging architectures composed of Tegra-style SoCs that further decouple GPUs from discrete CPUs to achieve greater computational density."
Learn more at: http://www.gputechconf.com/page/home.html
AMD has been away from the HPC space for a while, but now they are coming back in a big way with an open software approach to GPU computing. The Radeon Open Compute Platform (ROCm) was born from the Boltzman Initiative announced last year at SC15. Now available on GitHub, the ROCm Platform bringing a rich foundation to advanced computing by better integrating the CPU and GPU to solve real-world problems.
"We are excited to present ROCm, the first open-source HPC/ultrascale-class platform for GPU computing that’s also programming-language independent. We are bringing the UNIX philosophy of choice, minimalism and modular software development to GPU computing. The new ROCm foundation lets you choose or even develop tools and a language run time for your application."
Watch the video presentation: http://wp.me/p3RLHQ-fJT
Learn more: https://radeonopencompute.github.io/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
InfiniBand In-Network Computing Technology and Roadmapinside-BigData.com
In this deck from the MVAPICH User Group, Gilad Shainer from Mellanox presents: InfiniBand In-Network Computing Technology and Roadmap.
"In-Network Computing transforms the data center interconnect to become a "distributed CPU", and "distributed memory", enables to overcome performance barriers and to enable faster and more scalable data analysis. HDR 200G InfiniBand In-Network Computing technology includes several elements - Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), smart Tag Matching and rendezvoused protocol, and more. These technologies are in use at some of the recent large scale supercomputers around the world, including the top TOP500 platforms. The session will discuss the InfiniBand In-Network Computing technology and performance results, as well as view to future roadmap."
Watch the video: https://wp.me/p3RLHQ-kIC
Learn more: http://mellanox.com
and
http://mug.mvapich.cse.ohio-state.edu/program/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
AMD has been away from the HPC space for a while, but now they are coming back in a big way with an open software approach to GPU computing. The Radeon Open Compute Platform (ROCm) was born from the Boltzman Initiative announced last year at SC15. Now available on GitHub, the ROCm Platform bringing a rich foundation to advanced computing by better integrating the CPU and GPU to solve real-world problems.
"We are excited to present ROCm, the first open-source HPC/ultrascale-class platform for GPU computing that’s also programming-language independent. We are bringing the UNIX philosophy of choice, minimalism and modular software development to GPU computing. The new ROCm foundation lets you choose or even develop tools and a language run time for your application."
Watch the video presentation: http://wp.me/p3RLHQ-fJT
Learn more: https://radeonopencompute.github.io/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
InfiniBand In-Network Computing Technology and Roadmapinside-BigData.com
In this deck from the MVAPICH User Group, Gilad Shainer from Mellanox presents: InfiniBand In-Network Computing Technology and Roadmap.
"In-Network Computing transforms the data center interconnect to become a "distributed CPU", and "distributed memory", enables to overcome performance barriers and to enable faster and more scalable data analysis. HDR 200G InfiniBand In-Network Computing technology includes several elements - Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), smart Tag Matching and rendezvoused protocol, and more. These technologies are in use at some of the recent large scale supercomputers around the world, including the top TOP500 platforms. The session will discuss the InfiniBand In-Network Computing technology and performance results, as well as view to future roadmap."
Watch the video: https://wp.me/p3RLHQ-kIC
Learn more: http://mellanox.com
and
http://mug.mvapich.cse.ohio-state.edu/program/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Tutorial: Using GoBGP as an IXP connecting routerShu Sugimoto
- Show you how GoBGP can be used as a software router in conjunction with quagga
- (Tutorial) Walk through the setup of IXP connecting router using GoBGP
In this deck from the HPC Advisory Council Spain Conference, DK Panda from Ohio State University presents: Communication Frameworks for HPC and Big Data.
Watch the video presentation: http://insidehpc.com/2015/09/video-communication-frameworks-for-hpc-and-big-data/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learn more: http://www.hpcadvisorycouncil.com/events/2015/spain-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In a series of announcements that left more than 1,200 gamers gathered in Cologne alternately breathless, giddy with laughter, and shouting their enthusiasm, Jensen Huang introduced the GeForce RTX series of gaming processors, representing the biggest leap in performance in NVIDIA’s history.
SOSCON 2019.10.17
What are the methods for packet processing on Linux? And how fast are each packet processing methods? In this presentation, we will learn how to handle packets on Linux (User space, socket filter, netfilter, tc), and compare performance with analysis of where each packet processing is done in the network stack (hook point). Also, we will discuss packet processing using XDP, an in-kernel fast-path recently added to the Linux kernel. eXpress Data Path (XDP) is a high-performance programmable network data-path within the Linux kernel. The XDP is located at the lowest level of access through SW in the network stack, the point at which driver receives the packet. By using the eBPF infrastructure at this hook point, the network stack can be expanded without modifying the kernel.
Daniel T. Lee (Hoyeon Lee)
@danieltimlee
Daniel T. Lee currently works as Software Engineer at Kosslab and contributing to Linux kernel BPF project. He has interest in cloud, Linux networking, and tracing technologies, and likes to analyze the kernel's internal using BPF technology.
DPDK (Data Plane Development Kit) Overview by Rami Rosen
* Background and short history
* Advantages and disadvantages
- Very High speed networking acceleration in L2
- How this acceleration is achieved (hugepages, optimizations)
- rte_kni (and KCP)
- VPP (and FD.io project) , providing routing and switching.
- TLDK (Transport Layer Development Kit, TCP/UDP)
* Anatomy of a simple DPDK application.
* Development and governance model
* Testpmd: DPDK CLI tool
* DDP - Dynamic Device Profiles
Rami Rosen is a Linux Kernel expert, the author of "Linux Kernel Networking", Apress, 2014.
Rami had published two articles about DPDK in the last year:
"Network acceleration with DPDK"
https://lwn.net/Articles/725254/
"Userspace Networking with DPDK"
https://www.linuxjournal.com/content/userspace-networking-dpdk
Some resources how to navigate in the hardware space in order to build your own workstation for training deep learning models.
Alternative download link: https://www.dropbox.com/s/o7cwla30xtf9r74/deepLearning_buildComputer.pdf?dl=0
Tutorial: Using GoBGP as an IXP connecting routerShu Sugimoto
- Show you how GoBGP can be used as a software router in conjunction with quagga
- (Tutorial) Walk through the setup of IXP connecting router using GoBGP
In this deck from the HPC Advisory Council Spain Conference, DK Panda from Ohio State University presents: Communication Frameworks for HPC and Big Data.
Watch the video presentation: http://insidehpc.com/2015/09/video-communication-frameworks-for-hpc-and-big-data/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learn more: http://www.hpcadvisorycouncil.com/events/2015/spain-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In a series of announcements that left more than 1,200 gamers gathered in Cologne alternately breathless, giddy with laughter, and shouting their enthusiasm, Jensen Huang introduced the GeForce RTX series of gaming processors, representing the biggest leap in performance in NVIDIA’s history.
SOSCON 2019.10.17
What are the methods for packet processing on Linux? And how fast are each packet processing methods? In this presentation, we will learn how to handle packets on Linux (User space, socket filter, netfilter, tc), and compare performance with analysis of where each packet processing is done in the network stack (hook point). Also, we will discuss packet processing using XDP, an in-kernel fast-path recently added to the Linux kernel. eXpress Data Path (XDP) is a high-performance programmable network data-path within the Linux kernel. The XDP is located at the lowest level of access through SW in the network stack, the point at which driver receives the packet. By using the eBPF infrastructure at this hook point, the network stack can be expanded without modifying the kernel.
Daniel T. Lee (Hoyeon Lee)
@danieltimlee
Daniel T. Lee currently works as Software Engineer at Kosslab and contributing to Linux kernel BPF project. He has interest in cloud, Linux networking, and tracing technologies, and likes to analyze the kernel's internal using BPF technology.
DPDK (Data Plane Development Kit) Overview by Rami Rosen
* Background and short history
* Advantages and disadvantages
- Very High speed networking acceleration in L2
- How this acceleration is achieved (hugepages, optimizations)
- rte_kni (and KCP)
- VPP (and FD.io project) , providing routing and switching.
- TLDK (Transport Layer Development Kit, TCP/UDP)
* Anatomy of a simple DPDK application.
* Development and governance model
* Testpmd: DPDK CLI tool
* DDP - Dynamic Device Profiles
Rami Rosen is a Linux Kernel expert, the author of "Linux Kernel Networking", Apress, 2014.
Rami had published two articles about DPDK in the last year:
"Network acceleration with DPDK"
https://lwn.net/Articles/725254/
"Userspace Networking with DPDK"
https://www.linuxjournal.com/content/userspace-networking-dpdk
Some resources how to navigate in the hardware space in order to build your own workstation for training deep learning models.
Alternative download link: https://www.dropbox.com/s/o7cwla30xtf9r74/deepLearning_buildComputer.pdf?dl=0
Greater Chicago Area - Independent Non-Profit Organization Management Professional
View clifford sugerman's professional profile on LinkedIn. LinkedIn is the world's largest business network, helping professionals like clifford sugerman discover.
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...AMD Developer Central
Presentation PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by Wu Feng and Mark Gardner at the AMD Developer Summit (APU13) November 11-13, 2013.
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...Storti Mario
In this article we compare the results obtained with an implementation of the Finite Volume for structured meshes on GPGPUs with experimental results and also with a Finite Element code with boundary fitted strategy. The example is a fully submerged spherical buoy immersed in a cubic water recipient. The recipient undergoes an harmonic linear motion imposed with a shake table. The experiment is recorded with a high speed camera and the displacement of the buoy if obtained from the video with a MoCap (Motion Capture) algorithm. The amplitude and phase of the resulting motion allows to determine indirectly the added mass and drag of the sphere.
NVIDIA CEO Jen-Hsun Huang introduces NVLink and shares a roadmap of the GPU. Primary topics also include an introduction of the GeForce GTX Titan Z, CUDA for machine learning, and Iray VCA.
Introduction to Deep Learning with Will ConstableIntel Nervana
Deep Residual Nets, Activity recognition in videos, and Q&A systems using neon and the Nervana Cloud
Will Constable will start with an introduction to the field of Deep Learning, neon and the Nervana Cloud. The presentation will be followed by an interactive workshop using neon. neon is an open-source Python based Deep Learning framework that has been built from the ground up for speed, scalability and ease of use.
Case Study: Porting Qt for Embedded Linux on Embedded Processorsaccount inactive
Qt has been crucial for Texas Instruments to develop attractive applications as system demonstrations including appealing graphics and communication features within a defined time space and resource environment. This session will discuss porting and using Qt for Embedded Linux on several embedded processors. Walzer will present TI's experience and the current status of configuring Qt for ARM based platforms running Linux as the operating system, as well as have a look at the current state of integrating hardware accelerators such as DSP and graphics cores into Qt4.
Presentation by Frank Walzer held during Qt Developer Days 2009.
http://qt.nokia.com/developer/learning/elearning
Waupaca County and Trempealeau County have both been running ArcGIS server (AGS) web mapping applications since late 2007. The two Counties worked together throughout the implementation process. They hope to share how to plan for an AGS implementation and potential pitfalls to avoid. The session is intended to be a starting point for users planning to implement AGS web mapping applications. Topics covered will be relevant
to software versions 9.2 and 9.3.
Overview of the BF609 dual-core Blackfin processor series covering main features including the Pipelined Vision Processor including the hardware and software development tools. By Analog Devices
VMworld 2013: How Good is PCoIP - A Remoting Protocol ShootoutVMworld
VMworld 2013
Shawn Bass, shawnbass.com
Cyndie Zikmund, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
In this deck from the Stanford HPC Conference, Shahin Khan from OrionX describes major market Shifts in IT.
"We will discuss the digital infrastructure of the future enterprise and the state of these trends."
"We work with clients on the impact of Digital Transformation (DX) on them, their customers, and their messages. Generally, they want to track, in one place, trends like IoT, 5G, AI, Blockchain, and Quantum Computing. And they want to know what these trends mean, how they affect each other, and when they demand action, and how to formulate and execute an effective plan. If that describes you, we can help."
Watch the video: https://wp.me/p3RLHQ-lPP
Learn more: http://orionx.net
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long-established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively-new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather, will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Greg Wahl from Advantech presents: Transforming Private 5G Networks.
Advantech Networks & Communications Group is driving innovation in next-generation network solutions with their High Performance Servers. We provide business critical hardware to the world's leading telecom and networking equipment manufacturers with both standard and customized products. Our High Performance Servers are highly configurable platforms designed to balance the best in x86 server-class processing performance with maximum I/O and offload density. The systems are cost effective, highly available and optimized to meet next generation networking and media processing needs.
“Advantech’s Networks and Communication Group has been both an innovator and trusted enabling partner in the telecommunications and network security markets for over a decade, designing and manufacturing products for OEMs that accelerate their network platform evolution and time to market.” Said Advantech Vice President of Networks & Communications Group, Ween Niu. “In the new IP Infrastructure era, we will be expanding our expertise in Software Defined Networking (SDN) and Network Function Virtualization (NFV), two of the essential conduits to 5G infrastructure agility making networks easier to install, secure, automate and manage in a cloud-based infrastructure.”
In addition to innovation in air interface technologies and architecture extensions, 5G will also need a new generation of network computing platforms to run the emerging software defined infrastructure, one that provides greater topology flexibility, essential to deliver on the promises of high availability, high coverage, low latency and high bandwidth connections. This will open up new parallel industry opportunities through dedicated 5G network slices reserved for specific industries dedicated to video traffic, augmented reality, IoT, connected cars etc. 5G unlocks many new doors and one of the keys to its enablement lies in the elasticity and flexibility of the underlying infrastructure.
Advantech’s corporate vision is to enable an intelligent planet. The company is a global leader in the fields of IoT intelligent systems and embedded platforms. To embrace the trends of IoT, big data, and artificial intelligence, Advantech promotes IoT hardware and software solutions with the Edge Intelligence WISE-PaaS core to assist business partners and clients in connecting their industrial chains. Advantech is also working with business partners to co-create business ecosystems that accelerate the goal of industrial intelligence."
Watch the video: https://wp.me/p3RLHQ-lPQ
* Company website: https://www.advantech.com/
* Solution page: https://www2.advantech.com/nc/newsletter/NCG/SKY/benefits.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
In this deck from the Stanford HPC Conference, Katie Lewis from Lawrence Livermore National Laboratory presents: The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory.
"Scientific simulations have driven computing at Lawrence Livermore National Laboratory (LLNL) for decades. During that time, we have seen significant changes in hardware, tools, and algorithms. Today, data science, including machine learning, is one of the fastest growing areas of computing, and LLNL is investing in hardware, applications, and algorithms in this space. While the use of simulations to focus and understand experiments is well accepted in our community, machine learning brings new challenges that need to be addressed. I will explore applications for machine learning in scientific simulations that are showing promising results and further investigation that is needed to better understand its usefulness."
Watch the video: https://youtu.be/NVwmvCWpZ6Y
Learn more: https://computing.llnl.gov/research-area/machine-learning
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
In this deck from the Stanford HPC Conference, DK Panda from Ohio State University presents: How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems?
"This talk will start with an overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of- core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented."
Watch the video: https://youtu.be/LeUNoKZVuwQ
Learn more: http://web.cse.ohio-state.edu/~panda.2/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
In this deck from the Stanford HPC Conference, Nick Nystrom and Paola Buitrago provide an update from the Pittsburgh Supercomputing Center.
Nick Nystrom is Chief Scientist at the Pittsburgh Supercomputing Center (PSC). Nick is architect and PI for Bridges, PSC's flagship system that successfully pioneered the convergence of HPC, AI, and Big Data. He is also PI for the NIH Human Biomolecular Atlas Program’s HIVE Infrastructure Component and co-PI for projects that bring emerging AI technologies to research (Open Compass), apply machine learning to biomedical data for breast and lung cancer (Big Data for Better Health), and identify causal relationships in biomedical big data (the Center for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence). His current research interests include hardware and software architecture, applications of machine learning to multimodal data (particularly for the life sciences) and to enhance simulation, and graph analytics.
Watch the video: https://youtu.be/LWEU1L1o7yY
Learn more: https://www.psc.edu/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Ryan Quick from Providentia Worldwide describes how DNNs can be used to improve EDA simulation runs.
"Systems Intelligence relies on a variety of methods for providing insight into the core mechanisms for driving automated behavioral changes in self-healing command and control platforms. This talk reports on initial efforts with leveraging Semiconductor Electronic Design Automation (EDA) telemetry data from cross-domain sources including power, network, storage, nodes, and applications in neural networks as a driving method for insight into SI automation systems."
Watch the video: https://youtu.be/2WbR8tq-XbM
Learn more: http://www.providentiaworldwide.com/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
In this deck from the Stanford HPC Conference, Nicole Xu from Stanford University describes how she transformed a common jellyfish into a bionic creature that is part animal and part machine.
"Animal locomotion and bioinspiration have the potential to expand the performance capabilities of robots, but current implementations are limited. Mechanical soft robots leverage engineered materials and are highly controllable, but these biomimetic robots consume more power than corresponding animal counterparts. Biological soft robots from a bottom-up approach offer advantages such as speed and controllability but are limited to survival in cell media. Instead, biohybrid robots that comprise live animals and self- contained microelectronic systems leverage the animals’ own metabolism to reduce power constraints and body as an natural scaffold with damage tolerance. We demonstrate that by integrating onboard microelectronics into live jellyfish, we can enhance propulsion up to threefold, using only 10 mW of external power input to the microelectronics and at only a twofold increase in cost of transport to the animal. This robotic system uses 10 to 1000 times less external power per mass than existing swimming robots in literature and can be used in future applications for ocean monitoring to track environmental changes."
Watch the video: https://youtu.be/HrmJFyvInj8
Learn more: https://sanfrancisco.cbslocal.com/2020/02/05/stanford-research-project-common-jellyfish-bionic-sea-creatures/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Peter Dueben from the European Centre for Medium-Range Weather Forecasts (ECMWF) presents: Machine Learning for Weather Forecasts.
"I will present recent studies that use deep learning to learn the equations of motion of the atmosphere, to emulate model components of weather forecast models and to enhance usability of weather forecasts. I will than talk about the main challenges for the application of deep learning in cutting-edge weather forecasts and suggest approaches to improve usability in the future."
Peter is contributing to the development and optimization of weather and climate models for modern supercomputers. He is focusing on a better understanding of model error and model uncertainty, on the use of reduced numerical precision that is optimised for a given level of model error, on global cloud- resolving simulations with ECMWF's forecast model, and the use of machine learning, and in particular deep learning, to improve the workflow and predictions. Peter has graduated in Physics and wrote his PhD thesis at the Max Planck Institute for Meteorology in Germany. He worked as Postdoc with Tim Palmer at the University of Oxford and has taken up a position as University Research Fellow of the Royal Society at the European Centre for Medium-Range Weather Forecasts (ECMWF) in 2017.
Watch the video: https://youtu.be/ks3fkRj8Iqc
Learn more: https://www.ecmwf.int/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Gilad Shainer from the HPC AI Advisory Council describes how this organization fosters innovation in the high performance computing community.
"The HPC-AI Advisory Council’s mission is to bridge the gap between high-performance computing (HPC) and Artificial Intelligence (AI) use and its potential, bring the beneficial capabilities of HPC and AI to new users for better research, education, innovation and product manufacturing, bring users the expertise needed to operate HPC and AI systems, provide application designers with the tools needed to enable parallel computing, and to strengthen the qualification and integration of HPC and AI system products."
Watch the video: https://wp.me/p3RLHQ-lNz
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Today RIKEN in Japan announced that the Fugaku supercomputer will be made available for research projects aimed to combat COVID-19.
"Fugaku is currently being installed and is scheduled to be available to the public in 2021. However, faced with the devastating disaster unfolding before our eyes, RIKEN and MEXT decided to make a portion of the computational resources of Fugaku available for COVID-19-related projects ahead of schedule while continuing the installation process.
Fugaku is being developed not only for the progress in science, but also to help build the society dubbed as the “Society 5.0” by the Japanese government, where all people will live safe and comfortable lives. The current initiative to fight against the novel coronavirus is driven by the philosophy behind the development of Fugaku."
Initial Projects
Exploring new drug candidates for COVID-19 by "Fugaku"
Yasushi Okuno, RIKEN / Kyoto University
Prediction of conformational dynamics of proteins on the surface of SARS-Cov-2 using Fugaku
Yuji Sugita, RIKEN
Simulation analysis of pandemic phenomena
Nobuyasu Ito, RIKEN
Fragment molecular orbital calculations for COVID-19 proteins
Yuji Mochizuki, Rikkyo University
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, that implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime. Manual instrumentation, which can be combined with automatic, allows code developer to annotate regions of particular interest."
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newslett
In this deck from GTC Digital, William Beaudin from DDN presents: HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD.
Enabling high performance computing through the use of GPUs requires an incredible amount of IO to sustain application performance. We'll cover architectures that enable extremely scalable applications through the use of NVIDIA’s SuperPOD and DDN’s A3I systems.
The NVIDIA DGX SuperPOD is a first-of-its-kind artificial intelligence (AI) supercomputing infrastructure. DDN A³I with the EXA5 parallel file system is a turnkey, AI data storage infrastructure for rapid deployment, featuring faster performance, effortless scale, and simplified operations through deeper integration. The combined solution delivers groundbreaking performance, deploys in weeks as a fully integrated system, and is designed to solve the world's most challenging AI problems.
Watch the video: https://wp.me/p3RLHQ-lIV
Learn more: https://www.ddn.com/download/nvidia-superpod-ddn-a3i-ai400-appliance-with-the-exa5-filesystem/
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to Aarch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
Today Xilinx announced Versal Premium, the third series in the Versal ACAP portfolio. The Versal Premium series features highly integrated, networked and power-optimized cores and the industry’s highest bandwidth and compute density on an adaptable platform. Versal Premium is designed for the highest bandwidth networks operating in thermally and spatially constrained environments, as well as for cloud providers who need scalable, adaptable application acceleration.
Versal is the industry’s first adaptive compute acceleration platform (ACAP), a revolutionary new category of heterogeneous compute devices with capabilities that far exceed those of conventional silicon architectures. Developed on TSMC’s 7-nanometer process technology, Versal Premium combines software programmability with dynamically configurable hardware acceleration and pre-engineered connectivity and security features to enable a faster time-to- market. The Versal Premium series delivers up to 3X higher throughput compared to current generation FPGAs, with built-in Ethernet, Interlaken, and cryptographic engines that enable fast and secure networks. The series doubles the compute density of currently deployed mainstream FPGAs and provides the adaptability to keep pace with increasingly diverse and evolving cloud and networking workloads.
Learn more: https://insidehpc.com/2020/03/xilinx-announces-versal-premium-acap-for-network-and-cloud-acceleration/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
In this video from the Rice Oil & Gas Conference, Chin Fang from Zettar presents: Moving Massive Amounts of Data across Any Distance Efficiently.
The objective of this talk is to present two on-going projects aiming at improving and ensuring highly efficient bulk transferring or streaming of massive amounts of data over digital connections across any distance. It examines the current state of the art, a few very common misconceptions, the differences among the three major type of data movement solutions, a current initiative attempting to improve the data movement efficiency from the ground up, and another multi-stage project that shows how to conduct long distance large scale data movement at speed and scale internationally. Both projects have real world motivations, e.g. the ambitious data transfer requirements of Linac Coherent Light Source II (LCLS-II) [1], a premier preparation project of the U.S. DOE Exascale Computing Initiative (ECI) [2]. Their immediate goals are described and explained, together with the solution used for each. Findings and early results are reported. Possible future works are outlined.
Watch the video: https://wp.me/p3RLHQ-lBX
Learn more: https://www.zettar.com/
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Rice Oil & Gas Conference, Bradley McCredie from AMD presents: Scaling TCO in a Post Moore's Law Era.
"While foundries bravely drive forward to overcome the technical and economic challenges posed by scaling to 5nm and beyond, Moore’s law alone can provide only a fraction of the performance / watt and performance / dollar gains needed to satisfy the demands of today’s high performance computing and artificial intelligence applications. To close the gap, multiple strategies are required. First, new levels of innovation and design efficiency will supplement technology gains to continue to deliver meaningful improvements in SoC performance. Second, heterogenous compute architectures will create x-factor increases of performance efficiency for the most critical applications. Finally, open software frameworks, APIs, and toolsets will enable broad ecosystems of application level innovation."
Watch the video:
Learn more: http://amd.com
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehp.com/newsletter
In this deck from FOSDEM 2020, Colin Sauze from Aberystwyth University describes the development of a RaspberryPi cluster for teaching an introduction to HPC.
"The motivation for this was to overcome four key problems faced by new HPC users:
* The availability of a real HPC system and the effect running training courses can have on the real system, conversely the availability of spare resources on the real system can cause problems for the training course.
* A fear of using a large and expensive HPC system for the first time and worries that doing something wrong might damage the system.
* That HPC systems are very abstract systems sitting in data centres that users never see, it is difficult for them to understand exactly what it is they are using.
* That new users fail to understand resource limitations, in part because of the vast resources in modern HPC systems a lot of mistakes can be made before running out of resources. A more resource constrained system makes it easier to understand this.
The talk will also discuss some of the technical challenges in deploying an HPC environment to a Raspberry Pi and attempts to keep that environment as close to a "real" HPC as possible. The issue to trying to automate the installation process will also be covered."
Learn more: https://github.com/colinsauze/pi_cluster
and
https://fosdem.org/2020/schedule/events/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from ATPESC 2019, Ken Raffenetti from Argonne presents an overview of HPC interconnects.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two-week training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-luc
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
1. GPUDirect RDMA and Green
Multi-GPU Architectures
GE Intelligent Platforms
Mil/Aero Embedded Computing
Dustin Franklin, GPGPU Applications Engineer
dustin.franklin@ge.com
443.310.9812 (Washington, DC)
2. What this talk is about
• GPU Autonomy
• GFLOPS/watt and SWAP
• Project Denver
• Exascale
3. Without GPUDirect
• In a standard plug & play OS, the two drivers have separate DMA buffers in system memory
• Three transfers to move data between I/O endpoint and GPU
1
I/O driver
space
I/O PCIe CPU System 2
endpoint switch memory
GPU driver
space
3
GPU
GPU
memory
1 I/O endpoint DMA’s into system memory
2 CPU copies data from I/O driver DMA buffer into GPU DMA buffer
3 GPU DMA’s from system memory into GPU memory
4. GPUDirect RDMA
• I/O endpoint and GPU communicate directly, only one transfer required.
• Traffic limited to PCIe switch, no CPU involvement in DMA
• x86 CPU is still necessary to have in the system, to run NVIDIA driver
I/O PCIe CPU System
endpoint switch memory
1
GPU
GPU
memory
1 I/O endpoint DMA’s into GPU memory
5. Endpoints
• GPUDirect RDMA is flexible and works with a wide range of existing devices
– Built on open PCIe standards
– Any I/O device that has a PCIe endpoint and DMA engine can utilize GPUDirect RDMA
– GPUDirect permeates both the frontend ingest and backend interconnects
• FPGAs
• Ethernet / InfiniBand adapters
• Storage devices
• Video capture cards
• PCIe non-transparent (NT) ports
• It’s free.
– Supported in CUDA 5.0 and Kepler
– Users can leverage APIs to implement RDMA with 3rd-party endpoints in their system
• Practically no integration required
– No changes to device HW
– No changes to CUDA algorithms
– I/O device drivers need to use DMA addresses of GPU instead of SYSRAM pages
8. Pipeline with GPUDirect RDMA
FPGA
block 0 block 1 block 2 block 3 block
ADC
FPGA
block 0 block 1 block 2 block 3 b
DMA
GPU
block 0 block 1 block 2 block
CUDA
GPU
block 0 block 1 block
DMA
FPGA
Transfer block directly to GPU via PCIe switch
DMA
GPU CUDA DSP kernels (FIR, FFT, ect.)
CUDA
GPU
Transfer results to next processor
DMA
9. Pipeline without GPUDirect RDMA
FPGA
block 0 block 1 block 2 block 3 block
ADC
FPGA
block 0 block 1 block 2 block 3 b
DMA
GPU block 0 block 1
DMA
GPU
block 0 block
CUDA
GPU
block 0
DMA
FPGA
Transfer block to system memory
DMA
GPU
Transfer from system memory to GPU
DMA
GPU CUDA DSP kernels (FIR, FFT, ect.)
CUDA
GPU
Transfer results to next processor
DMA
10. Backend Interconnects
• Utilize GPUDirect RDMA across the network for low-latency IPC and system scalability
• Mellanox OFED integration with GPUDirect RDMA – Q2 2013
11. Topologies
• GPUDirect RDMA works in many system topologies
– Single I/O endpoint and single GPU
– Single I/O endpoint and multiple GPUs
with or without PCIe switch downstream of CPU
– Multiple I/O endpoints and single GPU
– Multiple I/O endpoints and multiple GPUs
System
CPU
memory
PCIe
switch
I/O I/O
GPU GPU
endpoint endpoint
GPU GPU
memory memory
stream stream
DMA pipeline #1 DMA pipeline #2
12. Impacts of GPUDirect
• Decreased latency
– Eliminate redundant copies over PCIe + added latency from CPU
– ~5x reduction, depending on the I/O endpoint
– Perform round-trip operations on GPU in microseconds, not milliseconds
enables new CUDA applications
• Increased PCIe efficiency + bandwidth
– bypass system memory, MMU, root complex: limiting factors for GPU DMA transfers
• Decrease in CPU utilization
– CPU is no longer burning cycles shuffling data around for the GPU
– System memory is no longer being thrashed by DMA engines on endpoint + GPU
– Go from 100% core utilization per GPU to < 10% utilization per GPU
promotes multi-GPU
13. Before GPUDirect…
• Many multi-GPU systems had 2 GPU nodes
• Additional GPUs quickly choked system memory and CPU resources
• Dual-socket CPU design very common
• Dual root-complex prevents P2P across CPUs (QPI/HT untraversable)
System System
memory memory
I/O
CPU CPU
endpoint
GPU GPU
GFLOPS/watt
Xeon E5 K20X system
SGEMM 2.32 12.34 8.45
DGEMM 1.14 5.19 3.61
14. Rise of Multi-GPU
• Higher GPU-to-CPU ratios permitted by increased GPU autonomy from GPUDirect
• PCIe switches integral to design, for true CPU bypass and fully-connected P2P
System
CPU
memory
I/O PCIe
endpoint switch
GPU GPU GPU GPU
GFLOPS/watt
GPU:CPU ratio 1 to 1 4 to 1
SGEMM 8.45 10.97
DGEMM 3.61 4.64
15. Nested PCIe switches
• Nested hierarchies avert 96-lane limit on current PCIe switches
System
CPU
memory
PCIe
switch
I/O PCIe PCIe I/O
endpoint switch switch endpoint
GPU GPU GPU GPU GPU GPU GPU GPU
GFLOPS/watt
GPU:CPU ratio 1 to 1 4 to 1 8 to 1
SGEMM 8.45 10.97 11.61
DGEMM 3.61 4.64 4.91
16. PCIe over Fiber
• SWaP is our new limiting factor
• PCIe over Fiber-Optic can interconnect expansion blades and assure GPU:CPU growth
• Supports PCIe gen3 over at least 100 meters
system
CPU memory
PCIe
switch
PCIe over Fiber
1 2 3 4
I/O PCIe I/O PCIe I/O PCIe I/O PCIe
endpoint switch endpoint switch endpoint switch endpoint switch
GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU
GFLOPS/watt
GPU:CPU ratio 1 to 1 4 to 1 8 to 1 16 to 1
SGEMM 8.45 10.97 11.61 11.96
DGEMM 3.61 4.64 4.91 5.04
17. Scalability – 10 petaflops
Processor Power Consumption for 10 petaflops
GPU:CPU ratio 1 to 1 4 to 1 8 to 1 16 to 1
SGEMM 1184 kW 911 kW 862 kW 832 kW
DGEMM 2770 kW 2158 kW 2032 kW 1982 kW
Yearly Energy Bill
GPU:CPU ratio 1 to 1 4 to 1 8 to 1 16 to 1
SGEMM $1,050,326 $808,148 $764,680 $738,067
DGEMM $2,457,266 $1,914,361 $1,802,586 $1,758,232
Efficiency Savings
GPU:CPU ratio 1 to 1 4 to 1 8 to 1 16 to 1
SGEMM -- 23.05% 27.19% 29.73%
DGEMM -- 22.09% 26.64% 28.45%
18. Road to Exascale
GPUDirect RDMA Project Denver
Integration with Hybrid CPU/GPU
peripherals & interconnects for
for system scalability heterogeneous compute
Project Osprey Software
Parallel microarchitecture & CUDA, OpenCL, OpenACC
fab process optimizations Drivers & Mgmt Layer
for power efficiency Hybrid O/S
19. Project Denver
• 64-bit ARMv8 architecture
• ISA by ARM, chip by NVIDIA
• ARM’s RISC-based approach aligns with NVIDIA’s perf/Watt initiative
• Unlike licensed Cortex-A53 and –A57 cores, NVIDIA’s cores are highly customized
• Design flexibility required for tight CPU/GPU integration
20. ceepie-geepie
Memory controller Memory controller Memory controller
NPN240
h.264
ARM
DVI
SM SM SM
ARM
crypto
DVI
L2 L2 cache
SATA
DVI
ARM
SM SM SM
USB
DVI
ARM
Grid management + scheduler
PCIe root complex / endpoint
21. Mem ctrl Mem ctrl Mem ctrl
ceepie-geepie
crypto h.264
ARM NPN
DVI
SM SM SM
ARM 240
DVI
L2 L2 cache
SATA
DVI
ARM
SM SM SM
USB
DVI
ARM
Grid management + scheduler
PCIe root complex
I/O PCIe
endpoint switch
PCIe endpoint
Grid management + scheduler
crypto h.264
ARM NPN
DVI
SM SM SM
ARM 240
DVI
L2 L2 cache
SATA
ARM DVI
SM SM SM
USB
DVI
ARM
Mem ctrl Mem ctrl Mem ctrl
22. Impacts of Project Denver
• Heterogeneous compute
– share mixed work efficiently between CPU/GPU
– unified memory subsystem
• Decreased latency
– no PCIe required for synchronization + command queuing
• Power efficiency
– ARMv8 ISA
– on-chip buses & interconnects
– “4+1” power scaling
• Natively boot & run operating systems
• Direct connectivity with peripherals
• Maximize multi-GPU
24. Summary
• System-wide integration of GPUDirect RDMA
• Increase GPU:CPU ratio for better GFLOPS/watt
• Utilize PCIe switches instead of dual-socket CPU/IOH
• Exascale compute – what’s good for HPC is good for embedded/mobile
• Denver & Osprey – what’s good for embedded/mobile is good for HPC
questions?
booth 201