VMware ESXi - Intel and Qlogic NIC throughput difference v0.6David Pasek
We are observing different network throughputs on Intel X710 NICs and QLogic FastLinQ QL41xxx NIC. ESXi hardware supports NIC hardware offloading and queueing on 10Gb, 25Gb, 40Gb and 100Gb NIC adapters. Multiple hardware queues per NIC interface (vmnic) and multiple software threads on ESXi VMkernel is depicted and documented in this paper which may or may not be the root cause of the observed problem. The key objective of this document is to clearly document and collect NIC information on two specific Network Adapters and do a comparison to find the difference or at least root cause hypothesis for further troubleshooting.
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehp.com/newsletter
Luigi Brochard from Lenovo gave this talk at the Switzerland HPC Conference. "High performance computing is converging more and more with the big data topic and related infrastructure requirements in the field. Lenovo is investing in developing systems designed to resolve todays and future problems in a more efficient way and respond to the demands of Industrial and research application landscape."
Watch the video: http://wp.me/p3RLHQ-gDC
Learn more: http://www3.lenovo.com/us/en/data-center/solutions/hpc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
VMware ESXi - Intel and Qlogic NIC throughput difference v0.6David Pasek
We are observing different network throughputs on Intel X710 NICs and QLogic FastLinQ QL41xxx NIC. ESXi hardware supports NIC hardware offloading and queueing on 10Gb, 25Gb, 40Gb and 100Gb NIC adapters. Multiple hardware queues per NIC interface (vmnic) and multiple software threads on ESXi VMkernel is depicted and documented in this paper which may or may not be the root cause of the observed problem. The key objective of this document is to clearly document and collect NIC information on two specific Network Adapters and do a comparison to find the difference or at least root cause hypothesis for further troubleshooting.
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehp.com/newsletter
Luigi Brochard from Lenovo gave this talk at the Switzerland HPC Conference. "High performance computing is converging more and more with the big data topic and related infrastructure requirements in the field. Lenovo is investing in developing systems designed to resolve todays and future problems in a more efficient way and respond to the demands of Industrial and research application landscape."
Watch the video: http://wp.me/p3RLHQ-gDC
Learn more: http://www3.lenovo.com/us/en/data-center/solutions/hpc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from Switzerland HPC Conference, Gunter Roeth from NVIDIA presents: Deep Learning on the SaturnV Cluster.
"Machine Learning is among the most important developments in the history of computing. Deep learning is one of the fastest growing areas of machine learning and a hot topic in both academia and industry. It has dramatically improved the state-of-the-art in areas such as speech recognition, computer vision, predicting the activity of drug molecules, and many other machine learning tasks. The basic idea of deep learning is to automatically learn to represent data in multiple layers of increasing abstraction, thus helping to discover intricate structure in large datasets. NVIDIA has invested in SaturnV, a large GPU-accelerated cluster, (#28 on the November 2016 Top500 list) to support internal machine learning projects. After an introduction to deep learning on GPUs, we will address a selection of open questions programmers and users may face when using deep learning for their work on these clusters."
Watch the video: http://wp.me/p3RLHQ-gDv
Learn more: http://www.nvidia.com/object/dgx-saturnv.html
and
http://hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
SPACK: A Package Manager for Supercomputers, Linux, and MacOSinside-BigData.com
“HPC software is becoming increasingly complex. The space of possible build configurations is combinatorial, and existing package management tools do not handle these complexities well. Because of this, most HPC software is built by hand. This talk introduces “Spack“, an open-source tool for scientific package management which helps developers and cluster administrators avoid having to waste countless hours porting and rebuilding software. Spack uses concise package recipes written in Python to automate builds with arbitrary combinations of compilers, MPI versions, and dependency libraries. With Spack, users can rapidly install software without knowing how to build it; developers can efficiently manage automatic builds of tens or hundreds of dependency libraries; and HPC centers staff can deploy many versions of software for thousands of users.”
Watch the video: http://insidehpc.com/2017/04/spack-package-manager-supercomputers-linux-macos/
Learn more: https://spack.io/
and
http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
02 ai inference acceleration with components all in open hardware: opencapi a...Yutaka Kawai
This was presented by Peng Fei GOU (IBM China) at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/68/NVDLA%20on%20OpenCAPI.pdf
This real customer case POC demonstrate how Exadata X5-2 with OVM can be the best consolidation solution and how it can replace existing AIX P7 infrastructure.
Traditionally database systems were optimized either for OLAP either for OLTP workloads. Such mainstream DBMSes like Postgres,MySQL,... are mostly used for OLTP, while Greenplum, Vertica, Clickhouse, SparkSQL,... are oriented on analytic queries. But right now many companies do not want to have two different data stores for OLAP/OLTP and need to perform analytic queries on most recent data. I want to discuss which features should be added to Postgres to efficiently handle HTAP workload.
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, that implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime. Manual instrumentation, which can be combined with automatic, allows code developer to annotate regions of particular interest."
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newslett
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learning from ZFS to Scale Storage on and under Containersinside-BigData.com
Evan Powell presented this deck at the MSST 2107 Mass Storage Conference.
"What is so new about the container environment that a new class of storage software is emerging to address these use cases? And can container orchestration systems themselves be part of the solution? As is often the case in storage, metadata matters here. We are implementing in the open source OpenEBS.io some approaches that are in some regards inspired by ZFS to enable much more efficient scale out block storage for containers that itself is containerized. The goal is to enable storage to be treated in many regards as just another application while, of course, also providing storage services to stateful applications in the environment."
Watch the video: http://wp.me/p3RLHQ-gPs
Learn more: blog.openebs.io
and
http://storageconference.us
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
What do data center operators need to know when deploying Hadoop in the Data Center? Multi-tenancy, network topology, workload types, and myriad other factors affect the way applications run and perform in the data center. Understanding performance characteristics of the distributed system is key to not only optimize for Hadoop, but allows Hadoop to seamlessly operate side-by-side existing applications.
Host Data Plane Acceleration: SmartNIC Deployment ModelsNetronome
SIGCOMM 2018: This tutorial introduces multiple models for host data plane acceleration with SmartNICs, provides a detailed understanding of SmartNIC deployment models at hyperscale cloud vendors and telecom service providers, and introduces various open source resources available for research and product development in this space.
Presenter Bio
Simon focuses on upstream open source activities at Netronome. He is working on allowing offload of OVS offload on the Agilio platform as well as the broader question of how best to enable programming hardware offload in the Linux kernel and other upstream open source projects.
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...inside-BigData.com
In this deck from the Switzerland HPC Conference, Maxime Martinasso from CSCS presents: Best Practices: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers.
"MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated accelerator servers as their primary system to compute weather forecast simulation. Servers with multiple accelerator devices that are primarily connected by a PCI-Express (PCIe) network achieve a significantly higher energy efficiency. Memory transfers between accelerators in such a system are subjected to PCIe arbitration policies. In this paper, we study the impact of PCIe topology and develop a congestion-aware performance model for PCIe communication. We present an algorithm for computing congestion factors of every communication in a congestion graph that characterizes the dynamic usage of network resources by an application. Our model applies to any PCIe tree topology. Our validation results on two different topologies of 8 GPU devices demonstrate that our model achieves an accuracy of over 97% within the PCIe network. We demonstrate the model on a weather forecast application to identify the best algorithms for its communication patterns among GPUs."
Watch the video: http://wp.me/p3RLHQ-gDi
Learn more: http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
dCUDA: Distributed GPU Computing with Hardware Overlapinside-BigData.com
Torsten Hoefler from ETH Zurich presented this deck at the Switzerland HPC Conference.
"Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, which implements device-side remote memory access with target notification. To hide instruction pipeline latencies, CUDA programs over-decompose the problem and over-subscribe the device by running many more threads than there are hardware execution units. Whenever a thread stalls, the hardware scheduler immediately proceeds with the execution of another thread ready for execution. This latency-hiding technique is key to make best use of the available hardware resources. With dCUDA, we apply latency hiding at cluster scale to automatically overlap computation and communication. Our benchmarks demonstrate perfect overlap for memory bandwidth-bound tasks and good overlap for compute-bound tasks."
Watch the video presentation: http://wp.me/p3RLHQ-gCB
In this deck from the Switzerland HPC Conference, Francis Lam from Huawei presents: A Fresh Look at HPC from Huawei Enterprise.
"High performance computing is rapidly finding new uses in many applications and businesses, enabling the creation of disruptive products and services. Huawei, a global leader in information and communication technologies, brings a broad spectrum of innovative solutions to HPC. This talk examines Huawei's world class HPC solutions and explores creative new ways to solve HPC problems."
Watch the video: http://wp.me/p3RLHQ-gCV
Learn more: http://e.huawei.com/en/solutions/business-needs/data-center/high-performance-computing
and
http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Deep Dive - Usage of on premises data gateway for hybrid integration scenariosSajith C P Nair
Presentation delivered by Sajith C P, Integration Architect at the 2017 Global Integration Bootcamp, Bangalore.
https://www.biztalk360.com/gib2017-india/#speakers[inline]/7/
In this session the speaker talked about ‘on-premises data gateway’ as a secure centralized gateway that can be used for accessing on premise data from various Azure Services. He took a deep dive on how it works, how to install and various methods to troubleshoot connectivity. He concluded the session with few demos of its use in Azure Logic App, Microsoft Flow, Power Apps and Power BI.
In this deck from the Argonne Training Program on Extreme-Scale Computing 2019, Howard Pritchard from LANL and Simon Hammond from Sandia present: NNSA Explorations: ARM for Supercomputing.
"The Arm-based Astra system at Sandia will be used by the National Nuclear Security Administration (NNSA) to run advanced modeling and simulation workloads for addressing areas such as national security, energy and science.
"By introducing Arm processors with the HPE Apollo 70, a purpose-built HPC architecture, we are bringing powerful elements, like optimal memory performance and greater density, to supercomputers that existing technologies in the market cannot match,” said Mike Vildibill, vice president, Advanced Technologies Group, HPE. “Sandia National Laboratories has been an active partner in leveraging our Arm-based platform since its early design, and featuring it in the deployment of the world’s largest Arm-based supercomputer, is a strategic investment for the DOE and the industry as a whole as we race toward achieving exascale computing.”
Watch the video: https://wp.me/p3RLHQ-l29
Learn more: https://insidehpc.com/2018/06/arm-goes-big-hpe-builds-petaflop-supercomputer-sandia/
and
https://extremecomputingtraining.anl.gov/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Iperconvergenza come migliora gli economics del tuo ITNetApp
In this NetApp Webinar we present how NetApp HCI helps improve the economics of IT: accelerating and ensuring performance for each application, simplifying your Data Center and make your architecture more scalable by reducing waste, implementing and expanding your HCI infrastructure quickly and inexpensively, making your management even simpler and more intuitive, saving time and using the skills you already have in the company.
Plan with confidence: Route to a successful Do178c multicore certificationMassimo Talia
The modern approach Multi-Processor in the civil and military Embedded equipments certification. The Processor assessment is conduct by Rockwell Collins Inc., the operating system selection is conducted by Windriver Inc.
In this deck from Switzerland HPC Conference, Gunter Roeth from NVIDIA presents: Deep Learning on the SaturnV Cluster.
"Machine Learning is among the most important developments in the history of computing. Deep learning is one of the fastest growing areas of machine learning and a hot topic in both academia and industry. It has dramatically improved the state-of-the-art in areas such as speech recognition, computer vision, predicting the activity of drug molecules, and many other machine learning tasks. The basic idea of deep learning is to automatically learn to represent data in multiple layers of increasing abstraction, thus helping to discover intricate structure in large datasets. NVIDIA has invested in SaturnV, a large GPU-accelerated cluster, (#28 on the November 2016 Top500 list) to support internal machine learning projects. After an introduction to deep learning on GPUs, we will address a selection of open questions programmers and users may face when using deep learning for their work on these clusters."
Watch the video: http://wp.me/p3RLHQ-gDv
Learn more: http://www.nvidia.com/object/dgx-saturnv.html
and
http://hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
SPACK: A Package Manager for Supercomputers, Linux, and MacOSinside-BigData.com
“HPC software is becoming increasingly complex. The space of possible build configurations is combinatorial, and existing package management tools do not handle these complexities well. Because of this, most HPC software is built by hand. This talk introduces “Spack“, an open-source tool for scientific package management which helps developers and cluster administrators avoid having to waste countless hours porting and rebuilding software. Spack uses concise package recipes written in Python to automate builds with arbitrary combinations of compilers, MPI versions, and dependency libraries. With Spack, users can rapidly install software without knowing how to build it; developers can efficiently manage automatic builds of tens or hundreds of dependency libraries; and HPC centers staff can deploy many versions of software for thousands of users.”
Watch the video: http://insidehpc.com/2017/04/spack-package-manager-supercomputers-linux-macos/
Learn more: https://spack.io/
and
http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
02 ai inference acceleration with components all in open hardware: opencapi a...Yutaka Kawai
This was presented by Peng Fei GOU (IBM China) at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/68/NVDLA%20on%20OpenCAPI.pdf
This real customer case POC demonstrate how Exadata X5-2 with OVM can be the best consolidation solution and how it can replace existing AIX P7 infrastructure.
Traditionally database systems were optimized either for OLAP either for OLTP workloads. Such mainstream DBMSes like Postgres,MySQL,... are mostly used for OLTP, while Greenplum, Vertica, Clickhouse, SparkSQL,... are oriented on analytic queries. But right now many companies do not want to have two different data stores for OLAP/OLTP and need to perform analytic queries on most recent data. I want to discuss which features should be added to Postgres to efficiently handle HTAP workload.
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, that implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime. Manual instrumentation, which can be combined with automatic, allows code developer to annotate regions of particular interest."
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newslett
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learning from ZFS to Scale Storage on and under Containersinside-BigData.com
Evan Powell presented this deck at the MSST 2107 Mass Storage Conference.
"What is so new about the container environment that a new class of storage software is emerging to address these use cases? And can container orchestration systems themselves be part of the solution? As is often the case in storage, metadata matters here. We are implementing in the open source OpenEBS.io some approaches that are in some regards inspired by ZFS to enable much more efficient scale out block storage for containers that itself is containerized. The goal is to enable storage to be treated in many regards as just another application while, of course, also providing storage services to stateful applications in the environment."
Watch the video: http://wp.me/p3RLHQ-gPs
Learn more: blog.openebs.io
and
http://storageconference.us
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
What do data center operators need to know when deploying Hadoop in the Data Center? Multi-tenancy, network topology, workload types, and myriad other factors affect the way applications run and perform in the data center. Understanding performance characteristics of the distributed system is key to not only optimize for Hadoop, but allows Hadoop to seamlessly operate side-by-side existing applications.
Host Data Plane Acceleration: SmartNIC Deployment ModelsNetronome
SIGCOMM 2018: This tutorial introduces multiple models for host data plane acceleration with SmartNICs, provides a detailed understanding of SmartNIC deployment models at hyperscale cloud vendors and telecom service providers, and introduces various open source resources available for research and product development in this space.
Presenter Bio
Simon focuses on upstream open source activities at Netronome. He is working on allowing offload of OVS offload on the Agilio platform as well as the broader question of how best to enable programming hardware offload in the Linux kernel and other upstream open source projects.
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...inside-BigData.com
In this deck from the Switzerland HPC Conference, Maxime Martinasso from CSCS presents: Best Practices: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers.
"MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated accelerator servers as their primary system to compute weather forecast simulation. Servers with multiple accelerator devices that are primarily connected by a PCI-Express (PCIe) network achieve a significantly higher energy efficiency. Memory transfers between accelerators in such a system are subjected to PCIe arbitration policies. In this paper, we study the impact of PCIe topology and develop a congestion-aware performance model for PCIe communication. We present an algorithm for computing congestion factors of every communication in a congestion graph that characterizes the dynamic usage of network resources by an application. Our model applies to any PCIe tree topology. Our validation results on two different topologies of 8 GPU devices demonstrate that our model achieves an accuracy of over 97% within the PCIe network. We demonstrate the model on a weather forecast application to identify the best algorithms for its communication patterns among GPUs."
Watch the video: http://wp.me/p3RLHQ-gDi
Learn more: http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
dCUDA: Distributed GPU Computing with Hardware Overlapinside-BigData.com
Torsten Hoefler from ETH Zurich presented this deck at the Switzerland HPC Conference.
"Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, which implements device-side remote memory access with target notification. To hide instruction pipeline latencies, CUDA programs over-decompose the problem and over-subscribe the device by running many more threads than there are hardware execution units. Whenever a thread stalls, the hardware scheduler immediately proceeds with the execution of another thread ready for execution. This latency-hiding technique is key to make best use of the available hardware resources. With dCUDA, we apply latency hiding at cluster scale to automatically overlap computation and communication. Our benchmarks demonstrate perfect overlap for memory bandwidth-bound tasks and good overlap for compute-bound tasks."
Watch the video presentation: http://wp.me/p3RLHQ-gCB
In this deck from the Switzerland HPC Conference, Francis Lam from Huawei presents: A Fresh Look at HPC from Huawei Enterprise.
"High performance computing is rapidly finding new uses in many applications and businesses, enabling the creation of disruptive products and services. Huawei, a global leader in information and communication technologies, brings a broad spectrum of innovative solutions to HPC. This talk examines Huawei's world class HPC solutions and explores creative new ways to solve HPC problems."
Watch the video: http://wp.me/p3RLHQ-gCV
Learn more: http://e.huawei.com/en/solutions/business-needs/data-center/high-performance-computing
and
http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Deep Dive - Usage of on premises data gateway for hybrid integration scenariosSajith C P Nair
Presentation delivered by Sajith C P, Integration Architect at the 2017 Global Integration Bootcamp, Bangalore.
https://www.biztalk360.com/gib2017-india/#speakers[inline]/7/
In this session the speaker talked about ‘on-premises data gateway’ as a secure centralized gateway that can be used for accessing on premise data from various Azure Services. He took a deep dive on how it works, how to install and various methods to troubleshoot connectivity. He concluded the session with few demos of its use in Azure Logic App, Microsoft Flow, Power Apps and Power BI.
In this deck from the Argonne Training Program on Extreme-Scale Computing 2019, Howard Pritchard from LANL and Simon Hammond from Sandia present: NNSA Explorations: ARM for Supercomputing.
"The Arm-based Astra system at Sandia will be used by the National Nuclear Security Administration (NNSA) to run advanced modeling and simulation workloads for addressing areas such as national security, energy and science.
"By introducing Arm processors with the HPE Apollo 70, a purpose-built HPC architecture, we are bringing powerful elements, like optimal memory performance and greater density, to supercomputers that existing technologies in the market cannot match,” said Mike Vildibill, vice president, Advanced Technologies Group, HPE. “Sandia National Laboratories has been an active partner in leveraging our Arm-based platform since its early design, and featuring it in the deployment of the world’s largest Arm-based supercomputer, is a strategic investment for the DOE and the industry as a whole as we race toward achieving exascale computing.”
Watch the video: https://wp.me/p3RLHQ-l29
Learn more: https://insidehpc.com/2018/06/arm-goes-big-hpe-builds-petaflop-supercomputer-sandia/
and
https://extremecomputingtraining.anl.gov/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Iperconvergenza come migliora gli economics del tuo ITNetApp
In this NetApp Webinar we present how NetApp HCI helps improve the economics of IT: accelerating and ensuring performance for each application, simplifying your Data Center and make your architecture more scalable by reducing waste, implementing and expanding your HCI infrastructure quickly and inexpensively, making your management even simpler and more intuitive, saving time and using the skills you already have in the company.
Plan with confidence: Route to a successful Do178c multicore certificationMassimo Talia
The modern approach Multi-Processor in the civil and military Embedded equipments certification. The Processor assessment is conduct by Rockwell Collins Inc., the operating system selection is conducted by Windriver Inc.
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...Embarcados
Com o lançamento do SSP 1.5.0 para a Plataforma Renesas Synergy, ficou muito mais fácil desenvolver sistemas embarcados para IoT com os requisitos de segurança demandados pelo mercado. Além disso mostraremos as inovações e facilidades que o SSP 1.5.0 adiciona à plataforma. Também destacaremos o lançamento de uma placa de desenvolvimento da Renesas, a AE-Cloud2, para utilização de comunicação via rede móvel LTE, que tem suporte nessa nova versão do SSP da Plataforma Renesas Synergy.
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and moreinside-BigData.com
In this deck from the 2018 Swiss HPC Conference, Erez Cohen from Mellanox presents: Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more.
"While InfiniBand, RDMA and GPU-Direct are an HPC mainstay, these advanced networking technologies are increasingly becoming a core differentiator to the data center. In fact, within just a few short years so far, where only a handful of bleeding edge industrial leaders emulated classic HPC disciplines, today almost every commercial market is usurping HPC technologies and disciplines in mass. Additionally, with the rampant adoption of demanding workloads like Machine Learning, cloud to on premise providers are now deploying the same advanced networking technologies and delivering the same core capabilities and performance as traditional HPC environments. These same data centers embracing AI are also driving the increased adoption of complex technologies including containers and virtualization that must also be optimized for performance, optimal profit and operational efficiency. In this talk we explore how high performance networking has emerged from HPC to become the critical path for the cloud, machine learning and much more."
Watch the video: https://wp.me/p3RLHQ-ixP
Learn more: http://mellanox.com
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...Linaro
Session ID: HKG18-301
Session Name: HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integrated Processors
Speaker: Glenn Steiner
Track: LITE
★ Session Summary ★
Key Takeaways:
With the drive to increase integration, reduce system costs, accelerate performance, and enhance reliability, software developers are discovering the processor they would like to target is simply not fast enough. This session will help you the system architect or software developer understand how you can architect and develop software on an FPGA integrated processor, and accelerate software code via FPGA accelerators.
Abstract:
As a software developer, in order to meet system level performance requirements, you may have realized that your next software project will be targeting a processor inside of an FPGA. How will this impact your development process and what benefits might you gain with this tight integration of processor and FPGA? Starting from the basics of what FPGAs are (in terms of software programming), this session will provide a simple to understand primer of what modern FPGAs with embedded processors can do. We will wrap up with examples of how high level synthesis tools can move software to programmable logic hardware enabling dramatic software acceleration.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-301/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-301.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-301.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: LITE
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...Cloud Native Day Tel Aviv
Machine Learning is no doubt the hottest trend in IT nowadays. Deep Neural Network (DNN), a subfield of Machine Learning with mode of operation loosely inspired by the brain, allows us to solve complex problems such as image recognition that has been very difficult to solve using standard programming paradigms. DNN concepts are not new. However, and until recently, applying them in practice could not be realized due to their high computational demands. With the recent development in parallel computing, especially around GPU acceleration and high speed and efficient networking, DNN has become a reality in modern data centers. In this talk we will describe the system requirements to effectively run a machine learning cluster with popular frameworks such as TensorFlow. We will discuss how such a system can be deployed in an OpenStack-based cloud without compromises, enjoying high-performance DNN programming paradigm as well as the benefits of cloud and software-defined data centers.
Aleksejs Nemirovskis - Manage your data using oracle BDAAndrejs Vorobjovs
Manage Your Data, Using Oracle Big Data Appliance - Tips & Tricksngest, process and manage the data, using Oracle Big Data Appliance (end-to-end BigData solution from Oracle):
- Oracle BDA architecture and componets overview - Oracle platform, Cloudera CDH, Clodera Manager and specific Oracle components;
- Advantages and additional value of an Oracle BDA;
- Challenges, faced inside whole stack (BDA, Cloudera);
- Challenges, which came from original Hadoop EcoSystem;
- Customer case (anonymized): how to utilize a power of an Oracle BDA, including external Informatica Big Data Management tool.
Netronome invented the flexible network flow processor and hardware-accelerated server-based networking. Learn more from Netronome's Corporate Brochure.
Missed us at this April FIS event? Learn how IBM Power Systems can enable the most data intensive and mission-critical workloads in private and hybrid cloud environments. With IBM POWER9 based Power Systems, you can dynamically scale compute and memory on demand and build a cloud designed for the most data intensive workloads. These systems are ideal for FIS workloads and more.
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session we provide an overview of the Amazon EC2 instance platform key platform features and the concept of instance generations. We dive into the current generation design choices of the different instance families including the General Purpose Compute Optimized Storage Optimized Memory Optimized and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/12/making-edge-ai-inference-programming-easier-and-flexible-a-presentation-from-texas-instruments/
For more information about edge AI and computer vision, please visit:
https://www.edge-ai-vision.com
Manisha Agrawal, Product Marketing Engineer at Texas Instruments, presents the “Making Edge AI Inference Programming Easier and Flexible” tutorial at the September 2020 Embedded Vision Summit.
Deploying an AI model at the edge doesn’t have to be challenging—but it often is. Embedded processing vendors have unique sets of software tools for deploying models. It takes time and investment to learn to use proprietary tools and to optimize the edge implementation to achieve your desired performance. While embedded vendors are providing proprietary tools for model deployment, the open source community is also advancing to standardize the model deployment process and make it hardware agnostic.
Texas Instruments has adopted open source software frameworks to make model deployment easier and more flexible. In this talk, you will learn about the struggles developers face when deploying models for inference on embedded processors and how TI addresses these critical software development challenges. You will also discover how TI enables faster time-to-market using a flexible open source development approach without the need to compromise performance, accuracy or power requirements.
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
33. Jacques Kostic, Principal Consultant
Tel. +41-79-909 7263 Jacques.Kostic@trivadis.com
Emilian Fusaglia, Principal Consultant
Tel. +41-79-909 7213 Emiliano.Fusaglia@trivadis.com
35 TechEvent September 201814.09.2018
34. Session Feedback – now
TechEvent September 201836 14.09.2018
Please use the Trivadis Events mobile app to give feedback on each session
Use "My schedule" if you have registered for a session
Otherwise use "Agenda" and the search function
If the mobile app does not work (or if you have a Windows smartphone), use your
smartphone browser
– URL: http://trivadis.quickmobileplatform.eu/
– User name: <your_loginname> (such as "svv")
– Password: sent by e-mail...