In this deck from the Hot Chips conference, Chris Nicol from Wave Computing presents: A Dataflow Processing Chip for Training Deep Neural Networks.
Watch the video: https://wp.me/p3RLHQ-k6W
Learn more: https://wavecomp.ai/
and
http://www.hotchips.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
DPDK Summit 2015 - Aspera - Charles Shiflett | Jim St. Leger
DPDK Summit 2015 in San Francisco.
Presentation by Charles Shiflett, Aspera.
For additional details and the video recording please visit www.dpdksummit.com.
DPDK Summit 2015 - Sprint - Arun Rajagopal | Jim St. Leger
DPDK Summit 2015 in San Francisco.
Presentation by Arun Rajagopal, Sprint, and Sameh Gobriel, Intel.
For additional details and the video recording please visit www.dpdksummit.com.
DPDK Summit 2015 - NTT - Yoshihiro Nakajima | Jim St. Leger
DPDK Summit 2015 in San Francisco.
NTT presentation by Yoshihiro Nakajima.
For additional details and the video recording please visit www.dpdksummit.com.
OpenPOWER Acceleration of HPCC Systems | HPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
Accelerate Service Function Chaining Vertical Solution with DPDK | OPNFV
Service Function Chaining (SFC) is one of the top five NFV use cases. Supporting SFC in provider and enterprise networks requires performance assurance. Specifically, the classifier and the service function forwarder, which are typically implemented in software such as virtual switches, must meet line-rate requirements. DPDK (Data Plane Development Kit) is an open source project comprising a set of libraries and drivers for fast packet processing. In this presentation, we will discuss our experiences accelerating SFC with DPDK. In addition, telco and data center carriers demand dynamic SFC, which requires support for new SFC wire protocols (e.g., VxLAN-GPE and NSH) in both the data and control planes. We intend to share our experiences and future work on a high-performance, NSH-aware SFC vertical solution built from open-source ingredients: OpenStack, OpenDaylight, and Open vSwitch with DPDK acceleration.
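As a rough illustration of the roles named above — a classifier that maps traffic to a chain, and a service function forwarder that steers packets through each function in order — here is a toy Python model. All names are hypothetical; real SFC deployments encode the chain in wire protocols such as NSH, not in application code.

```python
# Toy model of Service Function Chaining: a classifier picks a chain,
# and a service function forwarder (SFF) steers packets through it.

def firewall(pkt):
    pkt = dict(pkt)  # copy, so the original packet dict is untouched
    pkt.setdefault("tags", []).append("fw-ok")
    return pkt

def nat(pkt):
    pkt = dict(pkt)
    pkt["src"] = "203.0.113.1"  # rewrite the source address
    return pkt

# Policy: traffic class -> ordered chain of service functions.
CHAINS = {
    "web": [firewall, nat],
    "voip": [firewall],
}

def classify(pkt):
    # A real classifier matches 5-tuples at line rate; here: port only.
    return "web" if pkt["dport"] == 80 else "voip"

def forward(pkt):
    # The SFF applies each function in the selected chain, in order.
    for fn in CHAINS[classify(pkt)]:
        pkt = fn(pkt)
    return pkt

result = forward({"src": "10.0.0.5", "dport": 80})
```

The performance point in the abstract is that `classify` and `forward` sit on the per-packet fast path, which is why software implementations lean on DPDK.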
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK... | Jim St. Leger
Thomas Monjalon, 6WIND, presents on where/how to use DPDK, the DPDK ecosystem, and the DPDK.org community.
Thomas is the community maintainer of DPDK.org.
This was presented by Yong LU at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/16/OpenCAPI%20Acceleration%20Framework_YongLu_ver2.pdf
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/ceva/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit-siegel
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Yair Siegel, Director of Segment Marketing at CEVA, presents the "Fast Deployment of Low-power Deep Learning on CEVA Vision Processors" tutorial at the May 2016 Embedded Vision Summit.
Image recognition capabilities enabled by deep learning are benefitting more and more applications, including automotive safety, surveillance and drones. This is driving a shift towards running neural networks inside embedded devices. But, there are numerous challenges in squeezing deep learning into resource-limited devices. This presentation details a fast path for taking a neural network from research into an embedded implementation on a CEVA vision processor core, making use of CEVA’s neural network software framework. Siegel explains how the CEVA framework integrates with existing deep learning development environments like Caffe, and how it can be used to create low-power embedded systems with neural network capabilities.
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems, and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high-level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
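The design-time/runtime split described above can be sketched as a small lookup structure. This is an illustrative model of the idea only, not the READEX or MERIC API; all names and numbers are invented.

```python
# Illustrative model of the READEX idea: at design time, runtime
# situations (RTSs) that share the same best configuration are grouped
# into scenarios; at runtime, the tuning model maps each region to its
# pre-computed system configuration (e.g., a core frequency).

# Design time: measured best configuration per runtime situation.
rts_best_config = {
    "dense_solver": {"freq_mhz": 2400},
    "sparse_solver": {"freq_mhz": 2400},
    "io_phase": {"freq_mhz": 1200},  # memory/IO bound: lower frequency
}

# Group RTSs sharing a configuration into scenarios (the tuning model).
scenarios = {}
for rts, cfg in rts_best_config.items():
    scenarios.setdefault(tuple(sorted(cfg.items())), []).append(rts)

def config_for(region):
    """Runtime: look up the configuration to switch to for a region."""
    return rts_best_config[region]

current = config_for("io_phase")
```

The two solver RTSs collapse into one scenario because their best configurations coincide, which is exactly why grouping keeps the tuning model small.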
The MERIC tool, which implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime. Manual instrumentation, which can be combined with automatic instrumentation, allows the code developer to annotate regions of particular interest."
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learning from ZFS to Scale Storage on and under Containers | inside-BigData.com
Evan Powell presented this deck at the MSST 2017 Mass Storage Conference.
"What is so new about the container environment that a new class of storage software is emerging to address these use cases? And can container orchestration systems themselves be part of the solution? As is often the case in storage, metadata matters here. We are implementing in the open source OpenEBS.io some approaches that are in some regards inspired by ZFS to enable much more efficient scale out block storage for containers that itself is containerized. The goal is to enable storage to be treated in many regards as just another application while, of course, also providing storage services to stateful applications in the environment."
Watch the video: http://wp.me/p3RLHQ-gPs
Learn more: blog.openebs.io
and
http://storageconference.us
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
ODSA Proof of Concept SmartNIC Speeds & Feeds | ODSA Workgroup
Achronix presented, "ODSA Proof of Concept SmartNIC Speeds & Feeds," at the ODSA Workshop. The charter of the ODSA (Open Domain Specification Architecture) Workgroup is to define an open specification that enables building of Domain Specific Accelerator silicon using best-of-breed components from the industry made available as chiplet dies that can be integrated together as Lego blocks on an organic substrate packaging layer. The resulting multi-chip module (MCM) silicon can be produced at significantly lower development and manufacturing costs, and will deliver much needed performance per watt and performance per dollar efficiencies in networking, security, machine learning and other applications. The ODSA Workgroup also intends to deliver implementations of the specification as board-level prototypes, RTL code and libraries.
TitanIC presented, "ODSA Use Case - SmartNIC," at the ODSA Workshop.
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,... | IBM Research
IBM and the Netherlands Institute for Radio Astronomy ASTRON have unveiled the world’s first water-cooled 64-bit microserver. The prototype, which is roughly the size of a smartphone, is part of the proposed IT roadmap for the Square Kilometre Array (SKA), an international consortium to build the world’s largest and most sensitive radio telescope. Scientists estimate that the processing power required to operate the telescope will equal that of several million of today’s fastest computers.
The microserver team has designed and demonstrated a prototype 64-bit microserver using a PowerPC-based chip from Freescale Semiconductor running Fedora Linux and IBM DB2. At 133 × 55 mm², the microserver contains all of the essential functions of today’s servers, which are 4 to 10 times larger in size.
Not only is the microserver compact, it is also very energy-efficient. One of its innovations is hot-water cooling, which, in addition to keeping the chip operating temperature below 85 degrees C, also transports electrical power by means of a copper plate. The concept is based on the same technology IBM developed for the SuperMUC supercomputer located outside of Munich, Germany. IBM scientists hope to keep each microserver operating between 35 and 40 watts including the system on a chip (SoC); the current design is 60 watts.
The next step for scientists is to begin to take 128 of the microserver boards using the newest T4240 chips to create a 2U rack unit with 1536 cores and 3072 threads with up to 6 terabytes of DRAM. In addition, they will be adding an Ethernet switch and power module to the integrated water-cooling.
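The rack-level figures quoted above are consistent with the T4240's 12 cores and 24 threads per chip. A quick back-of-envelope check (the per-board DRAM figure is derived here, not stated in the source):

```python
# Back-of-envelope check of the 2U rack unit figures quoted above.
boards = 128
cores_per_t4240 = 12       # Freescale T4240: 12 cores, 24 threads
threads_per_t4240 = 24

total_cores = boards * cores_per_t4240      # 1536 cores per 2U unit
total_threads = boards * threads_per_t4240  # 3072 threads per 2U unit

dram_tb = 6
dram_gb_per_board = dram_tb * 1024 / boards  # 48 GB per microserver board
```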
NXP presented, "ODSA Workshop: Development Effort Summary," at the ODSA Workshop.
DPDK Summit 2015 in San Francisco.
Intel's presentation by Keith Wiles.
For additional details and the video recording please visit www.dpdksummit.com.
Lightweight DNN Processor Design (based on NVDLA) | Shien-Chun Luo
https://sites.google.com/view/itri-icl-dla/
(Public Information Share) This is our lightweight DNN inference processor presentation, covering a complete system solution (from Caffe prototxt to hardware control files), the hardware features, and RTL simulation results for an object-detection example (Tiny YOLO). We modified the open-source NVDLA (small configuration) and developed a RISC-V MCU for this acceleration system.
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors | Michelle Holley
Speaker: Daniel Towner, System Architect for Wireless Access, Intel Corporation
5G brings many new capabilities over 4G, including higher bandwidths, lower latencies, and more efficient use of radio spectrum. However, these improvements require a large increase in computing power in the base station. Fortunately, the Xeon Scalable processor series (Skylake-SP) recently introduced by Intel has a new high-performance instruction set, Intel® Advanced Vector Extensions 512 (Intel® AVX-512), which is capable of delivering the compute needed to support the exciting new world of 5G.
In his talk, Daniel will give an overview of the new capabilities of the Intel AVX-512 instruction set and show why they are so beneficial to supporting 5G efficiently. The most obvious difference is that Intel AVX-512 has double the compute performance of previous generations of instruction sets. Perhaps surprisingly, though, it is the addition of brand-new instructions that can make the biggest improvements. The new instructions allow software algorithms to become more efficient, enabling even more effective use of the improvements in computing performance and leading to very high performance 5G NR software implementations.
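The "double the compute" claim follows directly from register width: each 512-bit instruction operates on twice as many data lanes as its 256-bit AVX2 counterpart. A simple lane count (actual speedups also depend on frequency behavior and the algorithm):

```python
# Lane counts explain the headline throughput claim for AVX-512:
# a 512-bit register holds twice as many elements as a 256-bit one.
def lanes(register_bits, element_bits):
    return register_bits // element_bits

avx2_f32 = lanes(256, 32)    # 8 single-precision floats per instruction
avx512_f32 = lanes(512, 32)  # 16 single-precision floats per instruction
avx512_i16 = lanes(512, 16)  # 32 16-bit samples per instruction
```

The 16-bit case matters for baseband processing, where complex samples are often stored as 16-bit fixed-point values.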
Introduction to HPC & Supercomputing in AI | Tyrone Systems
Catch up with our live webinar on Natural Language Processing! Learn how it works and how it applies to you. The full video recording covers everything, so you won't miss out.
Watch the Natural Language Processing webinar here!
Ron Swartzentruber's (Senior Principal Engineer, Silicon Development at Netronome) presentation from IEEE SOCC 2016 "SoC Solutions Enabling Server-Based Networking" from September 8, 2016.
Ariel Waizel discusses the Data Plane Development Kit (DPDK), an API for developing fast packet processing code in user space.
* Who needs this library? Why bypass the kernel?
* How does it work?
* How good is it? What are the benchmarks?
* Pros and cons
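To make the "why bypass the kernel" question concrete, here is a toy Python model of DPDK's poll-mode, batched receive loop. The name `rx_burst` mirrors DPDK's `rte_eth_rx_burst`, but this is a conceptual sketch, not DPDK code.

```python
# Toy model of a DPDK-style poll-mode driver loop: instead of blocking
# on a per-packet syscall/interrupt, the application busy-polls the NIC
# queue and pulls packets in batches, amortizing per-packet overhead.
from collections import deque

nic_queue = deque(f"pkt{i}" for i in range(10))  # stand-in for an RX ring

BURST = 4  # real DPDK apps commonly poll 32+ packets per burst

def rx_burst(queue, max_pkts):
    """Hypothetical analogue of rte_eth_rx_burst: non-blocking batch read."""
    pkts = []
    while queue and len(pkts) < max_pkts:
        pkts.append(queue.popleft())
    return pkts

processed = []
while True:
    burst = rx_burst(nic_queue, BURST)
    if not burst:
        break  # a real poll-mode loop would keep spinning on the queue
    processed.extend(p.upper() for p in burst)
```

The batching is the key trade-off the talk's pros-and-cons section turns on: it buys throughput at the cost of a core pinned at 100% even when the link is idle.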
Ariel worked on kernel development at the IDF, Ben Gurion University, and several companies. He is interested in networking, security, machine learning, and basically everything except UI development. Currently a Solution Architect at ConteXtream (an HPE company), which specializes in SDN solutions for the telecom industry.
Knowing what's inside and how it works will help you design, develop, and implement applications that are better, faster, cheaper, more efficient, and easier to use, because you will be able to make informed decisions instead of guesstimating and assuming.
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex | Apache Apex
Apache Apex is a next gen big data analytics platform. Originally developed at DataTorrent it comes with a powerful stream processing engine, rich set of functional building blocks and an easy to use API for the developer to build real-time and batch applications. Apex runs natively on YARN and HDFS and is used in production in various industries. You will learn about the Apex architecture, including its unique features for scalability, fault tolerance and processing guarantees, programming model and use cases.
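Apex applications are written in Java as DAGs of operators connected by streams of tuples. The core idea can be sketched in Python (illustrative only, not the Apex API): each operator transforms a stream and feeds the next.

```python
# Conceptual sketch of a dataflow pipeline of operators, as in stream
# processing engines like Apex: source -> splitter -> counter.
# (Apex itself is Java, runs natively on YARN, and adds fault tolerance
# and processing guarantees around this basic model.)

def word_splitter(lines):
    # Operator 1: turn a stream of lines into a stream of words.
    for line in lines:
        yield from line.split()

def counter(words):
    # Operator 2: aggregate the word stream into counts.
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

source = ["to be or not to be"]      # stand-in for a streaming source
counts = counter(word_splitter(source))
```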
http://apachebigdata2016.sched.org/event/6M0L/next-gen-big-data-analytics-with-apache-apex-thomas-weise-datatorrent
Dataplane networking acceleration with OpenDataplane / Maxim Uvarov (Linaro) | Ontico
HighLoad++ 2017
Moscow Hall, November 7, 13:00
Abstract:
http://www.highload.ru/2017/abstracts/2909.html
OpenDataPlane (ODP, https://www.opendataplane.org) is an open-source API for networking data plane applications, providing an abstraction layer between the network chip and the application. Vendors such as TI, Freescale, and Cavium now release SDKs with ODP support for their SoC chips. By analogy with the graphics stack, ODP can be compared to the OpenGL API, but in the domain of network programming.
...
HPC and cloud distributed computing, as a journey | Peter Clapham
Introducing an internal cloud brings new paradigms, tools, and infrastructure management. When placed alongside traditional HPC, the new opportunities are significant. But getting to the new world of micro-services, autoscaling and autodialing is a journey that cannot be achieved in a single step.
Backend.AI Technical Introduction (19.09 / 2019 Autumn) | Lablup Inc.
This slide introduces technical specs and details about Backend.AI 19.09.
* On-premise clustering / container orchestration / scaling on cloud
* Container-level fractional GPU technology that lets one physical GPU be shared as many GPUs across many containers at the same time.
* NVIDIA GPU Cloud integrations
* Enterprise features
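The fractional-GPU idea — carving one physical device into fractional shares consumed by containers — can be modeled with simple admission bookkeeping. This is a hypothetical sketch of the concept, not Backend.AI's implementation.

```python
# Toy bookkeeping for container-level fractional GPU sharing: each
# container requests a fraction of one physical GPU, and the allocator
# admits requests only until the device is fully subscribed.

class FractionalGPU:
    def __init__(self, capacity=1.0):
        self.capacity = capacity
        self.allocations = {}  # container name -> allocated fraction

    def allocate(self, container, fraction):
        used = sum(self.allocations.values())
        if used + fraction > self.capacity + 1e-9:
            return False  # reject: would oversubscribe the device
        self.allocations[container] = fraction
        return True

gpu = FractionalGPU()
ok_a = gpu.allocate("container-a", 0.5)
ok_b = gpu.allocate("container-b", 0.25)
ok_c = gpu.allocate("container-c", 0.5)  # rejected: only 0.25 remains
```

A real implementation also has to partition GPU memory and compute time per share; the admission check above is only the scheduling surface of the idea.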
In this deck from the Stanford HPC Conference, Shahin Khan from OrionX describes major market Shifts in IT.
"We will discuss the digital infrastructure of the future enterprise and the state of these trends."
"We work with clients on the impact of Digital Transformation (DX) on them, their customers, and their messages. Generally, they want to track, in one place, trends like IoT, 5G, AI, Blockchain, and Quantum Computing. And they want to know what these trends mean, how they affect each other, when they demand action, and how to formulate and execute an effective plan. If that describes you, we can help."
Watch the video: https://wp.me/p3RLHQ-lPP
Learn more: http://orionx.net
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Preparing to program Aurora at Exascale - Early experiences and future direct... | inside-BigData.com
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long-established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively-new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather, will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Greg Wahl from Advantech presents: Transforming Private 5G Networks.
Advantech Networks & Communications Group is driving innovation in next-generation network solutions with their High Performance Servers. We provide business critical hardware to the world's leading telecom and networking equipment manufacturers with both standard and customized products. Our High Performance Servers are highly configurable platforms designed to balance the best in x86 server-class processing performance with maximum I/O and offload density. The systems are cost effective, highly available and optimized to meet next generation networking and media processing needs.
“Advantech’s Networks and Communication Group has been both an innovator and trusted enabling partner in the telecommunications and network security markets for over a decade, designing and manufacturing products for OEMs that accelerate their network platform evolution and time to market,” said Ween Niu, Advantech Vice President of Networks & Communications Group. “In the new IP Infrastructure era, we will be expanding our expertise in Software Defined Networking (SDN) and Network Function Virtualization (NFV), two of the essential conduits to 5G infrastructure agility, making networks easier to install, secure, automate and manage in a cloud-based infrastructure.”
In addition to innovation in air interface technologies and architecture extensions, 5G will also need a new generation of network computing platforms to run the emerging software defined infrastructure, one that provides greater topology flexibility, essential to deliver on the promises of high availability, high coverage, low latency and high bandwidth connections. This will open up new parallel industry opportunities through dedicated 5G network slices reserved for specific industries dedicated to video traffic, augmented reality, IoT, connected cars etc. 5G unlocks many new doors and one of the keys to its enablement lies in the elasticity and flexibility of the underlying infrastructure.
Advantech’s corporate vision is to enable an intelligent planet. The company is a global leader in the fields of IoT intelligent systems and embedded platforms. To embrace the trends of IoT, big data, and artificial intelligence, Advantech promotes IoT hardware and software solutions with the Edge Intelligence WISE-PaaS core to assist business partners and clients in connecting their industrial chains. Advantech is also working with business partners to co-create business ecosystems that accelerate the goal of industrial intelligence.
Watch the video: https://wp.me/p3RLHQ-lPQ
* Company website: https://www.advantech.com/
* Solution page: https://www2.advantech.com/nc/newsletter/NCG/SKY/benefits.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Katie Lewis from Lawrence Livermore National Laboratory presents: The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory.
"Scientific simulations have driven computing at Lawrence Livermore National Laboratory (LLNL) for decades. During that time, we have seen significant changes in hardware, tools, and algorithms. Today, data science, including machine learning, is one of the fastest growing areas of computing, and LLNL is investing in hardware, applications, and algorithms in this space. While the use of simulations to focus and understand experiments is well accepted in our community, machine learning brings new challenges that need to be addressed. I will explore applications for machine learning in scientific simulations that are showing promising results and further investigation that is needed to better understand its usefulness."
Watch the video: https://youtu.be/NVwmvCWpZ6Y
Learn more: https://computing.llnl.gov/research-area/machine-learning
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, DK Panda from Ohio State University presents: How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems?
"This talk will start with an overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of- core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented."
Watch the video: https://youtu.be/LeUNoKZVuwQ
Learn more: http://web.cse.ohio-state.edu/~panda.2/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ... - inside-BigData.com
In this deck from the Stanford HPC Conference, Nick Nystrom and Paola Buitrago provide an update from the Pittsburgh Supercomputing Center.
Nick Nystrom is Chief Scientist at the Pittsburgh Supercomputing Center (PSC). Nick is architect and PI for Bridges, PSC's flagship system that successfully pioneered the convergence of HPC, AI, and Big Data. He is also PI for the NIH Human Biomolecular Atlas Program’s HIVE Infrastructure Component and co-PI for projects that bring emerging AI technologies to research (Open Compass), apply machine learning to biomedical data for breast and lung cancer (Big Data for Better Health), and identify causal relationships in biomedical big data (the Center for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence). His current research interests include hardware and software architecture, applications of machine learning to multimodal data (particularly for the life sciences) and to enhance simulation, and graph analytics.
Watch the video: https://youtu.be/LWEU1L1o7yY
Learn more: https://www.psc.edu/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Ryan Quick from Providentia Worldwide describes how DNNs can be used to improve EDA simulation runs.
"Systems Intelligence relies on a variety of methods for providing insight into the core mechanisms for driving automated behavioral changes in self-healing command and control platforms. This talk reports on initial efforts with leveraging Semiconductor Electronic Design Automation (EDA) telemetry data from cross-domain sources including power, network, storage, nodes, and applications in neural networks as a driving method for insight into SI automation systems."
Watch the video: https://youtu.be/2WbR8tq-XbM
Learn more: http://www.providentiaworldwide.com/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring - inside-BigData.com
In this deck from the Stanford HPC Conference, Nicole Xu from Stanford University describes how she transformed a common jellyfish into a bionic creature that is part animal and part machine.
"Animal locomotion and bioinspiration have the potential to expand the performance capabilities of robots, but current implementations are limited. Mechanical soft robots leverage engineered materials and are highly controllable, but these biomimetic robots consume more power than corresponding animal counterparts. Biological soft robots from a bottom-up approach offer advantages such as speed and controllability but are limited to survival in cell media. Instead, biohybrid robots that comprise live animals and self- contained microelectronic systems leverage the animals’ own metabolism to reduce power constraints and body as an natural scaffold with damage tolerance. We demonstrate that by integrating onboard microelectronics into live jellyfish, we can enhance propulsion up to threefold, using only 10 mW of external power input to the microelectronics and at only a twofold increase in cost of transport to the animal. This robotic system uses 10 to 1000 times less external power per mass than existing swimming robots in literature and can be used in future applications for ocean monitoring to track environmental changes."
Watch the video: https://youtu.be/HrmJFyvInj8
Learn more: https://sanfrancisco.cbslocal.com/2020/02/05/stanford-research-project-common-jellyfish-bionic-sea-creatures/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Peter Dueben from the European Centre for Medium-Range Weather Forecasts (ECMWF) presents: Machine Learning for Weather Forecasts.
"I will present recent studies that use deep learning to learn the equations of motion of the atmosphere, to emulate model components of weather forecast models and to enhance usability of weather forecasts. I will than talk about the main challenges for the application of deep learning in cutting-edge weather forecasts and suggest approaches to improve usability in the future."
Peter is contributing to the development and optimization of weather and climate models for modern supercomputers. He is focusing on a better understanding of model error and model uncertainty, on the use of reduced numerical precision that is optimised for a given level of model error, on global cloud-resolving simulations with ECMWF's forecast model, and the use of machine learning, and in particular deep learning, to improve the workflow and predictions. Peter graduated in Physics and wrote his PhD thesis at the Max Planck Institute for Meteorology in Germany. He worked as a Postdoc with Tim Palmer at the University of Oxford and took up a position as a University Research Fellow of the Royal Society at the European Centre for Medium-Range Weather Forecasts (ECMWF) in 2017.
Watch the video: https://youtu.be/ks3fkRj8Iqc
Learn more: https://www.ecmwf.int/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Gilad Shainer from the HPC AI Advisory Council describes how this organization fosters innovation in the high performance computing community.
"The HPC-AI Advisory Council’s mission is to bridge the gap between high-performance computing (HPC) and Artificial Intelligence (AI) use and its potential, bring the beneficial capabilities of HPC and AI to new users for better research, education, innovation and product manufacturing, bring users the expertise needed to operate HPC and AI systems, provide application designers with the tools needed to enable parallel computing, and to strengthen the qualification and integration of HPC and AI system products."
Watch the video: https://wp.me/p3RLHQ-lNz
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Today RIKEN in Japan announced that the Fugaku supercomputer will be made available for research projects aimed at combating COVID-19.
"Fugaku is currently being installed and is scheduled to be available to the public in 2021. However, faced with the devastating disaster unfolding before our eyes, RIKEN and MEXT decided to make a portion of the computational resources of Fugaku available for COVID-19-related projects ahead of schedule while continuing the installation process.
Fugaku is being developed not only for the progress in science, but also to help build the society dubbed as the “Society 5.0” by the Japanese government, where all people will live safe and comfortable lives. The current initiative to fight against the novel coronavirus is driven by the philosophy behind the development of Fugaku."
Initial Projects
Exploring new drug candidates for COVID-19 by "Fugaku"
Yasushi Okuno, RIKEN / Kyoto University
Prediction of conformational dynamics of proteins on the surface of SARS-CoV-2 using Fugaku
Yuji Sugita, RIKEN
Simulation analysis of pandemic phenomena
Nobuyasu Ito, RIKEN
Fragment molecular orbital calculations for COVID-19 proteins
Yuji Mochizuki, Rikkyo University
In this deck from GTC Digital, William Beaudin from DDN presents: HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD.
Enabling high performance computing through the use of GPUs requires an incredible amount of IO to sustain application performance. We'll cover architectures that enable extremely scalable applications through the use of NVIDIA’s SuperPOD and DDN’s A3I systems.
The NVIDIA DGX SuperPOD is a first-of-its-kind artificial intelligence (AI) supercomputing infrastructure. DDN A³I with the EXA5 parallel file system is a turnkey, AI data storage infrastructure for rapid deployment, featuring faster performance, effortless scale, and simplified operations through deeper integration. The combined solution delivers groundbreaking performance, deploys in weeks as a fully integrated system, and is designed to solve the world's most challenging AI problems.
Watch the video: https://wp.me/p3RLHQ-lIV
Learn more: https://www.ddn.com/download/nvidia-superpod-ddn-a3i-ai400-appliance-with-the-exa5-filesystem/
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to AArch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Versal Premium ACAP for Network and Cloud Acceleration - inside-BigData.com
Today Xilinx announced Versal Premium, the third series in the Versal ACAP portfolio. The Versal Premium series features highly integrated, networked and power-optimized cores and the industry’s highest bandwidth and compute density on an adaptable platform. Versal Premium is designed for the highest bandwidth networks operating in thermally and spatially constrained environments, as well as for cloud providers who need scalable, adaptable application acceleration.
Versal is the industry’s first adaptive compute acceleration platform (ACAP), a revolutionary new category of heterogeneous compute devices with capabilities that far exceed those of conventional silicon architectures. Developed on TSMC’s 7-nanometer process technology, Versal Premium combines software programmability with dynamically configurable hardware acceleration and pre-engineered connectivity and security features to enable a faster time-to-market. The Versal Premium series delivers up to 3X higher throughput compared to current generation FPGAs, with built-in Ethernet, Interlaken, and cryptographic engines that enable fast and secure networks. The series doubles the compute density of currently deployed mainstream FPGAs and provides the adaptability to keep pace with increasingly diverse and evolving cloud and networking workloads.
Learn more: https://insidehpc.com/2020/03/xilinx-announces-versal-premium-acap-for-network-and-cloud-acceleration/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this video from the Rice Oil & Gas Conference, Chin Fang from Zettar presents: Moving Massive Amounts of Data across Any Distance Efficiently.
The objective of this talk is to present two on-going projects aiming at improving and ensuring highly efficient bulk transferring or streaming of massive amounts of data over digital connections across any distance. It examines the current state of the art, a few very common misconceptions, the differences among the three major types of data movement solutions, a current initiative attempting to improve the data movement efficiency from the ground up, and another multi-stage project that shows how to conduct long distance large scale data movement at speed and scale internationally. Both projects have real world motivations, e.g. the ambitious data transfer requirements of Linac Coherent Light Source II (LCLS-II) [1], a premier preparation project of the U.S. DOE Exascale Computing Initiative (ECI) [2]. Their immediate goals are described and explained, together with the solution used for each. Findings and early results are reported. Possible future work is outlined.
Watch the video: https://wp.me/p3RLHQ-lBX
Learn more: https://www.zettar.com/
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Rice Oil & Gas Conference, Bradley McCredie from AMD presents: Scaling TCO in a Post Moore's Law Era.
"While foundries bravely drive forward to overcome the technical and economic challenges posed by scaling to 5nm and beyond, Moore’s law alone can provide only a fraction of the performance / watt and performance / dollar gains needed to satisfy the demands of today’s high performance computing and artificial intelligence applications. To close the gap, multiple strategies are required. First, new levels of innovation and design efficiency will supplement technology gains to continue to deliver meaningful improvements in SoC performance. Second, heterogenous compute architectures will create x-factor increases of performance efficiency for the most critical applications. Finally, open software frameworks, APIs, and toolsets will enable broad ecosystems of application level innovation."
Watch the video:
Learn more: http://amd.com
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from FOSDEM 2020, Colin Sauze from Aberystwyth University describes the development of a RaspberryPi cluster for teaching an introduction to HPC.
"The motivation for this was to overcome four key problems faced by new HPC users:
* The availability of a real HPC system, and the effect that running training courses can have on it; conversely, the limited availability of spare resources on the real system can cause problems for the training course.
* A fear of using a large and expensive HPC system for the first time and worries that doing something wrong might damage the system.
* That HPC systems are very abstract, sitting in data centres that users never see, making it difficult for them to understand exactly what it is they are using.
* That new users fail to understand resource limitations, in part because the vast resources of modern HPC systems allow many mistakes to be made before resources run out. A more resource-constrained system makes this easier to understand.
The talk will also discuss some of the technical challenges in deploying an HPC environment to a Raspberry Pi, along with attempts to keep that environment as close to a "real" HPC system as possible. The issue of trying to automate the installation process will also be covered."
Learn more: https://github.com/colinsauze/pi_cluster
and
https://fosdem.org/2020/schedule/events/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from ATPESC 2019, Ken Raffenetti from Argonne presents an overview of HPC interconnects.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two-week training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-luc
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from FOSDEM 2020, Frank McQuillan from Pivotal presents: Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases.
"In this session we will present an efficient way to train many deep learning model configurations at the same time with Greenplum, a free and open source massively parallel database based on PostgreSQL. The implementation involves distributing data to the workers that have GPUs available and hopping model state between those workers, without sacrificing reproducibility or accuracy. Then we apply optimization algorithms to generate and prune the set of model configurations to try.
Deep neural networks are revolutionizing many machine learning applications, but hundreds of trials may be needed to generate a good model architecture and associated hyperparameters. This is the challenge of model selection. It is time consuming and expensive, especially if you are only training one model at a time.
Massively parallel processing databases can have hundreds of workers, so can you use this parallel compute architecture to address the challenge of model selection for deep nets, in order to make it faster and cheaper?
It’s possible!
We will demonstrate results from this project using a version of Hyperband, which is a well known hyperparameter optimization algorithm, and the deep learning frameworks Keras and TensorFlow, all running on Greenplum database using Apache MADlib. Other topics will include architecture, scalability results and bright opportunities for the future."
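Hyperband's inner loop, successive halving, is what makes training many configurations at once pay off: every surviving configuration gets more budget (e.g. training epochs) each round, and the worse half is pruned. The sketch below is illustrative only (it is not the Apache MADlib implementation, and `evaluate` is a hypothetical scoring callback):

```python
# Illustrative successive-halving loop, the core of Hyperband.
# configs: list of hyperparameter dicts.
# evaluate(config, budget) -> score, higher is better (hypothetical callback).
def successive_halving(configs, evaluate, budget=1, eta=2):
    """Repeatedly evaluate all survivors, keep the top 1/eta,
    and grow the per-survivor budget by eta each round."""
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget),
                        reverse=True)
        configs = scored[:max(1, len(scored) // eta)]  # prune worse 1 - 1/eta
        budget *= eta                                  # more epochs next round
    return configs[0]
```

On a massively parallel database, each round's evaluations map naturally onto the workers with GPUs, which is how the approach described above turns hundreds of trials into a much cheaper search.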
Watch the video: https://wp.me/p3RLHQ-lsQ
Learn more: https://fosdem.org/2020/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Key Trends Shaping the Future of Infrastructure - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
A Dataflow Processing Chip for Training Deep Neural Networks
1. A Dataflow Processing Chip for Training Deep Neural Networks
Dr. Chris Nicol
Chief Technology Officer
Wave Computing. Copyright 2017.
2. Founded in 2010
• Tallwood Venture Capital
• Southern Cross Venture Partners
Headquartered in Campbell, CA
• World-class team of 53 dataflow, data science, and systems experts
• 60+ patents
Invented the Dataflow Processing Unit (DPU) architecture to accelerate deep learning training by up to 1000x
• Coarse Grain Reconfigurable Array (CGRA) Architecture
• Static scheduling of data flow graphs onto massive array of processors
Now accepting qualified customers for Early Access Program
Wave Computing Profile
3. Extended training time due to increasing size of datasets
• Weeks to tune and train typical deep learning models
Hardware for accelerating ML was created for other applications
• GPUs for graphics, FPGAs for RTL emulation
Data coming in "from the edge" is growing faster than the datacenter can accommodate/use it…
Model development cycle: Design (neural network architecture, cost functions) → Tune (parameter initialization, learning rate, mini-batch size) → Train (accuracy, convergence rate) → Deploy for testing → Deploy for production
➢ Problem: Model development times can take days or weeks
Challenges of Machine Learning
4. Source: Google; http://download.tensorflow.org/paper/whitepaper2015.pdf
• Co-processors must wait on the CPU for instructions
• This limits performance and reduces efficiency and scalability
• Restricts embedded use cases to inferencing-only
GPU waiting on CPU
Figure 13: EEG visualization of Inception training showing CPU and GPU activity.
Problems with Existing Solutions
5. Deep Learning Networks are Dataflow Graphs
[Figure: a dataflow graph of Times, Plus, Sigmoid, Softmax, and Mem/I-O nodes, programmed in deep learning software and run on the Wave Dataflow Processor via the WaveFlow Agent Library]
Wave Dataflow Processor is Ideal for Deep Learning
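The "networks are dataflow graphs" idea can be made concrete with a toy evaluator, a hypothetical sketch only (the graph encoding, `OPS` table, and `run_dataflow` are invented here, not Wave's software): a node fires as soon as all of its input tokens have arrived.

```python
# Toy dataflow-graph evaluator: a node fires when all of its inputs are
# ready, mirroring how a graph of Times/Plus/Sigmoid agents executes on a
# dataflow machine. Names and encoding are illustrative only.
import math

OPS = {
    "times": lambda a, b: a * b,
    "plus": lambda a, b: a + b,
    "sigmoid": lambda a: 1.0 / (1.0 + math.exp(-a)),
}

def run_dataflow(graph, inputs):
    """graph: {node: (op_name, [source node names])}; inputs: initial values."""
    values = dict(inputs)
    pending = dict(graph)
    while pending:
        for node, (op, srcs) in list(pending.items()):
            if all(s in values for s in srcs):          # all tokens arrived
                values[node] = OPS[op](*(values[s] for s in srcs))
                del pending[node]                       # node has fired
    return values

# y = sigmoid(x*w + b), expressed as a dataflow graph
graph = {
    "xw": ("times", ["x", "w"]),
    "acc": ("plus", ["xw", "b"]),
    "y": ("sigmoid", ["acc"]),
}
result = run_dataflow(graph, {"x": 2.0, "w": 0.5, "b": -1.0})
print(round(result["y"], 3))  # sigmoid(0) → 0.5
```

Note there is no program counter driving the graph: execution order falls out of data availability, which is exactly the property the DPU exploits.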
6. [Block diagram: DPU chip with 4 HMC and 2 DDR4 interfaces, PCIe Gen3 x16, MCU, AXI4 buses, and a secure DPU program buffer and loader]
• 16ff CMOS process node
• 16K processors; 8192 DPU arithmetic units
• Self-timed, MPP synchronization
• 181 peak tera-ops; 7.25 TB/s bisection bandwidth
• 16 MB distributed data memory; 8 MB distributed instruction memory
• 1.71 TB/s I/O bandwidth; 4096 programmable FIFOs
• 270 GB/s peak memory bandwidth; 2048 outstanding memory requests
• 4 billion 16-byte random-access transfers/sec
• 4 Hybrid Memory Cube interfaces; 2 DDR4 interfaces
• PCIe Gen3 16-lane host interface
• 32-b Andes N9 MCU; 1 MB program store for paging
• Hardware engine for fast loading of AES-encrypted programs
• Up to 32 programmable dynamic reconfiguration zones
• Variable fabric dimensions (user programmable at boot)
Wave Dataflow Processing Unit
Chip Characteristics & Design Features
• Clock-less CGRA is robust to process, voltage & temperature (PVT) variation
• Distributed memory architecture for parallel processing
• Optimized for data flow graph execution
• DMA-driven architecture – overlapping I/O and computation
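The "overlapping I/O and computation" point is the classic double-buffering pattern. A hypothetical sketch (`process_stream`, `dma_fill`, and the tile format are invented here, and a Python thread stands in for a DMA engine):

```python
# Double buffering: while the compute step consumes buffer A, a DMA-like
# background task fills buffer B; the two swap every iteration, so I/O
# time hides behind compute time. Illustrative only, not Wave's runtime.
import threading

def process_stream(tiles, compute):
    results = []
    buffers = [None, None]
    def dma_fill(slot, tile):            # stands in for a DMA transfer
        buffers[slot] = list(tile)
    # prefetch the first tile before entering the loop
    t = threading.Thread(target=dma_fill, args=(0, tiles[0]))
    t.start(); t.join()
    for i in range(len(tiles)):
        nxt = None
        if i + 1 < len(tiles):           # kick off DMA of the next tile...
            nxt = threading.Thread(target=dma_fill,
                                   args=((i + 1) % 2, tiles[i + 1]))
            nxt.start()
        results.append(compute(buffers[i % 2]))  # ...while computing on this one
        if nxt:
            nxt.join()                   # the overlapped transfer completes
    return results

out = process_stream([[1, 2], [3, 4], [5, 6]], sum)
print(out)  # → [3, 7, 11]
```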
7. Key DPU Board Features
• 65,536 CGRA Processing Elements
• 4 Wave DPU chips per board
• Modular, flexible design
• Multiple DPU boards per Wave Compute Appliance
• Off-the-shelf components
• 32GB of ultra high-speed DRAM
• 512GB of DDR4 DRAM
• FPGA for high-speed board-to-board communication
Wave Current Generation DPU Board
8. • Best-in-class, highly scalable deep learning training and inference
• More than an order of magnitude better compute-power efficiency
• Plug-and-play node in a datacenter network (Big Data: Hadoop, Yarn, Spark, Kafka)
• Native support of Google TensorFlow (initially)
Wave’s Solution: Dataflow Computer for Deep Learning
9. Pipelined 1KB Single-Port Data RAM w/ BIST & ECC
Pipelined 256-entry Instruction RAM w/ ECC
Quad of PEs (PE a, PE b, PE c, PE d) is fully connected
Dataflow Processing Element (PE)
10. • 16 Processor CLUSTER: a full custom tiled GDSII block
• Fully-Connected PE Quads with fan-out
• 8 DPU Arithmetic Units
– Per-cycle grouping into 8, 16, 24, 32, 64-b Operations
– Pipelined MAC Units with (un)Signed Saturation
– Support for floating point emulation
– Barrel Shifter, Bit Processor
– SIMD and MIMD instruction classes
– Data driven
• 16KB Data RAM
• 16 Instruction RAMs
• Full custom semi-static digital circuits
• Robust PVT insensitive operation
– Scalable to low voltages
– No global signals, no global clocks
Cluster of 16 Dataflow PEs
11. Each cluster has a pipelined, instruction-driven word-level switch
Each cluster has 4 independent pipelined, instruction-driven byte switches
The word switch supports fan-out and fan-in
All switches have registers for Router use to avoid congestion
"Valid" and "invalid" data in the switch enables fan-in
Hybrid CGRA Architecture
12. From Asleep to Active
• Word switch fabric remains active
• If valid data arrives at a switch input AND the switch executes an instruction to send data to one of the Quads, THEN wake up the PEs
• Copy PC from word switch to PE and byte-switch iRAMs
• Send the incoming data to the PEs
From Active to Asleep
• A PE executes a "sleep" instruction
• All PE & byte-switch execution is suspended
• PE can opt for fast wakeup or slow wakeup (deep sleep with lower power)
Data-Driven Power Management
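The wake/sleep protocol can be summarized in a few lines of code. This is a hypothetical behavioral sketch (the `PE` class and its method names are invented for illustration, not Wave's RTL):

```python
# Data-driven power management, behaviorally: a PE sleeps until valid
# data arrives at the word switch; the switch wakes it, hands it a PC,
# and delivers the data; a "sleep" instruction suspends it again.
class PE:
    def __init__(self):
        self.awake = False
        self.pc = None
        self.log = []

    def on_valid_data(self, pc, data):
        if not self.awake:
            self.awake = True        # wake on arrival of valid data
            self.pc = pc             # PC copied in from the word switch
        self.log.append(data)        # incoming data forwarded to the PE

    def sleep(self, deep=False):
        self.awake = False           # "sleep" instruction suspends execution
        self.wakeup = "slow" if deep else "fast"   # deep sleep: lower power

pe = PE()
pe.on_valid_data(pc=0, data=42)      # switch sees valid data -> PE wakes
pe.sleep(deep=True)                  # PE opts for deep (slow-wakeup) sleep
```

The key point: no clock or host polling is involved; the arrival of valid data is itself the wake-up event.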
16. • Clock skew and jitter limit cycle time with traditional clock distribution
• Self-timed "done" signal from PEs if they are awake; programmable tuning of margin
• Synchronized with neighboring clusters to minimize skew
• 1-sigma local mismatch ~1.3ps, and global + local mismatch ~6ps, at a 140ps cycle time
Clock distribution and generation network spans the entire fabric
6-10 GHz Auto-Calibrated Clock Distribution
17. [Figure: four snapshots of the cluster array during reconfiguration, showing each cluster's counter value counting up to 0 while neighboring clusters keep running]
• An up-counter in each cluster is initialized to -(1 + Manhattan distance from the end cluster)
• Entering config mode: pre-program 4 clusters to ENTER config mode, then propagate a control signal from the start cluster to the end cluster, advancing 1 cluster per cycle; the propagate signal starts the up-counter in each cluster
• When a counter reaches 0, a cluster can either: reset its processors; suspend processors for configuration (at PC=0); or enable processors to execute (from PC=0)
• While in config mode: DMA the new kernel's instructions into the cluster I-mems (the old kernel is stopped)
• Exiting config mode: pre-program 4 clusters to EXIT config mode; a second propagate signal and counter count-up bring all clusters out together, running the new kernel in sync
• SW controls this process to manage surge current
Reset, Configuration Modes
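The counter initialization guarantees lock-step entry into (and exit from) config mode. A small sketch demonstrates why (illustrative only; `config_schedule` and the grid/corner assumptions are invented here): the signal reaches a cluster after `dist_from_start` cycles, and its counter then needs `1 + dist_from_end` cycles to reach 0, so every cluster fires on the same cycle whenever it lies on a shortest start-to-end path.

```python
# Each cluster's up-counter starts at -(1 + Manhattan distance from the
# end cluster); the propagate signal arrives dist_from_start cycles after
# launch and starts the counter. Every counter then hits 0 on cycle
# dist_from_start + 1 + dist_from_end, which is constant across the array
# when start and end sit at opposite corners.
def config_schedule(width, height, start=(0, 0)):
    end = (width - 1, height - 1)
    fire_cycles = set()
    for x in range(width):
        for y in range(height):
            dist_from_start = abs(x - start[0]) + abs(y - start[1])
            dist_from_end = abs(x - end[0]) + abs(y - end[1])
            counter = -(1 + dist_from_end)    # initial counter value
            arrival = dist_from_start         # cycle the signal arrives
            fire_cycles.add(arrival - counter) # cycle the counter hits 0
    return fire_cycles

# all 16 clusters of a 4x4 array reach 0 on the same cycle
print(config_schedule(4, 4))  # → {7}
```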
18. [Figure: kernel1 and kernel2 mounted on the cluster array, with I/Os at the bottom; mount(kernel3) places the new kernel in empty clusters, and since its I/Os cannot reach the bottom directly, a just-in-time route is made through kernel2]
• Runtime resource manager in a lightweight host
• Mount(): online placement algorithm with maxrects management of empty clusters
• Uses a "porosity map" for each kernel showing route-through opportunities (the SDK provides this)
• Just-in-time place & route (using A*) of I/Os through other kernels, without functional side-effects
• Unmount(): removes paths through other kernels
• Machines are combined for mounting large kernels; partitioned during unmount()
• Periodic garbage collection used for cleanup
• Average mount time < 1 ms
Runtime resource manager performing mount()
Dynamic Reconfiguration
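The mount() bookkeeping can be sketched as online rectangle placement over the cluster grid. This is a deliberately simplified stand-in (a guillotine-style split rather than the full maxrects scheme the slide names, and `mount`/`free_rects` are invented identifiers):

```python
# Online placement of a w x h kernel onto free regions of the cluster
# array. Free space is tracked as a list of free rectangles; a placed
# kernel consumes one rectangle and the remainder is split into up to
# two new free rectangles. Simplified guillotine split, not full maxrects.
def mount(free_rects, w, h):
    """Return the (x, y) origin of the placed kernel, or None if no fit."""
    for i, (x, y, fw, fh) in enumerate(free_rects):
        if w <= fw and h <= fh:
            del free_rects[i]
            if fw - w > 0:                       # remainder to the right
                free_rects.append((x + w, y, fw - w, h))
            if fh - h > 0:                       # remainder below
                free_rects.append((x, y + h, fw, fh - h))
            return (x, y)
    return None                                  # no room: wait or garbage-collect

free = [(0, 0, 8, 8)]        # an 8x8 array of empty clusters
a = mount(free, 4, 4)        # first kernel lands at the origin
b = mount(free, 4, 4)        # second kernel fills a split remainder
```

Real maxrects keeps overlapping maximal free rectangles to reduce fragmentation, which matters when kernels mount and unmount continuously, hence the slide's periodic garbage collection.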
19. WaveFlow SDK (offline)
• WFG Compiler
• WFG Linker
• WFG Simulator
• Produces Encrypted Agent Code
WaveFlow Agent Library
• BLAS 1, 2, 3
• CONV2D
• SoftMax, etc.
WaveFlow Session Manager (online)
• DF agent partitioning
• DFG throughput optimization
• Runs on the Session Host
WaveFlow Execution Engine (online)
• Resource Manager
• Monitors
• Drivers
• Runs on a Wave Deep Learning Computer
WaveFlow Software Stack
20. WaveFlow agents are pre-compiled off-line using the WaveFlow SDK
• Wave provides a complete agent library for TensorFlow (MATMUL, Batchnorm, Relu, etc.)
• Customers can create additional agents for differentiation (e.g., your new DNN training technique)
Wave-supplied and customer-supplied agent source code is compiled by the Wave SDK (WFG Compiler, WFG Linker, WFG Simulator, WFG Debugger) into Encrypted Agent Code for the WaveFlow Agent Library
WaveFlow Agent Library
21. [Toolchain diagram: an ML function (gemm, sigmoid, …) enters the LLVM Frontend; the WFG Compiler (using a SAT solver) compiles it, and the Assembler and WFG Linker produce WaveFlow Agents; an Architectural Simulator and the WFG Simulator support verification]
To appear in ICCAD 2017
WFG = Wave Flow Graph
WaveFlow SDK
22. Kernels are islands of machine code scheduled onto machine cycles
Example: Sum of Products on 16 PEs in a single cluster
[Figure: left, the Sum of Products kernel and its WFG; right, the compiled schedule, with one column per PE (0 to 15) and one row per machine cycle, each entry being the instruction (mov, mac, memr, memw, movcr, movi, incc8, add8, cmuxi, st, …) that PE issues in that cycle]
Wave SDK: Compiler Produces Kernels
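The static scheduling idea, assigning each dataflow operation to a (cycle, PE) slot at compile time, can be shown with a toy greedy list scheduler (illustrative only; `schedule` and the dependence encoding are invented here, not the SAT-based approach the SDK slide describes):

```python
# Toy static scheduler: an op may issue on the cycle after all of its
# producers have issued, and at most num_pes ops issue per cycle. The
# result is a fixed (cycle, slot) assignment, like the kernel table above.
def schedule(ops, deps, num_pes=16):
    """ops: list of op names; deps: {op: [ops it depends on]}."""
    done_cycle = {}
    cycle = 0
    remaining = list(ops)
    while remaining:
        issued = []
        for op in remaining:
            # ready if every producer issued on an earlier cycle
            ready = all(done_cycle.get(d, cycle) < cycle
                        for d in deps.get(op, []))
            if ready and len(issued) < num_pes:
                issued.append(op)
        for op in issued:
            done_cycle[op] = cycle
            remaining.remove(op)
        cycle += 1
    return done_cycle

# sum of products: y = a*b + c*d
deps = {"mul1": [], "mul2": [], "add": ["mul1", "mul2"]}
sched = schedule(["mul1", "mul2", "add"], deps)
print(sched)  # → {'mul1': 0, 'mul2': 0, 'add': 1}
```

Because the schedule is fixed at compile time, no runtime arbitration or instruction fetch from a host is needed, which is the efficiency argument made earlier against CPU-driven co-processors.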
23. Session Manager Partitions & Maps to DPUs & Memory
Inference graph generated directly from the Keras model by the Wave Compiler
Wave Flow Graph format
Mapping Inception V4 to DPUs on a single-node 64-DPU computer
24. Benchmarks on a single-node 64-DPU Data Flow Computer
• ImageNet training, 90 epochs, 1.28M images, 224x224x3
• Seq2Seq training using parameters from https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf by I. Sutskever, O. Vinyals & Q. Le

Network     | Inferencing (Images/sec) | Training time
AlexNet     | 962,000                  | 40 mins
GoogleNet   | 420,000                  | 1 hour 45 mins
Squeezenet  | 75,000                   | 3 hours
Seq2Seq     | -                        | 7 hours 15 mins

Deep Neural Network Performance
25. Wave is now accepting qualified customers to its Early Access Program (EAP)
Provides select companies access to a Wave machine learning computer for testing and benchmarking months before official system sales begin
For details about participation in the limited number of EAP positions, contact info@wavecomp.com
Wave’s Early Access Program