In this deck from ATPESC 2019, James Moawad and Greg Nash from Intel present: FPGAs and Machine Learning.
"Neural networks are inspired by biological systems, in particular the human brain. Through the combination of powerful computing resources and novel architectures for neurons, neural networks have achieved state-of-the-art results in many domains, such as computer vision and machine translation. FPGAs are a natural choice for implementing neural networks, as they can combine computing, logic, and memory resources in a single device. They can also deliver faster performance than competing implementations, because the user can hardcode operations into the hardware. Software developers can use the OpenCL C-level device programming standard to target FPGAs as accelerators to standard CPUs without having to deal with hardware-level design."
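The core computation being mapped onto FPGA resources here can be sketched in a few lines of NumPy. This is an illustrative example only (not from the deck; all names and values are invented): a fully connected layer is a weighted sum plus a nonlinearity, and it is exactly these multiply-accumulate operations that an FPGA implementation hardcodes into DSP blocks and on-chip memory.

```python
import numpy as np

def dense_layer(x, weights, bias):
    """One fully connected layer: weighted sum plus ReLU activation.

    On an FPGA, the multiply-accumulates below are what get mapped
    onto dedicated DSP blocks and on-chip memory resources.
    """
    return np.maximum(weights @ x + bias, 0.0)  # ReLU nonlinearity

x = np.array([1.0, -2.0, 0.5])           # input activations
W = np.array([[0.2, 0.4, -0.1],
              [-0.3, 0.1, 0.5]])         # 2 neurons x 3 inputs
b = np.array([0.7, 0.5])

print(dense_layer(x, W, b))
```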
Watch the video: https://wp.me/p3RLHQ-lnc
Learn more: https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
and
https://www.intel.com/content/www/us/en/products/programmable/fpga.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
A short survey of the current state of field-programmable gate array (FPGA) usage in deep learning, covering accelerators from companies such as Intel (Nervana) and Google (the TPU, or tensor processing unit), compared with GPUs in terms of energy consumption and performance.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/altera/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Bill Jenkins, Senior Product Specialist for High Level Design Tools at Intel, presents the "Accelerating Deep Learning Using Altera FPGAs" tutorial at the May 2016 Embedded Vision Summit.
While large strides have recently been made in the development of high-performance systems for neural networks based on multi-core technology, significant challenges in power, cost, and performance scaling remain. Field-programmable gate arrays (FPGAs) are a natural choice for implementing neural networks because they can combine computing, logic, and memory resources in a single device. Intel's Programmable Solutions Group has developed a scalable convolutional neural network reference design for deep learning systems using the OpenCL programming language, built with our SDK for OpenCL. The design's performance is being benchmarked using several popular CNN benchmarks: CIFAR-10, ImageNet, and KITTI.
Building the CNN with OpenCL kernels allows true scaling of the design from smaller to larger devices and from one device generation to the next. New designs can be sized using different numbers of kernels at each layer. Performance scaling from one generation to the next also benefits from architectural advancements, such as floating-point engines and frequency scaling. Thus, you achieve greater than linear performance and performance per watt scaling with each new series of devices.
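As a rough picture of what each such kernel computes, here is a plain-NumPy sketch of a direct 2D convolution, the operation at the heart of every CNN layer. This is illustrative only: the actual reference design is written in OpenCL, and the function names and values here are invented.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Direct 2D convolution (valid padding): the inner loop that a
    CNN layer's kernels parallelize across the FPGA fabric."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply-accumulate over one kernel-sized window.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0],
                 [1.0, -1.0]])   # a simple vertical-edge filter
print(conv2d_valid(img, edge))
```

Scaling the design then amounts to instantiating more (or wider) copies of this loop per layer, which is what sizing "different numbers of kernels at each layer" refers to.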
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/xilinx/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nick Ni, Director of Product Marketing at Xilinx, presents the "Xilinx AI Engine: High Performance with Future-proof Architecture Adaptability" tutorial at the May 2019 Embedded Vision Summit.
AI inference demands orders-of-magnitude more compute capacity than what today’s SoCs offer. At the same time, neural network topologies are changing too quickly to be addressed by ASICs that take years to go from architecture to production. In this talk, Ni introduces the Xilinx AI Engine, which complements the dynamically programmable FPGA fabric to enable ASIC-like performance via custom data flows and a flexible memory hierarchy. This combination provides an orders-of-magnitude boost in AI performance along with the hardware architecture flexibility needed to quickly adapt to rapidly evolving neural network topologies.
A short introduction to FPGA acceleration and the impact of the new high-level synthesis (HLS) toolchains on FPGA programmability.
Video here: https://www.linkedin.com/posts/marcobarbone_can-my-application-benefit-from-fpga-acceleration-activity-6848674747375460352-0fua
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/hailo/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Orr Danon, CEO of Hailo, presents the "Emerging Processor Architectures for Deep Learning: Options and Trade-offs" tutorial at the May 2019 Embedded Vision Summit.
In the past year, numerous new processor architectures for machine learning have emerged. Many of these focus on edge applications, reflecting the growing demand for deploying machine learning outside of data centers. This intensive focus on processor architecture innovation comes at a perfect time in light of the slowing progress in silicon fabrication technology and the massive opportunities for deployment of AI applications using vision and other sensors.
In this presentation, Danon explores the architectural concepts underlying these diverse processors and analyzes their suitability for various applications. He derives the performance bounds of each architecture approach and provides insights on the practical deployment of machine learning using these specialized architectures. In addition, using a case study, he explores the opportunities enabled through designing neural networks to exploit specialized processor architectures.
Introduction to Artificial Intelligence: A Modern Approach by Russell and Norvig, presented by Garry D. Lasaga.
In computer science, artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. - Wikipedia
The field of artificial intelligence (AI) has witnessed tremendous growth in recent years with the advent of Deep Neural Networks (DNNs) that surpass humans in a variety of cognitive tasks.
Mr. Koushal Kumar holds an M.Tech degree in Computer Science and Engineering from Lovely Professional University, Jalandhar, India. He obtained his B.Sc. and M.Sc. in computer science from D.A.V College, Amritsar, Punjab. His research interests lie in artificial neural networks, soft computing, computer networks, grid computing, and database management systems.
Topics covered: 5G cellular technology; the Internet of Things; 5G and IoT; the evolution of 5G; 5G as a paradigm shift and a rethinking of mobile business; 5G cellular network architecture; 5G working with 4G; the technology behind 5G; millimeter waves; 5G core network architecture; network slice definition; 5G service-based architecture (SBA); how 5G will enrich the telecommunication ecosystem; the evolution of IoT; the four-layer model for IoT; typical IoT architecture; 5G + IoT: ushering in a new era; the impact of 5G on IoT; key technologies enabling 5G-IoT; wireless network function virtualization (WNFV); the architecture of 5G-IoT; device-to-device (D2D) communication; 5G and IoT applications; research challenges for 5G; and the future of IoT.
HPC DAY 2017 - http://www.hpcday.eu/
Accelerating tomorrow's HPC and AI workflows with Intel Architecture
Atanas Atanasov | HPC solution architect, EMEA region at Intel
Trends in the convergence of Big Data Analytics, Machine Learning, and Supercomputing (Igor José F. Freitas)
The goal of this talk is to show developers how the world of high-performance (parallel) computing is becoming increasingly accessible and democratized through big data and artificial intelligence software. Supercomputers that until recently were used only in niche industries, government sectors, and science are now contributing to the solution of major challenges facing society, industry, and science. The talk takes a technical approach, covering software and hardware concepts, with the aim of encouraging developers to use large servers to build innovative applications.
How to Create an Autonomous, Connected World - Jomar Silva (iMasters)
Jomar Silva - Technical Evangelist, Intel
The evolution of hardware, software, and communication technologies in recent years makes it possible to envision a new digital world, autonomous and connected.
The Internet of Things, when combined with artificial intelligence, enables a new level of autonomous, connected applications that will be the foundation for building this new digital world.
The great challenge in this new scenario is the huge volume of data that must be captured and processed in real time to enable solutions such as autonomous cars and automated security systems based on video monitoring.
The talk addresses these technical challenges: the Internet of Things, artificial intelligence, computer vision, base architectures for developing end-to-end autonomous solutions, and the Intel hardware and software technologies and products that can help you tackle these challenges efficiently.
It also covers several open source software projects, as well as repositories of open source solutions that can be used to accelerate a developer's learning in this new autonomous, connected digital world.
Presented at InterCon 2018 - https://eventos.imasters.com.br/intercon
This session was held by Vladimir Brenner, Partner Account Manager, Disruptors & AI, Intel AI at the Dive into H2O: London training on June 17, 2019.
Please find the recording here: https://youtu.be/60o3eyG5OLM
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/11/acceleration-of-deep-learning-using-openvino-3d-seismic-case-study-a-presentation-from-intel/
For more information about edge AI and computer vision, please visit:
https://www.edge-ai-vision.com
Manas Pathak, Global AI Lead for Oil and Gas at Intel, presents the “Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study” tutorial at the September 2020 Embedded Vision Summit.
The use of deep learning for automatic seismic data interpretation is gaining the attention of many researchers across the oil and gas industry. The integration of high-performance computing (HPC) AI workflows in seismic data interpretation brings the challenge of moving and processing large amounts of data from HPC to AI computing solutions and vice-versa.
In this presentation, Pathak illustrates this challenge via a case study using a public deep learning model for salt identification applied on a 3D seismic survey from the F3 Dutch block in the North Sea. He presents a workflow to address this challenge and perform accelerated AI on seismic data. The Intel Distribution of OpenVINO toolkit was used to increase the inference performance of a pre-trained model on an Intel CPU. OpenVINO allows CPU users to get significant improvement in AI inference performance for high memory capacity deep learning models used on large datasets without any significant loss in accuracy.
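One reason optimized inference runtimes such as OpenVINO can accelerate models with little accuracy loss is that they can execute layers at reduced numeric precision. The sketch below is a rough NumPy illustration of that general idea only; it does not use the OpenVINO API, and all names and values are invented. It compares a dense-layer output computed in float32 against the same computation in float16.

```python
import numpy as np

# Illustrative sketch (not the OpenVINO API): compare a dense-layer
# output in float32 against float16 to show how small the error from
# reduced-precision inference typically is.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)).astype(np.float32)
x = rng.standard_normal(128).astype(np.float32)

y32 = W @ x                                              # full precision
y16 = (W.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)

rel_err = np.abs(y32 - y16).max() / np.abs(y32).max()
print(f"max relative error from float16 inference: {rel_err:.2e}")
```

In practice, toolkits like OpenVINO combine precision optimization with graph-level optimizations and hardware-specific kernels, which is where the reported CPU speedups come from.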
AWS Summit Singapore - Make Business Intelligence Scalable and Adaptable (Amazon Web Services)
Akanksha Bilani, APAC Director, Intel Software
Kapil Bansal, Alliance Head, APJ, Intel
Business, science, and academia are using AI applications — in the data center, the cloud, and at the edge — supported by a broad, growing portfolio of Intel technologies. Come join Kapil Bansal (Alliance Head, APJ) and Akanksha Bilani (APAC Director, Intel Software) to learn how Intel helps make AI initiatives practical and straightforward. Learn about the opportunities AI brings and be part of the era of the convergence of data and the power of compute.
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel ArchitectureJim St. Leger
Venky Venkatesan presents information on the Data Plane Development Kit (DPDK) including an overview, background, methodology, and future direction and developments.
Intel Xeon Processor E5 Family: Making the Business Case (Intel IT Center)
This presentation highlights cloud computing advantages of the Intel® Xeon® processor E5 family and helps you make the business case for investing. Includes access to an ROI calculator.
In this deck from the Stanford HPC Conference, Shahin Khan from OrionX describes major market shifts in IT.
"We will discuss the digital infrastructure of the future enterprise and the state of these trends."
"We work with clients on the impact of Digital Transformation (DX) on them, their customers, and their messages. Generally, they want to track, in one place, trends like IoT, 5G, AI, Blockchain, and Quantum Computing. And they want to know what these trends mean, how they affect each other, when they demand action, and how to formulate and execute an effective plan. If that describes you, we can help."
Watch the video: https://wp.me/p3RLHQ-lPP
Learn more: http://orionx.net
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long-established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather, will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Greg Wahl from Advantech presents: Transforming Private 5G Networks.
Advantech Networks & Communications Group is driving innovation in next-generation network solutions with its High Performance Servers. The group provides business-critical hardware to the world's leading telecom and networking equipment manufacturers, with both standard and customized products. Its High Performance Servers are highly configurable platforms designed to balance the best in x86 server-class processing performance with maximum I/O and offload density. The systems are cost-effective, highly available, and optimized to meet next-generation networking and media-processing needs.
“Advantech’s Networks and Communication Group has been both an innovator and trusted enabling partner in the telecommunications and network security markets for over a decade, designing and manufacturing products for OEMs that accelerate their network platform evolution and time to market,” said Ween Niu, Advantech Vice President of Networks & Communications Group. “In the new IP Infrastructure era, we will be expanding our expertise in Software Defined Networking (SDN) and Network Function Virtualization (NFV), two of the essential conduits to 5G infrastructure agility, making networks easier to install, secure, automate, and manage in a cloud-based infrastructure.”
In addition to innovation in air interface technologies and architecture extensions, 5G will also need a new generation of network computing platforms to run the emerging software defined infrastructure, one that provides greater topology flexibility, essential to deliver on the promises of high availability, high coverage, low latency and high bandwidth connections. This will open up new parallel industry opportunities through dedicated 5G network slices reserved for specific industries dedicated to video traffic, augmented reality, IoT, connected cars etc. 5G unlocks many new doors and one of the keys to its enablement lies in the elasticity and flexibility of the underlying infrastructure.
Advantech’s corporate vision is to enable an intelligent planet. The company is a global leader in the fields of IoT intelligent systems and embedded platforms. To embrace the trends of IoT, big data, and artificial intelligence, Advantech promotes IoT hardware and software solutions with the Edge Intelligence WISE-PaaS core to assist business partners and clients in connecting their industrial chains. Advantech is also working with business partners to co-create business ecosystems that accelerate the goal of industrial intelligence.
Watch the video: https://wp.me/p3RLHQ-lPQ
* Company website: https://www.advantech.com/
* Solution page: https://www2.advantech.com/nc/newsletter/NCG/SKY/benefits.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Katie Lewis from Lawrence Livermore National Laboratory presents: The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory.
"Scientific simulations have driven computing at Lawrence Livermore National Laboratory (LLNL) for decades. During that time, we have seen significant changes in hardware, tools, and algorithms. Today, data science, including machine learning, is one of the fastest growing areas of computing, and LLNL is investing in hardware, applications, and algorithms in this space. While the use of simulations to focus and understand experiments is well accepted in our community, machine learning brings new challenges that need to be addressed. I will explore applications for machine learning in scientific simulations that are showing promising results and further investigation that is needed to better understand its usefulness."
Watch the video: https://youtu.be/NVwmvCWpZ6Y
Learn more: https://computing.llnl.gov/research-area/machine-learning
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, DK Panda from Ohio State University presents: How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems?
"This talk will start with an overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented."
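The data-parallelism idea behind MPI-driven training can be shown in a toy sketch (plain NumPy, not MPI or any of the frameworks named above; the model and data are invented): each worker computes a gradient on its shard of the batch, and averaging the per-worker gradients, which is the role an MPI allreduce plays in practice, reproduces the gradient over the full batch.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((8, 3))        # full batch of 8 samples
y = rng.standard_normal(8)
w = rng.standard_normal(3)             # linear-model weights

def grad(Xb, yb, w):
    """Gradient of mean squared error for a linear model on one shard."""
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Data parallelism: split the batch across 4 "workers", compute local
# gradients, then average them (what MPI_Allreduce does across ranks).
shards = np.array_split(np.arange(8), 4)
local = [grad(X[s], y[s], w) for s in shards]
averaged = np.mean(local, axis=0)

full = grad(X, y, w)                   # single-process reference
print(np.allclose(averaged, full))     # prints True: the two match
```

With equal-sized shards, the average of the shard gradients equals the full-batch gradient exactly, which is why data-parallel training scales out without changing the mathematics of the update.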
Watch the video: https://youtu.be/LeUNoKZVuwQ
Learn more: http://web.cse.ohio-state.edu/~panda.2/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ... (inside-BigData.com)
In this deck from the Stanford HPC Conference, Nick Nystrom and Paola Buitrago provide an update from the Pittsburgh Supercomputing Center.
Nick Nystrom is Chief Scientist at the Pittsburgh Supercomputing Center (PSC). Nick is architect and PI for Bridges, PSC's flagship system that successfully pioneered the convergence of HPC, AI, and Big Data. He is also PI for the NIH Human Biomolecular Atlas Program’s HIVE Infrastructure Component and co-PI for projects that bring emerging AI technologies to research (Open Compass), apply machine learning to biomedical data for breast and lung cancer (Big Data for Better Health), and identify causal relationships in biomedical big data (the Center for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence). His current research interests include hardware and software architecture, applications of machine learning to multimodal data (particularly for the life sciences) and to enhance simulation, and graph analytics.
Watch the video: https://youtu.be/LWEU1L1o7yY
Learn more: https://www.psc.edu/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Ryan Quick from Providentia Worldwide describes how DNNs can be used to improve EDA simulation runs.
"Systems Intelligence relies on a variety of methods for providing insight into the core mechanisms for driving automated behavioral changes in self-healing command and control platforms. This talk reports on initial efforts to leverage semiconductor Electronic Design Automation (EDA) telemetry data from cross-domain sources, including power, network, storage, nodes, and applications, in neural networks as a driving method for insight into SI automation systems."
Watch the video: https://youtu.be/2WbR8tq-XbM
Learn more: http://www.providentiaworldwide.com/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring (inside-BigData.com)
In this deck from the Stanford HPC Conference, Nicole Xu from Stanford University describes how she transformed a common jellyfish into a bionic creature that is part animal and part machine.
"Animal locomotion and bioinspiration have the potential to expand the performance capabilities of robots, but current implementations are limited. Mechanical soft robots leverage engineered materials and are highly controllable, but these biomimetic robots consume more power than corresponding animal counterparts. Biological soft robots built from a bottom-up approach offer advantages such as speed and controllability but are limited to survival in cell media. Instead, biohybrid robots that comprise live animals and self-contained microelectronic systems leverage the animals' own metabolism to reduce power constraints and the animal's body as a natural scaffold with damage tolerance. We demonstrate that by integrating onboard microelectronics into live jellyfish, we can enhance propulsion up to threefold, using only 10 mW of external power input to the microelectronics and at only a twofold increase in cost of transport to the animal. This robotic system uses 10 to 1000 times less external power per mass than existing swimming robots in the literature and can be used in future applications for ocean monitoring to track environmental changes."
Watch the video: https://youtu.be/HrmJFyvInj8
Learn more: https://sanfrancisco.cbslocal.com/2020/02/05/stanford-research-project-common-jellyfish-bionic-sea-creatures/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Peter Dueben from the European Centre for Medium-Range Weather Forecasts (ECMWF) presents: Machine Learning for Weather Forecasts.
"I will present recent studies that use deep learning to learn the equations of motion of the atmosphere, to emulate model components of weather forecast models, and to enhance the usability of weather forecasts. I will then talk about the main challenges for the application of deep learning in cutting-edge weather forecasts and suggest approaches to improve usability in the future."
Peter is contributing to the development and optimization of weather and climate models for modern supercomputers. He is focusing on a better understanding of model error and model uncertainty, on the use of reduced numerical precision that is optimised for a given level of model error, on global cloud-resolving simulations with ECMWF's forecast model, and the use of machine learning, and in particular deep learning, to improve the workflow and predictions. Peter graduated in Physics and wrote his PhD thesis at the Max Planck Institute for Meteorology in Germany. He worked as a Postdoc with Tim Palmer at the University of Oxford and took up a position as University Research Fellow of the Royal Society at the European Centre for Medium-Range Weather Forecasts (ECMWF) in 2017.
Watch the video: https://youtu.be/ks3fkRj8Iqc
Learn more: https://www.ecmwf.int/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Gilad Shainer from the HPC AI Advisory Council describes how this organization fosters innovation in the high performance computing community.
"The HPC-AI Advisory Council’s mission is to bridge the gap between high-performance computing (HPC) and Artificial Intelligence (AI) use and its potential, bring the beneficial capabilities of HPC and AI to new users for better research, education, innovation and product manufacturing, bring users the expertise needed to operate HPC and AI systems, provide application designers with the tools needed to enable parallel computing, and to strengthen the qualification and integration of HPC and AI system products."
Watch the video: https://wp.me/p3RLHQ-lNz
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Today RIKEN in Japan announced that the Fugaku supercomputer will be made available for research projects aimed to combat COVID-19.
"Fugaku is currently being installed and is scheduled to be available to the public in 2021. However, faced with the devastating disaster unfolding before our eyes, RIKEN and MEXT decided to make a portion of the computational resources of Fugaku available for COVID-19-related projects ahead of schedule while continuing the installation process.
Fugaku is being developed not only for the progress in science, but also to help build the society dubbed as the “Society 5.0” by the Japanese government, where all people will live safe and comfortable lives. The current initiative to fight against the novel coronavirus is driven by the philosophy behind the development of Fugaku."
Initial Projects
Exploring new drug candidates for COVID-19 by "Fugaku"
Yasushi Okuno, RIKEN / Kyoto University
Prediction of conformational dynamics of proteins on the surface of SARS-Cov-2 using Fugaku
Yuji Sugita, RIKEN
Simulation analysis of pandemic phenomena
Nobuyasu Ito, RIKEN
Fragment molecular orbital calculations for COVID-19 proteins
Yuji Mochizuki, Rikkyo University
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems, and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, which implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime. Manual instrumentation, which can be combined with the automatic approach, allows code developers to annotate regions of particular interest."
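The scenario-based tuning model described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the MERIC or READEX API: the region names, configuration fields, and the `enter_region` helper are invented for the example.

```python
# Hypothetical sketch of the READEX-style tuning model: at design time,
# runtime situations with the same optimal configuration are grouped
# into scenarios; at runtime, entering an annotated region looks up its
# scenario and switches the system configuration dynamically.

# Design-time result: significant regions map to the system
# configuration found optimal for their runtime situations.
TUNING_MODEL = {
    "dense_solver":  {"cpu_freq_mhz": 2400, "uncore_freq_mhz": 2000},
    "halo_exchange": {"cpu_freq_mhz": 1600, "uncore_freq_mhz": 2400},
    "io_phase":      {"cpu_freq_mhz": 1200, "uncore_freq_mhz": 1400},
}

current_config = {}

def enter_region(name):
    """Called by instrumentation when an annotated region starts:
    look up the region's scenario and switch configuration if needed."""
    global current_config
    config = TUNING_MODEL.get(name)
    if config is not None and config != current_config:
        # A real tool would apply this through DVFS / power-capping knobs.
        current_config = config
    return current_config

# Runtime: the instrumented application enters regions in sequence.
for region in ("dense_solver", "halo_exchange", "io_phase"):
    print(region, "->", enter_region(region))
```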
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from GTC Digital, William Beaudin from DDN presents: HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD.
Enabling high performance computing through the use of GPUs requires an incredible amount of IO to sustain application performance. We'll cover architectures that enable extremely scalable applications through the use of NVIDIA’s SuperPOD and DDN’s A3I systems.
The NVIDIA DGX SuperPOD is a first-of-its-kind artificial intelligence (AI) supercomputing infrastructure. DDN A³I with the EXA5 parallel file system is a turnkey, AI data storage infrastructure for rapid deployment, featuring faster performance, effortless scale, and simplified operations through deeper integration. The combined solution delivers groundbreaking performance, deploys in weeks as a fully integrated system, and is designed to solve the world's most challenging AI problems.
Watch the video: https://wp.me/p3RLHQ-lIV
Learn more: https://www.ddn.com/download/nvidia-superpod-ddn-a3i-ai400-appliance-with-the-exa5-filesystem/
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to Aarch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Versal Premium ACAP for Network and Cloud Acceleration
Today Xilinx announced Versal Premium, the third series in the Versal ACAP portfolio. The Versal Premium series features highly integrated, networked and power-optimized cores and the industry’s highest bandwidth and compute density on an adaptable platform. Versal Premium is designed for the highest bandwidth networks operating in thermally and spatially constrained environments, as well as for cloud providers who need scalable, adaptable application acceleration.
Versal is the industry’s first adaptive compute acceleration platform (ACAP), a revolutionary new category of heterogeneous compute devices with capabilities that far exceed those of conventional silicon architectures. Developed on TSMC’s 7-nanometer process technology, Versal Premium combines software programmability with dynamically configurable hardware acceleration and pre-engineered connectivity and security features to enable a faster time-to-market. The Versal Premium series delivers up to 3X higher throughput compared to current generation FPGAs, with built-in Ethernet, Interlaken, and cryptographic engines that enable fast and secure networks. The series doubles the compute density of currently deployed mainstream FPGAs and provides the adaptability to keep pace with increasingly diverse and evolving cloud and networking workloads.
Learn more: https://insidehpc.com/2020/03/xilinx-announces-versal-premium-acap-for-network-and-cloud-acceleration/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
In this video from the Rice Oil & Gas Conference, Chin Fang from Zettar presents: Moving Massive Amounts of Data across Any Distance Efficiently.
The objective of this talk is to present two on-going projects aiming at improving and ensuring highly efficient bulk transfer or streaming of massive amounts of data over digital connections across any distance. It examines the current state of the art, a few very common misconceptions, the differences among the three major types of data movement solutions, a current initiative attempting to improve data movement efficiency from the ground up, and another multi-stage project that shows how to conduct long-distance, large-scale data movement at speed and scale internationally. Both projects have real-world motivations, e.g. the ambitious data transfer requirements of Linac Coherent Light Source II (LCLS-II) [1], a premier preparation project of the U.S. DOE Exascale Computing Initiative (ECI) [2]. Their immediate goals are described and explained, together with the solution used for each. Findings and early results are reported. Possible future work is outlined.
Watch the video: https://wp.me/p3RLHQ-lBX
Learn more: https://www.zettar.com/
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Rice Oil & Gas Conference, Bradley McCredie from AMD presents: Scaling TCO in a Post Moore's Law Era.
"While foundries bravely drive forward to overcome the technical and economic challenges posed by scaling to 5nm and beyond, Moore’s law alone can provide only a fraction of the performance / watt and performance / dollar gains needed to satisfy the demands of today’s high performance computing and artificial intelligence applications. To close the gap, multiple strategies are required. First, new levels of innovation and design efficiency will supplement technology gains to continue to deliver meaningful improvements in SoC performance. Second, heterogenous compute architectures will create x-factor increases of performance efficiency for the most critical applications. Finally, open software frameworks, APIs, and toolsets will enable broad ecosystems of application level innovation."
Watch the video:
Learn more: http://amd.com
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
CUDA-Python and RAPIDS for blazing fast scientific computing
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
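A minimal sketch of Numba's just-in-time compilation model described above: the CUDA flavor mentioned in the talk decorates kernels with `numba.cuda.jit`, while `@njit` shown here compiles the same kind of plain Python loop to native code. The fallback decorator is only there so the example also runs where Numba is not installed (the numerical result is identical either way).

```python
# Minimal sketch of Numba's JIT model: decorate an ordinary Python
# function and it is compiled to machine code on first call. The CUDA
# variant uses numba.cuda.jit on kernels in the same spirit.
try:
    from numba import njit
except ImportError:
    def njit(func):          # hedge: run un-accelerated plain Python
        return func

@njit
def leibniz_pi(n):
    # Plain Python loop; Numba compiles it to machine code on first call.
    s = 0.0
    for k in range(n):
        s += (-1.0) ** k / (2 * k + 1)
    return 4.0 * s

print(leibniz_pi(200_000))   # ≈ 3.14159
```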
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from FOSDEM 2020, Colin Sauze from Aberystwyth University describes the development of a RaspberryPi cluster for teaching an introduction to HPC.
"The motivation for this was to overcome four key problems faced by new HPC users:
* The availability of a real HPC system and the effect that running training courses can have on it; conversely, limited spare resources on the real system can cause problems for the training course.
* A fear of using a large and expensive HPC system for the first time and worries that doing something wrong might damage the system.
* That HPC systems are very abstract systems sitting in data centres that users never see, so it is difficult for them to understand exactly what it is they are using.
* That new users fail to understand resource limitations; because of the vast resources in modern HPC systems, a lot of mistakes can be made before running out of resources. A more resource-constrained system makes this easier to understand.
The talk will also discuss some of the technical challenges in deploying an HPC environment to a Raspberry Pi and attempts to keep that environment as close to a "real" HPC system as possible. The issues in trying to automate the installation process will also be covered."
Learn more: https://github.com/colinsauze/pi_cluster
and
https://fosdem.org/2020/schedule/events/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from ATPESC 2019, Ken Raffenetti from Argonne presents an overview of HPC interconnects.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two-week training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-luc
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deployment Firewall and DBOM
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated in the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
The Art of the Pitch: WordPress Relationships and Sales
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
SAP Sapphire 2024 - ASUG301 Building Better Apps with SAP Fiori
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Essentials of Automations: The Art of Triggers and Actions in FME
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Epistemic Interaction - tuning interfaces to provide information for AI support
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Smart TV Buyer Insights Survey 2024 by 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Elevating Tactical DDD Patterns Through Object Calisthenics
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Removing Uninteresting Bytes in Software Fuzzing
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
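The idea of pinpointing uninteresting seed bytes can be illustrated with a toy sketch. This is not DIAR's actual algorithm: the `toy_parser` target, the mutation set, and the behaviour oracle are all invented for the example; a real fuzzer would use coverage feedback instead.

```python
# Toy illustration of the idea behind seed-byte pruning: find bytes
# whose mutations never change the program's observable behaviour,
# and drop them to get a leaner seed for the fuzzing campaign.

def toy_parser(data: bytes) -> str:
    # Hypothetical target: only the 2-byte header influences the path.
    if data[:2] == b"OK":
        return "valid"
    return "invalid"

def uninteresting_bytes(seed: bytes, program) -> set:
    baseline = program(seed)
    boring = set()
    for i in range(len(seed)):
        mutated = bytearray(seed)
        changed = False
        for flip in (0x01, 0x80, 0xFF):      # a few cheap mutations
            mutated[i] = seed[i] ^ flip
            if program(bytes(mutated)) != baseline:
                changed = True
                break
        mutated[i] = seed[i]                 # restore before next byte
        if not changed:
            boring.add(i)
    return boring

seed = b"OK" + b"\x00" * 30                  # bloated 32-byte seed
drop = uninteresting_bytes(seed, toy_parser)
lean = bytes(b for i, b in enumerate(seed) if i not in drop)
print(len(seed), "->", len(lean))            # 32 -> 2
```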
Climate Impact of Software Testing at Nordic Testing Days
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
1. Intel Network & Custom Logic Group
Greg Nash, System Architect – Government, Aerospace & Military
Jim Moawad, Technical Solution Specialist – FPGA Acceleration
July 29, 2019 - Argonne Training Program on Extreme Scale Computing
Public
3.
Agenda
• Intel® and AI / Machine Learning
• Accelerate Deep Learning Using OpenVINO Toolkit
• Deep Learning Acceleration with FPGA
– FPGAs and Machine Learning
– Intel® FPGA Deep Learning Acceleration Suite
– Execution on the FPGA (Model Optimizer & Inference Engine)
• Intel® Agilex® FPGA
• OneAPI
17.
1 5.7x inference throughput improvement with Intel® Optimizations for Caffe ResNet-50 on Intel® Xeon® Platinum 8180 Processor in Feb 2019 compared to performance at launch in July 2017. See configuration details on Config 1
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates (8/24/2018). Results have been estimated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect
your actual performance. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific
instruction sets covered by this notice. No product can be absolutely secure. See configuration disclosure for details. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved
for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Software and workloads used in performance tests may have
been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any
of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with
other products. For more complete information visit: http://www.intel.com/performance
[Chart: deep learning performance on CPU, 2S Intel® Xeon® Scalable Processor (Skylake). Inference throughput vs. the baseline at Skylake launch (July 2017), with software and hardware improvements delivering 50x and, by February 2019, 285x the baseline (Cascade Lake with INT8 VNNI).]
INT8 VNNI speedups (Cascade Lake vs. Skylake)²:
• Caffe: ResNet-50 1.9x, Inception v3 1.8x, SSD-VGG16 2.0x
• TensorFlow: ResNet-50 1.9x, Inception v3 1.8x
Hardware + Software Improvements for Intel® Xeon® Processors
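The INT8 (VNNI) speedups above rest on quantization: weights and activations are stored as 8-bit integers plus a floating-point scale, so the inner products run on fast integer multiply-accumulate units. A minimal symmetric-quantization sketch, illustrative only and not Intel's implementation:

```python
# Minimal symmetric INT8 quantization sketch: values are stored as
# int8 plus one float scale; the dot product happens on integers
# (what VNNI accelerates) and the result is rescaled back to float.

def quantize(values, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]
    return q, scale

def int8_dot(a, b):
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate
    return acc * sa * sb                      # rescale to float

w = [0.5, -1.0, 0.25, 0.75]
x = [1.0, 2.0, -0.5, 0.125]
exact = sum(a * b for a, b in zip(w, x))
approx = int8_dot(w, x)
print(exact, round(approx, 3))               # close, small quantization error
```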
24.
Deep Learning Software

Framework: provides a standard way to customize and deploy Deep Neural Networks. Supplies generic functionality which can be selectively extended with user-defined additions to enhance specific tasks.
Topology: a set of algorithms modeled loosely after the human brain that forms a “network” designed to recognize complex patterns. Also referred to as a network, NN, or model.
Library: highly optimized functions intended to be used in Neural Network implementations to maximize performance. Libraries are framework- and model-agnostic, but hardware specific.

Machine learning libraries:
• Python: Scikit-learn, Pandas, NumPy
• R: Cart, Random Forest, e1071
• Distributed: MLlib (on Spark), Mahout

Deep learning libraries:
• DAAL: Intel® Data Analytics Acceleration Library (includes machine learning)
• MKL-DNN / clDNN: open-source deep neural network functions for CPU / integrated graphics
26.
Convolutional Neural Network (CNN)
→ Layers of neurons connected by learned weights
→ Three layer classes: convolution layer, pooling layer, fully connected layer
→ Connections between the layers point in one direction (feed-forward network)
→ Used to extract features from images for recognition or detection

Recurrent Neural Network (RNN)
→ Allows long-term dependencies to affect the output by relying on internal memory to process inputs
→ Types of RNN: LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit)
→ Designed to analyze sequential data (e.g., NLP)
[Diagram: RNN unrolled over inputs x1–x4 producing outputs y1–y3]
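The convolution and pooling layers described above can be shown concretely with a tiny framework-free example. The image, kernel, and helper functions are toy single-channel data invented for illustration; in a real CNN the kernel weights are learned.

```python
# Tiny pure-Python illustration of two CNN layer types: a 2D
# convolution (feature extraction) followed by 2x2 max pooling
# (downsampling). Toy single-channel data, no framework.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool2x2(fmap):
    return [[max(fmap[i][j], fmap[i][j+1], fmap[i+1][j], fmap[i+1][j+1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[1, 0, 0, 0, 0],
         [1, 0, 0, 0, 0],
         [1, 0, 0, 0, 0],
         [1, 0, 0, 0, 0],
         [1, 0, 0, 0, 0]]
edge_kernel = [[1, -1],                      # learned weights in a real CNN;
               [1, -1]]                      # this one detects vertical edges
features = conv2d(image, edge_kernel)        # 4x4 feature map
pooled = max_pool2x2(features)               # 2x2 after pooling
print(pooled)                                # → [[2, 0], [2, 0]]
```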
43. 44
Vision Inference Requirements Vary
Design Variable: Considerations
Performance: Frames/sec, latency (real-time constraints)
Power: Power envelope, power per inference
Cost: Hardware cost, development cost, cost per inference
Batch size: 1, 8, 32, etc.
Precision: FP32, FP16, FP11
Image size: 720p, 1080p, 4K
Camera input: Raw video, encoded
Network support: CNN, RNN, LSTM, custom
Operating environment: Temperature-controlled environment, outdoor use, 24/7 operation
Product life: Operational lifespan, product availability
Regulation: Security, privacy
There is no “one-size-fits-all” solution . . .
. . . and sometimes requirements change
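Several of these variables interact; batch size, for example, trades latency against throughput. A toy model makes the tradeoff concrete, using hypothetical timing constants (2 ms dispatch overhead per batch, 1 ms of compute per image; these are illustrative numbers, not figures from the deck):

```python
def batch_perf(batch, overhead_ms=2.0, per_image_ms=1.0):
    """Hypothetical accelerator model: fixed per-batch overhead plus
    per-image compute time. Returns (batch latency in ms, sustained FPS)."""
    latency_ms = overhead_ms + per_image_ms * batch   # time until the whole batch returns
    fps = 1000.0 * batch / latency_ms                 # sustained frames per second
    return latency_ms, fps

for b in (1, 8, 32):
    latency, fps = batch_perf(b)
    print(f"batch={b:2d}  latency={latency:5.1f} ms  throughput={fps:6.1f} FPS")
```

Larger batches amortize the fixed overhead (higher FPS) but every frame in the batch waits for the whole batch to finish (higher latency), which is why real-time vision systems often run at batch size 1.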
44. 45
• Field Programmable Gate Array (FPGA)
– Millions of logic elements
– Thousands of embedded memory blocks
– Thousands of DSP blocks
– Programmable routing
– High speed transceivers
– Various built-in hardened IP
• Used to create Custom Hardware
• Well-suited for Matrix Multiplication
[Diagram: FPGA floorplan showing DSP blocks, memory blocks, programmable
routing switches, and logic modules]
FPGA Overview
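Matrix multiplication suits custom hardware because it reduces to many independent multiply-accumulate (MAC) operations, which FPGA DSP blocks execute in parallel with narrow operands feeding a wide accumulator. A minimal software sketch of that arithmetic, using INT8 operands and an INT32 accumulator as quantized inference typically does (toy matrices, illustrative only):

```python
import numpy as np

def int8_matmul(A, B):
    """Multiply INT8 matrices with an INT32 accumulator, mimicking how
    DSP blocks chain narrow multiplies into a wide accumulation."""
    assert A.dtype == np.int8 and B.dtype == np.int8
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    acc = np.zeros((m, n), dtype=np.int32)
    for i in range(m):
        for j in range(n):
            for p in range(k):          # one multiply-accumulate per step
                acc[i, j] += np.int32(A[i, p]) * np.int32(B[p, j])
    return acc

A = np.array([[1, -2], [3, 4]], dtype=np.int8)
B = np.array([[5, 6], [7, -8]], dtype=np.int8)
C = int8_matmul(A, B)   # matches A.astype(int32) @ B.astype(int32)
```

On an FPGA the inner loop unrolls into a spatial array of MACs rather than running sequentially, which is where the performance comes from.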
45. 46
Intel® FPGA PAC
N3000 for networking
Intel® FPGA PAC
D5005 for Datacentre
Mustang-F-100-A10*
Intel® FPGA: Application Acceleration from Edge to Cloud
Combining Intel FPGA hardware and software
to efficiently accelerate workloads for processing-intense tasks
Programmable Acceleration Card (PAC) offering
Devices/edge Network/NFV Cloud/enterprise
* Other names and brands may be claimed as the property of others.
Intel® FPGA PAC
with Arria®10 GX
46. 47
Why FPGA for Deep Learning Inference
Key differentiators:
• Low latency
• Low power
• High throughput (FPS)
• Flexibility to adapt to new, evolving, and custom networks
• Supports large image sizes (e.g., 4K)
• Large networks (up to 4 billion parameters)
• Security / Encryption
• Wide ambient temperature range (-40° C to 105° C)*
• 24/7/365 operation
• Long lifespan (8–10 years)
Vision Acceleration
Design with Intel®
Arria® 10 FPGA
*-40°C to 105°C temp range is for chip down design. Board temp range is 0° to 65°C.
47. 48
Increase Deep Learning Workload Performance on Public Models
Using Intel® Distribution of OpenVINO™ Toolkit & Intel® Architecture
Optimize with OpenVINO on CPU, Get an even Bigger Performance Boost with Intel® FPGA
1Depending on workload, quality/resolution for FP16 may be marginally impacted. A performance/quality tradeoff from FP32 to FP16 can affect accuracy; customers are encouraged to experiment to find what works best
for their situation. Performance results are based on testing as of June 13, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. For
more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Configuration: Testing by Intel as of June 13, 2018. Intel® Core™ i7-6700K CPU @ 2.90GHz fixed, GPU GT2 @
1.00GHz fixed Internal ONLY testing, Test v3.15.21 – Ubuntu* 16.04, OpenVINO 2018 RC4, Intel® Arria® 10 FPGA 1150GX. Tests were based on various parameters such as model used (these are public), batch size, and
other factors. Different models can be accelerated with different Intel hardware solutions, yet use the same Intel software tools.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2,
SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by
Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
[Chart: Comparison of frames per second (FPS) across public models (various
batch sizes). Relative performance improvement from a standard Caffe*
baseline, up to 19.9x¹ with the OpenVINO toolkit on CPU plus Intel® FPGA.]
48. 49
Solving Machine Learning Challenges with FPGA
Real-Time: deterministic, low latency
Ease-of-Use: software abstraction, platforms & libraries
Flexibility: customizable hardware for next-gen DNN architectures
Intel FPGAs can be customized
to enable advances in machine
learning algorithms.
Intel FPGA hardware
implements a deterministic low
latency data path unlike any
other competing compute
device.
Intel FPGA solutions enable
software-defined programming
of customized machine
learning accelerator libraries.
49. 50
Public Intel FPGA Machine Learning Success
Microsoft: “Microsoft has revealed that Altera FPGAs have been installed across
every Azure cloud server, creating what the company is calling “the world’s first AI
supercomputer.” -Sept 2016
NEC: “To create the NeoFace Accelerator, the engine software IP was integrated
into an Intel Arria 10 FPGA, which operate in Xeon processor–based servers. ” -June
2017
JD.COM: “Arria®10 FPGA can achieve 5x improvement in the performance of LSTM
accelerator card compared to GPU.” –Apr 2017
Inspur/iFlytech: “Leading server vendor Inspur Group and Altera today
launched a speech recognition acceleration solution based on Altera's Arria® 10
FPGAs and DNN algorithm from iFLYTEK.” -Dec 2016
ZTE: “Using Intel’s Arria 10 FPGA, ZTE engineers were able to achieve more than
1000 images per second in facial recognition.” –Jan 2017
50. 51
Why FPGAs for Deep Learning?
[Diagram: systolic array of processing elements (PEs): a 2-D grid of PEs
fed along rows and columns by feeder units]
51. 52
Customized Performance

Convolutional Neural Networks are Compute Intensive

Feature: Benefit
Highly parallel architecture: Facilitates efficient low-batch video stream
processing and reduces latency
Configurable distributed floating-point DSP blocks: FP32 (9 TFLOPS), FP16,
FP11; accelerates computation by tuning compute performance
Tightly coupled high-bandwidth memory: >50 TB/s on-chip SRAM bandwidth with
random access; reduces latency and minimizes external memory access
Programmable data path: Reduces unnecessary data movement, improving
latency and efficiency
Configurability: Support for variable precision (trade off throughput and
accuracy); future-proofs designs; system connectivity

Fine-grained, low-latency coupling between compute and memory

[Diagram: pipeline parallelism: Function 1, Function 2, and Function 3
chained between IO blocks, with optional memory attached to each stage]
61. 62
Intel® FPGA Deep Learning Acceleration (DLA for OpenVINO)

Pre-compiled Graph Architecture:
[Diagram: four DDR interfaces feeding a configuration engine, memory
reader/writer, convolution processing element (PE) array, and feature map
cache, with a crossbar connecting primitives (ReLU, Norm, MaxPool).
A CUSTOM* primitive slot on the crossbar offers deeper customization
options (coming soon).]

Example Topologies*: AlexNet, GoogLeNet, Tiny Yolo, SqueezeNet, VGG16,
ResNet 18, ResNet 50, ResNet 101, MobileNet, ResNet SSD, SqueezeNet SSD
*More topologies added with every release
* Other names and brands may be claimed as the property of others.
81. 82
Feature Cache and PE Array Tradeoffs

• Small feature cache: bigger PE array, but more slicing; performance
  bottleneck = DDR
• Big feature cache: less slicing, but smaller PE array; performance
  bottleneck = PE compute
• The sweet spot varies based on graph dimensions (performance as a
  function of feature cache size vs. PE array size)
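This tradeoff is roofline-like: a layer's runtime is bounded by whichever is slower, DDR traffic or PE compute. A sketch with hypothetical throughput and traffic numbers (illustrative only, not DLA specifications):

```python
def layer_time_s(macs, ddr_bytes, peak_macs_per_s, ddr_bytes_per_s):
    """Roofline-style estimate: a layer is limited either by PE compute or
    by DDR traffic; runtime is roughly whichever takes longer."""
    compute_s = macs / peak_macs_per_s
    memory_s = ddr_bytes / ddr_bytes_per_s
    bottleneck = "PE compute" if compute_s >= memory_s else "DDR"
    return max(compute_s, memory_s), bottleneck

PEAK = 4e12       # hypothetical PE array throughput: 4 TMAC/s
DDR_BW = 3.2e10   # hypothetical DDR bandwidth: 32 GB/s

# Small feature cache: feature maps are sliced and re-fetched -> heavy DDR traffic.
t_small, limit_small = layer_time_s(1e9, 8e8, PEAK, DDR_BW)
# Large feature cache: most traffic stays on chip -> little DDR traffic.
t_large, limit_large = layer_time_s(1e9, 5e6, PEAK, DDR_BW)
```

With these toy numbers the small-cache configuration is DDR-bound and the large-cache configuration is compute-bound, matching the two regimes on the slide.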
103. 104
Inference Engine Workflow

Initialization:
1. Create IE instance
2. Load IE plugin (MKL-DNN / clDNN / DLA)
3. Read an IR model
4. Set target device (CPU / GPU / Movidius / FPGA)
5. Load network to plugin
6. Allocate input and output buffers

Main loop:
• Fill input buffer with data
• Run inference
• Interpret output results
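The workflow maps to a few API calls. A hedged sketch using the `openvino.inference_engine` Python interface of roughly this era (method names varied across releases; the model paths, image, and single-input/single-output assumption are caller-supplied placeholders):

```python
def run_inference(model_xml, model_bin, image, device="CPU"):
    """Run one inference following the workflow above. The OpenVINO import
    is deferred so the sketch can be read without the toolkit installed."""
    from openvino.inference_engine import IECore

    ie = IECore()                                     # create IE instance; plugins load on demand
    net = ie.read_network(model=model_xml, weights=model_bin)    # read the IR model
    input_blob = next(iter(net.input_info))           # assume a single input...
    output_blob = next(iter(net.outputs))             # ...and a single output
    exec_net = ie.load_network(network=net, device_name=device)  # load network to the plugin
    result = exec_net.infer(inputs={input_blob: image})          # fill input buffer, run inference
    return result[output_blob]                        # return the raw output tensor
```

In a real application only the `infer` call sits in the main loop; instance creation, model reading, and network loading happen once at initialization.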
128. Introducing Intel® Agilex™ FPGAs
Intel innovation for ultimate
agility and flexibility
Any Developer:
• Intel® Quartus® Prime Design Tool for hardware developers
• One API for software developers

Any-to-Any Integration:
• Memory, analog, logic
• Any node, supplier, IP
• Rapid Intel® eASIC™ devices optimization

High-Performance Compute:
• 10nm process
• Memory-coherent Intel® Xeon® Processor acceleration
• Massive bandwidth
129
129. The FPGA for the Data-Centric World
130
Up to 40% higher performance¹,³ or up to 40% lower power¹,³ with the
2nd-Generation Intel® Hyperflex™ Architecture

Up to 40 TFLOPS DSP performance²

Process data, store data, move data:
• DDR5 & HBM; Intel® Optane™ DC Persistent Memory support
• 112G transceiver data rates
• Intel® Xeon® Processor coherent connectivity & PCIe Gen5

1 Compared to Intel® Stratix® 10 FPGAs
2 With FP16 configuration
3 Based on current estimates, see slide 180 for details
130. Any-to-Any Heterogeneous 3D Integration
131
Chiplet-Based Architecture:
Library of chiplets: transceivers, custom I/O & custom compute tiles

Intel® eASIC™ Tile Customization:
Ultimate customization with integrated Intel® eASIC™ tile

Embedded Multi-Die Interconnect Bridge:
No-compromise die-to-die 3D packaging interconnect
131. Custom Logic Continuum
132
Flexible and Agile Optimization Across Product Lifecycle

FPGA: Fastest time to market, highest flexibility
eASIC¹: Optimized time to market with balanced performance / cost
ASIC: Highest performance, lowest power & cost*
Reusable IP across the continuum

1 Refers to Intel® eASIC™ Devices
* Compared to other options shown
132. First Memory-Coherent Accelerator
for Intel® Xeon® Scalable Processors
133
[Diagram: FPGA with attached memory linked to a processor with attached
memory via Compute Express Link* (CXL)]

Accelerates Diverse Workloads:
• Low latency
• Memory coherence
• Increased memory space
(Including data analytics, database acceleration and function-as-a-service)
134. Well Suited for Artificial Intelligence Applications
3,4 based on current estimates, see slide 180 for details
135
Complementary to OpenVINO™ toolkit and One API

• Flexibility for evolving workloads and multi-function integration
  (AI+ usage model)
• Only FPGA supporting hardened BFLOAT16 & FP16
• Configurable DSP: FP32, BFLOAT16, FP16, INT8 through INT2
• Up to 40 TFLOPS FP16 performance³
• Up to 92 TOPS INT8 performance⁴
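BFLOAT16 keeps float32's 8 exponent bits but only 7 mantissa bits, preserving dynamic range at half the storage, which is why hardened support in the DSP blocks matters. A sketch of the conversion (round-to-nearest-even on the dropped bits; NaN/overflow edge cases are ignored for brevity, and this is an illustration, not Intel's hardware implementation):

```python
import numpy as np

def to_bfloat16(x):
    """Convert float32 values to bfloat16 precision, returned as float32:
    round-to-nearest-even toward bit 16, then drop the low 16 bits.
    NaN and overflow edge cases are ignored for brevity."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    rounded = bits + np.uint32(0x7FFF) + ((bits >> np.uint32(16)) & np.uint32(1))
    return (rounded & np.uint32(0xFFFF0000)).view(np.float32)

x = np.array([1.0, np.float32(np.pi)], dtype=np.float32)
bx = to_bfloat16(x)   # 1.0 survives exactly; pi becomes 3.140625
```

Powers of two and small integers convert exactly; other values lose a few decimal digits of mantissa while keeping the full float32 exponent range.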
135. Intel® Agilex™ FPGA Tools for Developers
136
Software Developers (ONE API):
• Single-source, heterogeneous programming environment
• Support for common performance library APIs
• FPGA support with Intel software development tools including
  Intel® VTune™ Amplifier & Intel® Advisor

Hardware Developers:
• Higher productivity: 30% improvement in compile times⁵
• New productivity flows and usability features for faster design
  convergence
• Higher efficiency: 15% improvement in memory utilization⁵

5 See slide 180 for details
136. Intel® Agilex™ FPGA Family
137
F-Series: For a wide range of applications
• Up to 58G transceivers, PCIe* Gen4, DDR4
• Quad-core ARM* Cortex*-A53 SoC option

I-Series: For high-performance processor interface and bandwidth-intensive
applications
• Up to 112G transceivers, PCIe* Gen5, DDR4
• Quad-core ARM* Cortex*-A53 SoC option
• Coherent attach to Intel® Xeon® Scalable Processor option (CXL)

M-Series: For compute-intensive applications
• Up to 112G transceivers, PCIe* Gen5
• DDR4, DDR5 and Intel® Optane™ DC Persistent Memory support
• Quad-core ARM* Cortex*-A53 SoC option, HBM option
• Coherent attach to Intel® Xeon® Scalable Processor option (CXL)
Intel® Quartus® Prime Design Software Support April 2019;
First Device Availability Q3 2019
137. Efficient Network Transformation
First FPGA Providing Flexibility and Agility From 100Gbps to 1Tbps
Infrastructure Offload (optimized architecture):
• vSwitch | vRouter | Security
• Small form factor & low power, for a wide range of servers

Datapath Acceleration (VNF performance optimization):
• Load balancing | Data integrity | Network translation
• Significant improvements¹: Throughput | Jitter | Latency

28G-112G per channel; PCIe Gen 4/5
138
1 Compared to Intel® Stratix® 10 FPGAs
138. Converged Workload Acceleration for the Datacenter
First FPGA with Comprehensive Memory Support and
Coherent Attach to Intel® Xeon® Scalable Processors
Memory coherent (CXL)

Infrastructure acceleration: Network | Security | Remote Memory Access
Application acceleration: AI | Search | Video Transcode | Database
  (40 TFLOPS of DSP performance¹)
Storage acceleration: Compression | Decompression | Encryption
Memory hierarchy management: DDR5, SSD
139
1 with FP16 configuration, based on current estimates, see slide 180 for details
139. Agility & Flexibility for All Stages of 5G Implementation

Custom Logic Continuum (only Intel® provides all of these customization
options):
• FPGA flexibility: High flexibility | Short time-to-market
• Rapid Intel® eASIC™ device optimization: Power & cost optimization
• Full custom ASIC optimization: Best power¹ | Best performance¹ | Best cost¹
• Application-specific tile options: Data converter | Vector engine |
  Custom compute

Radio: Data Converter, DFE, Beamforming, Fronthaul
Baseband: FEC (Turbo/LDPC), uRLLC, Fronthaul
140
1 Compared to the other customization options shown
140. Intel® Agilex™ FPGAs for the Data-Centric World
Any-to-Any integration enables Intel® FPGAs with
application-specific optimization and customization, delivering
new levels of flexibility & agility
Intel® Agilex™ FPGAs for transformative applications in edge
computing, embedded, networking (5G/NFV) and data centers
First FPGAs leveraging Intel’s unmatched innovation:
10nm process, 3D integration, Intel® Xeon® Scalable Processor memory
coherency (CXL), 112G XCVRs, PCIe* Gen 5, Intel® eASIC™ devices,
One API, Intel® Optane™ DC Persistent Memory support
141