At Netweb we believe that innovation is a critical business need. As data analytics, high-performance computing and artificial intelligence continue to evolve, we are building solutions to help you keep pace with the constantly evolving landscape.
DCEU 18: Designing a Global Centralized Container Platform for a Multi-Cluste... | Docker, Inc.
Mijo Safradin - Linux Engineer, Robert Bosch GmbH
Deploying, operating, and maintaining many independent clusters is a key challenge for central service providers in large enterprises. The number of customers and the different use cases realized on the platform require an architecture that is highly integrated into the enterprise IT ecosystem. In this talk we highlight the challenges that came up during the development of the “Container as a Service” platform based on Docker Enterprise. We also address the architectural and operational decisions we made to cope with the requirements of different stakeholders. Further, we will show the integration of a multi-cluster, multi-tenant platform into our existing IT factory.
HP CAST 2017 Frankfurt: HPE UberCloud boosting HPC as a Service | Thomas Francis
HPC as a Service: why is it important? In the beginning, IT had to build, maintain, and manage the HPC stack (server hardware, storage, and more) with very little help. Now IT finally has the tools to manage the full stack.
www.theubercloud.com/hpc-as-a-service
oneAPI: Industry Initiative & Intel Product | Tyrone Systems
With the growth of AI, machine learning, and data-centric applications, the industry needs a programming model that allows developers to take advantage of rapid innovation in processor architectures. TensorFlow supports the oneAPI industry initiative and its standards-based open specification.
oneAPI complements TensorFlow’s modular design and provides increased choice of hardware vendor and processor architecture, and faster support of next-generation accelerators. TensorFlow uses oneAPI today on Xeon processors and we look forward to using oneAPI to run on future Intel architectures.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2019-alliance-vitf-khronos
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Neil Trevett, President of the Khronos Group and Vice President of Developer Ecosystems at NVIDIA, delivers the presentation "Current and Planned Standards for Computer Vision and Machine Learning" at the Embedded Vision Alliance's December 2019 Vision Industry and Technology Forum. Trevett shares updates on recent, current and planned Khronos standardization activities aimed at streamlining the deployment of embedded vision and AI.
Whether you are an AI, HPC, IoT, Graphics, Networking or Media developer, visit the Intel Developer Zone today to access the latest software products, resources, training, and support. Test-drive the latest Intel hardware and software products on DevCloud, our online development sandbox, and use DevMesh, our online collaboration portal, to meet and work with other innovators and product leaders. Get started by joining the Intel Developer Community @ software.intel.com.
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin... | Intel® Software
Integrated into Intel® Advisor, Cache-aware Roofline Modeling (CARM) provides insight into how an application behaves by helping to determine a) how optimally it runs on given hardware, b) the main factors that limit performance, c) whether the workload is memory- or compute-bound, and d) the right strategy to improve application performance.
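The roofline idea CARM builds on can be sketched in a few lines of Python; the peak-compute and bandwidth figures below are hypothetical, chosen only to illustrate the classification:

```python
def roofline_bound(flops, bytes_moved, peak_gflops, peak_bw_gbs):
    """Classify a kernel as memory- or compute-bound under the roofline model.

    flops:       floating-point operations the kernel performs
    bytes_moved: bytes transferred to/from memory
    peak_gflops: machine peak compute (GFLOP/s) -- hypothetical figure
    peak_bw_gbs: peak memory bandwidth (GB/s)   -- hypothetical figure
    """
    ai = flops / bytes_moved                         # arithmetic intensity (FLOP/byte)
    attainable = min(peak_gflops, ai * peak_bw_gbs)  # the "roofline" ceiling
    bound = "compute" if ai * peak_bw_gbs >= peak_gflops else "memory"
    return ai, attainable, bound

# A stream-like kernel: 1 FLOP per 8 bytes, on a machine with
# 1000 GFLOP/s peak and 100 GB/s bandwidth (illustrative numbers).
ai, perf, bound = roofline_bound(flops=1e9, bytes_moved=8e9,
                                 peak_gflops=1000, peak_bw_gbs=100)
```

With an arithmetic intensity of 0.125 FLOP/byte, the kernel sits well under the memory slope, so the model reports it as memory-bound at about 12.5 GFLOP/s attainable.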
DCEU 18: Edge Computing with Docker Enterprise | Docker, Inc.
Marc Meunier - Director of Business Development, Docker
Adam Parco - Director of Engineering, Edge & IoT, Docker
The Internet of Things (IoT) is pushing more computing to the edge - where data from devices can be aggregated, filtered, and analyzed before it’s sent somewhere else. As edge devices become more powerful and capable of running sophisticated applications, the edge servers have to keep pace with development. The challenge for edge computing is that these servers and devices are distributed geographically across many sites and sometimes inaccessible. The Docker platform is designed for distributed computing and provides an easy way to securely distribute and run applications at the edge. In this session, we will outline some of the major trends around edge computing and the common architectures and use cases across different industries. We will highlight some of the work we’re doing with our customers to deliver on these edge use cases and where Docker is headed.
Resilient microservices with Kubernetes - Mete Atamel | ITCamp
Creating a single microservice is a well understood problem. Creating a cluster of load-balanced microservices that are resilient and self-healing is not so easy. Managing that cluster with rollouts and rollbacks, scaling individual services on demand, securely sharing secrets and configuration among services is even harder. Kubernetes, an open-source container management system, can help with this. In this talk, we will start with a simple microservice, containerize it using Docker, and scale it to a cluster of resilient microservices managed by Kubernetes. Along the way, we will learn what makes Kubernetes a great system for automating deployment, operations, and scaling of containerized applications.
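As a rough sketch of the scaling step described above, a minimal Kubernetes Deployment can be modeled as a plain Python dictionary; the field names follow the apps/v1 Deployment schema, while the service and image names are placeholders:

```python
def deployment_manifest(name, image, replicas=3):
    """Build a minimal Kubernetes Deployment manifest as a dict.

    Field names follow the apps/v1 Deployment schema; the image name
    passed in below is a placeholder for your containerized microservice.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # Kubernetes keeps this many pods running
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

manifest = deployment_manifest("hello-svc", "example.com/hello:1.0", replicas=3)
```

The `replicas` field is what makes the cluster self-healing: if a pod dies, the controller recreates it to restore the declared count.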
HPC DAY 2017 | Altair's PBS Pro: Your Gateway to HPC Computing | HPC DAY
HPC DAY 2017 - http://www.hpcday.eu/
Altair's PBS Pro: Your Gateway to HPC Computing
Dr. Jochen Krebs | Director Enterprise Sales Central & Eastern Europe at Altair
Axel Koehler from Nvidia presented this deck at the 2016 HPC Advisory Council Switzerland Conference.
“Accelerated computing is transforming the data center, delivering unprecedented throughput and enabling new discoveries and services for end users. This talk will give an overview of the NVIDIA Tesla accelerated computing platform, including the latest developments in hardware and software. In addition, it will be shown how deep learning on GPUs is changing how we use computers to understand data.”
In related news, the GPU Technology Conference takes place April 4-7 in Silicon Valley.
Watch the video presentation: http://insidehpc.com/2016/03/tesla-accelerated-computing/
See more talks in the Swiss Conference Video Gallery:
http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter:
http://insidehpc.com/newsletter
Journey Through Four Stages of Kubernetes Deployment Maturity | Altoros
In this webinar, we discuss a crawl, walk, run approach to continuous delivery (CD) for applications, point by point:
• Where to start, how to advance, and how to reach the level of maximum automation.
• How to orchestrate CI/CD processes along with routing and business continuity.
• When the automation level is sufficient.
• GitOps principles and their benefits.
• What tools should be used to automate CI, CD, GitOps, container registries, secrets management, etc.
Fusion simulations have traditionally required leadership-scale HPC resources to produce advances in physics. One such package is CGYRO, a premier tool for multi-scale plasma turbulence simulation. CGYRO is a typical HPC application that will not fit into a single node, as it requires several terabytes of memory and O(100) TFLOPS of compute capability for cutting-edge simulations. CGYRO also requires high-throughput, low-latency networking, due to its reliance on global FFT computations. While in the past such compute may have required hundreds or even thousands of nodes, recent advances in hardware capabilities allow just tens of nodes to deliver the necessary compute power. We explored the feasibility of running CGYRO on cloud resources provided by Microsoft on their Azure platform, using the InfiniBand-connected HPC resources in spot mode. We observed both that CPU-only resources were very efficient and that running in spot mode was workable, with minimal side effects. The GPU-enabled resources were less cost-effective but allowed for higher scaling.
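The sizing argument in the abstract, that tens of modern nodes can now satisfy a several-terabyte, O(100) TFLOPS requirement, is simple arithmetic; the per-node figures below are hypothetical:

```python
import math

def nodes_needed(mem_tb, tflops, node_mem_tb, node_tflops):
    """Minimum node count satisfying BOTH the memory and compute requirements.

    All per-node figures are hypothetical, chosen only to illustrate
    the sizing argument quoted in the abstract.
    """
    by_mem = math.ceil(mem_tb / node_mem_tb)      # nodes needed to hold the data
    by_flops = math.ceil(tflops / node_tflops)    # nodes needed for the compute
    return max(by_mem, by_flops)                  # the binding constraint wins

# A 4 TB / 100 TFLOPS simulation on hypothetical nodes with
# 0.5 TB of memory and 5 TFLOPS each: tens of nodes, as the text notes.
n = nodes_needed(mem_tb=4, tflops=100, node_mem_tb=0.5, node_tflops=5)
```

Here compute, not memory, is the binding constraint (20 nodes vs. 8), which is why low-latency interconnect across those nodes matters so much for the global FFTs.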
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K... | Kuralamudhan Ramakrishnan
The first wave of NFV was about taking a network function and running it as-is in a virtual environment. The web giants follow a different approach called Cloud Native: the cloud is viewed as a huge distributed compute platform, and applications are broken into microservices and deployed in a container-based environment using DevOps.
Communication Service Providers are looking to adopt Cloud Native, yet the existing Cloud Native principles are not sufficient to meet their business and NFV use case needs. In this session, Intel and Cisco will explore and share experiences addressing the challenges, technology gaps, and migration path to Cloud Native for NFV.
Join us to alleviate your concerns around data plane performance, control, and DevOps deployment when using microservices, containers, and Kubernetes implementations.
GDG Cloud Southlake #8 Steve Cravens: Infrastructure as-Code (IaC) in 2022: ... | James Anderson
Infrastructure as Code (IaC) is a concept that has been around for a while now, and much research has been done not only to prove out its value but also to enhance IaC implementations. We have a full guest list, including Steve Cravens, who can speak to the school of hard knocks of why IaC is important; Stenio Ferreira, who worked at HashiCorp prior to Google and has vast experience in how to successfully implement IaC with Terraform; and Josh Addington, a Sr. Solutions Engineer at HashiCorp, who will speak to Day 2 operations as well as other offerings that can enhance IaC implementations.
Here is the high level overview:
• IaC overview
• Terraform Tactical
• IaC day 2 and Governance
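As a small taste of IaC, Terraform also accepts a JSON configuration syntax that can be generated programmatically; the resource type and bucket name below are purely illustrative:

```python
import json

def tf_json(resource_type, name, attrs):
    """Render a single resource in Terraform's JSON configuration syntax.

    The aws_s3_bucket type and the bucket name used below are
    illustrative only, not a recommendation for any real deployment.
    """
    return json.dumps({"resource": {resource_type: {name: attrs}}}, indent=2)

doc = tf_json("aws_s3_bucket", "logs", {"bucket": "example-logs-bucket"})
```

Generating configuration this way keeps infrastructure definitions in version control alongside application code, which is the core of the GitOps workflow the talks describe.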
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki... | NETWAYS
Machine learning is a method of data analysis that automates the building of analytical models. By using algorithms that iteratively learn from data, computers are able to find hidden insights without the help of explicit programming. These insights bring tremendous benefits into many different domains. For business users, in particular, these insights help organizations improve customer experience, become more competitive, and respond much faster to opportunities or threats. The availability of very powerful in-memory computing platforms, such as Apache Ignite, means that more organizations can benefit from machine learning today. In this presentation we will look at some of the main components of Apache Ignite, such as the Compute Grid, Data Grid and the Machine Learning Grid. Through examples, attendees will learn how Apache Ignite can be used for data analysis.
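The collocation of compute and data that makes a compute/data grid fast can be sketched conceptually in plain Python (this is an illustration of the idea only, not Ignite's actual API):

```python
# Conceptual sketch of compute/data-grid collocation (NOT Ignite's API):
# data is hash-partitioned across "nodes", a task runs where the data
# lives, and only small partial results are shipped back for reduction.

def partition(records, n_nodes):
    """Hash-partition (key, value) records across n_nodes in-memory stores."""
    nodes = [[] for _ in range(n_nodes)]
    for key, value in records:
        nodes[hash(key) % n_nodes].append((key, value))
    return nodes

def map_reduce(nodes, map_fn, reduce_fn):
    """Run map_fn next to each partition, then reduce the partial results."""
    partials = [map_fn(part) for part in nodes]   # executes "on" each node
    return reduce_fn(partials)

nodes = partition([("a", 2), ("b", 3), ("c", 5)], n_nodes=2)
total = map_reduce(nodes,
                   map_fn=lambda part: sum(v for _, v in part),
                   reduce_fn=sum)
```

Because only per-partition sums cross the network, the result is independent of how the keys happen to be partitioned, which is what makes this pattern scale.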
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/intel/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-pisarevsky
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Vadim Pisarevsky, Software Engineering Manager at Intel, presents the "Making OpenCV Code Run Fast" tutorial at the May 2017 Embedded Vision Summit.
OpenCV is the de facto standard framework for computer vision developers, with a 16+ year history, approximately one million lines of code, thousands of algorithms and tens of thousands of unit tests. While OpenCV delivers decent performance out-of-the-box for some classical algorithms on desktop PCs, it lacks sufficient performance when using some modern algorithms, such as deep neural networks, and when running on embedded platforms. Pisarevsky examines current and forthcoming approaches to performance optimization of OpenCV, including the existing OpenCL-based transparent API, newly added support for OpenVX, and early experimental results using Halide.
He demonstrates the use of the OpenCL-based transparent API on a popular CV problem: pedestrian detection. Because OpenCL does not provide good performance-portability, he explores additional approaches. He discusses how OpenVX support in OpenCV accelerates image processing pipelines and deep neural network execution. He also presents early experimental results using Halide, which provides a higher level of abstraction and ease of use, and is being actively considered for future support in OpenCV.
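The notion of a "transparent" API, where the same call silently uses an accelerated backend when one is available, can be sketched generically (an illustration of the idea only, not OpenCV's actual dispatch code):

```python
# Generic sketch of a "transparent API": the caller invokes one function,
# an accelerated backend is used when one is registered, and a portable
# fallback runs otherwise. Illustration only; not OpenCV's implementation.

_backends = {}

def register_backend(op_name, fn):
    """Register an accelerated implementation (e.g. an OpenCL kernel wrapper)."""
    _backends[op_name] = fn

def threshold(values, t):
    """Binarize values against t, preferring an accelerated backend."""
    fast = _backends.get("threshold")
    if fast is not None:
        return fast(values, t)                    # accelerated path
    return [1 if v > t else 0 for v in values]    # portable fallback

out_plain = threshold([1, 5, 9], 4)               # no backend yet: fallback
register_backend("threshold", lambda vs, t: [int(v > t) for v in vs])
out_fast = threshold([1, 5, 9], 4)                # now takes the fast path
```

The caller's code is identical in both cases, which is exactly the property that lets a library swap in OpenCL, OpenVX, or Halide backends without breaking users.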
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/intel-video-ai-box-converging-ai-media-and-computing-in-a-compact-and-open-platform-a-presentation-from-intel/
Richard Chuang, Principal AI Engineer at Intel, presents the “Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open Platform” tutorial at the May 2022 Embedded Vision Summit.
As a system integrator, solution provider or AI developer, you need to run your AI applications efficiently at the edge with sufficient throughput. Does your edge device run either generic computing or deep learning inferencing, but not both? Intel Video AI Box with Core CPU and integrated Xe LP graphics offers a compact solution to run video AI analytics at the edge with the support to orchestrate AI applications and workloads in cloud-to-edge deployments.
In this presentation, you’ll learn about Intel’s new platform, comprising an Intel CPU with integrated graphics and the Edge AI Box for Video Analytics software package, and how it enables developing cutting-edge video solutions faster. Chuang also explores EFLOW enablement on the platform, which allows Windows-based business applications to run rich Linux AI workload containers with Azure cloud connections for scalable deployments.
Preparing to program Aurora at Exascale - Early experiences and future direct... | inside-BigData.com
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
ELC North America 2021: Introduction to pin muxing and GPIO control under Linux | Neil Armstrong
In the last 10 years, the GPIO and PINCTRL subsystems have matured to support almost every possible handling of programmable input/outputs and, more generally, the multiplexing of multiple functions on a single "pin" or group of "pins". But what is a "pin"? What is a multiplexed "function"? How are programmable I/Os and pin functions designed on the majority of system-on-chips? Neil describes this from the hardware design point of view, covering the constraints and the requirements. He then explains how this particular subject was handled over the years in the Linux kernel, leading to the current GPIO and PINCTRL subsystems and how they articulate with the Device Tree and other firmware-based protocols.
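The pin-mux hardware described here typically dedicates a small bit field per pin in a mux register to select which function the pin carries; the register layout below is hypothetical, for illustration only:

```python
# Conceptual pin-mux register model (layout is hypothetical): each pin
# gets a 2-bit field in the mux register selecting one of 4 functions,
# e.g. 0 = GPIO, 1 = UART, 2 = SPI, 3 = I2C.

BITS_PER_PIN = 2
FIELD_MASK = (1 << BITS_PER_PIN) - 1

def set_pin_function(reg, pin, func):
    """Return the mux register value with pin's field set to func."""
    shift = pin * BITS_PER_PIN
    reg &= ~(FIELD_MASK << shift)         # clear the pin's current field
    reg |= (func & FIELD_MASK) << shift   # select the new function
    return reg

def get_pin_function(reg, pin):
    """Read back which function a pin is currently muxed to."""
    return (reg >> (pin * BITS_PER_PIN)) & FIELD_MASK

reg = 0
reg = set_pin_function(reg, pin=3, func=1)   # mux pin 3 to UART
reg = set_pin_function(reg, pin=0, func=2)   # mux pin 0 to SPI
```

A real PINCTRL driver does essentially this read-modify-write against memory-mapped registers, with the pin-to-field mapping described by the SoC's Device Tree data.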
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/01/khronos-standard-apis-for-accelerating-vision-and-inferencing-a-presentation-from-the-khronos-group/
Neil Trevett, President of the Khronos Group and Vice President of Developer Ecosystems at NVIDIA, presents the “Khronos Standard APIs for Accelerating Vision and Inferencing” tutorial at the September 2020 Embedded Vision Summit.
The landscape of processors and tools for accelerating inferencing and vision applications continues to evolve rapidly. Khronos standards, such as OpenCL, OpenVX, SYCL and NNEF, play an increasingly central role in connecting application developers to the latest silicon—productively, efficiently and portably.
In this talk, Trevett provides an overview and the latest updates on Khronos standards relevant for machine learning and computer vision, and previews how they are likely to evolve in the future.
As more and more enterprises look at leveraging the capabilities of public clouds, they face an array of important decisions. For example, they must decide which cloud(s) and which technologies to use, how to operate and manage resources, and how to deploy applications.
Design and Optimize your code for high-performance with Intel® Advisor and I... | Tyrone Systems
For all who were unable to attend or would like to recap our live webinar, Unleash the Secrets of Performance Profiling with Intel® oneAPI Profiling Tools, all the resources you need are available to you!
Locating and removing bottlenecks is an inherent challenge for every application developer, and it’s made more complex when porting an app to a new platform (say, from a CPU to a GPU). Developers must not only identify bottlenecks; they must figure out which parts of the code will benefit from offloading in the first place. This webinar focuses on how to do just that using two profiling tools from Intel: Intel® VTune Amplifier and Intel® Advisor.
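The instrument, run, and rank-by-time workflow those tools automate can be previewed with Python's standard-library profiler (shown purely to illustrate the workflow; VTune Amplifier and Advisor operate on native code):

```python
import cProfile
import io
import pstats

def hot_loop(n):
    """A deliberately busy function so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Instrument: wrap the run in a profiler session.
profiler = cProfile.Profile()
profiler.enable()
result = hot_loop(100_000)
profiler.disable()

# Rank: sort functions by cumulative time and print the top entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

The report names `hot_loop` as the dominant cost, which is the same "find the bottleneck first" step a GPU-offload analysis begins with.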
How can Artificial Intelligence improve software development process? | Tyrone Systems
Artificial intelligence has impacted retail, finance, healthcare, and many other industries around the world. It has transformed the way the software industry functions. With the help of the SlideShare below, let’s explore how artificial intelligence can improve the software development process:
Four ways to digitally transform with HPC in the cloud | Tyrone Systems
As cloud computing rapidly becomes better, faster, and cheaper than on-premises infrastructure, no workload will be left untouched, and companies will need to adapt to remain competitive over the next decade and beyond. So what is cloud transformation in HPC? Why are on-premises HPC systems no longer enough? Check out this SlideShare to learn more.
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar... | Tyrone Systems
Modern workloads are incredibly diverse, and so are architectures. No single architecture is best for every workload; maximizing performance takes a mix of architectures deployed across CPUs, GPUs, FPGAs, and other future accelerators. Intel® oneAPI products deliver the tools needed to deploy applications and solutions across SVMS architectures. Learn about oneAPI and how it can be used in multiple domains, including HPC, IoT, data science, and AI.
Top 5 Benefits of Hyper-Converged Infrastructure | Tyrone Systems
Organizations need faster and more reliable storage performance than ever before. Hyperconverged infrastructure (HCI) provides a path to a secure, modern infrastructure. HCI simplifies management, consolidates resources and reduces costs by combining compute, storage and networking into a single system.
Because of these benefits, HCI adoption continues growing, and many organizations consider the solution critical to their strategic IT priorities. Watch how eight companies use the benefits of hyperconverged infrastructure to modernize the data center for agility, scalability and cost efficiency to support rapid business innovation.
4. Intel Confidential
Programming Challenges
• Growth in specialized workloads
• A variety of data-centric hardware is required
• No common programming language or APIs
• Inconsistent tool support across platforms
• Each platform requires a unique software investment
Application workloads need diverse hardware: middleware and frameworks sit on languages and libraries, which in turn target the XPUs (CPU, GPU, FPGA, and other accelerators) spanning scalar, vector, matrix, and spatial architectures.
5. Introducing oneAPI
A unified programming model to simplify development across diverse architectures. oneAPI is both an industry initiative and an Intel product.
• Unified and simplified language and libraries for expressing parallelism
• Uncompromised native high-level language performance
• Based on industry standards and open specifications
• Interoperable with existing HPC programming models
Application workloads need diverse hardware: middleware and frameworks sit on oneAPI, which targets the XPUs (CPU, GPU, FPGA, and other accelerators) across scalar, vector, matrix, and spatial architectures.
8. Intel® oneAPI DPC++ Overview
DPC++ = C++17 + the latest available SYCL spec + SYCL Next (Intel extensions).
9. Intel® oneAPI DPC++ Overview
1. Data Parallel C++ is a high-level language designed to target heterogeneous architectures and take advantage of data parallelism.
2. It lets you reuse code across CPUs and accelerators while performing custom tuning.
3. The open-source implementation on GitHub helps incorporate ideas from end users.
10. Before We Start: Lambda Expressions

#include <algorithm>
#include <cmath>

void abssort(float* x, unsigned n) {
  std::sort(x, x + n,
            // Lambda expression: capture clause [], parameter list, body
            [](float a, float b) {
              return std::abs(a) < std::abs(b);
            });
}

• A lambda is a convenient way of defining an anonymous function object right at the location where it is invoked or passed as an argument to a function.
• Lambda functions can be used to define kernels in SYCL.
• The kernel lambda MUST use copy semantics for all its captures (i.e., [=]).
11. DPC++ Program Flow
• The HOST queries for the available device.
• Kernel model: a kernel (lambda) is sent for execution; the QUEUE executes the commands on the device.
• parallel_for executes in parallel across the compute elements of the device.
• The COMMAND GROUP HANDLER controls execution on the device and dispatches kernels to the device.
• Buffers and accessors manage memory across host and device (e.g., read accessors ACC A and ACC B on BUF A and BUF B, and a write accessor ACC C on BUF C).
15. Step 3

gpu_selector deviceSelector;
queue myQueue(deviceSelector);

• The device selector can be the default selector, a cpu_selector, a gpu_selector, or intel::fpga_selector.
• If no device is explicitly specified when the command queue is created, the runtime selects one for you.
• It is good practice to specify the selector to make sure the right device is chosen.
17. Step 5

auto A = bufA.get_access(cgh, read_only);
auto B = bufB.get_access(cgh, read_only);
auto C = bufC.get_access(cgh);
18. Step 6

cgh.parallel_for<class vector_add>(N, [=](auto i) {
  C[i] = A[i] + B[i];
});

Each iteration (work-item) gets a separate index id (i).
19. DPC++ "Hello World": Vector Addition, Entire Code

#include <CL/sycl.hpp>   // header, namespace, and N assumed; the slide omits them
#include <iostream>
using namespace sycl;
constexpr int N = 1024;  // N is defined off-slide; value assumed here

int main() {
  float A[N], B[N], C[N];
  {  // buffer scope: buffers take ownership of the host arrays
    buffer bufA (A, range(N));
    buffer bufB (B, range(N));
    buffer bufC (C, range(N));
    queue myQueue;
    myQueue.submit([&](handler& cgh) {
      auto A = bufA.get_access(cgh, read_only);
      auto B = bufB.get_access(cgh, read_only);
      auto C = bufC.get_access(cgh);
      cgh.parallel_for<class vector_add>(N, [=](auto i) {
        C[i] = A[i] + B[i];
      });
    });
  }  // buffers go out of scope: kernel completes, data is copied back to host
  for (int i = 0; i < 5; i++) {
    std::cout << "C[" << i << "] = " << C[i] << std::endl;
  }
  return 0;
}
20. Anatomy of a DPC++ Application: Host Code
In the vector-addition code, everything outside the myQueue.submit(...) call (the array declarations, buffer creation, queue construction, and the final print loop) is host code that runs on the CPU.
21. Anatomy of a DPC++ Application: Accelerator Device Code
The body of cgh.parallel_for<class vector_add>(...), namely C[i] = A[i] + B[i];, is accelerator device code. Everything surrounding it remains host code.
22. DPC++ Basics
When the enclosing block ends, the write buffer (bufC) goes out of scope, so the kernel completes and the host pointer has a consistent view of the output.
27. DPC++ Summary
• DPC++ is an open, standards-based programming model for heterogeneous platforms.
• It can target different accelerators from different vendors.
• It is a single-source programming model.
• The oneAPI specifications are publicly available: https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions
Feedback and active participation are encouraged.
29. What is the Intel® DPC++ Compatibility Tool?
It migrates a portion of existing code written in CUDA to the newly developed DPC++ language. Results vary greatly, but our experience has shown that on average about 80-90% of the CUDA code in an application can be migrated by the tool. Completing and verifying the final code is expected to be a manual process done by the developer.
https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-dpcpp-compatibility-tool/top.html
38. Memory Model
• Global memory: accessible to all work-items in all work-groups; reads and writes may be cached; persistent across kernel invocations.
• Constant memory: a region of global memory that remains constant during the execution of a kernel.
• Local memory: a memory region shared between work-items in a single work-group.
• Private memory: a region of memory private to a work-item; variables defined in one work-item's private memory are not visible to another work-item.
[Diagram: a device (GPU, FPGA, ...) contains compute units (CUs), each with its own local memory and per-work-item private memory, all sharing global/constant memory.]
40. Unified Shared Memory
The SYCL 1.2.1 specification offers buffers and accessors for tracking and managing memory transfers and guaranteeing data consistency across the host and DPC++ devices. However, many HPC and enterprise applications use pointers to manage data. Unified Shared Memory (USM) is the DPC++ extension for pointer-based programming: device kernels can access data through pointers.
41. USM Allocation: Types of USM
• Device: explicit data movement.
• Host: data is sent over a bus, such as PCIe.
• Shared: data can migrate between host and device memory.
43. Kernel Execution Model: Kernel Parallelism
A multi-dimensional kernel is organized as a hierarchy: the ND-range is divided into work-groups, work-groups into sub-groups, and sub-groups into individual work-items.
44. Kernel Execution Model
An explicit ND-range gives control, similar to programming models such as OpenCL, SYCL, and CUDA. The ND-range defines the global work size, which is divided into work-groups made up of individual work-items.
45. nd_range & nd_item
Example: process every pixel in a 1920x1080 image.
• Each pixel needs processing, so the kernel is executed once per pixel (work-item).
• 1920 x 1080 = about 2M pixels = the global size.
• Not all 2M work-items can run in parallel on the device; there are hardware resource limits.
• We have to split the work into smaller blocks of pixels = the local size (work-group size).
• Either let the compiler determine the work-group size OR specify it using nd_range().
46. nd_range & nd_item
Example: process every pixel in a 1920x1080 image.

Let the compiler determine the work-group size:

h.parallel_for(range<2>(1920,1080), [=](id<2> item){
    // CODE THAT RUNS ON DEVICE
});

Programmer specifies the work-group size (global size 1920x1080, local/work-group size 8x8):

h.parallel_for(nd_range<2>(range<2>(1920,1080), range<2>(8,8)),
               [=](nd_item<2> item){
    // CODE THAT RUNS ON DEVICE
});
47. nd_range & nd_item
Example: process every pixel in a 1920x1080 image. How do we choose the work-group size?
• 8x8 divides 1920x1080 evenly: GOOD.
• 9x9 does not divide 1920x1080 evenly: an invalid work-group size error is thrown.
• 10x10 divides 1920x1080 evenly: it works, but it is better to use a multiple of 8 for better resource utilization.
• 24x24 divides 1920x1080 evenly, but 24x24 = 576 work-items will fail, assuming the GPU's maximum work-group size is 256.