Presentation HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael Wootton at the AMD Developer Summit (APU13) November 11-13, 2013.
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer (AMD Developer Central)
The document discusses the challenges of memory access when using a GPU. It describes the programmer's view of memory as a flat address space and how GPUs complicate this model. GPUs have their own memory hierarchies with local memory caches and different types of memory. GPU memory is accessed through specialized APIs that allocate objects like buffers and textures instead of regular malloc memory. This introduces complexity in ensuring coherency between CPU and GPU memory views. The talk will address these memory challenges and how solutions like HSA and hUMA aim to provide a more unified memory model.
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja... (AMD Developer Central)
Presentation CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Java applications, by Gary Frost and Vignesh Ravi at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni... (AMD Developer Central)
This document provides an overview of using the Synthetic Workload Analysis Toolkit (SWAT) and IPython notebooks to analyze big data workloads. SWAT is a software platform that automates the creation, deployment, execution, and data gathering of synthetic compute workloads on clusters. IPython notebooks can be used to interactively explore system logs gathered by SWAT to identify performance bottlenecks and optimize workloads. Graphs of resource utilization are generated to determine if the system is CPU-bound, disk-bound, or network-bound. This analysis helps tune workloads and characterize systems.
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha... (AMD Developer Central)
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Harris Gasparakis, AMD, at the Embedded Vision Alliance Summit, May 2014.
Harris Gasparakis, Ph.D., is AMD’s OpenCV manager. In addition to enhancing OpenCV with OpenCL acceleration, he is engaged in AMD’s Computer Vision strategic planning, ISVs, and AMD Ventures engagements, including technical leadership and oversight in the AMD Gesture product line. He holds a Ph.D. in theoretical high energy physics from YITP at SUNYSB. He is credited with enabling real-time volumetric visualization and analysis in Radiology Information Systems (Terarecon), including the first commercially available virtual colonoscopy system (Vital Images). He was responsible for cutting edge medical technology (Biosense Webster, Stereotaxis, Boston Scientific), incorporating image and signal processing with AI and robotic control.
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S... (AMD Developer Central)
This document discusses debugging and profiling challenges with OpenCL and how AMD CodeXL addresses them. It provides an overview of CodeXL's debugging and profiling capabilities for OpenCL, including API-level debugging, kernel source debugging, profiling views for APIs, objects, and kernel variables, and integrated support in Visual Studio. Demo code is included to illustrate pinpointing OpenCL errors and optimizing work item loads.
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi... (AMD Developer Central)
This document discusses optimizing FFmpeg and Handbrake using OpenCL. It describes FFmpeg as a popular open-source multimedia software library used for recording, converting, and streaming audio and video. It was optimized to leverage heterogeneous computing by accelerating video decoding and encoding using hardware accelerators and accelerating video processing filters using the GPU. Specific filters were implemented in OpenCL for improved performance compared to CPU. Performance tests showed the accelerated FFmpeg approach achieved significantly higher frame rates than the original CPU-only FFmpeg.
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU... (AMD Developer Central)
Presentation HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU," by Mayank Daga and Mark Nutter at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
Direct3D12 aims to address issues with existing APIs by providing a more direct mapping to hardware capabilities. It features command buffers that allow work to be built in parallel threads and scheduled more efficiently. Pipeline state objects avoid runtime compilation overhead. Descriptor tables provide bindless resources through pointers and reduce state changes. While this gives more control and efficiency, it also means applications have more responsibility to avoid errors. Overall, Direct3D12 is designed to better expose the capabilities of modern graphics hardware.
The document discusses the specifications and architecture of the AMD Radeon R9-290X graphics processing unit (GPU). Some key points:
- The R9-290X contains 44 compute units with a total of 2816 stream processors. It has a 512-bit GDDR5 memory interface providing 320 GB/sec of memory bandwidth.
- The GPU uses AMD's Graphics Core Next (GCN) architecture. This includes improvements to geometry processing, new local data share memory operations, and enhanced media processing instructions.
- The GCN architecture includes compute units containing vector units and a local data store. Compute units provide computational power through 2816 stream processors.
- New features include support for flat
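The headline figures in the R9-290X summary above can be cross-checked with simple arithmetic. The sketch below is only an illustration: the 64 lanes per compute unit and the 5.0 Gbit/s per-pin GDDR5 data rate are assumptions that do not appear in the summary, chosen because they are consistent with the quoted totals.

```python
# Figures quoted in the summary above.
COMPUTE_UNITS = 44
BUS_WIDTH_BITS = 512

# Assumptions (not stated in the summary): 64 stream processors per
# compute unit and an effective GDDR5 data rate of 5.0 Gbit/s per pin.
LANES_PER_COMPUTE_UNIT = 64
DATA_RATE_GBIT_PER_PIN = 5.0

# Total stream processors = compute units x lanes per unit.
stream_processors = COMPUTE_UNITS * LANES_PER_COMPUTE_UNIT

# Peak bandwidth in GB/s = bus width (bits) x per-pin rate (Gbit/s) / 8 bits per byte.
bandwidth_gb_per_s = BUS_WIDTH_BITS * DATA_RATE_GBIT_PER_PIN / 8

print(stream_processors)    # 2816
print(bandwidth_gb_per_s)   # 320.0
```

Under those assumptions the products match the 2816 stream processors and 320 GB/sec quoted in the summary.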
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ... (AMD Developer Central)
Keynote presentation, The Programmers Guide to Reaching for the Cloud, by Phil Rogers, AMD Corporate Fellow, AMD, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W... (AMD Developer Central)
Presentation CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with Windows Server, by Derrick Isoka at the AMD Developer Summit (APU13) November 11-13, 2013.
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar (AMD Developer Central)
This deck presents highlights from the Introduction to OpenCL™ Programming Webinar presented by Acceleware & AMD on Sept. 17, 2014. Watch a replay of this popular webinar on the AMD Dev Central YouTube channel here: https://www.youtube.com/user/AMDDevCentral or here for the direct link: http://bit.ly/1r3DgfF
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant... (AMD Developer Central)
This presentation discusses the Mantle API: what it is, why to choose it, its abstraction level, small-batch performance, and platform efficiency.
Download the presentation from the AMD Developer website here: http://bit.ly/TrEUeC
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit... (HSA Foundation)
The document provides an agenda and overview for an ISCA tutorial on Heterogeneous System Architecture (HSA). The tutorial will cover the HSAIL Virtual Parallel ISA, HSA Runtime, HSA Memory Model, HSA Queuing Model, HSA Applications, and HSA Compilation. Speakers from AMD, ARM, National Tsing Hua University, Qualcomm, and the University of Illinois will present on these topics. The HSA Foundation aims to make heterogeneous systems easier to program, optimize, and achieve higher performance with lower power through an open architecture approach.
WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang... (AMD Developer Central)
Presentation WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang, John Yoon and Nicolas Lorain at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
Heterogeneous Systems Architecture: The Next Area of Computing Innovation (AMD)
This document discusses heterogeneous systems architecture and its potential to enable technologies for virtual reality environments like holodecks. It provides an overview of holodeck enabling technologies such as computational photography, directional audio, natural user interfaces, and augmented reality. It then discusses how heterogeneous systems architecture can accelerate these technologies by allowing more flexible partitioning of workloads between the CPU and GPU for improved performance and energy efficiency. As an example, it analyzes how HSA could improve the performance of face detection algorithms by offloading certain stages to the GPU. Overall, the document argues that HSA is key to realizing the advanced computing capabilities needed for future immersive virtual environments.
This document summarizes improvements to the TressFX hair rendering and simulation technology. TressFX 2.0 features improved performance through deferred lighting and shadowing, continuous LOD, and code restructuring. Rendering is faster through optimizations to the anti-aliasing, self-shadowing, and transparency techniques. The simulation is formulated with general constraints and solved using a tridiagonal matrix approach for better stability under various hair conditions like wet, dry, or with wind. Overall, TressFX 2.0 provides over 2x performance increases for hair rendering compared to the previous version.
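The TressFX summary above mentions solving the simulation constraints with a tridiagonal matrix approach. As a rough illustration of what a tridiagonal solve looks like, here is a minimal sketch of the standard Thomas algorithm. It is a generic textbook method, not TressFX's actual solver, and the function name and argument layout are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    Hypothetical layout for row i of the system:
        lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i]
    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    matrix is assumed to be well conditioned (e.g. diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Small 3x3 example whose exact solution is x = [1, 1, 1].
solution = solve_tridiagonal(
    lower=[0.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, 0.0],
    rhs=[1.0, 0.0, 1.0],
)
print(solution)
```

The solve runs in time linear in the number of unknowns, which is one reason tridiagonal formulations are attractive for chains of linked elements such as hair strands.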
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ... (AMD Developer Central)
The document discusses porting and optimizing OpenMP applications to AMD APUs using CAPS tools. It provides an overview of CAPS Enterprise, which develops compilers and tools to help customers leverage the performance of multi-core and many-core processors. It then discusses CAPS' OpenACC and OpenMP compilers, which can generate code for AMD GPUs and APUs from directive-based programming models. The document demonstrates how the CAPS OpenMP compiler can analyze OpenMP applications and generate optimized code for execution on AMD APUs, showing speedups for the HydroC benchmark application.
Mantle allows Battlefield 4 to significantly improve CPU and GPU performance compared to DirectX 11. The game utilizes Mantle's low-level access to optimize shader compilation, pipeline state management, asynchronous compute and memory handling. Multi-GPU rendering is supported through Alternate Frame Rendering where resources are duplicated and updated asynchronously across GPUs.
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac... (AMD Developer Central)
Presentation MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Achievements, by Joseph Hsieh at the AMD Developer Summit, November 11-13, 2013.
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ... (AMD Developer Central)
Presentation HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated Processing Units, by Robert Engel at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu... (AMD Developer Central)
Presentation CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distributed Platforms, by Max Grossman at the AMD Developer Summit (APU13) November 11-13, 2013.
CE-4117, HSA Optimizations and Impact on end User Experiences for AfterShot P... (AMD Developer Central)
Presentation CE-4117, HSA optimizations and impact on end user experiences for AfterShot Pro and WinZip, by Rick Champagne at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley... (AMD Developer Central)
This document discusses optimizing a photo editing application called PhotoDirector to take advantage of AMD's heterogeneous system architecture (HSA). It describes how photo editing pipelines involve computationally intensive RAW processing that could benefit from GPU acceleration. HSA allows sharing memory between the CPU and GPU to reduce bottlenecks. Performance tests show the potential for a 2x speedup using coarse-grained shared virtual memory buffers over OpenCL. The document concludes that HSA has great potential to improve performance for parallelizable and memory-intensive tasks in photo editing applications.
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley (AMD Developer Central)
The document introduces AMD's developer tools strategy and CodeXL tool. It discusses how AMD is converging its CPU and GPU tools into a unified HSA Developer Tools Suite, with CodeXL being a key tool. CodeXL allows debugging, profiling, and analyzing applications across AMD CPUs, GPUs, and APUs in a "white box" view. It is available for Windows, Visual Studio, and Linux. The document then describes several CodeXL capabilities such as GPU debugging, CPU and GPU profiling, static kernel analysis, and what is new in CodeXL.
The document provides an overview of Sundance Multiprocessor Technology Ltd. and their EM3V - Embedded Vision product. Some key points:
- Sundance is an employee-owned company with over 300 years of experience designing and building their own products.
- Their VCS-1 (EMC2) system is a modular and reconfigurable hardware platform compatible with Zynq UltraScale+ MPSoC devices and a wide range of sensors.
- The system includes open source software, firmware and documentation and is compatible with popular frameworks like ROS, OpenCV and deep learning stacks for running neural networks.
Debug, Analyze and Optimize Games with Intel Tools (Matteo Valoriani)
This document summarizes an introduction to the Intel Graphics Performance Analyzers (Intel GPA) tool. The presentation provides an overview of Intel GPA's capabilities for optimizing game performance on Intel graphics through in-game analysis, frame capture, and trace analysis. It demonstrates Intel GPA's system analyzer, frame analyzer and trace analyzer features. The document also gives examples of optimizations that can be achieved through techniques like script culling, memory management, occlusion culling, level of detail modeling and terrain optimization.
Debug, Analyze and Optimize Games with Intel Tools - Matteo Valoriani - Codem... (Codemotion)
Use the full potential of your favorite platform while improving a video game's frame rate and performance with Intel GPA (Graphics Performance Analyzers), a free tool from Intel. Featuring a convenient panel overlay, it lets you quickly identify problem areas and experiment with improvements without recompiling the source code. Use system analysis to isolate common bottlenecks that affect your game's performance in real time, analyze performance on a single frame down to the draw-call level, and identify where you can distribute workloads evenly across the CPU and GPU.
Cloud, Distributed, Embedded: Erlang in the Heterogeneous Computing World (Omer Kilic)
The document discusses challenges in modern heterogeneous computing systems and how Erlang can be used to program these systems. It describes hardware accelerators like GPUs and the Parallella board. It introduces Erlang/ALE, a library that brings embedded peripheral interfaces into Erlang to easily access devices like GPIO pins. Examples show controlling an LED using GPIO and handling interrupts. The talk promotes Erlang for programming heterogeneous systems and developing embedded applications.
LAS16-209: Finished and Upcoming Projects in LMG (Linaro)
LMG's finished and upcoming projects include:
- Memory allocator and file system analyses to reduce memory usage on low-RAM devices.
- Monthly LCR releases and migrating their builds to ci.linaro.org.
- Updating toolchains and enabling new hardware like the HiKey board in AOSP.
- Increasing participation in upstream projects like merging a SystemUI patch.
- Integrating features in AOSP like Energy Aware Scheduling, OP-TEE, and an Overlay Manager.
- Continuing work on the HiKey board in AOSP including new features, fixes, and upstreaming components.
This document discusses OpenCAPI, an open standard for high-performance input/output between processors and accelerators. It provides background on the industry drivers for developing such a standard, an overview of OpenCAPI technology and capabilities, examples of OpenCAPI-based systems from IBM and partners, and performance metrics. The document aims to promote OpenCAPI and growing an open ecosystem around it to support accelerated computing workloads.
The document discusses HSA compiler technology. It outlines the architecture of HSA compilers, which leverage the LLVM framework and generate the HSAIL intermediate representation. Performance is improved through optimizations in the high-level compiler and a thin finalizer. OpenCL 2.0 features like shared virtual memory and platform atomics will be supported. The first release of the OpenCL/HSA compiler is planned for Q2 2014.
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander (AMD Developer Central)
Presentation PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander, at the AMD Developer Summit (APU13) November 11-13, 2013.
Webinar: Getting started with Machine Learning using tools... (Embarcados)
This webinar walks step by step through creating Machine Learning projects using third-party tools such as Sensi ML and Edge Impulse.
Topics covered:
Development kits for Machine Learning:
EV18H79A: SAMD21 ML Evaluation Kit with TDK 6-axis MEMS
EV45Y33A: SAMD21 ML Evaluation Kit with BOSCH IMU
SAMC21 xPlained Pro evaluation kit (ATSAMC21-XPRO) plus its QT8 xPlained Pro Extension Kit (AC164161)
Development tools:
MPLAB X
Data Visualizer
Third-party environments: Sensi ML and Edge Impulse
Data collection
How to develop a Machine Learning project both without specific knowledge of the subject and with prior Machine Learning knowledge.
Relax and Recover (http://rear.sourceforge.net) is an automated tool for Linux bare-metal disaster recovery.
This presentation by one of the authors explores ideas for building a centralized management server to complement the ReaR software installed on all the Linux servers in your data center.
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ... (Intel® Software)
This document discusses optimizing deep learning inference on Intel processor graphics using the OpenVINO™ toolkit. Some key points include:
- Running inference on client devices offers advantages over the cloud, such as privacy, bandwidth savings, and responsiveness.
- OpenVINO™ provides tools to optimize models for Intel hardware and achieve 5-10x speedups on Intel GPUs compared to CPU baselines.
- A case study demonstrates optimizing a deep image matting model, reducing inference time from 2.35 seconds to 291 milliseconds on an Intel GPU using OpenVINO™.
- Emerging technologies like federated learning are discussed which could improve privacy for on-device inference.
Since the introduction of replication in MySQL, users have been trying to automate the promotion of a replica to a primary as well as the failover of TCP connections from one database server to another in the event of a database failure, planned or unplanned. For over a decade, users and organizations have designed various types of solutions to achieve this. Many of these solutions, however, were manual or relied on third-party software, mostly open source, to automate and integrate various architectures.
For more than 5 years now, MySQL has offered complete and very easy-to-use solutions for setting up database architectures that provide High Availability, and it recently added Disaster Recovery capabilities. Completely built in-house and supported by Oracle, these solutions have been adopted by many enterprises, large and small, for business-critical applications.
Business requirements dictate what type of database architecture is required for your system. Disaster tolerance is key and can be measured at different levels: data loss, data availability, and uptime. In this session, the various MySQL Database Architecture solutions will be covered to help you choose the right solution based on your business requirements.
The document discusses turning Maven into a high scalable, resource efficient, cloud ready microservice for compiling business rules and processes. It describes requirements for an incremental compiler that respects the user's POM, has low latency and memory footprint, supports multiple users and threads, can execute builds locally or remotely, and returns rich data beyond just success/failure. The proposed solution is a Maven as a Service API that implements an enhanced compiler with asynchronous request-response behavior, enabling features like per-request Maven repositories, incremental compilation, and returning compiled rule and process metadata beyond just the build result.
Improving User Experience with Ubiquitous QuickBootICS
In this webinar, we will introduce QuickBoot and show how it can solve slow cold boot times. You will: • Learn the difference from other fast boot techniques on Linux or Android devices. • Get technical details of QuickBoot. • See a demonstration of a real-world embedded application illustrating the boot time performance.
SiriusCon2016 - Modelling Spacecraft On-board Software with SiriusObeo
>> These slides were presented at SiriusCon Paris 2016, on November 15th by Andreas Jung (European Space Agency)
The European Space Agency, together with industry, has lead an analysis into the issues faced by spacecraft software developers now and in the future, considering several aspects as for example raising complexity of the software, shorter development life cycles, etc. The analysis resulted in the development of an On-board Software Reference Architecture (OSRA) founded on the principles of component-based software engineering (CBSE) and strong separation of concerns.
A dedicated Domain Specific Language for the component model was developed, called Space Component Model (SCM), to allow the precise definition with clear semantical meaning, in particular considering the domain specific elements like observability and commandability of spacecrafts via Telemetry and Telecommand. The SCM was implemented as a meta-model in ecore. The R&D activity that have developed the OSRA and the SCM have also prototyped a graphical editor to experiment and test the complete approach, from modelling down to code generation for the target.
The original prototype of the graphical editor was based on Eclipse and Obeo Designer, which allowed very quick and simple prototyping of a graphical editor. Following the R&D activities, it was clear that an improved version of the editor, in terms of usability, is needed. An improvement activity has been started with Obeo, using now the open source version of Obeo Designer, namely Sirius. The intention was also to push Obeo's technology further to evaluate it for applicability in a commercial tool.
This talk will give a brief overview of the challenges of spacecraft software development, the needs for a graphical editor, present the results of the improvement activity, show the benefits of the Eclipse and Sirius frameworks and provide an overall evaluation.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/luxoft/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Alexey Rybakov, Senior Director at LUXOFT, presents the "Making Computer Vision Software Run Fast on Your Embedded Platform" tutorial at the May 2016 Embedded Vision Summit.
Many computer vision algorithms perform well on desktop class systems, but struggle on resource constrained embedded platforms. This how-to talk provides a comprehensive overview of various optimization methods that make vision software run fast on low power, small footprint hardware that is widely used in automotive, surveillance, and mobile devices. The presentation explores practical aspects of deep algorithm and software optimization such as thinning of input data, using dynamic regions of interest, mastering data pipelines and memory access, overcoming compiler inefficiencies, and more.
2. COREL AFTERSHOT™ PRO
What is Corel AfterShot™ Pro?
Corel AfterShot™ Pro is photo workflow software
Non-destructive photo editing of JPEG, TIFF, and Raw formats from hundreds of cameras
Photo Management
Batch Processing of modified files
2 | Enhancing OpenCL Performance in Corel AfterShot™ Pro with HSA | NOVEMBER 19, 2013
5. AFTERSHOT TASK MANAGEMENT
Work is broken down into Tasks. Tasks typically:
‒ Contain execution logic (code)
‒ May store resultant data
‒ Track whether they are complete
The Task Scheduler:
‒ Allocates a worker thread per CPU core
‒ Runs Tasks based on priority
‒ Allows Tasks to block on each other
[Figure: A Simple Task Dependency Graph. Labels: Disk, Photo, Thumbnail, File Reader, JPEG Decoder. Legend: Task Dependency, Data]
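The Task model on this slide can be sketched in a few lines of Python. This is a minimal, single-threaded illustration, not AfterShot's scheduler: the names `Task` and `run_all` are hypothetical, the File Reader → JPEG Decoder → Thumbnail chain is an assumed reading of the figure labels, and the real scheduler uses one worker thread per CPU core rather than a loop.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass(eq=False)
class Task:
    """Hypothetical Task: execution logic, a result slot, a completion flag."""
    name: str
    run: Callable[[List[Any]], Any]          # execution logic (code)
    priority: int = 0                        # lower value runs first
    deps: List["Task"] = field(default_factory=list)
    result: Optional[Any] = None             # resultant data
    done: bool = False                       # completion tracking


def run_all(tasks: List[Task]) -> List[str]:
    """Run tasks by priority, deferring any task whose dependencies are not done."""
    order: List[str] = []
    pending = list(tasks)
    while pending:
        ready = [t for t in pending if all(d.done for d in t.deps)]
        if not ready:
            raise RuntimeError("dependency cycle or missing task")
        task = min(ready, key=lambda t: t.priority)
        task.result = task.run([d.result for d in task.deps])
        task.done = True
        order.append(task.name)
        pending.remove(task)
    return order


# Assumed chain: read the file, decode the JPEG, build the thumbnail.
reader = Task("File Reader", lambda _: b"jpeg-bytes", priority=2)
decoder = Task("JPEG Decoder", lambda d: ("pixels", len(d[0])), priority=1, deps=[reader])
thumb = Task("Thumbnail", lambda d: ("thumb", d[0][1]), priority=0, deps=[decoder])
order = run_all([thumb, decoder, reader])
```

Although the Thumbnail task has the best priority, it runs last because it blocks on the decoder, which in turn blocks on the reader.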
6. PROCESSING WITH TILES
The standard, simpler approach is to process each image as one large monolithic buffer
In AfterShot, images are instead broken down into tiles for processing
Tiling provides faster screen updates: only the visible parts of the image need to be computed
Tiling allows more effective memory management
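As a rough illustration of why tiling speeds up screen updates, the sketch below lists only the tiles that intersect a viewport. The 512-pixel tile edge, the function names, and the image and viewport sizes are assumptions chosen for the example.

```python
from typing import List, Tuple

TILE = 512  # assumed tile edge in pixels


def visible_tiles(view_x: int, view_y: int, view_w: int, view_h: int,
                  img_w: int, img_h: int, tile: int = TILE) -> List[Tuple[int, int]]:
    """Return (column, row) indices of the tiles that intersect the viewport."""
    # Clamp the viewport to the image bounds.
    x0, y0 = max(view_x, 0), max(view_y, 0)
    x1 = min(view_x + view_w, img_w)
    y1 = min(view_y + view_h, img_h)
    if x0 >= x1 or y0 >= y1:
        return []
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]


def total_tiles(img_w: int, img_h: int, tile: int = TILE) -> int:
    """Number of tiles needed to cover the whole image (ceiling division)."""
    return -(-img_w // tile) * -(-img_h // tile)


needed = visible_tiles(1000, 500, 1024, 768, img_w=6000, img_h=4000)
```

For this hypothetical 6000 x 4000 image, a 1024 x 768 viewport touches 9 of the 96 tiles, so only those 9 have to be processed before the screen can update.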
7. PROCESSING WITH TILES CONTINUED
The Image Processing Pipeline is made up of several discrete steps (or Filters)
To process a single tile:
‒ Load the input data (e.g. raw or JPEG data)
‒ Apply each Filter step in turn
Generally, we only need the output of the last step, the top Tile in the Stack
[Figure: Tile Stack, from Raw Data to Final Image]
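The per-tile pipeline described on this slide can be sketched as a loop over Filter functions. The filters `exposure` and `clamp` are hypothetical stand-ins, and a tile is modelled as a flat list of floats purely to keep the example short.

```python
from typing import Callable, List, Sequence

Pixels = List[float]
Filter = Callable[[Pixels], Pixels]


def process_tile(raw: Pixels, filters: Sequence[Filter], keep_stack: bool = False):
    """Apply each filter in turn; return the top of the stack, or the whole stack."""
    stack = [raw]
    for f in filters:
        out = f(stack[-1])
        if keep_stack:
            stack.append(out)   # keep every intermediate Tile
        else:
            stack[-1] = out     # drop earlier results when only the top is needed
    return stack if keep_stack else stack[-1]


# Hypothetical stand-in filters.
def exposure(p: Pixels) -> Pixels:
    return [v * 2.0 for v in p]


def clamp(p: Pixels) -> Pixels:
    return [min(v, 1.0) for v in p]


final = process_tile([0.1, 0.4, 0.8], [exposure, clamp])
full_stack = process_tile([0.1, 0.4, 0.8], [exposure, clamp], keep_stack=True)
```

Keeping only the top of the stack is the cheap default; `keep_stack=True` models the case where intermediate Tiles have to remain available.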
8. ADVANCED TILE PROCESSING
Some Image Filters require a radius of pixels as input
Partially processed neighbor Tiles must complete before the main Tile can continue
Intermediate Tiles must be stored in memory so they do not rerun
Example Filters:
‒ Sharpening
‒ Lens Correction
‒ Noise Reduction
‒ Cropping
[Figure: a Tile that requires multiple source tiles]
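To make the neighbor-Tile requirement concrete, the sketch below computes which source tiles a filter needs when it reads `radius` pixels beyond the edges of one tile. The function name, the 512-pixel tile size, and the 12 x 8 tile grid are assumptions for illustration.

```python
from typing import List, Tuple


def source_tiles(col: int, row: int, radius: int,
                 cols: int, rows: int, tile: int = 512) -> List[Tuple[int, int]]:
    """Tiles containing any pixel within `radius` pixels of tile (col, row)."""
    # Pixel bounds of the tile, grown by the filter radius on every side.
    x0 = col * tile - radius
    y0 = row * tile - radius
    x1 = (col + 1) * tile - 1 + radius
    y1 = (row + 1) * tile - 1 + radius
    # Convert back to tile indices and clamp to the tile grid.
    c0, c1 = max(x0 // tile, 0), min(x1 // tile, cols - 1)
    r0, r1 = max(y0 // tile, 0), min(y1 // tile, rows - 1)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


interior = source_tiles(5, 3, radius=8, cols=12, rows=8)
corner = source_tiles(0, 0, radius=8, cols=12, rows=8)
local_only = source_tiles(5, 3, radius=0, cols=12, rows=8)
```

Even a small radius turns one interior tile into a dependency on all eight of its neighbors, which is why the intermediate results of those neighbors have to stay in memory.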
10. ACCELERATING AFTERSHOT WITH OPENCL™
Goals for the AfterShot Pro OpenCL port
Offload image processing from Tiles
Work within the existing System
‒ Contain changes to a few critical modules
‒ Maintain full CPU utilization
‒ Integrate OpenCL Events into the Task System
11. GETTING WORK TO OPENCL
Identify the longest-running image Filter functions and replace them with OpenCL kernels
Do not block CPU threads; use OpenCL event callbacks
Processing becomes asynchronous
Limit total work in flight to conserve memory
Marshal data automatically
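The callback-driven pattern with a cap on work in flight can be sketched without a GPU. Here a thread pool stands in for the OpenCL command queue and `Future.add_done_callback` stands in for an OpenCL event callback; `MAX_IN_FLIGHT`, `process_async`, and the squaring "kernel" are hypothetical. Unlike the approach on the slide, this simplified submitter waits when the cap is reached, where a real Task system would presumably reschedule the work instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # assumed cap; bounds how many tiles hold buffers at once


def process_async(tiles, kernel, on_done):
    """Submit tiles without waiting on each result; completion arrives by callback."""
    slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)
    with ThreadPoolExecutor(max_workers=2) as device:  # stand-in for a GPU queue
        for tile in tiles:
            slots.acquire()  # waits only when too much work is in flight
            fut = device.submit(kernel, tile)

            def finished(f, tile=tile):
                slots.release()            # free the slot before reporting
                on_done(tile, f.result())  # hand the result back to the caller

            fut.add_done_callback(finished)
    # Leaving the `with` block waits for all submitted work to finish.


results = {}
lock = threading.Lock()


def record(tile, value):
    with lock:
        results[tile] = value


process_async(range(10), lambda t: t * t, record)
```

The semaphore is what keeps memory bounded: at most `MAX_IN_FLIGHT` tiles can own buffers at any moment, however many tiles are queued.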
12. CAVEATS OF ASYNCHRONOUS OPENCL PROCESSING
High Buffer Usage
‒ Each kernel that runs needs input, output, and possibly scratch buffers.
‒ Buffers must “stick around” until the kernels complete
‒ Multiple chains of kernels are needed to keep the GPU busy
[Figure: a chain of five kernels (Kernel 1 to Kernel 5) and their buffers]
Processing one 512 x 512 image requires multiple 3 MB buffers resident in device memory (VRAM)
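The 3 MB figure is consistent with one plausible pixel layout, although the slides do not state the format. The sketch below assumes three 32-bit float channels per pixel, takes the count of seven buffers per chain from the figure labels, and uses a made-up number of concurrent chains.

```python
MIB = 1024 * 1024


def buffer_bytes(width: int, height: int,
                 channels: int = 3, bytes_per_channel: int = 4) -> int:
    """Size of one image buffer; the default pixel layout is an assumption."""
    return width * height * channels * bytes_per_channel


def resident_bytes(width: int, height: int,
                   buffers_per_chain: int, chains: int) -> int:
    """Device memory held while `chains` kernel chains are in flight."""
    return buffer_bytes(width, height) * buffers_per_chain * chains


one_buffer = buffer_bytes(512, 512)                                    # 3 MiB
four_chains = resident_bytes(512, 512, buffers_per_chain=7, chains=4)  # 84 MiB
```

Under these assumptions a single chain pins 21 MiB, and keeping four chains in flight pins 84 MiB, which shows how quickly asynchronous processing consumes device memory.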
13. CAVEATS OF ASYNCHRONOUS OPENCL PROCESSING – CONTINUED
Dependencies Must Be Resolved in Advance
‒ For best performance all kernels in a chain should be enqueued together
‒ The state of all dependencies must be known before the first kernel is queued
‒ Difficult to track
‒ Compromise: only use OpenCL for Filters with simple linear dependencies
Kernel chaining and asynchronous execution provide excellent GPU utilization.
15. LARGE RADIUS IMAGE FILTERS
Several image processing operations require neighbor pixels. In AfterShot, image Filters fall into one of two categories:
‒ Normal: only requires the local Tile
‒ Large Radius: requires multiple Tiles
16. LARGE RADIUS IMAGE FILTERS ARE DIFFICULT
Large Radius AfterShot Filters are particularly difficult to implement in OpenCL
Large Radius filters will “break” kernel chaining
An extra layer of Intermediate Tiles must be resident, which will:
‒ Exhaust Device Memory, or
‒ Cause excessive bus transfers, hurting performance
And the solution is…
17. LARGE RADIUS FILTERS - NO
Don’t do it.
Large Radius filters are possible but at great development cost
Performance would ultimately depend on tricky optimizations
Large radius filters were left to run on the CPU
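The resulting split can be pictured as a planning pass over a Filter list: consecutive Normal Filters form one GPU kernel chain, and each Large Radius Filter breaks the chain and runs on the CPU. The sketch below assumes, for simplicity, that every Normal Filter has an OpenCL kernel; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class FilterKind { Normal, LargeRadius };

struct FilterStep {
    std::string name;
    FilterKind kind;
};

// A run is either one GPU kernel chain or a single CPU Filter.
struct Run {
    bool onGpu;
    std::vector<std::string> filters;
};

// Groups consecutive Normal Filters into GPU chains. Every Large Radius
// Filter ends the current chain and becomes a CPU run of its own.
std::vector<Run> planRuns(const std::vector<FilterStep>& steps) {
    std::vector<Run> runs;
    for (const FilterStep& step : steps) {
        assert(!step.name.empty());
        const bool onGpu = (step.kind == FilterKind::Normal);
        // Extend the previous run only when it and this step are both GPU work.
        if (onGpu && !runs.empty() && runs.back().onGpu) {
            runs.back().filters.push_back(step.name);
        } else {
            runs.push_back(Run{onGpu, {step.name}});
        }
    }
    return runs;
}
```

Each boundary between a GPU run and a CPU run is a point where Tile data has to cross between device and host memory, which is one reason a Large Radius Filter in the middle of a pipeline is costly.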
18. AFTERSHOT OPENCL RESULTS
Approximately 70% of image processing work was moved off of the CPU cores*
Batch processing speed improved by 3.5x*
Maintains 100% utilization on 8 CPU cores*
Only a mid-level GPU is required
Supported on Windows, Linux, and OS X
AfterShot Pro with OpenCL was a success
*measured on developer’s system
20. OPENCL 2.0 SHARED VIRTUAL MEMORY
OpenCL 2.0 introduces Shared Virtual Memory (SVM)
Basic [Coarse Grain] SVM
‒ Host and kernels can share pointers
Advanced [Fine Grain] SVM is available on some hardware
‒ Host and kernels can operate concurrently on the same memory
Fine Grain System SVM
‒ Kernels can access the entire address space of the host process, so they can read or write buffers allocated with malloc
‒ System SVM can greatly simplify buffer management in an OpenCL application
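The three tiers can be summarized as a small decision helper. This is a sketch with hypothetical names (`SvmSupport`, `SvmTier`); in real host code the three flags would come from querying `CL_DEVICE_SVM_CAPABILITIES` with `clGetDeviceInfo`.

```cpp
#include <cassert>

// Hypothetical summary of what a device reports for SVM.
struct SvmSupport {
    bool coarseGrainBuffer;
    bool fineGrainBuffer;
    bool fineGrainSystem;
};

enum class SvmTier { None, CoarseGrain, FineGrain, FineGrainSystem };

// Picks the most capable tier the device offers.
SvmTier bestTier(const SvmSupport& s) {
    if (s.fineGrainSystem) return SvmTier::FineGrainSystem;
    if (s.fineGrainBuffer) return SvmTier::FineGrain;
    if (s.coarseGrainBuffer) return SvmTier::CoarseGrain;
    return SvmTier::None;
}

// True when a kernel may dereference a pointer that came from plain malloc.
bool kernelCanReadMalloc(SvmTier tier) {
    return tier == SvmTier::FineGrainSystem;
}

// True when the host must map and unmap an SVM allocation around its own
// accesses instead of touching it while kernels are running.
bool hostMustMap(SvmTier tier) {
    assert(tier != SvmTier::None);
    return tier == SvmTier::CoarseGrain;
}
```

Only the System tier removes the need for dedicated OpenCL allocations altogether, which is why it matters for buffer management.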
22. RECONSIDERING LARGE RADIUS FILTERS
Large Radius OpenCL filters were dropped as an AfterShot feature. The reasons were both technical and resource related
Can System SVM make Large Radius AfterShot filters feasible? Signs point to yes
‒ No Device Memory required for Intermediate buffers
‒ Input streams from SVM, no buffer transfers
‒ Behavior more in line with Software [non-OpenCL] filters
‒ Dependencies could be resolved just as they would for a Software filter
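The access pattern these points describe looks like ordinary pointer arithmetic over the whole image. The functions below are plain C++ standing in for kernel code, not OpenCL C and not the presenter's implementation; with System SVM, a kernel could receive the same host `image` pointer directly (for example through `clSetKernelArgSVMPointer`). The single-channel, row-major layout and the function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Reads one pixel from a row-major, single-channel image, clamping the
// coordinates to the image edges.
float loadClamped(const float* image, int width, int height, int x, int y) {
    const int cx = std::min(std::max(x, 0), width - 1);
    const int cy = std::min(std::max(y, 0), height - 1);
    return image[cy * width + cx];
}

// Averages the (2 * radius + 1)^2 window centered on (x, y). The window may
// cross Tile boundaries freely because it indexes the whole image.
float boxAverage(const float* image, int width, int height,
                 int x, int y, int radius) {
    assert(image != nullptr && width > 0 && height > 0 && radius >= 0);
    float sum = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            sum += loadClamped(image, width, height, x + dx, y + dy);
        }
    }
    const int side = 2 * radius + 1;
    return sum / static_cast<float>(side * side);
}
```

Nothing here copies neighbor Tiles into a staging buffer first, which is the simplification the bullet points are after.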
23. LOCAL CONTRAST – A LARGE RADIUS AFTERSHOT FILTER
The next version of AfterShot Pro will contain a new Local Contrast filter.
‒ GPU accelerated on systems with OpenCL and SVM.
‒ Increases image contrast in detailed areas while leaving large constant areas unchanged
‒ The effect is achieved through a large radius Unsharp Mask (10-20% of the overall image width)
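The per-pixel arithmetic of an unsharp mask is short enough to show directly. The sketch uses the textbook formula; the slides do not say whether AfterShot uses exactly this form, and the function names, the integer-percent parameter, and the reading of the 10-20% figure as a radius are assumptions.

```cpp
#include <cassert>

// Blur radius in pixels for a given percentage of the image width.
int radiusFromWidth(int imageWidth, int percent) {
    assert(imageWidth > 0 && percent >= 0 && percent <= 100);
    return imageWidth * percent / 100;
}

// Textbook unsharp mask: push each pixel away from its blurred neighborhood.
// Where the image is locally constant, original == blurred and the pixel is
// left unchanged.
float unsharpMask(float original, float blurred, float amount) {
    return original + amount * (original - blurred);
}
```

For a hypothetical 6000-pixel-wide image, `radiusFromWidth(6000, 10)` is 600 pixels and `radiusFromWidth(6000, 20)` is 1200 pixels, so a single output pixel depends on input more than one 512-pixel Tile away.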
24. SETTING UP A KERNEL TO USE SVM MEMORY
25. LOADING SVM MEMORY FROM INSIDE THE KERNEL
26. LOCAL CONTRAST RESULTS
System SVM simplified Local Contrast
‒ No complicated buffer management
‒ No clever optimizations were required to hide Device memory transfers
‒ Additional memory pressure is similar to a software filter
Performance is good. The OpenCL code runs in ¼ the time of the optimized software filter*
*measured on developer’s system
27. THANK YOU
Questions