A graphics processing unit or GPU (also occasionally called a visual processing unit or VPU) is a specialized microprocessor that offloads and accelerates graphics rendering from the central (micro) processor. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. In a CPU, only a fraction of the chip performs computation, whereas a GPU devotes more of its transistors to data processing.
GPGPU is a programming methodology based on modifying algorithms to run on existing GPU hardware for increased performance. Unfortunately, GPGPU programming is significantly more complex than traditional programming for several reasons.
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS (csandit)
Graphics Processing Units (GPUs) have emerged as powerful parallel compute platforms for various application domains. A GPU consists of hundreds or even thousands of processor cores and adopts the Single Instruction, Multiple Threads (SIMT) architecture. Previously, we proposed an approach that optimizes the Tabu Search algorithm for solving the Permutation Flowshop Scheduling Problem (PFSP) on a GPU by using a math function to generate all different permutations, avoiding the need to place all the permutations in global memory. Building on that result, this paper proposes another approach that further improves performance by avoiding duplicated computation among threads, which occurs when two permutations share the same prefix. Experimental results show that the GPU implementation of our proposed Tabu Search for PFSP runs up to 1.5 times faster than another GPU implementation proposed by Czapinski and Barnes.
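The prefix-sharing idea can be made concrete with the standard permutation flowshop makespan recurrence, C[i][j] = max(C[i-1][j], C[i][j-1]) + p[π(i)][j]. Here C[i][j] is the completion time of the i-th scheduled job on machine j. Each row of C depends only on the previous row, so two permutations with the same prefix produce identical rows for that prefix. The Python sketch below is sequential and only illustrative. The 3-job, 2-machine processing times are hypothetical. `unrank` (factorial number system) is an assumed stand-in for the paper's "math function", which the abstract does not specify. `makespans_shared` caches rows by prefix, which is roughly the reuse a GPU version would arrange across threads.

```python
from typing import Dict, List, Sequence, Tuple


def unrank(index: int, n: int) -> List[int]:
    """Map an index in [0, n!) to the index-th permutation of 0..n-1 in
    lexicographic order (factorial number system). A thread could call this
    with its own index instead of reading a stored permutation table."""
    factorials = [1] * n  # factorials[k] == k!
    for k in range(1, n):
        factorials[k] = factorials[k - 1] * k
    remaining = list(range(n))
    perm = []
    for k in range(n - 1, -1, -1):
        digit, index = divmod(index, factorials[k])
        perm.append(remaining.pop(digit))
    return perm


def next_row(prev: Sequence[int], times: Sequence[int]) -> List[int]:
    """Completion times of the next job on each machine:
    C[i][j] = max(C[i-1][j], C[i][j-1]) + p[job][j]."""
    row: List[int] = []
    for j, t in enumerate(times):
        left = row[j - 1] if j > 0 else 0
        row.append(max(prev[j], left) + t)
    return row


def makespan(perm: Sequence[int], p: Sequence[Sequence[int]]) -> int:
    """Naive evaluation: recompute every row for this permutation."""
    row = [0] * len(p[0])
    for job in perm:
        row = next_row(row, p[job])
    return row[-1]


def makespans_shared(perms, p) -> Tuple[List[int], int]:
    """Evaluate many permutations, computing the row for each distinct
    prefix only once. Returns (makespans, number of rows computed)."""
    cache: Dict[Tuple[int, ...], List[int]] = {(): [0] * len(p[0])}
    rows_computed = 0
    spans = []
    for perm in perms:
        for k in range(1, len(perm) + 1):
            prefix = tuple(perm[:k])
            if prefix not in cache:
                # The shorter prefix is always cached already (k grows by one).
                cache[prefix] = next_row(cache[prefix[:-1]], p[prefix[-1]])
                rows_computed += 1
        spans.append(cache[tuple(perm)][-1])
    return spans, rows_computed


# Hypothetical instance: 3 jobs x 2 machines, p[job][machine].
p = [[3, 2], [1, 4], [2, 1]]
perms = [unrank(i, 3) for i in range(6)]
spans, rows = makespans_shared(perms, p)
print(spans, rows)  # rows == 15, versus 6 * 3 = 18 without sharing
```

With all six permutations of three jobs, the cache computes 3 + 6 + 6 = 15 rows instead of 18. The savings grow with the length of the shared prefixes.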
GPU programming
The Brick Wall -- UC Berkeley's View
Power Wall: power expensive, transistors free
Memory Wall: Memory slow, multiplies fast
ILP Wall: diminishing returns on more ILP HW
This presentation was given at the Computer Graphics seminar at the University of Tartu. It is an introductory overview of the GPGPU idea in general and gives "hello world" examples using old-school shader computing, OpenCL and CUDA. The code is available in my GitHub repository.
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization (AMD Developer Central)
Presentation HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu at the AMD Developer Summit (APU13) November 11-13, 2013.
Despite the growing number of deep learning practitioners and researchers, many of them do not use GPUs, which can lead to long training/evaluation cycles and impractical research.
In his talk, Lior shares how to get started with GPUs and some of the best practices that helped him during research and work. The talk is for everyone who works with machine learning (deep learning experience is NOT mandatory!). It covers the very basics of how a GPU works, CUDA drivers, IDE configuration, training, inference, and multi-GPU training.
A brief technical overview of GPU power consumption and performance, with references to NVIDIA's Maxwell architecture and the Maxwell-based Tegra X1.
Co-Author: Pietro Piscione (https://www.linkedin.com/pub/pietro-piscione/84/b37/926)
http://cs264.org
Abstract:
High-level scripting languages are in many ways polar opposites of
GPUs. GPUs are highly parallel, subject to hardware subtleties, and
designed for maximum throughput, and they offer a tremendous advance
in the performance achievable for a significant number of
computational problems. On the other hand, scripting languages such as
Python favor ease of use over computational speed and do not generally
emphasize parallelism. PyOpenCL and PyCUDA are two packages that
attempt to join the two together. By showing concrete examples, both
at the toy and the whole-application level, this talk aims to
demonstrate that by combining these opposites, a programming
environment is created that is greater than just the sum of its two
parts.
Speaker biography:
Andreas Klöckner obtained his PhD degree working with Jan Hesthaven at
the Department of Applied Mathematics at Brown University. He worked
on a variety of topics all aiming to broaden the utility of
discontinuous Galerkin (DG) methods. This included their use in the
simulation of plasma physics and the demonstration of their particular
suitability for computation on throughput-oriented graphics processors
(GPUs). He also worked on multi-rate time stepping methods and shock
capturing schemes for DG.
In the fall of 2010, he joined the Courant Institute of Mathematical
Sciences at New York University as a Courant Instructor. There, he is
working on problems in computational electromagnetics with Leslie
Greengard.
His research interests include:
- Discontinuous Galerkin and integral equation methods for wave
propagation
- Programming tools for parallel architectures
- High-order unstructured particle-in-cell methods for plasma simulation
This paper describes algorithms for performing database joins on a GPU. It is interesting work that may someday lead to implementing databases on GPGPU platforms such as CUDA.
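A hash join, the join algorithm most GPU join work builds on, runs in two phases. The build phase indexes one relation by its join key. The probe phase then looks up every tuple of the other relation. The minimal Python sketch below is sequential, and the relation names and tuples are hypothetical. On a GPU, the independent probe lookups are what would typically be spread across many threads, which this sketch does not model.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]


def hash_join(build: Sequence[Row], probe: Sequence[Row],
              build_key: int, probe_key: int) -> List[Tuple[Row, Row]]:
    """Equi-join two relations with a build/probe hash join."""
    # Build phase: index the (ideally smaller) build relation by its join key.
    table: Dict[Any, List[Row]] = {}
    for r in build:
        table.setdefault(r[build_key], []).append(r)
    # Probe phase: every probe tuple is looked up independently, which is
    # why this phase maps naturally onto many parallel threads.
    out: List[Tuple[Row, Row]] = []
    for s in probe:
        for r in table.get(s[probe_key], []):
            out.append((r, s))
    return out


customers = [(1, "Ann"), (2, "Bo")]            # (customer_id, name)
orders = [(10, 1), (11, 2), (12, 1), (13, 3)]  # (order_id, customer_id)
print(hash_join(customers, orders, build_key=0, probe_key=1))
```

Order 13 refers to a customer that does not exist, so it produces no output pair. The other three orders each match exactly one customer.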
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS (cseij)
In this paper we study the NVIDIA graphics processing unit (GPU), along with its computational power and applications. Although these units are specially designed for graphics applications, their computational power can also be employed for non-graphics applications. GPUs offer high parallel processing power, low computation cost, and short execution times, giving a good performance-per-energy ratio. Because a GPU is well suited to heavy computation that repeatedly applies a small set of similar instructions, deploying it for such work has played a significant role in reducing CPU overhead. The GPU has several key advantages over the CPU architecture, as it provides high parallelism, intensive computation, and significantly higher throughput. It consists of thousands of hardware threads that execute programs in a SIMD fashion, so the GPU can be an alternative to the CPU in high-performance and supercomputing environments. The bottom line is that GPU-based general-purpose computing is a hot research topic, and there is much to explore beyond graphics applications.
Optimizing Apple Lossless Audio Codec Algorithm using NVIDIA CUDA Architecture (IJECEIAES)
As the majority of compression algorithms are implemented for CPU architectures, the primary focus of our work was to exploit the opportunities for GPU parallelism in audio compression. This paper presents an implementation of the Apple Lossless Audio Codec (ALAC) algorithm using NVIDIA's Compute Unified Device Architecture (CUDA) framework. The core idea was to identify the areas where data parallelism could be applied, and then use the CUDA parallel programming model to execute the identified parallel components on CUDA's Single Instruction, Multiple Threads (SIMT) model. The dataset was retrieved from the European Broadcasting Union's Sound Quality Assessment Material (SQAM). Faster execution of the algorithm reduced execution time when coding large audio files. This paper also presents the reduction in power usage achieved by running the parallel components on the GPU. Experimental results show that we achieve about 80-90% speedup through CUDA on the identified components over their CPU implementation, while also saving CPU power.
The complexity of medical image reconstruction requires tens to hundreds of billions of computations per second. Until a few years ago, special-purpose processors designed specifically for such applications were used. Such processors require significant design effort and are thus difficult to change as new reconstruction algorithms evolve; they also have limited parallelism. Hence the demand for flexibility in medical applications motivated the use of stream processors with massively parallel architectures. Stream processing architectures offer data parallelism.
Highlighted notes on Hybrid Multicore Computing
Notes written while doing research work under Prof. Dip Banerjee and Prof. Kishore Kothapalli.
In this comprehensive report, Prof. Dip Banerjee describes the benefits of using both multicore systems (CPUs with vector instructions) and manycore systems (GPUs with a large number of low-speed ALUs). Such hybrid systems benefit several algorithms, because an accelerator cannot optimize every part of an algorithm (some computations are very regular, while others are very irregular).
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture
1. Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture
By Jiong He, Mian Lu, Bingsheng He
Presented by Mohamed Ragab Moawad
2. Umm.. GPU??
GPU: Graphics Processing Unit
CPU: Central Processing Unit
The GPU complements the CPU, which performs general-purpose processing, by handling graphics calculations more efficiently.
It can also accelerate video transcoding, image processing and other complex computations using GPGPU (general-purpose computing on graphics processing units).
E.g. CUDA (Compute Unified Device Architecture).
3. THEN AND NOW
Intel ASCI Red/9632 : 2,379 GFLOPS
- Fastest Supercomputer (World), 1999
PARAM PADMA : 1,000 GFLOPS
- Fastest Supercomputer (India), 2003
NVIDIA GTX 780 Ti : 5,046 GFLOPS (peak, single precision)
- Fastest GPU (General), 2013
GFLOPS: giga (10^9) floating-point operations per second, a measure of computer performance.
4. Memory is the boss? NO!
Card      Memory   Price        Computation power
GTX 680   2 GB     Rs. 41,195   3,090 GFLOPS
GT 430    4 GB     Rs. 4,520    269 GFLOPS
The GT 430 has twice the memory of the GTX 680 but less than a tenth of its computation power.
5. GPU VS CPU
• A GPU is tailored for highly parallel operation, while a CPU is designed primarily for fast serial execution.
• GPUs have significantly faster and more advanced memory interfaces, as they need to move far more data than CPUs.
• The CPU is optimized for sequential code performance.
• The GPU is specialized for compute-intensive, highly parallel computation.
• The GPU has evolved into a highly parallel, multithreaded, many-core processor with very high computational horsepower and very high memory bandwidth.
8. PCI-e
PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe, is a high-speed serial computer expansion bus standard designed to replace the older PCI, PCI-X, and AGP bus standards.
PCI-e has numerous improvements over these older standards, including higher maximum system bus throughput, a lower I/O pin count, a smaller physical footprint, and better performance scaling for bus devices.
9. PROBLEM !!!
• The relatively low bandwidth (compared with the GPU's own memory bandwidth) and high latency of the PCI-e bus are usually the bottleneck.
So many hardware vendors have attempted to remove this overhead with new architectures,
LIKE: the COUPLED CPU-GPU Architecture.
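To see why the bus can dominate on a discrete architecture, here is a back-of-the-envelope cost model. It is a sketch only: the function names are hypothetical, and the bandwidth, latency, and timing figures are made-up illustrations, not measurements from the paper. Off-loading pays off only when the GPU's compute savings outweigh the time spent copying data across PCI-e.

```python
def offload_time(bytes_in: float, bytes_out: float, bandwidth_bps: float,
                 latency_s: float, gpu_compute_s: float) -> float:
    """Estimated wall time for running a task on a discrete GPU over PCI-e.

    Each transfer (input to the GPU, result back) pays a fixed latency
    plus size / bandwidth; the GPU computes in between.
    """
    copy_in = latency_s + bytes_in / bandwidth_bps
    copy_out = latency_s + bytes_out / bandwidth_bps
    return copy_in + gpu_compute_s + copy_out


def should_offload(cpu_compute_s: float, **gpu_args: float) -> bool:
    """Off-load only if the GPU path, transfers included, beats the CPU."""
    return offload_time(**gpu_args) < cpu_compute_s


if __name__ == "__main__":
    GB = 1e9
    # Hypothetical figures: 1 GB in, 0.1 GB out, 8 GB/s bus, 10 us latency,
    # and a GPU that computes the task itself in 0.05 s.
    args = dict(bytes_in=1 * GB, bytes_out=0.1 * GB, bandwidth_bps=8 * GB,
                latency_s=10e-6, gpu_compute_s=0.05)
    print(f"GPU path: {offload_time(**args):.3f} s")   # GPU path: 0.188 s
    print(should_offload(0.5, **args))                  # True
    print(should_offload(0.1, **args))                  # False
```

With these illustrative numbers, nearly three quarters of the GPU path (about 0.14 s of 0.19 s) is spent on the bus, which is the overhead that integrating the CPU and GPU on one chip aims to remove.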
10. COUPLED CPU-GPU ARCH.
• The CPU and the GPU are integrated into a single chip, avoiding costly data transfers over the PCI-e bus.
• Examples: AMD APU, Intel Ivy Bridge (2012)
11. These new heterogeneous architectures potentially open up new optimization opportunities for GPU query co-processing.
Query co-processing can be classified by the granularity of its parallelism:
1. Fine-grained
2. Coarse-grained
3. Embarrassingly parallel
12. FINE-GRAINED, COARSE-GRAINED, AND EMBARRASSING PARALLELISM:
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many times per second; and it is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.
13. SO…
- In the discrete CPU-GPU architecture, coarse-grained co-processing is preferred to reduce data transfer over the PCI-e bus. Moreover, the GPU and the CPU have their own memory controllers and caches, so data cannot be shared between them cheaply.
- In the coupled CPU-GPU architecture, fine-grained co-processing becomes feasible.
14. OPENCL
Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), and other processors.
The advantage of OpenCL is that the same OpenCL code can run on both the CPU and the GPU without modification.
Previous studies have shown that OpenCL implementations achieve performance very close to implementations in native languages, namely CUDA on the GPU and OpenMP on the CPU.
OpenCL can be used to give an application access to a graphics processing unit for non-graphical computing (general-purpose computing on graphics processing units).
15. GPGPU
General-purpose computing on graphics processing units (GPGPU, rarely GPGP or GP²U) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). Any GPU providing a functionally complete set of operations on arbitrary bits can compute any computable value.
Additionally, using multiple graphics cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.
OpenCL is currently the dominant open general-purpose GPU computing language. The dominant proprietary framework is Nvidia's CUDA.
16. HASH JOIN CO-PROCESSING
On the coupled architecture, co-processing should be fine-grained, with the workloads scheduled carefully between the CPU and the GPU.
Moreover, we need to consider memory-specific optimizations for the shared cache architecture and the memory systems exposed by OpenCL.
17. ARCHITECTURE-AWARE HASH JOINS
Hash joins are considered among the most efficient join algorithms for main-memory databases.
Two main types of hash joins:
1. Simple Hash Join (SHJ).
2. Partitioned Hash Join (PHJ).
18. FINE-GRAINED STEPS IN HASH JOINS
A hash join operator works on two input relations, R and S. We assume that |R| < |S|. A typical hash join algorithm has three phases: partition, build, and probe. The partition phase is optional; the simple hash join does not have one.
In SHJ, the build phase constructs an in-memory hash table for R. Then, in the probe phase, it looks up the hash table for matching entries for each tuple in S. The build and probe phases are each divided into four steps, b1 to b4 and p1 to p4, respectively.
A hash table consists of an array of bucket headers. Each bucket header contains two fields: the total number of tuples within that bucket and a pointer to a key list. The key list contains all the unique keys with the same hash value, each of which links to a rid (record ID) list storing the IDs of all tuples with that key.
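The hash table layout described above (bucket headers, key lists, rid lists) can be sketched in Python as follows. This is a minimal single-threaded illustration under stated assumptions: the class and method names (`KeyHeader`, `BucketHeader`, `HashTable.insert`, `HashTable.lookup`) are hypothetical, and a real OpenCL implementation would use flat arrays and offsets rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyHeader:
    key: int
    rids: List[int] = field(default_factory=list)        # rid list: IDs of tuples with this key


@dataclass
class BucketHeader:
    count: int = 0                                        # total number of tuples in the bucket
    keys: List[KeyHeader] = field(default_factory=list)  # key list: unique keys in the bucket


class HashTable:
    """Array of bucket headers -> key lists -> rid lists (illustrative only)."""

    def __init__(self, num_buckets: int) -> None:
        self.buckets = [BucketHeader() for _ in range(num_buckets)]

    def _bucket(self, key: int) -> BucketHeader:
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key: int, rid: int) -> None:
        bucket = self._bucket(key)
        bucket.count += 1
        for header in bucket.keys:          # reuse the key header if the key already exists
            if header.key == key:
                header.rids.append(rid)
                return
        bucket.keys.append(KeyHeader(key, [rid]))   # otherwise create a new key header

    def lookup(self, key: int) -> List[int]:
        for header in self._bucket(key).keys:
            if header.key == key:
                return header.rids
        return []


if __name__ == "__main__":
    ht = HashTable(num_buckets=4)
    for rid, key in enumerate([7, 3, 7, 11]):   # in CPython, 7, 3 and 11 all land in bucket 3
        ht.insert(key, rid)
    print(ht.lookup(7))            # [0, 2]
    print(ht.buckets[3].count)     # 4
```

Keeping duplicate keys in one key header with a growing rid list means the probe side compares each distinct key only once per bucket, however many build tuples share it.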
19. SHJ ALGORITHM:
Algorithm 1 Fine-grained steps in SHJ
/*build*/
for each tuple in R do
    (b1) compute hash bucket number;
    (b2) visit the hash bucket header;
    (b3) visit the hash key lists and create a key header if necessary;
    (b4) insert the record id into the rid list;
/*probe*/
for each tuple in S do
    (p1) compute hash bucket number;
    (p2) visit the hash bucket header;
    (p3) visit the hash key lists;
    (p4) visit the matching build tuple to compare keys and produce output tuple;
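Algorithm 1 can be turned into a runnable single-threaded Python sketch that keeps the step labels b1 to b4 and p1 to p4 as comments. Assumptions: relations are lists of `(key, payload)` tuples, a tuple's record id is its position in R, each bucket is modeled as a dictionary from key to rid list, and `num_buckets` is an arbitrary hypothetical parameter. On the coupled architecture, each step would instead run as a data-parallel OpenCL kernel.

```python
from typing import Dict, List, Tuple

Relation = List[Tuple[int, str]]   # (join key, payload); rid = index in the list


def simple_hash_join(R: Relation, S: Relation, num_buckets: int = 1024):
    # Each bucket maps a unique key to its rid list (key list + rid lists);
    # the per-bucket tuple count is omitted for brevity.
    buckets: List[Dict[int, List[int]]] = [dict() for _ in range(num_buckets)]

    # Build phase over the smaller relation R.
    for rid, (key, _) in enumerate(R):
        b = hash(key) % num_buckets          # (b1) compute hash bucket number
        bucket = buckets[b]                  # (b2) visit the hash bucket header
        rids = bucket.setdefault(key, [])    # (b3) visit key list, create key header if needed
        rids.append(rid)                     # (b4) insert the record id into the rid list

    # Probe phase over S.
    out = []
    for key, s_payload in S:
        b = hash(key) % num_buckets          # (p1) compute hash bucket number
        bucket = buckets[b]                  # (p2) visit the hash bucket header
        for rid in bucket.get(key, []):      # (p3) visit the hash key lists
            r_key, r_payload = R[rid]        # (p4) visit the matching build tuple,
            if r_key == key:                 #      re-check the key (always true here,
                out.append((key, r_payload, s_payload))  # kept to mirror p4), emit output
    return out


if __name__ == "__main__":
    R = [(1, "r0"), (2, "r1"), (2, "r2")]
    S = [(2, "s0"), (3, "s1"), (1, "s2"), (2, "s3")]
    print(simple_hash_join(R, S))
    # [(2, 'r1', 's0'), (2, 'r2', 's0'), (1, 'r0', 's2'), (2, 'r1', 's3'), (2, 'r2', 's3')]
```

Output order follows S, because each probe tuple emits all of its matches before the next probe tuple is read.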
20. PHJ ALGORITHM:
Main Procedure for PHJ:
/*Partitioning: perform multiple passes if necessary*/
Partition(R);
Partition(S);
/*Apply SHJ on each partition pair*/
for each partition pair Ri and Si do
    Apply SHJ on Ri and Si;
Procedure: Partition(R)
for each tuple in R do
    (n1) compute partition number;
    (n2) visit the partition header;
    (n3) insert the <key, rid> into partition;
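A minimal Python sketch of PHJ follows, assuming a single partitioning pass (the slide allows multiple passes) and a hypothetical `num_parts` parameter. Both relations are partitioned with the same hash function, so matching keys always land in partitions with the same index; each partition pair is then joined with a small dictionary-based hash join standing in for SHJ.

```python
from collections import defaultdict
from typing import List, Tuple

Relation = List[Tuple[int, str]]   # (join key, payload); rid = index in the list


def partition(rel: Relation, num_parts: int) -> List[List[Tuple[int, int]]]:
    parts: List[List[Tuple[int, int]]] = [[] for _ in range(num_parts)]
    for rid, (key, _) in enumerate(rel):
        p = hash(key) % num_parts            # (n1) compute partition number
        part = parts[p]                      # (n2) visit the partition header
        part.append((key, rid))              # (n3) insert <key, rid> into the partition
    return parts


def partitioned_hash_join(R: Relation, S: Relation, num_parts: int = 8):
    r_parts, s_parts = partition(R, num_parts), partition(S, num_parts)
    out = []
    for Ri, Si in zip(r_parts, s_parts):     # join each partition pair independently
        table = defaultdict(list)            # build on Ri
        for key, rid in Ri:
            table[key].append(rid)
        for key, s_rid in Si:                # probe with Si (get() does not insert)
            for r_rid in table.get(key, []):
                out.append((key, R[r_rid][1], S[s_rid][1]))
    return out


if __name__ == "__main__":
    R = [(1, "r0"), (2, "r1"), (2, "r2")]
    S = [(2, "s0"), (3, "s1"), (1, "s2"), (2, "s3")]
    print(sorted(partitioned_hash_join(R, S)))
    # [(1, 'r0', 's2'), (2, 'r1', 's0'), (2, 'r1', 's3'), (2, 'r2', 's0'), (2, 'r2', 's3')]
```

Partitioning lets each per-partition hash table stay small (for example, cache-sized), and the partition pairs are independent of one another, so they can be joined in parallel.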
21. REVISITING CO-PROCESSING MECHANISMS
Off-loading (OL):
Previous work proposed off-loading heavy operators such as joins to the GPU while the other operators in the query remain on the CPU.
The basic idea of OL on a step series is that the GPU is treated as a powerful, massively parallel query co-processor, and each step is evaluated entirely by either the GPU or the CPU.
Query processing on the CPU waits until the off-loaded computation completes on the GPU, and vice versa. That is, given a step series s1, ..., sn, we need to decide whether each si is performed on the CPU or the GPU.
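Because OL runs each step entirely on one device while the other waits, its total time is the sum of the chosen per-step times. The sketch below picks the faster device for each step. The per-step costs are hypothetical numbers, chosen only to echo the later observation that b1 favours the GPU and b3 favours the CPU. Data-transfer costs are ignored, which roughly matches the coupled architecture, where no PCI-e copy is needed.

```python
from typing import Dict, List, Tuple

# Hypothetical per-step costs in ms for the whole input: (cpu_ms, gpu_ms).
BUILD_STEPS: Dict[str, Tuple[float, float]] = {
    "b1": (40.0, 10.0),   # hashing: GPU faster
    "b2": (20.0, 15.0),
    "b3": (30.0, 60.0),   # key-list traversal: CPU faster
    "b4": (25.0, 20.0),
}


def offload_plan(steps: Dict[str, Tuple[float, float]]) -> Tuple[List[Tuple[str, str]], float]:
    """Assign each step wholly to its faster device; the devices never overlap."""
    plan: List[Tuple[str, str]] = []
    total = 0.0
    for name, (cpu_ms, gpu_ms) in steps.items():
        device = "GPU" if gpu_ms < cpu_ms else "CPU"
        plan.append((name, device))
        total += min(cpu_ms, gpu_ms)          # the idle device contributes nothing
    return plan, total


if __name__ == "__main__":
    plan, total = offload_plan(BUILD_STEPS)
    print(plan)    # [('b1', 'GPU'), ('b2', 'GPU'), ('b3', 'CPU'), ('b4', 'GPU')]
    print(total)   # 75.0
```

Even with the best per-step choice, one processor is always idle, which is the under-utilization that motivates data dividing.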
22. REVISITING CO-PROCESSING MECHANISMS
Data dividing (DD):
Problem: OL could under-utilize the CPU while the off-loaded computations are executing on the GPU, and vice versa.
Moreover: because the performance gap between the GPU and the CPU is smaller on the coupled architecture than on discrete architectures, we need to keep both the CPU and the GPU busy to further improve performance.
So: we can model the CPU and the GPU as two independent processors, and the problem becomes scheduling the workload onto them. This problem has its roots in parallel query processing. One of the most commonly used schemes is to partition the input data among processors, perform query processing in parallel on the individual processors, and merge the partial results into the final result. We adopt this scheme as the data-dividing co-processing scheme (DD) on the coupled architecture.
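A simple cost model makes the DD trade-off concrete. Assume, hypothetically, that the CPU and the GPU process one tuple in `c_cpu` and `c_gpu` time units respectively, regardless of which tuples they receive. If a fraction r of the N input tuples goes to the CPU, the join finishes when the slower side does: T(r) = max(r·N·c_cpu, (1 − r)·N·c_gpu). This is minimized when both sides finish together, at r* = c_gpu / (c_cpu + c_gpu). The cost of merging the partial results is ignored here.

```python
def dd_time(r: float, n: int, c_cpu: float, c_gpu: float) -> float:
    """Finish time when a fraction r of n tuples goes to the CPU and the rest to the GPU."""
    return max(r * n * c_cpu, (1 - r) * n * c_gpu)


def dd_best_ratio(c_cpu: float, c_gpu: float) -> float:
    """CPU share that makes both processors finish at the same time."""
    return c_gpu / (c_cpu + c_gpu)


if __name__ == "__main__":
    n, c_cpu, c_gpu = 1_000_000, 3.0, 1.0    # hypothetical: GPU 3x faster per tuple
    r = dd_best_ratio(c_cpu, c_gpu)
    print(r)                                  # 0.25
    print(dd_time(r, n, c_cpu, c_gpu))        # 750000.0
    print(dd_time(0.0, n, c_cpu, c_gpu))      # 1000000.0 (GPU only)
```

Even though the CPU is three times slower per tuple in this example, giving it a quarter of the input cuts the finish time by 25% compared with using the GPU alone.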
23. PIPELINED EXECUTION (PL).
To address the limitations of OL and DD, we consider fine-grained workload scheduling between the CPU and the GPU so that we can capture their performance differences in processing the same workload. For example, the GPU is much more efficient than the CPU on b1 and p1, whereas the CPU is more efficient on b3 and p3. Meanwhile, we should keep both processors busy. Therefore, we leverage the concept of pipelined execution and develop an adaptive fine-grained co-processing scheme that maximizes the efficiency of co-processing on the coupled architecture.
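The core idea behind PL, splitting every step between the two processors according to how well each handles that step, can be illustrated with the same per-tuple cost model used for DD. This is a simplified, hypothetical model, not the paper's scheduler. For each step it picks the CPU share that balances both processors, and it ignores pipeline start-up and synchronization overheads. The function names and per-step costs are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-tuple costs (cpu, gpu) for build steps b1..b4.
STEPS: List[Tuple[str, float, float]] = [
    ("b1", 4.0, 1.0),   # hashing: GPU much faster
    ("b2", 2.0, 1.5),
    ("b3", 1.0, 3.0),   # key-list traversal: CPU faster
    ("b4", 2.0, 2.0),
]


def balanced_time(n: int, c_cpu: float, c_gpu: float) -> float:
    """Time for n tuples when the CPU/GPU split equalizes both finish times."""
    return n * c_cpu * c_gpu / (c_cpu + c_gpu)


def pl_plan(n: int, steps: List[Tuple[str, float, float]]) -> Tuple[Dict[str, float], float]:
    """Per-step CPU shares and total time when every step is split separately."""
    shares = {name: c_gpu / (c_cpu + c_gpu) for name, c_cpu, c_gpu in steps}
    total = sum(balanced_time(n, c_cpu, c_gpu) for _, c_cpu, c_gpu in steps)
    return shares, total


def dd_total(n: int, steps: List[Tuple[str, float, float]]) -> float:
    """A single split for the whole step series, as in data dividing."""
    cpu = sum(c for _, c, _ in steps)
    gpu = sum(g for _, _, g in steps)
    return balanced_time(n, cpu, gpu)


if __name__ == "__main__":
    shares, pl = pl_plan(1000, STEPS)
    print({k: round(v, 2) for k, v in shares.items()})
    # {'b1': 0.2, 'b2': 0.43, 'b3': 0.75, 'b4': 0.5}
    print(round(pl), round(dd_total(1000, STEPS)))   # 3407 4091
```

With these made-up costs, the CPU takes 20% of the tuples in b1 but 75% in b3, and the per-step split finishes in about 3407 time units versus about 4091 for a single DD-style split. In this model, a per-step split can never be slower than one global split, because the balanced time n·a·b/(a + b) is superadditive in the per-step costs.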