1. Introduction to OpenCL
How to select OpenCL devices, initialise a compute context, allocate device memory,
compile and run kernels, output results
OpenCL Workshop | December 1, 2010 | Brisbane, Australia
Tomasz Bednarz, CESRE
2. Welcome to Open Computing Language (OpenCL™)
• N-Body Simulation Demo
• Khronos Group and OpenCL standard
• OpenCL Anatomy
• Platform Model
• Execution Model
• Memory Model
• Short Introduction to OpenCL Programming
• OpenCL C language
• Supported data types
• Synchronisation primitives
• Additional information and resources
OpenCL is a trademark of Apple, Inc.
CSIRO. Introduction to OpenCL. OpenCL Workshop at the OzViz 2010, Brisbane, December 2010.
4. N-Body Simulation
Lars Nyland, Mark Harris, Jan Prins “Fast N-Body Simulation with CUDA”. In Hubert
Nguyen, editor, GPU Gems 3, chapter 31, pages 677-695, Addison Wesley 2007.
• Applications
  • Molecular dynamics
  • Astronomical and astrophysical simulations
  • Fluid dynamics simulation
  • Radiosity (Radiometric transfer)
• N² interactions to compute per time-step
  • For the brute force all-pairs approach discussed here
• Highly parallel
• High arithmetic intensity
[Figure: two galaxies attracting each other.]
5. N-Body Simulation (http://developer.nvidia.com/gpugems3)
• N-Body simulation models the motion of particles subject to a force due to the particle-particle interactions between all particles in the system
• Typical example: simulation of stars in a galaxy subject to the gravitational force
• Given N bodies with an initial position x_i and velocity v_i for 1 ≤ i ≤ N, the force f_ij on body i caused by its gravitational attraction to body j is given by the following:
$$f_{ij} = G \, \frac{m_i m_j}{\lVert r_{ij} \rVert^{2}} \cdot \frac{r_{ij}}{\lVert r_{ij} \rVert}$$
$$F_i = \sum_{\substack{1 \le j \le N \\ j \ne i}} f_{ij} = G \, m_i \sum_{\substack{1 \le j \le N \\ j \ne i}} \frac{m_j \, r_{ij}}{\lVert r_{ij} \rVert^{3}}$$
where m_i and m_j are the masses of bodies i and j, and r_ij = x_j − x_i.
• The acceleration is computed as:
$$a_i = \frac{F_i}{m_i}$$
6. N-Body Simulation
• As bodies approach each other, the force between them grows without bound, therefore a softening factor ε² > 0 may be added
$$F_i \approx G \, m_i \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}}$$
• The softening factor limits the magnitude of the force between the bodies, which is desirable for numerical integration of the system state
• Acceleration:
$$a_i = \frac{F_i}{m_i} \approx G \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}}$$
7. N-Body Simulation: parallel concept
[Figure: grid of pairwise interactions. Outer loop (i) runs over particle i, inner loop (j) over particle j; each cell is a single interaction between i and j.]
• Particles i, j interact with each other
• OpenCL can be used to compute acceleration on all bodies in parallel
• N/p work groups of p work items process p bodies at a time
• Every work item loads all other body positions from off-chip memory
• N² loads … bandwidth bound = poor performance
• Optimization (using tiles) to be presented in the afternoon session
8. N-Body Simulation: body-body force calculation
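$$F_i \approx G \, m_i \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}}$$
$$a_i = \frac{F_i}{m_i} \approx G \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left( \lVert r_{ij} \rVert^{2} + \varepsilon^{2} \right)^{3/2}}$$
http://developer.download.nvidia.com/compute/opencl/sdk/website/samples.html#oclNbody
http://developer.apple.com/library/mac/#samplecode/OpenCL_NBody_Simulation_Example/Introduction/Intro.html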
13. What is OpenCL?
OpenCL - Open Computing Language: open, royalty-free standard for programming heterogeneous parallel computing at the intersection of GPU and multi-core CPU capabilities.
• CPUs: multiple cores driving performance increases. Multi-processor programming, threading libraries - e.g. OpenMP
• GPUs: increasingly general purpose data-parallel computing. Graphics APIs and Shading Languages, Vendor Compute APIs
• Emerging Intersection: Heterogeneous Computing
Courtesy of http://www.khronos.org/opencl/
14. What is OpenCL?
OpenCL – the center of a visual computing ecosystem with parallel computations, 3D, video, audio, and image processing on desktop, embedded and mobile systems.
• Roadmap convergence: OpenGL 4.0 and OpenGL ES 2.0 are both streamlined, programmable pipelines. GL and ES working groups are working on convergence. WebGL is a positive pressure for portable 3D content for all platforms.
• Desktop Visual Computing: OpenGL and OpenCL have direct interoperability. OpenCL objects can be created from OpenGL Textures, Buffer Objects and Renderbuffers.
• Mobile Visual Computing: Compute, graphics and AV APIs interoperate through EGL. Streamlined APIs for mobile and embedded graphics, media and compute acceleration.
• Hundreds of man-years invested by industry experts in a coordinated ecosystem.
[Diagram labels: Desktop 3D Ecosystem; Cross-platform desktop 3D; 3D for Web; Heterogeneous Parallel Programming; Parallel computing and visualisation; Embedded 3D; Surface and synch abstraction; Streaming Media and Image Processing.]
Based on http://www.khronos.org/opencl/
15. OpenCL Timeline
• OpenCL 1.0 was released six months after the proposal was created
• OpenCL ships first on Apple's Mac OS X Snow Leopard
• 18 month cadence between OpenCL 1.0 and OpenCL 1.1
• Backward compatible to protect software investment
Timeline:
• June 2008: OpenCL working group is proposed by Apple. Draft spec is contributed to Khronos.
• December 2008: Khronos publicly releases OpenCL 1.0 as royalty-free specification.
• May 2009: Khronos releases OpenCL 1.0 conformance tests to ensure high-quality implementations.
• 2nd Half 2009: Multiple conformant implementations ship across diverse OS and platforms.
• June 2010: OpenCL 1.1 spec is released and first implementations ship.
Based on http://www.khronos.org/opencl/
17. Design goals of OpenCL
• Enable all compute resources in system
• CPUs, GPUs, and other processors enabled as peers
• Data- and task-parallel compute model
• Efficient parallel programming model
• ANSI C99 based kernel language
• Low-level abstraction
• Abstracts the specifics of the underlying hardware
• High-performance, but device independent
• Define precision requirements for all floating-point computations
• Consistent results on all platforms and devices
• Interoperability with Graphics APIs
• Dedicated support for OpenGL, OpenGL ES and DirectX
• Drive future hardware requirements
• Applicable to both consumer and HPC applications
19. It’s a heterogeneous world
• Platform model encapsulates compute resources
• A modern platform includes:
  • One or more CPUs
  • One or more GPUs
  • Optional accelerators (e.g. DSPs)
  • Other?
Using OpenCL, programmers write a single portable program that uses ALL resources in the heterogeneous platform.
Based on http://www.khronos.org/opencl/
20. OpenCL Platform Model
• One Host connected to one or more Compute Devices
• A Compute Device can be a CPU, GPU or other processor
• Each Compute Device is composed of one or more Compute Units
• A Compute Unit may be a core, multi-processor, etc.
• Each Compute Unit is further divided into one or more Processing Elements
• Processing Elements execute code as SIMD or SPMD
[Diagram: a HOST connected to several COMPUTE DEVICEs; each device contains COMPUTE UNITs, and each unit contains PROCESSING ELEMENTs.]
21. Anatomy of OpenCL Application
OpenCL Application
• Device Code
- Written in OpenCL C
- Executes on the device
• Host Code
- Written in C/C++
- Executes on the host
[Diagram: a HOST connected to COMPUTE DEVICES, each made up of COMPUTE UNITs.]
• Host code sends commands to the Devices:
• To transfer data between host memory and device memories
• To execute device code
22. Anatomy of OpenCL Application
• Serial code executes in a Host (CPU) thread
• Parallel code executes in many Device (GPU) threads across multiple processing elements
[Diagram: an OCL Application alternates between serial code (Host = CPU) and parallel code (Device = GPU).]
24. OpenCL Execution Model
• OpenCL application runs on a Host which submits work to the Compute Devices
• Work item: the basic unit of work on an OpenCL device
• Kernel: the code for a work item, which is basically a C function
• Program: collection of kernels and other functions (analogous to a dynamic library). Managed by host.
• Context: the environment within which work-items execute, which includes devices and their memories and command queues (contains all resources for computation)
• Command queue: a queue used by the Host application to submit work to a Device (kernel execution instances)
• Work is queued in-order, one queue per device
• Work can be executed in-order or out of order
• Events are used for synchronisation
[Diagram: a CONTEXT containing the GPU and CPU devices, their MEMORY, and the QUEUES that carry COMMANDS to them.]
25. OpenCL Execution Model
• Portable execution model that allows a kernel to execute at each point in a problem domain (N-dimensional computational domain) → decomposition of a task into work-items

Traditional loop as a function in C:

void
addVector(const float *A,
          const float *B,
          float *C,
          int N)
{
    int index;

    for (index = 0; index < N; index++)
        C[index] = A[index] + B[index];
}

OpenCL C kernel:

__kernel void
addVector(__global const float *A,
          __global const float *B,
          __global float *C,
          int N)
{
    int index = get_global_id(0);

    if (index < N)
        C[index] = A[index] + B[index];
}

Work item: the basic unit of work on an OpenCL device
Kernel: the code for a work item, which is basically a C function
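To make the loop-to-kernel decomposition concrete, the launch can be emulated on the host in plain C. This is only a serial sketch under stated assumptions: `add_vector_item` and `emulate_ndrange` are hypothetical names, and `gid` stands in for the value `get_global_id(0)` would return. A real device runs the work-items in parallel and in no guaranteed order.

```c
#include <stddef.h>

/* Plain-C stand-in for the kernel body. `gid` plays the role of
 * get_global_id(0); this is an emulation, not OpenCL code. */
static void add_vector_item(size_t gid, const float *A, const float *B,
                            float *C, size_t N)
{
    if (gid < N)                 /* guard: surplus work-items do nothing */
        C[gid] = A[gid] + B[gid];
}

/* Serial emulation of a 1D NDRange launch: one call per work-item. */
void emulate_ndrange(size_t global_size, const float *A, const float *B,
                     float *C, size_t N)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        add_vector_item(gid, A, B, C, N);
}
```

The guard matters whenever the global size is rounded up past N. For example, launching 8 work-items over 5 elements leaves the last three calls as no-ops rather than out-of-bounds writes.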
26. Kernel Execution on Platform Model
• Each work-item is executed by a compute element
• Each work-group is executed on a compute unit
• Several concurrent work-groups can reside on one compute unit depending on the work-group's memory requirements and the compute unit's memory resources
• Each kernel is executed on a compute device
[Diagram: Work-Item → Compute element; Work-Group → Compute unit; Kernel execution instance → Compute device.]
27. Benefits of Work-Groups
• Automatic scalability across devices with different numbers of compute units
• Work-groups can execute in any order, concurrently or sequentially
• Efficient cooperation between work-items of same work-group
• Fast shared memory and synchronization
• Independence between work-groups gives scalability:
• A kernel scales across any number of compute units
[Diagram: one kernel launch of work-groups 0 to 7, distributed over a device with 2 compute units (Unit 0, Unit 1) and over a device with 4 compute units (Unit 0 to Unit 3).]
28. Work-group synchronisation
• Always define the best N-dimensional index space (NDRange) for your algorithms (currently 1D, 2D and 3D index spaces are supported)
• Kernels are executed across a global domain of work-items
• Work-items are single points of execution and are grouped into local work-groups
• Global Dimensions: 1024x1024 (whole problem space)
• Local Dimensions: 32x32 (work-group)
• Synchronisation between work-items is possible only within work-groups: barriers and memory fences
• Cannot synchronise outside of work-groups
29. Work-items and work-groups
• A kernel is a function executed in each point of a problem domain (for each work-item)
• Number of work items = 4096 (16 work-groups, 256 work-items each):

__kernel void
addVector(__global const float *A,
          __global const float *B,
          __global float *C,
          int N)
{
    int index = get_global_id(0);

    if (index < N)
        C[index] = A[index] + B[index];
}

[Diagram: a 1D NDRANGE on the DEVICE split into WORK GROUPs 0 … 15, each holding WORK ITEMs 0 … 255. Labels shown: get_global_size(0) = 4096, get_num_groups(0) = 16, get_local_size(0) = 256, get_group_id(0) = 2, get_local_id(0) = 255, get_global_id(0) = 1792.]
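The relationship between the 1D indices can be sketched in plain C. The helper names below are hypothetical; they mirror what the OpenCL built-ins report when the global work offset is zero. With 256 work-items per group, global id 1792 is the first work-item of group 7, and work-item 255 of group 2 has global id 767.

```c
#include <stddef.h>

/* Hypothetical helper type: the pair (get_group_id(0), get_local_id(0)). */
typedef struct { size_t group_id, local_id; } WorkItemIndex;

/* Split a global id into its work-group and its position in that group. */
WorkItemIndex split_global_id(size_t global_id, size_t local_size)
{
    WorkItemIndex w;
    w.group_id = global_id / local_size;   /* which work-group */
    w.local_id = global_id % local_size;   /* position inside it */
    return w;
}

/* Inverse mapping: global id = group id * local size + local id. */
size_t join_global_id(size_t group_id, size_t local_id, size_t local_size)
{
    return group_id * local_size + local_id;
}
```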
30. Work-items and work-groups in 2D
• Number of work items to execute: 128 x 128 = 16384 (a kernel is executed in each point of a problem domain)
[Diagram: a 2D NDRANGE on the DEVICE divided into WORK GROUPs indexed by (get_group_id(0), get_group_id(1)), from 0,0 to 7,7. Each work-group contains WORK ITEMS indexed by (get_local_id(0), get_local_id(1)) and has get_local_size(0) x get_local_size(1) items. The whole range spans get_global_size(0) x get_global_size(1), and each work-item is identified by (get_global_id(0), get_global_id(1)).]
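The 2D layout can be checked with a small host-side walk over groups and local ids. This is a sketch in plain C, not OpenCL: the function name and the `seen` array are assumptions for illustration, the global offset is taken as zero, and the 16 x 16 work-group size used in the test is inferred from the figure's 8 x 8 grid of groups over a 128 x 128 range.

```c
#include <stddef.h>

/* Walks a 2D NDRange group by group, then item by item inside each group.
 * Increments seen[y * global_w + x] for every visited point and returns the
 * number of work-items visited. Global sizes are assumed to be exact
 * multiples of the local sizes. */
size_t visit_ndrange_2d(size_t global_w, size_t global_h,
                        size_t local_w, size_t local_h,
                        unsigned char *seen)
{
    size_t count = 0;
    for (size_t gy = 0; gy < global_h / local_h; ++gy)          /* get_group_id(1) */
        for (size_t gx = 0; gx < global_w / local_w; ++gx)      /* get_group_id(0) */
            for (size_t ly = 0; ly < local_h; ++ly)             /* get_local_id(1) */
                for (size_t lx = 0; lx < local_w; ++lx) {       /* get_local_id(0) */
                    size_t x = gx * local_w + lx;               /* get_global_id(0) */
                    size_t y = gy * local_h + ly;               /* get_global_id(1) */
                    seen[y * global_w + x] += 1;
                    ++count;
                }
    return count;
}
```

Every point of the problem domain is visited exactly once, which is the property a kernel relies on when it uses the global id as its array index.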
32. OpenCL Memory Model
• Address spaces
  • Private: read/write access for work-item only
  • Local: read/write access for entire work-group
  • Global/Constant: visible to all work-groups
  • Host: accessible by the CPU
• Synchronisation
  • All synchronisation for all memory accesses must be done explicitly
• Memory management is explicit
  • You must move data from host → global → local … and back
[Diagram: a Compute Device holds Compute Units 1 … N. Each unit has its own Local Memory and PEs running Work Items 1 … J, each with Private Memory. All units share Global/Constant Memory; the Host has its own Host Memory.]
33. OpenCL Programming
• How to define the platform
• How to execute code on the platform
• How to move data around in memory
• How to write (and build) programs
35. OpenCL Language and API Highlights
• Platform Layer API (called from host)
• Abstraction layer for diverse computational resources
• Query, select and initialise compute devices
• Create compute contexts and work-queues
• Runtime API (called from host)
• Launch compute kernels
• Set kernel execution configuration
• Manage scheduling, compute, and memory resources
• OpenCL language
• To write C-based compute kernels for execution on a compute device
• Includes rich set of built-in functions
• Can be compiled JIT/Online or offline
36. OpenCL Language Highlights
• Function qualifiers
• __kernel qualifier declares a function as a kernel
• Address space qualifiers
• __global, __local, __constant, __private
• Work-item functions
• get_work_dim(), get_global_id(), get_local_id(), get_group_id(), get_local_size()
• Image functions
• Images must be accessed through built-in functions
• Reads/writes performed through sampler objects from host or defined in source
• Synchronisation functions
• Barriers – all work-items within a work-group must execute the barrier function before any work-item in the work-group can continue

__kernel void
addVector(__global const float *A,
          __global const float *B,
          __global float *C,
          int N)
{
    int index = get_global_id(0);

    if (index < N)
        C[index] = A[index] + B[index];
}
37. OpenCL Framework: Overview
• Platform layer: platform query and context creation
• Compiler for OpenCL C
• Runtime: memory management and command execution within a context
[Diagram: a CONTEXT spanning a CPU and a GPU holds PROGRAMS (kernel source compiled into a CPU binary and a GPU binary), KERNELS (addVector with arg[0], arg[1] and arg[2] values), MEMORY OBJECTS (BUFFERS, IMAGES) and COMMAND QUEUES (IN ORDER QUEUE, OUT OF ORDER QUEUE). Workflow: COMPILE CODE → CREATE ARGS AND DATA → SEND TO EXECUTION on the COMPUTE DEVICE.]

__kernel void
addVector(
    __global float *A,
    __global const float *B,
    __global float *C)
{
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}
38. OpenCL Framework: Objects Types
• cl_platform_id – identifier for a specific platform
• cl_device_id – identifier for a specific compute device
• cl_context – handle for a compute context
• cl_command_queue – handle for a command queue (for a compute device)
• cl_mem – handle for a memory resource (managed by context)
• cl_program – handle for a program resource (library of kernels)
• cl_kernel – handle for a compute kernel
• All object types are opaque handles
• Enables cross-platform compatibility for complex data types
• All objects are reference counted and garbage collected
• When reference count reaches zero, object is deallocated
39. OpenCL Framework: Platform Layer
• To query platform information:
• clGetPlatformIDs() → obtain the list of platforms available
• clGetPlatformInfo() → platform profile, version, name, vendor, extensions
• To query Devices:
• clGetDeviceIDs() → obtain the list of devices available on platform
• clGetDeviceInfo() → type, capabilities, vendor, name, etc.
• Create an OpenCL context for one or more devices
[Diagram: a Context (cl_context) is created for one or more devices (cl_device_id). It holds the memory and device code shared by these devices (cl_mem, cl_program) and the command queues used to send commands to these devices (cl_command_queue).]
40. Context creation: platform IDs
• SIMPLE EXAMPLE: get the platform ID:

// get first OpenCL platform ID available
cl_platform_id platform;
cl_int err = clGetPlatformIDs(1, &platform, NULL);

cl_int clGetPlatformIDs(
    cl_uint num_entries,
    cl_platform_id *platforms,
    cl_uint *num_platforms)
If platforms or num_platforms is NULL, that argument is ignored

• Get all platform IDs:

// get number of OpenCL platforms available
cl_int err;
cl_uint num_platforms;
std::vector<cl_platform_id> platformIDs;
err = clGetPlatformIDs(0, NULL, &num_platforms);
if (err != CL_SUCCESS) { … }
platformIDs.resize(num_platforms);
// get all OpenCL platform IDs
err = clGetPlatformIDs(num_platforms, &platformIDs[0], NULL);
41. Context creation: device IDs
• SIMPLE: get first GPU associated with the platform:

cl_device_id device;
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

cl_int clGetDeviceIDs(
    cl_platform_id platform,
    cl_device_type device_type,
    cl_uint num_entries,
    cl_device_id *devices,
    cl_uint *num_devices)

DEVICE TYPE: CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR, CL_DEVICE_TYPE_DEFAULT, CL_DEVICE_TYPE_ALL

• Get all device IDs:

cl_uint nDevices;
cl_device_type deviceType = CL_DEVICE_TYPE_GPU; // or any DEVICE TYPE value listed above
std::vector<cl_device_id> deviceIDs;

if (platformIDs.size() == 0) {
    // get number of device IDs for default platform
    err = clGetDeviceIDs(NULL, deviceType, 0, NULL, &nDevices);
} else {
    // get number of device IDs for selected platform
    err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, 0, NULL, &nDevices);
}
deviceIDs.resize(nDevices);
if (platformIDs.size() == 0) {
    // get device IDs of default platform
    err = clGetDeviceIDs(NULL, deviceType, nDevices, &deviceIDs[0], NULL);
} else {
    // get device IDs of selected platform
    err = clGetDeviceIDs(platformIDs[selectedPlatform], deviceType, nDevices, &deviceIDs[0], NULL);
}
43. Error Handling and Resource Deallocation
• Error handling:
• All host functions return an error code
• Context error callback
• The callback function may be called asynchronously by OpenCL and it is the application's responsibility to ensure that the callback function is thread-safe
• Resource deallocation
• Reference counting API: clRetain*(), clRelease*()
  • clRetainContext();
  • clReleaseContext();
  • clRetainMemObject();
  • clReleaseMemObject();
  • clRetainKernel();
  • clReleaseKernel();
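The retain/release discipline can be illustrated with a toy reference-counted handle in plain C. Everything here is hypothetical (`Handle`, `handle_create`, `handle_retain`, `handle_release`, and the `freed_flag` used to observe deallocation); it shows the counting rule only and is not how any OpenCL implementation is written.

```c
#include <stdlib.h>

/* Hypothetical object used only to illustrate retain/release semantics. */
typedef struct {
    unsigned ref_count;
    int *freed_flag;   /* set to 1 on deallocation so callers can observe it */
} Handle;

/* Creation returns an object that already holds one reference. */
Handle *handle_create(int *freed_flag)
{
    Handle *h = malloc(sizeof *h);
    if (h) {
        h->ref_count = 1;
        h->freed_flag = freed_flag;
    }
    return h;
}

/* Analogous in spirit to clRetain*(): one more owner. */
void handle_retain(Handle *h) { h->ref_count++; }

/* Analogous in spirit to clRelease*(): deallocate when the count hits zero. */
void handle_release(Handle *h)
{
    if (--h->ref_count == 0) {
        *h->freed_flag = 1;
        free(h);
    }
}
```

The practical rule this suggests for host code: pair every create or retain call with exactly one release, and do not touch a handle after its final release.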
44. OpenCL C
• Derived from ISO C99
• Features added to the language:
• Work-items and work-groups
• Vector types
• Synchronisation
• Address space qualifiers
• Also includes a large set of built-in functions:
• Image manipulation
• Work-item manipulation
• Math functions
45. OpenCL C
Language Restrictions:
• No functions defined in C99 standard headers
• No recursion supported
• Pointers to functions are not permitted
• Pointers to pointers allowed within a kernel, but not as an argument
• No variable length arrays and structures
• Bit fields are not supported
• Writes to a pointer to a type less than 32 bits are not supported*
• Double types are not supported, but reserved
• 3D Image writes are not supported
*Some restrictions are addressed through extensions
46. OpenCL C Optional Extensions
• Extensions are optional features exposed through OpenCL
• The OpenCL working group has already approved many extensions to the OpenCL specification:
  • Double precision floating-point types
  • Built-in functions to support doubles
  • Atomic functions*
  • Byte-addressable stores (write to pointers to types < 32 bits)*
  • 3D Image writes
  • Built-in functions to support half types
* New core features in OpenCL 1.1
47. OpenCL C: Data Types
• Scalar data types
  • char, uchar, short, ushort, int, uint, long, ulong, float
  • bool, intptr_t, ptrdiff_t, size_t, uintptr_t, void, half (storage)
• Image types
  • image2d_t, image3d_t, sampler_t, event_t
• Vector data types
  • Vector lengths 2, 3*, 4, 8, 16 (char2, ushort4, int8, float16, double2^, …)
  • Endian safe
  • Aligned at vector length
  • Vector operations
  • Built-in functions
* New core features in OpenCL 1.1
^ Double is an optional type in OpenCL
48. OpenCL C: Synchronisation Primitives
• Built-in functions to order memory operations and synchronise execution:
• mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE)
• Waits until all reads/writes to local and/or global memory made by the calling work-item prior to mem_fence() are visible to all threads in the work-group
• barrier(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE)
• Waits until all work-items in the work-group have reached this point and calls mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE)
• Used to coordinate accesses to local or global memory shared among work-items
49. OpenCL Runtime
• Command queues creation and management
• Device memory allocation and management
• Device code compilation and execution
• Event creation and management (synchronisation and profiling)
50. Kernel Compilation
• We use a cl_program object that encapsulates some source code and its last successful build (it may contain several kernel functions):
• clCreateProgramWithSource() → creates a program object for a context, and loads the source code specified by the strings array into the program object
• clCreateProgramWithBinary() → creates a program object and loads the binary into it
• clBuildProgram() → compiles and links a program executable from program source or binary
• We'll also use a cl_kernel object, which encapsulates the values of the kernel's arguments used when the kernel is executed:
• clCreateKernel() → creates a kernel object from a successfully compiled program
• clSetKernelArg() → sets the argument value for a specific argument of a kernel
52. Memory Objects
• Memory objects (cl_mem) are categorized into two types:
• Buffer objects
• Image objects
• Memory objects can be copied to host memory, from host memory, or to other memory objects
• Kernels take memory objects as input, and output to one or more memory objects
• Regions of a memory object can be accessed by host by mapping them into the host address space
53. Memory Objects: Buffer Object
• A buffer object stores a one-dimensional collection of elements (1D array)
• Elements of a buffer object can be:
• Scalar data type (such as an int, float)
• Vector data type
• User-defined structure
• Elements in a buffer are stored in sequential fashion and can be accessed using a pointer by a kernel executing on a device
• Data is stored in the same format as it is accessed by the kernel
54. Memory Objects: Image Object
• Image object stores a two- or three-dimensional texture, frame-buffer or image
• Can be created from existing OpenGL texture or render-buffer
• The elements of an image object are selected from a list of predefined image formats
• Image elements are always a 4-component vector (each component can be a float or signed/unsigned integer) in a kernel
• Accessed within device via built-in functions (storage format not exposed to application)
• Sampler objects are used to configure how built-in functions sample images (addressing modes, filtering modes)
55. Command Queue
• Memory, program and kernel objects → created using a context
• Operations on objects performed using a command-queue
• The command-queue is used to schedule commands for execution on a device
• En-queuing functions: clEnqueue*()
• Multiple queues can execute on the same device
• Modes of execution:
• In-order: each command in the queue executes only when the preceding command has completed (including memory writes)
• Out-of-order: no guaranteed order of completion for commands
• CL_QUEUE_PROFILING_ENABLE: enable or disable profiling of commands in the command-queue
56. Command Queue
• Create command queue for a specific device

cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);

cl_command_queue clCreateCommandQueue(
    cl_context context,
    cl_device_id device,
    cl_command_queue_properties properties,
    cl_int *errcode_ret)

• Properties
• CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE determines whether commands in the command-queue are executed in-order or out-of-order. If set, the commands are executed out-of-order.
• CL_QUEUE_PROFILING_ENABLE enables or disables profiling of commands in the command-queue. If set, the profiling of commands is enabled.
57. Data Transfer between Host and Device
• Create buffers on host and device

size_t size = 100000*sizeof(int);
int *host_buffer = (int*)malloc(size);
cl_mem devSrcA =
    clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, NULL);
cl_mem devSrcB =
    clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, NULL);
…

• Write to buffer objects from host memory

clEnqueueWriteBuffer(queue, devSrcA,
    CL_FALSE, 0, size, host_buffer, 0, NULL, NULL);
…

• Read from buffer object to host memory

clEnqueueReadBuffer(queue, devDst,
    CL_TRUE, 0, size, host_buffer, 0, NULL, NULL);
…

cl_mem clCreateBuffer(
    cl_context context,
    cl_mem_flags flags,   // CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, …
    size_t size,
    void *host_ptr,
    cl_int *errcode_ret)

cl_int clEnqueueWriteBuffer(
    cl_command_queue queue,
    cl_mem buffer,
    cl_bool blocking_write,
    size_t offset,
    size_t size,
    const void *ptr,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)
58. Kernel Invocation over NDRange
• Host code invokes a kernel over an index space NDRange (1D, 2D or 3D)
• Work-group dimensionality matches work-item dimensionality
• Set number of work-items in a work-group

size_t localWorkSize = 256;
size_t numWorkGroups = (N + localWorkSize - 1) / localWorkSize; // round up
size_t globalWorkSize = numWorkGroups * localWorkSize; // must be divisible by localWorkSize

• Enqueue kernel

clEnqueueNDRangeKernel(
    queue, kernel, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, NULL);

cl_int clEnqueueNDRangeKernel(
    cl_command_queue queue,
    cl_kernel kernel,
    cl_uint work_dim,
    const size_t *global_work_offset,
    const size_t *global_work_size,
    const size_t *local_work_size,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)
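The round-up arithmetic used for the global work size can be isolated in a small helper. The function name is hypothetical and the sketch is plain C with no OpenCL dependency. It returns the smallest multiple of the work-group size that is at least N.

```c
#include <stddef.h>

/* Smallest multiple of local_size that is >= n (local_size must be > 0).
 * The result is a valid global work size for a 1D launch with that
 * work-group size. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    size_t num_groups = (n + local_size - 1) / local_size;  /* ceil(n / local_size) */
    return num_groups * local_size;
}
```

For N = 1000 and a work-group size of 256 this gives 4 work-groups and a global size of 1024. The 24 surplus work-items are the reason the kernel guards its body with `if (index < N)`.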
59. Command Synchronisation
• Queue barrier command: clEnqueueBarrier()
• Commands after the barrier start executing only after all commands before the barrier have completed
• Events: a cl_event object can be associated with each command
• Commands return events and obey event wait lists
• clEnqueue*(…, num_events_in_wait_list, *event_wait_list, *event);
• Any command (or clWaitForEvents()) can wait on events before executing
• Event object can be queried to track execution status of associated command and get profiling information
• Some clEnqueue*() calls can be optionally blocking
• clEnqueueReadBuffer(…, CL_TRUE, …);
60. Synchronisation: Queues & Events
• You must explicitly synchronise between queues
• Multiple devices each have their own queue (possibly multiple queues per device)
• Use events to synchronise kernel executions between queues
61. OpenCL Resources
• OpenCL at Khronos
• http://www.khronos.org/opencl (spec, registry, man, forums, reference card)
• NVIDIA OpenCL website, forum
• http://www.nvidia.com/object/cuda_opencl_new.html
• http://developer.nvidia.com/object/opencl.html (drivers, profiler, code samples)
• AMD Developer Central
• http://developer.amd.com/gpu/atistreamsdk/pages/default.aspx
• Intel OpenCL SDK
• http://software.intel.com/en-us/articles/intel-opencl-sdk/
• IBM OpenCL Development Kit for Linux on Power
• http://www.alphaworks.ibm.com/tech/opencl
• OpenCL Studio
• http://www.opencldev.com (develop, visualize, prototype UIs)
62. Earth Science and Resource Engineering
Tomasz P Bednarz
3D Visualisation Engineer
Mining Technology Team
Mobile: +61 429 153 274
Email: tomasz.bednarz(_at_)csiro.au
Web: www.tomaszbednarz.com
Acknowledgments
Mark Harris, Derek Gerstmann, Mike Houston, Justin Hensley, Jason Young, Dominik Behr, Con Caris,
John Taylor, Khronos Group, AMD, NVIDIA and all others for sharing publicly their GPGPU knowledge
(this presentation is based on)
Thank you …
Contact us
Phone: 1300 363 400 or +61 3 9545 2176
Email: enquiries@csiro.au Web: www.csiro.au