The average GPU nowadays packs more horsepower than the CPU. As a result, there are more and more opportunities to move computational problems from the CPU to the GPU. This presentation is an introduction to doing this in Java with JogAmp JoCL. A few simple problems are used to show when a GPU is the better choice than a CPU, and vice versa. GPU offloading is also one of the focus areas for Java 9 (Project Sumatra), which draws inspiration from JoCL, among others.
10. Example (MacBook Pro)
Platform name: Apple
Platform profile: FULL_PROFILE
Platform spec version: OpenCL 1.2
Platform vendor: Apple
Device 16925696 HD Graphics 4000
  Driver: 1.2 (Aug 17 2014 20:29:07)
  Max work group size: 512
  Global mem size: 1073741824
  Local mem size: 65536
  Max clock freq: 1200
  Max compute units: 16
Device 16918272 GeForce GT 650M
  Driver: 8.26.28 310.40.55b01
  Max work group size: 1024
  Global mem size: 1073741824
  Local mem size: 49152
  Max clock freq: 900
  Max compute units: 2
Device 4294967295 Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
  Driver: 1.1
  Max work group size: 1024
  Global mem size: 17179869184
  Local mem size: 32768
  Max clock freq: 2600
  Max compute units: 8
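The raw numbers in this listing come from OpenCL device queries: memory sizes are in bytes and the clock frequency is in MHz. The small helper below (our own, not part of JoCL) converts the byte counts to binary units, which makes the gap between the GPUs' 1 GiB of global memory and the CPU's 16 GiB easier to see. It is a plain-Java sketch with no OpenCL dependency.

```java
import java.util.Locale;

// Converts raw byte counts reported by OpenCL device queries into
// binary units (KiB, MiB, GiB) so that devices are easier to compare.
final class ByteSizes {
    private static final String[] UNITS = {"B", "KiB", "MiB", "GiB", "TiB"};

    static String humanBytes(long bytes) {
        double value = bytes;
        int unit = 0;
        // Divide by 1024 until the value fits the current unit.
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        // Locale.ROOT keeps the decimal separator a '.', whatever the default locale is.
        return String.format(Locale.ROOT, "%.1f %s", value, UNITS[unit]);
    }

    public static void main(String[] args) {
        // Global and local memory sizes copied from the MacBook Pro listing.
        long[] sizes = {1073741824L, 65536L, 49152L, 17179869184L, 32768L};
        for (long size : sizes) {
            System.out.println(size + " bytes = " + humanBytes(size));
        }
    }
}
```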
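24. Tips & tricks
● Unit testing
– Separate test kernels
– Test cases in batches
// Test kernel: runs difficultCalculation() (defined in the production
// kernel source) for every test input in a single launch.
kernel void testDifficultCalculation(const int testCount,
        global const double* distance, global double* results) {
    const int testId = get_global_id(0);
    // Skip work items past the last test case (the global size may be padded).
    if (testId < testCount) {
        results[testId] = difficultCalculation(distance[testId]);
    }
}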
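The `if (testId < testCount)` guard matters because the global work size is usually rounded up to a multiple of the work-group size (OpenCL 1.x requires the global size to be divisible by the local size), so a few work items run past the end of the data. A minimal sketch of that host-side rounding in plain Java; the name `roundUp` is our own choice, and in a JoCL program the result would be passed as the global size when enqueueing the kernel.

```java
// Rounds the global work size up to a multiple of the local work-group size.
// The padding work items at the end must be filtered out in the kernel with a
// guard such as `if (testId < testCount)`.
final class WorkSizes {
    static long roundUp(long localSize, long globalSize) {
        if (localSize <= 0) {
            throw new IllegalArgumentException("localSize must be positive");
        }
        long remainder = globalSize % localSize;
        return remainder == 0 ? globalSize : globalSize + localSize - remainder;
    }

    public static void main(String[] args) {
        int testCount = 1000;
        int localSize = 256;
        long globalSize = roundUp(localSize, testCount);
        // 1024 work items are launched; the last 24 do nothing thanks to the guard.
        System.out.println("global size = " + globalSize
                + ", idle work items = " + (globalSize - testCount));
    }
}
```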
25. Direct memory management
● -XX:MaxDirectMemorySize=??M
● ByteBuffer.allocateDirect(int capacity)
– Max 2 GB per buffer
● Garbage collection frees direct memory too late
– Only triggered by heap usage
– Release manually (JVM-specific):
– ((sun.nio.ch.DirectBuffer) myBuffer).cleaner().clean();
● VisualVM plugin for direct buffers
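A minimal sketch of allocating an off-heap buffer for OpenCL input, assuming the device reads data in the host's native byte order (the usual case). Because the capacity of a `ByteBuffer` is an `int`, one buffer holds at most about 2 GB, so the helper rejects larger requests instead of letting the cast overflow; the class name `DirectBuffers` is hypothetical. Each allocation counts towards `-XX:MaxDirectMemorySize`, and the memory is only returned when the buffer object is garbage-collected, which is why the slide suggests releasing it manually. That release relies on JDK-internal classes and is left out here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Allocates off-heap (direct) memory for an array of doubles that could be
// handed to an OpenCL binding such as JoCL.
final class DirectBuffers {
    // Largest number of doubles whose byte size still fits in an int capacity.
    static final long MAX_DOUBLES = Integer.MAX_VALUE / Double.BYTES;

    static DoubleBuffer newDirectDoubleBuffer(long count) {
        if (count < 0 || count > MAX_DOUBLES) {
            throw new IllegalArgumentException(
                    "cannot fit " + count + " doubles in one direct buffer");
        }
        return ByteBuffer.allocateDirect((int) (count * Double.BYTES))
                // Native order, so the device sees the values as the host wrote them
                // (assumes host and device share the same endianness).
                .order(ByteOrder.nativeOrder())
                // The view shares the same off-heap memory and is itself direct.
                .asDoubleBuffer();
    }

    public static void main(String[] args) {
        DoubleBuffer distances = newDirectDoubleBuffer(1_000);
        for (int i = 0; i < distances.capacity(); i++) {
            distances.put(i, i * 0.5);
        }
        System.out.println("direct: " + distances.isDirect()
                + ", capacity: " + distances.capacity());
    }
}
```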
26. GPU vs CPU
● GPUs perform fewer checks than CPUs
– Division by zero
– Out-of-bounds access
– Test on the CPU first
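For floating-point kernels, a division by zero produces `Infinity` or `NaN` rather than an error, so a cheap host-side scan of the results read back from the device can at least flag such worthless values. A minimal plain-Java sketch with hypothetical names; it cannot detect integer division by zero or out-of-bounds reads, which on a GPU may simply return garbage, so testing on a CPU device first remains necessary.

```java
// Flags results that a silent floating-point error turned into NaN or Infinity.
final class ResultChecks {
    /** Index of the first NaN or infinite value, or -1 if every result is finite. */
    static int firstNonFinite(double[] results) {
        for (int i = 0; i < results.length; i++) {
            if (!Double.isFinite(results[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // As in IEEE 754 arithmetic generally, 1.0 / 0.0 is Infinity and 0.0 / 0.0 is NaN.
        double[] results = {1.0, 2.5, 1.0 / 0.0, 0.0 / 0.0};
        System.out.println("first invalid result at index " + firstNonFinite(results));
    }
}
```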
27. Portability
● OpenCL is portable, its performance is not
– Memory sizes differ
– Memory latencies differ
– Work group sizes differ
– Compute devices differ
– OpenCL implementations differ
● So develop for the production hardware
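One practical consequence of "work group sizes differ": derive the local size from the device's maximum work-group size at runtime instead of hard-coding it. The helper below (hypothetical name) picks the largest power of two within both the device limit and a preferred size; with the limits from the MacBook Pro listing it yields 512 on the HD Graphics 4000 and up to 1024 on the GeForce GT 650M. Restricting to powers of two is a design choice that suits tree-style reductions, not an OpenCL requirement. A specific kernel can also have a lower limit than the device maximum (`CL_KERNEL_WORK_GROUP_SIZE`), so a real program would check that value as well.

```java
// Chooses a local work-group size from limits queried at runtime.
final class LocalSize {
    /**
     * Largest power of two that exceeds neither the device's maximum
     * work-group size nor the preferred size.
     */
    static int choose(int maxWorkGroupSize, int preferred) {
        int limit = Math.min(maxWorkGroupSize, preferred);
        if (limit < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return Integer.highestOneBit(limit);
    }

    public static void main(String[] args) {
        // Maximum work-group sizes from the MacBook Pro listing.
        System.out.println("HD Graphics 4000: " + choose(512, 1024));
        System.out.println("GeForce GT 650M:  " + choose(1024, 1024));
    }
}
```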
28. Finally
● Float vs double
– Double precision
– Half the performance
– Double support is optional
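To judge whether double is really needed, it can help to see how quickly single precision drifts. This plain-Java sketch accumulates 0.1 a million times in `float` and in `double`: the float sum ends up noticeably off the exact 100000, while the double sum stays very close. Whether a device supports double at all can be queried, for example through the `cl_khr_fp64` extension it reports.

```java
// Accumulates the same value in float and in double to show how quickly
// single precision loses accuracy in long sums.
final class Precision {
    static float sumFloat(float value, int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += value;
        }
        return sum;
    }

    static double sumDouble(double value, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("float:  " + sumFloat(0.1f, n));
        System.out.println("double: " + sumDouble(0.1, n));
    }
}
```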
30. Conclusion
● When to use it?
– When performance is really needed
– When the problem has high concurrency
– When the problem can be partitioned
31. Questions?
Setting up OpenCL test on Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Warming up OpenCL test
[thread 32003 also had an error]
[thread 33027 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000001250ded70, pid=99851, tid=29475
[thread 32515 also had an error]
[thread 32771 also had an error]
[thread 32259 also had an error]
#
# JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
[thread 17415 also had an error]
# C  [cl_kernels+0x1d70]  sort_wrapper+0x1b0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/arjanl/Documents/opencl/workspace/opencl-test/jogamp/hs_err_pid99851.log
[thread 31763 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
Editor's Notes
We are Arjan & Maarten
Arjan: software architect, interested in scalability and performance
Maarten: senior developer, interested in performance, concurrency and 3D
PAS: Programmatische Aanpak Stikstof (programmatic approach to nitrogen)
Balancing environmental and economic development.
Calculation tool: monitoring targets and supporting permit applications
Calculates concentrations/depositions
Export for permit applications
Compare multiple situations
OpenCL application: road traffic
Speed matters because users are waiting for results
Import a set of sources
Calculate per (source, calculation point) pair
Sum the results per calculation point
Emission of the road
Distance to the road
Wind speed
Wind direction
Ozone concentration
Location
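The calculation described in these notes is a textbook data-parallel problem: every calculation point sums the contributions of all road sources, independently of the other points. The plain-Java CPU reference below sketches that structure with a parallel stream; in the OpenCL version one work item would handle one calculation point. The `contribution` function is a hypothetical placeholder: the real model, which also uses wind speed, wind direction and ozone concentration, is not given here. All arrays describing sources (and all describing points) are assumed to have equal lengths.

```java
import java.util.stream.IntStream;

// CPU reference for "calculate per (source, calculation point) pair,
// then sum per calculation point".
final class DepositionSketch {
    // HYPOTHETICAL placeholder for the real dispersion model, which also
    // depends on wind speed, wind direction and ozone concentration.
    static double contribution(double emission, double distance) {
        return emission / (1.0 + distance);
    }

    // Sums the contributions of all sources at one calculation point.
    static double totalAt(double x, double y, double[] srcX, double[] srcY, double[] emission) {
        double sum = 0.0;
        for (int s = 0; s < srcX.length; s++) {
            double distance = Math.hypot(x - srcX[s], y - srcY[s]);
            sum += contribution(emission[s], distance);
        }
        return sum;
    }

    // One independent task per calculation point (one work item in OpenCL terms).
    static double[] sumPerPoint(double[] srcX, double[] srcY, double[] emission,
                                double[] ptX, double[] ptY) {
        return IntStream.range(0, ptX.length)
                .parallel()
                .mapToDouble(p -> totalAt(ptX[p], ptY[p], srcX, srcY, emission))
                .toArray();
    }

    public static void main(String[] args) {
        double[] srcX = {0.0, 3.0}, srcY = {0.0, 4.0}, emission = {2.0, 6.0};
        double[] ptX = {0.0, 3.0}, ptY = {0.0, 4.0};
        double[] totals = sumPerPoint(srcX, srcY, emission, ptX, ptY);
        for (int p = 0; p < totals.length; p++) {
            System.out.println("point " + p + ": " + totals[p]);
        }
    }
}
```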
Getting creative with text files
Load the OpenCL file + preprocess it
Add Java constants via #define
Watch the locale: 1.0 vs 1,0
Configurable options
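A minimal sketch of the "Java constants via #define" idea, with the locale pitfall in mind: `String.format("%f", 1.5)` under a Dutch default locale yields `1,500000`, which the OpenCL compiler would not read as 1.5 (the comma becomes C's comma operator). The helper below uses `Double.toString`, which always writes a decimal point. The constant name `WIND_FACTOR` and the kernel text are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Prepends Java-side constants to OpenCL source code as #define lines.
final class KernelSource {
    static String withDefines(Map<String, Double> constants, String source) {
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, Double> entry : constants.entrySet()) {
            // Double.toString is locale-independent: always "1.5", never "1,5".
            result.append("#define ")
                  .append(entry.getKey())
                  .append(' ')
                  .append(Double.toString(entry.getValue()))
                  .append('\n');
        }
        return result.append(source).toString();
    }

    public static void main(String[] args) {
        // Simulate a Dutch machine (changes the default locale of this JVM).
        Locale.setDefault(Locale.forLanguageTag("nl-NL"));
        // The pitfall: locale-sensitive formatting writes a decimal comma.
        System.out.println(String.format("%f", 1.5));
        Map<String, Double> constants = new LinkedHashMap<>();
        constants.put("WIND_FACTOR", 1.5); // hypothetical constant
        String kernel = "kernel void scale(global double* v) { v[get_global_id(0)] *= WIND_FACTOR; }\n";
        System.out.print(withDefines(constants, kernel));
    }
}
```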
Time for testing!
Add test kernels, only in test mode.
JUnit test function:
Buffers with test values
Buffers with expected results
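On the host side, the JUnit test then only has to compare the results buffer with the expected-results buffer. A minimal plain-Java sketch of that comparison (JUnit itself is left out so the example runs on its own; the names and the tolerance rule are our own choices). It uses a relative tolerance because GPU floating-point results may differ slightly from CPU ones, and it treats NaN as a mismatch.

```java
import java.util.ArrayList;
import java.util.List;

// Compares a batch of kernel results against expected values in one pass.
final class BatchCheck {
    /**
     * Indices where actual and expected differ by more than tolerance,
     * relative to max(1, |expected|). NaN always counts as a mismatch.
     */
    static List<Integer> mismatches(double[] actual, double[] expected, double tolerance) {
        if (actual.length != expected.length) {
            throw new IllegalArgumentException(
                    "length mismatch: " + actual.length + " vs " + expected.length);
        }
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < actual.length; i++) {
            double allowed = tolerance * Math.max(1.0, Math.abs(expected[i]));
            // Written as !(diff <= allowed) so that a NaN difference is also reported.
            if (!(Math.abs(actual[i] - expected[i]) <= allowed)) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        double[] expected = {1.0, 2.0, 3.0};
        // Stand-in for values read back from a test kernel.
        double[] actual = {1.0, 2.0000000001, 3.5};
        System.out.println("mismatching test cases: " + mismatches(actual, expected, 1e-6));
    }
}
```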
Test → 'challenges' with direct memory
Not enough memory → direct memory size
Max 2 GB per buffer
First run OK, second run fails? → Garbage collection is only triggered by heap space.
Buffer release → free the memory manually
Sun classes → JVM-specific
Handy tool: VisualVM plugin
Division by zero → no error, but worthless results
Reading/writing outside allocated memory?
CPU → crash
GPU → no error
(Values change per test run)
Test on the CPU first! (But still no guarantee)
Even more device differences...
“OpenCL is portable, its performance is not”
OpenCL itself is not always portable either
“Write once, debug anywhere”?
Develop for the production hardware/drivers
Performance or precision?
Is double really necessary?
Double support is optional, but high-end devices usually have it.
Only when the performance is needed AND
the problem exhibits high concurrency
Being partitionable usually helps