Java GPU Computing 
Maarten Steur & Arjan Lamers
● OpenCL overview
● Simple example
● Case study
● Tips & tricks
● Questions
Why GPU Computing
Abbreviations
● CPU, GPU, APU 
● Khronos: OpenCL, OpenGL 
● Nvidia: CUDA 
● JogAmp JOCL, JavaCL, JOCL
GPU compared with CPU
● Many simple cores
● Lots of high-bandwidth memory
● Example:
            Intel Core i7    GeForce GT 650M
  Cores     8                384
  Gflops    180              650
Programming model
● Define a stream (flow)
● Run in parallel
When to use
● Algorithm:
– High concurrency
– Partitionable
● But:
– Extra latency from loading data onto and off the GPU
– Extra complexity
Components
Example (MacBook Pro)
Platform name: Apple 
Platform profile: FULL_PROFILE 
Platform spec version: OpenCL 1.2 
Platform vendor: Apple 
Device 16925696 HD Graphics 4000 
Driver:1.2(Aug 17 2014 20:29:07) 
Max work group size:512 
Global mem size: 1073741824 
Local mem size: 65536 
Max clock freq: 1200 
Max compute units: 16 
Device 16918272 GeForce GT 650M 
Driver:8.26.28 310.40.55b01 
Max work group size:1024 
Global mem size: 1073741824 
Local mem size: 49152 
Max clock freq: 900 
Max compute units: 2 
Device 4294967295 Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Driver:1.1 
Max work group size:1024 
Global mem size: 17179869184 
Local mem size: 32768 
Max clock freq: 2600 
Max compute units: 8
Work & Memory
Application / Kernel 
● Write .cl files in a C dialect (OpenCL C)
● Kernels are the 'public' functions
● Or start from Java bytecode
– Aparapi (OpenCL)
– RootBeer (CUDA)
Disclaimer
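Parallel sort
kernel void sort(global const float* in, global float* out, int size) {
    int i = get_global_id(0); // index of the current work item
    if (i >= size) return;    // skip work items beyond the end of the input
    float ix = in[i];
    int pos = 0;
    for (int j = 0; j < size; j++) {
        float jx = in[j];
        // Does in[j] sort before in[i]? Ties are broken by index.
        bool smaller = (jx < ix) || (jx == ix && j < i);
        pos += smaller ? 1 : 0;
    }
    out[pos] = ix; // pos is the final position of in[i]
}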
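To check the kernel's output on the CPU, the same rank-based approach can be sketched in plain Java. This is a minimal sketch and not the presenters' code: the class name `RankSortReference` and the method `rankSort` are hypothetical, and a parallel `IntStream` merely stands in for the GPU work items.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class RankSortReference {

    // Each index i plays the role of one work item: it counts how many
    // elements must come before in[i] and writes in[i] to that position.
    static float[] rankSort(float[] in) {
        float[] out = new float[in.length];
        IntStream.range(0, in.length).parallel().forEach(i -> {
            float ix = in[i];
            int pos = 0;
            for (int j = 0; j < in.length; j++) {
                float jx = in[j];
                // Ties are broken by index, so equal values get distinct positions.
                if (jx < ix || (jx == ix && j < i)) {
                    pos++;
                }
            }
            out[pos] = ix;
        });
        return out;
    }

    public static void main(String[] args) {
        float[] sorted = rankSort(new float[] {3f, 1f, 2f, 1f});
        System.out.println(Arrays.toString(sorted)); // [1.0, 1.0, 2.0, 3.0]
    }
}
```

Every element is compared with every other element, so the work is quadratic in the input size. The approach only pays off because each of the comparisons loops can run independently.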
Java GPU Computing 
// Pick the GPU with the highest estimated flops and create a context for it
CLContext globalContext = CLContext.create();
CLDevice device = globalContext.getMaxFlopsDevice(Type.GPU);
CLContext context = CLContext.create(device);
CLCommandQueue queue = device.createCommandQueue();
// Load the kernel source from the classpath and compile it
CLProgram program =
    context.createProgram(
        First8GpuComputing.class.getResourceAsStream("MyTask.cl")
    ).build();
You can also build for specific devices: build(device)
Java GPU Computing 
// Allocate one input and one output buffer, sized to the workload
CLBuffer<FloatBuffer> inBuffer = context.createFloatBuffer(
    workLoad.length, READ_ONLY);
CLBuffer<FloatBuffer> outBuffer = context.createFloatBuffer(
    workLoad.length, WRITE_ONLY);
mapToBuffer(inBuffer.getBuffer(), workLoad); // copy the input into the host-side buffer
CLKernel kernel = program.createCLKernel("MyTask");
kernel.putArgs(inBuffer, outBuffer).putArg(workLoad.length);
// Upload the input (non-blocking), run the kernel, download the result (blocking)
queue.putWriteBuffer(inBuffer, false)
    .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
    .putReadBuffer(outBuffer, true);
FloatBuffer output = outBuffer.getBuffer();
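The slides do not show how `globalWorkSize` is chosen. In OpenCL 1.x the global work size has to be a multiple of the local work size, so it is commonly rounded up. The following is a minimal sketch of that rounding; the class name `WorkSize` and the sizes used in `main` are hypothetical.

```java
public class WorkSize {

    // Rounds globalSize up to the nearest multiple of groupSize.
    static int roundUp(int groupSize, int globalSize) {
        int remainder = globalSize % groupSize;
        return remainder == 0 ? globalSize : globalSize + groupSize - remainder;
    }

    public static void main(String[] args) {
        int elementCount = 1000;  // hypothetical input size
        int localWorkSize = 256;  // hypothetical work group size
        int globalWorkSize = roundUp(localWorkSize, elementCount);
        System.out.println(globalWorkSize); // 1024
    }
}
```

Rounding up starts more work items than there are elements, so the kernel has to ignore ids at or beyond the real element count, in the same way the test kernel further on checks `testId < testCount`.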
Case study
● Calculation tool supporting the Dutch
Programmatic Approach to Nitrogen (Programmatische Aanpak Stikstof).
● http://www.aerius.nl
Tips & tricks 
● Managing CL code
– getResourceAsStream()?
– Java constants → #define
– Locale? Oops!
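The "Locale? Oops!" bullet presumably refers to number formatting: under a JVM default locale that uses a decimal comma, formatting a constant can yield `3,5`, which is not a valid literal in OpenCL C. A minimal sketch of generating `#define` lines safely follows; the class name `ClDefines`, the method `define`, and the constant `MAX_DISTANCE` are hypothetical.

```java
import java.util.Locale;

public class ClDefines {

    // Renders a Java constant as a #define line to prepend to the .cl source.
    // Locale.ROOT forces a decimal point regardless of the JVM default locale.
    static String define(String name, double value) {
        return String.format(Locale.ROOT, "#define %s %f\n", name, value);
    }

    public static void main(String[] args) {
        // Prints "#define MAX_DISTANCE 3.500000"
        System.out.print(define("MAX_DISTANCE", 3.5));

        // For comparison: the same format under a Dutch locale uses a comma.
        Locale dutch = Locale.forLanguageTag("nl-NL");
        System.out.print(String.format(dutch, "#define %s %f\n", "MAX_DISTANCE", 3.5));
    }
}
```

Passing an explicit locale at every formatting call is more robust than relying on the default locale of the machine that happens to run the build or the tests.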
Tips & tricks 
● Unit testing
– Separate test kernels
– Test cases in batches
kernel void testDifficultCalculation(const int testCount,
        global const double* distance, global double* results) {
    const int testId = get_global_id(0); // one work item per test case
    if (testId < testCount) {            // ignore padding work items
        results[testId] = difficultCalculation(distance[testId]);
    }
}
Direct memory management 
● -XX:MaxDirectMemorySize=??M 
● ByteBuffer.allocateDirect(int capacity) 
– Max 2GB per buffer 
● Garbage collection comes too late
– Only triggered by a heap collection
– Release manually (internal, unsupported JDK API):
– ((sun.nio.ch.DirectBuffer) myBuffer).cleaner().clean();
● VisualVM plugin for direct buffers
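The 2 GB limit follows from the `int` capacity parameter of `ByteBuffer.allocateDirect`. A minimal sketch of allocating a direct float buffer with an explicit size check is given below; the class name `DirectBuffers` and the method `allocateFloats` are hypothetical and not taken from the slides.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBuffers {

    // Allocates a direct FloatBuffer for elementCount floats, failing early
    // when the byte size would not fit the int capacity of allocateDirect.
    static FloatBuffer allocateFloats(long elementCount) {
        if (elementCount < 0 || elementCount > Integer.MAX_VALUE / Float.BYTES) {
            throw new IllegalArgumentException(
                    "Cannot allocate " + elementCount + " floats in one direct buffer");
        }
        return ByteBuffer.allocateDirect((int) (elementCount * Float.BYTES))
                .order(ByteOrder.nativeOrder()) // match the byte order of the device host
                .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer buffer = allocateFloats(1024);
        System.out.println(buffer.capacity() + " floats, direct=" + buffer.isDirect());
    }
}
```

Workloads larger than the limit have to be split over several buffers, and the total still has to fit within `-XX:MaxDirectMemorySize`.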
GPU vs CPU 
● GPUs check less than CPUs
– Division by zero
– Out-of-bounds checks
– Test on the CPU first
Portability
● OpenCL is portable, its performance
is not
– Memory sizes differ
– Memory latencies differ
– Work group sizes differ
– Compute devices differ
– OpenCL implementations differ
● So develop for the production
hardware
Finally
● Float vs Double
– Double the precision
– Half the performance
– Double support is optional
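What the extra precision buys can be seen on the Java side without a GPU. The following minimal sketch (class and method names are hypothetical) shows a value at which a float can no longer represent an increment of 1, while a double still can.

```java
public class FloatVsDouble {

    // True when adding 1 to x is lost to rounding in single precision.
    static boolean incrementLost(float x) {
        return x + 1f == x;
    }

    // True when adding 1 to x is lost to rounding in double precision.
    static boolean incrementLost(double x) {
        return x + 1d == x;
    }

    public static void main(String[] args) {
        // 2^24: a float has a 24-bit significand, so 2^24 + 1 is not representable.
        System.out.println(incrementLost(16_777_216f)); // true
        // A double has a 53-bit significand and represents 2^24 + 1 exactly.
        System.out.println(incrementLost(16_777_216d)); // false
    }
}
```

Whether this matters depends on the calculation: accumulating many small contributions into a large total is a typical case where single precision runs out first.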
Conclusion
● When to use it?
– When the performance is really needed
– When the problem has high concurrency
– When the problem is partitionable
Questions?
Setting up OpenCL test on Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz 
Warming up OpenCL test 
[thread 32003 also had an error][thread 33027 also had an error] 
## 
A fatal error has been detected by the Java Runtime Environment: 
## 
SIGSEGV[thread 32515 also had an error] 
(0xb)[thread 32771 also had an error] 
[thread 32259 also had an error] 
at pc=0x00000001250ded70, pid=99851, tid=29475 
## 
JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26) 
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode bsd-amd64 compressed oops) 
# Problematic frame: 
# [thread 17415 also had an error] 
C [cl_kernels+0x1d70] sort_wrapper+0x1b0 
## 
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again 
## 
An error report file with more information is saved as: 
# /Users/arjanl/Documents/opencl/workspace/opencl-test/jogamp/hs_err_pid99851.log 
[thread 31763 also had an error] 
## 
If you would like to submit a bug report, please visit: 
# http://bugreport.sun.com/bugreport/crash.jsp 
#

Java gpu computing

  • 1. Java GPU Computing Maarten Steur & Arjan Lamers
  • 2. ● Overzicht OpenCL ● Simpel voorbeeld ● Casus ● Tips & tricks ● Vragen
  • 4. Afkortingen ● CPU, GPU, APU ● Khronos: OpenCL, OpenGL ● Nvidia: CUDA ● JogAmp JOCL, JavaCL, JOCL
  • 5. GPU vergeleken met CPU ● Veel simpele cores ● Veel high bandwidth geheugen ● Intel core i7 GeForce GT 650M 8 cores 384 cores 180 Gflops 650 Gflops
  • 6. Programmeer model ● Definieer stream (flow) ● Run in parallel
  • 7. Gebruik ● Algorithme: – Hoge Concurrency – Partitioneerbaar ● Maar: – Extra latency door on- en offloaden op de GPU – Extra complexiteit
  • 10. Voorbeeld (MacBook Pro) Platform name: Apple Platform profile: FULL_PROFILE Platform spec version: OpenCL 1.2 Platform vendor: Apple Device 16925696 HD Graphics 4000 Driver:1.2(Aug 17 2014 20:29:07) Max work group size:512 Global mem size: 1073741824 Local mem size: 65536 Max clock freq: 1200 Max compute units: 16 Device 16918272 GeForce GT 650M Driver:8.26.28 310.40.55b01 Max work group size:1024 Global mem size: 1073741824 Local mem size: 49152 Max clock freq: 900 Max compute units: 2 Device 4294967295 Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz Driver:1.1 Max work group size:1024 Global mem size: 17179869184 Local mem size: 32768 Max clock freq: 2600 Max compute units: 8
  • 12. Application / Kernel ● Write .cl files in a C variant ● Kernels are the 'public' functions ● Java bytecode – Aparapi (OpenCL) – RootBeer (CUDA)
  • 14. Parallel sort
      kernel void sort(global const float* in, global float* out, int size) {
          int i = get_global_id(0); // index of the current work-item (thread)
          float id = in[i];
          int pos = 0;
          for (int j = 0; j < size; j++) {
              float jd = in[j];
              // Does in[j] sort before in[i]? Equal values are ordered by index.
              bool smaller = (jd < id) || (jd == id && j < i);
              pos += smaller ? 1 : 0;
          }
          out[pos] = id;
      }
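As a CPU-side cross-check for the kernel on slide 14, the same rank-sort idea can be written in plain Java: every element counts how many elements sort before it and is written to that position. This is only a minimal sketch; the names `RankSortReference` and `rankSort` are illustrative assumptions and do not come from the talk.

```java
import java.util.Arrays;

/** Hypothetical CPU reference for the rank-sort kernel (names are assumptions). */
final class RankSortReference {

    /** Returns a sorted copy of in; each outer iteration plays the role of one work-item. */
    static float[] rankSort(float[] in) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) {      // on the GPU: i = get_global_id(0)
            float id = in[i];
            int pos = 0;
            for (int j = 0; j < in.length; j++) {
                float jd = in[j];
                // Count the elements that sort before in[i]; equal values are ordered by index.
                boolean smaller = (jd < id) || (jd == id && j < i);
                if (smaller) {
                    pos++;
                }
            }
            out[pos] = id;                         // pos is unique per i as long as the input has no NaN
        }
        return out;
    }

    public static void main(String[] args) {
        float[] sorted = rankSort(new float[] {3f, 1f, 2f, 1f});
        System.out.println(Arrays.toString(sorted));
    }
}
```

The total work is quadratic in the input size, but each of the n work-items only does a linear scan and none of them depends on another, which is what makes the approach easy to run in parallel.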
  • 15. Java GPU Computing
      CLContext globalContext = CLContext.create();
      CLDevice device = globalContext.getMaxFlopsDevice(Type.GPU);
      CLContext context = CLContext.create(device);
      CLCommandQueue queue = device.createCommandQueue();
      CLProgram program = context.createProgram(
              First8GpuComputing.class.getResourceAsStream("MyTask.cl")
      ).build();
    You can also build for specific devices: build(device)
  • 18. Java GPU Computing
      CLBuffer<FloatBuffer> inBuffer = context.createFloatBuffer(input.length, READ_ONLY);
      CLBuffer<FloatBuffer> outBuffer = context.createFloatBuffer(input.length, WRITE_ONLY);
      mapToBuffer(inBuffer.getBuffer(), workLoad);
      CLKernel kernel = program.createCLKernel("MyTask");
      kernel.putArgs(inBuffer, outBuffer).putArg(workLoad.length);
      queue.putWriteBuffer(inBuffer, false)
           .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
           .putReadBuffer(outBuffer, true);
      FloatBuffer output = outBuffer.getBuffer();
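The snippet on slide 18 uses `globalWorkSize` and `localWorkSize` without showing where they come from. In OpenCL 1.x, when a local work size is passed explicitly, the global work size has to be a multiple of it, so one common approach is to round the element count up. The helper below is a sketch of that rounding only; `WorkSizes`, `roundUp` and the value 256 are hypothetical, and the talk does not show how the authors chose their sizes.

```java
/** Hypothetical helper for deriving a global work size (names are assumptions). */
final class WorkSizes {

    /** Rounds count up to the nearest multiple of groupSize. */
    static int roundUp(int groupSize, int count) {
        if (groupSize <= 0 || count < 0) {
            throw new IllegalArgumentException("groupSize must be > 0 and count >= 0");
        }
        int remainder = count % groupSize;
        return remainder == 0 ? count : count + groupSize - remainder;
    }

    public static void main(String[] args) {
        int localWorkSize = 256;   // assumed value; it may not exceed the device's max work group size
        int elements = 1000;
        int globalWorkSize = roundUp(localWorkSize, elements);
        System.out.println(globalWorkSize);
    }
}
```

Because the rounded-up range can contain more work-items than there are elements, a kernel launched this way needs a bounds guard of the kind used in the test kernel on slide 24 (`if (testId < testCount)`).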
  • 20. Case study from practice ● Calculation tool supporting the Programmatische Aanpak Stikstof (PAS, the Dutch Programmatic Approach to Nitrogen). ● http://www.aerius.nl
  • 23. Tips & tricks ● Managing CL source – getResourceAsStream()? – Java constants → #define – Locale? Oops!
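The "Locale? Oops!" bullet concerns formatting Java constants into the kernel source: under a default locale that uses a decimal comma, `String.format("%f", 1.0)` produces `1,000000`, which is not a valid C literal. Below is a minimal sketch of generating `#define` lines in a locale-independent way; `KernelSource`, `define` and the constant name `WIND_SPEED` are illustrative assumptions, not the authors' code.

```java
import java.util.Locale;

/** Hypothetical helper for prepending Java constants to OpenCL source (names are assumptions). */
final class KernelSource {

    /** Formats one #define line; Locale.ROOT forces a decimal point whatever the JVM default locale is. */
    static String define(String name, double value) {
        return String.format(Locale.ROOT, "#define %s %.6f\n", name, value);
    }

    public static void main(String[] args) {
        // Simulate a machine with a decimal-comma locale.
        Locale.setDefault(Locale.GERMANY);
        String header = define("WIND_SPEED", 1.5);
        String source = header + "kernel void noop() {}\n";   // placeholder kernel text
        System.out.print(source);
    }
}
```

The generated header is simply concatenated in front of the text loaded from the `.cl` resource before the program is built.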
  • 24. Tips & tricks ● Unit testing – Separate test kernels – Test cases in batches
      kernel void testDifficultCalculation(const int testCount,
              global const double* distance, global double* results) {
          const int testId = get_global_id(0);
          if (testId < testCount) {
              results[testId] = difficultCalculation(distance[testId]);
          }
      }
  • 25. Direct memory management ● -XX:MaxDirectMemorySize=??M ● ByteBuffer.allocateDirect(int capacity) – Max 2 GB per buffer ● Garbage collection comes too late – Triggered by heap collection – Release manually – ((sun.nio.ch.DirectBuffer) myBuffer).cleaner().clean(); ● VisualVM plugin for direct buffers
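The 2 GB per-buffer limit follows from `ByteBuffer.allocateDirect(int capacity)` taking an `int`. A small sketch of allocating a direct float buffer with an explicit size check is shown below; `DirectBuffers` and `allocateFloats` are hypothetical names, and the check is an assumption about how one might fail early rather than something shown in the talk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper for allocating direct buffers (names are assumptions). */
final class DirectBuffers {

    /** Allocates a direct FloatBuffer for count floats, failing early if it cannot fit in one buffer. */
    static FloatBuffer allocateFloats(long count) {
        if (count < 0 || count > Integer.MAX_VALUE / Float.BYTES) {
            throw new IllegalArgumentException(count + " floats do not fit in a single direct buffer");
        }
        return ByteBuffer.allocateDirect((int) (count * Float.BYTES))
                .order(ByteOrder.nativeOrder())   // use the platform's byte order for native code
                .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer buffer = allocateFloats(1024);
        buffer.put(0, 42f);
        System.out.println(buffer.capacity() + " floats, direct=" + buffer.isDirect());
    }
}
```

Even when each buffer stays under the limit, the total is still capped by `-XX:MaxDirectMemorySize`, so a later allocation can fail if earlier buffers have not been released yet.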
  • 26. GPU vs CPU ● GPUs check less than CPUs – Division by zero – Out-of-bounds checks – Test on the CPU first
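Since a floating-point division by zero typically produces an infinity or NaN instead of an error, one cheap safeguard is to scan the results after they have been read back. Java's own floating-point division behaves the same way, which makes the idea easy to illustrate on the CPU. This is a sketch under that assumption; `ResultChecks` and `firstNonFinite` are hypothetical names.

```java
/** Hypothetical post-run sanity check for kernel output (names are assumptions). */
final class ResultChecks {

    /** Returns the index of the first NaN or infinite value, or -1 if every value is finite. */
    static int firstNonFinite(float[] results) {
        for (int i = 0; i < results.length; i++) {
            if (!Float.isFinite(results[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float zero = 0f;
        // Floating-point division by zero throws no exception; it yields Infinity or NaN.
        float[] results = {1f / 4f, 1f / zero, zero / zero};
        System.out.println(firstNonFinite(results));
    }
}
```

A check like this catches worthless results from a division by zero, but it does not detect out-of-bounds reads or writes, which is why testing on a CPU device first remains useful.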
  • 27. Portability ● OpenCL is portable, the performance is not – Memory sizes differ – Memory latencies differ – Work group sizes differ – Compute devices differ – OpenCL implementations differ ● So develop for the production hardware
  • 28. In closing ● Float vs double – Double the precision – Half the performance – Double support is optional
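To get a feel for the precision side of this trade-off, the sketch below accumulates the same sum once in `float` and once in `double` and compares both with the exact value. The class name `PrecisionDemo` and the choice of adding 0.1 one million times are illustrative assumptions; it says nothing about the performance side, which depends on the device.

```java
/** Hypothetical demonstration of float versus double accumulation error (names are assumptions). */
final class PrecisionDemo {

    /** Adds 0.1f to a float accumulator n times. */
    static float sumFloat(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    /** Adds 0.1 to a double accumulator n times. */
    static double sumDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double exact = n / 10.0;
        System.out.println("float error:  " + Math.abs(sumFloat(n) - exact));
        System.out.println("double error: " + Math.abs(sumDouble(n) - exact));
    }
}
```

Whether an error of the size seen in the `float` run is acceptable depends on the calculation, which is the question the slide raises.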
  • 30. Conclusion ● When to use it? – When the performance is really needed – When the problem has high concurrency – When the problem is partitionable
  • 31. Questions?
      Setting up OpenCL test on Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
      Warming up OpenCL test
      [thread 32003 also had an error][thread 33027 also had an error]
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      # SIGSEGV[thread 32515 also had an error] (0xb)[thread 32771 also had an error] [thread 32259 also had an error] at pc=0x00000001250ded70, pid=99851, tid=29475
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode bsd-amd64 compressed oops)
      # Problematic frame:
      # [thread 17415 also had an error] C [cl_kernels+0x1d70] sort_wrapper+0x1b0
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # An error report file with more information is saved as:
      # /Users/arjanl/Documents/opencl/workspace/opencl-test/jogamp/hs_err_pid99851.log
      [thread 31763 also had an error]
      #
      # If you would like to submit a bug report, please visit:
      # http://bugreport.sun.com/bugreport/crash.jsp
      #

Editor's Notes

  1. We are Arjan & Maarten. Arjan: software architect, interested in scalability and performance. Maarten: senior developer, performance and concurrency, interested in 3D.
  2. Working at the Ministry of Economic Affairs. Project: Aerius.
  3. PAS: Programmatische Aanpak Stikstof (Programmatic Approach to Nitrogen). Balancing the environment against economic development. Calculation tool: monitoring targets and supporting permit applications.
  4. Calculates concentrations/depositions. Export for a permit application. Compare multiple situations. OpenCL application: road traffic. Speed matters because the user is waiting for the result.
  5. Import a set of sources. Calculate per source and calculation point. Sum the results per calculation point. Emission of the road, distance to the road, wind speed, wind direction, ozone concentration, location.
  6. Getting creative with text files. Load the OpenCL file and pre-process it. Add Java constants via #define. Locale: 1.0 vs 1,0. Configurable options. Time for testing!
  7. Add test kernels, in test mode only. JUnit test function: buffers with test values, buffers with expected results. Test → 'challenges' with direct memory.
  8. Not enough memory → direct memory size. Max 2 GB per buffer. First run fine, second run fails? → Garbage collection is triggered on heap space. Buffer release → free the memory manually. Sun classes → JVM-specific. Handy tool: plugin for VisualVM.
  9. Division by zero → no problem, but the results are worthless. Reading/writing outside allocated memory? CPU → crash. GPU → no problem (values change per test run). Test on the CPU first! (But that is still no guarantee.) Even more device differences...
  10. "OpenCL is portable, the performance is not." OpenCL is not always portable either. "Write once, debug anywhere"? Develop for the production hardware/drivers.
  11. Performance or precision? Is double really needed? Double support is optional, but high-end devices usually have it.
  12. Only when the performance is needed AND the problem exhibits high concurrency. Being partitionable is usually helpful.