Intro To GPU Development
for a Java Developer
Bonus JDK 10 Quick Review (Time Permitting)
Will Iverson
Speaker
• Will Iverson
• Frequent SeaJUG speaker since ~2001
• Professional developer/technologist since 1990
• Diverse background includes…
• Statistical analysis of data from NASA Space Shuttle
• Product management for Apple
• Developer relations for Symantec's Visual Cafe
• Clients over last two decades include Sun, BEA, Canal+ Technologies, AT&T, T-Mobile, State of Washington & many, many more…
• 2010-2016, founder of Dev9, premier local consulting firm
MythBusters/Nvidia Demo
GPU Agenda
• Brief History
• Primary Uses
• Conceptual Overview
• Primary Native APIs
• Graphics
• Compute
• Java Developer Frameworks
• Local & Cloud Options
• Challenges
• Suggested Strategies
Disclaimer
• GPU development is huge
• “Let’s cover software development in an hour…”
• Goals
• Cover lots of things from high level
• Conceptual framework
• Ideas and leads for more research
• If you are an experienced GPU/AI/ML/Crypto/etc. dev
• Hold feedback to end
• Please do contribute at end! 
Brief History
CISC CPU
• 8086, 68000
CISC CPU + FPU
• 8086+8087, 68020+68881
RISC CPU
• PowerPC, ARM
CISC as RISC CPU
• Modern Intel/AMD CPU
CISC as RISC + GPU
• Modern PC
First Consumer 3D Cards (circa 1997)
GPU Advantage
[Comparison: 320x240 on PS1 vs. 640x480 on a PC with 3dfx]
GPU Primary Uses
• 3D Graphics
• Specialized Compute
GPU Conceptual Overview
[Diagram: a regular CPU app, code/scripts for the GPU, and data assets (e.g. 3D geometry, 2D texture data) go through the driver “compiler” and driver data loading to the video card (parallel processing, fast dedicated memory); results flow to the video buffer and video output, or back through driver data retrieval.]
More Detail: Wikipedia
[Block diagrams: NVIDIA Pascal GP104 (GTX 1080, 1070) and NVIDIA Pascal GP100 (Tesla P100)]
Primary Native APIs
• Graphics & Compute
• Proprietary Vendor APIs
• Microsoft, Apple, NVIDIA…
• Open Standard Multiplatform APIs
Khronos Open Standards
Primary Native APIs – Graphics
• 3D Development
• OpenGL
• WebGL
• Vulkan
• DirectX (Microsoft)
• Metal (Apple)
OpenGL & Vulkan
WebGL
• OpenGL ES
• Targets mobile
• Most of what you need for non-cutting-edge 3D
• WebGL
• (Basically) OpenGL ES for the Web
• Targets rendering to an HTML canvas
• https://www.shadertoy.com/browse
• https://www.shadertoy.com/howto
• https://www.construct.net/
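To make the shader mental model concrete for a Java developer, here is a rough CPU-side emulation of a Shadertoy-style fragment shader: one pure function computes each pixel's color from its coordinates and a time value, and no pixel depends on another. The color formula is a simple time-varying cosine gradient, similar in spirit to Shadertoy's starter template; the names `ShaderSketch`, `mainImage`, and `render` are hypothetical, and a real shader would be GLSL running on the GPU, not Java.

```java
import java.util.stream.IntStream;

// CPU emulation of a fragment shader: one pure function per pixel (hypothetical names).
final class ShaderSketch {

    // Stand-in for GLSL "mainImage": returns an RGB color with components in [0, 1].
    static double[] mainImage(int x, int y, int width, int height, double time) {
        double u = (double) x / width;   // normalized coordinates, like fragCoord / iResolution.xy
        double v = (double) y / height;
        // col = 0.5 + 0.5 * cos(time + uv.xyx + vec3(0, 2, 4))
        return new double[] {
            0.5 + 0.5 * Math.cos(time + u + 0.0),
            0.5 + 0.5 * Math.cos(time + v + 2.0),
            0.5 + 0.5 * Math.cos(time + u + 4.0)
        };
    }

    // Every pixel is computed independently; a GPU runs these invocations in parallel.
    static double[][] render(int width, int height, double time) {
        double[][] image = new double[width * height][];
        IntStream.range(0, width * height).parallel().forEach(i -> {
            int x = i % width;
            int y = i / width;
            image[i] = mainImage(x, y, width, height, time);
        });
        return image;
    }

    public static void main(String[] args) {
        double[][] img = render(320, 240, 0.0);
        double[] first = img[0];
        System.out.printf("pixel(0,0) = %.3f %.3f %.3f%n", first[0], first[1], first[2]);
    }
}
```

The key design point is that `mainImage` reads nothing but its inputs, which is what lets the hardware schedule pixels in any order.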
Unity3D & libGDX Supported Targets
[Figure: libGDX primary targets and libGDX WIP targets]
Primary Native APIs – Compute
• Pure Computation
• OpenCL
• CUDA (NVIDIA)
OpenCL Development Model
Source: B. Wilkinson, ITCS 6/8010 CUDA Programming, UNC-Charlotte
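A minimal sketch of the OpenCL NDRange idea in plain Java, assuming a 1-D range with no global offset: the work is split into work-groups of `localSize` work-items, and each work-item derives its global index as group id × local size + local id, which is what `get_global_id(0)` returns in that case. The `Kernel1D` interface and the `enqueue` and `vectorAdd` methods are hypothetical; this only simulates the indexing and is not an OpenCL binding.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Simulation of OpenCL's 1-D NDRange indexing on the CPU (hypothetical names, no real OpenCL).
final class NdRangeSketch {

    // Stand-in for a kernel such as:
    //   __kernel void vadd(__global const float* a, __global const float* b,
    //                      __global float* c, int n) {
    //       int gid = get_global_id(0);
    //       if (gid < n) c[gid] = a[gid] + b[gid];
    //   }
    interface Kernel1D {
        void run(int groupId, int localId, int localSize);
    }

    // "Launch": the global size is rounded up to a whole number of work-groups.
    static void enqueue(int n, int localSize, Kernel1D kernel) {
        int groups = (n + localSize - 1) / localSize;
        IntStream.range(0, groups).parallel().forEach(g -> {
            for (int l = 0; l < localSize; l++) {   // a GPU runs these work-items concurrently
                kernel.run(g, l, localSize);
            }
        });
    }

    static float[] vectorAdd(float[] a, float[] b, int localSize) {
        int n = a.length;
        float[] c = new float[n];
        enqueue(n, localSize, (groupId, localId, size) -> {
            int gid = groupId * size + localId;     // equals get_global_id(0) with zero offset
            if (gid < n) {                          // guard the padding work-items
                c[gid] = a[gid] + b[gid];
            }
        });
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5};
        float[] b = {10, 20, 30, 40, 50};
        // 3 work-groups of 2 work-items; the last group has one idle work-item.
        System.out.println(Arrays.toString(vectorAdd(a, b, 2)));
    }
}
```

The `if (gid < n)` guard is the part people forget: because the range is padded to whole work-groups, some work-items have no element to process.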
OpenCL Code Examples
https://github.com/bgaster/opencl-book-samples
Java Developer Frameworks
• Graphics
• JOGL
• http://jogamp.org/jogl/www/
• libGDX
• https://libgdx.badlogicgames.com/
• iOS via RoboVM fork
• jMonkeyEngine
• http://jmonkeyengine.org/
• iOS via Avian JVM
libGDX OpenGL Shader Implementation
https://github.com/libgdx/libgdx
Java Compute Frameworks
• Bitcoin!
• Bitcoin Management
• Not actually mining
• https://bitcoinj.github.io/
• Java & OpenCL Miner
• https://github.com/Diablo-D3/DiabloMiner
• Abandoned!
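Why mining suits GPUs: each candidate nonce is hashed independently, so the search is embarrassingly parallel. The sketch below is a toy that assumes a made-up header string rather than Bitcoin's real 80-byte block header and compact difficulty target; it does use Bitcoin-style double SHA-256 via the JDK's `MessageDigest`. `ToyMiner` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.OptionalLong;
import java.util.stream.LongStream;

// Toy proof-of-work search: NOT the real Bitcoin header format or difficulty encoding.
final class ToyMiner {

    // Double SHA-256 over a hypothetical "header" string followed by the nonce.
    static byte[] doubleSha256(String header, long nonce) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256"); // one per call: not thread-safe
            byte[] first = sha.digest((header + nonce).getBytes(StandardCharsets.UTF_8));
            return sha.digest(first);  // digest() resets the instance, so this hashes the first hash
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Count leading zero bits of the hash (a simplified "difficulty" check).
    static int leadingZeroBits(byte[] hash) {
        int bits = 0;
        for (byte b : hash) {
            if (b == 0) {
                bits += 8;
                continue;
            }
            bits += Integer.numberOfLeadingZeros(b & 0xFF) - 24;
            break;
        }
        return bits;
    }

    // Every nonce is an independent work item, which is why this maps so well onto GPUs.
    static OptionalLong mine(String header, int difficultyBits, long maxNonce) {
        return LongStream.range(0, maxNonce)
                .parallel()
                .filter(n -> leadingZeroBits(doubleSha256(header, n)) >= difficultyBits)
                .findFirst();  // ordered stream: returns the smallest qualifying nonce
    }

    public static void main(String[] args) {
        OptionalLong nonce = mine("hypothetical-block-header", 16, 5_000_000L);
        System.out.println(nonce.isPresent() ? "nonce = " + nonce.getAsLong() : "not found");
    }
}
```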
Java Compute
• Deeplearning4j
• https://deeplearning4j.org/gpu
• Deeplearning4j is a Java-based toolkit for building, training and deploying deep neural networks, regressions and KNN (k-nearest neighbors).
Deeplearning4j Components
• DataVec performs data ingestion, normalization & transformation into feature vectors
• Deeplearning4j provides tools to configure neural networks & build computation graphs
• DL4J-Examples contains working examples for classification and clustering of images, time series & text.
• As of 5/12/18, a Lombok incompatibility breaks the examples on JDK 10; use JDK 8 instead
• Keras Model Import helps import trained models from Python & Keras to DeepLearning4J & Java.
• ND4J gives Java access to native libraries to quickly process matrix data on CPUs or GPUs.
• Choose GPUs or native CPUs for your backend linear algebra operations by changing the ND4J backend dependency in your project's pom.xml file
• CUDA, not OpenCL!
• ScalNet is a Scala wrapper for Deeplearning4j inspired by Keras.
• Runs on multi-GPUs with Spark.
• RL4J implements Deep Q Learning, A3C and other reinforcement learning algorithms for the JVM.
• Arbiter helps search the hyperparameter space to find the best neural net configuration.
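ND4J's CPU and CUDA backends exist to run dense linear algebra fast. As a rough illustration of that workload (not ND4J's API), here is a plain-Java matrix multiply parallelized over output rows, plus a fully connected layer with ReLU built on top of it. `MatMulSketch`, `multiply`, and `denseRelu` are hypothetical names.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Plain-Java sketch of the work a GPU backend accelerates: C = A * B (row-major double[][]).
final class MatMulSketch {

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        if (a[0].length != k) throw new IllegalArgumentException("inner dimensions differ");
        double[][] c = new double[n][m];
        // Each output row (indeed each output cell) is independent: ideal for parallel hardware.
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int p = 0; p < k; p++) {          // i-p-j order walks B row by row (cache friendly)
                double aip = a[i][p];
                for (int j = 0; j < m; j++) {
                    c[i][j] += aip * b[p][j];
                }
            }
        });
        return c;
    }

    // One fully connected layer with ReLU: out = max(0, x * W + bias); rows of x are a batch.
    static double[][] denseRelu(double[][] x, double[][] w, double[] bias) {
        double[][] z = multiply(x, w);
        for (double[] row : z) {
            for (int j = 0; j < row.length; j++) {
                row[j] = Math.max(0.0, row[j] + bias[j]);
            }
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}};
        double[][] w = {{1, -1}, {0, 1}};
        double[] bias = {0, -2};
        System.out.println(Arrays.deepToString(denseRelu(x, w, bias)));
    }
}
```

Neural network training is dominated by operations shaped like `multiply`, which is why swapping the backend from CPU to CUDA can matter so much.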
DeepLearning4j
https://github.com/deeplearning4j/dl4j-examples.git
Local & Cloud Options
• NVIDIA cards support OpenCL & CUDA
• Mac OS X, eGPU…
• https://github.com/marnovo/macOS-eGPU-CUDA-guide
• NVIDIA Product Line Exploding
• http://www.nvidia.com/page/home.html
AWS
• https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-with-up-to-8-nvidia-tesla-v100-gpus-p3/
AWS ML Services
Google GPU
Google Cloud
ML, AI, Big Data, etc…
Tiny Subset of Services…
Just learning what they all do would take
Challenges
• Very difficult to predict & manage performance
• Could see 10x or 100x perf gains
• …or not.
• One small change could blow up parallel execution performance
• Relatively difficult to test in advance
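One concrete way a small change can wreck parallel execution is a loop-carried dependency. In this sketch (hypothetical class and method names), doubling each element is trivially parallel, but a running total needs the previous result, so it must either run sequentially or switch to a different algorithm; here that is a parallel prefix scan via the JDK's `Arrays.parallelPrefix`.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// The same-looking loop with and without a dependency between iterations.
final class DependencySketch {

    // Element-wise: out[i] depends only on in[i], so every index can run at once.
    static long[] doubleEach(long[] in) {
        long[] out = new long[in.length];
        IntStream.range(0, in.length).parallel().forEach(i -> out[i] = in[i] * 2);
        return out;
    }

    // Running total: out[i] needs out[i - 1], a loop-carried dependency.
    // Naively splitting this loop across threads would produce wrong answers.
    static long[] runningTotalSequential(long[] in) {
        long[] out = new long[in.length];
        long sum = 0;
        for (int i = 0; i < in.length; i++) {
            sum += in[i];
            out[i] = sum;
        }
        return out;
    }

    // The dependency forces a different algorithm: a parallel prefix scan.
    static long[] runningTotalParallel(long[] in) {
        long[] out = in.clone();
        Arrays.parallelPrefix(out, Long::sum);
        return out;
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(doubleEach(data)));
        System.out.println(Arrays.toString(runningTotalSequential(data)));
        System.out.println(Arrays.toString(runningTotalParallel(data)));
    }
}
```

On a GPU the same lesson is sharper: the scan version does more total work than the sequential loop and only pays off for large inputs, which is part of why performance is hard to predict.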
Suggested Strategies
• GPU Shaders
• Very specialized, visual effects
• Mock up in Photoshop, Motion, Final Cut, etc.
• Look to existing implementations & tweak
• Compute
• Think of kernels as specialized drivers, or stored procs, or whatever
• Specialist field
• Use existing kernels where possible
• Get really clear about modeling data movement
• Get really clear about how minor algorithm tweaks can blow things up
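For modeling data movement, a back-of-envelope estimate helps: offloading only wins if the kernel speedup outweighs the cost of copying data across the bus in both directions. This sketch assumes a deliberately simple model (transfer time plus CPU time divided by kernel speedup, with no overlap between copies and compute); every number in `main` is a hypothetical placeholder, not a measurement, and `OffloadEstimate` is a hypothetical name.

```java
// Back-of-envelope offload check. All inputs are placeholders; measure your own hardware.
final class OffloadEstimate {

    /**
     * @param bytesToDevice   bytes copied host to GPU before the kernel runs
     * @param bytesFromDevice bytes copied GPU to host afterwards
     * @param busBytesPerSec  effective host/device bandwidth (assumed the same both ways)
     * @param cpuSeconds      measured time for the CPU-only version
     * @param kernelSpeedup   how much faster the kernel itself is than the CPU code
     * @return estimated end-to-end GPU time in seconds, including both transfers
     */
    static double gpuSeconds(double bytesToDevice, double bytesFromDevice,
                             double busBytesPerSec, double cpuSeconds, double kernelSpeedup) {
        double transfer = (bytesToDevice + bytesFromDevice) / busBytesPerSec;
        return transfer + cpuSeconds / kernelSpeedup;
    }

    static boolean worthOffloading(double bytesToDevice, double bytesFromDevice,
                                   double busBytesPerSec, double cpuSeconds, double kernelSpeedup) {
        return gpuSeconds(bytesToDevice, bytesFromDevice, busBytesPerSec,
                          cpuSeconds, kernelSpeedup) < cpuSeconds;
    }

    public static void main(String[] args) {
        double bus = 10e9;  // hypothetical 10 GB/s effective bus bandwidth
        // 1 GB in, 1 GB out, 0.1 s of CPU work, kernel assumed 100x faster:
        System.out.println(worthOffloading(1e9, 1e9, bus, 0.1, 100));
        // Same data, but 10 s of CPU work on it:
        System.out.println(worthOffloading(1e9, 1e9, bus, 10.0, 100));
    }
}
```

With these placeholder numbers the “100x faster” kernel loses on the short job, because moving 2 GB takes longer than the whole CPU run, and wins once there is enough work per byte moved.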
Interested in AI/ML?
• Start with statistics!
• Existing off-the-shelf/in-the-cloud tools
Other Potential Uses
• http://gkrypt.com/
• Java SDK, wraps GPU encryption/decryption
• AES on AMD & NVIDIA
• Let’s talk about Rootbeer & JVM-On-GPU
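One reason AES works well on GPUs is that modes like CTR encrypt each 16-byte block with its own counter value, so blocks can be processed independently. This sketch uses only the standard JCE on the CPU (no GPU, and not gKrypt's API) to show that encrypting chunks in parallel, each starting at the right counter offset, matches a single sequential pass. It assumes the provider increments the full 128-bit IV as a big-endian counter, which matches the JDK's default SunJCE provider; treat that as an assumption for other providers. `CtrChunks` and its methods are hypothetical names.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.stream.IntStream;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// CTR-mode chunking demo with the standard JCE on the CPU (no GPU, not gKrypt's API).
final class CtrChunks {

    static final int BLOCK = 16; // AES block size in bytes

    // Counter for a block index: IV as a 128-bit big-endian number plus the index (mod 2^128).
    // Assumption: the provider increments the whole 16-byte counter, as SunJCE does.
    static byte[] counterAt(byte[] iv, long blockIndex) {
        BigInteger ctr = new BigInteger(1, iv)
                .add(BigInteger.valueOf(blockIndex))
                .mod(BigInteger.ONE.shiftLeft(128));
        byte[] raw = ctr.toByteArray();   // minimal two's complement: may be short or carry a sign byte
        byte[] out = new byte[BLOCK];
        int copy = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - copy, out, BLOCK - copy, copy);
        return out;
    }

    // Encrypt data as if it started at block firstBlock of one long CTR stream.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] data, long firstBlock) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(counterAt(iv, firstBlock)));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Split into chunks of whole blocks; each chunk is encrypted independently, in parallel.
    static byte[] encryptInChunks(byte[] key, byte[] iv, byte[] data, int blocksPerChunk) {
        int chunkBytes = blocksPerChunk * BLOCK;
        int chunks = (data.length + chunkBytes - 1) / chunkBytes;
        byte[] out = new byte[data.length];
        IntStream.range(0, chunks).parallel().forEach(i -> {
            int from = i * chunkBytes;
            int to = Math.min(from + chunkBytes, data.length);
            byte[] part = encrypt(key, iv, Arrays.copyOfRange(data, from, to), (long) i * blocksPerChunk);
            System.arraycopy(part, 0, out, from, part.length);
        });
        return out;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];                 // all-zero demo key: illustration only
        byte[] iv = new byte[16];
        Arrays.fill(iv, 8, 16, (byte) 0xFF);       // low 64 bits all ones: forces a carry
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        byte[] whole = encrypt(key, iv, data, 0);
        byte[] chunked = encryptInChunks(key, iv, data, 4);
        System.out.println(Arrays.equals(whole, chunked));
    }
}
```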
Even More NVIDIA Targets…
GPU Uses
Other Uses
• Deep Learning
• https://databricks.com/blog/2016/10/27/gpu-acceleration-in-databricks.html
• TensorFlow
• https://www.tensorflow.org/versions/master/install/install_java#gpu_support
Java Unique Solutions
• https://stackoverflow.com/questions/22866901/using-java-with-nvidia-gpus-cuda
• https://github.com/aparapi/aparapi/blob/master/doc/UsersGuide.md
• http://www.jcuda.org/tutorial/TutorialIndex.html
• https://www.codeproject.com/Articles/86551/Part-Programming-your-Graphics-Card-GPU-with-Jav
• http://aparapi.com/
• OpenJDK Sumatra
• http://openjdk.java.net/projects/sumatra/
• https://github.com/aparapi/aparapi
What About Compiling to GPU?
• Fundamental problem
• JVM emulates a traditional CPU
• Probably a bad general solution fit
• Too many differences
• Reminds me of bad ORM abstractions
• Seems to simplify, actually makes things horrible
• What is the purpose?
• Lots of fast parallel data processing
• IBM inline GPU
• https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.lnx.80.doc/diag/understanding/gpu_jit.html
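The IBM page describes a JIT that can offload certain parallel stream loops to an NVIDIA GPU. The sketch below shows the general shape of such a loop, a `parallel().forEach` over primitive arrays; on a standard JVM it simply runs CPU-parallel. Whether IBM's JIT actually offloads this particular lambda, and the exact option that enables it (the IBM docs describe an `-Xjit:enableGPU` option), should be checked against that page. `SaxpyStream` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// The loop shape the IBM page is about: a parallel IntStream over primitive arrays.
// On any other JVM this is just a CPU-parallel loop; GPU offload is IBM-specific.
final class SaxpyStream {

    // y[i] = alpha * x[i] + y[i] ("SAXPY"), updated in place.
    static void saxpy(float alpha, float[] x, float[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("length mismatch");
        IntStream.range(0, x.length).parallel().forEach(i -> y[i] = alpha * x[i] + y[i]);
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        saxpy(2f, x, y);
        System.out.println(Arrays.toString(y));
    }
}
```

This is the appeal of the approach: ordinary Java with no kernel language, at the cost of having little control over whether, or how well, the offload happens.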
Cloud Execution Options
• AWS…
• Dedicated cloud “GPU” systems
• GPUs with no… graphics output
• https://aws.amazon.com/ec2/elastic-gpus/
• Or… use higher-level APIs focused on the task
• https://cloud.google.com/gpu/
• Complicated math to figure out the right approach
• Data transfer costs
• For learning, check out already uploaded public data sets
• Pricing is impacted by things like cryptomining
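For the cost math, even a crude model shows how data transfer can swamp the savings from faster compute. All rates below are made-up placeholders, not actual AWS or Google prices, and `CloudCostSketch` and `jobCost` are hypothetical names.

```java
// Rough cloud cost comparison. Every rate here is a made-up placeholder, not a real price.
final class CloudCostSketch {

    // Total = instance time cost + data transfer cost.
    static double jobCost(double hourlyRate, double computeHours,
                          double gbTransferred, double perGbTransferRate) {
        return hourlyRate * computeHours + gbTransferred * perGbTransferRate;
    }

    public static void main(String[] args) {
        double gb = 500;             // hypothetical data set size moved in/out
        double transferRate = 0.09;  // hypothetical $/GB
        // Hypothetical: CPU instance at $0.50/h needs 40 h; GPU instance at $3.00/h needs 4 h.
        double cpu = jobCost(0.50, 40, gb, transferRate);
        double gpu = jobCost(3.00, 4, gb, transferRate);
        System.out.printf("CPU: $%.2f  GPU: $%.2f%n", cpu, gpu);
    }
}
```

With these placeholders the transfer term is most of the bill for both options; working from a public data set already hosted in the same cloud can shrink that term considerably.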
Some additional notes
• https://skillsmatter.com/skillscasts/8457-java-gpu-all-you-need-to-know
JDK 10
• https://dzone.com/articles/109-new-features-in-jdk-10
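For the JDK 10 bonus, here is a small sketch of a few of its additions: `var` for local variables (JEP 286), `List.copyOf`, `Collectors.toUnmodifiableList`, and the no-argument `Optional.orElseThrow()`. The class and method names are hypothetical; it needs JDK 10 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// A few JDK 10 additions (hypothetical class/method names). Requires JDK 10+.
final class Jdk10Sketch {

    static List<String> upperUnmodifiable(List<String> names) {
        var result = names.stream()                        // `var` infers List<String> here
                .map(String::toUpperCase)
                .collect(Collectors.toUnmodifiableList()); // new in JDK 10
        return result;
    }

    static List<String> snapshot(List<String> source) {
        return List.copyOf(source);                        // new in JDK 10: unmodifiable copy
    }

    static String firstOrFail(List<String> names) {
        Optional<String> first = names.stream().findFirst();
        return first.orElseThrow();                        // no-arg orElseThrow is new in JDK 10
    }

    public static void main(String[] args) {
        var names = new ArrayList<String>();               // inferred as ArrayList<String>
        names.add("gpu");
        names.add("jvm");
        var copy = snapshot(names);
        names.add("cuda");                                 // does not affect the copy
        System.out.println(copy + " " + upperUnmodifiable(names) + " " + firstOrFail(names));
    }
}
```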
Cloud Qubit Computing

SeaJUG 5-15-2018

Editor's Notes

  • #7 CPU introduced: focus on simple integer processing. Floating point added later; for a brief time, floating-point coprocessors. Simple single-threaded model; multithreading “hacked” in later. GPU introduced: lots and lots of transistors. Unlike the CPU, the GPU just keeps adding cores. CPU: multithreading as an afterthought. GPU: multicore as… uhh… the core. Totally different programming model. Future history: specialized, different dev model. CPU, GPU, qubit…?
  • #11 Key points: Drivers are a lot more complicated than simple memory mapping and event triggers. Effectively they are operating systems and compilers, with support for multiple APIs and a huge variety of capabilities, including specialist support for various image and data formats. This is why an NVIDIA driver update may weigh in at 500MB: it is closer to a giant OS, with lots and lots of legacy system support. Drivers appear to create their own IR format for the various supported APIs, which is then processed by the video card. It is easy to imagine tweaking hardware for different use cases; for example, there is no need for video output on compute-only tasks. It is also easy to imagine tweaking for different uses; for example, less memory and more processing for AI or crypto. Leaky abstraction: those blue arrows are (relatively) slow bus movement. Still wicked, wicked fast, but (relatively) slow.
  • #25 libgdx/gdx/src/com/badlogic/gdx/graphics/g3d/shaders/depth.fragment.glsl, libgdx/gdx/src/com/badlogic/gdx/graphics/g3d/shaders/default.vertex.glsl