Writing GPU-Ready AI Models
in Pure Java with Babylon
Ana-Maria Mihalceanu
Senior Developer Advocate
Java Platform Group @ Oracle
Lize Raes
Senior Developer Advocate
Java Platform Group @ Oracle
Copyright © 2025, Oracle and/or its affiliates
Central Processing Unit vs Graphics Processing Unit — anti-confusion chart
• CPU: general-purpose processor with few, complex cores; ideal for serial work where one operation depends on the previous one
• GPU: highly parallel processor with many simple cores; ideal for parallel work (SIMD) and matrix multiplications
• CU: shorthand for 'see you'
• Cu: cuprum (copper), a highly conductive metal
Why GPU?
Massive Parallelism
Thousands of small cores that can perform many
arithmetic operations simultaneously.
High Throughput and Memory Bandwidth
GPU memory hierarchy and bandwidth are optimized for
bulk data operations.
Energy Efficiency for Highly Parallel Workloads
GPUs deliver more performance per watt compared to
scaling CPU clusters.
Source: https://pixabay.com/photos/gpu-graphic-card-pcb-hardware-4885250/
From Code to Hardware
Java code traditionally runs on the CPU. What does 'running on the GPU' imply?

Going from written code to machine code that runs on vendor-specific hardware:
• CPU path: Source Code (Java) → IR (Bytecode) → Interpreter/JIT (JVM) → Machine Code → CPU
• GPU path: Source Code (e.g. CUDA/C++) → IR (e.g. PTX) → Runtime/Driver → Machine Code → GPU

Legend: IR = Intermediate Representation; PTX = Parallel Thread Execution
About Us
Ana-Maria Mihalceanu
Senior Developer Advocate @Oracle
Lize Raes
Senior Developer Advocate @Oracle
Talk Overview
Building blocks to access the GPU in different ways:
• .onnx ML model → Panama bindings for main runtime methods → ONNX runtime (native) → GPU
• Java ML model → Panama bindings for ML operators → ONNX runtime (native) → GPU
• Accelerator-aware Java code → GPU

Legend: Babylon Code Reflection (@CodeReflection); Babylon HAT (@Kernel, @Compute)
Deep Learning Models

MODEL (.pt, .pb, .onnx, .gguf, ...):
• Graph (ops + layers)
• Weights
• IN (img, tokens, …) → OUT (cat., tokens, …)

RUNTIME: loads the model and dispatches the load to hardware
→ PyTorch
→ TensorFlow Runtime
→ ONNX Runtime
→ Llama.cpp
Open Neural Network Exchange — anti-confusion chart
• ONNX: Open Neural Network Exchange, a format for sharing AI models + a runtime
• Onyx: dark gemstone, often black or banded, used in jewelry and tabletops
• Onix: 210 kg ground-type Pokémon shaped like a stone serpent
• Oh niks: Dutch for 'oh nothing'
ONNX and Java
ONNX from Java Perspective
The Java platform knows nothing about ONNX.
Java considers ONNX runtime a foreign (native) library.
Java considers the ONNX programming model a foreign
programming model.
Deploy and Execute an ONNX Model
Image (.png) → Java Client → Classification/Probabilities
ONNX Model: emotion-ferplus-8.onnx

Stack:
• Panama bindings for main runtime methods (runtime.createSession, inference.run, …)
• ONNX runtime (native): libonnxruntime.dylib | libonnxruntime.dll | libonnxruntime.so
• GPU
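To make the Panama-bindings layer concrete, here is a minimal sketch of the Foreign Function & Memory mechanism such bindings build on: obtaining a downcall handle for a native C function and calling it with off-heap memory. The class name is ours, and the C library's strlen stands in for ONNX Runtime entry points (which the real bindings jextract from onnxruntime_c_api.h). Requires JDK 22+.

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class FfmSketch {
    // Bind and call a native C function via the FFM API.
    // strlen is used here only as a stand-in for a real ONNX Runtime call.
    static long nativeStrlen(String s) {
        Linker linker = Linker.nativeLinker();
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        // Confined arena: off-heap memory freed deterministically on close
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("onnx")); // 4
    }
}
```

The real bindings wrap ONNX Runtime's function-pointer table the same way; only the descriptors and layouts are more involved.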
Demo: Run Workloads on the GPU via ONNX Runtime, in Java
github.com/LizeRaes/babylon/tree/fer
Extend Java Reach to Foreign Programming Models with Project Babylon
@CodeReflection identifies areas of Java source code to reflect over, giving access to them as code models at compile time and at runtime.

Input Java Code:
@CodeReflection
static void f() {
    System.out.println("Hello !");
}

Java Code Model (after Reflect):
func @"f" ()void -> {
    %0 : java.io.PrintStream = field.load @"java.lang.System::out()java.io.PrintStream";
    %1 : java.lang.String = constant @"Hello !";
    invoke %0 %1 @"java.io.PrintStream::print(java.lang.String)void";
    return;
};

JVM Bytecode (after Lower):
public static void f();
Code:
   0: getstatic     #2  // Field java/lang/System.out:Ljava/io/PrintStream;
   3: ldc           #3  // String Hello !
   5: invokevirtual #4  // Method java/io/PrintStream.print:(Ljava/lang/String;)V
   8: return

The code model can also be translated into a foreign code model (e.g. autodiff) or lowered to bytecode.
What’s Inside an ONNX Model
Model metadata
• version, description, ...
Graph structure
• operators and tensors
Initializers (weights)
• can be large chunks of binary data (float32, int64, etc.)
Inputs and outputs
• Example: input is float[1, 1, 64, 64], output is float[1, 8].
https://netron.app/
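The float[1, 8] output above holds one raw score per emotion class; turning those scores into the probabilities the Java client reports is a softmax. A plain-Java sketch (class and method names are ours, not part of any ONNX API):

```java
public class Softmax {
    // softmax: exponentiate each score and normalize so the result sums to 1
    static double[] softmax(float[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (float s : scores) max = Math.max(max, s); // subtracted for numeric stability
        double[] out = new double[scores.length];
        double sum = 0;
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // 8 raw scores, as a FER+-shaped model might emit
        double[] p = softmax(new float[]{2f, 1f, 0.1f, 0f, 0f, 0f, 0f, 0f});
        double total = 0;
        for (double v : p) total += v;
        System.out.printf("p[0]=%.3f, total=%.3f%n", p[0], total);
    }
}
```

The index of the largest probability is the predicted class.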
Operators
Using ONNX Operators in Java
Pre-generated (come out of the box with Code Reflection):
• Conv(input, weights, ...)
• Gemm(matrixA, matrixB, ...)
• Relu(tensor)
• ...
The operators were extracted from the ONNX schema (done already) and are recognized and transformed by Code Reflection.
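To make the operator semantics concrete, here is what two of these operators compute, as plain-Java reference implementations (illustration only — the real operators are generated from the ONNX schema and dispatched to the accelerator; the class and simplifications are ours, e.g. Gemm without its alpha/beta/transpose attributes):

```java
public class OnnxOps {
    // Relu: elementwise max(0, x)
    static float[] relu(float[] t) {
        float[] out = new float[t.length];
        for (int i = 0; i < t.length; i++) out[i] = Math.max(0f, t[i]);
        return out;
    }

    // Gemm (simplified to C = A x B): row-major matrices,
    // a is m x k, b is k x n, result is m x n
    static float[] gemm(float[] a, float[] b, int m, int k, int n) {
        float[] c = new float[m * n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int x = 0; x < k; x++) acc += a[i * k + x] * b[x * n + j];
                c[i * n + j] = acc;
            }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(relu(new float[]{-1f, 2f}))); // [0.0, 2.0]
    }
}
```

Nested loops like these are exactly the embarrassingly parallel work a GPU runs across many cores at once.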
Architecture of Java ONNX Prototype

Java code:
• Application: ONNX model authored using the Java ONNX API
• Library: Java ONNX API & Code Model Transformer, plus the Panama ONNX binding (jextract-generated from onnxruntime_c_api.h)
• JDK: Foreign Function & Memory API
Native code:
• ONNX runtime (ORT)
Demo: Execute a Java Model
on ONNX Runtime
github.com/openjdk/babylon/tree/code-reflection/cr-examples/onnx
Java on the GPU
Heterogeneous Accelerator Toolkit (HAT)
"Leaning on the work of others from Panama, Babylon and Class-File API"

Java side (JVM, on the CPU):
• Application: application code (HAT Programming Model)
• Library: HAT
• JDK: Panama + Code Reflection
Native side (targets the accelerator: GPU or FPGA):
• 'Jextracted' or Panama FFM native code
• Native vendor-provided runtime/framework
• GPU library & kernel compiler (pluggable backend)
Backends: CUDA, OpenCL, LevelZero, HIP, Java (?)
Kernel — anti-confusion chart
• Kernel (seed): inner edible part of a grain or nut
• Kernel (OS core): central part of an operating system that manages hardware and software
• Kernel (ML function): similarity function, a mathematical tool to measure similarity by mapping data into higher dimensions
• Kernel (GPU function): small function that runs in parallel across many threads on a GPU
© Avadhoot Tavhare
(Diagram: one compute kernel executed across many GPU threads — Thread 1, Thread 2, Thread 3, Thread 4)
HAT Programming Model

Regular Java Code (runs in the JVM, on the CPU):
• Accelerator acc; S32Array s32Array;
• Gets a suitable GPU or accelerator and launches compute jobs
Compute Code (runs in the JVM, can dispatch to the GPU):
• ComputeContext cc;
• Allocates GPU memory segments and orchestrates which kernels to run
Kernel Code (runs on the GPU):
• KernelContext kc;
• Minimal function to be run in parallel, once per GPU core
Writing Accelerator-Aware Code with HAT
Example for squaring each value in an array

Kernel Code — what code should run in parallel?
@CodeReflection
public static void kernel(@RO KernelContext kc, @RW S32Array s32Array) {
    if (kc.x < kc.maxX) {
        s32Array.array(kc.x, s32Array.array(kc.x) * s32Array.array(kc.x));
    }
}

Compute Code — what kernel should be executed on what data range?
@CodeReflection
public static void compute(@RO ComputeContext cc, @RW S32Array s32Array) {
    cc.dispatchKernel(s32Array.length(), kc -> kernel(kc, s32Array));
}

Regular Java Code — get the suitable accelerator and launch the compute:
Accelerator acc = // get a suitable GPU or Java Accelerator
acc.compute(cc -> Square.compute(cc, s32Array));
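What dispatchKernel means can be sketched on the CPU: the kernel body runs once per index, conceptually in parallel. Here a parallel stream stands in for GPU threads, and the class and method names are ours — this is an analogy for the execution model, not HAT itself:

```java
import java.util.stream.IntStream;

public class SquareOnCpu {
    // One "thread": squares the element at index x, with the same
    // bounds guard the HAT kernel uses (kc.x < kc.maxX)
    static void kernel(int x, int maxX, int[] arr) {
        if (x < maxX) arr[x] = arr[x] * arr[x];
    }

    // CPU analogue of cc.dispatchKernel(range, ...): run the kernel
    // once per index over the data range, in parallel
    static void dispatchKernel(int range, int[] arr) {
        IntStream.range(0, range).parallel().forEach(x -> kernel(x, range, arr));
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        dispatchKernel(data.length, data);
        System.out.println(java.util.Arrays.toString(data)); // [1, 4, 9, 16]
    }
}
```

Each index writes only its own element, so the parallel execution is race-free — the same property that lets a GPU run the kernel across thousands of threads.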
Access Code Models of Kernels via @CodeReflection
public class Square {
@CodeReflection
public static void kernel(@RO KernelContext kc,
@RW S32Arr s32Arr) {
if (kc.x<kc.maxX){
s32Arr.array(kc.x,
s32Arr.array(kc.x) * s32Arr.array(kc.x));
}
}
}
Source: https://jjfumero.github.io/posts/2025/02/07/babylon-and-tornadovm
Heterogeneous Accelerator Toolkit (HAT) in Action
Demo: https://github.com/openjdk/babylon/tree/code-reflection/hat/examples/violajones
A precomputed Haar Cascade:
• N Stages (one shown)
Each Stage:
• Tree of Haar Features (three shown)
Each Haar Feature:
• 0-3 'rectangles'
• Threshold value
Summary
Accessing the GPU in different ways:

• .onnx ML model → Panama bindings for main runtime methods → ONNX runtime (native) → GPU
  goal: libs that let you load and execute any model type
  * runnable on JDK 25+
• Java ML model → Panama bindings for ML operators → ONNX runtime (native) → GPU
  goal: libs with Java models for tweaking + training (long term)
  * experimental, requires a Babylon build
• Accelerator-aware Java code → GPU
  goal: libs with GPU-adapted algos; lets you write accelerator-aware code
  * experimental, requires a Babylon build

Legend: Babylon Code Reflection (@CodeReflection); Babylon HAT (@Kernel, @Compute)
Thank you
