Writing GPU-Ready AI Models in Pure Java with Babylon

Ana-Maria Mihalceanu
Senior Developer Advocate
Java Platform Group @ Oracle
Lize Raes
Senior Developer Advocate
Java Platform Group @ Oracle

Copyright © 2025, Oracle and/or its affiliates
Central Processing Unit vs Graphics Processing Unit
Anti-confusion chart
CPU
• General-purpose processor with a few complex cores.
• Ideal for serial work, where each operation depends on the previous one.
GPU
• Highly parallel processor with many simple cores.
• Ideal for data-parallel work (SIMD) and matrix multiplications.
CU
Shorthand for 'see you'
Cu
Cuprum (copper), a highly conductive metal

Why GPU?
Massive Parallelism
Thousands of small cores that can perform many arithmetic operations simultaneously.
High Throughput and Memory Bandwidth
The GPU memory hierarchy and bandwidth are optimized for bulk data operations.
Energy Efficiency for Highly Parallel Workloads
For highly parallel workloads, GPUs deliver more performance per watt than scaled-out CPU clusters.
Source: https://pixabay.com/photos/gpu-graphic-card-pcb-hardware-4885250/

GPU for AI?
Deep learning models have many layers, each built from many multiplications over many inputs. Perfect for the GPU!
Source https://www.researchgate.net/publication/378171318_Utilising_Machine_Learning_to_Predict_Myocardial_Infarction_by_Electrocardiogram_Derived_Respiration
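To make the "many multiplications" point concrete, here is a minimal sketch of one dense layer in plain Java. The class name `DenseLayer` and the ReLU activation are assumptions chosen for illustration, not something from the slides. Each output row depends only on its own weights and the shared input, which is why the work parallelizes so well; a parallel stream on the CPU stands in for GPU threads here.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration: one dense layer, y = relu(W * x + b).
final class DenseLayer {

    // Every output row is independent of the others, so rows can be computed in parallel.
    static float[] forward(float[][] weights, float[] bias, float[] input) {
        float[] output = new float[weights.length];
        IntStream.range(0, weights.length).parallel().forEach(row -> {
            float sum = bias[row];
            for (int col = 0; col < input.length; col++) {
                sum += weights[row][col] * input[col]; // the multiply-add a GPU core would perform
            }
            output[row] = Math.max(0f, sum); // ReLU activation
        });
        return output;
    }

    public static void main(String[] args) {
        float[][] weights = {{1f, 2f}, {-1f, -1f}};
        float[] bias = {0.5f, 0f};
        float[] input = {1f, 1f};
        System.out.println(Arrays.toString(forward(weights, bias, input)));
    }
}
```

A real model stacks many such layers with far larger matrices, which is where the thousands of GPU cores pay off.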

From Code to Hardware
Java code traditionally runs on the CPU.
What does ‘running on the GPU’ imply?
Going from written code to machine code that runs on vendor-specific hardware:
CPU path: Source Code (Java) → IR (Bytecode) → Interpreter (JVM JIT) → Machine Code → CPU
GPU path: Source Code (e.g. CUDA/C++) → IR (e.g. PTX) → Runtime/Driver → Machine Code → GPU
Legend:
IR = Intermediate Representation
PTX = Parallel Thread Execution

Diverse GPU Vendors
Vendor | IR name (internal compiler IR) | Runtime / execution layer
NVIDIA | PTX (Parallel Thread Execution) IR | CUDA, cuDNN, TensorRT
AMD | LLVM IR / GCN ISA* (via ROCm) | ROCm, MIOpen, HIP runtime
Intel | SPIR-V (Standard Portable IR for Vulkan/OpenCL) | oneAPI Level Zero, OpenVINO runtime
Apple | AIR (Apple Intermediate Representation) | Metal Performance Shaders, Core ML
ARM/Mali | NIR (for Mesa stack; IR used in open drivers) | Compute Library, Arm NN
*ISA = Instruction Set Architecture

About Us
Ana-Maria Mihalceanu
Senior Developer Advocate @Oracle
Lize Raes
Senior Developer Advocate @Oracle

Deep Learning Models
MODEL (.pt, .pb, .onnx, .gguf, ...)
• Graph (ops + layers)
• Weights
• IN (img, tokens, …) → OUT (cat., tokens, …)
RUNTIME
• Loads the model
• Dispatches the load to the hardware (HW)
→ PyTorch
→ TensorFlow Runtime
→ ONNX Runtime
→ Llama.cpp

Open Neural Network Exchange
ONNX
Open Neural Network Exchange, a format for sharing AI models, plus a runtime
Onyx
Dark gemstone, often black or banded, used in jewelry and tabletops
Onix
Ground-type Pokémon weighing 210 kg, shaped like a stone serpent
Oh niks
Flemish for 'oh nothing'

Open Neural Network eXchange (ONNX)
1. Interoperable format for machine-learning models
2. Runtime for executing ONNX models
Runtime flow: ONNX Model → In-Memory Graph → Graph Partitioner (with Provider Registry) → Parallel, Distributed Graph Runner → Execution Providers (CPU, GPU-EP, Other)
Input Data → Output Result

ONNX and Java
ONNX from a Java Perspective
The Java platform knows nothing about ONNX.
Java considers the ONNX Runtime a foreign (native) library.
Java considers the ONNX programming model a foreign programming model.

Deploy and Execute an ONNX Model
Demo: https://github.com/LizeRaes/babylon/tree/fer
Flow: Image (.png) → Java Client → Foreign Function & Memory (FFM) Java Bindings → ONNX Native Library + ONNX Model (emotion-ferplus-8.onnx) → Classification/Probabilities
ONNX Native Library: libonnxruntime.dylib | onnxruntime.dll | libonnxruntime.so
The FFM Java bindings are generated with jextract (https://jdk.java.net/jextract/) and consist of:
• Memory layouts
• Var handles
• Function descriptors
• Method handles
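The building blocks listed above can be sketched with a much smaller native library than ONNX Runtime. The example below is an assumption-laden illustration, not the demo's code: it binds the C standard library function `strlen` by hand through the FFM API (final since JDK 22), using a function descriptor and a method handle of the kind that jextract would otherwise generate. The class name `NativeStrlen` is hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical hand-written binding; jextract generates similar code from a C header.
final class NativeStrlen {
    private static final Linker LINKER = Linker.nativeLinker();

    // Function descriptor: size_t strlen(const char*), assuming a 64-bit platform.
    private static final MethodHandle STRLEN = LINKER.downcallHandle(
            LINKER.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    static long strlen(String text) {
        // The arena owns the off-heap memory and frees it when the block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(text); // NUL-terminated UTF-8 copy
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("native call to strlen failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("emotion-ferplus-8.onnx"));
    }
}
```

Recent JDKs may print a warning about restricted native access unless the program is started with `--enable-native-access=ALL-UNNAMED`.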

Demo: Running Loads on GPU via ONNX Runtime, in Java

What’s Inside an ONNX Model?
Model metadata
• ir_version (ONNX spec version)
• producer name (e.g. "pytorch", "skl2onnx")
• opset version (the set of available ONNX operators)
• optional metadata strings (author, domain, description, training info)
Graph structure
• Nodes = operators (Conv, Relu, MatMul, etc.)
• Edges = tensors flowing between nodes
• Each node stores its inputs, outputs, and attributes (e.g. kernel size, stride)
Initializers (weights)
• The learned parameters (weights, biases, embeddings, etc.) are stored as raw tensors inside the file.
• These can be large chunks of binary data (float32, int64, etc.).
Inputs and outputs
• Names, shapes, and data types of expected model inputs and outputs.
• Example: input is float[1, 1, 64, 64], output is float[1, 8].
https://netron.app/
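The four parts listed above can be pictured as a small data structure. The sketch below uses Java records with hypothetical names (`TensorInfo`, `Node`, `Graph`, `Model`); it is not the ONNX protobuf API, and the `irVersion` and `opsetVersion` values are illustrative only. It also checks that every edge resolves, meaning each node input is a graph input, an initializer, or the output of an earlier node.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory picture of an ONNX file; not the real ONNX protobuf classes.
final class OnnxModelSketch {
    record TensorInfo(String name, String elementType, long[] shape) {
        long elementCount() {
            long count = 1;
            for (long dim : shape) count *= dim;
            return count;
        }
    }

    record Node(String opType, List<String> inputs, List<String> outputs, Map<String, Object> attributes) {}

    record Graph(List<TensorInfo> inputs, List<TensorInfo> outputs,
                 List<TensorInfo> initializers, List<Node> nodes) {
        // Total number of learned values stored in the initializers.
        long parameterCount() {
            return initializers.stream().mapToLong(TensorInfo::elementCount).sum();
        }

        // Edges are tensor names: each must be defined before a node consumes it.
        boolean edgesResolve() {
            Set<String> known = new HashSet<>();
            inputs.forEach(t -> known.add(t.name()));
            initializers.forEach(t -> known.add(t.name()));
            for (Node node : nodes) {
                if (!known.containsAll(node.inputs())) return false;
                known.addAll(node.outputs());
            }
            return outputs.stream().allMatch(t -> known.contains(t.name()));
        }
    }

    record Model(long irVersion, String producerName, long opsetVersion, Graph graph) {}

    // y = Relu(MatMul(x, W)) with x: float[1, 4] and W: float[4, 2].
    static Model tinyModel() {
        TensorInfo x = new TensorInfo("x", "float32", new long[] {1, 4});
        TensorInfo w = new TensorInfo("W", "float32", new long[] {4, 2});
        TensorInfo y = new TensorInfo("y", "float32", new long[] {1, 2});
        List<Node> nodes = List.of(
                new Node("MatMul", List.of("x", "W"), List.of("h"), Map.of()),
                new Node("Relu", List.of("h"), List.of("y"), Map.of()));
        return new Model(8, "hand-written", 17, new Graph(List.of(x), List.of(y), List.of(w), nodes));
    }

    public static void main(String[] args) {
        Graph graph = tinyModel().graph();
        System.out.println("parameters: " + graph.parameterCount());
        System.out.println("edges resolve: " + graph.edgesResolve());
    }
}
```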

Extend Java Reach to Foreign Programming Models with Project Babylon
@CodeReflection identifies the areas of Java source code to reflect over and makes them accessible as code models at compile time and at run time.
Input Java Code
@CodeReflection
static void f() {
    System.out.print("Hello !");
}
Java Code Model
func @"f" ()void -> {
    %0 : java.io.PrintStream = field.load@"java.lang.System::out()java.io.PrintStream";
    %1 : java.lang.String = constant @"Hello !";
    invoke %0 %1 @"java.io.PrintStream::print(java.lang.String)void";
    return;
};
JVM Bytecode
public static void f();
  Code:
    0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
    3: ldc #3 // String Hello !
    5: invokevirtual #4 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
    8: return
Steps: Reflect → Java Code Model → Translate (e.g. autodiff) / Lower → Foreign Code Model

Architecture of Java ONNX Prototype
Java code
• Application: ONNX model authored using Java ONNX API
• Library: Java ONNX API & Code Model Transformer
• Library: Panama ONNX binding (generated by jextract from onnxruntime_c_api.h)
• JDK: Foreign Function & Memory API
Native code
• ONNX runtime (ORT)

How to Run a Java Code Model on ONNX Runtime
Demo: https://github.com/openjdk/babylon/tree/code-reflection/cr-examples/onnx
Layers: Java Code Model → Code Reflection API → Java ONNX Script Library → FFM Bindings → ONNX Runtime / ONNX GenAI Runtime
Java ONNX Script Library: Tensor…, OnnxOperators, ir.OnnxOp.., ir.OnnxOps, ir.OnnxType, compiler.OnnxTransformer.., proto.OnnxBuilder.., OnnxRuntime…
FFM Bindings: foreign.OrtApi.., foreign.OrtGenApi
Generated from ONNX Specs & Sources with OpGen, ProtoGen and jextract

Demo: Running a Java Model on ONNX Runtime

Java on the GPU

Kernel
Kernel (seed)
Inner edible part of a grain or nut
Kernel (OS core)
Central part of an operating system that manages hardware and software
Kernel (ML function)
Similarity function, mathematical tool to measure similarity by mapping data into higher dimensions
Kernel (GPU function)
Small function that runs in parallel across many threads on a GPU
[Figure: one compute kernel running on Thread 1, Thread 2, Thread 3 and Thread 4. © Avadhoot Tavhare]

Heterogeneous Accelerator Toolkit (HAT)
"Leaning on the work of others from Panama, Babylon and Class-File API"
Java (on the JVM, running on the CPU)
• Application: application code
• Library: HAT programming model, HAT
• JDK: Panama + Code Reflection
Native
• 'Jextracted' or Panama FFM native code
• Native vendor-provided runtime/framework (pluggable backend)
• GPU library & kernel compiler
Accelerator: GPU or FPGA

Pluggable backends: CUDA, OpenCL, LevelZero, HIP, Java, ?

What Does HAT Offer?
An NDRange-style kernel parallel programming model
• Other programming models (Triton, OpenMP/TornadoVM annotated loops) could be supported
A compute programming model
• For coordinating multiple kernel dispatches and minimizing buffer transfers using Java
A pluggable backend abstraction
• GPU vendors can showcase their device capabilities
• 'Pure Java' multi-threaded and sequential backends
Interface-mapped/wrapped Panama FFM MemorySegments
• Access to off-heap data via Java-friendly accessors
• Data can be passed efficiently between Java and non-Java compute nodes
Stack: Application → Heterogeneous Accelerator Toolkit (HAT) → Babylon JDK (JVM) + Panama FFM → Vendor Native Runtime → CPU / GPU / FPGA
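The pluggable backend idea, including the 'pure Java' sequential and multi-threaded backends, can be sketched without HAT itself. Everything below is an assumption for illustration: the `Backend` interface and both implementations are hypothetical and are not HAT's API. The point is only that the same per-index kernel runs unchanged on whichever backend is plugged in.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

// Hypothetical sketch of a pluggable backend; not the HAT API.
final class BackendSketch {

    // A backend runs the kernel once for every index in [0, range), NDRange style.
    interface Backend {
        String name();
        void dispatch(int range, IntConsumer kernel);
    }

    static final class SequentialBackend implements Backend {
        public String name() { return "java-sequential"; }
        public void dispatch(int range, IntConsumer kernel) {
            for (int i = 0; i < range; i++) kernel.accept(i);
        }
    }

    static final class MultiThreadedBackend implements Backend {
        public String name() { return "java-multi-threaded"; }
        public void dispatch(int range, IntConsumer kernel) {
            IntStream.range(0, range).parallel().forEach(kernel);
        }
    }

    // The kernel body only touches its own index, so it is safe on either backend.
    static int[] square(Backend backend, int[] data) {
        int[] result = data.clone();
        backend.dispatch(result.length, i -> result[i] = result[i] * result[i]);
        return result;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        for (Backend backend : new Backend[] {new SequentialBackend(), new MultiThreadedBackend()}) {
            System.out.println(backend.name() + ": " + Arrays.toString(square(backend, data)));
        }
    }
}
```

A GPU backend would fill the same role by translating the kernel's code model for the vendor runtime instead of running it on Java threads.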

Access Code Models of Kernels via @CodeReflection
public class Square {
    @CodeReflection
    public static void kernel(@RO KernelContext kc, @RW S32Arr s32Arr) {
        if (kc.x < kc.maxX) {
            s32Arr.array(kc.x, s32Arr.array(kc.x) * s32Arr.array(kc.x));
        }
    }
}
Source: https://jjfumero.github.io/posts/2025/02/07/babylon-and-tornadovm

Heterogeneous Accelerator Toolkit (HAT) Programming Model
public class Square {
    // Kernel code
    @CodeReflection
    public static void kernel(@RO KernelContext kc, @RW S32Arr s32Arr) {
        if (kc.x < kc.maxX) {
            s32Arr.array(kc.x, s32Arr.array(kc.x) * s32Arr.array(kc.x));
        }
    }
    // Compute code
    @CodeReflection
    public static void compute(@RO ComputeContext cc, @RW S32Arr s32Arr) {
        cc.dispatchKernel(s32Arr.length(), kc -> kernel(kc, s32Arr));
    }
}
// Regular Java code
Accelerator acc = // get a suitable GPU or Java Accelerator
acc.compute(cc -> Square.compute(cc, s32Arr));
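The buffer passed to the kernel above lives off-heap and is reached through an interface. The sketch below shows that idea in plain Java, under stated assumptions: `IntBuffer32` is a hypothetical accessor interface loosely modelled on the buffer type in the slide code, and a parallel stream stands in for the accelerator. It is not how HAT implements its buffers. It needs JDK 22 or later for the final FFM API.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch: an interface mapped onto an off-heap MemorySegment.
final class OffHeapSquare {

    interface IntBuffer32 {
        int length();
        int get(int index);
        void set(int index, int value);
    }

    // Java-friendly accessors over raw 32-bit integers stored off-heap.
    static IntBuffer32 wrap(MemorySegment segment) {
        int length = Math.toIntExact(segment.byteSize() / Integer.BYTES);
        return new IntBuffer32() {
            public int length() { return length; }
            public int get(int index) { return segment.getAtIndex(ValueLayout.JAVA_INT, index); }
            public void set(int index, int value) { segment.setAtIndex(ValueLayout.JAVA_INT, index, value); }
        };
    }

    // Same shape as the slide's kernel: bounds check, then square one element.
    static void kernel(int x, int maxX, IntBuffer32 arr) {
        if (x < maxX) {
            arr.set(x, arr.get(x) * arr.get(x));
        }
    }

    static int[] squareOffHeap(int[] values) {
        // A shared arena lets the worker threads of the parallel stream access the segment.
        try (Arena arena = Arena.ofShared()) {
            MemorySegment segment = arena.allocate((long) values.length * Integer.BYTES, Integer.BYTES);
            IntBuffer32 arr = wrap(segment);
            for (int i = 0; i < values.length; i++) arr.set(i, values[i]);
            IntStream.range(0, arr.length()).parallel().forEach(x -> kernel(x, arr.length(), arr));
            int[] result = new int[values.length];
            for (int i = 0; i < result.length; i++) result[i] = arr.get(i);
            return result;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squareOffHeap(new int[] {1, 2, 3, 4})));
    }
}
```

Because the data already sits in native memory, a native backend could read the same segment without copying it out of the Java heap first.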

Heterogeneous Accelerator Toolkit (HAT) in Action
Demo: https://github.com/openjdk/babylon/tree/code-reflection/hat/examples/violajones
A precomputed Haar Cascade:
• N stages (one shown)
Stage:
• Tree of Haar features (three shown)
Each Haar feature:
• 0-3 'rectangles'
• Threshold value
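A single Haar feature, as described above, is a few weighted rectangles plus a threshold. The sketch below is a simplified illustration in plain Java and not the code of the violajones example: the names `Rect` and `Feature` are hypothetical, and the window normalization a real cascade applies is left out. It uses an integral image so that each rectangle sum costs four lookups.

```java
import java.util.List;

// Hypothetical, simplified Haar feature evaluation; not the HAT violajones code.
final class HaarFeatureSketch {
    record Rect(int x, int y, int width, int height, float weight) {}
    record Feature(List<Rect> rects, float threshold) {}

    // ii[y][x] holds the sum of all pixels above and to the left of (x, y), exclusive.
    static long[][] integralImage(int[][] pixels) {
        int height = pixels.length;
        int width = pixels[0].length;
        long[][] ii = new long[height + 1][width + 1];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                ii[y + 1][x + 1] = pixels[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
            }
        }
        return ii;
    }

    // Sum of the pixels inside the rectangle, from four integral-image lookups.
    static long rectSum(long[][] ii, Rect r) {
        int x1 = r.x() + r.width();
        int y1 = r.y() + r.height();
        return ii[y1][x1] - ii[r.y()][x1] - ii[y1][r.x()] + ii[r.y()][r.x()];
    }

    // Weighted sum of the rectangles, compared against the feature's threshold.
    static boolean passes(long[][] ii, Feature feature) {
        float value = 0f;
        for (Rect rect : feature.rects()) {
            value += rect.weight() * rectSum(ii, rect);
        }
        return value >= feature.threshold();
    }

    public static void main(String[] args) {
        int[][] pixels = {{1, 1, 5, 5}, {1, 1, 5, 5}}; // dark left half, bright right half
        long[][] ii = integralImage(pixels);
        Feature edge = new Feature(
                List.of(new Rect(0, 0, 2, 2, -1f), new Rect(2, 0, 2, 2, 1f)), 10f);
        System.out.println("edge feature passes: " + passes(ii, edge));
    }
}
```

Every candidate window in the image is evaluated independently against the cascade, which is what makes this workload a good fit for one kernel invocation per window.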

Thank you
Java for AI by Paul Sandoz, Thu 9 Oct @ 9.30, Room 5
ONNX-Based Generative AI LLMs in Java with Project Babylon by Adam Sotona, Thu 9 Oct @ 13.50, Room 9
