Writing GPU-Ready AI Models
in Pure Java with Babylon
Ana-Maria Mihalceanu
Senior Developer Advocate
Java Platform Group @ Oracle
Lize Raes
Senior Developer Advocate
Java Platform Group @ Oracle
Copyright © 2025, Oracle and/or its affiliates
Central Processing Unit vs Graphics Processing Unit — anti-confusion chart
• CPU: general-purpose processor with few, complex cores; ideal for serial work where one operation depends on the previous one
• GPU: highly parallel processor with many simple cores; ideal for parallel work (SIMD) and matrix multiplications
• CU: shorthand for 'see you'
• Cu: cuprum (copper), a highly conductive metal
Why GPU?
Massive Parallelism
Thousands of small cores that can perform many
arithmetic operations simultaneously.
High Throughput and Memory Bandwidth
GPU memory hierarchy and bandwidth are optimized for
bulk data operations.
Energy Efficiency for Highly Parallel Workloads
GPUs deliver more performance per watt compared to
scaling CPU clusters.
Source: https://pixabay.com/photos/gpu-graphic-card-pcb-hardware-4885250/
From Code to Hardware
Java code traditionally runs on the CPU. What does 'running on the GPU' imply?

Going from written code to machine code that runs on vendor-specific hardware:
• CPU path: Source Code (Java) → IR (Bytecode) → Interpreter/JIT (JVM) → Machine Code → CPU
• GPU path: Source Code (e.g. CUDA/C++) → IR (e.g. PTX) → Runtime/Driver → Machine Code → GPU

Legend: IR = Intermediate Representation; PTX = Parallel Thread Execution
About Us
Ana-Maria Mihalceanu
Senior Developer Advocate @Oracle
Lize Raes
Senior Developer Advocate @Oracle
Talk Overview
Building blocks to access the GPU in different ways:
• .onnx ML model → Panama bindings for main runtime methods → ONNX runtime (native) → GPU
• Java ML model → Panama bindings for ML operators → ONNX runtime (native) → GPU
• Accelerator-aware Java code → GPU

Legend: Babylon Code Reflection (@CodeReflection); Babylon HAT (@Kernel, @Compute)
Deep Learning Models

MODEL (.pt, .pb, .onnx, .gguf, ...):
• Graph (ops + layers)
• Weights
• IN (img, tokens, …) → OUT (cat., tokens, …)

RUNTIME: loads the model and dispatches the load to hardware
→ PyTorch
→ TensorFlow Runtime
→ ONNX Runtime
→ Llama.cpp
Open Neural Network Exchange — anti-confusion chart
• ONNX: Open Neural Network Exchange, a format for sharing AI models + a runtime
• Onyx: dark gemstone, often black or banded, used in jewelry and tabletops
• Onix: 210 kg ground-type Pokémon shaped like a stone serpent
• Oh niks: Dutch for 'oh nothing'
ONNX and Java
ONNX from Java Perspective
The Java platform knows nothing about ONNX.
Java considers ONNX runtime a foreign (native) library.
Java considers the ONNX programming model a foreign
programming model.
Deploy and Execute an ONNX Model
Image (.png) → Java Client → Classification/Probabilities
ONNX Model: emotion-ferplus-8.onnx

Stack:
• Panama bindings for main runtime methods (runtime.createSession, inference.run, …)
• ONNX runtime (native): libonnxruntime.dylib | libonnxruntime.dll | libonnxruntime.so
• GPU
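To make the Panama-bindings layer concrete, here is a minimal sketch of the Foreign Function & Memory mechanism such bindings build on: obtaining a downcall handle for a native C function and calling it with off-heap memory. The class name is ours, and the C library's strlen stands in for ONNX Runtime entry points (which the real bindings jextract from onnxruntime_c_api.h). Requires JDK 22+.

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class FfmSketch {
    // Bind and call a native C function via the FFM API.
    // strlen is used here only as a stand-in for a real ONNX Runtime call.
    static long nativeStrlen(String s) {
        Linker linker = Linker.nativeLinker();
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        // Confined arena: off-heap memory freed deterministically on close
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("onnx")); // 4
    }
}
```

The real bindings wrap ONNX Runtime's function-pointer table the same way; only the descriptors and layouts are more involved.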
Demo: Run Workloads on the GPU via ONNX Runtime, in Java
github.com/LizeRaes/babylon/tree/fer
Extend Java Reach to Foreign Programming Models with Project Babylon
@CodeReflection identifies areas of Java source code to reflect over, giving access to them as code models at compile time and at runtime.

Input Java Code:
@CodeReflection
static void f() {
    System.out.println("Hello !");
}

Java Code Model (after Reflect):
func @"f" ()void -> {
    %0 : java.io.PrintStream = field.load @"java.lang.System::out()java.io.PrintStream";
    %1 : java.lang.String = constant @"Hello !";
    invoke %0 %1 @"java.io.PrintStream::print(java.lang.String)void";
    return;
};

JVM Bytecode (after Lower):
public static void f();
Code:
   0: getstatic     #2  // Field java/lang/System.out:Ljava/io/PrintStream;
   3: ldc           #3  // String Hello !
   5: invokevirtual #4  // Method java/io/PrintStream.print:(Ljava/lang/String;)V
   8: return

The code model can also be translated into a foreign code model (e.g. autodiff) or lowered to bytecode.
What’s Inside an ONNX Model
Model metadata
• version, description, ...
Graph structure
• operators and tensors
Initializers (weights)
• can be large chunks of binary data (float32, int64, etc.)
Inputs and outputs
• Example: input is float[1, 1, 64, 64], output is float[1, 8].
https://netron.app/
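The float[1, 8] output above holds one raw score per emotion class; turning those scores into the probabilities the Java client reports is a softmax. A plain-Java sketch (class and method names are ours, not part of any ONNX API):

```java
public class Softmax {
    // softmax: exponentiate each score and normalize so the result sums to 1
    static double[] softmax(float[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (float s : scores) max = Math.max(max, s); // subtracted for numeric stability
        double[] out = new double[scores.length];
        double sum = 0;
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // 8 raw scores, as a FER+-shaped model might emit
        double[] p = softmax(new float[]{2f, 1f, 0.1f, 0f, 0f, 0f, 0f, 0f});
        double total = 0;
        for (double v : p) total += v;
        System.out.printf("p[0]=%.3f, total=%.3f%n", p[0], total);
    }
}
```

The index of the largest probability is the predicted class.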
Operators
Using ONNX Operators in Java
Pre-generated (come out of the box with Code Reflection):
• Conv(input, weights, ...)
• Gemm(matrixA, matrixB, ...)
• Relu(tensor)
• ...
The operators were extracted from the ONNX schema (done already) and are recognized and transformed by Code Reflection.
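To make the operator semantics concrete, here is what two of these operators compute, as plain-Java reference implementations (illustration only — the real operators are generated from the ONNX schema and dispatched to the accelerator; the class and simplifications are ours, e.g. Gemm without its alpha/beta/transpose attributes):

```java
public class OnnxOps {
    // Relu: elementwise max(0, x)
    static float[] relu(float[] t) {
        float[] out = new float[t.length];
        for (int i = 0; i < t.length; i++) out[i] = Math.max(0f, t[i]);
        return out;
    }

    // Gemm (simplified to C = A x B): row-major matrices,
    // a is m x k, b is k x n, result is m x n
    static float[] gemm(float[] a, float[] b, int m, int k, int n) {
        float[] c = new float[m * n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int x = 0; x < k; x++) acc += a[i * k + x] * b[x * n + j];
                c[i * n + j] = acc;
            }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(relu(new float[]{-1f, 2f}))); // [0.0, 2.0]
    }
}
```

Nested loops like these are exactly the embarrassingly parallel work a GPU runs across many cores at once.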
Architecture of Java ONNX Prototype

Java code:
• Application: ONNX model authored using the Java ONNX API
• Library: Java ONNX API & Code Model Transformer, plus the Panama ONNX binding (jextract-generated from onnxruntime_c_api.h)
• JDK: Foreign Function & Memory API
Native code:
• ONNX runtime (ORT)
Demo: Execute a Java Model
on ONNX Runtime
github.com/openjdk/babylon/tree/code-reflection/cr-examples/onnx
Java on the GPU
Heterogeneous Accelerator Toolkit (HAT)
"Leaning on the work of others from Panama, Babylon and Class-File API"

Java side (JVM, on the CPU):
• Application: application code (HAT Programming Model)
• Library: HAT
• JDK: Panama + Code Reflection
Native side (targets the accelerator: GPU or FPGA):
• 'Jextracted' or Panama FFM native code
• Native vendor-provided runtime/framework
• GPU library & kernel compiler (pluggable backend)
Backends: CUDA, OpenCL, LevelZero, HIP, Java (?)
Kernel — anti-confusion chart
• Kernel (seed): inner edible part of a grain or nut
• Kernel (OS core): central part of an operating system that manages hardware and software
• Kernel (ML function): similarity function, a mathematical tool to measure similarity by mapping data into higher dimensions
• Kernel (GPU function): small function that runs in parallel across many threads on a GPU
© Avadhoot Tavhare
(Diagram: one compute kernel executed across many GPU threads — Thread 1, Thread 2, Thread 3, Thread 4)
HAT Programming Model

Regular Java Code (runs in the JVM, on the CPU):
• Accelerator acc; S32Array s32Array;
• Gets a suitable GPU or accelerator and launches compute jobs
Compute Code (runs in the JVM, can dispatch to the GPU):
• ComputeContext cc;
• Allocates GPU memory segments and orchestrates which kernels to run
Kernel Code (runs on the GPU):
• KernelContext kc;
• Minimal function to be run in parallel, once per GPU core
Writing Accelerator-Aware Code with HAT
Example for squaring each value in an array

Kernel Code — what code should run in parallel?
@CodeReflection
public static void kernel(@RO KernelContext kc, @RW S32Array s32Array) {
    if (kc.x < kc.maxX) {
        s32Array.array(kc.x, s32Array.array(kc.x) * s32Array.array(kc.x));
    }
}

Compute Code — what kernel should be executed on what data range?
@CodeReflection
public static void compute(@RO ComputeContext cc, @RW S32Array s32Array) {
    cc.dispatchKernel(s32Array.length(), kc -> kernel(kc, s32Array));
}

Regular Java Code — get the suitable accelerator and launch the compute:
Accelerator acc = // get a suitable GPU or Java Accelerator
acc.compute(cc -> Square.compute(cc, s32Array));
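What dispatchKernel means can be sketched on the CPU: the kernel body runs once per index, conceptually in parallel. Here a parallel stream stands in for GPU threads, and the class and method names are ours — this is an analogy for the execution model, not HAT itself:

```java
import java.util.stream.IntStream;

public class SquareOnCpu {
    // One "thread": squares the element at index x, with the same
    // bounds guard the HAT kernel uses (kc.x < kc.maxX)
    static void kernel(int x, int maxX, int[] arr) {
        if (x < maxX) arr[x] = arr[x] * arr[x];
    }

    // CPU analogue of cc.dispatchKernel(range, ...): run the kernel
    // once per index over the data range, in parallel
    static void dispatchKernel(int range, int[] arr) {
        IntStream.range(0, range).parallel().forEach(x -> kernel(x, range, arr));
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        dispatchKernel(data.length, data);
        System.out.println(java.util.Arrays.toString(data)); // [1, 4, 9, 16]
    }
}
```

Each index writes only its own element, so the parallel execution is race-free — the same property that lets a GPU run the kernel across thousands of threads.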
Access Code Models of Kernels via @CodeReflection
public class Square {
@CodeReflection
public static void kernel(@RO KernelContext kc,
@RW S32Arr s32Arr) {
if (kc.x<kc.maxX){
s32Arr.array(kc.x,
s32Arr.array(kc.x) * s32Arr.array(kc.x));
}
}
}
Source: https://jjfumero.github.io/posts/2025/02/07/babylon-and-tornadovm
Heterogeneous Accelerator Toolkit (HAT) in Action
Demo: https://github.com/openjdk/babylon/tree/code-reflection/hat/examples/violajones
A precomputed Haar Cascade:
• N Stages (one shown)
Each Stage:
• Tree of Haar Features (three shown)
Each Haar Feature:
• 0-3 'rectangles'
• Threshold value
Summary
Accessing the GPU in different ways:

• .onnx ML model → Panama bindings for main runtime methods → ONNX runtime (native) → GPU
  goal: libs that let you load and execute any model type
  * runnable on JDK 25+
• Java ML model → Panama bindings for ML operators → ONNX runtime (native) → GPU
  goal: libs with Java models for tweaking + training (long term)
  * experimental, requires a Babylon build
• Accelerator-aware Java code → GPU
  goal: libs with GPU-adapted algos; lets you write accelerator-aware code
  * experimental, requires a Babylon build

Legend: Babylon Code Reflection (@CodeReflection); Babylon HAT (@Kernel, @Compute)
Thank you
