Writing GPU-Ready AI Models in Pure Java with Babylon
Ana-Maria Mihalceanu
Senior Developer Advocate
Java Platform Group @ Oracle
Lize Raes
Senior Developer Advocate
Java Platform Group @ Oracle
Copyright © 2025, Oracle and/or its affiliates
Central Processing Unit vs Graphics Processing Unit
Anti-confusion chart
CPU
• General-purpose processor with few, complex cores.
• Ideal for serial work where each operation depends on the previous one.
GPU
• Highly parallel processor with many simple cores.
• Ideal for parallel work (SIMD) and matrix multiplications.
CU
Shorthand for 'see you'
Cu
Cuprum (copper), a highly conductive metal
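To make the CPU/GPU distinction concrete, here is a small plain-Java sketch (no GPU involved) that contrasts a computation where each step depends on the previous one with an element-wise computation whose iterations are independent. Only the second kind can be spread over many cores; the class and method names are illustrative, not from the talk.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class SerialVsParallel {

    // Serial dependency: step i needs the result of step i - 1,
    // so this loop cannot simply be split across cores.
    static double[] runningProduct(double[] xs) {
        double[] out = new double[xs.length];
        double acc = 1.0;
        for (int i = 0; i < xs.length; i++) {
            acc *= xs[i];
            out[i] = acc;
        }
        return out;
    }

    // Data parallelism: each element is computed independently of the others,
    // so the work can be spread over many cores (or, on a GPU, many threads).
    static double[] squareAll(double[] xs) {
        double[] out = new double[xs.length];
        IntStream.range(0, xs.length).parallel().forEach(i -> out[i] = xs[i] * xs[i]);
        return out;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        System.out.println(Arrays.toString(runningProduct(xs))); // [1.0, 2.0, 6.0, 24.0]
        System.out.println(Arrays.toString(squareAll(xs)));      // [1.0, 4.0, 9.0, 16.0]
    }
}
```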
Why GPU?
Massive Parallelism
Thousands of small cores that can perform many
arithmetic operations simultaneously.
High Throughput and Memory Bandwidth
GPU memory hierarchy and bandwidth are optimized for
bulk data operations.
Energy Efficiency for Highly Parallel Workloads
For highly parallel workloads, GPUs deliver more performance per watt than scaling out CPU clusters.
Source: https://pixabay.com/photos/gpu-graphic-card-pcb-hardware-4885250/
GPU for AI?
Deep learning models have many layers, many inputs, and lots of matrix multiplications: a perfect fit for the GPU!
Source: https://www.researchgate.net/publication/378171318_Utilising_Machine_Learning_to_Predict_Myocardial_Infarction_by_Electrocardiogram_Derived_Respiration
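Why matrix multiplication maps so well to the GPU can be seen in a naive implementation: every output element is an independent dot product. The sketch below is plain Java and spreads the independent rows over CPU cores with a parallel stream; a GPU would instead assign (groups of) output elements to thousands of threads. Names and the row-major layout are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelMatMul {

    // C = A x B for row-major matrices: A is n x k, B is k x m, C is n x m.
    // Each C[i][j] depends only on row i of A and column j of B,
    // so all output elements can be computed independently.
    static float[] matMul(float[] a, float[] b, int n, int k, int m) {
        float[] c = new float[n * m];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < m; j++) {
                float sum = 0f;
                for (int p = 0; p < k; p++) {
                    sum += a[i * k + p] * b[p * m + j];
                }
                c[i * m + j] = sum;
            }
        });
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4}; // 2 x 2
        float[] b = {5, 6, 7, 8}; // 2 x 2
        System.out.println(Arrays.toString(matMul(a, b, 2, 2, 2))); // [19.0, 22.0, 43.0, 50.0]
    }
}
```

In a deep learning model, each fully connected layer performs exactly this kind of multiplication between activations and a weight matrix, which is why the work parallelizes so well.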
From Code to Hardware
Java code traditionally runs on the CPU.
What does ‘running on the GPU’ imply?
Going from written code to machine code that runs on vendor-specific hardware:
CPU: Source Code (Java) → IR (Bytecode) → JVM (interpreter + JIT compiler) → Machine Code (CPU)
GPU: Source Code (e.g. CUDA/C++) → IR (e.g. PTX) → Runtime/Driver → Machine Code (GPU)
Legend:
IR = Intermediate Representation
PTX = Parallel Thread Execution
Diverse GPU Vendors
Vendor | IR name (internal compiler IR) | Runtime / execution layer
NVIDIA | PTX (Parallel Thread Execution) IR | CUDA, cuDNN, TensorRT
AMD | LLVM IR / GCN ISA* (via ROCm) | ROCm, MIOpen, HIP runtime
Intel | SPIR-V (Standard Portable IR for Vulkan/OpenCL) | oneAPI Level Zero, OpenVINO runtime
Apple | AIR (Apple Intermediate Representation) | Metal Performance Shaders, Core ML
Arm/Mali | NIR (IR of the Mesa stack, used in open-source drivers) | Compute Library, Arm NN
*ISA = Instruction Set Architecture
About Us
Ana-Maria Mihalceanu
Senior Developer Advocate @Oracle
Lize Raes
Senior Developer Advocate @Oracle
Deep Learning Models
Model (.pt, .pb, .onnx, .gguf, ...) = Graph (ops + layers) + Weights
IN (image, tokens, …) → Model → OUT (category, tokens, …)
Runtime: loads the model and dispatches the workload to the hardware
→ PyTorch
→ TensorFlow Runtime
→ ONNX Runtime
→ llama.cpp
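The split between a model (graph + weights) and a runtime that executes it can be sketched in a few lines of plain Java. The node and operator names below are hypothetical and only loosely modeled on ONNX-style ops (Mul, Add, Relu), and tensors are plain 1-D float arrays; a real runtime parses the graph and weights from a model file and dispatches each op to optimized CPU or GPU kernels.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

public class TinyRuntime {

    // A graph node: operator name, names of the tensors it reads, name of the tensor it writes.
    record Node(String op, List<String> inputs, String output) {}

    static float[] binary(float[] x, float[] y, BinaryOperator<Float> f) {
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) out[i] = f.apply(x[i], y[i]);
        return out;
    }

    static float[] unary(float[] x, UnaryOperator<Float> f) {
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) out[i] = f.apply(x[i]);
        return out;
    }

    // The "runtime": executes nodes in list order (assumed already topologically sorted).
    // A real runtime would pick an optimized CPU or GPU implementation for each op here.
    static float[] run(List<Node> graph, Map<String, float[]> weights,
                       String inputName, float[] input, String outputName) {
        Map<String, float[]> values = new HashMap<>(weights); // start from the stored weights
        values.put(inputName, input);
        for (Node n : graph) {
            float[] a = values.get(n.inputs().get(0));
            float[] result = switch (n.op()) {
                case "Mul" -> binary(a, values.get(n.inputs().get(1)), (p, q) -> p * q);
                case "Add" -> binary(a, values.get(n.inputs().get(1)), (p, q) -> p + q);
                case "Relu" -> unary(a, v -> Math.max(v, 0f));
                default -> throw new IllegalArgumentException("Unsupported op: " + n.op());
            };
            values.put(n.output(), result);
        }
        return values.get(outputName);
    }

    // y = Relu(x * w + b), with made-up weights.
    static float[] example() {
        List<Node> graph = List.of(
            new Node("Mul", List.of("x", "w"), "t1"),
            new Node("Add", List.of("t1", "b"), "t2"),
            new Node("Relu", List.of("t2"), "y"));
        Map<String, float[]> weights = Map.of(
            "w", new float[]{2f, -1f, 0.5f},
            "b", new float[]{1f, 1f, -3f});
        return run(graph, weights, "x", new float[]{1f, 3f, 4f}, "y");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(example())); // [3.0, 0.0, 0.0]
    }
}
```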
Open Neural Network Exchange
Anti-confusion chart
ONNX
Open Neural Network Exchange, a format for sharing AI models, plus a runtime
Onyx
Dark gemstone, often black or banded, used in jewelry and tabletops
Onix
210 kg ground-type Pokémon shaped like a stone serpent
Oh niks
Flemish for 'oh nothing'
Open Neural Network eXchange (ONNX)
ONNX is:
1. Interoperable format for machine-learning models
2. Runtime for executing ONNX models
ONNX Runtime flow:
ONNX Model → In-Memory Graph → Graph Partitioner (consults the Provider Registry) → Parallel, Distributed Graph Runner → Execution Providers (CPU, GPU-EP, Other)
Input Data → Graph Runner → Output Result
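The graph partitioner's job can be illustrated with a simplified Java sketch: walk the execution providers in priority order and assign each node to the first provider that supports its operator, with the CPU provider as the catch-all. Provider names and supported-op sets here are hypothetical; ONNX Runtime's real partitioning works on subgraphs and is considerably more involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GraphPartitioner {

    // A hypothetical execution provider: a name plus the op types it can run.
    record Provider(String name, Set<String> supportedOps) {
        boolean supports(String op) { return supportedOps.contains(op); }
    }

    // Assigns each node (node name -> op type) to the first provider,
    // in priority order, that supports the node's op type.
    static Map<String, String> partition(Map<String, String> nodeOps, List<Provider> providers) {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : nodeOps.entrySet()) {
            String provider = providers.stream()
                .filter(p -> p.supports(entry.getValue()))
                .map(Provider::name)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("No provider for " + entry.getValue()));
            assignment.put(entry.getKey(), provider);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("conv1", "Conv");
        nodes.put("relu1", "Relu");
        nodes.put("custom1", "MyCustomOp");
        List<Provider> providers = List.of(
            new Provider("GPU-EP", Set.of("Conv", "Relu", "MatMul")),
            new Provider("CPU", Set.of("Conv", "Relu", "MatMul", "MyCustomOp")));
        System.out.println(partition(nodes, providers));
        // {conv1=GPU-EP, relu1=GPU-EP, custom1=CPU}
    }
}
```

Ops the GPU provider cannot handle fall back to the CPU, which is why a model can still run even when only part of it is accelerated.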
ONNX and Java
ONNX from Java Perspective
The Java platform knows nothing about ONNX.
To Java, the ONNX Runtime is a foreign (native) library.
To Java, the ONNX programming model is a foreign programming model.
Deploy and Execute an ONNX Model
Demo https://github.com/LizeRaes/babylon/tree/fer
Image (.png) → Java Client → Foreign Function & Memory (FFM) Java Bindings → ONNX Native Library (libonnxruntime.dylib | libonnxruntime.dll | libonnxruntime.so) running the ONNX Model (emotion-ferplus-8.onnx) → Classification/Probabilities
The FFM Java bindings are generated with jextract (https://jdk.java.net/jextract/) and consist of Memory Layouts, VarHandles, Function Descriptors, and Method Handles.
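The FFM building blocks named above (memory layouts, function descriptors, method handles) are what jextract-generated bindings are made of. As a hand-written sketch of the same mechanism, here is a downcall to the C standard library's strlen rather than to ONNX Runtime, whose C API is reached through a table of function pointers and is far more verbose. It assumes JDK 22 or later (where the FFM API is final) and a 64-bit platform where size_t maps to a Java long.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class FfmStrlen {

    private static final Linker LINKER = Linker.nativeLinker();

    // Method handle for the native function: size_t strlen(const char *s)
    private static final MethodHandle STRLEN = LINKER.downcallHandle(
        LINKER.defaultLookup().find("strlen").orElseThrow(),
        FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    static long strlen(String s) throws Throwable {
        // The confined arena frees the native copy of the string when it is closed.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8 copy
            return (long) STRLEN.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlen("emotion-ferplus-8.onnx")); // 22
    }
}
```

Running it may print a warning about restricted native access unless the JVM is started with --enable-native-access.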
Demo: Running Workloads on the GPU via ONNX Runtime, in Java
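A client like the one in this demo ends by turning the model's output vector into class probabilities. Assuming the model emits raw scores (logits) rather than probabilities, a numerically stable softmax plus an argmax is enough. The eight scores in main are made-up values, not output from the demo.

```java
import java.util.Arrays;

public class Softmax {

    // Numerically stable softmax: subtracting the maximum score before exp()
    // avoids overflow without changing the result.
    static float[] softmax(float[] scores) {
        float max = Float.NEGATIVE_INFINITY;
        for (float s : scores) max = Math.max(max, s);
        double[] exps = new double[scores.length];
        double sum = 0;
        for (int i = 0; i < scores.length; i++) {
            exps[i] = Math.exp(scores[i] - max);
            sum += exps[i];
        }
        float[] probs = new float[scores.length];
        for (int i = 0; i < scores.length; i++) probs[i] = (float) (exps[i] / sum);
        return probs;
    }

    // Index of the most likely class.
    static int argMax(float[] xs) {
        int best = 0;
        for (int i = 1; i < xs.length; i++) if (xs[i] > xs[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 4.2f, 1.0f, -0.3f, 0.0f, -1.2f, 0.1f, -2.0f}; // made-up logits
        float[] probs = softmax(scores);
        System.out.println(Arrays.toString(probs));
        System.out.println("Predicted class index: " + argMax(probs)); // 1
    }
}
```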
What’s Inside an ONNX Model?
Model metadata
• ir_version (ONNX spec version)
• producer name (e.g. "pytorch", "skl2onnx")
• opset version (the set of available ONNX operators)
• optional metadata strings (author, domain, description, training info)
Graph structure
• Nodes = operators (Conv, Relu, MatMul, etc.)
• Edges = tensors flowing between nodes
• Each node stores its inputs, outputs, and attributes (e.g. kernel size, stride)
Initializers (weights)
• The learned parameters (weights, biases, embeddings, etc.) are stored as raw tensors inside the file.
• These can be large chunks of binary data (float32, int64, etc.).
Inputs and outputs
• Names, shapes, and data types of the expected model inputs and outputs.
• Example: input is float[1, 1, 64, 64], output is float[1, 8].
https://netron.app/
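A tensor shape determines how much memory an input or output occupies and how a multi-dimensional index maps to a position in the flat buffer. A small sketch, assuming the row-major (C-order) layout used for raw tensor data, with a 1×1×64×64 image tensor as the example:

```java
public class TensorShape {

    // Total number of elements: the product of all dimensions.
    static long elementCount(long[] shape) {
        long n = 1;
        for (long d : shape) n *= d;
        return n;
    }

    // Row-major strides: the last dimension varies fastest.
    static long[] strides(long[] shape) {
        long[] s = new long[shape.length];
        long acc = 1;
        for (int i = shape.length - 1; i >= 0; i--) {
            s[i] = acc;
            acc *= shape[i];
        }
        return s;
    }

    // Position of a multi-dimensional index in the flat buffer.
    static long flatIndex(long[] shape, long... index) {
        if (index.length != shape.length) throw new IllegalArgumentException("rank mismatch");
        long[] s = strides(shape);
        long offset = 0;
        for (int i = 0; i < index.length; i++) offset += index[i] * s[i];
        return offset;
    }

    public static void main(String[] args) {
        long[] image = {1, 1, 64, 64}; // N, C, H, W
        long count = elementCount(image);
        System.out.println(count + " floats = " + count * Float.BYTES + " bytes"); // 4096 floats = 16384 bytes
        System.out.println(flatIndex(image, 0, 0, 1, 0)); // 64: one full row further
    }
}
```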
Extend Java's Reach to Foreign Programming Models with Project Babylon
@CodeReflection marks areas of Java source code to reflect over and gives access to them as code models, both at compile time and at runtime.
Input Java Code:
@CodeReflection
static void f() {
    System.out.print("Hello !");
}

JVM Bytecode:
static void f();
  Code:
     0: getstatic     #2  // Field java/lang/System.out:Ljava/io/PrintStream;
     3: ldc           #3  // String Hello !
     5: invokevirtual #4  // Method java/io/PrintStream.print:(Ljava/lang/String;)V
     8: return

Java Code Model:
func @"f" ()void -> {
    %0 : java.io.PrintStream = field.load @"java.lang.System::out()java.io.PrintStream";
    %1 : java.lang.String = constant @"Hello !";
    invoke %0 %1 @"java.io.PrintStream::print(java.lang.String)void";
    return;
};

Input Java Code → Reflect → Java Code Model → Translate (e.g. autodiff) / Lower → Foreign Code Model
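The "Translate (e.g. autodiff)" step turns one code model into another. Babylon's code models are far richer, but the idea can be sketched with a tiny, hypothetical expression tree in plain Java (Java 21+ for pattern-matching switch): a transformation that builds a new tree representing the derivative. None of these types are Babylon APIs.

```java
import java.util.Map;

public class TinyAutodiff {

    // A minimal expression "code model".
    sealed interface Expr permits Const, Var, Add, Mul {}
    record Const(double value) implements Expr {}
    record Var(String name) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    // Transformation: returns a new tree for d(e)/d(x); the input tree is unchanged.
    static Expr derive(Expr e, String x) {
        return switch (e) {
            case Const c -> new Const(0);
            case Var v -> new Const(v.name().equals(x) ? 1 : 0);
            case Add a -> new Add(derive(a.left(), x), derive(a.right(), x));
            // Product rule: (uv)' = u'v + uv'
            case Mul m -> new Add(new Mul(derive(m.left(), x), m.right()),
                                  new Mul(m.left(), derive(m.right(), x)));
        };
    }

    // Evaluates a tree for given variable values.
    static double eval(Expr e, Map<String, Double> env) {
        return switch (e) {
            case Const c -> c.value();
            case Var v -> env.get(v.name());
            case Add a -> eval(a.left(), env) + eval(a.right(), env);
            case Mul m -> eval(m.left(), env) * eval(m.right(), env);
        };
    }

    public static void main(String[] args) {
        Var x = new Var("x");
        Expr f = new Add(new Mul(x, x), new Mul(new Const(3), x)); // f(x) = x*x + 3x
        Expr df = derive(f, "x");                                  // f'(x) = 2x + 3
        System.out.println(eval(df, Map.of("x", 2.0)));            // 7.0
    }
}
```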
Architecture of Java ONNX Prototype
Application (Java code): ONNX model authored using the Java ONNX API
Library (Java code): Java ONNX API & Code Model Transformer; Panama ONNX binding (generated by jextract from onnxruntime_c_api.h)
JDK: Foreign Function & Memory API
Native code: ONNX Runtime (ORT)
How to Run a Java Code Model on ONNX Runtime
Demo https://github.com/openjdk/babylon/tree/code-reflection/cr-examples/onnx
Java Code Model → Code Reflection API → Java ONNX Script Library
• Tensor…, ir.OnnxOp.., ir.OnnxType, compiler.OnnxTransformer.., OnnxRuntime…
• OnnxOperators, ir.OnnxOps, proto.OnnxBuilder.. (generated by OpGen and ProtoGen from the ONNX specs & sources)
FFM bindings (generated by jextract): foreign.OrtApi.., foreign.OrtGenApi
Native runtimes: ONNX Runtime, ONNX GenAI Runtime
Demo: Running a Java Model on ONNX Runtime
Java on the GPU
Kernel
Anti-confusion chart
Kernel (seed)
Inner edible part of a grain or nut
Kernel (OS core)
Central part of an operating system that manages hardware and software
Kernel (ML function)
Similarity function: a mathematical tool that measures similarity by (implicitly) mapping data into higher dimensions
Kernel (GPU function)
Small function that runs in parallel across many threads on a GPU
© Avadhoot Tavhare
Compute Kernel → Thread 1, Thread 2, Thread 3, Thread 4
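A GPU kernel is written for a single work item and launched for many indices at once. Emulating that on the CPU shows the idea: in this plain-Java sketch, four platform threads (matching the four threads in the diagram) each run the same kernel over a strided subset of indices. The kernel and launcher names are hypothetical; real GPUs schedule thousands of such work items in hardware.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KernelOnThreads {

    // The "kernel": what one work item does for one global index.
    static void addKernel(int gid, float[] a, float[] b, float[] out) {
        out[gid] = a[gid] + b[gid];
    }

    // Runs the kernel for indices 0..n-1 on a fixed number of threads.
    // Thread t handles indices t, t + threads, t + 2 * threads, ...
    static void launch(int n, int threads, float[] a, float[] b, float[] out)
            throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int first = t;
            Thread worker = new Thread(() -> {
                for (int gid = first; gid < n; gid += threads) {
                    addKernel(gid, a, b, out);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread w : workers) w.join(); // wait until every work item is done
    }

    public static void main(String[] args) throws InterruptedException {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        float[] b = new float[8];
        Arrays.fill(b, 10f);
        float[] out = new float[8];
        launch(8, 4, a, b, out);
        System.out.println(Arrays.toString(out)); // [11.0, 12.0, ..., 18.0]
    }
}
```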
Heterogeneous Accelerator Toolkit (HAT)
"Leaning on the work of others from Panama, Babylon and Class-File API"
Application (Java): application code using the HAT programming model
Library (Java): HAT, built on Panama + Code Reflection
JDK: JVM
Native: 'Jextracted' or Panama FFM native code; native vendor-provided runtime/framework; GPU library & kernel compiler
Hardware: CPU; accelerator (GPU or FPGA)
HAT reaches the accelerator through a pluggable backend.
Backends: CUDA, OpenCL, Level Zero, HIP, Java, and others to come
What Does HAT Offer?
An NDRange-style kernel parallel programming model
• Other programming models (Triton, OpenMP/TornadoVM-style annotated loops) could be supported
A compute programming model
• For coordinating multiple kernel dispatches and minimizing buffer transfers, in Java
A pluggable backend abstraction
• GPU vendors can showcase their device capabilities
• 'Pure Java' multi-threaded and sequential backends
Interface-mapped (wrapped) Panama FFM MemorySegments
• Access to off-heap data via Java-friendly accessors
• Data can be passed efficiently between Java and non-Java compute nodes
Application
→ Heterogeneous Accelerator Toolkit (HAT)
→ Babylon JDK (JVM, Panama FFM)
→ Vendor native runtime
→ CPU, GPU, FPGA
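HAT's buffers are Java interfaces mapped onto Panama FFM MemorySegments. The class below is not HAT's implementation: it is a hypothetical, hand-written wrapper with the same accessor style as the S32Array used in the Square kernel (array(i) to read, array(i, v) to write), storing a length header followed by the ints in one off-heap segment that a native backend could receive directly. It assumes JDK 22 or later.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public final class OffHeapS32Array {

    // Off-heap layout: [int length][int 0][int 1]...[int length - 1]
    private final MemorySegment segment;

    private OffHeapS32Array(MemorySegment segment) { this.segment = segment; }

    static OffHeapS32Array create(Arena arena, int length) {
        MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT, 1L + length);
        seg.set(ValueLayout.JAVA_INT, 0, length); // length header at byte offset 0
        return new OffHeapS32Array(seg);
    }

    int length() { return segment.get(ValueLayout.JAVA_INT, 0); }

    // Java-friendly accessors; element i lives at int index 1 + i.
    int array(long i) { return segment.getAtIndex(ValueLayout.JAVA_INT, 1 + checked(i)); }

    void array(long i, int value) { segment.setAtIndex(ValueLayout.JAVA_INT, 1 + checked(i), value); }

    // The raw memory, as a native backend would see it.
    MemorySegment segment() { return segment; }

    private long checked(long i) {
        if (i < 0 || i >= length()) throw new IndexOutOfBoundsException("index " + i);
        return i;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            OffHeapS32Array arr = create(arena, 4);
            for (int i = 0; i < arr.length(); i++) arr.array(i, i);
            for (int i = 0; i < arr.length(); i++) arr.array(i, arr.array(i) * arr.array(i));
            System.out.println(arr.array(3)); // 9
        } // memory is freed here
    }
}
```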
Access Code Models of Kernels via @CodeReflection
public class Square {
    // Each work item squares the element at its own index kc.x
    @CodeReflection
    public static void kernel(@RO KernelContext kc, @RW S32Array s32Array) {
        if (kc.x < kc.maxX) {
            s32Array.array(kc.x, s32Array.array(kc.x) * s32Array.array(kc.x));
        }
    }
}
Source: https://jjfumero.github.io/posts/2025/02/07/babylon-and-tornadovm
Heterogeneous Accelerator Toolkit (HAT) Programming Model
public class Square {
    @CodeReflection
    public static void kernel(@RO KernelContext kc, @RW S32Array s32Array) {
        if (kc.x < kc.maxX) {
            s32Array.array(kc.x, s32Array.array(kc.x) * s32Array.array(kc.x));
        }
    }

    @CodeReflection
    public static void compute(@RO ComputeContext cc, @RW S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(), kc -> kernel(kc, s32Array));
    }
}
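Accelerator acc = ...; // get a suitable GPU or Java accelerator
acc.compute(cc -> Square.compute(cc, s32Array));
Kernel code: Square.kernel runs once per work item.
Compute code: Square.compute dispatches the kernel over the array.
Regular Java code: the two lines above, which pick an accelerator and start the computation.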
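To see how the kernel and compute layers fit together without a GPU, here is a pure-Java emulation of the same structure. KernelContext, ComputeContext, and dispatchKernel below are hypothetical stand-ins written for this sketch, not HAT's classes; they only mimic the shape of the Square example (an index kc.x, a bound kc.maxX, and a dispatch over a range) and run the kernel once per index on a parallel stream, roughly what a 'pure Java' multi-threaded backend has to do.

```java
import java.util.Arrays;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class SquareEmulation {

    // Hypothetical stand-in: one work item's index and the size of the range.
    record KernelContext(int x, int maxX) {}

    // Hypothetical stand-in: runs one kernel instance per index in [0, range).
    static final class ComputeContext {
        void dispatchKernel(int range, Consumer<KernelContext> kernel) {
            IntStream.range(0, range).parallel()
                     .forEach(i -> kernel.accept(new KernelContext(i, range)));
        }
    }

    // Kernel code: squares the element at this work item's index.
    static void kernel(KernelContext kc, int[] arr) {
        if (kc.x() < kc.maxX()) {
            arr[kc.x()] = arr[kc.x()] * arr[kc.x()];
        }
    }

    // Compute code: one kernel instance per array element.
    static void compute(ComputeContext cc, int[] arr) {
        cc.dispatchKernel(arr.length, kc -> kernel(kc, arr));
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        compute(new ComputeContext(), data); // regular Java code starts the computation
        System.out.println(Arrays.toString(data)); // [1, 4, 9, 16, 25]
    }
}
```

The bounds check in the kernel looks redundant here, but on real GPUs the dispatched range is often rounded up to a multiple of the work-group size, so extra work items must do nothing.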
Heterogeneous Accelerator Toolkit (HAT) in Action
Demo: https://github.com/openjdk/babylon/tree/code-reflection/hat/examples/violajones
A precomputed Haar cascade:
• N stages (one shown)
Each stage:
• A tree of Haar features (three shown)
Each Haar feature:
• 0-3 'rectangles'
• A threshold value
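Viola-Jones detection relies on the integral image, which lets the sum of any rectangle be computed with four lookups regardless of its size; each Haar feature then compares a weighted sum of its rectangles against its threshold, and a stage combines its features' results against a stage threshold. A plain-Java sketch of the integral image and a single feature test (not the demo's HAT kernel; the feature weights and threshold are made up):

```java
public class HaarSketch {

    // ii[y][x] = sum of all pixels strictly above and to the left of (x, y).
    // The extra zero row and column remove edge cases in rectSum.
    static long[][] integralImage(int[][] img) {
        int h = img.length, w = img[0].length;
        long[][] ii = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
            }
        }
        return ii;
    }

    // Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups.
    static long rectSum(long[][] ii, int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }

    record Rect(int x, int y, int w, int h, int weight) {}

    // A feature "fires" when its weighted rectangle sum exceeds its threshold.
    static boolean featureFires(long[][] ii, Rect[] rects, long threshold) {
        long value = 0;
        for (Rect r : rects) value += r.weight() * rectSum(ii, r.x(), r.y(), r.w(), r.h());
        return value > threshold;
    }

    public static void main(String[] args) {
        int[][] img = {
            {10, 10, 200, 200},
            {10, 10, 200, 200},
        };
        long[][] ii = integralImage(img);
        System.out.println(rectSum(ii, 0, 0, 4, 2)); // 840
        // Two-rectangle edge feature: right half minus left half (made-up threshold).
        Rect[] edge = { new Rect(0, 0, 2, 2, -1), new Rect(2, 0, 2, 2, 1) };
        System.out.println(featureFires(ii, edge, 500)); // true (800 - 40 = 760 > 500)
    }
}
```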
Thank you
Java for AI by Paul Sandoz, Thu 9 Oct @ 9.30, Room 5
ONNX-Based Generative AI LLMs in Java with Project Babylon by Adam Sotona, Thu 9 Oct @ 13.50, Room 9