LLJVM: LLVM bitcode to JVM bytecode

A lightweight library to inject LLVM bitcode into the JVM
▪ LLJVM
▪ https://github.com/maropu/lljvm-translator
▪ Originally authored by David Roberts, but currently unmaintained
▪ https://github.com/davidar/lljvm
▪ Apache License, Version 2.0
▪ Main target
▪ Compile Python functions and then run them on JVMs
▪ Not a full-fledged translation but a restricted one
[Figure: language-dependent functions (.cc, .py, .go, .rs, …) → .bc → translate by LLJVM → .class → load & run on the JVM]
* Numba: A High Performance Python Compiler, http://numba.pydata.org/

Motivation: PySpark UDF Overhead
▪ UDFs are expressive and powerful in real use cases
▪ Domain specific transformation, e.g., feature engineering in ML
▪ Billions of daily UDF invocations in Microsoft Azure*
▪ Many culprits of the overhead in Spark
▪ Interpreter execution in Python
▪ (De-)Serialization between Spark executors and Python workers
▪ Whole-stage codegen breaker
* Ramachandra et al., Froid: Optimization of Imperative Programs in a Relational Database, Proceedings of the VLDB Endowment, Volume 11, Issue 4, Pages 432-444, 2017.

Vectorized UDFs in PySpark
▪ Efficient (de-)serialization by Apache Arrow
▪ Vector computation in Python
▪ pd.DataFrame ⇒ pd.DataFrame
* Improving Python and Spark Performance and Interoperability with Apache Arrow, Spark Summit 2017, https://bit.ly/2DEkhHC
[Figure: previous Spark (UDF: scalar ⇒ scalar) vs. Spark v2.3+ (UDF: pd.DataFrame ⇒ pd.DataFrame, exchanging immutable Arrow batches)]

Vectorized UDFs Performance
▪ 3x to over 100x faster than row-at-a-time UDFs
* Introducing Pandas UDF for PySpark, Spark Summit 2017, https://bit.ly/2A6hAdZ

Whole-Stage Codegen Breaker
▪ The plan difference (Scala/Python UDFs) in Spark v2.4
[Figure: codegen’d plans for a Scala UDF and for a Python UDF; a Python UDF splits them into two parts]

PySpark UDF Chain Hell...
▪ No function composition in Spark v2.4 Catalyst

Related Research Work: TUPLEWARE
▪ In the UDF compilation process, Tupleware gathers input data statistics and applies low-level, data-dependent optimizations, e.g., a no-branch strategy
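To make the no-branch idea concrete, here is a minimal Python sketch of the same conditional sum written with and without an explicit branch. It is only an illustration of the code shape a compiler might emit, not Tupleware's implementation; the function names are hypothetical, and in CPython the interpreter itself still branches internally.

```python
def sum_above_branching(values, threshold):
    """Conditional sum with an explicit branch per element."""
    total = 0
    for v in values:
        if v > threshold:
            total += v
    return total


def sum_above_no_branch(values, threshold):
    """Same sum without an `if`: the comparison result (0 or 1) is
    multiplied into the value, so every element takes the same path."""
    total = 0
    for v in values:
        total += v * (v > threshold)  # bool acts as 0 or 1
    return total


# Hypothetical sample data, only for illustration.
sample = [5, 1, 8, 3, 9]
branching_result = sum_above_branching(sample, 4)
no_branch_result = sum_above_no_branch(sample, 4)
```

Which form is faster in compiled code depends on how predictable the predicate is, which is why input data statistics matter for this kind of choice.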
* Andrew Crotty, et al., An Architecture for Compiling UDF-centric Workflows, Proceedings of the VLDB Endowment, Volume 8, Issue 12, Pages 1466-1477, 2015.

Related Spark JIRA discussion
▪ SPARK-14083: Analyze JVM bytecode and turn closures into Catalyst expressions
▪ This closure transformation brings the benefits of many Spark optimization rules, e.g., Filter Pushdown

LLJVM Approach
▪ LLJVM supports simple UDF logic only
▪ Limited instruction support: LLVM v7.0 has 63 instructions and LLJVM supports only 49 of them
▪ Simple LLVM data type support
▪ Complicated aggregate types are unsupported, e.g., { i32, { i64, double }* }
▪ …
▪ Focus on Numba-generated LLVM bitcode
▪ LLJVM provides internal functions that the bitcode uses, e.g., math functions and matrix manipulation

Numba and LLJVM
▪ Numba: High Performance Python Compiler
▪ Specialized code for CPUs and GPUs
▪ LLJVM provides a new option for JVMs in Numba
[Figure: Numba compiles .py code for CPUs and GPUs; LLJVM adds JVMs as a target]

How-to-Use LLJVM
▪ Load an LLVM bitcode file via a custom class loader and then run it by using Java runtime reflection
▪ In unsupported cases (e.g., unsupported LLVM instructions found), it throws an LLJVMException
* Example code for the LLJVM translator, https://github.com/maropu/lljvm-example

Translation Example2: log10
[Figure: log10 example code; the JVM assembly code appears on the next slide]

Translation Example2: log10
[Figure: JVM assembly code for the log10 example]

Translation Example2: array sum
[Figure: array sum example code; the NumPy array is passed in the Numba-internal array format (1-d data array, shape/stride). The JVM assembly code appears on the next slide]

Translation Example2: array sum
▪ C-like pointer argument passing
[Figure: JVM assembly code for the array sum example, taking the address of a Java array]
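The C-like pointer argument passing on this slide can be mimicked in plain Python with ctypes. This is only a sketch of the idea, not LLJVM's or Numba's actual code: `sum_strided` and its parameters are hypothetical names, and the (address, shape, stride) triple is a simplified stand-in for the Numba-internal array format.

```python
import ctypes


def sum_strided(address, shape, stride):
    """Sum `shape` float64 values starting at `address`, `stride` bytes apart.

    Hypothetical helper: the callee receives only a raw address plus
    shape/stride metadata, as in C-like pointer argument passing.
    """
    total = 0.0
    for i in range(shape):
        # Read one float64 directly from memory at the computed offset.
        total += ctypes.c_double.from_address(address + i * stride).value
    return total


# A contiguous buffer of four float64 values stands in for the 1-d data array.
data = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
base = ctypes.addressof(data)
itemsize = ctypes.sizeof(ctypes.c_double)

total_all = sum_strided(base, 4, itemsize)       # every element
total_even = sum_strided(base, 2, 2 * itemsize)  # elements 0 and 2
```

Because the callee never sees the array object itself, the caller must keep the buffer alive and pass a valid address, which is why obtaining the address of a Java array matters on the JVM side.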

Address of Java Arrays
▪ Super hacky calculation in OpenJDK 8 (64bit)
▪ Java object addresses in OpenJDK are compressed internally: Ordinary Object Pointer (OOP)
▪ OOP decompression depends on shift and base values
[address of Java object] := base + ([OOP address] << shift)
▪ These values cannot be referenced at runtime, so LLJVM infers the two values by comparing OOP/raw addresses
See: https://github.com/maropu/lljvm-translator/blob/master/core/src/main/java/io/github/maropu/lljvm/util/ArrayUtils.java
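The arithmetic behind this inference can be sketched in a few lines of Python. This is a hedged illustration of the formula only, not the logic in ArrayUtils.java: the function names, the `max_shift` bound, and the sample values are assumptions made for the example, and how the OOP/raw address pairs are obtained on a real JVM is out of scope here.

```python
def decode_oop(oop, base, shift):
    """Apply the slide's formula: address = base + (oop << shift)."""
    return base + (oop << shift)


def infer_base_and_shift(samples, max_shift=4):
    """Infer (base, shift) from (oop, raw_address) pairs.

    Tries each candidate shift and keeps the first one for which every
    sample yields the same base. `max_shift` is an assumed search bound.
    """
    if len({oop for oop, _ in samples}) < 2:
        raise ValueError("need at least two samples with distinct OOPs")
    for shift in range(max_shift + 1):
        bases = {raw - (oop << shift) for oop, raw in samples}
        if len(bases) == 1:
            return bases.pop(), shift
    raise ValueError("no consistent base/shift found")


# Made-up values for illustration: base and shift are what we pretend
# the JVM uses, and the samples are derived from them.
true_base, true_shift = 0x800000000, 3
samples = [(oop, decode_oop(oop, true_base, true_shift))
           for oop in (0x1000, 0x1F40)]

base, shift = infer_base_and_shift(samples)
```

With two distinct OOPs the difference of the raw addresses fixes the shift, and either pair then fixes the base, so two observations are enough under this model.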

Use Case: Compile PySpark UDF
▪ PySpark UDF compilation flow
▪ 1. Compile Python code into LLVM bitcode on the driver side
▪ LLVM bitcode is a byte array, so it is serializable
▪ 2. Transfer the bitcode to executors
▪ 3. Load it into JVMs and run it
UDF: plus(x, y) => x + y
▪ Even for a simple UDF, the compiled UDF is ~50x faster than the vectorized UDF
[Figure: performance comparison of UDFs, vectorized UDFs, and compiled UDFs; compiled UDFs are ~50x faster]

Experimental Release: v0.1.0
▪ Supports OpenJDK 8 (64bit) only
▪ Bundles x86_64 native binaries for Linux/Mac
▪ For Linux, it is built with clang++ v3.6.2
▪ For Mac, it is built with Apple clang++ v900.0.39.2
▪ LLVM v5.0.2 is used internally
▪ In master, the latest v7.0.0 (2018.11.19) is used
<dependency>
  <groupId>io.github.maropu</groupId>
  <artifactId>lljvm-core</artifactId>
  <version>0.1.0-EXPERIMENTAL</version>
</dependency>
…but it still has many bugs now

Wrap Up
▪ LLJVM: Translate LLVM bitcode into JVM bytecode
▪ Currently, it focuses on the Numba integration
▪ UDF optimization is technically challenging and brings performance benefits in real use cases
▪ Users love writing code on structured data
▪ If you’re interested in this, please give it a GitHub star!
▪ https://github.com/maropu/lljvm-translator
