Immutable Data Structures Simply Explained with Java Examples

Persistent Data Structures

Living in a world where nothing
changes but everything evolves
- or -
A complete idiot's guide to immutability

Java vs Haskell

Java:
● Warm, soft and cute
● Imperative
● Object oriented
● Just like good old Basic, but with classes

Haskell:
● Strange, unfamiliar alien
● Purely functional
● Everything is different
● Shocking news! It's not like Basic!

Haskell does not have variables!
Imagine a dialect of Java where everything is final by default
class LinkedList {
    class Node {
        final Node next, prev;
        final Object value;
    }

    final Node head, tail;

    void add(final Object v) {
        for (final Node n = head; n != null; n = n.next) {
            ...
        }
    }
}

All fields, parameters and variables are automatically
immutable, the final is implied everywhere, and there is no
way to get rid of it

Typical first reactions to such a dialect:

● "It does for me!"
● "But it doesn't make sense!"
● "It won't work!"

What is a variable?

var·y/ˈve(ə)rē/
vary, varied, varying

● — verb (used with object)
Definition: to change or alter, as in form, appearance,
character, or substance

● — verb (used without object)
Definition: to undergo change in appearance, form, substance,
character, etc

● — synonyms:
modify, mutate

"Variables" in Haskell

● Must be initialized when declared

YES: int a = 1;    NO: int a;

● Cannot be reassigned

YES: final int a = 1;    NO: a = 2;

These are mathematical variables, not imperative ones!

When everything is immutable

There is no notion of time:

● Functions take old values, produce new values, nothing is
changed in-place
● It does not matter when a function was called, it only
matters what arguments it was called with

There is no notion of identity:

● Everything is a value, complex data structures are values
too
● There is no way to tell if a == b, only if a.equals(b)
● In other words, values are never identical to each other, but
may be equal
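
A minimal Java sketch of these two ideas, assuming a hypothetical immutable Point class that is not part of the original slides: "changing" a value produces a new value, and values are compared with equals rather than by identity.

```java
import java.util.Objects;

// Hypothetical immutable value class, used only for illustration.
final class Point {
    final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // "Modification" returns a new value; the receiver is left untouched.
    Point withX(int newX) {
        return new Point(newX, y);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class ValueDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = a.withX(5);              // new value, a is unchanged
        Point c = new Point(5, 2);
        System.out.println(a.x);           // 1
        System.out.println(b.equals(c));   // true: equal values
        System.out.println(b == c);        // false: distinct objects
    }
}
```

Java still lets you ask b == c, which is exactly the question a purely functional language does not offer.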

I want my linked list!

Basic terminology:

● Ephemeral data structure — everything that is not
persistent. Most Java data structures (lists, sets, etc.) are
ephemeral.

● Persistent data structure — immutable data structure with
history. No in-place modifications. Operations on it create
new versions. Older versions are always available. That. Is.
Simple.

● The persistence property has nothing to do with persistent
storage, like disks! This is a completely different story.

I want my linked list!

● In imperative languages, like Java, most data structures are
ephemeral by default
Designing persistent data structures is somewhat awkward and
not always efficient

● In purely functional languages, like Haskell, all data
structures are automatically persistent!
There is just no other way to make data structures

History of updates

Making an update to a persistent data structure instance
always creates a new instance that contains this update.
The current version is left unmodified.

Why should I bother?

Is it fun? Hell yeah!

But is it practical? Let's see!

The free lunch is over!
"The biggest sea change in software development
since the OO revolution is knocking at the door,
and its name is Concurrency." — Herb Sutter

[Figure: commodity hardware (my laptop)]

The need for writing correct multi-threaded code
is constantly increasing

Concurrent data structures are hard!

Want a concurrent ephemeral linked list?
Here are some implementation strategies:

● Coarse-grained synchronization
● Fine-grained synchronization
● Optimistic synchronization
● Lazy synchronization
All are lock-based: they do not compose, can deadlock, etc.

● Non-blocking synchronization in different flavors
And if you need the size of the list, you are in trouble!


● Making mutable concurrent data structures requires inter-thread
coordination within these structures

● Locks and atomic references all over the place

● Decades of research by academia with many attempts

● Sophisticated algorithms that are hard to reason about, test
and prove

● Several different ways to solve the same problems, each
with its own pros and cons


Yes, but are persistent data structures actually simpler?

Just give up mutability!

● Persistent data structures are easy to reason about in
a concurrent environment

● The behavior does not depend on how many threads are
trying to "modify" it at once

● Therefore persistent data structures are very easy to test
and debug

The whole picture

● Persistent data structures alone are not sufficient
They are an essential part of the picture, but not the whole
answer to concurrency
● Inter-thread coordination is needed
Threads still need to know what the other threads are doing to
agree on a common outcome

● But it can be added "outside"
Which gives us complete separation of concerns
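
One simple form of such "outside" coordination can be sketched with a single atomic reference that points at the current version of an immutable structure. This is an illustration under assumptions, not the message passing or transactional memory approaches named on the next slide; the Cell and SharedList classes are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal immutable list node, standing in for any persistent structure.
final class Cell {
    final int value;
    final Cell next;   // null marks the end of the list
    final int size;

    Cell(int value, Cell next) {
        this.value = value;
        this.next = next;
        this.size = (next == null ? 0 : next.size) + 1;
    }
}

class SharedList {
    // The only mutable thing: a reference to the current version.
    private final AtomicReference<Cell> current = new AtomicReference<>(null);

    void push(int v) {
        // updateAndGet retries with a fresh snapshot if another thread won the race.
        current.updateAndGet(old -> new Cell(v, old));
    }

    Cell snapshot() {
        return current.get();
    }

    // Starts several threads that all push to the same list, returns the final size.
    static int demo(int threads, int perThread) {
        SharedList list = new SharedList();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) list.push(i);
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        Cell head = list.snapshot();
        return head == null ? 0 : head.size;
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000)); // 4000: no update is lost
    }
}
```

The list itself contains no locks or atomics; all coordination lives in the one reference that wraps it.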

The whole picture

Solving the concurrency challenge in a modern language:

● Scala Way — Persistent data structures with message
passing

● Clojure Way — Persistent data structures with software
transactional memory

● Will likely be mixed in the future

Last few words on concurrency

● Persistent data structures are slower than ephemeral ones
in sequential use

● But not that much slower!

● We can forgive it, since they give you more functionality,
and ephemeral data structures are simply less capable

● And in the multiprocessor era, it is better to make things
scalable rather than fast

Efficient persistent data structures

We want persistent data structures to be space and time
efficient:

● Structural sharing
We want to reuse as many fragments of the previous version
as possible
● Path copying
We want to copy as few pieces as possible
● Maybe, just maybe lazy evaluation (where available)
We don't want nasty pathological cases

A case study

● Let's make some persistent data structures in Java

● All these structures consist of classes with only final fields

● With good amortized asymptotic complexity in most cases

Our plan

Let's start with some trivial examples

● Stack

● Queue

● Tree

Then proceed with more advanced structures

● Hash Table

● Finger Tree

Trivial Example — Persistent Stack
// It's just a singly linked list of nodes
class Stack<T> {
    final T v;            // value stored in this node
    final Stack<T> next;  // rest of the stack; null only in the empty stack

    Stack() {
        v = null;
        next = null;
    }

    Stack(T v, Stack<T> next) {
        this.v = v;
        this.next = next;
    }
    ...

Source Code 1/2

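class Stack<T> {
    ...
    Stack<T> push(T v) {
        // the new node points at the current stack, which is shared, not copied
        return new Stack<T>(v, this);
    }

    T peek() {
        if (next == null)
            throw new NoSuchElementException();
        return v;
    }

    Stack<T> pop() {
        if (next == null)
            throw new NoSuchElementException();
        // the previous version is simply the rest of the list
        return next;
    }
}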

Source Code 2/2


Structural sharing in persistent stack


Looks familiar?
The versions tree!


Also known as
Spaghetti stack or
Cactus stack
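
The version tree can be observed directly. The sketch below assumes a compact copy of the stack from the slides, renamed PStack so the example stands alone; two versions pushed onto the same base share that base instead of copying it.

```java
import java.util.NoSuchElementException;

// Same shape as the Stack on the slides, renamed to keep the sketch standalone.
final class PStack<T> {
    final T v;
    final PStack<T> next;   // null only for the empty stack

    PStack() { this(null, null); }
    PStack(T v, PStack<T> next) { this.v = v; this.next = next; }

    boolean isEmpty() { return next == null; }

    PStack<T> push(T x) { return new PStack<T>(x, this); }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException();
        return v;
    }

    PStack<T> pop() {
        if (isEmpty()) throw new NoSuchElementException();
        return next;
    }
}

class SharingDemo {
    public static void main(String[] args) {
        PStack<String> base = new PStack<String>().push("a").push("b");
        PStack<String> left = base.push("c");    // version 1
        PStack<String> right = base.push("d");   // version 2, branches off base
        System.out.println(left.pop() == right.pop()); // true: both share base
        System.out.println(base.peek());               // b: base is unchanged
    }
}
```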

Persistent Queue

It's just two stacks combined:

● Back stack to enqueue items
● Front stack to dequeue items

When the front stack is empty, reverse the back stack and
use it as the front stack

Persistent Queue
class Queue<T> {
    // back stack - push elements here
    final Stack<T> b;
    // front stack - pop elements from here
    final Stack<T> f;

    Queue() {
        b = f = new Stack<T>();
    }

    Queue(Stack<T> b, Stack<T> f) {
        this.b = b;
        this.f = f;
    }

    boolean isEmpty() {
        // check() keeps the front stack non-empty whenever the queue has items
        return f.isEmpty();
    }
    ...

Source Code 1/3

Persistent Queue
class Queue<T> {
    ...
    static <T> Queue<T> check(Stack<T> b, Stack<T> f) {
        if (f.isEmpty())
            // front is exhausted: the reversed back stack becomes the new front
            return new Queue<T>(f, b.reverse());
        else
            return new Queue<T>(b, f);
    }

    Queue<T> push(T v) {
        return check(b.push(v), f);
    }

    Queue<T> pop() {
        if (isEmpty()) {
            throw new NoSuchElementException();
        }
        return check(b, f.pop());
    }

Source Code 2/3

Persistent Queue
class Queue<T> {
    ...
    T peek() {
        if (isEmpty()) {
            throw new NoSuchElementException();
        }
        return f.peek();
    }
}

class Stack<T> {
    ...
    Stack<T> reverse() {
        if (isEmpty() || next.isEmpty())
            return this;
        Stack<T> r = new Stack<T>();
        for (Stack<T> s = this; !s.isEmpty(); s = s.pop()) {
            r = r.push(s.peek());
        }
        return r;
    }
}

Source Code 3/3

Persistent Queue

Structural sharing in persistent queue

Persistent Queue

Beware pathological cases!

● What if the front stack is empty, but the back stack is full?

● And we are going to pop from the same queue N times

● Then we get N back stack reversals!

● Lazy evaluation to the rescue — use lazy streams instead of
strict stacks
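
The pathological case can be made visible with a small experiment. The sketch below assumes a compact copy of the two-stack queue with a reversal counter added purely for instrumentation (the counter and the names Stk, CountingQueue and reversalsFor are not from the slides). It shows the problem only; the lazy variant is not implemented here.

```java
import java.util.NoSuchElementException;

// Compact persistent stack, equivalent to the one on the slides.
final class Stk<T> {
    final T v;
    final Stk<T> next;  // null only for the empty stack

    Stk() { this(null, null); }
    Stk(T v, Stk<T> next) { this.v = v; this.next = next; }

    boolean isEmpty() { return next == null; }
    Stk<T> push(T x) { return new Stk<T>(x, this); }

    Stk<T> reverse() {
        Stk<T> r = new Stk<T>();
        for (Stk<T> s = this; !s.isEmpty(); s = s.next) r = r.push(s.v);
        return r;
    }
}

// Two-stack queue with a counter added only for this experiment.
final class CountingQueue<T> {
    static int reversals = 0;   // instrumentation, not part of the slides
    final Stk<T> b, f;

    CountingQueue() { this(new Stk<T>(), new Stk<T>()); }
    private CountingQueue(Stk<T> b, Stk<T> f) { this.b = b; this.f = f; }

    static <T> CountingQueue<T> check(Stk<T> b, Stk<T> f) {
        if (f.isEmpty() && !b.isEmpty()) {
            reversals++;   // each reversal walks the whole back stack
            return new CountingQueue<T>(f, b.reverse());
        }
        return new CountingQueue<T>(b, f);
    }

    CountingQueue<T> push(T v) { return check(b.push(v), f); }

    CountingQueue<T> pop() {
        if (f.isEmpty()) throw new NoSuchElementException();
        return check(b, f.next);
    }

    // Pops the SAME version `times` times and reports how many reversals that cost.
    static int reversalsFor(int n, int times) {
        CountingQueue<Integer> q = new CountingQueue<Integer>();
        for (int i = 0; i <= n; i++) q = q.push(i);  // 1 item in front, n in back
        int before = reversals;
        for (int i = 0; i < times; i++) q.pop();     // result discarded each time
        return reversals - before;
    }

    public static void main(String[] args) {
        System.out.println(reversalsFor(100, 5)); // 5: one full reversal per pop
    }
}
```

Because every pop starts from the same old version, the work done by one reversal is never reused by the next, which is what a lazy (memoized) stream would avoid.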

Persistent Queue

But there is a better way
to design a queue!

Monoidally Annotated 2-3 Finger Tree is a versatile data
structure that can be used to build efficient lists, deques,
priority queues, interval trees, ropes, etc.

It is more complex, we will take a look at it later.

Persistent Tree

● It is trivial to convert any ephemeral tree to a persistent one
by means of path copying

● It works for binary trees, 2-3 trees, B-trees, etc

● The shape of the tree is not affected, only the mutating algorithms change

● In a balanced binary tree at most log N nodes need to be
copied — quite efficient

● The secret to all persistent data structures is that they all
are trees! (Yes, lists and hash tables are trees too)

Simple Persistent Binary Tree

class SimpleBinaryTree {
    static class Node<K extends Comparable<K>, V> {
        final K key;
        final V value;
        final Node<K, V> l, r;

        Node(K key, V value, Node<K, V> l, Node<K, V> r) {
            this.key = key;
            this.value = value;
            this.l = l;
            this.r = r;
        }
    }
    ...

Source Code 1/2

Simple Persistent Binary Tree

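class SimpleBinaryTree {
    ...
    static <K extends Comparable<K>, V> Node<K, V> insert(Node<K, V> n, K key, V value) {
        if (n == null) {
            return new Node<K, V>(key, value, null, null);
        }
        int cmp = key.compareTo(n.key);
        if (cmp < 0) {
            // copy this node, rebuild only the left path, share the right subtree
            return new Node<K, V>(n.key, n.value,
                insert(n.l, key, value), n.r);
        }
        if (cmp > 0) {
            // copy this node, share the left subtree, rebuild only the right path
            return new Node<K, V>(n.key, n.value,
                n.l, insert(n.r, key, value));
        }
        // same key: replace the value, share both subtrees
        return new Node<K, V>(key, value, n.l, n.r);
    }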

Source Code 2/2
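
To see path copying at work, here is a standalone sketch under simplifying assumptions (int keys, String values, hypothetical class name PathCopy). After an insert, the new version has a new root, the untouched subtree is shared, and the old version is unchanged.

```java
// Standalone sketch of path copying; names are illustrative only.
final class PathCopy {
    static final class Node {
        final int key;
        final String value;
        final Node l, r;

        Node(int key, String value, Node l, Node r) {
            this.key = key;
            this.value = value;
            this.l = l;
            this.r = r;
        }
    }

    // Returns the root of a new version; only nodes on the search path are copied.
    static Node insert(Node n, int key, String value) {
        if (n == null) return new Node(key, value, null, null);
        if (key < n.key) return new Node(n.key, n.value, insert(n.l, key, value), n.r);
        if (key > n.key) return new Node(n.key, n.value, n.l, insert(n.r, key, value));
        return new Node(key, value, n.l, n.r);
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(insert(null, 50, "a"), 30, "b"), 70, "c");
        Node v2 = insert(v1, 80, "d");       // new key goes to the right of 70
        System.out.println(v1 != v2);        // true: the root was copied
        System.out.println(v1.l == v2.l);    // true: the left subtree is shared
        System.out.println(v1.r.r == null);  // true: the old version is unchanged
        System.out.println(v2.r.r.key);      // 80
    }
}
```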

Persistent Tree

Multiple definitions of persistence:

● Immutable data structure with history
● Committed to a persistent storage

Append-only databases and file systems:

● CouchDB uses an append-only B-tree
● RethinkDB makes an append-only variant of MySQL
● ZFS and BTRFS implement copy-on-write transactions
and snapshots

Nothing is new under the moon!

Persistent Map

interface Map<K, V> {
// get value for a key, or null if not found
V get(K key);
// make key/value association
Map<K, V> put(K key, V value);
// remove key/value association
Map<K, V> remove(K key);
}

Remember, no in-place updates
Mutations create new instances

Persistent Map

Implementation Strategy

● Persistent red-black tree for ordered keys
Time complexity — O(log n)

● Persistent hash table for hashable keys
Time complexity — O(1)

Persistent Hash Table

But how do we implement it?
Copying the whole table would be too expensive!


Here's the idea: partition the hash table into smaller
pieces and organize them as a persistent tree

Nice idea, but how do we navigate in such a tree?

Prefix Tree/Trie
Search is guided by individual letters of a string key

Hash code is just a string of digits!

Persistent Hash Table in Prefix Tree

Represent 32-bit hash codes as strings of 5-bit symbols:

hashCode = 0xCAFEBABE

level    6    5      4      3      2      1      0
bits     11   00101  01111  11101  01110  10101  11110
symbol   3    5      15     29     14     21     30


hashCode = ... xxxxx xxxxx xxxxx xxxxx
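
The decomposition can be checked with a few lines of Java. This is a small sketch; the class and method names are illustrative, while the shift-and-mask expression is the one used on the lookup slide.

```java
final class HashSymbols {
    // Returns the 5-bit symbol of hashCode at the given level (0 = lowest bits).
    static int symbol(int hashCode, int level) {
        return (hashCode >>> (level * 5)) & 31;
    }

    public static void main(String[] args) {
        int h = 0xCAFEBABE;
        // Prints the symbols from level 6 down to level 0: 3 5 15 29 14 21 30
        for (int level = 6; level >= 0; level--) {
            System.out.print(symbol(h, level) + " ");
        }
        System.out.println();
    }
}
```

The unsigned shift >>> matters here: 0xCAFEBABE is a negative int, and a signed shift would fill the top level with sign bits.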

Each item is either a key/value pair or a subtree


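class PersistentHashMap {
    abstract class Item<K, V> {}

    class Node<K, V> extends Item<K, V> {
        // Java cannot create generic arrays directly, hence the unchecked cast
        @SuppressWarnings("unchecked")
        final Item<K, V>[] children = (Item<K, V>[]) new Item[32];
    }

    class Entry<K, V> extends Item<K, V> {
        final int hashCode;
        final K key;
        final V value;
        final Entry<K, V> next;   // presumably the next entry in a collision chain
    }
    ...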

Source Code 1/2


V get(K key) {
    return root.find(key.hashCode(), key, 0);
}

class Node<K, V> extends Item<K, V> {
    V find(int hashCode, K key, int level) {
        // take the 5-bit symbol for this level as the child index
        int index = (hashCode >>> (level * 5)) & 31;
        Item<K, V> item = children[index];
        if (item instanceof Node) {
            return ((Node<K, V>) item)
                .find(hashCode, key, level + 1);
        }
        if (item instanceof Entry) {
            return ((Entry<K, V>) item)
                .find(hashCode, key);
        }
        return null;
    }

Source Code 2/2


Do not waste space!

class Node<K, V> {
    @SuppressWarnings("unchecked")
    final Item<K, V>[] children = (Item<K, V>[]) new Item[32];
}

● Most of the children would be null on deeper levels

● The number of arrays grows exponentially as we go deeper

● Need to find a way to compact tree

● Simply get rid of nulls in arrays!


class Node<K, V> {
    final int mask;
    final Item<K, V>[] children;   // length is bitCount(mask)
}

● Mask is a 32-bit integer whose bits are set to 1 only for those
array elements that are not null

● The array stores only non-null elements. Its size is the number
of 1 bits in the mask, so it varies from 2 to 32 elements.

● The overhead for a null array element is just one bit. Quite good!

● Overhead for null array element is just one bit. Quite good!


● To test that the array has an element at index i, simply test if
the i-th bit in the mask is 1:

if ((mask & (1 << i)) != 0) { ...

● To get the offset of the i-th element in the array, count the
number of 1 bits lower than i in the mask:

int offset = bitCount(mask & ((1 << i) - 1));
if (children[offset] instanceof ...
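
A short worked example of the mask arithmetic, as a sketch with illustrative names (BitmapIndex, contains, offset). It assumes bitCount on the slides refers to Integer.bitCount from the standard library.

```java
final class BitmapIndex {
    // True if the sparse array has an element for logical index i (0..31).
    static boolean contains(int mask, int i) {
        return (mask & (1 << i)) != 0;
    }

    // Position of logical index i inside the compacted array.
    static int offset(int mask, int i) {
        return Integer.bitCount(mask & ((1 << i) - 1));
    }

    public static void main(String[] args) {
        int mask = (1 << 3) | (1 << 5) | (1 << 30); // children at 3, 5 and 30
        System.out.println(Integer.bitCount(mask)); // 3: length of the compact array
        System.out.println(contains(mask, 5));      // true
        System.out.println(contains(mask, 4));      // false
        System.out.println(offset(mask, 30));       // 2: third stored element
    }
}
```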

Persistent List

interface Seq<T> {
    T head();                    // get first element
    Seq<T> tail();               // get list without first element
    Seq<T> cons(T v);            // prepend element (add at head)
    Seq<T> snoc(T v);            // append element (add at tail)
    Seq<T> concat(Seq<T> that);  // join two lists
    int size();                  // get number of elements
    T get(int index);            // get Nth element
    Seq<T> set(int index, T v);  // set Nth element
}

Remember, no in-place updates
Mutations create new instances

Persistent List

● There are quite a few ways to implement persistent lists

● But we will not be studying them

● Instead, we will turn our attention to finger trees

● Soon, it will be clear why

Finger Trees

● An incredibly elegant, simple and efficient data structure

● Oh so very versatile, functional programmer's Swiss Army
knife

● Basic data structure for building random access sequences,
deques, priority queues, ropes, interval trees, etc.

● Let's define it in stages

Persistent leafy 2-3 trees

Let's begin with a simple data structure — leafy 2-3 tree

● Every intermediate node has either two children or three
children

● All values are stored in leaves

● Perfectly balanced: all leaves are at the same level

Persistent leafy 2-3 trees

Leaves contain interesting values,
but what is stored in nodes?

Annotated leafy 2-3 trees

● There must be a way to find interesting values in a tree

● We need to guide search from the root of a tree to its leaves

● Let's add special annotations to nodes

● Use these annotations to find values

Size annotated leafy 2-3 trees

● Each intermediate node is annotated with the size of a
subtree rooted at this node

● Makes it trivial to find any leaf by its index

● Starting from root, test if index is in the range of its left
(middle) or right subtree, and repeat recursively for that
subtree, until a leaf is found
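
The index search described above can be sketched as follows. This is a simplified illustration with String values and hypothetical class names; it does not enforce the balance rules of a real 2-3 tree, it only shows how size annotations guide the descent.

```java
// Leafy tree with size annotations; nodes have two or three children.
abstract class SizedTree {
    abstract int size();
    abstract String get(int index);

    static final class Leaf extends SizedTree {
        final String value;

        Leaf(String value) { this.value = value; }

        int size() { return 1; }

        String get(int index) {
            if (index != 0) throw new IndexOutOfBoundsException();
            return value;
        }
    }

    static final class Node extends SizedTree {
        final SizedTree[] children;  // length 2 or 3
        final int size;              // cached annotation

        Node(SizedTree... children) {
            this.children = children;
            int s = 0;
            for (SizedTree c : children) s += c.size();
            this.size = s;
        }

        int size() { return size; }

        String get(int index) {
            // Skip whole subtrees using their sizes, then recurse into the right one.
            for (SizedTree c : children) {
                if (index < c.size()) return c.get(index);
                index -= c.size();
            }
            throw new IndexOutOfBoundsException();
        }
    }
}

class SizedTreeDemo {
    static SizedTree sample() {
        return new SizedTree.Node(
            new SizedTree.Node(new SizedTree.Leaf("a"), new SizedTree.Leaf("b")),
            new SizedTree.Node(new SizedTree.Leaf("c"), new SizedTree.Leaf("d"),
                               new SizedTree.Leaf("e")));
    }

    public static void main(String[] args) {
        SizedTree t = sample();
        System.out.println(t.size());  // 5
        System.out.println(t.get(3));  // d
    }
}
```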

Size annotated leafy 2-3 trees

Looks like random access list

Priority annotated leafy 2-3 trees

● Each intermediate node is annotated with the highest
priority of an element in its subtree

● Makes it trivial to find value with the highest priority

● Starting from the root, find the subtree with the highest priority
and descend recursively into it, until a leaf is found

Priority annotated leafy 2-3 trees

Looks like priority queue

Monoids

● One interface to unify size, priority (and more!) annotations
on trees

● A set of values with a "zero" element 0 and a binary
associative operation ⊕

● Monoid laws:
0⊕a = a
a⊕0 = a
a⊕(b⊕c) = (a⊕b)⊕c

Monoid examples

● Strings with empty string and concatenation
"" + "a" = "a", "a" + "" = "a"
"a" + ("b" + "c") = ("a" + "b") + "c"

● Integers with zero and addition
0 + 1 = 1, 1 + 0 = 1
1 + (2 + 3) = (1 + 2) + 3

● Integers with one and multiplication
1 * 2 = 2, 2 * 1 = 2
2 * (3 * 4) = (2 * 3) * 4

● And many more of them! (Monoids are everywhere)
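
The laws can be spot-checked mechanically. The sketch below uses a hypothetical helper, MonoidLaws.holds, that tests the identity and associativity laws on three sample values; passing such a check is evidence, not a proof.

```java
import java.util.function.BinaryOperator;

final class MonoidLaws {
    // Checks both identity laws and associativity for three sample values.
    static <T> boolean holds(T zero, BinaryOperator<T> op, T a, T b, T c) {
        boolean leftIdentity = op.apply(zero, a).equals(a);
        boolean rightIdentity = op.apply(a, zero).equals(a);
        boolean associative =
            op.apply(a, op.apply(b, c)).equals(op.apply(op.apply(a, b), c));
        return leftIdentity && rightIdentity && associative;
    }

    public static void main(String[] args) {
        System.out.println(holds("", (x, y) -> x + y, "a", "b", "c")); // true
        System.out.println(holds(0, (x, y) -> x + y, 1, 2, 3));        // true
        System.out.println(holds(1, (x, y) -> x * y, 2, 3, 4));        // true
        // Subtraction fails: 0 - 1 is not 1, so 0 is not a left identity.
        System.out.println(holds(0, (x, y) -> x - y, 1, 2, 3));        // false
    }
}
```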

Monoid interface

interface Monoid<T extends Monoid<T>> {
    T unit();
    T combine(T that);
}

// Illustration only: the real java.lang.String cannot be changed to implement Monoid
class String implements Monoid<String> {
    ...

    public String unit() {
        return "";
    }

    public String combine(String that) {
        return this + that;
    }
}

Size monoid

class Size implements Monoid<Size> {
    final int size;

    Size(int size) {
        this.size = size;
    }

    public Size unit() {
        return new Size(0);
    }

    public Size combine(Size that) {
        return new Size(this.size + that.size);
    }
}

Priority monoid

class Priority implements Monoid<Priority> {
    final int priority;   // smaller number means higher priority

    Priority(int priority) {
        this.priority = priority;
    }

    public Priority unit() {
        return new Priority(Integer.MAX_VALUE);
    }

    public Priority combine(Priority that) {
        return new Priority(
            Math.min(this.priority, that.priority));
    }
}

But where do we get monoids from?

● Monoids have nice property of composability

● We can get more monoids by combining existing ones

● But where do we get initial monoids to begin with?

● We need a way to measure values!

● Those measures must be monoids, obviously

interface Measured<M extends Monoid<M>> {
    M measure();
}

Let's make a sketch of annotated tree
Pseudocode!

/** <V> is the type of values
    <M> is the type of monoidal measures of values */
class Tree<M extends Monoid<M>, V extends Measured<M>>
        implements Measured<M> {

    abstract class Leaf<M, V> extends Tree<M, V> {
        final V value;
        override abstract M measure();
    }

    class Node<M, V> extends Tree<M, V> {
        final Tree<M, V> left, right;
        final M m;
        Node(Tree<M, V> l, Tree<M, V> r) {
            left = l; right = r;
            // cache the combined measure of both subtrees
            m = l.measure().combine(r.measure());
        }
        override final M measure() {
            return m;
        }
    }
}

Let's make a sketch of annotated tree
Pseudocode!

...
class Leaf<V> extends Tree<Size, V> {
    final V value;

    override final Size measure() {
        return new Size(1);
    }
}

...
class Leaf<V> extends Tree<Priority, V> {
    final V value;

    override final Priority measure() {
        return new Priority(value.priority());
    }
}
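
For readers who want something that compiles, here is one possible runnable variant of the sketch. It is a simplification under assumptions, not the slides' design: the value type parameter is dropped, the monoid interface omits unit() because a non-empty tree never needs it, and the names Count, Best, ATree, ALeaf and ANode are invented for this example.

```java
interface Monoid<M> { M combine(M that); }   // unit() omitted in this sketch
interface Measured<M> { M measure(); }

final class Count implements Monoid<Count> {
    final int n;
    Count(int n) { this.n = n; }
    public Count combine(Count that) { return new Count(n + that.n); }
}

final class Best implements Monoid<Best> {
    final int priority;                       // smaller number = higher priority
    Best(int priority) { this.priority = priority; }
    public Best combine(Best that) {
        return new Best(Math.min(priority, that.priority));
    }
}

abstract class ATree<M extends Monoid<M>> implements Measured<M> {}

final class ALeaf<M extends Monoid<M>> extends ATree<M> {
    final Measured<M> value;
    ALeaf(Measured<M> value) { this.value = value; }
    public M measure() { return value.measure(); }
}

final class ANode<M extends Monoid<M>> extends ATree<M> {
    final ATree<M> left, right;
    final M m;                                // cached annotation
    ANode(ATree<M> l, ATree<M> r) {
        left = l;
        right = r;
        m = l.measure().combine(r.measure());
    }
    public M measure() { return m; }
}

class AnnotatedDemo {
    static ATree<Best> task(int priority) {
        return new ALeaf<Best>(() -> new Best(priority));
    }

    static ATree<Count> item() {
        return new ALeaf<Count>(() -> new Count(1));
    }

    // Same tree shape, two different annotations.
    static int bestOfSample() {
        ATree<Best> t = new ANode<Best>(new ANode<Best>(task(7), task(3)), task(5));
        return t.measure().priority;
    }

    static int sizeOfSample() {
        ATree<Count> t = new ANode<Count>(new ANode<Count>(item(), item()), item());
        return t.measure().n;
    }

    public static void main(String[] args) {
        System.out.println(bestOfSample()); // 3: the highest priority in the tree
        System.out.println(sizeOfSample()); // 3: the number of leaves
    }
}
```

The node code never mentions sizes or priorities; swapping the monoid is all it takes to turn the same tree into a different data structure.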

But that is not finger tree yet!

Finger Tree

... is just an annotated tree of annotated 2-3 trees!

Finger Tree

Digits, 2-3 trees, fingers and nested levels

Finger Tree

class FingerTree<M extends Monoid<M>, T extends Measured<M>>
        implements Measured<M> {

    class Empty<M extends Monoid<M>, T extends Measured<M>>
            extends FingerTree<M, T> {}

    class Single<M extends Monoid<M>, T extends Measured<M>>
            extends FingerTree<M, T> {
        final T v;
        final M m;
    }

    class Deep<M extends Monoid<M>, T extends Measured<M>>
            extends FingerTree<M, T> {
        final Digit<M, T> prefix;                 // one to four elements
        final FingerTree<M, Node<M, T>> middle;   // nested tree of 2-3 nodes
        final Digit<M, T> suffix;                 // one to four elements
        final M m;
    }
}

Source Code 1/3

Finger Tree

class Digit<M extends Monoid<M>, T extends Measured<M>> {
    final M m;

    class One<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a;
    }

    class Two<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a, b;
    }

    class Three<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a, b, c;
    }

    class Four<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a, b, c, d;
    }
}

Source Code 2/3

Finger Tree

class Node<M extends Monoid<M>, T extends Measured<M>> {
    final M m;

    class Node2<M extends Monoid<M>, T extends Measured<M>>
            extends Node<M, T> {
        final T a, b;
    }

    class Node3<M extends Monoid<M>, T extends Measured<M>>
            extends Node<M, T> {
        final T a, b, c;
    }
}

Source Code 3/3

Finger Tree Interface

Basic operations:

● cons, snoc — prepend/append element
● concat — join two trees
● split — find prefix, element and suffix using predicate

Beyond the scope of this presentation, sorry

Finger Tree Performance

Amortized bounds:

Operation     Finger Tree            2-3 Tree    List
cons, snoc    O(1)                   O(log n)    O(1)/O(n)
head, last    O(1)                   O(log n)    O(1)/O(n)
concat        O(log min(ℓ1, ℓ2))     O(log n)    O(n)
split         O(log min(n, ℓ-n))     O(log n)    O(n)
index         O(log min(n, ℓ-n))     O(log n)    O(n)
