Persistent Data Structures by @aradzie
 

Presentation made by Alex Radzivanovich (@aradzie) at Scala Enthusiasts Belarus Meetup #3
Presentation Transcript

    • Persistent Data Structures. Living in a world where nothing changes but everything evolves - or - A complete idiot's guide to immutability
    • Java vs Haskell
      Java:
      ● Warm, soft and cute
      ● Imperative
      ● Object oriented
      ● Just like good old Basic, but with classes
      Haskell:
      ● Strange, unfamiliar alien
      ● Purely functional
      ● Everything is different
      ● Shocking news! It's not like Basic!
    • Haskell does not have variables! Imagine a dialect of Java where everything is final by default:

        class LinkedList {
          class Node {
            final Node next, prev;
            final Object value;
          }
          final Node head, tail;
          void add(final Object v) {
            for (final Node n = head; n != null; n = n.next) { ... }
          }
        }

      All fields, parameters and variables are automatically immutable, the final is implied everywhere, and there is no way to get rid of it.
    • Haskell does not have variables! Imagine a dialect of Java where everything is final by default (same code as the previous slide, with audience objections overlaid: "But it doesn't make sense!", "It won't work!", "It does for me!"). All fields, parameters and variables are automatically immutable, the final is implied everywhere.
    • What is a variable? var·y /ˈve(ə)rē/ vary, varied, varying
      ● verb (used with object): to change or alter, as in form, appearance, character, or substance
      ● verb (used without object): to undergo change in appearance, form, substance, character, etc.
      ● synonyms: modify, mutate
    • "Variables" in Haskell
      ● Must be assigned once declared
        YES: int a = 1;  NO: int a;
      ● Cannot be reassigned
        YES: final int a = 1;  NO: a = 2;
      These are mathematical variables, not imperative ones!
    • When everything is immutable
      There is no notion of time:
      ● Functions take old values, produce new values; nothing is changed in-place
      ● It does not matter when a function was called, it only matters what arguments it was called with
      There is no notion of identity:
      ● Everything is a value; complex data structures are values too
      ● There is no way to tell if a == b, only if a.equals(b)
      ● In other words, values are never identical to each other, but may be equal
    • I want my linked list! Basic terminology:
      ● Ephemeral data structure — everything that is not persistent. Most Java data structures (lists, sets, etc.) are ephemeral.
      ● Persistent data structure — an immutable data structure with history. No in-place modifications: operations on it create new versions, and older versions are always available. That. Is. Simple.
      ● The persistence property has nothing to do with persistent storage, like disks! That is a completely different story.
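    The ephemeral/persistent distinction can be seen in plain Java. A minimal sketch: ArrayList is ephemeral, so mutation destroys the old version; copying before each "update" mimics the persistent version semantics (note that real persistent structures share structure instead of copying everything, as the later slides show).

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class Main {
        public static void main(String[] args) {
            // Ephemeral: the old version is gone after add().
            List<Integer> eph = new ArrayList<>(Arrays.asList(1, 2));
            List<Integer> alias = eph;      // not a version, just an alias
            eph.add(3);
            System.out.println(alias);      // [1, 2, 3]  (the "old" view changed too)

            // Persistent style: build a new list, keep the old one.
            List<Integer> v1 = List.of(1, 2);
            List<Integer> v2 = new ArrayList<>(v1);
            v2.add(3);
            System.out.println(v1);         // [1, 2]  (older version intact)
            System.out.println(v2);         // [1, 2, 3]
        }
    }
    ```

    The copy-on-write shown here costs O(n) per update; the whole point of the structures that follow is to get the same versioning behavior with sharing instead of copying.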
    • I want my linked list!
      ● In imperative languages, like Java, most data structures are ephemeral by default. Designing persistent data structures is somewhat awkward and not always efficient.
      ● In purely functional languages, like Haskell, all data structures are automatically persistent! There is just no other way to make data structures.
    • History of updates. Making an update to a persistent DS instance always creates a new instance that contains this update. The current version is left unmodified.
    • Why should I bother? Is it fun? Hell yeah! But is it practical? Let's see!
    • The free lunch is over! "The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency." — Herb Sutter
      (chart: commodity hardware, my laptop)
      The need for writing correct multi-threaded code is constantly increasing.
    • Concurrent data structures are hard! Want a concurrent ephemeral linked list? Here are some implementation strategies:
      ● Coarse-grained synchronization
      ● Fine-grained synchronization
      ● Optimistic synchronization
      ● Lazy synchronization
      All lock-based — no composition, deadlocks, etc.
      ● Non-blocking synchronization in different flavors
      And if you need the size of the list, you are in trouble!
    • Concurrent data structures are hard!
      ● Making mutable concurrent data structures requires inter-thread coordination within these structures
      ● Locks and atomic references all over the place
      ● Decades of research by academia with many attempts
      ● Sophisticated algorithms that are hard to reason about, test and prove
      ● Several different ways to solve the same problems, each with its own pros and cons
    • Concurrent data structures are hard! (Same points as the previous slide, with a question from the audience: "Yes, but are persistent data structures actually simpler?")
    • Just give up mutability!
      ● Persistent data structures are easy to reason about in a concurrent environment
      ● Their behavior does not depend on how many threads are trying to "modify" them at once
      ● Therefore persistent data structures are very easy to test and debug
    • The whole picture
      ● Persistent data structures alone are not sufficient. They are an essential part of the picture, but not the whole answer to concurrency.
      ● Inter-thread coordination is needed. Threads still need to know what each other thread is doing to agree on a common outcome.
      ● But it can be added "outside", which gives us complete separation of concerns.
    • The whole picture. Solving the concurrency challenge in a modern language:
      ● Scala Way — persistent data structures with message passing
      ● Clojure Way — persistent data structures with software transactional memory
      ● Will likely be mixed in the future
    • Last few words on concurrency
      ● Persistent data structures are slower than ephemeral ones in sequential use
      ● But not that much slower!
      ● We can forgive it, since they give you more functionality, and ephemeral data structures are simply less capable
      ● And in the multiprocessor era, it is better to make things scalable rather than fast
    • Efficient persistent data structures. We want persistent data structures to be space and time efficient:
      ● Structural sharing — we want to reuse as many fragments of the previous version as possible
      ● Path copying — we want to copy as few pieces as possible
      ● Maybe, just maybe, lazy evaluation (where available) — we don't want nasty pathological cases
    • A case study
      ● Let's make some persistent data structures in Java ("Why are you looking at me?!")
      ● All these structures consist of classes with only final fields
      ● With good amortized asymptotic complexity in most cases
    • Our plan. Let's start with some trivial examples:
      ● Stack
      ● Queue
      ● Tree
      Then proceed with more advanced structures:
      ● Hash Table
      ● Finger Tree
    • Trivial Example — Persistent Stack

        class Stack<T> {
          final T v;            (a)
          final Stack<T> next;  (b)
          Stack() {
            v = null;
            next = null;
          }
          Stack(T v, Stack<T> next) {
            this.v = v;
            this.next = next;
          }
          ...

      It's just a singly linked list of nodes. Source Code 1/2
    • Trivial Example — Persistent Stack

        class Stack<T> {
          ...
          Stack<T> push(T v) {
            return new Stack<T>(v, this);  (a)
          }
          T peek() {
            if (next == null) throw new NoSuchElementException();
            return v;  (b)
          }
          Stack<T> pop() {
            if (next == null) throw new NoSuchElementException();
            return next;  (c)
          }

      Source Code 2/2
    • Trivial Example — Persistent Stack Structural sharing in persistent stack
    • Trivial Example — Persistent Stack Looks familiar? The versions tree!
    • Trivial Example — Persistent Stack Also known as Spaghetti stack or Cactus stack
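    The stack fragments above compile into a small runnable whole. A minimal sketch (the class name PStack and the size field are ours) showing that an old version survives a later push, and that both versions share nodes:

    ```java
    import java.util.NoSuchElementException;

    // A runnable version of the slides' persistent stack: an immutable
    // singly linked list where push/pop return new versions.
    final class PStack<T> {
        final T v;
        final PStack<T> next;
        final int size;

        PStack() { this(null, null, 0); }  // empty sentinel
        private PStack(T v, PStack<T> next, int size) {
            this.v = v; this.next = next; this.size = size;
        }
        PStack<T> push(T v) { return new PStack<>(v, this, size + 1); }
        T peek() {
            if (next == null) throw new NoSuchElementException();
            return v;
        }
        PStack<T> pop() {
            if (next == null) throw new NoSuchElementException();
            return next;
        }
    }

    public class Main {
        public static void main(String[] args) {
            PStack<String> empty = new PStack<>();
            PStack<String> a = empty.push("a");
            PStack<String> ab = a.push("b");
            // The older version 'a' is untouched by the later push:
            System.out.println(a.peek());       // a
            System.out.println(ab.peek());      // b
            // Both versions share the node for "a" (structural sharing):
            System.out.println(ab.pop() == a);  // true
        }
    }
    ```

    The reference equality in the last line is exactly the "spaghetti stack" picture: every version is a pointer into one shared tree of nodes.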
    • Persistent Queue. It's just two stacks combined:
      ● Back stack to enqueue items
      ● Front stack to dequeue items
      When the front stack is empty, reverse the back stack and use it as the front stack.
    • Persistent Queue

        class Queue<T> {
          // back stack - push elements here
          final Stack<T> b;  (a)
          // front stack - pop elements from here
          final Stack<T> f;  (b)
          Queue() {
            b = f = new Stack<T>();
          }
          Queue(Stack<T> b, Stack<T> f) {
            this.b = b;
            this.f = f;
          }
          boolean isEmpty() {
            return f.isEmpty();  (c)
          }
          ...

      Source Code 1/3
    • Persistent Queue

        class Queue<T> {
          ...
          static <T> Queue<T> check(Stack<T> b, Stack<T> f) {
            if (f.isEmpty())
              return new Queue<T>(f, b.reverse());  (a)
            else
              return new Queue<T>(b, f);  (b)
          }
          Queue<T> push(T v) {
            return check(b.push(v), f);
          }
          Queue<T> pop() {
            if (isEmpty()) {
              throw new NoSuchElementException();
            }
            return check(b, f.pop());
          }

      Source Code 2/3
    • Persistent Queue

        class Queue<T> {
          ...
          T peek() {
            if (isEmpty()) {
              throw new NoSuchElementException();
            }
            return f.peek();
          }
        }

        class Stack<T> {
          ...
          Stack<T> reverse() {
            if (isEmpty() || next.isEmpty()) return this;
            Stack<T> r = new Stack<T>();
            for (Stack<T> s = this; !s.isEmpty(); s = s.pop()) {
              r = r.push(s.peek());
            }
            return r;
          }

      Source Code 3/3
    • Persistent Queue. Structural sharing in persistent queue.
    • Persistent Queue. Beware pathological cases!
      ● What if the front stack is empty, but the back stack is full?
      ● And we are going to pop from the same queue N times
      ● Then we get N back stack reversals!
      ● Lazy evaluation to the rescue — use lazy streams instead of strict stacks
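    The two-stack queue from the slides can be run end to end. A minimal sketch (class names PStack/PQueue are ours; reverse() here omits the slides' early-exit optimization for brevity):

    ```java
    import java.util.NoSuchElementException;

    // Minimal persistent stack with the reverse() helper the queue needs.
    final class PStack<T> {
        final T v; final PStack<T> next;
        PStack() { v = null; next = null; }
        PStack(T v, PStack<T> next) { this.v = v; this.next = next; }
        boolean isEmpty() { return next == null; }
        PStack<T> push(T v) { return new PStack<>(v, this); }
        T peek() { return v; }
        PStack<T> pop() { return next; }
        PStack<T> reverse() {
            PStack<T> r = new PStack<>();
            for (PStack<T> s = this; !s.isEmpty(); s = s.pop()) r = r.push(s.peek());
            return r;
        }
    }

    // The slides' two-stack persistent queue.
    final class PQueue<T> {
        final PStack<T> b, f;  // back: push here; front: pop here
        PQueue() { b = new PStack<>(); f = new PStack<>(); }
        private PQueue(PStack<T> b, PStack<T> f) { this.b = b; this.f = f; }
        boolean isEmpty() { return f.isEmpty(); }
        private static <T> PQueue<T> check(PStack<T> b, PStack<T> f) {
            // Invariant: the front is only empty when the whole queue is empty.
            if (f.isEmpty()) return new PQueue<>(f, b.reverse());
            return new PQueue<>(b, f);
        }
        PQueue<T> push(T v) { return check(b.push(v), f); }
        T peek() {
            if (isEmpty()) throw new NoSuchElementException();
            return f.peek();
        }
        PQueue<T> pop() {
            if (isEmpty()) throw new NoSuchElementException();
            return check(b, f.pop());
        }
    }

    public class Main {
        public static void main(String[] args) {
            PQueue<Integer> q = new PQueue<Integer>().push(1).push(2).push(3);
            System.out.println(q.peek());        // 1 (FIFO order)
            System.out.println(q.pop().peek());  // 2
            System.out.println(q.peek());        // 1  (old version unchanged)
        }
    }
    ```

    The last line is the persistence property again: popping produced a new queue and left q exactly as it was.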
    • Persistent Queue. But there is a better way to design a queue! A monoidally annotated 2-3 finger tree is a versatile data structure that can be used to build efficient lists, deques, priority queues, interval trees, ropes, etc. It is more complex; we will take a look at it later.
    • Persistent Tree
      ● It is trivial to convert any ephemeral tree to a persistent one by means of path copying
      ● It works for binary trees, 2-3 trees, B-trees, etc.
      ● The shape of the tree is not affected, only the mutating algorithms
      ● In a balanced binary tree at most log N nodes need to be copied — quite efficient
      ● The secret to all persistent data structures is that they all are trees! (Yes, lists and hash tables are trees too)
    • Persistent Tree
    • Simple Persistent Binary Tree

        class SimpleBinaryTree {
          static class Node {
            final K key;          (a)
            final V value;        (b)
            final Node l, r;      (c)
            Node(K key, V value, Node l, Node r) {
              this.key = key;
              this.value = value;
              this.l = l;
              this.r = r;
            }
          }
          ...

      Source Code 1/2
    • Simple Persistent Binary Tree

        class SimpleBinaryTree {
          ...
          static Node insert(Node n, K key, V value) {
            if (n == null) {
              return new Node(key, value, null, null);  (a)
            }
            int cmp = key.compareTo(n.key);  (b)
            if (cmp < 0) {
              return new Node(n.key, n.value,  (c)
                              insert(n.l, key, value), n.r);
            }
            if (cmp > 0) {
              return new Node(n.key, n.value,  (d)
                              n.l, insert(n.r, key, value));
            }
            return new Node(key, value, n.l, n.r);  (e)
          }

      Source Code 2/2
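    Filling in the generic parameters the slides leave implicit gives a runnable version, and makes the path-copying claim directly checkable: insertion copies only the nodes on the root-to-insertion-point path, while untouched subtrees are shared between versions.

    ```java
    // Runnable sketch of the slides' persistent binary tree with path copying.
    final class Node<K extends Comparable<K>, V> {
        final K key; final V value;
        final Node<K, V> l, r;
        Node(K key, V value, Node<K, V> l, Node<K, V> r) {
            this.key = key; this.value = value; this.l = l; this.r = r;
        }
        static <K extends Comparable<K>, V> Node<K, V> insert(Node<K, V> n, K key, V value) {
            if (n == null) return new Node<>(key, value, null, null);
            int cmp = key.compareTo(n.key);
            if (cmp < 0) return new Node<>(n.key, n.value, insert(n.l, key, value), n.r);
            if (cmp > 0) return new Node<>(n.key, n.value, n.l, insert(n.r, key, value));
            return new Node<>(key, value, n.l, n.r);  // replace existing key
        }
    }

    public class Main {
        public static void main(String[] args) {
            Node<Integer, String> v1 = Node.insert(null, 5, "five");
            v1 = Node.insert(v1, 3, "three");
            v1 = Node.insert(v1, 8, "eight");
            Node<Integer, String> v2 = Node.insert(v1, 9, "nine");
            // v2 copied only the path 5 -> 8; the left subtree is shared:
            System.out.println(v2.l == v1.l);  // true (shared, not copied)
            System.out.println(v2.r == v1.r);  // false (on the copied path)
        }
    }
    ```

    In a balanced tree that copied path is at most log N nodes, which is the efficiency argument of the slide above.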
    • Persistent Tree. Multiple definitions of persistence:
      ● Immutable data structure with history
      ● Committed to persistent storage
      Append-only databases and file systems:
      ● CouchDB uses an append-only B-Tree
      ● RethinkDB makes an append-only variant of MySQL
      ● ZFS, BTRFS implement copy-on-write transactions and snapshots
      Nothing is new under the moon!
    • Persistent Map

        interface Map<K, V> {
          // get value for a key, or null if not found
          V get(K key);
          // make key/value association
          Map<K, V> put(K key, V value);
          // remove key/value association
          Map<K, V> remove(K key);
        }

      Remember, no in-place updates. Mutations create new instances.
    • Persistent Map. Implementation strategy:
      ● Persistent red-black tree for ordered keys. Time complexity — O(log n)
      ● Persistent hash table for hashable keys. Time complexity — O(1)
    • Persistent Hash Table. But how do we implement it? Copying the whole table would be too expensive!
    • Persistent Hash Table. Here's the idea: partition the hash table into smaller pieces, organized as a persistent tree. Nice idea, but how do we navigate in such a tree?
    • Prefix Tree/Trie. Search is guided by individual letters of a string key. A hash code is just a string of digits!
    • Persistent Hash Table in Prefix Tree. Represent 32-bit hash codes as strings of 5-bit symbols:

        hashCode = CAFEBABE₁₆
        level:   6   5      4      3      2      1      0
        bits:    11  00101  01111  11101  01110  10101  11110
        symbol:  3   5      15     29     14     21     30
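    The by-hand decomposition of 0xCAFEBABE above is one shift-and-mask per level. A minimal sketch (the helper name symbol is ours):

    ```java
    // Extracting the 5-bit symbols of a hash code, as the slide does by
    // hand for 0xCAFEBABE. Level 0 is the least significant 5-bit group;
    // level 6 holds the 2 leftover top bits.
    public class Main {
        static int symbol(int hashCode, int level) {
            return (hashCode >>> (level * 5)) & 31;
        }
        public static void main(String[] args) {
            int h = 0xCAFEBABE;
            for (int level = 6; level >= 0; level--) {
                System.out.println("level " + level + ": " + symbol(h, level));
            }
            // Prints 3, 5, 15, 29, 14, 21, 30 top to bottom,
            // matching the table on the slide.
        }
    }
    ```

    This is the same expression that the lookup code on the next slides uses to pick a child at each trie level.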
    • Persistent Hash Table. hashCode = ... xxxxx xxxxx xxxxx xxxxx. Each item is either a key/value pair or a subtree.
    • Persistent Hash Table

        class PersistentHashMap {
          abstract class Item<K, V> {}
          class Node<K, V> extends Item<K, V> {
            final Item<K, V>[] children = new Item[32];  (a)
          }
          class Entry<K, V> extends Item<K, V> {
            final int hashCode;      (b)
            final K key;             (c)
            final V value;           (d)
            final Entry<K, V> next;  (e)
          }

      Source Code 1/2
    • Persistent Hash Table

        class PersistentHashMap {
          V get(K key) {
            return root.find(key.hashCode(), key, 0);  (a)
          }
          class Node<K, V> extends Item<K, V> {
            V find(int hashCode, K key, int level) {
              int index = (hashCode >>> (level * 5)) & 31;  (b)
              Item<K, V> item = children[index];  (c)
              if (item instanceof Node) {  (d)
                return ((Node<K, V>) item)  (e)
                    .find(hashCode, key, level + 1);
              }
              if (item instanceof Entry) {  (f)
                return ((Entry<K, V>) item)  (g)
                    .find(hashCode, key);
              }
              return null;
            }

      Source Code 2/2
    • Persistent Hash Table. Do not waste space!

        class PersistentHashMap {
          class Node<K, V> {
            final Item<K, V>[] children = new Item[32];  (a)
          }
        }

      ● Most of the children would be null on deeper levels
      ● The number of arrays grows exponentially as we go deeper
      ● We need a way to compact the tree
      ● Simply get rid of nulls in the arrays!
    • Persistent Hash Table

        class Node<K, V> {
          final int mask;  (a)
          final Item<K, V>[] children = new Item[bitCount(mask)];  (b)
        }

      ● The mask is a 32-bit integer whose bits are set to 1 only for those array elements that are not null
      ● The array stores only non-null elements. Its size is the number of 1 bits in the mask and varies from 2 to 32 elements.
      ● The overhead for a null array element is just one bit. Quite good!
    • Persistent Hash Table
      ● To test whether the array has an element at index i, simply test if the ith bit in the mask is 1:

          if ((mask & (1 << i)) != 0) { ...

      ● To get the offset of the ith element in the array, count the number of 1 bits lower than i in the mask:

          int offset = bitCount(mask & ((1 << i) - 1));
          if (children[offset] instanceof ...
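    The bitmap trick above can be exercised on its own. A minimal sketch (helper names present/offset are ours) using Integer.bitCount for the population count:

    ```java
    // The slides' bitmap compaction trick: a 32-bit mask marks which of
    // the 32 logical slots are occupied, and a popcount over the lower
    // mask bits converts a logical slot index into a dense array offset.
    public class Main {
        static boolean present(int mask, int i) {
            return (mask & (1 << i)) != 0;
        }
        static int offset(int mask, int i) {
            return Integer.bitCount(mask & ((1 << i) - 1));
        }
        public static void main(String[] args) {
            // Suppose logical slots 1, 4 and 9 are occupied:
            int mask = (1 << 1) | (1 << 4) | (1 << 9);
            System.out.println(present(mask, 4));  // true
            System.out.println(present(mask, 5));  // false
            System.out.println(offset(mask, 4));   // 1: one set bit (slot 1) below it
            System.out.println(offset(mask, 9));   // 2: slots 1 and 4 are below it
        }
    }
    ```

    So the three occupied slots live in a 3-element array instead of a 32-element one, at the cost of one bit per empty slot.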
    • Persistent List

        interface Seq<T> {
          T head();                    // get first element
          Seq<T> tail();               // get list without first element
          Seq<T> cons(T v);            // prepend element at head
          Seq<T> snoc(T v);            // append element at tail
          Seq<T> concat(Seq<T> that);  // join two lists
          int size();                  // get number of elements
          T get(int index);            // get Nth element
          Seq<T> set(int index, T v);  // set Nth element
        }

      Remember, no in-place updates. Mutations create new instances.
    • Persistent List
      ● There are quite a few ways to implement persistent lists
      ● But we will not be studying them
      ● Instead, we will turn our attention to finger trees
      ● Soon it will be clear why
    • Finger Trees
      ● An incredibly elegant, simple and efficient data structure
      ● Oh so very versatile: the functional programmer's Swiss Army knife
      ● A basic data structure for building random access sequences, deques, priority queues, ropes, interval trees, etc.
      ● Let's define it in stages
    • Persistent leafy 2-3 trees. Let's begin with a simple data structure — the leafy 2-3 tree:
      ● Every intermediate node has either two or three children
      ● All values are stored in leaves
      ● Perfectly balanced — all leaves are at the same level
    • Persistent leafy 2-3 trees
    • Persistent leafy 2-3 trees. Leaves contain interesting values, but what is stored in nodes?
    • Annotated leafy 2-3 trees
      ● There must be a way to find interesting values in a tree
      ● We need to guide the search from the root of a tree to its leaves
      ● Let's add special annotations to nodes
      ● Use these annotations to find values
    • Size annotated leafy 2-3 trees
      ● Each intermediate node is annotated with the size of the subtree rooted at this node
      ● Makes it trivial to find any leaf by its index
      ● Starting from the root, test if the index is in the range of the left (middle) or right subtree, and repeat recursively for that subtree, until a leaf is found
    • Size annotated leafy 2-3 trees Looks like random access list
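    The index search described above can be sketched in a few lines. The slides use 2-3 trees; a leafy binary tree (our simplification, with made-up names Tree/Leaf/Branch) keeps the idea identical and the code shorter: each branch caches its subtree size, and lookup descends by comparing the index against the left size.

    ```java
    // Index lookup guided by size annotations in a leafy tree.
    abstract class Tree<V> {
        abstract int size();
        abstract V get(int index);
    }
    final class Leaf<V> extends Tree<V> {
        final V value;
        Leaf(V value) { this.value = value; }
        int size() { return 1; }
        V get(int index) { return value; }
    }
    final class Branch<V> extends Tree<V> {
        final Tree<V> left, right;
        final int size;  // the size annotation, computed once on construction
        Branch(Tree<V> left, Tree<V> right) {
            this.left = left; this.right = right;
            this.size = left.size() + right.size();
        }
        int size() { return size; }
        V get(int index) {
            // The annotation guides the search: no per-leaf scan is needed.
            if (index < left.size()) return left.get(index);
            return right.get(index - left.size());
        }
    }
    public class Main {
        public static void main(String[] args) {
            Tree<String> t = new Branch<>(
                new Branch<>(new Leaf<>("a"), new Leaf<>("b")),
                new Branch<>(new Leaf<>("c"), new Leaf<>("d")));
            System.out.println(t.get(0));  // a
            System.out.println(t.get(2));  // c
        }
    }
    ```

    Swapping the size annotation for a priority annotation gives the priority queue of the next slide; the monoid interface that follows unifies both.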
    • Priority annotated leafy 2-3 trees
      ● Each intermediate node is annotated with the highest priority of an element in its subtree
      ● Makes it trivial to find the value with the highest priority
      ● Starting from the root, find the subtree with the highest priority and descend recursively into it, until a leaf is found
    • Priority annotated leafy 2-3 trees Looks like priority queue
    • Monoids
      ● One interface to unify size, priority (and more!) annotations on trees
      ● A set of values with a "zero" element 0 and a binary associative operation ⊕
      ● Monoid laws:
          0⊕a = a
          a⊕0 = a
          a⊕(b⊕c) = (a⊕b)⊕c
    • Monoid examples
      ● Strings with the empty string and concatenation:
          "" + "a" = "a", "a" + "" = "a"
          "a" + ("b" + "c") = ("a" + "b") + "c"
      ● Integers with zero and addition:
          0 + 1 = 1, 1 + 0 = 1
          1 + (2 + 3) = (1 + 2) + 3
      ● Integers with one and multiplication:
          1 * 2 = 2, 2 * 1 = 2
          2 * (3 * 4) = (2 * 3) * 4
      ● And many, many more of them! (Monoids are everywhere)
    • Monoid interface

        interface Monoid<T extends Monoid<T>> {
          T unit();
          T combine(T that);
        }

        class String implements Monoid<String> {
          ...
          String unit() {
            return "";  (a)
          }
          String combine(String that) {
            return this + that;  (b)
          }
        }
    • Size monoid

        class Size implements Monoid<Size> {
          final int size;  (a)
          Size(int size) {
            this.size = size;
          }
          Size unit() {
            return new Size(0);  (b)
          }
          Size combine(Size that) {
            return new Size(this.size + that.size);  (c)
          }
        }
    • Priority monoid

        class Priority implements Monoid<Priority> {
          final int priority;  (a)
          Priority(int priority) {
            this.priority = priority;
          }
          Priority unit() {
            return new Priority(Integer.MAX_VALUE);  (b)
          }
          Priority combine(Priority that) {
            return new Priority(
                Math.min(this.priority, that.priority));  (c)
          }
        }
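    The Monoid interface and both instances above compile as written (modulo visibility modifiers). A runnable sketch that also spot-checks the identity law:

    ```java
    // The slides' Monoid interface with the Size and Priority instances.
    interface Monoid<T extends Monoid<T>> {
        T unit();
        T combine(T that);
    }
    final class Size implements Monoid<Size> {
        final int size;
        Size(int size) { this.size = size; }
        public Size unit() { return new Size(0); }
        public Size combine(Size that) { return new Size(this.size + that.size); }
    }
    final class Priority implements Monoid<Priority> {
        final int priority;
        Priority(int priority) { this.priority = priority; }
        public Priority unit() { return new Priority(Integer.MAX_VALUE); }
        public Priority combine(Priority that) {
            return new Priority(Math.min(this.priority, that.priority));
        }
    }
    public class Main {
        public static void main(String[] args) {
            Size a = new Size(2), b = new Size(3);
            System.out.println(a.combine(b).size);            // 5
            System.out.println(a.combine(a.unit()).size);     // 2  (right identity)
            Priority p = new Priority(7), q = new Priority(4);
            System.out.println(p.combine(q).priority);        // 4  (min wins)
            System.out.println(p.combine(p.unit()).priority); // 7  (MAX_VALUE is the unit)
        }
    }
    ```

    The same combine calls are what the annotated-tree sketch on the next slides performs in its Node constructor.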
    • But where do we get monoids from?
      ● Monoids have the nice property of composability
      ● We can get more monoids by combining existing ones
      ● But where do we get the initial monoids to begin with?
      ● We need a way to measure values!
      ● Those measures must be monoids, obviously:

          interface Measured<M extends Monoid> {
            M measure();
          }
    • Let's make a sketch of an annotated tree (Pseudocode!)

        /**
         * <V> is the type of values
         * <M> is the type of monoidal measures of values
         */
        class Tree<M extends Monoid, V extends Measured<M>>
            implements Measured<M> {  (a)
          abstract class Leaf<M, V> extends Tree<M, V> {
            final V value;  (b)
            override abstract M measure();  (c)
          }
          class Node<M, V> extends Tree<M, V> {
            final Tree<M, V> left, right;  (d)
            final M m;  (e)
            Node(Tree<M, V> l, Tree<M, V> r) {
              left = l;
              right = r;
              m = l.measure().combine(r.measure());  (f)
            }
            override final M measure() {
              return m;  (g)
            }
    • Let's make a sketch of an annotated tree (Pseudocode!)

        ...
        class Leaf<V> extends Tree<Size, V> {
          final V value;
          override final Size measure() {
            return new Size(1);  (a)
          }
        }
        ...
        class Leaf<V> extends Tree<Priority, V> {
          final V value;
          override final Priority measure() {
            return new Priority(value.priority());  (b)
          }
        }
    • But that is not a finger tree yet!
    • Finger Tree ... is just an annotated tree of annotated 2-3 trees!
    • Finger Tree. Digits, 2-3 trees, fingers and nested levels.
    • Finger Tree. A little bit of Haskell would not hurt:

        data Node v a = Node2 v a a | Node3 v a a a

        data Digit v a = One v a | Two v a a | Three v a a a | Four v a a a a

        data FingerTree v a
          = Empty
          | Single a
          | Deep v (Digit a)                  (a)
                   (FingerTree v (Node v a))  (b)
                   (Digit a)                  (c)
    • Finger Tree

        class FingerTree<M extends Monoid<M>, T extends Measured<M>>
            implements Measured<M> {

          class Empty<M extends Monoid<M>, T extends Measured<M>>
              extends FingerTree<M, T> {}

          class Single<M extends Monoid<M>, T extends Measured<M>>
              extends FingerTree<M, T> {
            final T v;  (a)
            final M m;  (b)
          }

          class Deep<M extends Monoid<M>, T extends Measured<M>>
              extends FingerTree<M, T> {
            final Digit<M, T> prefix;  (c)
            final FingerTree<M, Node<M, T>> middle;  (d)
            final Digit<M, T> suffix;  (e)
            final M m;  (f)
          }

      Source Code 1/3
    • Finger Tree

        class Digit<M extends Monoid<M>, T extends Measured<M>>
            implements Measured<M> {
          final M m;  (a)

          class One<M extends Monoid<M>, T extends Measured<M>>
              extends Digit<M, T> {
            final T a;  (b)
          }
          class Two<M extends Monoid<M>, T extends Measured<M>>
              extends Digit<M, T> {
            final T a, b;  (c)
          }
          class Three<M extends Monoid<M>, T extends Measured<M>>
              extends Digit<M, T> {
            final T a, b, c;  (d)
          }
          class Four<M extends Monoid<M>, T extends Measured<M>>
              extends Digit<M, T> {
            final T a, b, c, d;  (e)
          }

      Source Code 2/3
    • Finger Tree

        class Node<M extends Monoid<M>, T extends Measured<M>>
            implements Measured<M> {
          final M m;  (a)

          class Node2<M extends Monoid<M>, T extends Measured<M>>
              extends Node<M, T> {
            final T a, b;  (b)
          }
          class Node3<M extends Monoid<M>, T extends Measured<M>>
              extends Node<M, T> {
            final T a, b, c;  (c)
          }

      Source Code 3/3
    • Finger Tree Interface. Basic operations:
      ● cons, snoc — prepend/append an element
      ● concat — join two trees
      ● split — find prefix, element and suffix using a predicate
      Beyond the scope of this presentation, sorry.
    • Finger Tree Performance. Amortized bounds:

                      Finger Tree          2-3 Tree   List
        cons, snoc    O(1)                 O(log n)   O(1)/O(n)
        head, last    O(1)                 O(log n)   O(1)/O(n)
        concat        O(log min(ℓ1, ℓ2))   O(log n)   O(n)
        split         O(log min(n, ℓ-n))   O(log n)   O(n)
        index         O(log min(n, ℓ-n))   O(log n)   O(n)
    • Thanks! Questions?