How Efficient Immutable Data Enables Functional Programming
or
Okasaki For Dummies
3 SEPTEMBER 2015
Who Am I?
Tom Faulhaber
➡ Planet OS CTO
➡ Background in networking, Unix OS, visualization, video
➡ Currently working mostly in “Big Data”
➡ Contributor to the Clojure programming language
Who Are YOU?
What is functional programming?
Pure Functions:
y = f(x)
➡ The input x is not modified
➡ No state is shared: the result y depends only on x
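A minimal Clojure sketch of a pure function (add-item is a hypothetical name, not from the slides): the function returns a new value, and the caller's data is left exactly as it was.

```clojure
;; add-item is pure: its result depends only on its arguments,
;; and it never modifies them. conj returns a new vector.
(defn add-item [items item]
  (conj items item))

(def before [:a :b])
(def after (add-item before :c))

before   ; => [:a :b]     (unchanged)
after    ; => [:a :b :c]
```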
Higher-order Functions:
map(f, [x1, x2, ..., xn]) → [f(x1), f(x2), ..., f(xn)]
g = map(f)
Result is a new function: g maps f over any list it is given
In curried notation (as in ML or Haskell): g = map f
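The same two ideas in Clojure, as a small sketch. Clojure functions are not curried, so the analogue of g = map(f) uses partial explicitly; the names incremented and g are illustrative.

```clojure
;; map is higher-order: it takes the function inc as an argument.
(def incremented (map inc [0 1 4 9]))   ; => (1 2 5 10)

;; Partially applying map to inc yields a new function, like g = map(f).
(def g (partial map inc))

(g [10 20 30])   ; => (11 21 31)
```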
Other Aspects:
➡ Type inference
➡ Laziness
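A short sketch of laziness in Clojure: (range) with no arguments describes an infinite sequence, and elements are only computed as they are requested, so take can safely stop it.

```clojure
;; (range) is an infinite lazy sequence; map over it is lazy too.
;; take asks for just five results, so the infinite input is never exhausted.
(def first-five-squares
  (take 5 (map #(* % %) (range))))

first-five-squares   ; => (0 1 4 9 16)
```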
Functional is the opposite of Object-oriented
Object-oriented: State is managed through encapsulation
Functional: State is avoided altogether
Why functional?
➡ No shared state makes it easier to reason about programs
➡ Concurrency problems simply go away (almost!)
➡ Undo and backtracking are trivial
➡ Algorithms are often more elegant
“It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.” - Alan Perlis
A host of languages support the functional model:
- ML, Haskell, Clojure, Scala, Idris
- All with different degrees of purity
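Why undo becomes trivial, as a small sketch: if every edit produces a new value and old values are never modified, the history is just a list of versions, and undo means picking the previous one. The "document as a vector of words" model here is an illustrative assumption.

```clojure
;; Each step conj's one more word onto the previous version.
;; reductions keeps every intermediate version, starting with the initial one.
(def history
  (reductions conj ["The" "quick"] ["brown" "fox"]))
;; history: (["The" "quick"] ["The" "quick" "brown"] ["The" "quick" "brown" "fox"])

(def current (last history))
(def undone  (last (butlast history)))   ; one step of undo
```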
There’s a catch!
This is cheap:
f(5)
But this is expensive:
f({"type": "object",
"properties": {
"mesos": {
"description": "Mesos specific configuration properties",
"type": "object",
"properties": {
"master": { … }
… } … } … } … })
And this is crazy:
f(<my whole database>)
Keeping data unmodified and unshared naively means copying it, so the cost grows with the size of the value.
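To make the catch concrete, here is a sketch of the naive approach (naive-assoc is a hypothetical helper, not a library function): to "change" one slot without touching the original, it copies the entire array first, so every update costs O(n) time and memory.

```clojure
;; Naive immutable update: copy everything, then change one slot.
(defn naive-assoc [^objects arr i v]
  (let [copy (aclone arr)]   ; full O(n) copy of the array
    (aset copy i v)          ; mutate only the private copy
    copy))

(def a (object-array ["The" "quick" "brown" "fox" "jumps" "over"]))
(def b (naive-assoc a 3 "dog"))
;; a is unchanged, but producing b copied all six elements.
```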
Persistent Data Structures to the Rescue
The goal: approximate the performance of mutable data structures, in both CPU and memory.
The big secret: use structural sharing!
There are lots of little secrets, too. We won’t cover them today.
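Structural sharing can be observed directly with Clojure's built-in persistent maps. In this sketch (keys loosely modeled on the JSON above; :marathon is made up), assoc-in rebuilds only the maps along the changed path, and every untouched branch is reused by reference.

```clojure
(def config
  {:mesos    {:master {:port 5050}}
   :marathon {:port 8080}})

;; Only :mesos and :mesos/:master are rebuilt; :marathon is reused.
(def config2 (assoc-in config [:mesos :master :port] 5051))

(identical? (:marathon config) (:marathon config2))   ; => true  (shared)
(identical? (:mesos config) (:mesos config2))         ; => false (path rebuilt)
(get-in config [:mesos :master :port])                ; => 5050  (old version intact)
```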
Persistent Data Structures - History
Timeline spanning roughly 1990 to 2010, with two earlier milestones:
➡ ML Language (1973)
➡ Finger Trees (1977)
➡ Persistent Arrays (Dietz)
➡ Haskell Language
➡ Catenable Queues (Buchsbaum/Tarjan)
➡ Okasaki
➡ Zipper (Huet)
➡ Fast And Space Efficient Trie Searches (Bagwell)
➡ Ideal Hash Trees (Bagwell)
➡ Priority Search Queues (Hinze)
➡ Data.Map in Haskell
➡ Clojure Collections
➡ RRB Trees (Bagwell/Rompf)
Example: Vector
➡ In Java/C#, ArrayList; in C++, std::vector.
➡ A list with constant-time access and update, and amortized constant-time append.
The quick brown fox jumps over (6 elements)
a[3] = “dog”
The quick brown dog jumps over (6 elements)
a.push_back(“the”)
The quick brown dog jumps over the (7 elements)
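For contrast, the mutable version, sketched with java.util.ArrayList through Clojure's Java interop: both operations change the one shared object in place, so any other reference to it (alias-of-a, an illustrative name) sees the edits too.

```clojure
(import 'java.util.ArrayList)

(def a (ArrayList. ["The" "quick" "brown" "fox" "jumps" "over"]))
(def alias-of-a a)     ; a second reference to the same object

(.set a 3 "dog")       ; a[3] = "dog", in place
(.add a "the")         ; push_back("the"), amortized O(1)

;; There is only one copy of the data, so the alias sees both changes.
(vec alias-of-a)   ; => ["The" "quick" "brown" "dog" "jumps" "over" "the"]
```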
Example: Vector
➡ To build a persistent vector, we start with a tree:
➡ Data is in the leaves
➡ Depth = $\lceil \log_2 n \rceil$; for the six words below, depth = 3
The quick brown fox jumps over
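A sketch of that tree in Clojure, assuming a toy representation (not Clojure's real vector internals): interior nodes are 2-element vectors [left right], leaves hold one word, and empty subtrees are nil. build-tree is a hypothetical helper.

```clojure
(defn build-tree
  "Builds a complete binary tree of the given depth over the leaves."
  [leaves depth]
  (cond
    (empty? leaves) nil              ; no data below this node
    (zero? depth)   (first leaves)   ; a leaf holds one element
    :else
    (let [half (bit-shift-left 1 (dec depth))]   ; capacity of each child
      [(build-tree (take half leaves) (dec depth))
       (build-tree (drop half leaves) (dec depth))])))

(def words ["The" "quick" "brown" "fox" "jumps" "over"])
(def tree (build-tree words 3))   ; depth 3 = ceil(log2 6)
;; => [[["The" "quick"] ["brown" "fox"]] [["jumps" "over"] nil]]
```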
Each index, written in binary, spells out the path from the root to its leaf (0 = left, 1 = right):
Index    0      1      2      3      4      5
Binary   000    001    010    011    100    101
Path     LLL    LLR    LRL    LRR    RLL    RLR
Word     The    quick  brown  fox    jumps  over
x = a[3]: 3 = 011, so go left, right, right and read “fox”.
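The lookup rule from the table, sketched over the same toy tree representation (nested 2-element vectors; lookup is a hypothetical name): each bit of the index, most significant first, picks the left or right child.

```clojure
(defn lookup
  "Walks from the root to leaf i, one bit of i per level."
  [tree depth i]
  (reduce (fn [node level]
            (get node (if (bit-test i level) 1 0)))   ; 0 = left, 1 = right
          tree
          (range (dec depth) -1 -1)))                 ; levels depth-1 ... 0

(def tree [[["The" "quick"] ["brown" "fox"]] [["jumps" "over"] nil]])

(lookup tree 3 3)   ; => "fox"   (3 = 011: L, R, R)
(lookup tree 3 5)   ; => "over"  (5 = 101: R, L, R)
```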
b = a.add(“the”)
➡ Only the nodes on the path from the root to the new leaf (index 6 = 110) are copied; everything else is shared between a and b.
a (6 elements): The quick brown fox jumps over
b (7 elements): The quick brown fox jumps over the
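Path copying, sketched on the same toy tree (assoc-leaf is a hypothetical helper; a real persistent vector would also track its element count, which this sketch omits). Appending "the" at index 6 = 110 rebuilds three nodes on the right-hand path and reuses the whole left half.

```clojure
(defn assoc-leaf
  "Returns a new tree with leaf i set to v, copying only the root-to-leaf path."
  [node depth i v]
  (if (zero? depth)
    v
    (let [bit   (if (bit-test i (dec depth)) 1 0)
          node  (or node [nil nil])   ; create a missing interior node
          child (get node bit)]
      ;; assoc on a vector returns a new vector; the other child is shared
      (assoc node bit (assoc-leaf child (dec depth) i v)))))

(def a [[["The" "quick"] ["brown" "fox"]] [["jumps" "over"] nil]])
(def b (assoc-leaf a 3 6 "the"))   ; b = a.add("the")

(identical? (get a 0) (get b 0))   ; => true: the left half is shared
b   ; => [[["The" "quick"] ["brown" "fox"]] [["jumps" "over"] ["the" nil]]]
```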
But, wait…
$O(1) \neq O(\log n)$
This isn’t what you promised!
[Figure: tree depth vs. number of elements (0 to 1000), comparing d = 1 with $d = \lceil \log_2 n \rceil$]
The answer:
Use 32-way trees
➡ Each node has 32 children, so each level of the tree consumes 5 bits of the index (2^5 = 32).
x = a[7022896]
Split the index, in binary, into 5-bit chunks; each chunk selects one of the 32 children at one level:
00110 10110 01010 01001 10000
    6    22    10     9    16
[Figure: the lookup descends through child 6, then 22, 10, 9, and finally 16, reaching the leaf holding “apple”]
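The chunking step is just shifts and masks. A sketch (index->path is a hypothetical name) that reproduces the slide's numbers, plus the inverse to show that no information is lost:

```clojure
(defn index->path
  "Splits index i into 5-bit chunks, most significant first."
  [i levels]
  (for [level (range (dec levels) -1 -1)]
    (bit-and (bit-shift-right i (* 5 level)) 31)))   ; 31 = 2r11111

(index->path 7022896 5)   ; => (6 22 10 9 16)

;; Reassembling the chunks in base 32 gives the index back.
(reduce (fn [acc chunk] (+ (* acc 32) chunk)) 0 [6 22 10 9 16])   ; => 7022896
```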
$O(1) \approx O(\log_{32} n)$
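The arithmetic behind the approximation: since each level consumes 5 bits, the 32-way depth is one fifth of the binary depth. Assuming indices fit in 32 bits (an assumption about the implementation, as with Java-style int indices), no lookup ever descends more than 7 levels.

$$
\log_{32} n = \frac{\log_2 n}{5}
\qquad\text{and}\qquad
\left\lceil \log_{32} 2^{32} \right\rceil = \lceil 6.4 \rceil = 7
$$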
[Figure: tree depth vs. number of elements (0 to 1000): d = 1 and $d = \lceil \log_2 n \rceil$ compared with $d = \lceil \log_{32} n \rceil$]
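The curves in the plot can be reproduced with a small integer-only helper (tree-depth is a hypothetical name): find the smallest depth whose capacity, branching^depth, covers n leaves.

```clojure
(defn tree-depth
  "Smallest d with branching^d >= n, i.e. ceil(log_branching n)."
  [branching n]
  (loop [d 0, capacity 1]
    (if (>= capacity n)
      d
      (recur (inc d) (* capacity branching)))))

(tree-depth 2 1000)       ; => 10
(tree-depth 32 1000)      ; => 2
(tree-depth 2 1000000)    ; => 20
(tree-depth 32 1000000)   ; => 4
```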
Example: Tree Walking
➡ The functional equivalent of the visitor pattern
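Clojure code to implement the walker (here, turning every blue node green):
(require '[clojure.walk :refer [postwalk]])

(postwalk
  (fn [node]
    ;; called on every node, children first; the tree is rebuilt
    ;; from the values this function returns
    (if (= :blue (:color node))
      (assoc node :color :green)   ; a new node; the original is untouched
      node))
  tree)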
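A self-contained run of the walker on a small made-up tree (the tree contents and the recolor name are illustrative assumptions). Because postwalk builds a new tree, the original is still blue afterwards.

```clojure
(require '[clojure.walk :refer [postwalk]])

;; Nodes are maps with a :color and optional :children.
(def tree
  {:color :blue
   :children [{:color :red}
              {:color :blue :children [{:color :blue}]}]})

(defn recolor [t]
  (postwalk (fn [node]
              (if (= :blue (:color node))
                (assoc node :color :green)
                node))
            t))

(def new-tree (recolor tree))
;; => {:color :green
;;     :children [{:color :red} {:color :green :children [{:color :green}]}]}
```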
Example: Zippers
➡ Allow you to navigate and update a tree across many
operations by “unzipping” it.
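A small sketch using Clojure's built-in clojure.zip library: vector-zip treats nested vectors as a tree, edits happen at the current focus, and z/root "zips up" the path to produce the new tree.

```clojure
(require '[clojure.zip :as z])

(def edited
  (-> (z/vector-zip [1 [2 3] 4])
      z/down          ; focus on 1
      z/right         ; focus on [2 3]
      z/down          ; focus on 2
      (z/edit * 10)   ; replace 2 with (* 2 10)
      z/root))        ; rebuild the path back to the root

edited   ; => [1 [20 3] 4]
```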
Takeaways
➡ Functional data structures can approximate the performance of mutable data structures, but usually won’t be quite as fast.
➡ … but not having to do state management often wins back the difference.
➡ We need to choose data structures carefully, depending on how they’re going to be used.
➡ This doesn’t eliminate shared state, just reduces it (but see message passing, software transactional memory, etc.).
References
Chris Okasaki, Purely Functional Data Structures. Doctoral dissertation, Carnegie Mellon University, 1996.
Rich Hickey, “Are We There Yet?” Presentation at the JVM Languages Summit, 2009. http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey
Gérard Huet, “Functional Pearl: The Zipper.” Journal of Functional Programming 7(5): 549–554, 1997. doi:10.1017/S0956796897002864
Jean Niklas L’orange, “Understanding Clojure’s Persistent Vectors.” Blog post. http://hypirion.com/musings/understanding-persistent-vector-pt-1
Discussion
