- 1. Working with Functional Data Structures: Practical F# Application. Jack Fox, @foxyjackfox, jackfoxy.com, acster.com, 2/5/2013
- 2. Bibliography: http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography
- 3. tl;dr
      - Singly-linked list: the fundamental purely functional data structure
      - Time complexity overview
      - Garbage collection and real-world performance
      - Reasons to use purely functional data structures
      - When not to use purely functional data structures
      - Choices and shapes
      - Build your own purely functional data structure
- 4. What is purely functional?
      - Immutable
      - Persistent
      - Thread safe
      - Recursive
      - Incremental
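A minimal sketch of what "immutable" and "persistent" mean for the built-in F# list. The names `original`, `extended`, and `sharesStructure` are illustrative only and do not come from the talk.

```fsharp
// Illustrative names; not from the slides.
let original = [3; 2; 1]

// Consing builds a new list; `original` is not modified.
let extended = 4 :: original

// Persistence: the old version is still fully usable.
printfn "%A" original   // [3; 2; 1]
printfn "%A" extended   // [4; 3; 2; 1]

// The new list reuses the old one as its tail instead of copying it.
let sharesStructure = obj.ReferenceEquals(List.tail extended, original)
printfn "%b" sharesStructure
```

Because nothing is ever mutated, both versions can be read from several threads without locking, which is where the "thread safe" bullet comes from.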
- 5. Theoretical Performance
      - O(1)
      - O(log* n): practically O(1)
      - O(log log n)
      - O(log n)
      - O(n): linear time
      - O(n²): gets real bad from here on out …
- 6. Theoretical Performance (most common)
      - O(1)
      - O(log* n): practically O(1)
      - O(log log n)
      - O(log n)
      - O(n): linear time
      - O(n²): gets real bad from here on out
      - O(i): variables other than n require explanation
- 7. Actual Performance
      - Processor architecture (instruction look-ahead, cache, etc.)
      - .NET garbage collection
      - O(n) behavior starts for “large enough size”
      - Recursive
      - Benchmarks over different structure sizes: 10², 10³ (often looks like << O(n)); 10⁴, 10⁵ (usually settles down to O(n), sometimes looks like > O(n)); 10⁶
- 8. List as a recursive structure. Diagram: adding element 4 with :: to the list 3, 2, 1, which ends in the empty list []; the first element is the Head, the rest is the Tail.
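The diagram can be sketched in code with the built-in F# list. The function name `sum` is an illustrative choice, not something shown on the slide.

```fsharp
// Build the list from the diagram: 4 :: 3 :: 2 :: 1 :: []
let xs = 4 :: 3 :: 2 :: 1 :: []

// Head is the first element, Tail is everything after it.
let h = List.head xs   // 4
let t = List.tail xs   // [3; 2; 1]

// A recursive function follows the recursive shape of the data:
// one case for the empty list, one case for head :: tail.
let rec sum list =
    match list with
    | [] -> 0
    | head :: tail -> head + sum tail

printfn "%d" (sum xs)  // 10
```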
- 9. So what the heck would you do with a list? Demo 1
- 10. “Getting” the recursive thing: SICP, a.k.a. Abelson & Sussman, a.k.a. The Wizard Book
- 11. Why no update or remove in List? (Graphics: unattributed, all over the internet)
- 12. Okasaki’s Pseudo-Canonical List Update

          let rec loop i updateElem (l: list<'a>) =
              match (i, l) with
              // Ran off the end of the list before reaching index i.
              | _, [] -> raise (System.Exception("subscript"))
              // Found it: replace the head, keep the existing tail.
              | 0, _ :: xs -> updateElem :: xs
              // Keep this element and continue into the tail.
              | i, x :: xs -> x :: (loop (i - 1) updateElem xs)

      Diagram: found it! 4 :: 3 :: 2 :: 1 :: []
- 13. Okasaki’s Pseudo-Canonical List Update

          let rec loop i updateElem (l: list<'a>) =
              match (i, l) with
              | _, [] -> raise (System.Exception("subscript"))
              | 0, _ :: xs -> updateElem :: xs
              | i, x :: xs -> x :: (loop (i - 1) updateElem xs)

      Do you see a problem?
- 14. We could just punt

          let punt i updateElem (l: list<'a>) =
              // Copy to an array, update in place, copy back to a list.
              let a = List.toArray l
              a.[i] <- updateElem
              List.ofArray a
- 15. …or try a Hybrid approach

          let hybrid i updateElem (l: list<'a>) =
              if i = 0 then updateElem :: List.tail l
              else
                  // Copy the first i elements into an array, in reverse order.
                  let rec loop i (front: 'a array) back =
                      match i with
                      | x when x < 0 -> front, List.tail back
                      | x ->
                          Array.set front x (List.head back)
                          loop (x - 1) front (List.tail back)
                  let front, back = loop (i - 1) (Array.create i updateElem) l
                  // Cons the saved elements back onto the updated remainder.
                  let rec loop2 i frontLen (front': 'a array) back' =
                      match i with
                      | x when x > frontLen -> back'
                      | x -> loop2 (x + 1) frontLen front' (front'.[x] :: back')
                  loop2 0 (front.Length - 1) front (updateElem :: back)
- 16. Time complexity of update options. Place your bets! (Graphics: unattributed, all over the internet)
      - Pseudo-Canonical: O(i)
      - Punt: O(n)
      - Hybrid: O(i)
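- 17. Actual Performance: PC looks perfect! (PC = Pseudo-Canonical; each column lists the fastest option first, then the others with their ratio to the fastest and their time. Graphics: http://www.freebievectors.com/es/material-de-antemano/51738/material-vector-dinamico-estilo-comic-femenino/)

      | Size | 10k Random Updates | One-time Worst Case |
      |------|--------------------|---------------------|
      | 10² | PC: 2.9 ms | Punt: 0.2 ms |
      | | Hybrid: 1.4X, 4.0 | PC: 1.1X, 0.2 |
      | | Punt: 1.5X, 4.5 | Hybrid: 4.1X, 0.8 |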
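- 18. Actual Performance: PC stack overflow at 10⁵! (PC = Pseudo-Canonical; each column lists the fastest option first, then the others with their ratio to the fastest and their time.)

      | Size | 10k Random Updates | One-time Worst Case |
      |------|--------------------|---------------------|
      | 10² | PC: 2.9 ms | Punt: 0.2 ms |
      | | Hybrid: 1.4X, 4.0 | PC: 1.1X, 0.2 |
      | | Punt: 1.5X, 4.5 | Hybrid: 4.1X, 0.8 |
      | 10³ | Hybrid: 29.6 | Punt: 0.2 |
      | | Punt: 1.6X, 47.6 | PC: 1.1X, 0.2 |
      | | PC: 1.7X, 50.3 | Hybrid: 4.1X, 0.8 |
      | 10⁴ | Hybrid: 320.3 | Punt: 0.3 |
      | | Punt: 1.7X, 534.9 | PC: 1.3X, 0.4 |
      | | PC: 2.9X, 920.2 | Hybrid: 3.2X, 0.9 |
      | 10⁵ | Hybrid: 4.67 sec | Punt: 1.0 |
      | | Punt: 2.0X, 9.34 | Hybrid: 1.5X, 1.5 |
      | | PC: stack overflow! | |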
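The stack overflow at 10⁵ is consistent with the recursive call in the pseudo-canonical update not being in tail position: every element ahead of the index keeps a stack frame alive until the recursion returns. One possible alternative, which is not one of the three options benchmarked in the talk, is to collect the visited prefix in reverse and then push it back onto the updated remainder. The names `updateTailRec`, `walk`, and `rebuild` are hypothetical.

```fsharp
// Hypothetical tail-recursive variant; not from the slides.
let updateTailRec i updateElem (l: list<'a>) =
    // Walk to index i, collecting the visited prefix in reverse order.
    let rec walk i revFront rest =
        match i, rest with
        | _, [] -> raise (System.Exception("subscript"))
        | 0, _ :: xs -> revFront, updateElem :: xs
        | i, x :: xs -> walk (i - 1) (x :: revFront) xs
    // Push the reversed prefix back onto the updated suffix.
    let rec rebuild revFront acc =
        match revFront with
        | [] -> acc
        | x :: xs -> rebuild xs (x :: acc)
    let revFront, back = walk i [] l
    rebuild revFront back

printfn "%A" (updateTailRec 2 99 [4; 3; 2; 1])   // [4; 3; 99; 1]
```

This stays O(i) but allocates the prefix twice, so whether it beats Punt or Hybrid at a given size would have to be measured rather than assumed.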
- 19. Benchmarking performance
      - Hard to reason about actual performance
      - DS_Benchmark
          - Open source on GitHub
          - Discards outliers
          - Fully isolates code to benchmark
          - Fully documented
          - “How to extend” documented
- 20. Shapes: let your imagination run wild! (Graphics: Larry D. Moore, Attribution-Share Alike 3.0 Unported license, http://commons.wikimedia.org/wiki/File:Playdoh.jpg)
- 21. Binary Random Access List
      - Same Cons, Head, Tail signature
      - Optimized for Lookup and Update: O(log n)
      - …but not for Remove. Why not?
      - Does it with alternate internal structures
- 22. Queue (FIFO). Diagram: the queue 1 :: 2 3 4 with Head and Tail at the front, element 5 being added at the end with ;;, and [] as the empty queue.
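One common way to build a purely functional FIFO queue is from two singly-linked lists, in the spirit of Okasaki's batched queue. The sketch below is an assumption-laden illustration, not the FSharpx.Collections implementation: the type `SimpleQueue` and its functions are hypothetical, and `conj` simply borrows the talk's name for adding at the end.

```fsharp
// Hypothetical minimal queue: `front` holds elements in dequeue order,
// `rBack` holds newly added elements in reverse order.
type SimpleQueue<'a> = { front: 'a list; rBack: 'a list }

module SimpleQueue =
    let empty<'a> : SimpleQueue<'a> = { front = []; rBack = [] }

    // Restore the invariant: `front` is empty only if the whole queue is empty.
    let private check q =
        match q.front with
        | [] -> { front = List.rev q.rBack; rBack = [] }
        | _ -> q

    // Add at the end: O(1).
    let conj x q = check { q with rBack = x :: q.rBack }

    // Peek at the first element without throwing on an empty queue.
    let tryHead q = List.tryHead q.front

    // Remove from the front: O(1) amortized, O(n) when the back list is reversed.
    let tryTail q =
        match q.front with
        | [] -> None
        | _ :: rest -> Some (check { q with front = rest })

let q =
    SimpleQueue.empty
    |> SimpleQueue.conj 1
    |> SimpleQueue.conj 2
    |> SimpleQueue.conj 3

printfn "%A" (SimpleQueue.tryHead q)   // Some 1
```

Elements come out in the order they went in, and every earlier version of the queue remains valid after `conj` or `tryTail`.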
- 23. Deque (double-ended queue). Diagram: the deque 1 :: 2 3 4 ;; 5, with elements added at either end; Head and Tail at the front, Init and Last at the back, and [] as the empty deque.
- 24. Deque and remove: approximately O(i/2), where i is the index to the element
- 25. Heap. Diagram: Head (1), Insert Element (::), Tail, [] as the empty heap, Merge Heaps. * Names in signature altered from Okasaki’s implementation. (Graphics: http://www.turbosquid.com/3d-models/heap-gravel-max/668104)
- 26. Heap and remove: O(1) (if implemented), …but implementation raises issues
      - Deleting before inserting
      - Order of events could nullify deletion before insertion
      - Equal values?
- 27. Canonical Functional Linear Structures
      - Order: by construction, ascending, descending, random
      - Grow
      - Shrink
      - Peek
- 28. Fsharpx.Collections
      - RandomAccessList = List + iLookup + iUpdate
      - DList = List + conj + append
      - Deque = List + conj + last + initial + rev = initial ∪ tail
      - LazyList = List (lazy)
      - Heap = List + sorted + append
      - Queue = List - cons + conj
      - Vector = List - cons - head - tail + conj + last + initial + iLookup + iUpdate = RandomAccessList⁻¹
- 29. Summary of time complexity performance

      | Structure | Complexity | Operations |
      |-----------|------------|------------|
      | Vector & Binary Random Access List | O(1) | cons-conj / head-last / tail-init |
      | | O(log₃₂ n) | lookup / update |
      | DList | O(1) | cons / conj / head / append |
      | | O(log n) | tail |
      | Deque | O(log n) | merge / tail |
      | | O(1) | cons / head / tail / conj / last / init (generally) |
      | | O(1) | reverse |
      | | O(i/2) | lookup / update |
      | Heap | O(1) | insert / head |
      | | O(log n) | merge / tail |
      | Queue | O(1) | conj / head / tail (generally) |
- 30. Measured performance (grow by one)

      | Structure | 10² | 10³ | 10⁴ | 10⁵ | 10⁶ |
      |-----------|-----|-----|-----|-----|-----|
      | ms.f#.array | 0.8 | 1.8 | 100.9 | 11,771.4 | n/a |
      | ms.f#.array - list | 0.3 | 1 | 69.5 | n/a | n/a |
      | ms.f#.list | 0.4 | 0.4 | 0.4 | 1.0 | 13.8 |
      | ms.f#.list - list | 0.7 | 0.7 | 0.9 | 2.3 | 45.3 |
      | Deque - conj | 0.3 | 0.3 | 0.5 | 4.7 | * |
      | Deque - cons | 0.3 | 0.3 | 0.5 | 4.7 | * |
      | DList - conj | 0.7 | 0.7 | 1.0 | 7.7 | 153.0 |
      | DList - cons | 0.7 | 0.7 | 1.0 | 6.4 | 118.4 |
      | Heap | 3.2 | 3.3 | 5.0 | 22.5 | 254.7 |
      | LazyList | 0.9 | 0.9 | 1.0 | 2.6 | 108.3 |
      | Queue | 1.0 | 1.1 | 1.4 | 7.6 | 106.6 |
      | RandomAccessList | 0.8 | 0.9 | 3.3 | 19.6 | 189.8 |
      | Vector | 0.8 | 0.9 | 3.3 | 19.7 | 189.1 |
- 31. Trees
- 32. Trees
      - Wide variety of applications
      - Binary (balanced or unbalanced)
      - Multiway (a.k.a. RoseTree)
- 33. Red Black Tree Balancing. Diagram: balancing cases over the subtrees a, b, c, d. Source: https://wiki.rice.edu/confluence/download/attachments/2761212/Okasaki-Red-Black.pdf
- 34. Talk about reducing complexity!

          // Assumed definition; the slide uses color without showing it.
          type color = Red | Black
          type 'a t = Node of color * 'a * 'a t * 'a t | Leaf
          let balance = function
              // A black node with a red child that has a red child, in any
              // of the four positions, is rebuilt into one balanced shape.
              | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
              | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
              | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
              | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
                  Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
              | x -> Node x

      Source: http://fsharpnews.blogspot.com/2010/07/f-vs-mathematica-red-black-trees.html
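To see `balance` in use, here is a sketch of the standard insertion from Okasaki's red-black trees, which the slide does not show. The `color` definition is assumed, and `insert`, `toList`, and `tree` are illustrative names.

```fsharp
// Types and balance as on the slide; `color` is an assumed definition.
type color = Red | Black
type 'a t = Node of color * 'a * 'a t * 'a t | Leaf

let balance = function
    | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
    | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
    | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
    | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
        Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
    | x -> Node x

// Insert as a red leaf, rebalance on the way back up, then blacken the root.
let insert x tree =
    let rec ins = function
        | Leaf -> Node (Red, x, Leaf, Leaf)
        | Node (c, y, l, r) as node ->
            if x < y then balance (c, y, ins l, r)
            elif x > y then balance (c, y, l, ins r)
            else node   // already present
    match ins tree with
    | Node (_, y, l, r) -> Node (Black, y, l, r)
    | Leaf -> Leaf      // unreachable: ins never returns Leaf

// In-order traversal, for checking the result only (not tail-recursive).
let rec toList = function
    | Leaf -> []
    | Node (_, x, l, r) -> toList l @ (x :: toList r)

let tree = List.fold (fun t x -> insert x t) Leaf [5; 3; 8; 1; 4; 7; 9; 2; 6]
printfn "%A" (toList tree)   // [1; 2; 3; 4; 5; 6; 7; 8; 9]
```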
- 35. Extra Credit: write the Remove operation for a Red Black Tree. Here’s how: http://en.wikipedia.org/wiki/Red-black_tree#Removal
- 36. Fsharpx.Collections.Experimental: IntMap (Map-like structure), BKTree, RoseTree (lazy multiway), EagerRoseTree, IndexedRoseTree. MS.F#.Collections: Map, Set.
- 37. To Do: Benchmark
      - RoseTree (lazy)
      - EagerRoseTree (not yet implemented)
      - IndexedRoseTree
      - Multiway as unbalanced binary tree (polymorphic recursion)
- 38. Another To Do: The (not-so-) Naïve Binary Tree. As seen all over the internet…
- 39. Another To Do: The (not-so-) Naïve Binary Tree. As seen all over the internet… yet often missing: pre-order, post-order, and in-order fold traversals (better be tail-recursive). And maybe a zipper navigator while you are at it!
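As one possible reading of "better be tail-recursive", the sketch below folds a binary tree in order while keeping the pending work on an explicit list instead of the call stack. The type `BinaryTree` and the function `foldInOrder` are hypothetical names, not part of any library mentioned in the talk.

```fsharp
// Hypothetical naive binary tree, as commonly seen in tutorials.
type BinaryTree<'a> =
    | Empty
    | Branch of BinaryTree<'a> * 'a * BinaryTree<'a>

// In-order fold. `pending` holds (value, right subtree) pairs still to visit,
// so every recursive call is a tail call.
let foldInOrder f state tree =
    let rec go acc node pending =
        match node, pending with
        | Branch (left, x, right), _ -> go acc left ((x, right) :: pending)
        | Empty, (x, right) :: rest -> go (f acc x) right rest
        | Empty, [] -> acc
    go state tree []

let small = Branch (Branch (Empty, 1, Empty), 2, Branch (Empty, 3, Empty))
let inOrder = foldInOrder (fun acc x -> x :: acc) [] small |> List.rev
printfn "%A" inOrder   // [1; 2; 3]
```

Pre-order and post-order folds can follow the same pattern with a different choice of what gets pushed onto `pending`.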
- 40. Call for Action! Fsharpx.Collections.Experimental
      - GitHub fork FSharpx
      - Implement some interesting structure and tests
      - Sync back to your fork
      - Pull request
      - Out of ideas or just want to practice? Unimplemented Okasaki structures: http://github.com/jackfoxy/DS_Benchmark/tree/master/PurelyFunctionalDataStructures
- 41. When not to use purely functional
      - Consider Array if performance is critical
      - Functional dictionary-like structures (Map) may not perform well enough, especially beyond scale 10⁴
      - Consider a .NET dictionary-like object
- 42. Publishing your functional DS
      - FSharpx.Collections.readme.md
      - Include a Try value returning option for values that can throw an Exception
      - Include other common values if < O(n)
      - Reason about edge cases (more unit tests better than not enough)
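A minimal sketch of the "Try value" guideline: pair every value that can throw with a companion that returns an option. The type `SimpleStack` and its functions are hypothetical and stand in for whatever structure is being published.

```fsharp
// Hypothetical structure used only for illustration.
type SimpleStack<'a> = { items: 'a list }

// Throws on an empty structure, like List.head.
let head stack =
    match stack.items with
    | x :: _ -> x
    | [] -> invalidOp "the stack is empty"

// Companion value that returns an option instead of throwing.
let tryHead stack =
    match stack.items with
    | x :: _ -> Some x
    | [] -> None

let nonEmpty = { items = [1; 2; 3] }
let empty : SimpleStack<int> = { items = [] }

printfn "%A" (tryHead nonEmpty)   // Some 1

// Callers handle the empty case explicitly instead of catching an exception.
match tryHead empty with
| Some x -> printfn "head is %d" x
| None -> printfn "nothing to peek at"
```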
- 43. Build your own structure: leverage Heap as internal structure to create RandomStack. Demo 3
- 44. Closing Thought: The functional data structures further from the “mainstream” (if such a measure were possible) tend to have less inherent value in their generic form. Therefore the ultimate functional data structures collection would combine the characteristics of a library, a snippet collection, a benchmarking tool, superb documentation, test cases, and EXAMPLES!
- 45. Resources
      - FSPowerPack.Core.Community (NuGet)
      - FSharpx.Core (GitHub & NuGet)
      - FSharpx.Collections.Experimental (GitHub & NuGet)
      - DS_Benchmark (GitHub): raw code for structures not yet merged to FSharpx

