Working with functional data structures

  1. Working with Functional Data Structures
     Practical F# Application
     Jack Fox, @foxyjackfox, jackfoxy.com, acster.com
     2/5/2013
  2. Bibliography
     http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography
  3. tl;dr
     • Singly-linked list -- the fundamental purely functional data structure
     • Time complexity overview
     • Garbage collection and real-world performance
     • Reasons to use Purely Functional Data Structures
     • When not to use Purely Functional Data Structures
     • Choices and shapes
     • Build your own Purely Functional Data Structure
  4. What is purely functional?
     • Immutable
     • Persistent
     • Thread safe
     • Recursive
     • Incremental
  5. Theoretical Performance
     • O(1)
     • O(log* n): practically O(1)
     • O(log log n)
     • O(log n)
     • O(n): linear time
     • O(n^2): gets real bad from here on out
     • …
  6. Theoretical Performance (most common)
     • O(1)
     • O(log* n): practically O(1)
     • O(log log n)
     • O(log n)
     • O(n): linear time
     • O(n^2): gets real bad from here on out
     • O(i): variables other than n require explanation
  7. Actual Performance
     • Processor architecture (instruction look-ahead, cache, etc.)
     • .NET Garbage Collection
     • O(n) behavior starts for “large enough size”
     Recursive Benchmarks over different Structure Sizes (10^2, 10^3, 10^4, 10^5, 10^6):
     • around 10^2 to 10^3: often looks like << O(n)
     • around 10^4 to 10^5: usually settles down to O(n), sometimes looks like > O(n)
  8. List as a recursive structure
     [Diagram: the element 4 is added with :: to the list 3, 2, 1, which ends in the empty list []. The added element is the Head; everything after it is the Tail.]
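
The head/tail picture on slide 8 can be sketched in a few lines of F#. This is only an illustration using the built-in list; the names `rest`, `xs` and `sum` are made up for the example.

```fsharp
// The existing list from the slide: 3, 2, 1, ending in the empty list.
let rest = [3; 2; 1]

// Adding an element with cons (::) builds a new list whose tail is the old list.
let xs = 4 :: rest

// Head and Tail take apart what cons put together.
let h = List.head xs   // the element just added
let t = List.tail xs   // the original list

// A function over a list follows the same shape: empty list, or head :: tail.
let rec sum l =
    match l with
    | [] -> 0
    | x :: tail -> x + sum tail

printfn "head = %d, tail = %A, sum = %d" h t (sum xs)
```

Nothing is copied when `4` is added: `xs` simply points at `rest`, which is why adding at the head is O(1).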
  9. So what the heck would you do with a list?
     Demo 1
  10. “Getting” the recursive thing
      • SICP a.k.a.
      • Abelson & Sussman a.k.a.
      • The Wizard Book
  11. Why no update or remove in List?
      Graphics: unattributed, all over the internet
  12. Okasaki’s Pseudo-Canonical List Update

          let rec loop i updateElem (l:list<'a>) =
              match (i, l) with
              | _, [] -> raise (System.Exception("subscript"))   // index past the end
              | 0, _::xs -> updateElem::xs                       // found it!
              | i', x::xs -> x::(loop (i' - 1) updateElem xs)    // keep x, recurse on the tail

      [Diagram: the list 4 :: 3 :: 2 :: 1 :: [] with the target element marked “found it!”]
  13. Okasaki’s Pseudo-Canonical List Update (same code as slide 12)
      • Do you see a problem?
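
The editor's notes give the answer to slide 13's question: the update is not tail recursive, so a deep enough index overflows the stack. One way around that, sketched here as an assumption and not taken from the slides, is to carry the skipped elements in an accumulator. The name `updateTailRec` is hypothetical.

```fsharp
// Hypothetical helper, not from the slides: a tail-recursive list update.
let updateTailRec i updateElem (l: 'a list) =
    // Walk to index i, pushing the skipped elements onto acc (in reverse order).
    let rec walk i' acc remaining =
        match i', remaining with
        | _, [] -> raise (System.Exception("subscript"))
        | 0, _ :: xs ->
            // Rebuild the front by consing the skipped elements back on.
            List.fold (fun back x -> x :: back) (updateElem :: xs) acc
        | _, x :: xs -> walk (i' - 1) (x :: acc) xs
    walk i [] l

printfn "%A" (updateTailRec 2 99 [4; 3; 2; 1])
```

Like the original, this is O(i) and still shares the tail after the updated element with the input list. The price is that the first i elements are traversed twice.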
  14. We could just punt

          let punt i updateElem (l:list<'a>) =
              let a = List.toArray l    // copy into a mutable array
              a.[i] <- updateElem       // update in place
              List.ofArray a            // copy back into an immutable list
  15. …or try a Hybrid approach

          let hybrid i updateElem (l:list<'a>) =
              if i = 0 then updateElem :: List.tail l
              else
                  // Copy the first i elements into an array, in reverse order.
                  let rec loop i' (front:'a array) back =
                      match i' with
                      | x when x < 0 -> front, List.tail back   // skip the element being replaced
                      | x ->
                          Array.set front x (List.head back)
                          loop (x - 1) front (List.tail back)
                  let front, back = loop (i - 1) (Array.create i updateElem) l
                  // Cons the front elements back onto the updated tail.
                  let rec loop2 i' frontLen (front':'a array) back' =
                      match i' with
                      | x when x > frontLen -> back'
                      | x -> loop2 (x + 1) frontLen front' (front'.[x]::back')
                  loop2 0 (Array.length front - 1) front (updateElem::back)
  16. Time complexity of update options
      • Pseudo-Canonical: O(i)
      • Punt: O(n)
      • Hybrid: O(i)
      Place your bets!
      Graphics: unattributed, all over the internet
  17. Actual Performance (PC = Pseudo-Canonical; "x best" is the multiple of the best time in that column)

      Size   10k Random Updates            One-time Worst Case
             option   x best   time        option   x best   time
      10^2   PC       -        2.9 ms      Punt     -        0.2 ms
             Hybrid   1.4      4.0         PC       1.1      0.2
             Punt     1.5      4.5         Hybrid   4.1      0.8

      PC looks perfect!
      Graphics: http://www.freebievectors.com/es/material-de-antemano/51738/material-vector-dinamico-estilo-comic-femenino/
  18. Actual Performance (PC = Pseudo-Canonical; "x best" is the multiple of the best time in that column)

      Size   10k Random Updates            One-time Worst Case
             option   x best   time        option   x best   time
      10^2   PC       -        2.9 ms      Punt     -        0.2 ms
             Hybrid   1.4      4.0         PC       1.1      0.2
             Punt     1.5      4.5         Hybrid   4.1      0.8
      10^3   Hybrid   -        29.6        Punt     -        0.2
             Punt     1.6      47.6        PC       1.1      0.2
             PC       1.7      50.3        Hybrid   4.1      0.8
      10^4   Hybrid   -        320.3       Punt     -        0.3
             Punt     1.7      534.9       PC       1.3      0.4
             PC       2.9      920.2       Hybrid   3.2      0.9
      10^5   Hybrid   -        4.67 sec    Punt     -        1.0
             Punt     2.0      9.34        Hybrid   1.5      1.5
             PC: stack overflow!
  19. Benchmarking performance
      • Hard to reason about actual performance
      • DS_Benchmark
        ◦ Open source on GitHub
        ◦ Discards outliers
        ◦ Fully isolates code to benchmark
        ◦ Fully documented
        ◦ “how to extend” documented
  20. Shapes: let your imagination run wild!
      Graphics: Larry D. Moore, Attribution-Share Alike 3.0 Unported license. http://commons.wikimedia.org/wiki/File:Playdoh.jpg
  21. Binary Random Access List
      • Same Cons, Head, Tail signature
      • Optimized for Lookup and Update: O(log n)
      • …but not for Remove. Why not?
      • Does it with alternate internal structures
  22. Queue (FIFO)
      [Diagram: the queue 1, 2, 3, 4, with 1 as the Head and the remaining elements as the Tail. Element 5 is added at the far end (drawn as ;;, the counterpart of ::). [] marks the Empty Queue.]
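
One common way to get slide 22's behavior from immutable lists is Okasaki's batched queue: a front list for removal and a reversed rear list for adding. The sketch below is an assumption for illustration only. It is not the FSharpx.Collections `Queue`, and every name in it (`BatchedQueue`, `emptyQueue`, `restore`, `conj`, `tryHead`, `tryTail`) is hypothetical.

```fsharp
// Hypothetical type for illustration; not the FSharpx.Collections Queue.
type BatchedQueue<'a> = { Front: 'a list; RearReversed: 'a list }

let emptyQueue<'a> : BatchedQueue<'a> = { Front = []; RearReversed = [] }

// Invariant: Front is empty only when the whole queue is empty.
let restore (q: BatchedQueue<'a>) =
    match q.Front with
    | [] -> { Front = List.rev q.RearReversed; RearReversed = [] }
    | _ -> q

// conj adds at the far end by consing onto the reversed rear list.
let conj x (q: BatchedQueue<'a>) =
    restore { q with RearReversed = x :: q.RearReversed }

let tryHead (q: BatchedQueue<'a>) = List.tryHead q.Front

let tryTail (q: BatchedQueue<'a>) =
    match q.Front with
    | [] -> None
    | _ :: rest -> Some (restore { q with Front = rest })

let q = emptyQueue |> conj 1 |> conj 2 |> conj 3
printfn "%A" (tryHead q)
```

The occasional `List.rev` is what makes `tail` O(1) only in the amortized sense, which fits the "(generally)" qualifier the deck attaches to Queue operations.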
  23. Deque (double-ended queue)
      [Diagram: the deque 1, 2, 3, 4 with an element added at either end: 1 joined with :: at the front, 5 joined with ;; at the back. Head and Tail are labeled from the front, Init and Last from the back. [] marks the Empty Deque.]
  24. Deque and remove
      Approximately O(i/2) (where i is index to element)
  25. Heap
      [Diagram: a heap with 1 at the Head and the rest as the Tail; Insert Element is drawn as ::, [] marks the Empty Heap, and a Merge Heaps operation is shown.]
      * names in signature altered from Okasaki’s implementation
      Graphics: http://www.turbosquid.com/3d-models/heap-gravel-max/668104
  26. Heap and remove
      O(1) (if implemented) …but implementation raises issues
      • Deleting before inserting
      • Order of events could nullify deletion before insertion
      • Equal values?
  27. Canonical Functional Linear Structures
      • Order
        ◦ by construction
        ◦ ascending
        ◦ descending
        ◦ random
      • Grow
      • Shrink
      • Peek
  28. Fsharpx.Collections
      • RandomAccessList = List + iLookup + iUpdate
      • DList = List + conj + append
      • Deque = List + conj + last + initial + rev = initial U tail
      • LazyList = List Lazy
      • Heap = List + sorted + append
      • Queue = List - cons + conj
      • Vector = List - cons - head - tail + conj + last + initial + iLookup + iUpdate = RandomAccessList^-1
  29. Summary of time complexity performance
      • Vector & Binary Random Access List
        ◦ O(1): cons-conj / head-last / tail-init
        ◦ O(log_32 n): lookup / update
      • DList
        ◦ O(1): cons / conj / head / append
        ◦ O(log n): tail
      • Deque
        ◦ O(log n): merge / tail
        ◦ O(1): cons / head / tail / conj / last / init
        ◦ O(1): reverse
        ◦ O(i/2): lookup / update (generally)
      • Heap
        ◦ O(1): insert / head
        ◦ O(log n): merge / tail
      • Queue
        ◦ O(1): conj / head / tail (generally)
  30. Measured performance (grow by one)

                             10^2    10^3    10^4        10^5    10^6
      ms.f#.array             0.8     1.8   100.9    11,771.4     n/a
      ms.f#.array - list      0.3     1      69.5         n/a     n/a
      ms.f#.list              0.4     0.4     0.4         1.0    13.8
      ms.f#.list - list       0.7     0.7     0.9         2.3    45.3
      Deque - conj            0.3     0.3     0.5         4.7       *
      Deque - cons            0.3     0.3     0.5         4.7       *
      DList - conj            0.7     0.7     1.0         7.7   153.0
      DList - cons            0.7     0.7     1.0         6.4   118.4
      Heap                    3.2     3.3     5.0        22.5   254.7
      LazyList                0.9     0.9     1.0         2.6   108.3
      Queue                   1.0     1.1     1.4         7.6   106.6
      RandomAccessList        0.8     0.9     3.3        19.6   189.8
      Vector                  0.8     0.9     3.3        19.7   189.1
  31. Trees
  32. Trees
      • Wide variety of applications
      • Binary (balanced or unbalanced)
      • Multiway (a.k.a. RoseTree)
  33. Red Black Tree Balancing
      [Diagram: red-black tree rebalancing cases, drawn over the subtrees a, b, c, d.]
      Source: https://wiki.rice.edu/confluence/download/attachments/2761212/Okasaki-Red-Black.pdf
  34. Talk about reducing complexity!

          type color = Red | Black   // needed by the definitions below
          type 'a t = Node of color * 'a * 'a t * 'a t | Leaf
          // The four red-red violations all rewrite to the same balanced shape.
          let balance = function
              | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
              | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
              | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
              | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
                  Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
              | x -> Node x

      Source: http://fsharpnews.blogspot.com/2010/07/f-vs-mathematica-red-black-trees.html
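
Slide 34 shows only the `balance` function. To see it at work, here is a minimal sketch of the insertion that usually goes with it in Okasaki's formulation. The `insert` and `toList` functions are additions for illustration and are not on the slide; the type and `balance` are repeated so the block stands alone.

```fsharp
type color = Red | Black
type 'a t = Node of color * 'a * 'a t * 'a t | Leaf

// As on the slide: four red-red violations, one balanced result.
let balance = function
    | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
    | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
    | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
    | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
        Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
    | x -> Node x

// Not on the slide: insert a new red node, rebalance on the way back up,
// then force the root to black.
let insert x tree =
    let rec ins = function
        | Leaf -> Node (Red, x, Leaf, Leaf)
        | Node (c, y, a, b) as node ->
            if x < y then balance (c, y, ins a, b)
            elif x > y then balance (c, y, a, ins b)
            else node
    match ins tree with
    | Node (_, y, a, b) -> Node (Black, y, a, b)
    | Leaf -> Leaf   // cannot happen: ins never returns Leaf

// In-order traversal, used here only to inspect the result.
let rec toList = function
    | Leaf -> []
    | Node (_, y, a, b) -> toList a @ (y :: toList b)

let tree = List.fold (fun t x -> insert x t) Leaf [5; 1; 4; 2; 3]
printfn "%A" (toList tree)
```

Insertion is the easy half; the deck's next slide leaves removal as extra credit, and the editor's notes say no correct F# implementation was known.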
  35. Extra Credit
      Write the Remove operation for a Red Black Tree
      Here’s how: http://en.wikipedia.org/wiki/Red-black_tree#Removal
  36. Fsharpx.Collections.Experimental
      • IntMap (Map-like structure)
      • BKTree
      • RoseTree (lazy multiway)
      • EagerRoseTree
      • IndexedRoseTree
      MS.F#.Collections
      • Map
      • Set
  37. To Do: Benchmark:
      • RoseTree (lazy)
      • EagerRoseTree (not yet implemented)
      • IndexedRoseTree
      • Multiway as unbalanced binary tree (polymorphic recursion)
  38. Another To Do: The (not-so-) Naïve Binary Tree
      As seen all over the internet…
  39. Another To Do: The (not-so-) Naïve Binary Tree
      As seen all over the internet…
      …yet often missing:
      • Pre-order
      • Post-order
      • In-order
      fold traversals (better be tail-recursive). And maybe a zipper navigator while you are at it!
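
As a sketch of what slide 39 asks for, here is one possible tail-recursive in-order fold over a naive binary tree. It is an assumption about how such a fold could look, not code from the talk; the `Tree` type and `foldInOrder` are hypothetical names. The pending work is kept in an explicit list so the loop only ever calls itself in tail position.

```fsharp
// Hypothetical naive binary tree, for illustration only.
type Tree<'a> =
    | Empty
    | Branch of Tree<'a> * 'a * Tree<'a>

// In-order fold. `pending` holds the (value, right subtree) pairs
// still to be visited once the current left spine is exhausted.
let foldInOrder f state tree =
    let rec loop acc pending t =
        match t, pending with
        | Branch (left, v, right), _ -> loop acc ((v, right) :: pending) left
        | Empty, (v, right) :: rest -> loop (f acc v) rest right
        | Empty, [] -> acc
    loop state [] tree

let small = Branch (Branch (Empty, 1, Empty), 2, Branch (Empty, 3, Empty))
let inOrder = foldInOrder (fun acc v -> v :: acc) [] small |> List.rev
printfn "%A" inOrder
```

Because the recursion is a plain loop, a badly unbalanced tree costs heap space for `pending` but does not grow the call stack. Pre-order and post-order variants would need a different scheduling of the pending work.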
  40. Call for Action! Fsharpx.Collections.Experimental
      • GitHub fork FSharpx
      • Implement some interesting structure and tests
      • Sync back to your fork
      • Pull request
      Out of ideas or just want to practice? Unimplemented Okasaki structures:
      http://github.com/jackfoxy/DS_Benchmark/tree/master/PurelyFunctionalDataStructures
  41. When not to use purely functional
      • Consider Array if performance is critical
      • Functional dictionary-like structures (Map) may not perform well enough, especially after scale 10^4
      • Consider a .NET dictionary-like object
  42. Publishing your functional DS
      FSharpx.Collections.readme.md
      • Include Try values returning option for values that can throw an Exception
      • Include other common values if < O(n)
      • Reason about edge cases (more unit tests are better than not enough)
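
The first guideline on slide 42 pairs every value that can throw with a `try` version that returns an option. A minimal sketch of that pairing, on a made-up wrapper type (`SimpleStack`, `head` and `tryHead` here are hypothetical and not the FSharpx API):

```fsharp
// Hypothetical minimal structure, used only to show the naming pattern.
type SimpleStack<'a> = SimpleStack of 'a list

// Throwing version: convenient when the caller knows the stack is non-empty.
let head (SimpleStack xs) =
    match xs with
    | x :: _ -> x
    | [] -> raise (System.Exception("SimpleStack is empty"))

// Try version: the empty case is part of the return type.
let tryHead (SimpleStack xs) =
    match xs with
    | x :: _ -> Some x
    | [] -> None

match tryHead (SimpleStack [1; 2; 3]) with
| Some x -> printfn "head is %d" x
| None -> printfn "empty"
```

Offering both lets callers choose between pattern matching on the result and handling an exception.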
  43. Build your own structure
      • Leverage Heap as internal structure to create RandomStack
      Demo 3
  44. Closing Thought
      The functional data structures further from the “mainstream” (if such a measure were possible) tend to have less inherent value in their generic form. Therefore the ultimate functional data structures collection would combine the characteristics of a library, a snippet collection, a benchmarking tool, superb documentation, test cases, and EXAMPLES!
  45. Resources
      • FSPowerPack.Core.Community (NuGet)
      • FSharpx.Core (GitHub & NuGet)
      • FSharpx.Collections.Experimental (GitHub & NuGet)
      • DS_Benchmark (GitHub): raw code for structures not yet merged to FSharpx

Editor's Notes

  1. The big ideas in the presentation
  2. Immutable is the only requirement for the “definition” of purely functional. Persistence is a side effect of immutability. Immutability and persistence allow for thread safety. Recursive just happens to be the implementation of nearly all purely functional data structures. Incremental is an aspect of recursion and enables efficient GC; structures never require the .NET large object heap.
  3. Time complexity relates how the time component of a process scales
  4. You usually only have to reason about a few time complexity cases
  5. Processor architecture and GC can affect time complexity analysis, especially on repeated operations that each result in a new structure object.
  6. The singly-linked list is arguably the most pervasive functional data structure (setting aside stream/IEnumerable for the moment). The tail is itself a list. So is the empty list.
  7. Summary: recursing through a list with an active pattern to format the data. Doing the same with a LazyList takes more time and more garbage collection. But if the active pattern cuts the recursion short before covering the whole list, LazyList actually saves resources (time). This is especially useful if calculation or other resources are involved.
  8. Read first few chapters to see what singly linked lists are all about
  9. Or in practically any purely functional data structure, for that matter.
  10. This is how we would like to write List update. Remove is the same algorithm, but it drops the target element without replacing it. Recursing like this is akin to operating on a Russian doll.
  11. Not tail recursive. (see later slides)
  12. Array is not a functional structure. However, it ends up hidden from the rest of the user code, thus preserving structure immutability. It could end up transitorily using the .NET large object heap. This approach only addresses update, not remove.
  13. A recursive loop takes the tail of the original list after the update position and builds an Array from the front. Cons the updated element to the tail. A second recursive loop conses the front elements. Note that both loops are tail recursive. This could still use the .NET large object heap. This approach does work for remove.
  14. “i” needs explanation: it is the index value of the element to update.
  15. All the DS stack overflows seem to occur after 10^4 and before 10^5. Also, the best time for 10k updates seems to scale perfectly with size until 10^5 (possibly because we crossed over into the large object heap structure?). NOTE: punt is actually quite good for the worst case. NOTE 2: the worst case does not scale linearly for any of the options, presumably because overhead is more expensive than the performant code.
  16. Hard to reason…: for instance, DList Append is O(1), but Deque.OfCatLists outperforms it until a scale of appending 100,000-element structures. Pull requests are welcome, and there are guidelines for pull requests. Frequently there are several choices for the operation you want.
  17. Singly-linked lists are the starting point of functional data structures. Many of the principles of operation remain the same, but changing shapes offer new possibilities. Like the Play-Doh Fun Factory, run your data through data structures to change its shape.
  18. Solves the lookup and update problem for lists, but not Remove. Creative internal structures are required for new functional data structures.
  19. The adding function is called “conj” (the inverse of cons). The cons operator, conj operator, and empty symbol are not actually available. The empty Queue stands in a different relationship than with List, because there is no real pointer to it. The arrows are dashed because the “pointing” is not the same as in a singly linked list.
  20. Last & Init are the complement to Head & Tail
  21. Either the minimum or maximum element rises to the top of the heap
  22. Attributes of functional linear structures
  23. Somewhat complete collection of canonical sequential data structures
  24. Some minor exceptions
  25. Array at a disadvantage for this benchmark
  26. No known correct implementation of remove in F#
  27. Summary: RandomStack internally implements 2 parts: an IComparable type consisting of a random integer and a value, and a Heap of the IComparable items.
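
The RandomStack idea from the last note can be sketched with a small heap keyed by a random integer. This is a simplified illustration under stated assumptions, not the demo's code: it uses a hand-written pairing heap instead of the FSharpx Heap, it stores the random key next to the value instead of defining an IComparable type, and all names (`Heap`, `merge`, `mergePairs`, `emptyStack`, `push`, `tryPop`) are hypothetical.

```fsharp
// Minimal pairing heap keyed by an int. Hypothetical, for illustration only.
type Heap<'a> =
    | E
    | T of int * 'a * Heap<'a> list   // random key, value, child heaps

// The heap with the smaller key becomes the root.
let merge h1 h2 =
    match h1, h2 with
    | E, h | h, E -> h
    | T (k1, v1, c1), T (k2, v2, c2) ->
        if k1 <= k2 then T (k1, v1, h2 :: c1) else T (k2, v2, h1 :: c2)

// Standard pairing step; not tail recursive, which is fine for a sketch.
let rec mergePairs heaps =
    match heaps with
    | [] -> E
    | [h] -> h
    | h1 :: h2 :: rest -> merge (merge h1 h2) (mergePairs rest)

let emptyStack<'a> : Heap<'a> = E

// Push tags the value with a random key; the caller supplies the generator.
let push (rng: System.Random) value stack =
    merge (T (rng.Next(), value, [])) stack

// Pop returns the value with the smallest random key, i.e. a random element.
let tryPop stack =
    match stack with
    | E -> None
    | T (_, value, children) -> Some (value, mergePairs children)

let rng = System.Random()
let stack = List.fold (fun s x -> push rng x s) emptyStack ["a"; "b"; "c"]
match tryPop stack with
| Some (v, _) -> printfn "popped %s" v
| None -> printfn "empty"
```

The stack itself stays immutable; the only mutable piece is the random generator passed to `push`. Which element pops first depends on the keys drawn, so no particular output is shown.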