Working with functional data structures

Beyond the bread-and-butter singly linked list are dozens of practical Functional Data Structures available to mask complexity, enable composition, and open possibilities in pattern matching. This session focuses on data structures available today in F#, but has practical application to almost any functional language. The session covers what makes these structures functional, when to use them, why to use them, choosing a structure, time complexity awareness, garbage collection, and practical insights into creating and profiling your own Functional Data Structure. Bibliography: http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography

  • The big ideas in the presentation
  • Immutability is the only requirement for the “definition” of purely functional. Persistence is a side effect of immutability. Immutability and persistence allow for thread safety. Recursion just happens to be the implementation of nearly all purely functional data structures. Incremental is an aspect of recursion, and enables efficient GC: these structures never require the .NET large object heap.
  • Time complexity describes how the running time of a process scales with the size of its input
  • You usually only have to reason about a few time complexity cases
  • Processor architecture and GC can affect time complexity analysis, especially on repeated operations that each result in a new structure object
  • Singly-linked list: arguably the most pervasive functional data structure (setting aside stream/IEnumerable for the moment). The tail is itself a list, and so is the empty list.
  • Summary: recursing through a list with an active pattern to format the data. Doing the same with a LazyList takes more time and more garbage collection, but if the active pattern cuts the recursion short before covering the whole list, the LazyList actually saves resources (time). This is especially useful when calculation or other resources are involved.
  • Read the first few chapters to see what singly linked lists are all about
  • Or in practically any purely functional data structure, for that matter.
  • This is how we would like to write List update. Remove is the same algorithm, but drops the target element instead of replacing it. Recursing like this is akin to operating on a Russian doll.
  • Not tail recursive (see later slides).
  • Array is not a functional structure. However, it ends up hidden from the rest of the user code, thus preserving the structure's immutability. It could end up transitorily using the .NET large object heap. This approach only addresses update, not remove.
  • A recursive loop walks the front of the original list up to the update position, building an Array of those front elements and returning the tail after the update position. The updated element is consed onto that tail, and a second recursive loop conses the front elements back on. Note both loops are tail recursive. This still could use the .NET large object heap. This approach does work for remove.
  • “i” needs explanation: it is the index of the element to update
  • All the DS stack overflows seem to occur after 10^4 and before 10^5. Also, the best time for 10k updates seems to scale perfectly with size until 10^5 (possibly because we crossed over into the large object heap?). NOTE: punt is actually quite good for the worst case. NOTE 2: the worst case does not scale linearly for any of the options, presumably because the overhead is more expensive than the performant code.
  • Hard to reason about: for instance, DList Append is O(1), but Deque.OfCatLists outperforms it until a scale of appending 100,000-element structures. Pull requests welcome; there are guidelines for pull requests. There are frequently several choices for the operation you want.
  • Singly-linked lists are the starting point of functional data structures. Many of the principles of operation remain the same, but changing shapes offer new possibilities. Like the Play-Doh Fun Factory, run your data through data structures to change its shape.
  • Solves the lookup and update problem for lists, but not Remove. Creative internal structures are required for new functional data structures.
  • The adding function is called “conj” (the inverse of cons). The cons operator, conj operator, and empty symbol shown are not actually available. The empty Queue stands in a different relationship than with List, because there is no real pointer to it. The arrows are dashed because “pointing” is not the same as in a singly linked list.
  • Last & Init are the complement to Head & Tail
  • Either the minimum or maximum element rises to the top of the heap
  • Attributes of functional linear structures
  • Somewhat complete collection of canonical sequential data structures
  • Some minor exceptions
  • Array at a disadvantage for this benchmark
  • No known correct implementation of remove in F#
  • Summary: RandomStack internally implements 2 parts: an IComparable type consisting of a random integer and a value, and a Heap of those IComparable items
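The speaker note summarizing the active-pattern demo says a LazyList costs more than a list when the whole structure is traversed, but saves work when the active pattern cuts the recursion short. Below is a minimal sketch of that effect. It uses a hand-rolled `LazyList` type and a hypothetical `(|LNil|LCons|)` active pattern rather than the FSharpx `LazyList` API, and `lazyInit` and `formatUntilNegative` are illustrative names; the point is only that cells past the cut-off are never built.

```fsharp
/// Minimal hand-rolled lazy list (a stand-in, not the FSharpx LazyList API).
type LazyList<'a> =
    | Nil
    | Cons of 'a * Lazy<LazyList<'a>>

/// Hypothetical active pattern: exposes the head and forces only the next cell.
let (|LNil|LCons|) (l: LazyList<'a>) =
    match l with
    | Nil -> LNil
    | Cons (x, rest) -> LCons (x, rest.Force())

/// Hypothetical builder: cell i holds f i; the following cell is created on demand.
let rec lazyInit (n: int) (i: int) (f: int -> 'a) : LazyList<'a> =
    if i >= n then Nil
    else Cons (f i, lazy (lazyInit n (i + 1) f))

/// Formats elements until the first negative one, then stops recursing.
let rec formatUntilNegative (acc: string list) (l: LazyList<int>) : string list =
    match l with
    | LCons (x, _) when x < 0 -> List.rev acc
    | LCons (x, rest) -> formatUntilNegative (sprintf "<%d>" x :: acc) rest
    | LNil -> List.rev acc
```

Stopping at the first negative element of a million-element lazy list builds only a handful of cells. Traversing it to the end would instead allocate every cell plus its `Lazy` wrapper, which is where the extra time and garbage collection come from.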
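The notes point out that the pseudo-canonical update is not tail recursive, and that the hybrid approach keeps tail recursion by buffering the front of the list in an Array. A variation not taken from the talk is to buffer the front in a reversed list instead. That keeps O(i) time and tail recursion, avoids the array (and with it any chance of touching the large object heap), and works for remove as well. In this sketch `walkAndRebuild`, `updateTR`, and `removeTR` are hypothetical names, and an out-of-range index raises `ArgumentException` via `invalidArg`.

```fsharp
/// Walks j steps into `rest`, keeping the visited elements in reverse order,
/// then conses them back onto `replaceSuffix` applied to the remaining list.
/// Top-level and self tail-recursive, so the compiler turns it into a loop.
let rec walkAndRebuild (j: int) (revFront: 'a list) (rest: 'a list) (replaceSuffix: 'a list -> 'a list) : 'a list =
    match j, rest with
    | _, [] -> invalidArg "i" "index out of range"
    | 0, suffix ->
        // List.fold runs in constant stack space; each step re-conses one front element.
        List.fold (fun acc x -> x :: acc) (replaceSuffix suffix) revFront
    | j, x :: xs -> walkAndRebuild (j - 1) (x :: revFront) xs replaceSuffix

/// Replaces the element at index i: O(i) time, no array, no deep stack.
let updateTR (i: int) (updateElem: 'a) (l: 'a list) : 'a list =
    if i < 0 then invalidArg "i" "index out of range"
    else walkAndRebuild i [] l (fun suffix -> updateElem :: List.tail suffix)

/// Removes the element at index i using the same walk.
let removeTR (i: int) (l: 'a list) : 'a list =
    if i < 0 then invalidArg "i" "index out of range"
    else walkAndRebuild i [] l List.tail
```

Like the hybrid approach, this still allocates i new cons cells, so the O(i) bound is unchanged; what it removes is the stack depth that made the pseudo-canonical version overflow between 10^4 and 10^5.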
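The RandomStack note and slide 43 describe RandomStack as two parts: an IComparable type pairing a random integer with a value, and a Heap of those items. The sketch below follows that description but is not the talk's demo code. It swaps in a minimal leftist heap for the FSharpx Heap, and the names `LeftistHeap`, `RandomItem`, `RandomStack.push`, and `RandomStack.tryPop` are hypothetical. To stay persistent it threads a seed through a pure linear congruential generator instead of calling System.Random.

```fsharp
/// Minimal leftist min-heap, standing in for the FSharpx Heap used in the talk.
type LeftistHeap<'a> =
    | E
    | T of int * 'a * LeftistHeap<'a> * LeftistHeap<'a>   // rank, item, left, right

let rank h = match h with E -> 0 | T (r, _, _, _) -> r

/// Keeps the child with the larger rank on the left (the leftist invariant).
let makeT x a b =
    if rank a >= rank b then T (rank b + 1, x, a, b) else T (rank a + 1, x, b, a)

/// O(log n) merge along the right spines.
let rec merge h1 h2 =
    match h1, h2 with
    | E, h | h, E -> h
    | T (_, x, a1, b1), T (_, y, a2, b2) ->
        if x <= y then makeT x a1 (merge b1 h2) else makeT y a2 (merge h1 b2)

let insert x h = merge (T (1, x, E, E)) h

/// Pairs a random key with a value; equality and ordering use the key only,
/// so the value type needs no comparison constraint.
[<CustomEquality; CustomComparison>]
type RandomItem<'a> =
    { Key: int; Value: 'a }
    override this.Equals(other) =
        match other with
        | :? RandomItem<'a> as o -> this.Key = o.Key
        | _ -> false
    override this.GetHashCode() = hash this.Key
    interface System.IComparable with
        member this.CompareTo(other) =
            match other with
            | :? RandomItem<'a> as o -> compare this.Key o.Key
            | _ -> invalidArg "other" "not a RandomItem"

/// Persistent stack whose pop order is pseudo-random.
type RandomStack<'a> = { Seed: int; Items: LeftistHeap<RandomItem<'a>> }

module RandomStack =
    let empty (seed: int) : RandomStack<'a> = { Seed = seed; Items = E }

    // Pure LCG step (unchecked int arithmetic wraps); masked to stay non-negative.
    let private nextSeed (s: int) = (s * 1103515245 + 12345) &&& 0x7fffffff

    /// Tags x with the next pseudo-random key and inserts it into the heap.
    let push (x: 'a) (rs: RandomStack<'a>) : RandomStack<'a> =
        let s = nextSeed rs.Seed
        { Seed = s; Items = insert { Key = s; Value = x } rs.Items }

    /// Returns the value with the smallest key and the remaining stack.
    let tryPop (rs: RandomStack<'a>) : ('a * RandomStack<'a>) option =
        match rs.Items with
        | E -> None
        | T (_, item, a, b) -> Some (item.Value, { rs with Items = merge a b })
```

Items come out in an order fixed by the seed rather than by insertion order. Because every operation returns a new stack, popping the same stack twice yields the same item.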

    1. Working with Functional Data Structures: Practical F# Application. Jack Fox, @foxyjackfox, Jackfoxy.com. acster.com jackfoxy.com @foxyjackfox 2/5/2013
    2. Bibliography: http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography
    3. tl;dr: singly-linked list, the fundamental purely functional data structure; time complexity overview; garbage collection and real-world performance; reasons to use purely functional data structures; when not to use purely functional data structures; choices and shapes; build your own purely functional data structure
    4. What is purely functional? Immutable, persistent, thread safe, recursive, incremental
    5. Theoretical Performance: O(1); O(log* n), practically O(1); O(log log n); O(log n); O(n), linear time; O(n^2), gets real bad from here on out …
    6. Theoretical Performance (most common): O(1); O(log* n), practically O(1); O(log log n); O(log n); O(n), linear time; O(n^2), gets real bad from here on out; O(i), variables other than n require explanation
    7. Actual Performance: processor architecture (instruction look-ahead, cache, etc.) and .NET garbage collection affect results; O(n) behavior starts at a “large enough” size. Recursive benchmarks over structure sizes 10^2 to 10^6: at 10^2 and 10^3, often looks like << O(n); at 10^4 and 10^5, usually settles down to O(n), sometimes looks like > O(n)
    8. List as a recursive structure: adding element 4 with cons gives 4 :: 3 :: 2 :: 1 :: []; the Head is 4, the Tail is 3 :: 2 :: 1 :: [], and [] is the Empty List
    9. So what the heck would you do with a list? Demo 1
    10. “Getting” the recursive thing: SICP, a.k.a. Abelson & Sussman, a.k.a. The Wizard Book
    11. Why no update or remove in List? Graphics: unattributed, all over the internet
    12. Okasaki’s Pseudo-Canonical List Update:

        let rec loop i updateElem (l: list<'a>) =
            match (i, l) with
            | _, [] -> raise (System.Exception("subscript"))   // index is past the end
            | 0, _ :: xs -> updateElem :: xs                   // found it: replace this element
            | i, x :: xs -> x :: (loop (i - 1) updateElem xs)  // keep x, keep looking

        Diagram: found it! 4 :: 3 :: 2 :: 1 []
    13. Okasaki’s Pseudo-Canonical List Update (same code as slide 12): Do you see a problem?
    14. We could just punt:

        // Copy to a mutable array, update in place, copy back: O(n).
        let punt i updateElem (l: list<'a>) =
            let a = List.toArray l
            a.[i] <- updateElem
            List.ofArray a

    15. …or try a Hybrid approach:

        let hybrid i updateElem (l: list<'a>) =
            if i = 0 then updateElem :: List.tail l
            else
                // Copy the first i elements into an array in reverse (l.[0] lands at index i - 1).
                let rec loop i (front: 'a array) back =
                    match i with
                    | x when x < 0 -> front, List.tail back   // drop the element being replaced
                    | x ->
                        Array.set front x (List.head back)
                        loop (x - 1) front (List.tail back)
                let front, back = loop (i - 1) (Array.create i updateElem) l
                // Cons the buffered elements back onto updateElem :: back.
                let rec loop2 i frontLen (front': 'a array) back' =
                    match i with
                    | x when x > frontLen -> back'
                    | x -> loop2 (x + 1) frontLen front' (front'.[x] :: back')
                loop2 0 (front.Length - 1) front (updateElem :: back)

    16. Time complexity of update options: Pseudo-Canonical O(i); Punt O(n); Hybrid O(i). Place your bets! Graphics: unattributed, all over the internet
    17. Actual Performance at size 10^2 (PC = pseudo-canonical): PC is fastest for 10k random updates (2.9 ms) and Punt for a one-time worst-case update (0.2 ms). PC looks perfect! Graphics: http://www.freebievectors.com/es/material-de-antemano/51738/material-vector-dinamico-estilo-comic-femenino/
    18. Actual Performance (PC = pseudo-canonical). For each size, the fastest option is listed with its time; the others are shown as a multiple of the fastest, with their time in parentheses. Times are in ms unless marked sec.

        Size   10k random updates                                One-time worst case
        10^2   PC 2.9; Hybrid 1.4x (4.0); Punt 1.5x (4.5)          Punt 0.2; PC 1.1x (0.2); Hybrid 4.1x (0.8)
        10^3   Hybrid 29.6; Punt 1.6x (47.6); PC 1.7x (50.3)       Punt 0.2; PC 1.1x (0.2); Hybrid 4.1x (0.8)
        10^4   Hybrid 320.3; Punt 1.7x (534.9); PC 2.9x (920.2)    Punt 0.3; PC 1.3x (0.4); Hybrid 3.2x (0.9)
        10^5   Hybrid 4.67 sec; Punt 2.0x (9.34 sec)               Punt 1.0; Hybrid 1.5x (1.5)

        At 10^5, PC: stack overflow!
    19. Benchmarking performance: it is hard to reason about actual performance. DS_Benchmark: open source on GitHub; discards outliers; fully isolates the code to benchmark; fully documented; “how to extend” documented
    20. Shapes: let your imagination run wild! Graphics: Larry D. Moore, Attribution-Share Alike 3.0 Unported license. http://commons.wikimedia.org/wiki/File:Playdoh.jpg
    21. Binary Random Access List: same Cons, Head, Tail signature; optimized for Lookup and Update, O(log n) …but not for Remove. Why not? Remove takes alternate internal structures.
    22. Queue (FIFO): diagram 1 :: 2 3 4 ;; 5, with Head 1, Tail 2 3 4, an element (5) being added at the back with conj (drawn as ;;), and [] as the Empty Queue
    23. Deque (double-ended queue): diagram 1 :: 2 3 4 ;; 5, with Head and Tail at the front, Init and Last at the back, elements added at either end, and [] as the Empty Deque
    24. Deque and remove: approximately O(i/2), where i is the index of the element
    25. Heap: diagram with Head 1 and Tail, Insert element (::), [] as the Empty Heap, and Merge Heaps. * Names in signature altered from Okasaki’s implementation. Graphics: http://www.turbosquid.com/3d-models/heap-gravel-max/668104
    26. Heap and remove: O(1) (if implemented) …but implementation raises issues: deleting before inserting; order of events could nullify deletion before insertion; equal values?
    27. Canonical Functional Linear Structures: Order (by construction, ascending, descending, random); Grow; Shrink; Peek
    28. Fsharpx.Collections: RandomAccessList = List + iLookup + iUpdate; DList = List + conj + append; Deque = List + conj + last + initial + rev = initial ∪ tail; LazyList = lazy List; Heap = List + sorted + append; Queue = List - cons + conj; Vector = List - cons - head - tail + conj + last + initial + iLookup + iUpdate = RandomAccessList^-1
    29. Summary of time complexity performance:
        Vector & Binary Random Access List: O(1) cons-conj / head-last / tail-init; O(log32 n) lookup / update
        DList: O(1) cons / conj / head / append; O(log n) tail
        Deque: O(1) cons / head / tail / conj / last / init; O(1) reverse; O(i/2) lookup / update (generally)
        Heap: O(1) insert / head; O(log n) merge / tail
        Queue: O(1) conj / head / tail (generally)
    30. Measured performance (grow by one), by number of elements:

        Structure                10^2   10^3   10^4    10^5       10^6
        ms.f#.array              0.8    1.8    100.9   11,771.4   n/a
        ms.f#.array (list)       0.3    1      69.5    n/a        n/a
        ms.f#.list               0.4    0.4    0.4     1.0        13.8
        ms.f#.list (list)        0.7    0.7    0.9     2.3        45.3
        Deque (conj)             0.3    0.3    0.5     4.7        *
        Deque (cons)             0.3    0.3    0.5     4.7        *
        DList (conj)             0.7    0.7    1.0     7.7        153.0
        DList (cons)             0.7    0.7    1.0     6.4        118.4
        Heap                     3.2    3.3    5.0     22.5       254.7
        LazyList                 0.9    0.9    1.0     2.6        108.3
        Queue                    1.0    1.1    1.4     7.6        106.6
        RandomAccessList         0.8    0.9    3.3     19.6       189.8
        Vector                   0.8    0.9    3.3     19.7       189.1

    31. Trees
    32. Trees: wide variety of applications; binary (balanced or unbalanced); multiway (a.k.a. RoseTree)
    33. Red Black Tree Balancing (diagram: the four unbalanced cases, each rebalanced to the same tree). Source: https://wiki.rice.edu/confluence/download/attachments/2761212/Okasaki-Red-Black.pdf
    34. Talk about reducing complexity!

        type color = Red | Black
        type 'a t = Node of color * 'a * 'a t * 'a t | Leaf

        // Each red-red violation under a black node is rewritten to the same shape.
        let balance = function
            | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
            | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
            | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
            | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
                Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
            | x -> Node x   // already balanced: rebuild unchanged

        Source: http://fsharpnews.blogspot.com/2010/07/f-vs-mathematica-red-black-trees.html
    35. Extra Credit: write the Remove operation for a Red Black Tree. Here’s how: http://en.wikipedia.org/wiki/Red-black_tree#Removal
    36. Fsharpx.Collections.Experimental: IntMap (Map-like structure), BKTree, RoseTree (lazy multiway), EagerRoseTree, IndexedRoseTree. MS.F#.Collections: Map, Set
    37. To Do: benchmark RoseTree (lazy), EagerRoseTree (not yet implemented), IndexedRoseTree, and multiway as unbalanced binary tree (polymorphic recursion)
    38. Another To Do: The (not-so-) Naïve Binary Tree, as seen all over the internet…
    39. Another To Do: The (not-so-) Naïve Binary Tree, as seen all over the internet… …yet often missing pre-order, post-order, and in-order fold traversals (better be tail-recursive). And maybe a zipper navigator while you are at it!
    40. Call for Action! Fsharpx.Collections.Experimental: fork FSharpx on GitHub; implement some interesting structure and tests; sync back to your fork; send a pull request. Out of ideas or just want to practice? Unimplemented Okasaki structures: http://github.com/jackfoxy/DS_Benchmark/tree/master/PurelyFunctionalDataStructures
    41. When not to use purely functional: consider Array if performance is critical; functional dictionary-like structures (Map) may not perform well enough, especially after scale 10^4; consider a .NET dictionary-like object
    42. Publishing your functional DS: FSharpx.Collections.readme.md. Include a Try value returning option for values that can throw an exception; include other common values if < O(n); reason about edge cases (more unit tests are better than not enough)
    43. Build your own structure: leverage Heap as an internal structure to create RandomStack. Demo 3
    44. Closing Thought: The functional data structures further from the “mainstream” (if such a measure were possible) tend to have less inherent value in their generic form. Therefore the ultimate functional data structures collection would combine the characteristics of a library, a snippet collection, a benchmarking tool, superb documentation, test cases, and EXAMPLES!
    45. Resources: FSPowerPack.Core.Community (NuGet); FSharpx.Core (GitHub & NuGet); FSharpx.Collections.Experimental (GitHub & NuGet); DS_Benchmark (GitHub), raw code for structures not yet merged to FSharpx
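Slides 22, 28, and 29 describe Queue as List minus cons plus conj, with O(1) conj, head, and tail “generally”. One classic way to get that behavior is Okasaki's batched queue: a front list in order plus a rear list in reverse, where the rear is reversed into the front only when the front runs out, giving amortized O(1) operations under single-threaded use. This is a sketch of that technique, not necessarily how Fsharpx.Collections implements Queue. `BatchedQueue` and its functions are hypothetical names, and `tryHead`/`tryTail` follow slide 42's advice to return option instead of throwing.

```fsharp
/// Batched queue: Front holds the oldest elements in order,
/// Rear holds the newest elements in reverse order.
type BatchedQueue<'a> = { Front: 'a list; Rear: 'a list }

module BatchedQueue =
    let empty<'a> : BatchedQueue<'a> = { Front = []; Rear = [] }

    /// Restores the invariant that Front is empty only when the whole queue is empty.
    let private check (q: BatchedQueue<'a>) =
        match q.Front with
        | [] -> { Front = List.rev q.Rear; Rear = [] }
        | _ -> q

    /// conj: add at the back.
    let conj (x: 'a) (q: BatchedQueue<'a>) = check { q with Rear = x :: q.Rear }

    let tryHead (q: BatchedQueue<'a>) =
        match q.Front with
        | [] -> None
        | x :: _ -> Some x

    /// Drops the front element; the occasional List.rev is paid for by earlier conj calls.
    let tryTail (q: BatchedQueue<'a>) =
        match q.Front with
        | [] -> None
        | _ :: rest -> Some (check { q with Front = rest })
```

The amortized bound assumes each queue version is used once; repeatedly taking the tail of the same old version can repeat an expensive reversal, which is one reason real implementations may differ.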
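Slide 39 asks for pre-order, post-order, and in-order fold traversals on the naive binary tree, and says they had better be tail-recursive. One common way to get there is continuation-passing style: instead of waiting on the stack for the left subtree to finish, each call passes along a continuation describing what to do next. This is a minimal sketch; `BinTree`, `foldPre`, `foldIn`, `foldPost`, and `bstInsert` are hypothetical names, and stack safety on very deep trees relies on the F# compiler emitting .NET tail calls (on by default in Release builds).

```fsharp
/// Naive (unbalanced) binary tree.
type BinTree<'a> =
    | Empty
    | Node of BinTree<'a> * 'a * BinTree<'a>

/// Pre-order: node, then left, then right.
let foldPre (f: 's -> 'a -> 's) (state: 's) (tree: BinTree<'a>) : 's =
    let rec go t acc (k: 's -> 's) =
        match t with
        | Empty -> k acc
        | Node (l, x, r) -> go l (f acc x) (fun accL -> go r accL k)
    go tree state id

/// In-order: left, then node, then right.
let foldIn (f: 's -> 'a -> 's) (state: 's) (tree: BinTree<'a>) : 's =
    let rec go t acc (k: 's -> 's) =
        match t with
        | Empty -> k acc
        | Node (l, x, r) -> go l acc (fun accL -> go r (f accL x) k)
    go tree state id

/// Post-order: left, then right, then node.
let foldPost (f: 's -> 'a -> 's) (state: 's) (tree: BinTree<'a>) : 's =
    let rec go t acc (k: 's -> 's) =
        match t with
        | Empty -> k acc
        | Node (l, x, r) -> go l acc (fun accL -> go r accL (fun accR -> k (f accR x)))
    go tree state id

/// Naive BST insert, used only to build an example tree (not tail-recursive).
let rec bstInsert x t =
    match t with
    | Empty -> Node (Empty, x, Empty)
    | Node (l, y, r) ->
        if x < y then Node (bstInsert x l, y, r)
        elif x > y then Node (l, y, bstInsert x r)
        else t
```

Every recursive call above is in tail position; the pending work lives in heap-allocated closures instead of stack frames. For example, `foldIn (+) 0 tree` sums the elements, and folding with a cons function then reversing recovers the traversal order.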
