Feb. 5, 2013

Beyond the bread-and-butter singly linked list are dozens of practical functional data structures that can mask complexity, enable composition, and open possibilities in pattern matching. This session focuses on data structures available today in F#, but applies to almost any functional language. It covers what makes these structures functional, when and why to use them, how to choose a structure, time complexity awareness, garbage collection, and practical insights into creating and profiling your own functional data structure. Bibliography at http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography

Jack Fox

Engineer at Tachyus

- Working with Functional Data Structures: Practical F# Application. Jack Fox, @foxyjackfox, jackfoxy.com, acster.com, 2/5/2013
- Bibliography: http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography
- tl;dr
  - Singly-linked list: the fundamental purely functional data structure
  - Time complexity overview
  - Garbage collection and real-world performance
  - Reasons to use Purely Functional Data Structures
  - When not to use Purely Functional Data Structures
  - Choices and shapes
  - Build your own Purely Functional Data Structure
- What is purely functional?
  - Immutable
  - Persistent
  - Thread safe
  - Recursive
  - Incremental
- Theoretical Performance (most common)
  - O(1)
  - O(log* n), practically O(1)
  - O(log log n)
  - O(log n)
  - O(n), linear time
  - O(n^2), gets real bad from here on out
  - O(i): variables other than n require explanation
- Actual Performance
  - Processor architecture (instruction look-ahead, cache, etc.)
  - .NET Garbage Collection
  - O(n) behavior starts for “large enough size”
  - Recursive
  - Benchmarks over different structure sizes (10^2, 10^3, 10^4, 10^5, 10^6): often looks like << O(n) at the small end, usually settles down to O(n), sometimes looks like > O(n)
- List as a recursive structure. [Diagram: the list 4 :: 3 :: 2 :: 1 :: [] with labels Adding Element, Head, Tail, and Empty List ([]).]
- So what the heck would you do with a list? Demo 1
- “Getting” the recursive thing: SICP, a.k.a. Abelson & Sussman, a.k.a. The Wizard Book
- Why no update or remove in List? [Graphics: unattributed, all over the internet]
- Okasaki’s Pseudo-Canonical List Update. [Diagram: walking the list 4 :: 3 :: 2 :: 1 :: [] until the target element is reached: “found it!”]

      let rec loop i updateElem (l:list<'a>) =
          match (i, l) with
          // ran off the end of the list: index out of range
          | _, [] -> raise (System.Exception("subscript"))
          // found the target: replace the head, keep the tail
          | 0, x::xs -> updateElem::xs
          // keep the head and recurse into the tail
          | i', x::xs -> x::(loop (i' - 1) updateElem xs)

- Okasaki’s Pseudo-Canonical List Update (same code as the previous slide): Do you see a problem?
- We could just punt

      let punt i updateElem (l:list<'a>) =
          // copy to an array, mutate in place, copy back to a list
          let a = List.toArray l
          a.[i] <- updateElem
          List.ofArray a

- …or try a Hybrid approach

      let hybrid i updateElem (l:list<'a>) =
          if (i = 0) then updateElem :: (List.tail l)
          else
              // copy the first i elements into an array, back to front
              let rec loop i' (front:'a array) back =
                  match i' with
                  | x when x < 0 -> front, (List.tail back)
                  | x ->
                      Array.set front x (List.head back)
                      loop (x-1) front (List.tail back)
              let front, back = loop (i - 1) (Array.create i updateElem) l
              // cons the saved front elements onto the updated tail
              let rec loop2 i' frontLen (front':'a array) back' =
                  match i' with
                  | x when x > frontLen -> back'
                  | x -> loop2 (x + 1) frontLen front' (front'.[x]::back')
              loop2 0 ((Seq.length front) - 1) front (updateElem::back)
- Time complexity of update options: Pseudo-Canonical O(i); Punt O(n); Hybrid O(i). Place your bets! [Graphics: unattributed, all over the internet]
- Actual Performance: 10k random updates and one-time worst case, by list size (PC = Pseudo-Canonical). Within each size the options are listed fastest first, and “× best” is the multiple of the fastest time. At size 10^2 alone, PC looks perfect! [Graphics: http://www.freebievectors.com/es/material-de-antemano/51738/material-vector-dinamico-estilo-comic-femenino/]

  | Size | 10k random updates | × best | Time | One-time worst case | × best | Time |
  |---|---|---|---|---|---|---|
  | 10^2 | PC | - | 2.9 ms | Punt | - | 0.2 ms |
  | 10^2 | Hybrid | 1.4 | 4.0 | PC | 1.1 | 0.2 |
  | 10^2 | Punt | 1.5 | 4.5 | Hybrid | 4.1 | 0.8 |
  | 10^3 | Hybrid | - | 29.6 | Punt | - | 0.2 |
  | 10^3 | Punt | 1.6 | 47.6 | PC | 1.1 | 0.2 |
  | 10^3 | PC | 1.7 | 50.3 | Hybrid | 4.1 | 0.8 |
  | 10^4 | Hybrid | - | 320.3 | Punt | - | 0.3 |
  | 10^4 | Punt | 1.7 | 534.9 | PC | 1.3 | 0.4 |
  | 10^4 | PC | 2.9 | 920.2 | Hybrid | 3.2 | 0.9 |
  | 10^5 | Hybrid | - | 4.67 sec | Punt | - | 1.0 |
  | 10^5 | Punt | 2.0 | 9.34 | Hybrid | 1.5 | 1.5 |

  At 10^5: PC stack overflow!
- Benchmarking performance
  - Hard to reason about actual performance
  - DS_Benchmark
    - Open source on GitHub
    - Discards outliers
    - Fully isolates code to benchmark
    - Fully documented
    - “How to extend” documented
- Shapes: let your imagination run wild! [Graphics: Larry D. Moore, Attribution-Share Alike 3.0 Unported license. http://commons.wikimedia.org/wiki/File:Playdoh.jpg]
- Binary Random Access List
  - Same Cons, Head, Tail signature
  - Optimized for Lookup and Update, O(log n)
  - …but not for Remove. Why not?
  - Does it with alternate internal structures
- Queue (FIFO). [Diagram: elements 1 :: 2 3 4 ;; 5 with labels Head, Tail, Adding Element, and Empty Queue ([]).]
- Deque (double-ended queue). [Diagram: elements 1 :: 2 3 4 ;; 5 with labels Head, Tail, Init, Last, Adding Element, and Empty Deque ([]).]
- Deque and remove: approximately O(i/2) (where i is the index to the element)
- Heap. [Diagram: a heap with labels Head (1), Tail, Insert Element (::), Empty Heap ([]), and Merge Heaps.] * Names in signature altered from Okasaki’s implementation. [Graphics: http://www.turbosquid.com/3d-models/heap-gravel-max/668104]
- Heap and remove
  - O(1) (if implemented)
  - …but implementation raises issues
    - Deleting before inserting
    - Order of events could nullify deletion before insertion
    - Equal values?
- Canonical Functional Linear Structures
  - Order: by construction, ascending, descending, or random
  - Grow
  - Shrink
  - Peek
- Fsharpx.Collections
  - RandomAccessList = List + iLookup + iUpdate
  - DList = List + conj + append
  - Deque = List + conj + last + initial + rev = initial U tail
  - LazyList = List Lazy
  - Heap = List + sorted + append
  - Queue = List - cons + conj
  - Vector = List - cons - head - tail + conj + last + initial + iLookup + iUpdate = RandomAccessList^-1
- Summary of time complexity performance
  - Vector & Binary Random Access List: O(1) cons-conj / head-last / tail-init; O(log32 n) lookup / update
  - DList: O(1) cons / conj / head / append; O(log n) tail
  - Deque: O(1) cons / head / tail / conj / last / init; O(1) reverse; O(i/2) lookup / update (generally)
  - Heap: O(1) insert / head; O(log n) merge / tail
  - Queue: O(1) conj / head / tail (generally)
- Measured performance (grow by one)

  | Structure | 10^2 | 10^3 | 10^4 | 10^5 | 10^6 |
  |---|---|---|---|---|---|
  | ms.f#.array | 0.8 | 1.8 | 100.9 | 11,771.4 | n/a |
  | ms.f#.array (list) | 0.3 | 1 | 69.5 | n/a | n/a |
  | ms.f#.list | 0.4 | 0.4 | 0.4 | 1.0 | 13.8 |
  | ms.f#.list (list) | 0.7 | 0.7 | 0.9 | 2.3 | 45.3 |
  | Deque (conj) | 0.3 | 0.3 | 0.5 | 4.7 | * |
  | Deque (cons) | 0.3 | 0.3 | 0.5 | 4.7 | * |
  | DList (conj) | 0.7 | 0.7 | 1.0 | 7.7 | 153.0 |
  | DList (cons) | 0.7 | 0.7 | 1.0 | 6.4 | 118.4 |
  | Heap | 3.2 | 3.3 | 5.0 | 22.5 | 254.7 |
  | LazyList | 0.9 | 0.9 | 1.0 | 2.6 | 108.3 |
  | Queue | 1.0 | 1.1 | 1.4 | 7.6 | 106.6 |
  | RandomAccessList | 0.8 | 0.9 | 3.3 | 19.6 | 189.8 |
  | Vector | 0.8 | 0.9 | 3.3 | 19.7 | 189.1 |
- Trees
- Trees
  - Wide variety of applications
  - Binary (balanced or unbalanced)
  - Multiway (a.k.a. RoseTree)
- Red Black Tree Balancing. [Diagram: red-black tree configurations over subtrees a, b, c, d.] Source: https://wiki.rice.edu/confluence/download/attachments/2761212/Okasaki-Red-Black.pdf
- Talk about reducing complexity! Source: http://fsharpnews.blogspot.com/2010/07/f-vs-mathematica-red-black-trees.html

      // the slide omits the color type; it is defined here so the snippet compiles
      type color = Red | Black
      type 'a t = Node of color * 'a * 'a t * 'a t | Leaf
      // all four red-red violations under a black node rewrite to the same balanced shape
      let balance = function
          | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
          | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
          | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
          | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
              Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
          | x -> Node x

- Extra Credit: write the Remove operation for a Red Black Tree. Here’s how: http://en.wikipedia.org/wiki/Red-black_tree#Removal
- Fsharpx.Collections.Experimental
  - IntMap (Map-like structure)
  - BKTree
  - RoseTree (lazy multiway)
  - EagerRoseTree
  - IndexedRoseTree
  - MS.F#.Collections: Map, Set
- To Do: Benchmark: RoseTree (lazy) EagerRoseTree (not yet implemented) IndexedRoseTree Multiway as unbalanced binary tree (polymorphic recursion)
- Another To Do: The (not-so-) Naïve Binary Tree. As seen all over the internet… yet often missing: pre-order, post-order, and in-order fold traversals (better be tail-recursive). And maybe a zipper navigator while you are at it!
- Call for Action! Fsharpx.Collections.Experimental
  - On GitHub, fork FSharpx
  - Implement some interesting structure and tests
  - Sync back to your fork
  - Pull request
  - Out of ideas or just want to practice? Unimplemented Okasaki structures: http://github.com/jackfoxy/DS_Benchmark/tree/master/PurelyFunctionalDataStructures
- When not to use purely functional
  - Consider Array if performance is critical
  - Functional dictionary-like structures (Map) may not perform well enough, especially after scale 10^4
  - Consider a .NET dictionary-like object
- Publishing your functional DS
  - FSharpx.Collections.readme.md
  - Include a Try value returning option for values that can throw an Exception
  - Include other common values if < O(n)
  - Reason about edge cases (more unit tests better than not enough)
- Build your own structure: leverage Heap as internal structure to create RandomStack. Demo 3
- Closing Thought: The functional data structures further from the “mainstream” (if such a measure were possible) tend to have less inherent value in their generic form. Therefore the ultimate functional data structures collection would combine the characteristics of a library, a snippet collection, a benchmarking tool, superb documentation, test cases, and EXAMPLES!
- Resources
  - FSPowerPack.Core.Community (NuGet)
  - FSharpx.Core (GitHub & NuGet)
  - FSharpx.Collections.Experimental (GitHub & NuGet)
  - DS_Benchmark (GitHub): raw code for structures not yet merged to FSharpx
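
The slides observe that the pseudo-canonical list update is not tail recursive and overflowed the stack somewhere between 10^4 and 10^5 elements. One possible way around that, sketched here as an illustration and not taken from the talk, is to carry the visited prefix in an accumulator and cons it back once the index is reached. The names `update` and `remove` and the use of `invalidArg` are assumptions made for this example.

```fsharp
// Hypothetical sketch: tail-recursive update and remove on an F# list.
// The visited prefix is kept reversed in `acc`, so the recursive call is
// in tail position and the stack does not grow with the index.

/// Replace the element at index i. Raises ArgumentException when i is out of range.
let update i newElem (l: 'a list) =
    let rec loop i' acc rest =
        match i', rest with
        | _, [] -> invalidArg "i" "subscript"
        | 0, _ :: xs ->
            // rebuild: cons the reversed prefix back onto the updated tail
            List.fold (fun tail x -> x :: tail) (newElem :: xs) acc
        | _, x :: xs -> loop (i' - 1) (x :: acc) xs
    loop i [] l

/// Same walk, but the target element is dropped instead of replaced.
let remove i (l: 'a list) =
    let rec loop i' acc rest =
        match i', rest with
        | _, [] -> invalidArg "i" "subscript"
        | 0, _ :: xs -> List.fold (fun tail x -> x :: tail) xs acc
        | _, x :: xs -> loop (i' - 1) (x :: acc) xs
    loop i [] l

let updated = update 2 9 [4; 3; 2; 1]   // [4; 3; 9; 1]
let removed = remove 0 [1; 2; 3]        // [2; 3]
```

Both functions are still O(i) and allocate i new cons cells plus a temporary reversed prefix, so this trades the stack depth for extra short-lived garbage. Whether that beats the punt or hybrid options at a given size is something to benchmark, not assume.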

- The big ideas in the presentation
- Immutable is the only requirement for the “definition” of purely functional. Persistence is a side effect of immutability. Immutability and persistence allow for thread safety. Recursive just happens to be the implementation of nearly all purely functional data structures. Incremental is an aspect of recursion and enables efficient GC; structures never require the .NET large object heap.
- Time complexity relates how the time component of a process scales
- You usually only have to reason about a few time complexity cases
- Processor architecture and GC can affect time complexity analysis, especially on repeated operations resulting in a new structure object.
- Singly-linked list is arguably the most pervasive functional data structure (setting aside stream/IEnumerable for the moment). Tail is itself a list. So is the empty list.
- Summary: recursing through a list with an active pattern to format the data. Doing the same with a LazyList takes more time and more garbage collection. But if the active pattern cuts recursion short before covering the whole list, LazyList actually saves resources (time); this is especially useful if calculation or other resources are involved.
- Read first few chapters to see what singly linked lists are all about
- Or in practically any purely functional data structure, for that matter.
- This is how we would like to write List update. Remove is the same algorithm, but losing the target element without replacing it. Recursing like this is akin to operating on a Russian doll.
- Not tail recursive. (see later slides)
- Array is not a functional structure. However, it ends up hidden from the rest of the user code, thus preserving structure immutability. Could end up transitorily using the .NET large object heap. This approach only addresses update, not remove.
- Recursive loop to take the tail of the original list after the update position and build an Array from the front. Cons the updated element to the tail. Recursive loop to cons the front elements. Note both loops are tail recursive. Still could use the .NET large object heap. This approach does work for remove.
- “i” needs explanation, it is index value of element to update
- All the DS stack overflows seem to occur after 10^4 and before 10^5. Also, best time of 10k updates seems to scale perfectly with size until 10^5 (possibly because we crossed over into the large object heap structure?). NOTE: punt is actually quite good for worst case. NOTE 2: worst case does not scale linearly for any of the options, presumably overhead is more expensive than performant code.
- Hard to reason…: for instance DList Append is O(1), but Deque.OfCatLists outperforms it until a scale of appending 100,000 element structures. Pull requests welcome; guidelines for pull requests. Frequently several choices for the operation you want.
- Singly-linked lists are the starting point of functional data structures. Many of the principles of operation remain the same, but changing shapes offer new possibilities. Like the Play-Doh Fun Factory, run your data through data structures to change its shape.
- Solves the lookup and update problem for lists, but not Remove. Creative internal structures required for new functional data structures.
- Adding function is called “conj” (the inverse of cons). Cons operator, conj operator, and empty symbol are not actually available. Empty Queue stands in a different relationship than with List, because there is no real pointer to it. Dashed arrows because “pointing” is not the same as in a singly linked list.
- Last & Init are the complement to Head & Tail
- Either the minimum or maximum element rises to the top of the heap
- Attributes of functional linear structures
- Somewhat complete collection of canonical sequential data structures
- Some minor exceptions
- Array at a disadvantage for this benchmark
- No known correct implementation of remove in F#
- Summary: RandomStack internally implements 2 parts: an IComparable type consisting of a random integer and value, and a Heap of the IComparable items.
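
The RandomStack demo code is not part of this transcript, so the following is only a rough sketch of the idea in the last note: pair each value with a random integer and keep the pairs in a heap, so that popping returns elements in random order. Everything here is an assumption made for illustration: the small pairing heap stands in for the FSharpx Heap, the heap compares the integer key directly instead of wrapping key and value in an IComparable type, and the random generator is passed in by the caller.

```fsharp
// Hypothetical sketch of a RandomStack built on a heap.
// A minimal pairing heap ordered only by an integer key, so the
// payload type 'a needs no comparison constraint.
type PairingHeap<'a> =
    | Empty
    | Node of int * 'a * PairingHeap<'a> list

/// Merge two heaps: the root with the smaller key adopts the other heap.
let merge h1 h2 =
    match h1, h2 with
    | Empty, h | h, Empty -> h
    | Node (k1, v1, cs1), Node (k2, v2, cs2) ->
        if k1 <= k2 then Node (k1, v1, h2 :: cs1)
        else Node (k2, v2, h1 :: cs2)

/// Merge a list of heaps pairwise. Not tail recursive; fine for a sketch,
/// but a production version would need to address that.
let rec mergePairs heaps =
    match heaps with
    | [] -> Empty
    | [h] -> h
    | h1 :: h2 :: rest -> merge (merge h1 h2) (mergePairs rest)

module RandomStack =
    /// Push a value with a random priority drawn from the supplied generator.
    let push (rng: System.Random) value heap =
        merge (Node (rng.Next(), value, [])) heap

    /// Pop whichever element currently holds the smallest random key.
    let tryPop heap =
        match heap with
        | Empty -> None
        | Node (_, value, children) -> Some (value, mergePairs children)

// Usage: push 1..5, then drain the stack.
let rng = System.Random 42
let stack = List.fold (fun h x -> RandomStack.push rng x h) Empty [1 .. 5]

let rec drain heap acc =
    match RandomStack.tryPop heap with
    | None -> List.rev acc
    | Some (v, rest) -> drain rest (v :: acc)

let popped = drain stack []   // some permutation of [1 .. 5]
```

Following the publishing advice in the slides, `tryPop` returns an option instead of throwing on an empty structure. Note that the heap values themselves stay immutable, while `System.Random` is mutable state owned by the caller.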
