Linked Lists With Perl: Why bother?


Linked lists can be useful in Perl for walking multiple lists, managing memory in long-lived tasks, or in threaded applications. This talk describes the basics of singly-linked lists, the basics of the code that makes up LinkedList::Single, and shows some applications of the lists.

Published in: Technology, Education

  1. 1. Linked Lists In Perl
     What? Why bother?
     Steven Lembark, Workhorse
     ©2009, 2013 Steven Lembark
  2. 2. What About Arrays?
     ● Perl arrays are fast and flexible.
     ● Autovivification makes them generally easy to use.
     ● Built-in Perl functions for managing them.
     ● But there are a few limitations:
       ● Memory manglement.
       ● Iterating multiple lists.
       ● Maintaining state within the lists.
       ● Difficult to manage links within the list.
  3. 3. Memory Issues
     ● Pushing a single value onto an array can double its size.
     ● Copying array contents is particularly expensive for long-lived processes which have problems with heap fragmentation.
     ● Heavily forked processes also run into problems with copy-on-write due to whole-array copies.
     ● There is no way to reduce the array structure size once it grows -- also painful for long-lived processes.
  4. 4. Comparing Multiple Lists
     ● for( @array ) will iterate one list at a time.
     ● Indexes work but have their own problems:
       ● A single index requires identical offsets in all of the arrays.
       ● Varying offsets require indexes for each array, with separate bookkeeping for each.
       ● Passing all of the arrays and objects becomes a lot of work, even if they are wrapped in objects, which have their own performance overhead.
       ● Indexes used for external references have to be updated with every shift, push, pop...
  5. 5. Linked Lists: The Other Guys
     ● Perly code for linked lists is simple, and it simplifies both memory management and list iteration.
     ● Trade indexed access for skip-chains and external references.
     ● List operators like push, pop, and splice are simple.
     ● You also get more granular locking in threaded applications.
  6. 6. Examples
     ● Insertion sorts are stable, but splicing an item into the middle of an array is expensive; adding a new node to a linked list is cheap.
     ● Threading can require locking an entire array to update it; linked lists can safely be locked by node and frequently don't even require locking.
     ● Comparing lists of nodes can be simpler than dealing with multiple arrays, especially if the offsets change.
  7. 7. Implementing Perly Linked Lists
     ● Welcome back to arrayrefs.
     ● Arrays can hold any type of data.
     ● Use a "next" ref and arbitrary data.
     ● The list itself requires a static "head" and a node variable to walk down the list.
     ● Doubly-linked lists are also manageable with the addition of weak links.
     ● For the sake of time I'll only discuss singly-linked lists here.
  8. 8. List Structure
     ● Singly-linked lists are usually drawn as some data followed by a pointer to the next link.
     ● In Perl it helps to draw the pointer first, because that is where it is stored.
  9. 9. Adding a Node
     ● New nodes do not need to be contiguous in memory.
     ● Adding one also doesn't require locking the entire list.
  10. 10. Dropping A Node
      ● Dropping a node releases its memory – at least back to Perl.
      ● The only real effect is on the prior node's next ref.
      ● This is the only piece that needs to be locked for threading.
  11. 11. Perl Code
      ● A link followed by data looks like:
          $node = [ $next, @data ]
      ● Walking the list copies the node:
          ( $node, my @data ) = @$node;
      ● Adding a new node recycles the "next" ref:
          $node->[0] = [ $node->[0], @data ]
      ● Removing recycles the next's next:
          ( $node->[0], my @data ) = @{ $node->[0] };
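The four idioms on this slide can be exercised together in a few lines. The following is a minimal sketch, not code from the talk: it assumes a head node whose chain ends in an empty tail node (the layout used on the later slides), and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Head node with an empty tail node as its "next" (assumed layout).
my $head = [ [] ];

# Insert after the head: the new node takes over the head's old "next".
$head->[0] = [ $head->[0], 'b' ];
$head->[0] = [ $head->[0], 'a' ];      # list is now: a, b

# Walk: one list assignment advances the node and extracts its data.
my @seen;
my $node = $head->[0];
while( @$node )                        # the empty tail node ends the walk
{
    ( $node, my @data ) = @$node;
    push @seen, @data;
}

# Remove the first node: the head's "next" becomes the removed node's "next".
( $head->[0], my @removed ) = @{ $head->[0] };

print "seen: @seen; removed: @removed\n";
```

Nothing here indexes into the list: every operation touches only the node in hand and its "next" slot.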
  12. 12. A Reverse-Order List
      ● Just update the head node's next reference.
      ● Fast because it moves the minimum of data.
          $list = [ [] ];
          my $node = $list;
          for my $val ( @_ )
          {
              $node->[0] = [ $node->[0], $val ]
          }
          # list is populated w/ empty tail.
  13. 13. In-order List
      ● Just move the node, looks like a push.
      ● Could be a one-liner, I've shown it here in long form.
          $list = [ [] ];
          my $node = $list->[0];
          for my $val ( @_ )
          {
              @$node = ( [], $val );
              $node = $node->[0];
          }
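To see the difference between the two build orders, the sketch below builds one list each way and reads the values back. It assumes the empty-tail layout from the slides; list_values is a hypothetical helper added only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the talk): collect the data in list order.
sub list_values
{
    my $node = $_[0]->[0];
    my @values;
    while( @$node )
    {
        ( $node, my @data ) = @$node;
        push @values, @data;
    }
    @values
}

# Reverse order: always insert directly after the head.
my $reversed = [ [] ];
$reversed->[0] = [ $reversed->[0], $_ ] for 1 .. 3;

# In order: fill the empty tail, then step onto the new empty tail.
my $in_order = [ [] ];
my $node     = $in_order->[0];
for my $val ( 1 .. 3 )
{
    @$node = ( [], $val );
    $node  = $node->[0];
}

print join( ' ', list_values( $reversed ) ), "\n";   # 3 2 1
print join( ' ', list_values( $in_order ) ), "\n";   # 1 2 3
```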
  14. 14. Viewing The List
      ● Structure is recursive from Perl's point of view.
      ● Uses the one-line version (golf anyone?)
          DB<1> $list = [ [], 'head node' ];
          DB<2> $node = $list->[0];
          DB<3> for( 1 .. 5 ) { ($node) = @$node = ( [], "node-$_" ) }
          DB<14> x $list
          0  ARRAY(0x8390608)                      $list
             0  ARRAY(0x83ee698)                   $list->[0]
                0  ARRAY(0x8411f88)                $list->[0][0]
                   0  ARRAY(0x83907c8)             $list->[0][0][0]
                      0  ARRAY(0x83f9a10)          $list->[0][0][0][0]
                         0  ARRAY(0x83f9a20)       $list->[0][0][0][0][0]
                            0  ARRAY(0x83f9a50)    $list->[0][0][0][0][0][0]
                                 empty array       empty tail node
                            1  'node-5'            $list->[0][0][0][0][0][1]
                         1  'node-4'               $list->[0][0][0][0][1]
                      1  'node-3'                  $list->[0][0][0][1]
                   1  'node-2'                     $list->[0][0][1]
                1  'node-1'                        $list->[0][1]
             1  'head node'                        $list->[1]
  15. 15. Destroying A Linked List
      ● Prior to 5.12, Perl's memory de-allocator is recursive.
      ● Without a DESTROY the lists blow up after 100 nodes when perl blows its stack.
      ● The fix was an iterative destructor:
        ● This is no longer required.
          DESTROY
          {
              my $list = shift;
              $list = $list->[0] while $list;
          }
  16. 16. Simple Linked List Class
      ● Bless an arrayref with the head (placeholder) node and any data for tracking the list.
          sub new
          {
              my $proto = shift;
              bless [ [], @_ ], blessed $proto || $proto
          }
          # iterative < 5.12, else no-op.
          DESTROY {}
  17. 17. Building the list: unshift
      ● One reason for the head node: it provides a place to insert the data nodes after.
      ● The new first node has the old first node's "next" ref and the new data.
          sub unshift
          {
              my $list = shift;
              $list->[0] = [ $list->[0], @_ ];
              $list
          }
  18. 18. Taking one off: shift
      ● This starts directly from the head node also: just replace the head node's next with the first node's next.
          sub shift
          {
              my $list = shift;
              ( $list->[0], my @data ) = @{ $list->[0] };
              wantarray ? @data : \@data
          }
  19. 19. Push Is A Little Harder
      ● One approach is an unshift before the tail.
      ● Another is populating the tail node:
          sub push
          {
              my $list = shift;
              my $node = $list->[0];
              $node = $node->[0] while $node->[0];
              # populate the empty tail node
              @{ $node } = ( [], @_ );
              $list
          }
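Put together, the three methods give a small working class. This is a sketch under assumptions rather than the LinkedList::Single source: the package name My::SimpleList is hypothetical, the methods unpack @_ directly so that the built-in shift is never called inside a package that also defines a shift method, and shift returns nothing once only the empty tail is left.

```perl
use strict;
use warnings;

package My::SimpleList;                 # hypothetical name

sub new
{
    my ( $class, @meta ) = @_;
    bless [ [], @meta ], $class         # head node: empty tail + list metadata
}

sub unshift
{
    my ( $list, @data ) = @_;
    $list->[0] = [ $list->[0], @data ];
    $list
}

sub shift
{
    my ( $list ) = @_;
    my $first = $list->[0];
    @$first or return;                  # only the empty tail is left
    ( $list->[0], my @data ) = @$first;
    wantarray ? @data : \@data
}

sub push
{
    my ( $list, @data ) = @_;
    my $node = $list->[0];
    $node = $node->[0] while @$node;    # stop on the empty tail
    @$node = ( [], @data );             # fill it, hang a fresh tail
    $list
}

package main;

my $list = My::SimpleList->new( 'demo' );
$list->push( 'b' )->push( 'c' )->unshift( 'a' );

my @order;
while( my @data = $list->shift ) { push @order, @data }
print "@order\n";
```

Note that push still walks the whole list to find the tail; unshift and shift touch only the head node.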
  20. 20. The Bane Of Single Links: pop
      ● You need the node-before-the-tail to pop the tail.
      ● By the time you've found the tail it is too late to pop it off.
      ● Storing the node-before-tail takes extra bookkeeping.
      ● The trick is to use two nodes, one trailing the other: when the small roast is burned the big one is just right.
  21. 21.
          sub node_pop
          {
              my $list = shift;
              my $prior = $list->head;
              my $node = $prior->[0];
              while( $node->[0] )
              {
                  $prior = $node;
                  $node = $node->[0];
              }
              ( $prior->[0], my @data ) = @$node;
              wantarray ? @data : \@data
          }
      ● Lexical $prior is more efficient than examining $node->[0][0] at multiple points in the loop.
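The two-node trick can be sketched as a standalone function. This version is an illustration, not the talk's code: pop_last is a hypothetical name, and it assumes the chain ends in an empty tail node, so the loop stops when the node after $node is that tail.

```perl
use strict;
use warnings;

# Hypothetical standalone function; expects a head node whose chain
# ends in an empty tail node.
sub pop_last
{
    my ( $head ) = @_;
    my $prior = $head;
    my $node  = $head->[0];
    @$node or return;                   # nothing but the tail: empty list

    while( @{ $node->[0] } )            # stop when $node is the last data node
    {
        $prior = $node;
        $node  = $node->[0];
    }

    # The prior node inherits the popped node's next, i.e. the empty tail.
    ( $prior->[0], my @data ) = @$node;
    wantarray ? @data : \@data
}

my $head = [ [ [ [ [], 'c' ], 'b' ], 'a' ] ];   # a, b, c, empty tail
my @last = pop_last( $head );
print "@last\n";
```

The walk is still linear in the list length, which is the price of single links; only the bookkeeping is kept to one extra lexical.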
  22. 22. Mixing OO & Procedural Code
      ● Most of what follows could be done entirely with method calls.
      ● Catch: They are slower than directly accessing the nodes.
      ● One advantage to singly-linked lists is that the structure is simple.
      ● Mixing procedural and OO code without getting tangled is easy enough.
      ● This is why I use it for genetics code: the code is fast and simple.
  23. 23. Walking A List
      ● By itself the node has enough state to track location – no separate index required.
      ● Putting the link first allows advancing and extraction of data in one access of the node:
          my $node = $list->[0];
          while( $node )
          {
              ( $node, my %info ) = @$node;
              # process %info...
          }
  24. 24. Comparing Multiple Lists
      ● Same code, just more assignments:
          my $n0 = $list0->[0];
          my $n1 = $list1->[0];
          while( $n0 && $n1 )
          {
              ( $n0, my @data0 ) = @$n0;
              ( $n1, my @data1 ) = @$n1;
              # deal with @data0, @data1...
          }
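As a concrete instance of the parallel walk, the sketch below counts matching positions in two lists of different lengths. The make_list helper and the sample data are assumptions for illustration; the loop tests for the empty tail node rather than an undefined node.

```perl
use strict;
use warnings;

# Hypothetical helper: build an in-order list with an empty tail node.
sub make_list
{
    my $head = [ [] ];
    my $node = $head->[0];
    for my $val ( @_ )
    {
        @$node = ( [], $val );
        $node  = $node->[0];
    }
    $head
}

my $list0 = make_list( qw( a c g t a ) );
my $list1 = make_list( qw( a c c t ) );

my $n0 = $list0->[0];
my $n1 = $list1->[0];
my $matches = 0;

# Stop as soon as either walk reaches its empty tail node.
while( @$n0 && @$n1 )
{
    ( $n0, my @data0 ) = @$n0;
    ( $n1, my @data1 ) = @$n1;
    ++$matches if $data0[0] eq $data1[0];
}

print "matches: $matches\n";
```

When the loop ends, $n0 still points at the unprocessed remainder of the longer list, so the caller can carry on from there without any index arithmetic.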
  25. 25. Syncopated Lists
      ● Adjusting the offsets requires minimum bookkeeping, doesn't affect the parent list.
          while( @$n0 && @$n1 )
          {
              $_ = $_->[0] for $n0, $n1;
              aligned $n0, $n1
              or ( $n0, $n1 ) = realign $n0, $n1
              or last;
              $score += compare $n0, $n1;
          }
  26. 26. Using The Head Node
      ● $head->[0] is the first node, there are a few useful things to add into @head[1...].
        ● Tracking the length or keeping a ref to the tail simplifies push, pop; requires extra bookkeeping.
      ● The head node can also store user-supplied data describing the list.
        ● I use this for tracking length and species names in results of DNA sequences.
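One way to read the tail-tracking idea is sketched below. The head layout [ $first, $tail, $length ] and the name push_tail are assumptions made for this example, not the layout used by the talk's class; the point is only that push no longer has to walk the list.

```perl
use strict;
use warnings;

# Assumed head layout for this sketch: [ $first, $tail, $length ].
my $tail = [];
my $head = [ $tail, $tail, 0 ];

sub push_tail
{
    my ( $h, @data ) = @_;
    my $node = $h->[1];                 # the current empty tail node
    @$node   = ( [], @data );           # fill it and hang a fresh tail
    $h->[1]  = $node->[0];              # bookkeeping: remember the new tail
    ++$h->[2];                          # bookkeeping: track the length
    $h
}

push_tail( $head, $_ ) for qw( a b c );

print "length: $head->[2], first: $head->[0][1]\n";
```

The extra bookkeeping is two assignments per push, and every other operation that can change the tail or the length has to maintain the same two slots.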
  27. 27. Close, but no cigar...
      ● The class shown only works at the head.
      ● Be nice to insert things in the middle without resorting to $node variables.
      ● Or call methods on the internal nodes.
      ● A really useful class would use inside-out data to track the head, for example.
      ● Can't assign $list = $list->[0], however.
        ● Loses the inside-out data.
      ● We need a structure that walks the list without modifying its own refaddr.
  28. 28. The fix: ref-to-arrayref
      ● Scalar refs are the ultimate container struct.
        ● They can reference anything, in this case an array.
        ● $list stays in one place, $$list walks up the list.
      ● "head" or "next" modify $$list to reposition the location.
      ● Saves blessing every node on the list.
      ● Simplifies having a separate class for nodes.
      ● Also helps when resorting to procedural code for speed.
  29. 29. Basics Don't Change Much
          sub new
          {
              my $proto = shift;
              my $head = [ [], @_ ];
              my $list = \$head;
              $headz{ refaddr $list } = $head;
              bless $list, blessed $proto || $proto
          }
          DESTROY # add iteration for < v5.12.
          {
              my $list = shift;
              delete $headz{ refaddr $list };
          }
      ● List updates assign to $$list.
      ● DESTROY cleans up inside-out data.
  30. 30. Walking The List
          sub head
          {
              my $list = shift;
              $$list = $headz{ refaddr $list };
              $list
          }
          sub next
          {
              my $list = shift;
              ( $$list, my @data ) = @$$list
              or return;
              wantarray ? @data : \@data
          }
          $list->head;
          while( my @data = $list->next ) { ... }
  31. 31. Reverse-order revisited:
      ● Unshift isn't much different.
      ● Note that $list is not updated.
          sub unshift
          {
              my $list = shift;
              my $head = $headz{ refaddr $list };
              $head->[0] = [ $head->[0], @_ ];
              $list
          }
          my $list = List::Class->new( ... );
          $list->unshift( $_ ) for @data;
  32. 32. Useful shortcut: add_after
      ● Unlike arrays, adding into the middle of a list is efficient and common.
      ● "push" adds to the tail, need something else.
      ● add_after() puts a node after the current one.
      ● unshift() is really "$list->head->add_after".
      ● Use with "next_node" that ignores the data.
      ● In-order addition (head, middle, or end):
          $list->add_after( $_ )->next for @data;
      ● Helps if next() avoids walking off the list.
  33. 33.
          sub next
          {
              my $list = shift;
              my $next = $$list->[0] or return;
              @$next or return;
              $$list = $next;
              $list
          }
          sub add_after
          {
              my $list = shift;
              my $node = $$list;
              $node->[0] = [ $node->[0], @_ ];
              $list
          }
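The ref-to-arrayref design, the inside-out head, and the add_after/next pair can be assembled into a short runnable class. This is a sketch under assumptions, not the LinkedList::Single source: My::WalkList and the data accessor are hypothetical names, and the methods unpack @_ directly.

```perl
use strict;
use warnings;

package My::WalkList;                   # hypothetical class name

use Scalar::Util qw( refaddr );

my %headz;                              # inside-out: object address => head node

sub new
{
    my ( $class, @meta ) = @_;
    my $head = [ [], @meta ];           # head node points at the empty tail
    my $curr = $head;
    my $list = bless \$curr, $class;    # the object is a ref to the current node
    $headz{ refaddr $list } = $head;
    $list
}

sub head
{
    my ( $list ) = @_;
    $$list = $headz{ refaddr $list };   # reposition; the object itself stays put
    $list
}

sub next
{
    my ( $list ) = @_;
    my $next = $$list->[0] or return;
    @$next or return;                   # refuse to step onto the empty tail
    $$list = $next;
    $list
}

sub add_after
{
    my ( $list, @data ) = @_;
    my $node = $$list;
    $node->[0] = [ $node->[0], @data ];
    $list
}

# Hypothetical accessor: data of the current node, without its "next" ref.
sub data    { my ( undef, @data ) = @{ ${ $_[0] } }; @data }
sub DESTROY { delete $headz{ refaddr $_[0] } }

package main;

my $list = My::WalkList->new;
$list->head;
$list->add_after( $_ )->next for qw( a b c );   # in-order build

my @seen;
$list->head;
push @seen, $list->data while $list->next;
print "@seen\n";
```

Because next returns the object on success and nothing at the end of the list, the same call serves as both the step and the loop condition.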
  34. 34. Off-by-one gotchas
      ● The head node does not have any user data.
      ● Common mistake: $list->head->data.
        ● This gets you the list's data, not the user's.
      ● Fix is to pre-increment the list:
          $list->head->next or last;
          while( @data = $list->next ) {...}
  35. 35. Overloading a list: bool, offset.
      ● while( $list ), if( $list ) would be nice.
      ● Basically, this leaves the list "true" if it has data; false if it is at the tail node.
          use overload
          q{bool} =>
          sub
          {
              my $list = shift;
              !! @$$list
          },
  36. 36. Offsets would be nice also
          q{++} => sub
          {
              my $list = shift;
              my $node = $$list;
              @$node and $$list = $node->[0];
              $list
          },
          q{+} => sub
          {
              my ( $list, $offset ) = $_[2] ? … ;
              my $node = $$list;
              for ( 1 .. $offset )
              {
                  @$node and $node = $node->[0]
                  or last;
              }
              $node
          },
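A cut-down version of the overloading idea is sketched below, covering only bool and "+". The class name My::Cursor is hypothetical, and the sketch relies on the documented behaviour of Perl's overload module that the object always arrives as the first argument of the handler, whichever side of the operator it was written on.

```perl
use strict;
use warnings;

package My::Cursor;                     # hypothetical class name

use overload
    # True while the current node holds data, false on the empty tail.
    q{bool} => sub { !! @{ ${ $_[0] } } },

    # $cursor + $n returns the node $n links ahead; the cursor stays put.
    q{+}    => sub
    {
        my ( $cursor, $offset ) = @_;
        my $node = $$cursor;
        for ( 1 .. $offset )
        {
            @$node or last;             # never step past the empty tail
            $node = $node->[0];
        }
        $node
    };

sub new { my ( $class, $node ) = @_; bless \$node, $class }

package main;

my $head   = [ [ [ [ [], 'c' ], 'b' ], 'a' ] ];   # a, b, c, empty tail
my $cursor = My::Cursor->new( $head->[0] );

my $ahead  = $cursor + 2;               # node holding 'c'
my $beyond = $cursor + 10;              # clamps at the empty tail

print $cursor ? "cursor has data\n" : "cursor is at the tail\n";
print "two ahead: $ahead->[1]\n";
```

Returning a plain node from "+" keeps lookahead cheap: the caller can peek without moving the cursor or creating another object.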
  37. 37. Updating the list becomes trivial
      ● An offset from the list is a node.
      ● That leaves += simply assigning $list + $off.
          q{+=} =>
          sub
          {
              my ( $list, $offset ) = …
              $$list = $list + $offset;
              $list
          };
  38. 38. Backdoor: node operations
      ● Be nice to extract a node without having to creep around inside of the object.
      ● Handing back the node ref saves derived classes from having to unwrap the object.
      ● Also saves having to update the list object's location to peek into the next or head node.
          sub curr_node { ${ $_[0] } }
          sub next_node { ${ $_[0] }->[0] }
          sub root_node { $headz{ refaddr $_[0] } }
          sub head_node { $headz{ refaddr $_[0] }->[0] }
  39. 39. Skip chains: bookmarks for lists
      ● Separate list of interesting nodes.
        ● Alphabetical sort would have skip-chain of first letters or prefixes.
        ● In-list might have ref to next "interesting" node.
      ● Placeholders simplify bookkeeping.
        ● For alphabetic, pre-load A .. Z into the list.
        ● Saves updating the skip chain for inserts prior to the currently referenced node.
  40. 40. Applying Perly Linked Lists
      ● I use them in the W-curve code for comparing DNA sequences.
        ● The comparison has to deal with different sizes, local gaps between the curves.
        ● Comparison requires fast lookahead for the next interesting node on the list.
      ● Nodes and skip chains do the job nicely.
      ● List structure allows efficient node updates without disturbing the object.
  41. 41. W-curve is derived from LL::S
      ● Nodes have three spatial values and a skip-chain initialized after the list is initialized.
          sub initialize
          {
              my ( $wc, $dna ) = @_;
              my $pt = [ 0, 0, 0 ];
              $wc->head->truncate;
              while( my $a = substr $dna, 0, 1, '' )
              {
                  $pt = $wc->next_point( $a, $pt );
                  $wc->add_after( @$pt, '' )->next;
              }
              $wc
          }
  42. 42. Skip-chain looks for "peaks"
      ● The alignment algorithm looks for nearby points ignoring their Z value.
      ● Comparing the sparse list of radii > 0.50 speeds up the alignment.
      ● Skip-chains for each node point to the next node with a large-enough radius.
      ● Building the skip chain uses an "inch worm".
        ● The head walks up to the next useful node.
        ● The tail fills the ones between with a node reference.
  43. 43. Skip Chain: "interesting" nodes.
          sub add_skip
          {
              my $list = shift;
              my $node = $list->head_node;
              my $skip = $node->[0];
              for( 1 .. $list->size )
              {
                  $skip->[1] > $cutoff or next;
                  while( $node != $skip )
                  {
                      $node->[4] = $skip; # replace ''
                      $node = $node->[0]; # next node
                  }
              }
              continue
              {
                  $skip = $skip->[0];
              }
          }
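The inch-worm pass is easier to follow on a small list with a simplified node. The sketch below is an illustration under assumptions, not the W-curve code: nodes are laid out as [ $next, $value, $skip ], the values and the 0.50 cutoff are made-up sample data, and the head node also receives a skip slot so that the chain can be entered from the head.

```perl
use strict;
use warnings;

# Assumed node layout for this sketch: [ $next, $value, $skip ].
my $cutoff = 0.50;

my $head = [ [] ];
my $node = $head->[0];
for my $value ( 0.1, 0.7, 0.2, 0.3, 0.9 )
{
    @$node = ( [], $value, undef );
    $node  = $node->[0];
}

# Inch worm: $lead walks ahead to each node above the cutoff, then
# $trail catches up, pointing every node it passes at $lead.
my $trail = $head;
for( my $lead = $head->[0] ; @$lead ; $lead = $lead->[0] )
{
    $lead->[1] > $cutoff or next;
    while( $trail != $lead )
    {
        $trail->[2] = $lead;
        $trail      = $trail->[0];
    }
}

# Following only the skip refs visits just the "interesting" nodes.
my @peaks;
for( my $peak = $head->[2] ; $peak ; $peak = $peak->[2] )
{
    push @peaks, $peak->[1];
}
print "@peaks\n";
```

Each node is visited once by each pointer, so building the chain is a single linear pass.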
  44. 44. Use nodes to compare lists.
      ● DNA sequences can become offset due to gaps on either sequence.
      ● This prevents using a single index to compare lists stored as arrays.
      ● A linked list can be re-aligned passing only the nodes.
        ● Comparison can be re-started with only the nodes.
      ● Makes for a useful mix of OO and procedural code.
  45. 45.
      ● Re-aligning the lists simply requires assigning the local values.
      ● Updating the node vars does not affect the list locations if compare fails.
          sub compare_lists
          {
              ...
              while( @$node0 && @$node1 )
              {
                  ( $node0, $node1 )
                  = realign_nodes $node0, $node1
                  or last;
                  ( $dist, $node0, $node1 )
                  = compare_aligned $node0, $node1;
                  $score += $dist;
              }
              if( defined $score )
              {
                  $list0->node( $node0 );
                  $list1->node( $node1 );
              }
              # caller gets back unused portion.
              ( $score, $list0, $list1 )
          }
  46. 46.
          sub compare_aligned
          {
              my ( $node0, $node1 ) = @_;
              my $sum = 0;
              my $dist = 0;
              while( @$node0 && @$node1 )
              {
                  $dist = distance( $node0, $node1 )
                  // last;
                  $sum += $dist;
                  $_ = $_->[0] for $node0, $node1;
              }
              ( $sum, $node0, $node1 )
          }
      ● Compare aligned hands back the unused portion.
      ● Caller gets back the nodes to re-align if there is a gap.
  47. 47. Other Uses for Linked Lists
      ● Convenient trees.
        ● Each level of tree is a list.
        ● The data is a list of children.
        ● Balancing trees only updates a couple of next refs.
      ● Arrays work but get expensive.
        ● Need to copy entire sub-lists for modification.
  48. 48. Two-dimensional lists
      ● If the data at each node is a list you get two-dimensional lists.
      ● Four-way linked lists are the guts of spreadsheets.
        ● Inserting a column or row does not require re-allocating the existing list.
        ● Deleting a row or column returns data to the heap.
        ● A multiply-linked array allows immediate jumps to neighboring cells.
      ● Three-dimensional lists add "sheets".
  49. 49. Work Queues
      ● Pushing pairs of nodes onto an array makes for simple queued analysis of the lists.
        ● Adding to the list doesn't invalidate the queue.
      ● Circular lists: the last node points back to the head.
        ● Used for queues where one thread inserts new links, others remove them for processing.
        ● Minimal locking reduces overhead.
        ● New tasks get inserted after the last one.
        ● Worker tasks just keep walking the list from where they last slept.
  50. 50. Construct a circular linked list:
          DB<1> $list = [];
          DB<2> @$list = ( $list, 'head node' );
          DB<3> $node = $list;
          DB<4> for( 1 .. 5 ) { $node->[0] = [ $node->[0], "node $_" ] }
          DB<5> x $list
          0  ARRAY(0xe87d00)
             0  ARRAY(0xe8a3a8)
                0  ARRAY(0xc79758)
                   0  ARRAY(0xe888e0)
                      0  ARRAY(0xea31b0)
                         0  ARRAY(0xea31c8)
                            0  ARRAY(0xe87d00)
                               -> REUSED_ADDRESS
                            1  'node 1'
                         1  'node 2'
                      1  'node 3'
                   1  'node 4'
                1  'node 5'
             1  'head node'
      ● No end, use $node != $list as sentinel value.
      ● weaken( $list->[0] ) if list goes out of scope.
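The sentinel test can be tried outside the debugger. The following is a minimal sketch, not code from the talk: it builds a three-node ring and walks one lap. Breaking the cycle by hand at the end is an alternative chosen for this example in place of the weaken() call mentioned on the slide.

```perl
use strict;
use warnings;

my $list = [];
@$list = ( $list, 'head node' );        # the head's next is the head itself

my $node = $list;
$node->[0] = [ $node->[0], "node $_" ] for 1 .. 3;

# One full lap: the head's address is the sentinel.
my @lap;
for( $node = $list->[0] ; $node != $list ; $node = $node->[0] )
{
    push @lap, $node->[1];
}
print join( ', ', @lap ), "\n";

# Break the cycle so that reference counting can reclaim the nodes.
$list->[0] = undef;
```

Because every insert lands directly after the head, the lap visits the nodes newest first.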
  51. 51. Summary
      ● Linked lists can be quite lazy for a variety of uses in Perl.
      ● Singly-linked lists are simple to implement, efficient for "walking" the list.
      ● Tradeoff for random access:
        ● Memory allocation for large, varying lists.
        ● Simpler comparison of multiple lists.
        ● Skip-chains.
        ● Reduced locking in threads.