Linked Lists With Perl: Why bother?

Linked lists can be useful in Perl for memory management, walking multiple lists, managing memory in long-lived tasks, or in threaded applications. This talk describes the basics of singly-linked lists, the basics of the code that makes up LinkedList::Single, and shows some applications of the lists.

Transcript

  • 1. Linked Lists In Perl: What? Why bother?
    Steven Lembark, Workhorse Computing
    lembark@wrkhors.com
    ©2009,2013 Steven Lembark
  • 2. What About Arrays?
    ● Perl arrays are fast and flexible.
    ● Autovivification makes them generally easy to use.
    ● Built-in Perl functions for managing them.
    ● But there are a few limitations:
      ● Memory management.
      ● Iterating multiple lists.
      ● Maintaining state within the lists.
      ● Difficult to manage links within the list.
  • 3. Memory Issues
    ● Pushing a single value onto an array can double its size.
    ● Copying array contents is particularly expensive for long-lived processes, which have problems with heap fragmentation.
    ● Heavily forked processes also run into problems with copy-on-write due to whole-array copies.
    ● There is no way to reduce the array structure size once it grows, which is also painful for long-lived processes.
  • 4. Comparing Multiple Lists
    ● for( @array ) will iterate one list at a time.
    ● Indexes work but have their own problems:
      ● A single index requires identical offsets in all of the arrays.
      ● Varying offsets require indexes for each array, with separate bookkeeping for each.
      ● Passing all of the arrays and indexes becomes a lot of work, even if they are wrapped in objects, which have their own performance overhead.
      ● Indexes used for external references have to be updated with every shift, push, pop...
  • 5. Linked Lists: The Other Guys
    ● Perly code for linked lists is simple, and it simplifies memory management and list iteration.
    ● Trade indexed access for skip-chains and external references.
    ● List operators like push, pop, and splice are simple.
    ● You also get more granular locking in threaded applications.
  • 6. Examples
    ● Insertion sorts are stable, but splicing an item into the middle of an array is expensive; adding a new node to a linked list is cheap.
    ● Threading can require locking an entire array to update it; linked lists can safely be locked by node and frequently don't even require locking.
    ● Comparing lists of nodes can be simpler than dealing with multiple arrays, especially if the offsets change.
  • 7. Implementing Perly Linked Lists
    ● Welcome back to arrayrefs.
    ● Arrays can hold any type of data.
    ● Use a "next" ref and arbitrary data.
    ● The list itself requires a static "head" and a node variable to walk down the list.
    ● Doubly-linked lists are also manageable with the addition of weak links.
    ● For the sake of time I'll only discuss singly-linked lists here.
  • 8. List Structure
    ● Singly-linked lists are usually drawn as some data followed by a pointer to the next link.
    ● In Perl it helps to draw the pointer first, because that is where it is stored.
  • 9. Adding a Node
    ● New nodes do not need to be contiguous in memory.
    ● Adding a node also doesn't require locking the entire list.
  • 10. Dropping A Node
    ● Dropping a node releases its memory, at least back to Perl.
    ● The only real effect is on the prior node's next ref.
    ● This is the only piece that needs to be locked for threading.
  • 11. Perl Code
    ● A link followed by data looks like:
      $node = [ $next, @data ]
    ● Walking the list copies the node:
      ( $node, my @data ) = @$node;
    ● Adding a new node recycles the "next" ref:
      $node->[0] = [ $node->[0], @data ]
    ● Removing recycles the next's next:
      ( $node->[0], my @data ) = @{ $node->[0] };
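The four one-liners on this slide can be exercised together. The following is a minimal sketch, assuming an empty arrayref marks the tail and the head is a placeholder node; the helper names (list_values, insert_after, remove_after) are made up for illustration and are not part of LinkedList::Single.

```perl
use strict;
use warnings;

# A node is [ $next, @data ]; an empty arrayref marks the tail.
# Hand-built list: head -> a -> b -> c -> (empty tail).
my $head = [ [ [ [ [], 'c' ], 'b' ], 'a' ] ];

# Collect the data of every node, walking from the head placeholder.
sub list_values
{
    my $node = $_[0]->[0];              # skip the head placeholder
    my @found;
    while( $node && @$node )
    {
        ( $node, my @data ) = @$node;   # advance and extract in one step
        push @found, @data;
    }
    @found
}

# Insert a new node after $node by recycling its "next" ref.
sub insert_after
{
    my ( $node, @data ) = @_;
    $node->[0] = [ $node->[0], @data ];
    return;
}

# Remove the node after $node and hand back its data.
sub remove_after
{
    my $node = shift;
    ( $node->[0], my @data ) = @{ $node->[0] };
    @data
}

insert_after( $head->[0], 'a2' );       # new node goes after 'a'
remove_after( $head );                  # drops the first node, 'a'
print join( ' ', list_values( $head ) ), "\n";   # prints: a2 b c
```

Neither update touches any node other than the one whose "next" ref changes, which is the point the slide is making.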
  • 12. A Reverse-Order List
    ● Just update the head node's next reference.
    ● Fast because it moves the minimum of data.
      my $list = [ [] ];
      my $node = $list;    # the head node itself, so the empty tail survives
      for my $val ( @_ )
      {
          $node->[0] = [ $node->[0], $val ]
      }
      # list is populated w/ empty tail.
  • 13. In-order List
    ● Just move the node, looks like a push.
    ● Could be a one-liner, I've shown it here as two operations.
      my $list = [ [] ];
      my $node = $list->[0];
      for my $val ( @_ )
      {
          @$node = ( [], $val );
          $node = $node->[0];
      }
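To compare the two construction orders side by side, here is a small runnable sketch. The sub names (build_reversed, build_in_order, flatten) are hypothetical, and each node is assumed to carry a single value.

```perl
use strict;
use warnings;

# Reverse order: always insert directly after the head placeholder.
sub build_reversed
{
    my $list = [ [] ];                          # head -> empty tail
    $list->[0] = [ $list->[0], $_ ] for @_;
    $list
}

# In order: fill the empty tail, then step onto the new empty tail.
sub build_in_order
{
    my $list = [ [] ];
    my $node = $list->[0];
    for my $val ( @_ )
    {
        @$node = ( [], $val );
        $node  = $node->[0];
    }
    $list
}

# Copy the values out into a plain Perl list for inspection.
sub flatten
{
    my $node = $_[0]->[0];
    my @found;
    while( @$node )
    {
        push @found, $node->[1];
        $node = $node->[0];
    }
    @found
}

print join( ' ', flatten( build_reversed( 1 .. 4 ) ) ), "\n";   # prints: 4 3 2 1
print join( ' ', flatten( build_in_order( 1 .. 4 ) ) ), "\n";   # prints: 1 2 3 4
```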
  • 14. Viewing The List
    ● Structure is recursive from Perl's point of view.
    ● Uses the one-line version (golf anyone?).
      DB<1> $list = [ [], 'head node' ];
      DB<2> $node = $list->[0];
      DB<3> for( 1 .. 5 ) { ($node) = @$node = ( [], "node-$_" ) }
      DB<14> x $list
      0  ARRAY(0x8390608)                       $list
         0  ARRAY(0x83ee698)                    $list->[0]
            0  ARRAY(0x8411f88)                 $list->[0][0]
               0  ARRAY(0x83907c8)              $list->[0][0][0]
                  0  ARRAY(0x83f9a10)           $list->[0][0][0][0]
                     0  ARRAY(0x83f9a20)        $list->[0][0][0][0][0]
                        0  ARRAY(0x83f9a50)     $list->[0][0][0][0][0][0]
                             empty array        empty tail node
                        1  'node-5'             $list->[0][0][0][0][0][1]
                     1  'node-4'                $list->[0][0][0][0][1]
                  1  'node-3'                   $list->[0][0][0][1]
               1  'node-2'                      $list->[0][0][1]
            1  'node-1'                         $list->[0][1]
         1  'head node'                         $list->[1]
  • 15. Destroying A Linked List
    ● Prior to 5.12, Perl's memory de-allocator is recursive.
    ● Without a DESTROY the lists blow up after 100 nodes when perl blows its stack.
    ● The fix was an iterative destructor:
      DESTROY
      {
          my $list = shift;
          $list = $list->[0] while $list;
      }
    ● This is no longer required.
  • 16. Simple Linked List Class
    ● Bless an arrayref with the head (placeholder) node and any data for tracking the list.
      use Scalar::Util qw( blessed );
      sub new
      {
          my $proto = shift;
          bless [ [], @_ ], blessed $proto || $proto
      }
      # iterative < 5.12, else no-op.
      DESTROY {}
  • 17. Building the list: unshift
    ● One reason for the head node: it provides a place to insert the data nodes after.
    ● The new first node has the old first node's "next" ref and the new data.
      sub unshift
      {
          my $list = shift;
          $list->[0] = [ $list->[0], @_ ];
          $list
      }
  • 18. Taking one off: shift
    ● This starts directly from the head node also: just replace the head node's next with the first node's next.
      sub shift
      {
          my $list = shift;
          ( $list->[0], my @data ) = @{ $list->[0] };
          wantarray ? @data : \@data
      }
  • 19. Push Is A Little Harder
    ● One approach is an unshift before the tail.
    ● Another is populating the tail node:
      sub push
      {
          my $list = shift;
          my $node = $list->[0];
          $node = $node->[0] while $node->[0];
          # populate the empty tail node with a new
          # empty tail followed by the data.
          @{ $node } = ( [], @_ );
          $list
      }
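Slides 16 through 19 can be combined into one small class. This is a sketch under stated assumptions rather than the LinkedList::Single source: the package name My::List is invented, the tail is an empty arrayref, and the methods unpack @_ directly so that the package never calls the built-in shift or push that share the method names.

```perl
package My::List;       # hypothetical name, not the CPAN module
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Head placeholder: [ $first_node, @list_metadata ].
sub new
{
    my ( $proto, @meta ) = @_;
    bless [ [], @meta ], blessed( $proto ) || $proto
}

# Insert directly after the head.
sub unshift
{
    my ( $list, @data ) = @_;
    $list->[0] = [ $list->[0], @data ];
    $list
}

# Remove the first node; returns nothing once only the tail is left.
sub shift
{
    my ( $list ) = @_;
    my $first = $list->[0];
    @$first or return;
    ( $list->[0], my @data ) = @$first;
    wantarray ? @data : \@data
}

# Walk to the empty tail and populate it.
sub push
{
    my ( $list, @data ) = @_;
    my $node = $list->[0];
    $node = $node->[0] while @$node;
    @$node = ( [], @data );
    $list
}

package main;

my $list = My::List->new;
$list->push( $_ ) for qw( b c );
$list->unshift( 'a' );

my @seen;
while( my ( $val ) = $list->shift ) { push @seen, $val }
print "@seen\n";        # prints: a b c
```

Guarding shift against the empty tail is an addition for this sketch; the slide version assumes the caller stops before the list runs out.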
  • 20. The Bane Of Single Links: pop
    ● You need the node-before-the-tail to pop the tail.
    ● By the time you've found the tail it is too late to pop it off.
    ● Storing the node-before-tail takes extra bookkeeping.
    ● The trick is to use two nodes, one trailing the other: when the small roast is burned the big one is just right.
  • 21. node_pop
      sub node_pop
      {
          my $list = shift;
          my $prior = $list->head;
          my $node = $prior->[0];
          while( $node->[0] )
          {
              $prior = $node;
              $node = $node->[0];
          }
          ( $prior->[0], my @data ) = @$node;
          wantarray ? @data : \@data
      }
    ● Lexical $prior is more efficient than examining $node->[0][0] at multiple points in the loop.
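The trailing-node trick can be tried on a bare list structure. The sketch below assumes the empty-arrayref tail used on the earlier slides, so the loop tests whether the next node is empty rather than whether the "next" slot is false; pop_last is a made-up name.

```perl
use strict;
use warnings;

# Remove the last data node and return its data.
# $list is the head placeholder: [ $first_node, ... ].
sub pop_last
{
    my $list  = shift;
    my $prior = $list;                  # trails one node behind $node
    my $node  = $prior->[0];
    @$node or return;                   # only the empty tail: nothing to pop

    while( @{ $node->[0] } )            # stop when $node's next is the tail
    {
        $prior = $node;
        $node  = $node->[0];
    }

    # $prior inherits the popped node's "next", which is the empty tail.
    ( $prior->[0], my @data ) = @$node;
    wantarray ? @data : \@data
}

my $list = [ [ [ [ [], 'c' ], 'b' ], 'a' ] ];
my ( $last ) = pop_last( $list );
print "$last\n";                        # prints: c
```

Each pop still walks the whole list, which is why the head node is a natural place to cache a node-before-tail reference when pops are frequent.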
  • 22. Mixing OO & Procedural Code
    ● Most of what follows could be done entirely with method calls.
    ● Catch: they are slower than directly accessing the nodes.
    ● One advantage to singly-linked lists is that the structure is simple.
    ● Mixing procedural and OO code without getting tangled is easy enough.
    ● This is why I use it for genetics code: the code is fast and simple.
  • 23. Walking A List
    ● By itself the node has enough state to track location; no separate index is required.
    ● Putting the link first allows advancing and extraction of data in one access of the node:
      my $node = $list->[0];
      while( $node )
      {
          ( $node, my %info ) = @$node;
          # process %info...
      }
  • 24. Comparing Multiple Lists
    ● Same code, just more assignments:
      my $n0 = $list0->[0];
      my $n1 = $list1->[0];
      while( $n0 && $n1 )
      {
          ( $n0, my @data0 ) = @$n0;
          ( $n1, my @data1 ) = @$n1;
          # deal with @data0, @data1...
      }
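As a concrete instance of walking two lists in lockstep, this sketch counts the positions at which two sequences differ. The helpers make_list and count_mismatches are hypothetical, and each node is assumed to hold one value with an empty arrayref as the tail.

```perl
use strict;
use warnings;

# Hypothetical helper: in-order list terminated by an empty tail node.
sub make_list
{
    my $list = [ [] ];
    my $node = $list->[0];
    for my $val ( @_ ) { @$node = ( [], $val ); $node = $node->[0] }
    $list
}

# Count positions at which the two lists hold different values.
# Stops as soon as either list reaches its tail.
sub count_mismatches
{
    my ( $n0, $n1 ) = map { $_->[0] } @_;
    my $count = 0;
    while( @$n0 && @$n1 )
    {
        ( $n0, my $val0 ) = @$n0;       # advance and extract, list 0
        ( $n1, my $val1 ) = @$n1;       # advance and extract, list 1
        $val0 eq $val1 or ++$count;
    }
    $count
}

my $diff = count_mismatches
(
    make_list( qw( a c g t ) ),
    make_list( qw( a g g t ) ),
);
print "$diff\n";                        # prints: 1
```

No index is shared between the lists, so either node variable can be advanced independently when the sequences need to be re-aligned.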
  • 25. Syncopated Lists
    ● Adjusting the offsets requires minimum bookkeeping and doesn't affect the parent list.
      while( @$n0 && @$n1 )
      {
          $_ = $_->[0] for $n0, $n1;
          aligned $n0, $n1
          or ( $n0, $n1 ) = realign $n0, $n1
          or last;
          $score += compare $n0, $n1;
      }
  • 26. Using The Head Node
    ● $head->[0] is the first node; there are a few useful things to add into @head[1...].
      ● Tracking the length or keeping a ref to the tail simplifies push and pop, but requires extra bookkeeping.
    ● The head node can also store user-supplied data describing the list.
      ● I use this for tracking length and species names in results of DNA sequences.
  • 27. Close, but no cigar...
    ● The class shown only works at the head.
      ● It would be nice to insert things in the middle without resorting to $node variables.
      ● Or call methods on the internal nodes.
    ● A really useful class would use inside-out data to track the head, for example.
      ● Can't assign $list = $list->[0], however.
      ● That loses the inside-out data.
    ● We need a structure that walks the list without modifying its own refaddr.
  • 28. The fix: ref-to-arrayref
    ● Scalar refs are the ultimate container struct.
      ● They can reference anything, in this case an array.
    ● $list stays in one place, $$list walks up the list.
    ● "head" or "next" modify $$list to reposition the location.
    ● Saves blessing every node on the list.
      ● Simplifies having a separate class for nodes.
      ● Also helps when resorting to procedural code for speed.
  • 29. Basics Don't Change Much
      sub new
      {
          my $proto = shift;
          my $head = [ [], @_ ];
          my $list = \$head;    # ref-to-arrayref
          $headz{ refaddr $list } = $head;
          bless $list, blessed $proto || $proto
      }
      DESTROY # add iteration for < v5.12.
      {
          my $list = shift;
          delete $headz{ refaddr $list };
      }
    ● List updates assign to $$list.
    ● DESTROY cleans up inside-out data.
  • 30. Walking The List
      sub head
      {
          my $list = shift;
          $$list = $headz{ refaddr $list };
          $list
      }
      sub next
      {
          my $list = shift;
          ( $$list, my @data ) = @$$list
          or return;
          wantarray ? @data : \@data
      }
      $list->head;
      while( my @data = $list->next ) { ... }
  • 31. Reverse-order revisited:
    ● Unshift isn't much different.
    ● Note that $list is not updated.
      sub unshift
      {
          my $list = shift;
          # use the stored head node; head() returns the object, not a node.
          my $head = $headz{ refaddr $list };
          $head->[0] = [ $head->[0], @_ ];
          $list
      }
      my $list = List::Class->new( ... );
      $list->unshift( $_ ) for @data;
  • 32. Useful shortcut: add_after
    ● Unlike arrays, adding into the middle of a list is efficient and common.
    ● "push" adds to the tail, so something else is needed.
    ● add_after() puts a node after the current one.
    ● unshift() is really "$list->head->add_after".
    ● Use with "next_node" that ignores the data.
    ● In-order addition (head, middle, or end):
      $list->add_after( $_ )->next for @data;
    ● Helps if next() avoids walking off the list.
  • 33. next and add_after
      sub next
      {
          my $list = shift;
          my $next = $$list->[0] or return;
          @$next or return;
          $$list = $next;
          $list
      }
      sub add_after
      {
          my $list = shift;
          my $node = $$list;
          $node->[0] = [ $node->[0], @_ ];
          $list
      }
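The ref-to-arrayref design from the last few slides can be assembled into a runnable sketch. The package name My::Cursor and the data() accessor are assumptions made for this example; %headz, head, next, and add_after follow the slides, with the object kept as a blessed scalar ref whose value is the current node.

```perl
package My::Cursor;     # hypothetical name for this sketch
use strict;
use warnings;
use Scalar::Util qw( blessed refaddr );

my %headz;              # inside-out storage: object address => head node

sub new
{
    my ( $proto, @meta ) = @_;
    my $head = [ [], @meta ];
    # The object is a scalar ref; the scalar holds the current node.
    my $list = bless \( my $node = $head ), blessed( $proto ) || $proto;
    $headz{ refaddr $list } = $head;
    $list
}

sub DESTROY { delete $headz{ refaddr $_[0] } }

# Reposition on the head placeholder.
sub head
{
    my $list = shift;
    $$list = $headz{ refaddr $list };
    $list
}

# Step forward unless that would land on the empty tail.
sub next
{
    my $list = shift;
    my $next = $$list->[0] or return;
    @$next or return;
    $$list = $next;
    $list
}

# Insert a node after the current one; the position is unchanged.
sub add_after
{
    my $list = shift;
    my $node = $$list;
    $node->[0] = [ $node->[0], @_ ];
    $list
}

# Assumed accessor: data stored on the current node.
sub data { my @data = @{ ${ $_[0] } }; shift @data; @data }

package main;

my $list = My::Cursor->new;
$list->head;
$list->add_after( $_ )->next for qw( a b c );   # in-order build

my @seen;
$list->head;
while( $list->next ) { push @seen, $list->data }
print "@seen\n";        # prints: a b c
```

Because next() refuses to step onto the empty tail, the add_after/next chain always leaves the cursor on the last data node, ready for the next insert.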
  • 34. Off-by-one gotchas
    ● The head node does not have any user data.
    ● Common mistake: $list->head->data.
    ● This gets you the list's data, not the user's.
    ● The fix is to pre-increment the list:
      $list->head->next or last;
      while( @data = $list->next ) {...}
  • 35. Overloading a list: bool, offset.
    ● while( $list ), if( $list ) would be nice.
    ● Basically, this leaves the list "true" if it has data; false if it is at the tail node.
      use overload
      q{bool} =>
      sub
      {
          my $list = shift;
          $$list
      },
  • 36. Offsets would be nice also
      q{++} => sub
      {
          my $list = shift;
          my $node = $$list;
          @$node and $$list = $node->[0];
          $list
      },
      q{+} => sub
      {
          my ( $list, $offset ) = $_[2] ? ...
          my $node = $$list;
          for ( 1 .. $offset )
          {
              # low-precedence "and" so that the assignment parses.
              @$node and $node = $node->[0]
              or last;
          }
          $node
      },
  • 37. Updating the list becomes trivial
    ● An offset from the list is a node.
    ● That leaves += simply assigning $list + $off.
      q{+=} =>
      sub
      {
          my ( $list, $offset ) = …
          $$list = $list + $offset;
          $list
      };
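A self-contained sketch of the overloading idea follows. It covers only bool, + and +=, and it deviates from the slides in ways that should be read as assumptions: the package name My::Walker is invented, the constructor takes the values directly, the cursor starts on the first data node, bool tests whether the current node is empty, and the "swapped" argument to + is ignored.

```perl
package My::Walker;     # hypothetical name for this sketch
use strict;
use warnings;

use overload
    # True while the cursor sits on a node with data, false on the tail.
    q{bool} => sub { my $list = shift; scalar @{ $$list } },

    # List plus offset is a node; stops early at the empty tail.
    q{+}    => sub
    {
        my ( $list, $offset ) = @_;     # swapped flag ignored here
        my $node = $$list;
        for ( 1 .. $offset )
        {
            @$node and $node = $node->[0] or last;
        }
        $node
    },

    # Reposition the cursor by assigning the offset node.
    q{+=}   => sub
    {
        my ( $list, $offset ) = @_;
        $$list = $list + $offset;
        $list
    };

# Build an in-order list and park the cursor on its first node.
sub new
{
    my ( $class, @values ) = @_;
    my $head = [ [] ];
    my $node = $head->[0];
    for my $val ( @values ) { @$node = ( [], $val ); $node = $node->[0] }
    my $first = $head->[0];
    bless \$first, $class
}

sub data { ${ $_[0] }->[1] }

package main;

my $list = My::Walker->new( qw( a b c d ) );
my @seen;
while( $list )
{
    push @seen, $list->data;
    $list += 2;                         # skip every other node
}
print "@seen\n";        # prints: a c
```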
  • 38. Backdoor: node operations
    ● It would be nice to extract a node without having to creep around inside of the object.
    ● Handing back the node ref saves derived classes from having to unwrap the object.
    ● Also saves having to update the list object's location to peek into the next or head node.
      sub curr_node { ${ $_[0] } }
      sub next_node { ${ $_[0] }->[0] }
      sub root_node { $headz{ refaddr $_[0] } }
      sub head_node { $headz{ refaddr $_[0] }->[0] }
  • 39. Skip chains: bookmarks for lists
    ● Separate list of interesting nodes.
      ● An alphabetical sort would have a skip-chain of first letters or prefixes.
      ● An in-list chain might have a ref to the next "interesting" node.
    ● Placeholders simplify bookkeeping.
      ● For alphabetic, pre-load A .. Z into the list.
      ● Saves updating the skip chain for inserts prior to the currently referenced node.
  • 40. Applying Perly Linked Lists
    ● I use them in the W-curve code for comparing DNA sequences.
      ● The comparison has to deal with different sizes and local gaps between the curves.
      ● Comparison requires fast lookahead for the next interesting node on the list.
    ● Nodes and skip chains do the job nicely.
    ● List structure allows efficient node updates without disturbing the object.
  • 41. W-curve is derived from LL::S
    ● Nodes have three spatial values and a skip-chain initialized after the list is initialized.
      sub initialize
      {
          my ( $wc, $dna ) = @_;
          my $pt = [ 0, 0, 0 ];
          $wc->head->truncate;
          while( my $a = substr $dna, 0, 1, '' )
          {
              $pt = $wc->next_point( $a, $pt );
              $wc->add_after( @$pt, '' )->next;
          }
          $wc
      }
  • 42. Skip-chain looks for "peaks"
    ● The alignment algorithm looks for nearby points ignoring their Z value.
    ● Comparing the sparse list of radii > 0.50 speeds up the alignment.
    ● Skip-chains for each node point to the next node with a large-enough radius.
    ● Building the skip chain uses an "inch worm".
      ● The head walks up to the next useful node.
      ● The tail fills the ones between with a node reference.
  • 43. Skip Chain: "interesting" nodes.
      sub add_skip
      {
          my $list = shift;
          my $node = $list->head_node;
          my $skip = $node->[0];
          for( 1 .. $list->size )
          {
              $skip->[1] > $cutoff or next;
              while( $node != $skip )
              {
                  $node->[4] = $skip; # replace ''
                  $node = $node->[0]; # next node
              }
          }
          continue
          {
              $skip = $skip->[0];
          }
      }
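The inch-worm construction can be run on plain nodes. In this sketch the node layout [ $next, $radius, $skip ] is an assumption chosen to keep the example short (the W-curve nodes carry three spatial values, with the skip ref in slot 4), and the cutoff is passed as an argument instead of coming from an outer variable.

```perl
use strict;
use warnings;

# Hypothetical helper: nodes are [ $next, $radius, $skip ], with the
# skip slot pre-loaded with '' as a placeholder.
sub make_list
{
    my $list = [ [] ];
    my $node = $list->[0];
    for my $radius ( @_ )
    {
        @$node = ( [], $radius, '' );
        $node  = $node->[0];
    }
    $list
}

# Point every node's skip slot at the next node whose radius exceeds
# $cutoff. $skip is the front of the inch worm, $node is the back.
sub add_skip
{
    my ( $list, $cutoff ) = @_;
    my $node = $list->[0];
    my $skip = $node;
    while( @$skip )
    {
        if( $skip->[1] > $cutoff )
        {
            while( $node != $skip )     # fill everything behind the peak
            {
                $node->[2] = $skip;
                $node = $node->[0];
            }
        }
        $skip = $skip->[0];
    }
    $list
}

my $list = add_skip( make_list( 0.1, 0.7, 0.2, 0.3, 0.9, 0.4 ), 0.5 );

# Follow only the skip chain from the first node.
my $node = $list->[0];
my @peaks;
while( ref( $node = $node->[2] ) ) { push @peaks, $node->[1] }
print "@peaks\n";       # prints: 0.7 0.9
```

Nodes after the last peak keep their '' placeholder, which is what ends the walk along the chain.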
  • 44. Use nodes to compare lists.
    ● DNA sequences can become offset due to gaps on either sequence.
    ● This prevents using a single index to compare lists stored as arrays.
    ● A linked list can be re-aligned passing only the nodes.
    ● Comparison can be re-started with only the nodes.
    ● Makes for a useful mix of OO and procedural code.
  • 45. compare_lists
    ● Re-aligning the lists simply requires assigning the local values.
    ● Updating the node vars does not affect the list locations if compare fails.
      sub compare_lists
      {
          ...
          while( @$node0 && @$node1 )
          {
              ( $node0, $node1 )
              = realign_nodes $node0, $node1
              or last;
              ( $dist, $node0, $node1 )
              = compare_aligned $node0, $node1;
              $score += $dist;
          }
          if( defined $score )
          {
              $list0->node( $node0 );
              $list1->node( $node1 );
          }
          # caller gets back unused portion.
          ( $score, $list0, $list1 )
      }
  • 46. compare_aligned
      sub compare_aligned
      {
          my ( $node0, $node1 ) = @_;
          my $sum = 0;
          my $dist = 0;
          while( @$node0 && @$node1 )
          {
              # parens keep "// last" from binding to $node1.
              $dist = distance( $node0, $node1 )
              // last;
              $sum += $dist;
              $_ = $_->[0] for $node0, $node1;
          }
          ( $sum, $node0, $node1 )
      }
    ● Compare aligned hands back the unused portion.
    ● Caller gets back the nodes to re-align if there is a gap.
  • 47. Other Uses for Linked Lists
    ● Convenient trees.
      ● Each level of tree is a list.
      ● The data is a list of children.
      ● Balancing trees only updates a couple of next refs.
    ● Arrays work but get expensive.
      ● Need to copy entire sub-lists for modification.
  • 48. Two-dimensional lists
    ● If the data at each node is a list you get two-dimensional lists.
    ● Four-way linked lists are the guts of spreadsheets.
    ● Inserting a column or row does not require re-allocating the existing list.
    ● Deleting a row or column returns data to the heap.
    ● A multiply-linked array allows immediate jumps to neighboring cells.
    ● Three-dimensional lists add "sheets".
  • 49. Work Queues
    ● Pushing pairs of nodes onto an array makes for simple queued analysis of the lists.
    ● Adding to the list doesn't invalidate the queue.
    ● In a circular list, the last node points back to the head.
      ● Used for queues where one thread inserts new links and others remove them for processing.
      ● Minimal locking reduces overhead.
    ● New tasks get inserted after the last one.
    ● Worker tasks just keep walking the list from where they last slept.
  • 50. Construct a circular linked list:
      DB<1> $list = [];
      DB<2> @$list = ( $list, 'head node' );
      DB<3> $node = $list;
      DB<4> for( 1 .. 5 ) { $node->[0] = [ $node->[0], "node $_" ] }
      DB<5> x $list
      0  ARRAY(0xe87d00)
         0  ARRAY(0xe8a3a8)
            0  ARRAY(0xc79758)
               0  ARRAY(0xe888e0)
                  0  ARRAY(0xea31b0)
                     0  ARRAY(0xea31c8)
                        0  ARRAY(0xe87d00)
                           -> REUSED_ADDRESS
                        1  'node 1'
                     1  'node 2'
                  1  'node 3'
               1  'node 4'
            1  'node 5'
         1  'head node'
    ● No end, use $node != $list as sentinel value.
    ● weaken( $list->[0] ) if list goes out of scope.
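A short sketch of walking the circular list with the head as the sentinel. The sub name once_around is made up, and the cleanup at the end breaks the cycle by clearing the head's "next" ref, which is a swapped-in alternative to the weaken() call on the slide.

```perl
use strict;
use warnings;

# Circular list: the head's "next" starts out pointing back at the head.
my $list = [];
@$list = ( $list, 'head node' );
$list->[0] = [ $list->[0], "node $_" ] for 1 .. 3;

# Walk once around, using the head itself as the sentinel.
sub once_around
{
    my $head = shift;
    my @found;
    for( my $node = $head->[0]; $node != $head; $node = $node->[0] )
    {
        push @found, $node->[1];
    }
    @found
}

print join( ', ', once_around( $list ) ), "\n";
# prints: node 3, node 2, node 1

# The last node's ref back to the head keeps the whole ring alive.
# Clearing the head's "next" breaks the cycle so the nodes can be freed.
$list->[0] = undef;
```

Each insert lands directly after the head, so the walk sees the most recently added node first.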
  • 51. Summary
    ● Linked lists can be quite lazy for a variety of uses in Perl.
    ● Singly-linked lists are simple to implement and efficient for "walking" the list.
    ● Tradeoff for random access:
      ● Memory allocation for large, varying lists.
      ● Simpler comparison of multiple lists.
      ● Skip-chains.
      ● Reduced locking in threads.