Linked Lists In Perl
What?
Why bother?
Steven Lembark  
Workhorse Computing
lembark@wrkhors.com
©2009,2013 Steven Lembark
What About Arrays?
● Perl arrays are fast and flexible.
● Autovivification makes them generally easy to use.
● Built-in Perl functions for managing them.
● But there are a few limitations:
● Memory manglement.
● Iterating multiple lists.
● Maintaining state within the lists.
● Difficult to manage links within the list.
Memory Issues
● Pushing a single value onto an array can
double its size.
● Copying array contents is particularly expensive
for long-lived processes which have problems
with heap fragmentation.
● Heavily forked processes also run into problems
with copy-on-write due to whole-array copies.
● There is no way to reduce the array structure
size once it grows -- also painful for long-lived
processes.
Comparing Multiple Lists
● for( @array ) will iterate one list at a time.
● Indexes work but have their own problems:
● Single index requires identical offsets in all of the
arrays.
● Varying offsets require indexes for each array,
with separate bookkeeping for each.
● Passing all of the arrays and objects becomes a lot
of work, even if they are wrapped in objects –
which have their own performance overhead.
● Indexes used for external references have to be
updated with every shift, push, pop...
Linked Lists: The Other Guys
● Perly code for linked lists is simple, and it simplifies
memory management and list iteration.
● Trade indexed access for skip-chains and
external references.
● List operators like push, pop, and splice are
simple.
● You also get more granular locking in threaded
applications.
Examples
● Insertion sorts are stable, but splicing an item
into the middle of an array is expensive;
adding a new node to a linked list is cheap.
● Threading can require locking an entire array
to update it; linked lists can safely be locked
by node and frequently don't even require
locking.
● Comparing lists of nodes can be simpler than
dealing with multiple arrays – especially if the
offsets change.
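The first point can be sketched as follows. This is a minimal illustration, assuming the node layout used later in these slides (`[ $next, $value ]`, a placeholder head, and an empty tail node); the name `insert_sorted` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $value into an ascending list.
# Assumed layout: node = [ $next, $value ], head = [ $first ], tail = [].
sub insert_sorted
{
    my ( $head, $value ) = @_;

    my $node = $head;

    # Advance while the next node holds data that sorts at or before $value.
    # Using "<=" keeps equal values in arrival order (stable).
    while( @{ $node->[0] } && $node->[0][1] <= $value )
    {
        $node = $node->[0];
    }

    # Link in the new node: it takes over the old "next" ref.
    # Nothing else in the list is copied or moved.
    $node->[0] = [ $node->[0], $value ];

    return $head;
}

my $head = [ [] ];      # placeholder head pointing at an empty tail

insert_sorted( $head, $_ ) for 5, 1, 4, 2, 3;

# Walk the list to collect the values in order.
my @sorted;
my $node = $head->[0];
while( @$node )
{
    push @sorted, $node->[1];
    $node = $node->[0];
}

print "@sorted\n";      # 1 2 3 4 5
```

Finding the insertion point is still a linear walk; the saving is that the insert itself touches one reference.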
Implementing Perly Linked Lists
● Welcome back to arrayrefs.
● Arrays can hold any type of data.
● Use a “next” ref and arbitrary data.
● The list itself requires a static “head” and a
node variable to walk down the list.
● Doubly-linked lists are also manageable with
the addition of weak links.
● For the sake of time I'll only discuss singly-
linked lists here.
List Structure
● Singly-linked lists are usually drawn as some data
followed by a pointer to the next link.
● In Perl it helps to draw the pointer first, because
that is where it is stored.
Adding a Node
● New nodes do not
need to be
contiguous in
memory.
● Also doesn't require
locking the entire
list.
Dropping A Node
● Dropping a node releases
its memory – at least back
to Perl.
● Only real effect is on the
prior node's next ref.
● This is the only piece that
needs to be locked for
threading.
Perl Code
● A link followed by data looks like:
$node = [ $next, @data ]
● Walking the list copies the node:
( $node, my @data ) = @$node;
● Adding a new node recycles the “next” ref:
$node->[0] = [ $node->[0], @data ]
● Removing recycles the next's next:
($node->[0], my @data) = @{$node->[0]};
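Put together, the four one-liners above might be exercised like this. It is a small sketch with hand-built nodes; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Build two data nodes by hand: [ $next, @data ], ending in an empty tail.
my $tail  = [];
my $third = [ $tail,  'c' ];
my $first = [ $third, 'a' ];

# Add a node after $first: the new node recycles $first's "next" ref.
$first->[0] = [ $first->[0], 'b' ];

# Walk: one list assignment advances $node and extracts the data.
my @seen;
my $node = $first;
while( @$node )
{
    ( $node, my @data ) = @$node;
    push @seen, @data;
}
print "walked:  @seen\n";           # walked:  a b c

# Remove the node after $first: its "next" replaces $first's "next".
( $first->[0], my @removed ) = @{ $first->[0] };
print "removed: @removed\n";        # removed: b
print "next is: $first->[0][1]\n";  # next is: c
```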
A Reverse-Order List
● Just update the head node's next reference.
● Fast because it moves the minimum of data.
my $list = [ [] ];
my $node = $list->[0];
for my $val ( @_ )
{
$node->[0] = [ $node->[0], $val ]
}
# list is populated w/ empty tail.
In-order List
● Just move the node, looks like a push.
● Could be a one-liner, I've shown it here as two
operations.
my $list = [ [] ];
my $node = $list->[0];
for my $val ( @_ )
{
@$node = ( [], $val );
$node = $node->[0];
}
Viewing The List
● Structure is recursive from Perl's point of view.
● Uses the one-line version (golf, anyone?).
DB<1> $list = [ [], 'head node' ];
DB<2> $node = $list->[0];
DB<3> for( 1 .. 5 ) { ($node) = @$node = ( [], "node-$_" ) }
DB<14> x $list
0  ARRAY(0x8390608)                       $list
   0  ARRAY(0x83ee698)                    $list->[0]
      0  ARRAY(0x8411f88)                 $list->[0][0]
         0  ARRAY(0x83907c8)              $list->[0][0][0]
            0  ARRAY(0x83f9a10)           $list->[0][0][0][0]
               0  ARRAY(0x83f9a20)        $list->[0][0][0][0][0]
                  0  ARRAY(0x83f9a50)     $list->[0][0][0][0][0][0]
                       empty array        empty tail node
                  1  'node-5'             $list->[0][0][0][0][0][1]
               1  'node-4'                $list->[0][0][0][0][1]
            1  'node-3'                   $list->[0][0][0][1]
         1  'node-2'                      $list->[0][0][1]
      1  'node-1'                         $list->[0][1]
   1  'head node'                         $list->[1]
Destroying A Linked List
● Prior to 5.12, Perl's memory de-allocator was
recursive.
● Without a DESTROY the lists blow up after 100
nodes when perl blows its stack.
● This is no longer required.
● The fix was an iterative destructor:
DESTROY
{
my $list = shift;
$list = $list->[0] while $list;
}
Simple Linked List Class
● Bless an arrayref with the head (placeholder)
node and any data for tracking the list.
sub new
{
my $proto = shift;
bless [ [], @_ ], blessed $proto || $proto
}
# iterative < 5.12, else no-op.
DESTROY {}
Building the list: unshift
● One reason for the head node: it provides a
place to insert the data nodes after.
● The new first node has the old first node's
“next” ref and the new data.
sub unshift
{
my $list = shift;
$list->[0] = [ $list->[0], @_ ];
$list
}
Taking one off: shift
● This starts directly from the head node also:
just replace the head node's next with the first
node's next.
sub shift
{
my $list = shift;
( $list->[0], my @data )
= @{ $list->[0] };
wantarray ? @data : \@data
}
Push Is A Little Harder
● One approach is an unshift before the tail.
● Another is populating the tail node:
sub push
{
my $list = shift;
my $node = $list->[0];
$node = $node->[0] while $node->[0];
# populate the empty tail node
@{ $node } = ( [], @_ );
$list
}
The Bane Of Single Links: pop
● You need the node-before-the-tail to pop the
tail.
● By the time you've found the tail it is too late
to pop it off.
● Storing the node-before-tail takes extra
bookkeeping.
● The trick is to use two nodes, one trailing the
other: when the small roast is burned the big
one is just right.
sub node_pop
{
my $list = shift;
my $prior = $list->head;
my $node = $prior->[0];
while( $node->[0] )
{
$prior = $node;
$node = $node->[0];
}
( $prior->[0], my @data ) = @$node;
wantarray ? @data : \@data
}
● Lexical $prior is more efficient than examining
$node->[0][0] at multiple points in the loop.
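The four head-and-tail operations can be collected into one runnable sketch. The package name `My::List` is hypothetical, and the sketch fixes one convention throughout: the object is the head node and the list always ends in an empty tail node, so `pop` stops on the last data node rather than on the tail.

```perl
use strict;
use warnings;

package My::List;   # hypothetical name, not the author's module

# Object is the head node: [ $first_node ]; the list ends in an empty node.
sub new { my $class = $_[0]; return bless [ [] ], $class }

sub unshift
{
    my ( $list, @data ) = @_;
    $list->[0] = [ $list->[0], @data ];
    return $list;
}

sub shift
{
    my ( $list ) = @_;
    @{ $list->[0] } or return;              # nothing but the empty tail
    ( $list->[0], my @data ) = @{ $list->[0] };
    return wantarray ? @data : \@data;
}

sub push
{
    my ( $list, @data ) = @_;
    my $node = $list->[0];
    $node = $node->[0] while @$node;        # stop on the empty tail
    @$node = ( [], @data );                 # fill it, append a new tail
    return $list;
}

sub pop
{
    my ( $list ) = @_;
    my $prior = $list;                      # trails $node by one link
    my $node  = $list->[0];
    @$node or return;                       # empty list
    while( @{ $node->[0] } )                # until $node is the last data node
    {
        $prior = $node;
        $node  = $node->[0];
    }
    ( $prior->[0], my @data ) = @$node;     # prior now points at the tail
    return wantarray ? @data : \@data;
}

package main;

my $list = My::List->new;
$list->push( $_ ) for 2, 3;
$list->unshift( 1 );

my ( $last )  = $list->pop;
my ( $first ) = $list->shift;
my ( $left )  = $list->shift;
print "$first $left $last\n";   # 1 2 3
```

The methods take their arguments with `my ( ... ) = @_` rather than the builtin `shift`, which avoids ambiguity with the same-named methods declared in the package.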
Mixing OO & Procedural Code
● Most of what follows could be done entirely
with method calls.
● Catch: They are slower than directly accessing
the nodes.
● One advantage to singly-linked lists is that the
structure is simple.
● Mixing procedural and OO code without getting
tangled is easy enough.
● This is why I use it for genetics code: the code is
fast and simple.
Walking A List
● By itself the node has enough state to track
location – no separate index required.
● Putting the link first allows advancing and
extraction of data in one access of the node:
my $node = $list->[0];
while( $node )
{
( $node, my %info ) = @$node;
# process %info...
}
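As a runnable variation on the loop above, the sketch below builds a short in-order list whose node data is a flat key/value list and walks it. The record contents are made up for illustration, and the loop tests `@$node` so the empty tail is not processed as data.

```perl
use strict;
use warnings;

# Build an in-order list whose node data is a flat key => value list.
my $list = [ [] ];
my $node = $list->[0];

for my $rec ( [ name => 'alpha', len => 5 ], [ name => 'beta', len => 4 ] )
{
    @$node = ( [], @$rec );     # fill the empty tail, append a new tail
    $node  = $node->[0];
}

# Walk it: one access advances $node and unpacks the data into a hash.
my @names;
$node = $list->[0];
while( @$node )
{
    ( $node, my %info ) = @$node;
    push @names, "$info{name}:$info{len}";
}
print "@names\n";   # alpha:5 beta:4
```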
Comparing Multiple Lists
● Same code, just more assignments:
my $n0 = $list0->[0];
my $n1 = $list1->[0];
while( $n0 && $n1 )
{
( $n0, my @data0 ) = @$n0;
( $n1, my @data1 ) = @$n1;
# deal with @data0, @data1
...
}
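A self-contained version of the lockstep walk might look like this. The helper `build_list` and the sample values are assumptions for the sketch; the loop stops as soon as either list reaches its empty tail.

```perl
use strict;
use warnings;

# Hypothetical helper: build an in-order list from a list of values.
sub build_list
{
    my $list = [ [] ];
    my $node = $list->[0];
    for my $val ( @_ )
    {
        @$node = ( [], $val );
        $node  = $node->[0];
    }
    return $list;
}

my $list0 = build_list( qw( A C G T A ) );
my $list1 = build_list( qw( A C T T ) );

my $n0 = $list0->[0];
my $n1 = $list1->[0];
my $matches = 0;

# No index bookkeeping: each node variable carries its own position.
while( @$n0 && @$n1 )
{
    ( $n0, my $base0 ) = @$n0;
    ( $n1, my $base1 ) = @$n1;
    $base0 eq $base1 and ++$matches;
}
print "matches: $matches\n";    # matches: 3
```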
Syncopated Lists
● Adjusting the offsets requires minimum
bookkeeping, doesn't affect the parent list.
while( @$n0 && @$n1 )
{
$_ = $_->[0] for $n0, $n1;
aligned $n0, $n1
or ( $n0, $n1 ) = realign $n0, $n1
or last;
$score += compare $n0, $n1;
}
Using The Head Node
● $head->[0] is the first node; there are a few
useful things to add into @head[1...].
● Tracking the length or keeping a ref to the tail
simplifies push and pop, but requires extra bookkeeping.
● The head node can also store user-supplied
data describing the list.
● I use this for tracking length and species names in
results of DNA sequences.
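One way the head node could carry this bookkeeping is sketched below. The head layout `[ $first, $tail, $length, @user_data ]` and the helper names are assumptions for illustration, not the layout used in the author's code.

```perl
use strict;
use warnings;

# Assumed head layout (hypothetical): [ $first, $tail, $length, @user_data ].
sub new_list
{
    my $tail = [];
    return [ $tail, $tail, 0, @_ ];
}

# With a tail ref stored in the head, push no longer walks the list.
sub push_node
{
    my ( $head, @data ) = @_;
    my $tail = $head->[1];
    @$tail = ( [], @data );     # fill the old tail, append a new empty one
    $head->[1] = $tail->[0];    # bookkeeping: remember the new tail
    ++$head->[2];               # bookkeeping: length
    return $head;
}

my $head = new_list( species => 'E. coli' );    # sample user data
push_node( $head, $_ ) for qw( A C G T );

my @meta = @{ $head }[ 3 .. $#$head ];

print "length: $head->[2]\n";       # length: 4
print "first:  $head->[0][1]\n";    # first:  A
print "meta:   @meta\n";            # meta:   species E. coli
```

The cost is that every operation that changes the list has to keep the extra slots current.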
Close, but no cigar...
● The class shown only works at the head.
● Be nice to insert things in the middle without
resorting to $node variables.
● Or call methods on the internal nodes.
● A really useful class would use inside-out data
to track the head, for example.
● Can't assign $list = $list->[0], however.
● Loses the inside-out data.
● We need a structure that walks the list without
modifying its own refaddr.
The fix: ref-to-arrayref
● Scalar refs are the ultimate container struct.
● They can reference anything, in this case an array.
● $list stays in one place, $$list walks up the list.
● “head” or “next” modify $$list to reposition
the location.
● Saves blessing every node on the list.
● Simplifies having a separate class for nodes.
● Also helps when resorting to procedural code for
speed.
Basics Don't Change Much
sub new
{
my $proto = shift;
my $head = [ [], @_ ];
my $list = \$head;
$headz{ refaddr $list } = $head;
bless $list, blessed $proto || $proto
}
DESTROY # add iteration for < v5.12.
{
my $list = shift;
delete $headz{ refaddr $list };
}
● List updates assign to $$list.
● DESTROY cleans up inside-out data.
Walking The List
sub head
{
my $list = shift;
$$list = $headz{ refaddr $list };
$list
}
sub next
{
my $list = shift;
( $$list, my @data ) = @$$list
or return;
wantarray ? @data : \@data
}
$list->head;
while( my @data = $list->next ) { ... }
Reverse-order revisited:
● Unshift isn't much different.
● Note that $list is not updated.
sub unshift
{
my $list = shift;
my $head = $headz{ refaddr $list }; # root node; $$list is left alone
$head->[0] = [ $head->[0], @_ ];
$list
}
my $list = List::Class->new( ... );
$list->unshift( $_ ) for @data;
Useful shortcut: 'add_after'
● Unlike arrays, adding into the middle of a list is
efficient and common.
● “push” adds to the tail, need something else.
● add_after() puts a node after the current one.
● unshift() is really “$list->head->add_after”.
● Use with “next_node” that ignores the data.
● In-order addition (head, middle, or end):
$list->add_after( $_ )->next for @data;
● Helps if next() avoids walking off the list.
sub next
{
my $list = shift;
my $next = $$list->[0] or return;
@$next or return;
$$list = $next;
$list
}
sub add_after
{
my $list = shift;
my $node = $$list;
$node->[0] = [ $node->[0], @_ ];
$list
}
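The pieces above can be assembled into a small runnable class to show in-order building and a mid-list insert. The package name `My::List2` is hypothetical, and the final loop reads `$$list` directly, which is a shortcut for the sketch rather than part of any published interface.

```perl
use strict;
use warnings;

package My::List2;      # hypothetical class name

use Scalar::Util qw( refaddr );

my %headz;              # inside-out storage: refaddr => head node

sub new
{
    my $class = shift;
    my $head  = [ [], @_ ];
    my $list  = bless \$head, $class;   # $$list is the current node
    $headz{ refaddr $list } = $head;
    return $list;
}

sub head { my $list = shift; $$list = $headz{ refaddr $list }; $list }

sub next
{
    my $list = shift;
    my $next = $$list->[0]  or return;
    @$next                  or return;  # do not step onto the empty tail
    $$list = $next;
    return $list;
}

sub add_after
{
    my $list = shift;
    my $node = $$list;
    $node->[0] = [ $node->[0], @_ ];
    return $list;
}

sub DESTROY { delete $headz{ refaddr $_[0] } }

package main;

my $list = My::List2->new;

# In-order build: add after the current node, then step onto the new node.
$list->add_after( $_ )->next for qw( a b d );

# Mid-list insert: reposition on 'b', then add 'c' after it.
$list->head->next->next->add_after( 'c' );

# Collect the values by walking from the head.
my @vals;
$list->head;
while( $list->next ) { push @vals, $$list->[1] }

print "@vals\n";    # a b c d
```

Because `next` refuses to step onto the empty tail, the chained `add_after(...)->next` always leaves the object on a real node.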
Off-by-one gotchas
● The head node does not have any user data.
● Common mistake: $list->head->data.
● This gets you the list's data, not users'.
● Fix is to pre-increment the list:
$list->head->next or last;
while( @data = $list->next ) {...}
Overloading a list: bool, offset.
● while( $list ), if( $list ) would be nice.
● Basically, this leaves the list “true” if it has
data; false if it is at the tail node.
use overload
q{bool} =>
sub
{
my $list = shift;
$$list
},
Offsets would be nice also
q{++} => sub
{
my $list = shift;
my $node = $$list;
@$node and $$list= $node->[0];
$list
},
q{+} => sub
{
my ( $list, $offset ) = $_[2] ? ...
my $node = $$list;
for ( 1 .. $offset )
{
@$node and $node = $node->[0]
or last;
}
$node
},
Updating the list becomes trivial
● An offset from the list is a node.
● That leaves += simply assigning $list + $off.
q{+=} =>
sub
{
my ( $list, $offset ) = …
$$list = $list + $offset;
$list
};
Backdoor: node operations
● Be nice to extract a node without having to
creep around inside of the object.
● Handing back the node ref saves derived
classes from having to unwrap the object.
● Also saves having to update the list object's
location to peek into the next or head node.
sub curr_node { ${ $_[0] } }
sub next_node { ${ $_[0] }->[0] }
sub root_node { $headz{ refaddr $_[0] } }
sub head_node { $headz{ refaddr $_[0] }->[0] }
Skip chains: bookmarks for lists
● Separate list of 'interesting' nodes.
● Alphabetical sort would have skip-chain of first
letters or prefixes.
● In-list might have ref to next “interesting” node.
● Placeholders simplify bookkeeping.
● For alphabetic, pre-load 'A' .. 'Z' into the list.
● Saves updating the skip chain for inserts prior to
the currently referenced node.
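The alphabetic case might be sketched like this: placeholder nodes for 'A' .. 'Z' are loaded first and a hash serves as the skip chain. The third node field used to flag placeholders, the `%skip` hash, and the helper names are all assumptions for the sketch.

```perl
use strict;
use warnings;

# Pre-load one placeholder node per letter; %skip maps letter => node.
# Assumed layout: placeholder = [ $next, $letter, 1 ], word = [ $next, $word ].
my $head = [ [] ];
my %skip;
{
    my $node = $head->[0];
    for my $letter ( 'A' .. 'Z' )
    {
        @$node = ( [], $letter, 1 );
        $skip{ $letter } = $node;
        $node = $node->[0];
    }
}

# Hypothetical helper: sorted insert that starts from the placeholder
# instead of walking from the head of the list.
sub insert_word
{
    my $word = shift;
    my $node = $skip{ uc substr $word, 0, 1 }
        or die "No placeholder for '$word'";

    $node = $node->[0]
        while @{ $node->[0] } && lc( $node->[0][1] ) lt lc( $word );

    $node->[0] = [ $node->[0], $word ];
    return;
}

# Hypothetical helper: list the words filed under one letter.
sub words_under
{
    my $letter = shift;
    my $node   = $skip{ $letter }->[0];
    my @words;
    while( @$node && ! $node->[2] )     # stop at the next placeholder or tail
    {
        push @words, $node->[1];
        $node = $node->[0];
    }
    return @words;
}

insert_word( $_ ) for qw( banana axe apple cherry );

my @a_words = words_under( 'A' );
my @b_words = words_under( 'B' );
print "A: @a_words\n";  # A: apple axe
print "B: @b_words\n";  # B: banana
```

Since the placeholders never move, inserts never require the skip chain itself to be updated.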
Applying Perly Linked Lists
● I use them in the W-curve code for comparing
DNA sequences.
● The comparison has to deal with different sizes,
local gaps between the curves.
● Comparison requires fast lookahead for the next
'interesting' node on the list.
● Nodes and skip chains do the job nicely.
● List structure allows efficient node updates
without disturbing the object.
W-curve is derived from LL::S
● Nodes have three spatial values and a skip-chain
initialized after the list is initialized.
sub initialize
{
my ( $wc, $dna ) = @_;
my $pt = [ 0, 0, 0 ];
$wc->head->truncate;
while( my $a = substr $dna, 0, 1, '' )
{
$pt = $wc->next_point( $a, $pt );
$wc->add_after( @$pt, '' )->next;
}
$wc
}
Skip-chain looks for “peaks”
● The alignment algorithm looks for nearby
points ignoring their Z value.
● Comparing the sparse list of radii > 0.50 speeds
up the alignment.
● Skip-chains for each node point to the next node
with a large-enough radius.
● Building the skip chain uses an “inch worm”.
● The head walks up to the next useful node.
● The tail fills the ones in between with a node reference.
Skip Chain: “interesting” nodes.
sub add_skip
{
my $list = shift;
my $node = $list->head_node;
my $skip = $node->[0];
for( 1 .. $list->size )
{
$skip->[1] > $cutoff or next;
while( $node != $skip )
{
$node->[4] = $skip; # replace “”
$node = $node->[0]; # next node
}
}
continue
{
$skip = $skip->[0];
}
}
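A standalone sketch of the inch worm follows. It assumes a simplified node layout of `[ $next, $radius, $skip ]` rather than the five-element nodes above, takes the cutoff as an argument, and uses made-up radii.

```perl
use strict;
use warnings;

# Assumed node layout for this sketch: [ $next, $radius, $skip ].
sub build_list
{
    my $head = [ [] ];
    my $node = $head->[0];
    for my $radius ( @_ )
    {
        @$node = ( [], $radius, '' );
        $node  = $node->[0];
    }
    return $head;
}

# Inch worm: $skip runs ahead to the next node above the cutoff, then
# $node catches up, pointing every node it passes at $skip.
sub add_skip
{
    my ( $head, $cutoff ) = @_;
    my $node = $head->[0];
    my $skip = $node;

    while( @$skip )
    {
        if( $skip->[1] > $cutoff )
        {
            while( $node != $skip )
            {
                $node->[2] = $skip;     # replace ''
                $node      = $node->[0];
            }
        }
        $skip = $skip->[0];
    }
    return $head;
}

my $head = build_list( 0.1, 0.2, 0.7, 0.3, 0.9, 0.4 );
add_skip( $head, 0.5 );

# Follow the skip chain from the first node: only the peaks are visited.
# (The first node itself is not reported, even if it is above the cutoff.)
my @peaks;
my $node = $head->[0][2];
while( ref $node )
{
    push @peaks, $node->[1];
    $node = $node->[2];
}
print "@peaks\n";   # 0.7 0.9
```

Nodes after the last peak keep the empty-string placeholder, which is what ends the walk.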
Use nodes to compare lists.
● DNA sequences can become offset due to gaps
on either sequence.
● This prevents using a single index to compare
lists stored as arrays.
● A linked list can be re-aligned passing only the
nodes.
● Comparison can be re-started with only the nodes.
● Makes for a useful mix of OO and procedural
code.
● Re-aligning the lists simply requires assigning the
local values.
● Updating the node vars does not affect the list
locations if compare fails.
sub compare_lists
{
...
while( @$node0 && @$node1 )
{
( $node0, $node1 )
= realign_nodes $node0, $node1
or last;
( $dist, $node0, $node1 )
= compare_aligned $node0, $node1;
$score += $dist;
}
if( defined $score )
{
$list0->node( $node0 );
$list1->node( $node1 );
}
# caller gets back unused portion.
( $score, $list0, $list1 )
}
sub compare_aligned
{
my ( $node0, $node1 ) = @_;
my $sum = 0;
my $dist = 0;
while( @$node0 && @$node1 )
{
$dist = distance( $node0, $node1 )
// last;
$sum += $dist;
$_ = $_->[0] for $node0, $node1;
}
( $sum, $node0, $node1 )
}
● Compare aligned hands back the unused portion.
● Caller gets back the nodes to re-align if there is
a gap.
Other Uses for Linked Lists
● Convenient trees.
● Each level of tree is a list.
● The data is a list of children.
● Balancing trees only updates a couple of next refs.
● Arrays work but get expensive.
● Need to copy entire sub-lists for modification.
Two-dimensional lists
● If the data at each node is a list you get two-
dimensional lists.
● Four-way linked lists are the guts of
spreadsheets.
● Inserting a column or row does not require re-
allocating the existing list.
● Deleting a row or column returns data to the heap.
● A multiply-linked array allows immediate jumps
to neighboring cells.
● Three-dimensional lists add “sheets”.
Work Queues
● Pushing pairs of nodes onto an array makes for
simple queued analysis of the lists.
● Adding to the list doesn't invalidate the queue.
● Circular lists' last node points back to the head.
● Used for queues where one thread inserts new
links, others remove them for processing.
● Minimal locking reduces overhead.
● New tasks get inserted after the last one.
● Worker tasks just keep walking the list from
where they last slept.
Construct a circular linked list:
DB<1> $list = [];
DB<2> @$list = ( $list, 'head node' );
DB<3> $node = $list;
DB<4> for( 1 .. 5 ) { $node->[0] = [ $node->[0], "node $_" ] }
DB<5> x $list
0  ARRAY(0xe87d00)
   0  ARRAY(0xe8a3a8)
      0  ARRAY(0xc79758)
         0  ARRAY(0xe888e0)
            0  ARRAY(0xea31b0)
               0  ARRAY(0xea31c8)
                  0  ARRAY(0xe87d00)
                     -> REUSED_ADDRESS
                  1  'node 1'
               1  'node 2'
            1  'node 3'
         1  'node 4'
      1  'node 5'
   1  'head node'
● No end, use $node != $list as sentinel value.
● weaken( $list->[0] ) if list goes out of scope.
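The two closing points might look like this in a script. It is a sketch using `weaken` from Scalar::Util; the task names are made up. Note that in this layout the head's link is the only strong reference to the first data node, so weakening it releases the whole chain.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

# Circular list: the head's "next" starts out pointing at the head itself.
my $list = [];
@$list = ( $list, 'head node' );

# Each insert goes directly after the head, as in the debugger session.
$list->[0] = [ $list->[0], "task $_" ] for 1 .. 3;

# Walk once around: the head is the sentinel that ends the loop.
my @tasks;
for( my $node = $list->[0] ; $node != $list ; $node = $node->[0] )
{
    push @tasks, $node->[1];
}
print "@tasks\n";   # task 3 task 2 task 1

# Break the cycle before $list goes out of scope. No other variable holds
# a data node here, so the chain is freed and the weak link becomes undef.
weaken( $list->[0] );
```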
Summary
● Linked lists can be quite lazy for a variety of
uses in Perl.
● Singly-linked lists are simple to implement,
efficient for “walking” the list.
● Tradeoff for random access:
● Memory allocation for large, varying lists.
● Simpler comparison of multiple lists.
● Skip-chains.
● Reduced locking in threads.
