Linked Lists With Perl: Why bother?

Linked Lists In Perl
What?
Why bother?
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
©2009,2013 Steven Lembark

What About Arrays?
● Perl arrays are fast and flexible.
● Autovivification makes them generally easy to use.
● Built-in Perl functions for managing them.
● But there are a few limitations:
● Memory manglement.
● Iterating multiple lists.
● Maintaining state within the lists.
● Difficult to manage links within the list.

Memory Issues
● Pushing a single value onto an array can
double its size.
● Copying array contents is particularly expensive
for long-lived processes which have problems
with heap fragmentation.
● Heavily forked processes also run into problems
with copy-on-write due to whole-array copies.
● There is no way to reduce the array structure
size once it grows -- also painful for long-lived
processes.

Comparing Multiple Lists
● for( @array ) will iterate one list at a time.
● Indexes work but have their own problems:
● Single index requires identical offsets in all of the
arrays.
● Varying offsets require indexes for each array,
with separate bookkeeping for each.
● Passing all of the arrays and objects becomes a lot
of work, even if they are wrapped in objects –
which have their own performance overhead.
● Indexes used for external references have to be
updated with every shift, push, pop...

Linked Lists: The Other Guys
● Perly code for linked lists is simple, and it
simplifies both memory management and list iteration.
● Trade indexed access for skip-chains and
external references.
● List operators like push, pop, and splice are
simple.
● You also get more granular locking in threaded
applications.

Examples
● Insertion sorts are stable, but splicing an item
into the middle of an array is expensive;
adding a new node to a linked list is cheap.
● Threading can require locking an entire array
to update it; linked lists can safely be locked
by node and frequently don't even require
locking.
● Comparing lists of nodes can be simpler than
dealing with multiple arrays – especially if the
offsets change.

Implementing Perly Linked Lists
● Welcome back to arrayrefs.
● Arrays can hold any type of data.
● Use a “next” ref and arbitrary data.
● The list itself requires a static “head” and a
node variable to walk down the list.
● Doubly-linked lists are also manageable with
the addition of weak links.
● For the sake of time I'll only discuss singly-
linked lists here.

List Structure
● Singly-linked lists are usually drawn as some data
followed by a pointer to the next link.
● In Perl it helps to draw the pointer first, because
that is where it is stored.

Adding a Node
● New nodes do not need to be contiguous in memory.
● Also doesn't require locking the entire list.

Dropping A Node
● Dropping a node releases
its memory – at least back
to Perl.
● Only real effect is on the
prior node's next ref.
● This is the only piece that
needs to be locked for
threading.

Perl Code
● A link followed by data looks like:
$node = [ $next, @data ]
● Walking the list copies the node:
( $node, my @data ) = @$node;
●
Adding a new node recycles the “next” ref:
$node->[0] = [ $node->[0], @data ]
●
Removing recycles the next's next:
($node->[0], my @data) = @{$node->[0]};
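The idioms above can be tried together on a tiny list. This is only a minimal sketch: the names $head, @dropped, and @seen and the sample values are assumptions for illustration, and the list is assumed to end in an empty tail node as in the later slides.

```perl
use strict;
use warnings;

# Head node: its "next" ref points at an empty tail node.
my $head = [ [] ];

# Adding after a node recycles that node's "next" ref.
# Every insert lands right after the head, so the order is reversed.
$head->[0] = [ $head->[0], $_ ] for qw( c b a );

# Removing the first node recycles that node's own "next" ref.
( $head->[0], my @dropped ) = @{ $head->[0] };

# Walking copies the link and the data out of each node in one step.
my @seen;
my $node = $head->[0];
while( @$node )
{
    ( $node, my @data ) = @$node;
    push @seen, @data;
}
```

After this runs, @dropped holds the removed value and @seen holds whatever is left on the list, in list order.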

A Reverse-Order List
● Just update the head node's next reference.
● Fast because it moves the minimum of data.
my $list = [ [] ];
my $node = $list;    # insert after the head node
for my $val ( @_ )
{
    $node->[0] = [ $node->[0], $val ]
}
# list is populated w/ empty tail.

In-order List
● Just move the node, looks like a push.
● Could be a one-liner, I've shown it here as two
operations.
my $list = [ [] ];
my $node = $list->[0];    # start on the empty tail
for my $val ( @_ )
{
    @$node = ( [], $val );
    $node = $node->[0];
}
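The two construction styles can be compared side by side. The sub names build_reversed, build_in_order, and list_values are hypothetical helpers made up for this sketch, not part of the talk's code.

```perl
use strict;
use warnings;

# Reverse order: every value is inserted right after the head.
sub build_reversed
{
    my $list = [ [] ];
    $list->[0] = [ $list->[0], $_ ] for @_;
    $list
}

# In order: populate the empty tail, then step onto the new tail.
sub build_in_order
{
    my $list = [ [] ];
    my $node = $list->[0];
    for my $val ( @_ )
    {
        @$node = ( [], $val );
        $node  = $node->[0];
    }
    $list
}

# Collect the data from each node, stopping at the empty tail.
sub list_values
{
    my $node = $_[0]->[0];
    my @found;
    while( @$node )
    {
        ( $node, my @data ) = @$node;
        push @found, @data;
    }
    @found
}

my @reversed = list_values( build_reversed( 1 .. 4 ) );
my @in_order = list_values( build_in_order( 1 .. 4 ) );
```

Both builders leave an empty tail node at the end, which is what list_values uses as its stopping point.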

Viewing The List
● Structure is recursive from Perl's point of view.
● Uses the one-line version (golf, anyone?).
DB<1> $list = [ [], 'head node' ];
DB<2> $node = $list->[0];
DB<3> for( 1 .. 5 ) { ($node) = @$node = ( [], "node-$_" ) }
DB<14> x $list
0  ARRAY(0x8390608)                        $list
   0  ARRAY(0x83ee698)                     $list->[0]
      0  ARRAY(0x8411f88)                  $list->[0][0]
         0  ARRAY(0x83907c8)               $list->[0][0][0]
            0  ARRAY(0x83f9a10)            $list->[0][0][0][0]
               0  ARRAY(0x83f9a20)         $list->[0][0][0][0][0]
                  0  ARRAY(0x83f9a50)      $list->[0][0][0][0][0][0]
                       empty array         empty tail node
                  1  'node-5'              $list->[0][0][0][0][0][1]
               1  'node-4'                 $list->[0][0][0][0][1]
            1  'node-3'                    $list->[0][0][0][1]
         1  'node-2'                       $list->[0][0][1]
      1  'node-1'                          $list->[0][1]
   1  'head node'                          $list->[1]

Destroying A Linked List
● Prior to 5.12, Perl's memory de-allocator was
recursive.
● Without a DESTROY the lists blew up after 100
nodes when perl blew its stack.
● The fix was an iterative destructor:
● This is no longer required.
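DESTROY
{
    my $list = shift;
    # detach each node before dropping it, so no node
    # is freed while it still holds the rest of the chain.
    my $node = delete $list->[0];
    $node = delete $node->[0] while $node;
}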

Simple Linked List Class
● Bless an arrayref with the head (placeholder)
node and any data for tracking the list.
sub new
{
my $proto = shift;
bless [ [], @_ ], blessed $proto || $proto
}
# iterative < 5.12, else no-op.
DESTROY {}

Building the list: unshift
● One reason for the head node: it provides a
place to insert the data nodes after.
● The new first node has the old first node's
“next” ref and the new data.
sub unshift
{
my $list = shift;
$list->[0] = [ $list->[0], @_ ];
$list
}

Taking one off: shift
● This starts directly from the head node also:
just replace the head node's next with the first
node's next.
sub shift
{
    my $list = shift;
    ( $list->[0], my @data )
    = @{ $list->[0] };
    wantarray ? @data : \@data
}

Push Is A Little Harder
● One approach is an unshift before the tail.
● Another is populating the tail node:
sub push
{
    my $list = shift;
    my $node = $list->[0];
    $node = $node->[0] while $node->[0];
    # populate the empty tail node
    @{ $node } = ( [], @_ );
    $list
}

The Bane Of Single Links: pop
● You need the node-before-the-tail to pop the
tail.
● By the time you've found the tail it is too late
to pop it off.
● Storing the node-before-tail takes extra
bookkeeping.
● The trick is to use two nodes, one trailing the
other: when the small roast is burned the big
one is just right.

sub node_pop
{
my $list = shift;
my $prior = $list->head;
my $node = $prior->[0];
while( $node->[0] )
{
$prior = $node;
$node = $node->[0];
}
( $prior->[0], my @data ) = @$node;
}
● Lexical $prior is more efficient than examining
$node->[0][0] at multiple points in the loop.
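The head-end and tail-end operations can be collected into one small class to see how they interact. This is a sketch under stated assumptions: the package name Sketch::List is hypothetical, every list is assumed to end in an empty tail node, and the emptiness checks in shift and pop are additions for the sketch rather than something shown on the slides.

```perl
use strict;
use warnings;

package Sketch::List;

# Head node: [ next, list data... ]; a new list points at an empty tail.
sub new
{
    my ( $proto, @data ) = @_;
    bless [ [], @data ], ref $proto || $proto
}

sub unshift
{
    my ( $list, @data ) = @_;
    $list->[0] = [ $list->[0], @data ];
    $list
}

sub shift
{
    my ( $list ) = @_;
    my $first = $list->[0];
    @$first or return;                      # only the empty tail is left
    ( $list->[0], my @data ) = @$first;
    wantarray ? @data : \@data
}

sub push
{
    my ( $list, @data ) = @_;
    my $node = $list->[0];
    $node = $node->[0] while @$node;        # stop on the empty tail
    @$node = ( [], @data );                 # populate it, add a new tail
    $list
}

sub pop
{
    my ( $list ) = @_;
    my $prior = $list;
    my $node  = $prior->[0];
    @$node or return;                       # empty list
    # $prior trails $node until $node is the last node with data.
    ( $prior, $node ) = ( $node, $node->[0] ) while @{ $node->[0] };
    ( $prior->[0], my @data ) = @$node;     # prior now points at the tail
    wantarray ? @data : \@data
}

package main;

my $list = Sketch::List->new( 'list data' );
$list->push( $_ ) for qw( b c d );
$list->unshift( 'a' );

my @first = $list->shift;
my @last  = $list->pop;
my @rest  = ( $list->shift, $list->shift );
```

The pop here walks with two variables as described above; it costs a full pass over the list, which is the price of not storing a tail reference.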

Mixing OO & Procedural Code
● Most of what follows could be done entirely
with method calls.
● Catch: They are slower than directly accessing
the nodes.
● One advantage to singly-linked lists is that the
structure is simple.
● Mixing procedural and OO code without getting
tangled is easy enough.
● This is why I use it for genetics code: the code is
fast and simple.

Walking A List
● By itself the node has enough state to track
location – no separate index required.
● Putting the link first allows advancing and
extraction of data in one access of the node:
while( $node )
{
( $node, my %info ) = @$node;
# process %info...
}

Comparing Multiple Lists
● Same code, just more assignments:
my $n0 = $list0->[0];
my $n1 = $list1->[0];
while( $n0 && $n1 )
{
( $n0, my @data0 ) = @$n0;
( $n1, my @data1 ) = @$n1;
# deal with @data0, @data1
...
}
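A complete version of the lock-step walk might look like the following. The helper make_list, the sample values, and the @diffs bookkeeping are assumptions made for the sketch; the loop also tests for the empty tail node rather than for an undefined link.

```perl
use strict;
use warnings;

# Hypothetical helper: build an in-order list from the values.
sub make_list
{
    my $list = [ [] ];
    my $node = $list->[0];
    for my $val ( @_ )
    {
        @$node = ( [], $val );
        $node  = $node->[0];
    }
    $list
}

my $list0 = make_list( qw( a c g t a ) );
my $list1 = make_list( qw( a c c t ) );

my $n0 = $list0->[0];
my $n1 = $list1->[0];
my @diffs;
my $offset = 0;

# Stop as soon as either walker reaches an empty tail node.
while( @$n0 && @$n1 )
{
    ( $n0, my @data0 ) = @$n0;
    ( $n1, my @data1 ) = @$n1;

    "@data0" eq "@data1"
    or push @diffs, $offset;

    ++$offset;
}
```

When the loop ends, $n0 and $n1 still point at whatever was not consumed, so the caller can see which list ran out first without any index arithmetic.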

Syncopated Lists
● Adjusting the offsets requires minimum
bookkeeping, doesn't affect the parent list.
while( @$n0 && @$n1 )
{
$_ = $_->[0] for $n0, $n1;
aligned $n0, $n1
or ( $n0, $n1 ) = realign $n0, $n1
or last;
$score += compare $n0, $n1;
}

Using The Head Node
● $head->[0] is the first node, there are a few
useful things to add into @head[1...].
● Tracking the length or keeping a ref to the tail
simplifies push and pop, but requires extra bookkeeping.
● The head node can also store user-supplied
data describing the list.
● I use this for tracking length and species names in
results of DNA sequences.

Close, but no cigar...
● The class shown only works at the head.
● Be nice to insert things in the middle without
resorting to $node variables.
● Or call methods on the internal nodes.
● A really useful class would use inside-out data
to track the head, for example.
● Can't assign $list = $list->[0], however.
● Loses the inside-out data.
● We need a structure that walks the list without
modifying its own refaddr.

The fix: ref-to-arrayref
● Scalar refs are the ultimate container struct.
● They can reference anything, in this case an array.
● $list stays in one place, $$list walks up the list.
● “head” or “next” modify $$list to reposition
the location.
● Saves blessing every node on the list.
● Simplifies having a separate class for nodes.
● Also helps when resorting to procedural code for
speed.

Basics Don't Change Much
sub new
{
    my $proto = shift;
    my $head = [ [], @_ ];
    my $list = \$head;    # ref to the current-node ref
    $headz{ refaddr $list } = $head;
    bless $list, blessed $proto || $proto
}
DESTROY # add iteration for < v5.12.
{
my $list = shift;
delete $headz{ refaddr $list };
}
● List updates assign to $$list.
● DESTROY cleans up inside-out data.

Walking The List
sub head
{
    my $list = shift;
    $$list = $headz{ refaddr $list };
    $list
}
sub next
{
    my $list = shift;
    ( $$list, my @data ) = @$$list
    or return;
    @data
}
$list->head;
while( my @data = $list->next ) { ... }

Reverse-order revisited:
● Unshift isn't much different.
● Note that $list is not updated.
sub unshift
{
    my $list = shift;
    my $head = $headz{ refaddr $list };
    $head->[0] = [ $head->[0], @_ ];
    $list
}
my $list = List::Class->new( ... );
$list->unshift( $_ ) for @data;

Useful shortcut: 'add_after'
● Unlike arrays, adding into the middle of a list is
efficient and common.
● “push” adds to the tail, need something else.
● add_after() puts a node after the current one.
● unshift() is really “$list->head->add_after”.
● Use with “next_node” that ignores the data.
● In-order addition (head, middle, or end):
$list->add_after( $_ )->next for @data;
● Helps if next() avoids walking off the list.

sub next
{
my $list = shift;
my $next = $$list->[0] or return;
@$next or return;
$$list = $next;
$list
}
sub add_after
{
    my $list = shift;
    my $node = $$list;
    $node->[0] = [ $node->[0], @_ ];
    $list
}
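Put together, the ref-to-arrayref object, the inside-out head storage, and add_after can be exercised as below. This is a sketch only: Sketch::Walker is a hypothetical package name, and next() here returns the new node's data so that it can drive a while loop, which is a choice made for this example rather than the talk's definitive interface.

```perl
use strict;
use warnings;

package Sketch::Walker;
use Scalar::Util qw( blessed refaddr );

my %headz;    # inside-out storage: object address => head node

sub new
{
    my $proto = shift;
    my $head  = [ [], @_ ];
    my $node  = $head;                  # current position
    my $list  = bless \$node, blessed $proto || $proto;
    $headz{ refaddr $list } = $head;
    $list
}

sub DESTROY { delete $headz{ refaddr $_[0] }; return }

# Reset the position to the head node.
sub head
{
    my $list = shift;
    $$list = $headz{ refaddr $list };
    $list
}

# Step onto the next node and return its data; empty list at the tail.
sub next
{
    my $list = shift;
    my $next = $$list->[0] or return;
    @$next or return;
    $$list = $next;
    @{ $next }[ 1 .. $#$next ]
}

# Insert a node after the current position without moving.
sub add_after
{
    my $list = shift;
    my $node = $$list;
    $node->[0] = [ $node->[0], @_ ];
    $list
}

package main;

my $list = Sketch::Walker->new;

# In-order addition: insert after the current node, then step onto it.
$list->head;
$list->add_after( $_ )->next for qw( first second third );

my @found;
$list->head;
while( my @data = $list->next )
{
    push @found, @data;
}
```

The object's address never changes while $$list moves, so the %headz entry stays valid for the life of the object.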

Off-by-one gotchas
● The head node does not have any user data.
● Common mistake: $list->head->data.
● This gets you the list's data, not users'.
● Fix is to pre-increment the list:
$list->head->next or last;
while( @data = $list->next ) {...}

Overloading a list: bool, offset.
● while( $list ), if( $list ) would be nice.
● Basically, this leaves the list “true” if it has
data; false if it is at the tail node.
use overload
q{bool} =>
sub
{
    my $list = shift;
    # true while the current node holds data
    scalar @{ $$list }
},

Offsets would be nice also
q{++} => sub
{
my $list = shift;
my $node = $$list;
@$node and $$list= $node->[0];
$list
},
q{+} => sub
{
my ( $list, $offset ) = $_[2] ? ...
my $node = $$list;
for ( 1 .. $offset )
{
@$node and $node = $node->[0]
or last;
}
$node
},

Updating the list becomes trivial
● An offset from the list is a node.
● That leaves += simply assigning $list + $off.
q{+=} =>
sub
{
my ( $list, $offset ) = …
$$list = $list + $offset;
$list
};
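A self-contained way to try the bool, + and += overloads is sketched below. The package name Sketch::Offset, its minimal constructor, and the hand-built list are assumptions for illustration; the handlers ignore the swapped-operand flag and the argument handling elided on the slides is not reconstructed here.

```perl
use strict;
use warnings;

package Sketch::Offset;

use overload
    # true while the current node holds data, false on the empty tail
    q{bool} => sub { scalar @{ ${ $_[0] } } },

    # list + N: the node N steps ahead, stopping at the tail
    q{+}    => sub
    {
        my ( $list, $offset ) = @_;
        my $node = $$list;
        for( 1 .. $offset )
        {
            @$node or last;
            $node = $node->[0];
        }
        $node
    },

    # list += N: move the list's position to that node
    q{+=}   => sub
    {
        my ( $list, $offset ) = @_;
        $$list = $list + $offset;
        $list
    };

# Minimal constructor: the object is a ref to the current-node ref.
sub new
{
    my ( $class, $head ) = @_;
    bless \$head, $class
}

package main;

# Hand-built list: head -> a -> b -> c -> empty tail.
my $list = Sketch::Offset->new( [ [ [ [ [], 'c' ], 'b' ], 'a' ] ] );

my $second = $list + 2;         # a node, not a list object
$list += 3;                     # position moves to the 'c' node
my $at_c = $list ? 1 : 0;
$list += 1;                     # position moves onto the empty tail
my $at_tail = $list ? 1 : 0;
```

Because + returns a plain node, the result can be handed straight to procedural code without unwrapping the object.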

Backdoor: node operations
● Be nice to extract a node without having to
creep around inside of the object.
● Handing back the node ref saves derived
classes from having to unwrap the object.
● Also saves having to update the list object's
location to peek into the next or head node.
sub curr_node { ${ $_[0] } }
sub next_node { ${ $_[0] }->[0] }
sub root_node { $headz{ refaddr $_[0] } }
sub head_node { $headz{ refaddr $_[0] }->[0] }

Skip chains: bookmarks for lists
● Separate list of 'interesting' nodes.
● Alphabetical sort would have skip-chain of first
letters or prefixes.
● In-list might have ref to next “interesting” node.
● Placeholders simplify bookkeeping.
● For alphabetic, pre-load 'A' .. 'Z' into the list.
● Saves updating the skip chain for inserts prior to
the currently referenced node.
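The alphabetic case can be sketched with a hash of bookmarks into a pre-loaded list. Everything named here (%skip, insert_word, the lower-case placeholders, the sample words) is an assumption for illustration; the talk's own skip chains live inside the nodes instead of in a separate hash.

```perl
use strict;
use warnings;

# Pre-load placeholder nodes 'a' .. 'z' and bookmark each one.
my $head = [ [] ];
my %skip;
my $tail = $head->[0];
for my $letter ( 'a' .. 'z' )
{
    @$tail = ( [], $letter );
    $skip{ $letter } = $tail;
    $tail = $tail->[0];
}

# Insert in sorted order, starting from the bookmark, not the head.
sub insert_word
{
    my $word = shift;
    my $node = $skip{ substr $word, 0, 1 };

    # Advance while the next node holds a value sorting before $word.
    $node = $node->[0]
    while @{ $node->[0] } && $node->[0][1] lt $word;

    $node->[0] = [ $node->[0], $word ];
    return
}

insert_word( $_ ) for qw( perl list push pop link );

# Everything between the 'p' and 'q' placeholders, in list order.
my @p_words;
for( my $n = $skip{ p }->[0] ; $n != $skip{ q } ; $n = $n->[0] )
{
    push @p_words, $n->[1];
}
```

Since each word is inserted after its placeholder, the bookmarks never need to be updated when the list grows.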

Applying Perly Linked Lists
● I use them in the W-curve code for comparing
DNA sequences.
● The comparison has to deal with different sizes,
local gaps between the curves.
● Comparison requires fast lookahead for the next
'interesting' node on the list.
● Nodes and skip chains do the job nicely.
● List structure allows efficient node updates
without disturbing the object.

W-curve is derived from LL::S
● Nodes have three spatial values and a skip-chain
initialized after the list is initialized.
sub initialize
{
my ( $wc, $dna ) = @_;
my $pt = [ 0, 0, 0 ];
$wc->head->truncate;
while( my $a = substr $dna, 0, 1, '' )
{
$pt = $wc->next_point( $a, $pt );
$wc->add_after( @$pt, '' )->next;
}
$wc
}

Skip-chain looks for “peaks”
● The alignment algorithm looks for nearby
points ignoring their Z value.
● Comparing the sparse list of radii > 0.50 speeds
up the alignment.
● Skip-chains for each node point to the next node
with a large-enough radius.
● Building the skip chain uses an “inch worm”.
● The head walks up to the next useful node.
● The tail fills the ones in between with a node reference.

Skip Chain: “interesting” nodes.
sub add_skip
{
my $list = shift;
my $node = $list->head_node;
my $skip = $node->[0];
for( 1 .. $list->size )
{
$skip->[1] > $cutoff or next;
while( $node != $skip )
{
$node->[4] = $skip; # replace ''
$node = $node->[0]; # next node
}
}
continue
{
$skip = $skip->[0];
}
}

Use nodes to compare lists.
● DNA sequences can become offset due to gaps
on either sequence.
● This prevents using a single index to compare
lists stored as arrays.
● A linked list can be re-aligned passing only the
nodes.
● Comparison can be re-started with only the nodes.
● Makes for a useful mix of OO and procedural
code.

● Re-aligning the lists simply requires assigning the
local values.
● Updating the node vars does not affect the list
locations if compare fails.
sub compare_lists
{
...
while( @$node0 && @$node1 )
{
( $node0, $node1 )
= realign_nodes $node0, $node1
or last;
( $dist, $node0, $node1 )
= compare_aligned $node0, $node1;
$score += $dist;
}
if( defined $score )
{
$list0->node( $node0 );
$list1->node( $node1 );
}
# caller gets back unused portion.
( $score, $list0, $list1 )
}

sub compare_aligned
{
my ( $node0, $node1 ) = @_;
my $sum = 0;
my $dist = 0;
while( @$node0 && @$node1 )
{
$dist = distance( $node0, $node1 )
// last;
$sum += $dist;
$_ = $_->[0] for $node0, $node1;
}
( $sum, $node0, $node1 )
}
● compare_aligned hands back the unused portion.
● Caller gets back the nodes to re-align if there is
a gap.

Other Uses for Linked Lists
● Convenient trees.
● Each level of tree is a list.
● The data is a list of children.
● Balancing trees only updates a couple of next refs.
● Arrays work but get expensive.
● Need to copy entire sub-lists for modification.

Two-dimensional lists
● If the data at each node is a list you get two-
dimensional lists.
● Four-way linked lists are the guts of
spreadsheets.
● Inserting a column or row does not require re-
allocating the existing list.
● Deleting a row or column returns data to the heap.
● A multiply-linked array allows immediate jumps
to neighboring cells.
● Three-dimensional lists add “sheets”.
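A list whose node data is itself a list gives the simplest two-dimensional case. The sketch below covers only that nested form, not the four-way linked structure; make_list, $sheet, and the cell labels are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: in-order singly-linked list with an empty tail.
sub make_list
{
    my $list = [ [] ];
    my $node = $list->[0];
    for my $val ( @_ )
    {
        @$node = ( [], $val );
        $node  = $node->[0];
    }
    $list
}

# Each node of the outer list carries one inner list (a row) as data.
my $sheet = make_list(
    make_list( qw( a1 b1 c1 ) ),
    make_list( qw( a3 b3 c3 ) ),
);

# Insert a row after the first: only one "next" ref is updated.
my $row1 = $sheet->[0];
$row1->[0] = [ $row1->[0], make_list( qw( a2 b2 c2 ) ) ];

# Walk the rows, then the cells within each row.
my @cells;
for( my $row = $sheet->[0] ; @$row ; $row = $row->[0] )
{
    for( my $cell = $row->[1][0] ; @$cell ; $cell = $cell->[0] )
    {
        push @cells, $cell->[1];
    }
}
```

The existing rows are never copied or moved by the insert, which is the property the spreadsheet case relies on.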

Work Queues
● Pushing pairs of nodes onto an array makes for
simple queued analysis of the lists.
● Adding to the list doesn't invalidate the queue.
● Circular lists' last node points back to the head.
● Used for queues where one thread inserts new
links, others remove them for processing.
● Minimal locking reduces overhead.
● New tasks get inserted after the last one.
● Worker tasks just keep walking the list from
where they last slept.

Construct a circular linked list:
DB<1> $list = [];
DB<2> @$list = ( $list, 'head node' );
DB<3> $node = $list;
DB<4> for( 1 .. 5 ) { $node->[0] = [ $node->[0], "node $_" ] }
DB<5> x $list
0  ARRAY(0xe87d00)
   0  ARRAY(0xe8a3a8)
      0  ARRAY(0xc79758)
         0  ARRAY(0xe888e0)
            0  ARRAY(0xea31b0)
               0  ARRAY(0xea31c8)
                  0  ARRAY(0xe87d00)
                     -> REUSED_ADDRESS
                  1  'node 1'
               1  'node 2'
            1  'node 3'
         1  'node 4'
      1  'node 5'
   1  'head node'
● No end, use $node != $list as sentinel value.
● weaken( $list->[0] ) if list goes out of scope.
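Walking the ring with the head as the sentinel might look like this. It is a sketch with assumed names (@found, $last); it also weakens the one link that points back at the head, which is this example's way of breaking the reference cycle once the ring holds data nodes.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

# The head's "next" ref initially points back at the head itself.
my $list = [];
@$list = ( $list, 'head node' );

# Insert five nodes after the head; the last link still closes the ring.
$list->[0] = [ $list->[0], "node $_" ] for 1 .. 5;

# No empty tail to stop on: walk until the node is the head again.
my @found;
for( my $node = $list->[0] ; $node != $list ; $node = $node->[0] )
{
    push @found, $node->[1];
}

# The last node's link back to the head closes the ring. Weakening
# that one link leaves no strong cycle, so the whole ring can be
# freed once $list goes out of scope.
my $last = $list;
$last = $last->[0] while $last->[0] != $list;
weaken( $last->[0] );
```

Comparing references with != works here because the nodes are plain, unblessed arrayrefs, so the comparison is on their addresses.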

Summary
● Linked lists can be quite lazy for a variety of
uses in Perl.
● Singly-linked lists are simple to implement,
efficient for “walking” the list.
● Tradeoff for random access:
● Memory allocation for large, varying lists.
● Simpler comparison of multiple lists.
● Skip-chains.
● Reduced locking in threads.
