Neatly Folding a Tree:
Functional Perl5 AWS Glacier Hashes
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
In the beginning...
There was Spaghetti Code.
And it was bad.
So we invented Objects.
Now we have Spaghetti Objects.
Alternative: Functional Programming
Based on Lambda Calculus.
Few basic ideas:
Transparency.
Consistency.
Basic rules
Constant data.
Transparent transforms.
Functions require input.
Output determined fully by inputs.
Avoid internal state & side effects.
Catch: It doesn't always work.
time()
random()
readline()
fetchrow_array()
Result: State matters!
Fix: Apply reality.
Where it does: Tree Hash
Used with AWS “Glacier” service.
$0.01/GiB/Month.
Large, cold data (discounts for EiB, PiB).
Uploads require lots of sha256 values.
Digesting large chunks
Uploads chunked in multiples of 1MiB.
Digest for each chunk & entire upload.
Result: tree-hash.
Image from Amazon Developer Guide (API Version 2012-06-01)
http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
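The tree-hash idea above can be sketched in a few lines of plain Perl. This is a minimal illustration, assuming the whole upload fits in memory; the name tree_hash and the $chunk_size variable are made up for this sketch and are not part of any AWS module. Only sha256 from the core Digest::SHA module is used.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );

# Assumed leaf size: 1 MiB, as described in the Glacier checksum docs.
my $chunk_size = 2 ** 20;

# Hypothetical helper: binary tree-hash digest of an in-memory buffer.
sub tree_hash
{
    my ( $buffer ) = @_;

    # Leaf level: one digest per chunk; an empty buffer gets one leaf.
    my @level
    = length( $buffer )
    ? map { sha256( $_ ) } unpack( "(a$chunk_size)*", $buffer )
    : sha256( '' );

    # Hash adjacent pairs; an odd trailing digest moves up unchanged.
    while( @level > 1 )
    {
        my @next;
        while( my @pair = splice @level, 0, 2 )
        {
            push @next, @pair > 1 ? sha256( @pair ) : $pair[0];
        }
        @level = @next;
    }

    $level[0]
}

# Three leaves: two full chunks and a short tail.
my $buffer = ( 'a' x $chunk_size ) . ( 'b' x $chunk_size ) . 'tail';
print unpack( 'H*', tree_hash( $buffer ) ), "\n";
```

A buffer of one chunk or less hashes to its plain sha256; larger buffers hash to the digest of the paired digests.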
One solution from Net::Amazon::TreeHash
sub calc_tree
{
    my ($self) = @_;
    my $prev_level = 0;
    while (scalar @{ $self->{tree}->[$prev_level] } > 1) {
        my $curr_level = $prev_level+1;
        $self->{tree}->[$curr_level] = [];
        my $prev_tree = $self->{tree}->[$prev_level];
        my $curr_tree = $self->{tree}->[$curr_level];
        my $len = scalar @$prev_tree;
        for (my $i = 0; $i < $len; $i += 2) {
            if ($len - $i > 1) {
                my $a = $prev_tree->[$i];
                my $b = $prev_tree->[$i+1];
                push @$curr_tree, { joined => 0, start => $a->{start}, finish => $b->{finish},
                    hash => sha256( $a->{hash}.$b->{hash} ) };
            } else {
                push @$curr_tree, $prev_tree->[$i];
            }
        }
        $prev_level = $curr_level;
    }
}
Possibly simpler?
Trees are naturally recursive.
Two-step generation:
Split the buffer.
Reduce the hashes.
Pass 1: Reduce the hashes
Reduce pairs until one value remains.
Catch: eats stack.
sub reduce_hash
{
    # undef for empty list
    @_ > 1 or return $_[0];
    my $count = @_ / 2 + @_ % 2;
    reduce_hash
    map
    {
        @_ > 1
        ? sha256 splice @_, 0, 2
        : shift
    }
    ( 1 .. $count )
}
Chasing your tail
Tail recursion is common.
"Tail call elimination" recycles stack.
"Fold" is a feature of FP languages.
Reduces the stack to a scalar.
Fold in Perl5
Reset the stack.
Restart the sub.
my $foo =
sub
{
    @_ > 1 or return $_[0];
    @_ = … ;
    # new in v5.16
    goto __SUB__
};
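To make the pattern concrete, here is a small runnable sketch of the same reset-and-restart idea on a simpler problem: summing a list pairwise. The name $pair_sum and the pairwise-sum task are illustrative only; the point is that @_ is replaced and the sub restarted with goto __SUB__ rather than called again.

```perl
use v5.16;      # enables __SUB__ (feature 'current_sub') and 'say'
use warnings;

# Hypothetical example: fold a list of numbers to one value by summing pairs.
my $pair_sum = sub
{
    # Zero or one value left: done (undef for an empty list).
    @_ > 1 or return $_[0];

    # Build the next, shorter list from adjacent pairs.
    my @next;
    while( my @pair = splice @_, 0, 2 )
    {
        push @next, $pair[0] + ( $pair[1] // 0 );
    }

    # Reset the stack, then restart this sub in place of the current call.
    @_ = @next;
    goto __SUB__
};

say $pair_sum->( 1 .. 10 );     # 55
```

Because the restart replaces the current call instead of nesting a new one, the call depth should stay flat however long the list is.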
Pass 2: Reduce hashes
Voilà! Stack shrinks.
@_ = is ugly.
goto scares people.
sub reduce_hash
{
    2 > @_ and return $_[0];
    my $count = @_ / 2 + @_ % 2;
    @_
    = map
    {
        @_ > 1
        ? sha256 splice @_, 0, 2
        : @_
    }
    ( 1 .. $count );
    goto __SUB__
};
"Fold" is an FP Pattern.
use Keyword::Declare;
keyword tree_fold ( Ident $name, Block $new_list )
{
    qq # this is source code, not a subref!
    {
        sub $name
        {
            @_ or return;
            ( @_ = do $new_list ) > 1
            and goto __SUB__;
            $_[0]
        }
    }
}
See Keyword::Declare POD for {{{…}}} to avoid "@_".
Minimal syntax
tree_fold reduce_hash
{
    my $count = @_ / 2 + @_ % 2;
    map
    {
        @_ > 1
        ? sha256 splice @_, 0, 2
        : @_
    }
    ( 1 .. $count )
}
User supplies generator, a.k.a. $new_list.
NQFP: Hacks the stack.
Don't hack the stack
Replace splice with offsets.
Still messy: @_, stacked map.
tree_fold reduce_hash
{
    my $last = @_ / 2 + @_ % 2 - 1;
    map
    {
        $_[ $_ + 1 ]
        ? sha256 @_[ $_, $_ + 1 ]
        : $_[ $_ ]
    }
    map
    {
        2 * $_
    }
    ( 0 .. $last )
}
Using lexical variables
Declare tree_fold with parameters.
Caller uses lexical vars.
keyword tree_fold
(
    Ident $name,
    List $argz,
    Block $stack_op
)
{
    ...
}
Boilerplate for lexical variables
Extract lexical variables.
See also: PPI::Token
my @varz # ( '$foo', '$bar' )
= map
{
    $_->isa( 'PPI::Token::Symbol' )
    ? $_->{ content }
    : ()
}
map
{
    $_->isa( 'PPI::Statement::Expression' )
    ? @{ $_->{ children } }
    : ()
}
@{ $argz->{ children } };
Boilerplate for lexical variables
my $lexical = join ',' => @varz;
my $count = @varz;
my $offset = $count - 1;
sub $name
{
    @_ or return;
    my $last
    = @_ % $count
    ? int( @_ / $count )
    : int( @_ / $count ) - 1
    ;
    ...
Count & offset used to extract stack.
Boilerplate for lexical variables
@_
= map
{
    my ( $lexical )
    = @_[ $_ .. $_ + $offset ];
    do $stack_op
}
map
{
    $_ * $count
}
( 0 .. $last );
Interpolate lexicals, count, offset, stack op.
Chop shop
Not much body left:
tree_fold reduce_hash($left, $rite)
{
    $rite
    ? sha256 $left, $rite
    : $left
}
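For readers without Keyword::Declare installed, roughly the same shape can be sketched with a closure instead of generated source. This is an analogue, not the keyword from the slides: make_tree_fold and @prev are hypothetical names, and the pair size is fixed at two rather than taken from a parameter list.

```perl
use v5.16;      # enables __SUB__ and 'say'
use warnings;
use Digest::SHA qw( sha256 );

# Hypothetical helper (not from Keyword::Declare): wrap a two-argument
# combiner in the reset-and-restart boilerplate.
sub make_tree_fold
{
    my ( $combine ) = @_;

    return sub
    {
        # Zero or one value left: done (undef for an empty list).
        @_ > 1 or return $_[0];

        my @prev = @_;

        # Combine the values at offsets 0, 2, 4, ... with their neighbors.
        # A missing right-hand neighbor arrives in the combiner as undef.
        @_
        = map { $combine->( @prev[ $_, $_ + 1 ] ) }
          map { 2 * $_ }
          ( 0 .. $#prev / 2 );

        goto __SUB__
    };
}

# The caller supplies only the pair logic, using lexical variables.
my $reduce_hash = make_tree_fold( sub
{
    my ( $left, $rite ) = @_;

    defined $rite
    ? sha256( $left, $rite )
    : $left
});

my @leaves = map { sha256( $_ ) } qw( a b c );
say unpack 'H*', $reduce_hash->( @leaves );
```

The closure costs one extra sub call per pair compared with the generated code, which is the trade-off the keyword version avoids.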
Buffer Size vs. Usr Time
Explicit map, keyword with and without lexicals.
8-32MiB are good chunk sizes.
 MiB  Explicit  Implicit  Keyword
   1      0.02      0.01     0.02
   2      0.03      0.03     0.04
   4      0.07      0.07     0.07
   8      0.14      0.13     0.10
  16      0.19      0.18     0.17
  32      0.31      0.30     0.26
  64      0.50      0.51     0.49
 128      1.00      1.02     1.01
 256      2.03      2.03     2.03
 512      4.05      4.10     4.06
1024      8.10      8.10     8.11
Result: FP in Perl5
When FP works it is elegant.
Core Perl5 syntax helps:
lvalue
__SUB__
COW strings
Result: FP in Perl5 & Perl6
When FP works it is elegant.
Keywords: True Laziness ® at its best.
Don't repeat boilerplate.
Multimethods in Perl5.
