Successfully reported this slideshow.
Your SlideShare is downloading. ×

Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6

Ad

Steven Lembark
Workhorse Computing
lembark@wrkhors.com

Ad

There was Spaghetti Code.
And it was bad.

Ad

There was Spaghetti Code.
And it was bad.
So we invented Objects.

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Loading in …3
×

Check these out next

1 of 45 Ad
1 of 45 Ad

Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6

Download to read offline

This describes a Functional Programming approach to computing AWS Glacier "tree hash" values, hiding the tail-call elimination in Perl5 with a keyword and also shows how to accomplish the same result in Perl6.

This was the talk actually given at YAPC::NA 2016 by Dr. Conway and myself.

This describes a Functional Programming approach to computing AWS Glacier "tree hash" values, hiding the tail-call elimination in Perl5 with a keyword and also shows how to accomplish the same result in Perl6.

This was the talk actually given at YAPC::NA 2016 by Dr. Conway and myself.

Advertisement
Advertisement

More Related Content

Advertisement

More from Workhorse Computing (20)

Advertisement

Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6

  1. 1. Steven Lembark Workhorse Computing lembark@wrkhors.com
  2. 2. There was Spaghetti Code. And it was bad.
  3. 3. There was Spaghetti Code. And it was bad. So we invented Objects.
  4. 4. There was Spaghetti Code. And it was bad. So we invented Objects. Now we have Spaghetti Objects.
  5. 5. Based on Lambda Calculus. Few basic ideas: Transparency. Consistency.
  6. 6. Constant data. Transparent transforms. Functions require input. Output determined fully by inputs. Avoid internal state & side effects.
  7. 7. time() random() readline() fetchrow_array() Result: State matters! Fix: Apply reality.
  8. 8. Used with AWS “Glacier” service. $0.01/GiB/Month. Large, cold data (discounts for EiB, PiB). Uploads require lots of sha256 values.
  9. 9. Uploads chunked in multiples of 1MB. Digest for each chunk & entire upload. Result: tree-hash.
  10. 10. Image from Amazon Developer Guide (API Version 2012-06-01) http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
  11. 11. sub calc_tree { my ($self) = @_; my $prev_level = 0; while (scalar @{ $self->{tree}->[$prev_level] } > 1) { my $curr_level = $prev_level+1; $self->{tree}->[$curr_level] = []; my $prev_tree = $self->{tree}->[$prev_level]; my $curr_tree = $self->{tree}->[$curr_level]; my $len = scalar @$prev_tree; for (my $i = 0; $i < $len; $i += 2) { if ($len - $i > 1) { my $a = $prev_tree->[$i]; my $b = $prev_tree->[$i+1]; push @$curr_tree, { hash => sha256( $a->{hash}.$b->{hash} ), start => $a->{start}, finish => $b->{finish}, joined => 0 }; } else { push @$curr_tree, $prev_tree->[$i]; } } $prev_level = $curr_level; } }
  12. 12. Trees are naturally recursive. Two-step generation: Split the buffer. Reduce the hashes.
  13. 13. Reduce pairs. Until one value remains. sub reduce_hash { # undef for empty list @_ > 1 or return $_[0]; my $count = @_ / 2 + @_ % 2; reduce_hash map { @_ > 1 ? sha256 splice @_, 0, 2 : shift } ( 1 .. $count ) }
  14. 14. Reduce pairs. Until one value remains. Catch: Eats Stack sub reduce_hash { # undef for empty list @_ > 1 or return $_[0]; my $count = @_ / 2 + @_ % 2; reduce_hash map { @_ > 1 ? sha256 splice @_, 0, 2 : shift } ( 1 .. $count ) }
  15. 15. Tail recursion is common. “Tail call elimination” recycles stack. “Fold” is a feature of FP languages. Reduces the stack to a scalar.
  16. 16. Reset the stack. Restart the sub. my $foo = sub { @_ > 1 or return $_[0]; @_ = … ; # new in v5.16 goto __SUB__ };
  17. 17. Voila! Stack shrinks. sub reduce_hash { @_ > 1 or return $_[0]; my $count = @_ / 2 + @_ % 2; @_ = map { @_ > 1 ? sha256 splice @_, 0, 2 : @_ } ( 1 .. $count ); goto __SUB__ }
  18. 18. Voila! Stack shrinks. @_ = goto scare people. sub reduce_hash { @_ > 1 or return $_[0]; my $count = @_ / 2 + @_ % 2; @_ = map { @_ > 1 ? sha256 splice @_, 0, 2 : @_ } ( 1 .. $count ); goto __SUB__ }
  19. 19. See K::D POD for {{{…}}} to avoid "@_". use Keyword::Declare; keyword tree_fold ( Ident $name, Block $new_list ) { qq # this is source code, not a subref! { sub $name { @_ > 1 or return $_[0]; @_ = do $new_list; goto __SUB__ } } }
  20. 20. User supplies generator a.k.a $new_list tree_fold reduce_hash { my $count = @_ / 2 + @_ % 2; map { @_ > 1 ? sha256 splice @_, 0, 2 : @_ } ( 1 .. $count ) }
  21. 21. User supplies generator. NQFP: Hacks the stack. tree_fold reduce_hash { my $count = @_ / 2 + @_ % 2; map { @_ > 1 ? sha256 splice @_, 0, 2 : @_ } ( 1 .. $count ) }
  22. 22. Replace splice with offsets. tree_fold reduce_hash { my $last = @_ / 2 + @_ % 2 – 1; map { $_[ $_ + 1 ] ? sha256 @_[ $_, $_ + 1 ] : $_[ $_ ] } map { 2 * $_ } ( 0 .. $last ) }
  23. 23. Replace splice with offsets. Still messy: @_, stacked map. tree_fold reduce_hash { my $last = @_ / 2 + @_ % 2 – 1; map { $_[ $_ + 1 ] ? sha256 @_[ $_, $_ + 1 ] : $_[ $_ ] } map { 2 * $_ } ( 0 .. $last ) }
  24. 24. Declare fold_hash with parameters. Caller uses lexical vars. keyword tree_fold ( Ident $name, List $argz, Block $stack_op ) { ... }
  25. 25. Extract lexical variables. See also: PPI::Token my @varz # ( '$foo', '$bar' ) = map { $_->isa( 'PPI::Token::Symbol' ) ? $_->{ content } : () } map { $_->isa( 'PPI::Statement::Expression' ) ? @{ $_->{ children } } : () } @{ $argz->{ children } };
  26. 26. Count & offset used to extract stack. my $lexical = join ',' => @varz; my $count = @varz; my $offset = $count -1; sub $name { @_ > 1 or return $_[0]; my $last = @_ % $count ? int( @_ / $count ) : int( @_ / $count ) - 1 ; ...
  27. 27. Interpolate lexicals, count, offset, stack op. @_ = map { my ( $lexical ) = @_[ $_ .. $_ + $offset ]; do $stack_op } map { $_ * $count } ( 0 .. $last ); goto __SUB__
  28. 28. Not much body left: tree_fold reduce_hash($left, $rite) { $rite ? sha2656 $left, $rite : $left }
  29. 29. Explicit map, keyword with and without lexicals. 4-32MiB are good chunk sizes. MiB Explicit Implicit Keyword 1 0.02 0.01 0.02 2 0.03 0.03 0.04 4 0.07 0.07 0.07 8 0.14 0.13 0.10 16 0.19 0.18 0.17 32 0.31 0.30 0.26 64 0.50 0.51 0.49 128 1.00 1.02 1.01 256 2.03 2.03 2.03 512 4.05 4.10 4.06 1024 8.10 8.10 8.11
  30. 30. Don’t need Haskell or Scala. Efficient and elegant functional code. In Perl 5.
  31. 31. Don’t need Haskell or Scala. Efficient and elegant functional code. In Perl 6?
  32. 32. Don’t need Haskell or Scala. Efficient and elegant functional code. In Perl 6? Doubt if even Damian could do it better.
  33. 33. use v6; sub tree_hash (Str $data, Int :$chunk_size = 1024²) { reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data } multi sub reduce_hash ( @nodes) { reduce_hash redigest @nodes } multi sub reduce_hash ([$node]) { $node } sub redigest (@list) { map -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @list; }
  34. 34. use v6; sub tree_hash (Str $data, Int :$chunk_size = 1024²) { reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data } multi sub reduce_hash ( @nodes) { samewith redigest @nodes } multi sub reduce_hash ([$node]) { $node } sub redigest (@list) { map -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @list; }
  35. 35. use v6; sub tree_hash (Str $data, Int :$chunk_size = 1024²) { reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data } multi sub reduce_hash (@nodes) { samewith map -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @nodes } multi sub reduce_hash ([$node]) { $node }
  36. 36. use v6; sub tree_hash (Str $data, Int :$chunk_size = 1024²) { reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data } sub reduce_hash (@nodes) { treefold -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @nodes } sub treefold (&block, *@data) { @data > 1 ?? samewith &block, map &block, @data !! @data[0] }
  37. 37. use v6; sub tree_hash (Str $data, Int :$chunk_size = 1024²) { reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data } sub reduce_hash (@nodes) { treefold { sha256 $^a~$^b }, @nodes } multi treefold (&block, @data) { |@data } multi treefold (&block, @data where * >= &block.arity) { given @data - @data % &block.arity -> $last { samewith &block, [|map(&block, @data[^$last]), |@data[$last..*]] } }
  38. 38. use v6; use Treefold; sub tree_hash (Str $data, Int :$chunk_size = 1024²) { reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data } sub reduce_hash (@nodes) { treefold { sha256 $^a~$^b }, @nodes }
  39. 39. use v6; use Treefold; sub tree_hash (Str $data, Int :$chunk_size = 1024²) { treefold { sha256 $^a~$^b }, map &sha256, comb / . ** {1..$chunk_size} /, $data }
  40. 40. Don’t need Haskell or Scala. Efficient and elegant functional code. In Perl 5 or Perl 6.
  41. 41. Easy to write (once you get the knack). Easy to optimize (with some syntactic sugar). Surprisingly efficient.
  42. 42. Give it a try.

×