
Our Friends the Utils: A highway traveled by wheels we didn't re-invent.



Scalar::Util, List::Util, and List::MoreUtils provide simpler, cleaner, and faster solutions in XS for scalar introspection and list management than what is available in Pure Perl. This is a short introduction to the utilities and how they work with more recent Perl features like smart matching.

Published in: Technology


  1. Our Friends the Utils: A highway traveled by wheels we didn't re-invent. Steven Lembark, Workhorse Computing
  2. Meet the Utils ● Scalar::Util & List::Util were first written by the ancient Prophet of Barr (c. 1997). ● The modules provide often-requested features that were not worth modifying Perl itself to offer. ● Later, List::MoreUtils added features that List::Util does not include. ● If the Sound of Perl is an un-bloodied wall, the Utils are a superhighway traveled by truly lazy wheels.
  3. Mixing old and new ● Several features in v5.10+ overlap Util features. – Smart matches are the most obvious, and are usually compared with List::Util::first. – New features are not replacements, but work well with the modules. – Examples here show how to use the modules with smart matching and switches. ● What's important to notice is that these modules remain relevant.
  4. Scalar::Util ● Provides introspection for scalars: – Is a filehandle [still] open? – The address, type, and class of a variable. – Is a value “numeric” according to Perl? – Does the variable contain readonly or tainted data? – Tools for managing weak references or modifying prototypes. ● Handling these in Pure Perl is messy, slow, or error-prone.
  5. Dealing with refs & objects ● Collectively, blessed, refaddr, and reftype replace “ref” or stringified references with a simpler, cleaner interface. ● The problem with ref and stringified references is that they return different data for objects and “plain” refs. – Stringified refs look like “ARRAY(0x29eba90)” for a plain ref or “Foobar=ARRAY(0x29eba90)” for an object, unless overloading gets in the way. – ref returns the base type, unless the reference is blessed, in which case it returns the class. ● blessed, refaddr, & reftype are consistent.
  6. Blessed is the Object ● blessed returns a class or undef. ● This simplifies sanity checks: blessed $_[0] or die "Non-object..."; ● Construction with objects for types: bless $x, blessed $proto || $proto; avoids classes like “ARRAY(0xab1234)”. ● Check for blessed before “can” to avoid errors: blessed $x && $x->can( $method ) or die ...
  7. Blessed Structures ● ref does not return the base type of a blessed ref. ● reftype returns the data type, regardless of blessing. ● Works nicely with switches:
        given( reftype $thing )   # blessed or not, same reftype
        {
            when( undef    ) { die "Not a reference: $thing" }
            when( 'ARRAY'  ) { ... }
            when( 'HASH'   ) { ... }
            when( 'SCALAR' ) { ... }
            die "Un-usable data type: $_";
        }
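The difference between ref, blessed, and reftype is easiest to see side by side. The following is a minimal sketch, not code from the talk: the class name Foobar and the describe helper are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );

# Hypothetical data: one plain array ref, one blessed into 'Foobar'.
my $plain  = [ 1, 2, 3 ];
my $object = bless [ 1, 2, 3 ], 'Foobar';

# ref changes its answer once the reference is blessed.
my $ref_plain  = ref $plain;
my $ref_object = ref $object;

# reftype and blessed each answer exactly one question.
sub describe
{
    my ( $thing ) = @_;

    my $type  = reftype( $thing );   # underlying data type or undef
    my $class = blessed( $thing );   # class name or undef

    return 'not a reference' unless defined $type;

    return defined $class
        ? "$class object stored in $type"
        : "plain $type";
}
```

Because describe never looks at ref, it behaves the same for plain references, objects, and non-references.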
  8. Blessed Matches ● Smart-matching an object requires an overload. ● Developers would like to QA their modules to validate that the overload is available. ● A generic test is simple: blessed scalars that can( '(~~' ) are usable. ● Writing this test with only ref is a pain. ● With Scalar::Util it is blessedly simple: blessed $var && $var->can( '(~~' ) or die ...
  9. The guts of “inside out” classes ● Virtual addresses are unique during execution. ● They make useful keys for associating external data. ● The problem is that stringified refs include too much data: – Plain: ARRAY(0xeaa750) – Blessed: Foo=ARRAY(0xeaa750) – Re-blessed: Bletch=ARRAY(0xeaa750) ● The extra data makes them unusable as keys. ● Parsing the refs to extract the address is too slow.
  10. The key to your guts: refaddr ● refaddr returns only the address portion of a ref, as a plain integer: – The previous values all yield the same number, 0xeaa750. ● Note the lack of package or type. ● This is not affected by [re]blessing the variable. ● This leaves $data{ refaddr $ref } a stable key over the life cycle of a ref or object.
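The stability claim can be checked directly. This is a small sketch under assumed names: the classes Foo and Bletch exist only as labels for bless and have no methods.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

my $ref = [];

my $addr_before  = refaddr $ref;
my $plain_string = "$ref";        # stringified plain reference

bless $ref, 'Foo';                # hypothetical class name
my $blessed_string = "$ref";      # stringified object

bless $ref, 'Bletch';             # re-bless into another hypothetical class
my $addr_after = refaddr $ref;

# The stringified forms differ; the address does not.
my $same_address = $addr_before == $addr_after;
my $same_string  = $plain_string eq $blessed_string;
```

Only the refaddr value is safe to use as a hash key across the object's whole life cycle.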
  11.
        use Scalar::Util qw( refaddr );
        use NEXT;

        my %obj2data = ();   # private cache for object data.

        sub set
        {
            my ( $obj, $data ) = @_;
            $obj2data{ refaddr $obj } = $data;
            return
        }

        sub get
        {
            $obj2data{ refaddr $_[0] }
        }

        # have to manually clear out the cache.
        sub DESTROY
        {
            my $obj = shift;
            delete $obj2data{ refaddr $obj };
            $obj->NEXT::DESTROY;
        }
  12. Circular references are not garbage ● In fact, with Perl's reference counting they are normally memory leaks. ● These are any case where a variable keeps alive some extra reference to itself: – Self reference: $a = \$a – Linked list: $a->[0] = [ [], $a, @data ] ● The first is probably a mistake; the second is a properly formed doubly-linked list. ● Both of them prevent $a from ever being released.
  13. Fix: Weak References ● Weak refs do not increment the variable's reference count. ● In this case the back-link does not prevent cleaning $a: @$a = ( [], $a, @data ); weaken $a->[1]; ● The stored reference is weakened in place, because copying a weak ref produces a strong one. ● A weak ref becomes undef once its referent is released. ● isweak returns true for weak refs.
  14. Aside: Accidentally getting strong ● Copies are strong references unless they are explicitly weakened. ● This can leave you accidentally keeping items alive with things like: @a = grep { defined } @a; This leaves @a with strong references that have to be explicitly weakened again. ● See Scalar::Util's POD for dealing with this.
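A short sketch of both points: weakening a back-link in place, and a copy silently becoming strong again. The parent/child hash layout is an assumption made for illustration; it is not the linked list from the slides.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

# Hypothetical parent/child structure with a back-link.
my $node   = { name => 'child' };
my $parent = { name => 'parent', kids => [ $node ] };

$node->{parent} = $parent;     # strong back-link: this is a cycle
weaken $node->{parent};        # weaken the stored reference itself

my $weak_in_place = isweak $node->{parent};

# Copying the weak reference yields an ordinary strong one.
my $copy         = $node->{parent};
my $copy_is_weak = isweak $copy;

# Drop every strong reference to the parent...
undef $copy;
undef $parent;

# ...and the weak back-link is cleared automatically.
my $backlink_cleared = ! defined $node->{parent};
```

Had $copy been left alone, the parent would have stayed alive even after $parent was cleared, which is the accidental strengthening described above.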
  15. Knowing Your Numbers ● We've all seen code that checks for numeric values with a regex like /^\d+$/. ● Aside from being slow, this simply does not work: it rejects signs, decimals, and exponents. Exercise: Come up with a working regex that gracefully handles all of Perl's numeric types including int, float, exponents, hex, and octal along with optional whitespace. ● Better yet, let Perl figure it out for you: if( looks_like_number $x ) { … }
  16. Switching on numerics ● Switches with looks_like_number help parsing and make the logic more readable: if( looks_like_number $_ ) { … } elsif( /$regex/ ) { … } # deal with text
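  17. Sorting and Sanity Checks
        use Scalar::Util qw( looks_like_number );
        use List::Util   qw( min minstr );

        sub generic_minimum
        {
            looks_like_number $_[0]
            ? min @_
            : minstr @_
        }

        sub numeric_input
        {
            # get_user_input is defined elsewhere.
            my $numstr = get_user_input();
            looks_like_number $numstr
            or die "Not a number: $numstr";
            $numstr
        }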
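To make the contrast with the regex concrete, here is a small sketch that runs looks_like_number over a few sample strings and then repeats the generic_minimum idea from the slide. The sample inputs are made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );
use List::Util   qw( min minstr );

# Hypothetical inputs: an integer, a signed float with exponent,
# a hex-looking string, and plain text.
my %is_number;
for my $input ( '42', '-1.5e3', '0x10', 'abc' )
{
    $is_number{ $input } = looks_like_number( $input ) ? 1 : 0;
}

# The naive regex rejects a value Perl happily treats as a number.
my $regex_accepts = '-1.5e3' =~ /^\d+$/ ? 1 : 0;

# Numeric or string minimum, chosen from the first argument.
sub generic_minimum
{
    return looks_like_number( $_[0] ) ? min( @_ ) : minstr( @_ );
}

my $num_min = generic_minimum( 10, 9, 100 );
my $str_min = generic_minimum( qw( pear apple fig ) );
```

Note that a string such as '0x10' is not numeric by Perl's own string-to-number rules, so looks_like_number rejects it; hex and octal strings need hex or oct first.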
  18. Anonymous Prototyping ● set_prototype adjusts the prototype on a subref. – Including anonymous subroutines. – Allows installation of subs that handle block inputs or multiple arrays: think of import subs. ● Another use is removing or modifying misguided prototypes in wrappers that call the original sub. – An example is a prototype of “$$” that prevents calling a wrapped sub with “@_”.
  19. Bi-polar Variables ● dualvar is a fast handler for dealing with multimode string+numeric data. ● Returns the stringy or numeric portion depending on context:
        $a = dualvar( 90, '/var/tmp' );
        print $a if $a > 80;   # prints "/var/tmp"
     or
        sort { $a <=> $b or $a cmp $b } @list;
     ● dualvars are faster than blessed refs with overloads and offer better encapsulation.
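A slightly fuller sketch of the same idea, assuming a made-up list of mount points that carry their percent-used figure as the numeric half of a dualvar:

```perl
use strict;
use warnings;
use Scalar::Util qw( dualvar );

# Hypothetical data: numeric side is percent used, string side is the path.
my @mounts =
(
    dualvar( 90, '/var/tmp' ),
    dualvar( 35, '/home'    ),
    dualvar( 90, '/opt'     ),
);

# Numeric context filters, string context reports.
my @full = map { "$_" } grep { $_ > 80 } @mounts;

# Fullest first, ties broken by path name.
my @sorted = map { "$_" } sort { $b <=> $a or $a cmp $b } @mounts;
```

The map { "$_" } step only flattens the results to plain strings so they are easy to print or compare; the dualvars themselves can be passed around as-is.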
  20. But wait, there's more!!! ● Obvious sanity checks: ● openhandle returns true for an open filehandle. – Validate stdin for interactive sessions. – Check for [still] live sockets. ● isvstring returns true for vstrings (e.g., “v5.16.0”). ● tainted returns true for tainted values. ● readonly checks for readonly values or variables.
  21. Managing lists ● List::Util provides mostly-obvious functions: sum, max, min, maxstr, minstr, shuffle, first, and reduce. ● max and min compare numbers; maxstr and minstr handle strings. ● shuffle randomizes the order of a list, which is useful for security or simulations. ● first & reduce take a bit more explanation...
  22. First Thing: Why Bother? ● These can all be written in Pure Perl. ● Why bother with Yet Another Module and XS? – Most people think of speed, which is true. – These all have simple, clean interfaces that Just Work. – XS encapsulates the in-work data. – The module provides them in one place, once, with POD. ● So, speed is not the only issue, but it doesn't hurt that these are fast.
  23. Second Things first() ● first looks a lot like grep, with a block and list. ● Unlike grep, first stops after finding the first match. ● It returns the first scalar that leaves the block true, not the block's output! ● Lists don't have to be data: they can be anything.
        my $odd   = first { $_ % 2 } @itemz;
        my $valid = first { /$rx/ } @regexen;
        my $found = first { foo $_ } @inputz;
        my $obj   = first { $_->valid($data) } @objz
        or die "Invalid data...";
  24. first with ~~ for validation ● Ever get sick of running through if-blocks for mutually exclusive switches? ● first with smart matching offers a declarative alternative:
        my @bogus = ( [ qw( fork debug ) ], … );
        ...
        if( my $botched = first { $_ ~~ %argz } @bogus )
        {
            local $" = ' ';
            die "Mutually exclusive: @$botched";
        }
     ● Hash-slicing the arguments array allows comparing invalid values with the same structure.
  25. Working smarter ● first saves overhead by stopping early. ● Returning a scalar simplifies the syntax for assigning a result. ● Depending on your data, first on an array may be faster than exists on a hash key. ● Useful for more than iterating data: – Use a list of regexes to determine what type of data is being processed. – Lists of objects can be iterated to find the correct parser for general input.
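The "list of regexes" idea can be sketched without smart matching. The regexes, labels, and the classify helper below are all hypothetical, invented to show how first returns the matching list entry rather than the block's result.

```perl
use strict;
use warnings;
use List::Util qw( first );

# first returns the element itself, not the value of the block.
my @itemz = ( 2, 4, 7, 9 );
my $odd   = first { $_ % 2 } @itemz;

# Hypothetical table of [ regex, label ] pairs, checked in order.
my @typez =
(
    [ qr/^\d{4}-\d\d-\d\d$/ => 'date'    ],
    [ qr/^-?\d+$/           => 'integer' ],
    [ qr/^\w+$/             => 'word'    ],
);

sub classify
{
    my ( $input ) = @_;

    # Stops at the first pair whose regex matches the input.
    my $found = first { $input =~ $_->[0] } @typez;

    return $found ? $found->[1] : 'unknown';
}
```

Because each entry is an array reference, the result of first is always true when something matched, which sidesteps the false-value problem that any is meant to solve.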
  26. Smart Match ~~ first ● Unlike most Perly boolean operators, smart matching returns true or false, not the argument value that left it true. ● first returns the value that matched: my $found = first { $record ~~ $_ } @filterz; ● $found is the first entry from @filterz that matches the record. ● Filters can be regexen, arrays, hashes, or objects with overloaded ~~ matching valid or unusable data. – Use to check edge-cases in testing data handlers.
  27. Inside-out data for a regex ● Use an inside-out structure to associate arbitrary data or state with the regex. ● Smart matching handles blessed regexen properly: it works equally well with a standard regex or an object.
        my $regex1 = qr{ ... };
        my $regex2 = qr{ ... };
        $inside{ refaddr $regex1 } = [];
        my @filtrz = ( $regex1, $regex2 );
        my $found = first { $input ~~ $_ } @filtrz;
        push @{ $inside{ refaddr $found } }, $input;
  28. Use first to pick handlers ● Say you have records with a variety of fields. ● A set of arrays with the required fields for handlers makes it easy to pick the right one: my @keyz = ( [ qw( ... ) ], [ qw( ... ) ] ); my $found = first { $record ~~ $_ } @keyz or die "Record fails minimum key test"; ● Add a bit of inside-out data and you can dispatch the record and its handler in a few lines of code.
  29. Reducing your workload ● All of the min, max, and sum functions are canned versions of reduce. ● reduce looks like sort, with $a and $b. ● An empty list returns undef; singletons return themselves. ● Otherwise: – $a, $b are aliased to the first two list values. – The block's result is assigned to $a. – $b is cycled through the remaining list values.
  30. Example: min, max, sum, prod
        use List::Util qw( reduce );

        my @list = ( 1 .. 100 );

        my $min = reduce { $a < $b ? $a : $b } @list;
        my $max = reduce { $a > $b ? $a : $b } @list;

        # sum, product roll the value forward:
        my $sum = reduce { $a += $b } @list;
        my $prd = reduce { $a *= $b } @list;

        # sum of x-squared uses a placeholder:
        my $sumx2 = reduce { $a += $b**2 } ( 0, @list );
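The way $a and $b move through the list can be made visible by recording each step. This is a sketch with made-up data; the @trace array is only there to expose what reduce does internally.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

my @list = ( 3, 1, 4, 1, 5 );

# Record each ( $a, $b ) pair as reduce walks the list.
my @trace;
my $sum = reduce { push @trace, "$a+$b"; $a + $b } @list;

# Anything that folds two values into one fits the same shape,
# for example keeping the longest string seen so far.
my $longest = reduce { length( $a ) >= length( $b ) ? $a : $b }
    qw( ox zebra cat );

# Edge cases: empty list gives undef, a singleton gives itself.
my $empty  = reduce { $a + $b } ();
my $single = reduce { $a + $b } ( 42 );
```

The trace shows the running result taking the place of $a on every step while $b advances through the remaining values.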
  31. But wait, there's more more!!! ● List::Util lacks a number of operations that are easy to implement in Pure Perl: – unique – interleave, every nth record, groups of N records. ● Using XS does have advantages, not the least being that none of us has to re-write the same Pure Perl. ● So... we have List::MoreUtils, written by Tassilo von Parseval and later maintained by Adam Kennedy.
  32. Taking laziness to XS ● This module is a kitchen sink of things you've done at least once: any all none notall true false firstidx first_index lastidx last_index insert_after insert_after_string apply indexes after after_incl before before_incl firstval first_value lastval last_value each_array each_arrayref pairwise natatime mesh zip uniq distinct minmax part
  33. Indexes and last items ● first is nice, but to find the last item you need to reverse a list, which is expensive. ● Looking up using indexes with first requires $ary[$_], which also gets expensive. ● last_value, last_index, first_index do what you'd expect [novel idea, what?]. ● before and after are more compact versions of slices using the results of first_index.
  34. If first is false, use any ● first returns a list value, which might be false. ● any() returns true the first time its block is true. ● Solves tests using first failing on a false list value:
        # $x is 0, $y is 1
        @list = ( 0, 1, 2 );
        $x = first { defined $_ } @list;
        $y = any   { defined $_ } @list;
  35. Unique lists ● MoreUtils uniq returns the unique values in their original order (list) or the number of unique values (scalar):
        # 1 2 3 5 4
        my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
        # 5
        my $x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
     ● Using hash keys gives a random order. ● Any Pure Perl approach requires sort or lots of index operations.
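A runnable version of the slide's example, assuming List::MoreUtils is installed from CPAN (it is not a core module). Per the module's documentation, scalar context yields the count of unique values, which for this input happens to be 5.

```perl
use strict;
use warnings;
use List::MoreUtils qw( uniq );

my @input = ( 1, 1, 2, 2, 3, 5, 3, 4 );

# List context: unique values, first-seen order preserved.
my @unique = uniq @input;

# Scalar context: how many unique values there are.
my $count = uniq @input;
```

Order preservation is the practical difference from the keys-of-a-hash idiom, which returns the same values in no particular order.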
  36. 36. Relative locations● insert_after places an item after the first item for which its block passes.● insert_after_string uses a string compare, avoiding the need for a block.● Example: post-insert sentinel values into processed lists.
  37. apply: map Without Side-effects ● One downside to map, sort, & grep is that they alias their block variables. – Updating $_ or $a/$b will alter the inputs. ● apply works like map: it returns the result of a block applied to each element in a list. – The difference is that $_ is copied, not aliased. – The inputs are safe from modification.
  38. Merging Lists ● Pairwise processing of lists uses prototypes to keep the syntax saner:
        @sum_xy = pairwise { $a + $b } @x, @y;
        @x      = pairwise { $a->($b) } @subz, @valz;
     ● Nice for merging key/value pairs, which is what mesh does without a block:
        %y = pairwise { ($a,$b) } @keyz, @valz;
        %y = mesh @keyz, @valz;
     ● Prototypes require arrays; arrayrefs have to use “@$arrayref” syntax.
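The same calls with concrete data, as a sketch that assumes List::MoreUtils is installed from CPAN. The array contents and the host/port keys are made up for illustration.

```perl
use strict;
use warnings;
use List::MoreUtils qw( pairwise mesh );

my @x = ( 1, 2, 3 );
my @y = ( 10, 20, 30 );

# Element-by-element combination of two arrays.
my @sum_xy = pairwise { $a + $b } @x, @y;

# Hypothetical key and value lists merged into a hash.
my @keyz = qw( host port );
my @valz = ( 'localhost', 8080 );
my %conf = mesh @keyz, @valz;

# An array reference has to be dereferenced to satisfy the prototype.
my $more   = [ 100, 200, 300 ];
my @sum_xr = pairwise { $a + $b } @x, @$more;
```

Both functions take real arrays because of their prototypes, so literal lists or function results have to be stored in an array first.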
  39. Iterating Separate Lists ● each_array generates an iterator that cycles through successive values in multiple lists:
        my $each = each_array @a, @b, @c;
        while( my( $a, $b, $c ) = $each->() ) { … }
     ● This avoids having to destroy the lists with shift or the overhead of many index accesses. ● each_arrayref takes arrayref (vs. array) args. ● Limitation of prototypes: can't mix arrays & refs.
  40. Breaking up is easy to do ● Partitioning a list is quite doable in Pure Perl but gets messy when handling arbitrary lists. ● part uses a block to select index entries, returning an array of arrayrefs segregated by the block output:
        # [ 1, 3, 5, 7 ], [ 2, 4, 6, 8 ]
        my $i = 0;
        my @partz = part { $i++ % 2 } ( 1 .. 8 );
     ● Using % 3 generates three lists. ● The block can use regexen (including parsing results), looks_like_number, error levels, whatever.
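A sketch of both uses mentioned above: partitioning by position and partitioning by a test on the value. It assumes List::MoreUtils is installed from CPAN, and the mixed input list is made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util    qw( looks_like_number );
use List::MoreUtils qw( part );

# Alternate elements into two partitions by position.
my $i = 0;
my @partz = part { $i++ % 2 } ( 1 .. 8 );

# Hypothetical mixed input: partition 0 holds numbers, 1 holds text.
my ( $numbers, $words ) =
    part { looks_like_number( $_ ) ? 0 : 1 } qw( 10 apple 2.5 pear );
```

The block's return value is used as the index of the partition, so each result is an array reference holding the elements that produced that index.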
  41. POD is your friend ● Actually, the module authors are: all of these modules are well documented, with good examples. ● Especially for MoreUtils: take the time to run the POD code in a debugger to see what it does.
  42. CPAN & the Power of Perl ● Code on CPAN isn't mouldy just because it's old. – The modules are kept up to date. – The guts of Perl have remained stable enough to keep the XS working. ● This is due to a lot of effort from module owners and Perl hackers.
  43. Summary ● Smart matches did not obviate “first”; they work together. ● Utils work with newer features like smart matching and switches. ● Any time you find yourself hacking indexes, it's probably time to think about these modules. ● POD is your friend: check the modules for examples (and good examples of writing XS). ● Truly lazy wheels are not re-invented.