Our Friends the Utils: A highway traveled by wheels we didn't re-invent.
Scalar::Util, List::Util, and List::MoreUtils provide simpler, cleaner, and faster solutions in XS for scalar introspection and list management than what is available in Pure Perl. This is a short introduction to the utilities and how they work with more recent Perl features like smart matching.
  1. Our Friends the Utils: A highway traveled by wheels we didn't re-invent.
     Steven Lembark, Workhorse Computing, lembark@wrkhors.com
  2. Meet the Utils
     ● Scalar::Util & List::Util were first written by the ancient Prophet of Barr (c. 1997).
     ● The modules provide often-requested features that were not worth modifying Perl itself to offer.
     ● Later, List::MoreUtils added features that List::Util does not include.
     ● If the Sound of Perl is an un-bloodied wall, the Utils are a superhighway traveled by truly lazy wheels.
  3. Mixing old and new
     ● Several features in v5.10+ overlap Util features.
       – Smart matches are the most obvious, and are usually compared with List::Util::first.
       – New features are not replacements, but work well with the modules.
       – Examples here show how to use the modules with smart matching and switches.
     ● What's important to notice is that these modules remain relevant.
  4. Scalar::Util
     ● Provides introspection for scalars:
       – Is a filehandle [still] open?
       – The address, type, and class of a variable.
       – Is a value “numeric” according to Perl?
       – Does the variable contain readonly or tainted data?
       – Tools for managing weak references or modifying prototypes.
     ● Handling these in Pure Perl is messy, slow, or error-prone.
  5. Dealing with refs & objects
     ● Collectively these replace “ref” or stringified references with a simpler, cleaner interface.
     ● The problem with ref and stringified objects is that they return different data for objects or “plain” refs.
       – Stringified refs are “Foobar=ARRAY(0x29eba90)”, unless overloading gets in the way.
       – ref returns the base type, unless the reference is blessed, in which case it returns the class.
     ● blessed, refaddr, & reftype are consistent.
  6. Blessed is the Object
     ● blessed returns a class or undef.
     ● This simplifies sanity checks:
         blessed $_[0] or die 'Non-object...';
     ● Construction with objects for types:
         bless $x, blessed $proto || $proto;
       avoids classes like “ARRAY(0xab1234)”.
     ● Check for blessed before “can” to avoid errors:
         blessed $x && $x->can( $method ) or die ...
  7. Blessed Structures
     ● ref does not return the base type of a blessed ref.
     ● reftype returns the data type, regardless of blessing.
     ● Works nicely with switches:
         given( reftype $thing )    # blessed or not, same reftype
         {
             when( undef    ) { die "Not a reference: $thing" }
             when( 'ARRAY'  ) { ... }
             when( 'HASH'   ) { ... }
             when( 'SCALAR' ) { ... }
             die "Un-usable data type: $_";
         }
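A minimal sketch of the difference, assuming a made-up class name Foo: ref changes its answer once the array is blessed, while reftype and blessed each answer one question the same way every time.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );

my $plain  = [ 1, 2, 3 ];
my $object = bless [ 1, 2, 3 ], 'Foo';    # 'Foo' is a hypothetical class

# ref() reports the type for a plain ref but the class for an object...
my $ref_plain  = ref $plain;
my $ref_object = ref $object;

# ...reftype() reports the underlying structure either way...
my $type_plain  = reftype $plain;
my $type_object = reftype $object;

# ...and blessed() reports the class, or undef for a non-object.
my $class_plain  = blessed $plain;
my $class_object = blessed $object;

printf "plain : ref=%s reftype=%s blessed=%s\n",
    $ref_plain, $type_plain, $class_plain // 'undef';
printf "object: ref=%s reftype=%s blessed=%s\n",
    $ref_object, $type_object, $class_object // 'undef';
```

The same reftype value for both variables is what lets a single switch handle blessed and unblessed data.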
  8. Blessed Matches
     ● Smart-matching an object requires an overload.
     ● Developers would like to QA their modules to validate that the overload is available.
     ● A generic test is simple: blessed scalars that can( '(~~' ) are usable.
     ● Writing this test with only ref is a pain.
     ● With Scalar::Util it is blessedly simple:
         blessed $var && $var->can( '(~~' ) or die ...
  9. The guts of “inside out” classes
     ● Virtual addresses are unique during execution.
     ● They make useful keys for associating external data.
     ● Problem is that stringified refs include too much data:
       – Plain     : ARRAY(0xeaa750)
       – Blessed   : Foo=ARRAY(0xeaa750)
       – Re-blessed: Bletch=ARRAY(0xeaa750)
     ● The extra data makes them unusable as keys.
     ● Parsing the refs to extract the address is too slow.
  10. The key to your guts: refaddr
     ● refaddr returns only the address portion of a ref:
       – Previous values all reduce to the same address, 0xeaa750, returned as a plain integer.
     ● Note the lack of package or type.
     ● This is not affected by [re]blessing the variable.
     ● This leaves $data{ refaddr $ref } stable over the life cycle of a ref or object.
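A short sketch of why the address works as a key, using the class names Foo and Bletch from the previous slide purely as placeholders: the stringified ref changes with every bless, the refaddr value does not.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

my $ref = [];

my $addr_plain = refaddr $ref;
my $text_plain = "$ref";            # ARRAY(0x...)

bless $ref, 'Foo';                  # placeholder class
my $addr_foo = refaddr $ref;

bless $ref, 'Bletch';               # re-blessed into another placeholder
my $addr_bletch = refaddr $ref;
my $text_bletch = "$ref";           # Bletch=ARRAY(0x...)

print "stringified: $text_plain -> $text_bletch\n";
print "refaddr    : $addr_plain -> $addr_bletch\n";
```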
  11. use Scalar::Util qw( refaddr );
      use NEXT;    # supplies NEXT::DESTROY for the re-dispatch below

      my %obj2data = ();    # private cache for object data.

      sub set
      {
          my ( $obj, $data ) = @_;
          $obj2data{ refaddr $obj } = $data;
          return
      }

      sub get
      {
          $obj2data{ refaddr $_[0] }
      }

      # have to manually clear out the cache.
      sub DESTROY
      {
          my $obj = shift;
          delete $obj2data{ refaddr $obj };
          $obj->NEXT::DESTROY;
      }
  12. Circular references are not garbage
     ● In fact, with Perl's reference counting they are normally memory leaks.
     ● These are any case where a variable keeps alive some extra reference to itself:
       – Self reference: $a = \$a
       – Linked list: $a->[0] = [ [], $a, @data ]
     ● The first is probably a mistake, the second is a properly formed doubly-linked list.
     ● Both of them prevent $a from ever being released.
  13. Fix: Weak References
     ● Weak refs do not increment the variable's reference count.
     ● In this case the backlink in $a->[1] does not prevent cleaning $a:
         @$a = ( [], $a, @data );
         weaken $a->[1];
       The stored element itself is weakened: copying an already-weak ref into the array would create a new strong one.
     ● A weak ref becomes undef once its referent has been released.
     ● isweak returns true for weak refs.
  14. Aside: Accidentally getting strong
     ● Copies are strong references unless they are explicitly weakened.
     ● This can leave you accidentally keeping items alive with things like:
         @a = grep { defined } @a;
       This leaves @a with strong references that have to be explicitly weakened again.
     ● See the Scalar::Util POD for dealing with this.
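The last three slides can be sketched in one script. Node is a hypothetical class that exists only to count DESTROY calls; the counts show the strong cycle leaking, the weakened cycle being released, and a copy of a weak ref coming back strong.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $destroyed = 0;
{
    package Node;                     # hypothetical class for the demo
    sub DESTROY { $destroyed++ }
}

# Strong cycle: the node holds a counted reference to itself, so
# leaving the scope does not release it.
{
    my $node = bless [], 'Node';
    $node->[0] = $node;
}
my $after_strong = $destroyed;

# Weak cycle: the stored back-reference no longer counts, so the
# node is released when the scope ends.
my ( $stored_is_weak, $copy_is_weak );
{
    my $node = bless [], 'Node';
    $node->[0] = $node;
    weaken $node->[0];
    $stored_is_weak = isweak $node->[0];

    my $copy = $node->[0];            # plain copy of a weak ref
    $copy_is_weak = isweak $copy;     # false: the copy is strong
}
my $after_weak = $destroyed;

print "released after strong cycle: $after_strong\n";
print "released after weak cycle  : $after_weak\n";
```

The first node is only cleaned up at global destruction, which is the leak the slides describe.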
  15. Knowing Your Numbers
     ● We've all seen code that checks for numeric values with a regex like /^\d+$/.
     ● Aside from being slow, this simply does not work.
       Exercise: Come up with a working regex that gracefully handles all of Perl's numeric types including int, float, exponents, hex, and octal along with optional whitespace.
     ● Better yet, let Perl figure it out for you:
         if( looks_like_number $x ) { … }
  16. Switching on numerics
     ● Switches with looks_like_number help parsing and make the logic more readable:
         if( looks_like_number $_ )
         {
             …
         }
         elsif( $_ =~ $regex )
         {
             # deal with text ...
         }
  17. Sorting and Sanity Checks
         sub generic_minimum
         {
             looks_like_number $_[0]
             ? min    @_
             : minstr @_
         }

         sub numeric_input
         {
             my $numstr = get_user_input;
             looks_like_number $numstr
             or die "Not a number: $numstr";
             $numstr
         }
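To make the regex comparison concrete, here is a small sketch that runs a handful of sample strings (chosen for illustration) through both checks. Note that looks_like_number judges strings the way Perl converts them at run time, so hex notation, which is source-code syntax only, is not accepted.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Sample inputs picked for illustration.
my @samples = ( '42', '-3.14', '1e6', '12abc', '0x1F', '' );

my ( %by_regex, %by_util );
for my $value ( @samples )
{
    $by_regex{ $value } = $value =~ /^\d+$/           ? 1 : 0;
    $by_util{  $value } = looks_like_number( $value ) ? 1 : 0;
}

printf "%-8s regex=%d looks_like_number=%d\n",
    "'$_'", $by_regex{ $_ }, $by_util{ $_ }
    for @samples;
```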
  18. Anonymous Prototyping
     ● set_prototype adjusts the prototype on a subref.
       – Including anonymous subroutines.
       – Allows installation of subs that handle block inputs or multiple arrays – think of import subs.
     ● Another use is removing or modifying misguided prototypes in wrappers that call them.
       – Example is a prototype of “$$” that prevents calling a wrapped sub with “@_”.
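A minimal sketch of both uses, with compare_pair and $blocky as invented names: the built-in prototype function reads the prototype back before and after set_prototype changes it.

```perl
use strict;
use warnings;
use Scalar::Util qw( set_prototype );

# Hypothetical sub whose '$$' prototype gets in the way of wrappers.
sub compare_pair ($$) { return $_[0] <=> $_[1] }

my $before = prototype( \&compare_pair );    # '$$'
set_prototype( \&compare_pair, undef );      # undef removes the prototype
my $after  = prototype( \&compare_pair );    # now undef

# set_prototype returns the code reference, so an anonymous sub can be
# given a block-and-list prototype as it is created.
my $blocky     = set_prototype( sub { return scalar @_ }, '&@' );
my $anon_proto = prototype( $blocky );

print "before: $before\n";
print "after : ", defined $after ? $after : 'undef', "\n";
print "anon  : $anon_proto\n";
```

Prototypes are applied when a call is compiled, so the change only affects calls compiled after it, which is why this kind of adjustment usually lives in import subs.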
  19. Bi-polar Variables
     ● dualvar is a fast handler for dealing with multimode string+numeric data.
     ● Returns the stringy or numeric portion depending on context:
         $a = dualvar ( 90, '/var/tmp' );
         print $a if $a > 80;    # prints “/var/tmp”
       or
         sort { $a <=> $b or $a cmp $b } @list;
     ● dualvars are faster than blessed refs with overloads and offer better encapsulation.
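A runnable version of the slide's idea, assuming a made-up list of directories tagged with a percent-full number: numeric comparisons see the number, string operations see the path.

```perl
use strict;
use warnings;
use Scalar::Util qw( dualvar );

# Hypothetical data: percent-full as the number, path as the string.
my @dirs =
(
    dualvar( 90, '/var/tmp' ),
    dualvar( 45, '/home'    ),
    dualvar( 90, '/opt'     ),
);

# Numeric context filters, string context reports.
my @nearly_full = map { "$_" } grep { $_ > 80 } @dirs;

# Sort by number first, then by name to break ties.
my @sorted = map { "$_" } sort { $a <=> $b or $a cmp $b } @dirs;

print "nearly full: @nearly_full\n";
print "sorted     : @sorted\n";
```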
  20. But wait, there's more!!!
     ● Obvious sanity checks:
     ● openhandle returns true for an open filehandle.
       – validate stdin for interactive sessions.
       – check for [still] live sockets.
     ● isvstring returns true for vstrings (e.g., “v5.16.0”).
     ● tainted returns true for tainted values.
     ● readonly checks for readonly values or variables.
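Two of these checks are easy to try without any setup. The sketch below uses an in-memory filehandle (an assumption made to keep the example self-contained) for openhandle and a v-string literal for isvstring.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle isvstring );

# An in-memory handle stands in for a file or socket.
my $buffer = "line one\n";
open my $fh, '<', \$buffer or die "Cannot open in-memory handle: $!";

my $while_open = defined( openhandle( $fh ) ) ? 1 : 0;
close $fh;
my $after_close  = defined( openhandle( $fh ) ) ? 1 : 0;
my $plain_string = defined( openhandle( 'not a handle' ) ) ? 1 : 0;

my $version      = v5.16.0;
my $from_literal = isvstring( $version )  ? 1 : 0;
my $from_string  = isvstring( '5.16.0' )  ? 1 : 0;

print "open=$while_open closed=$after_close string=$plain_string\n";
print "vstring=$from_literal plain=$from_string\n";
```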
  21. Managing lists
     ● List::Util provides mostly-obvious functions: sum, max, min, maxstr, minstr, shuffle, first, and reduce.
     ● max and min compare numbers, maxstr and minstr handle strings.
     ● shuffle randomizes the order of a list – useful for security or simulations.
     ● first & reduce take a bit more explanation...
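A quick sketch of the obvious ones on a small sample list; the same three values give different answers depending on whether they are compared as numbers or as strings.

```perl
use strict;
use warnings;
use List::Util qw( sum max min maxstr minstr shuffle );

my @values = ( 10, 9, 100 );

my $total    = sum    @values;
my $largest  = max    @values;    # numeric comparison
my $smallest = min    @values;
my $last_str = maxstr @values;    # string comparison: '9' sorts last
my $first_str = minstr @values;   # string comparison: '10' sorts first

# shuffle returns the same items in a random order.
my @mixed = shuffle @values;

print "sum=$total max=$largest min=$smallest\n";
print "maxstr=$last_str minstr=$first_str\n";
print "shuffled: @mixed\n";
```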
  22. First Thing: Why Bother?
     ● These can all be written in Pure Perl.
     ● Why bother with Yet Another Module and XS?
       – Most people think of speed, which is true.
       – These all have simple, clean interfaces that Just Work.
       – XS encapsulates the in-work data.
       – Module provides them in one place, once, with POD.
     ● So, speed is not the only issue – but it doesn't hurt that these are fast.
  23. Second Things first()
     ● first looks a lot like grep, with a block and list.
     ● Unlike grep, first stops after finding the first match.
     ● It returns the first scalar that leaves the block true – not the block's output!
     ● Lists don't have to be data: they can be anything.
         my $odd   = first { $_ % 2      } @itemz;
         my $valid = first { $data =~ $_ } @regexen;
         my $found = first { foo $_      } @inputz;
         my $obj   = first { $_->valid($data) } @objz
         or die "Invalid data...";
  24. first with ~~ for validation
     ● Ever get sick of running through if-blocks for mutually exclusive switches?
     ● first with smart matching is declarative:
         my @bogus = ( [ qw( fork debug ) ], … );
         ...
         if( my $botched = first { $_ ~~ %argz } @bogus )
         {
             local $" = ', ';
             die "Mutually exclusive: @$botched";
         }
     ● Hash-slicing the arguments array allows comparing invalid values with the same structure.
  25. Working smarter
     ● first saves overhead by stopping early.
     ● Returning a scalar simplifies the syntax for assigning a result.
     ● Depending on your data, first on an array may be faster than exists on a hash key.
     ● Useful for more than iterating data:
       – Use a list of regexes to determine what type of data is being processed.
       – Lists of objects can be iterated to find the correct parser for general input.
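One way the "list of regexes" idea might look, as a sketch: the patterns, labels, and the classify helper are all invented for illustration, and the counter is there only to show first stopping at the first hit.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Hypothetical table of [ pattern, label ] pairs, most specific first.
my @parsers =
(
    [ qr/^\d{4}-\d{2}-\d{2}$/ => 'date'    ],
    [ qr/^-?\d+$/             => 'integer' ],
    [ qr/^\S+\@\S+$/          => 'email'   ],
);

my $checked = 0;    # how many patterns were actually tried

sub classify
{
    my $input = shift;

    # first returns the matching table entry, not the block's result.
    my $hit = first { $checked++; $input =~ $_->[0] } @parsers;

    return $hit ? $hit->[1] : 'unknown';
}

my $type_date = classify( '2024-01-31' );
my $tried     = $checked;                  # stopped after one pattern

my $type_int   = classify( '-42' );
my $type_other = classify( 'hello world' );

print "$type_date ($tried tried), $type_int, $type_other\n";
```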
  26. Smart Match ~~ first
     ● Unlike most Perly boolean operators, smart match returns true or false, not the argument value that left it true.
     ● first returns the value that matched:
         my $found = first { $record ~~ $_ } @filterz;
     ● $found is the first entry from @filterz that matches the record.
     ● Filters can be regexen, arrays, hashes, or objects with overloaded ~~ matching valid or unusable data.
       – Use to check edge-cases in testing data handlers.
  27. Inside-out data for a regex
     ● Use an inside-out structure to associate arbitrary data or state with the regex.
     ● Smart matching handles blessed regexen properly: works equally well with std regex or object.
         my $regex1 = qr{ ... };
         my $regex2 = qr{ ... };
         $inside{ refaddr $regex1 } = [];
         my @filtrz = ( $regex1, $regex2 );
         my $found  = first { $input ~~ $_ } @filtrz;
         push @{ $inside{ refaddr $found } }, $input;
  28. Use first to pick handlers
     ● Say you have records with a variety of fields.
     ● A set of arrays with the required fields for handlers makes it easy to pick the right one:
         my @keyz  = ( [ qw( ... ) ], [ qw( ... ) ] );
         my $found = first { $record ~~ $_ } @keyz
         or die 'Record fails minimum key test';
     ● Add a bit of inside-out data and you can dispatch the record and its handler in a few lines of code.
  29. Reducing your workload
     ● All of the min, max, and sum functions are canned versions of reduce.
     ● reduce looks like sort, with $a and $b.
     ● An empty list returns undef, singletons return themselves.
     ● Otherwise:
       – $a, $b are aliased to the first two list values.
       – The block's result is assigned to $a.
       – $b is cycled through the remaining list values.
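The steps above can be spelled out as a plain loop. This is only a Pure Perl sketch of the described behavior under the invented name reduce_sketch, not how List::Util implements it; it is compared against the real reduce to check that the two agree.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

# Hypothetical helper: the slide's steps written out by hand.
sub reduce_sketch(&@)
{
    my ( $block, @list ) = @_;

    return undef unless @list;      # empty list: undef

    my $result = shift @list;       # singleton: falls through unchanged
    for my $next ( @list )
    {
        # $a carries the running result, $b the next list value.
        local ( $a, $b ) = ( $result, $next );
        $result = $block->();
    }

    return $result;
}

my @list = ( 1 .. 10 );

my $sum_real   = reduce        { $a + $b } @list;
my $sum_sketch = reduce_sketch { $a + $b } @list;

my $max_sketch = reduce_sketch { $a > $b ? $a : $b } @list;

print "sum: $sum_real vs $sum_sketch, max: $max_sketch\n";
```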
  30. Example: min, max, sum, prod
         use List::Util qw( reduce );

         my @list = ( 1 .. 100 );

         my $min = reduce { $a < $b ? $a : $b } @list;
         my $max = reduce { $a > $b ? $a : $b } @list;

         # sum, product roll the value forward:
         my $sum = reduce { $a += $b } @list;
         my $prd = reduce { $a *= $b } @list;

         # sum of x-squared uses a placeholder:
         my $sumx2 = reduce { $a += $b**2 } ( 0, @list );
  31. But wait, there's more more!!!
     ● List::Util lacks a number of operations that are easy to implement in Pure Perl:
       – unique
       – interleave, every nth record, groups of N records.
     ● Using XS does have advantages, not the least being that none of us has to re-write the same Pure Perl.
     ● So... we have List::MoreUtils, written by Tassilo von Parseval, maintained by Adam Kennedy.
  32. Taking laziness to XS
     ● This module is a kitchen sink of things you've done at least once:
         any all none notall true false
         firstidx first_index lastidx last_index
         insert_after insert_after_string
         apply indexes
         after after_incl before before_incl
         firstval first_value lastval last_value
         each_array each_arrayref
         pairwise natatime mesh zip
         uniq distinct minmax part
  33. Indexes and last items
     ● first is nice, but to find the last item you need to reverse a list, which is expensive.
     ● Looking up using indexes with first requires $ary[$_], which also gets expensive.
     ● last_value, last_index, first_index do what you'd expect [novel idea, what?].
     ● before and after are more compact versions of slices using the results of first_index.
  34. If first is false, use any
     ● first returns a list value, which might be false.
     ● any() returns true the first time its block is true.
     ● Solves tests using first failing on a false list value:
         # $x is 0, $y is 1
         @list = ( 0, 1, 2 );
         $x = first { defined $_ } @list;
         $y = any   { defined $_ } @list;
  35. Unique lists
     ● MoreUtils uniq returns a list in its original order (list) or the count of unique values (scalar):
         # 1 2 3 5 4
         my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
         # 5
         my $x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
     ● Using hash keys gives a random order.
     ● Any Pure Perl approach requires sort or lots of index operations.
  36. Relative locations
     ● insert_after places an item after the first item for which its block passes.
     ● insert_after_string uses a string compare, avoiding the need for a block.
     ● Example: post-insert sentinel values into processed lists.
  37. apply: map Without Side-effects
     ● One downside to map, sort, & grep is that they alias their block variables.
       – Updating $_ or $a/$b will alter the inputs.
     ● apply works like map: extracting the result of a block applied to each element in a list.
       – The difference is that $_ is copied, not aliased.
       – The inputs are safe from modification.
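The aliasing problem is easy to reproduce with core Perl alone. In this sketch (sample strings invented for the demo) the first map trims its input list as a side effect; the second makes the copy by hand, which is the step the slide says apply takes care of.

```perl
use strict;
use warnings;

# map aliases $_ to each input element, so s/// edits the originals.
my @raw     = ( ' alpha ', ' beta ' );
my @trimmed = map { s/^\s+|\s+$//g; $_ } @raw;

# Copying $_ first leaves the inputs alone.
my @kept = ( ' alpha ', ' beta ' );
my @safe = map { my $copy = $_; $copy =~ s/^\s+|\s+$//g; $copy } @kept;

print "aliased inputs now: [", join( '][', @raw  ), "]\n";
print "copied inputs now : [", join( '][', @kept ), "]\n";
```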
  38. Merging Lists
     ● Pairwise processing of lists uses prototypes to keep the syntax saner:
         @sum_xy = pairwise { $a + $b    } @x, @y;
         @x      = pairwise { $a->( $b ) } @subz, @valz;
     ● Nice for merging key/value pairs, which is what mesh does without a block:
         %y = pairwise { ( $a, $b ) } @keyz, @valz;
         %y = mesh @keyz, @valz;
     ● Prototypes require arrays; arrayrefs have to use “@$arrayref” syntax.
  39. Iterating Separate Lists
     ● each_array generates an iterator that cycles through successive values in multiple lists:
         my $each = each_array @a, @b, @c;
         while( my( $a, $b, $c ) = $each->() ) { … }
     ● This avoids having to destroy the lists with shift or the overhead of many index accesses.
     ● each_arrayref takes arrayref (vs. array) args.
     ● Limitation of prototypes: can't mix arrays & refs.
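To show what such an iterator does, here is a Pure Perl closure under the invented name each_array_sketch. It takes array references (so it resembles each_arrayref more than each_array) and is meant as an illustration of the idea rather than the module's code.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Hypothetical stand-in: returns a closure that hands back one "row"
# (the Nth item of every array) per call, then an empty list.
sub each_array_sketch
{
    my @arrays = @_;
    my $index  = 0;
    my $last   = max( -1, map { $#{ $_ } } @arrays );

    return sub
    {
        return if $index > $last;

        my @row = map { $_->[ $index ] } @arrays;
        $index++;

        return @row;
    };
}

my @numbers = ( 1, 2, 3 );
my @letters = qw( a b c );

my $each = each_array_sketch( \@numbers, \@letters );

my @rows;
while( my ( $number, $letter ) = $each->() )
{
    push @rows, "$number$letter";
}

print "@rows\n";
```

The source arrays are left untouched, which is the advantage over looping with shift.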
  40. Breaking up is easy to do
     ● Partitioning a list is quite doable in Pure Perl but gets messy when handling arbitrary lists.
     ● part uses a block to select index entries, returning an array[ref] segregated by the block output:
         # [ 1, 3, 5, 7 ], [ 2, 4, 6, 8 ]
         my @partz = part { $i ++ % 2 } ( 1 .. 8 );
     ● Using % 3 generates three lists.
     ● Block can use regexen (including parsing results), looks_like_number, error levels, whatever.
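For comparison, a Pure Perl sketch of the same partitioning under the invented name part_sketch. It is an assumption about the general shape of such a function, written to show what the block's return value is used for, and is not the module's implementation.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical stand-in: the block's result picks which sub-list
# receives the current item.
sub part_sketch(&@)
{
    my ( $block, @list ) = @_;

    my @parts;
    for ( @list )    # $_ is what the block sees
    {
        my $index = $block->();
        push @{ $parts[ $index ] }, $_;
    }

    return @parts;
}

# Alternate items, as on the slide.
my $i = 0;
my @partz = part_sketch { $i++ % 2 } ( 1 .. 8 );

# Segregate by content instead of position.
my @by_type = part_sketch { looks_like_number( $_ ) ? 0 : 1 } qw( 1 two 3 four );

print "odd slots : @{ $partz[0] }\n";
print "even slots: @{ $partz[1] }\n";
print "numbers   : @{ $by_type[0] }\n";
print "words     : @{ $by_type[1] }\n";
```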
  41. POD is your friend
     ● Actually, the module authors are: All of these modules are well documented, with good examples.
     ● Especially for MoreUtils: Take the time to run the POD code in a debugger to see what it does.
  42. CPAN & the Power of Perl
     ● Code on CPAN isn't mouldy just because it's old.
       – The modules are kept up to date.
       – The guts of Perl have remained stable enough to keep the XS working.
     ● This is due to a lot of effort from module owners and Perl hackers.
  43. Summary
     ● Smart matches did not obviate “first”, they work together.
     ● Utils work with newer features like smart matching and switches.
     ● Any time you find yourself hacking indexes, it's probably time to think about these modules.
     ● POD is your friend – check the modules for examples (and good examples of writing XS).
     ● Truly lazy wheels are not re-invented.