Our Friends the Utils: A highway traveled by wheels we didn't re-invent.

Scalar::Util, List::Util, and List::MoreUtils provide simpler, cleaner, and faster solutions in XS for scalar introspection and list management than what is available in Pure Perl. This is a short introduction to the utilities and how they work with more recent Perl features like smart matching.


Usage Rights: © All Rights Reserved

Presentation Transcript

    • Our Friends the Utils: A highway traveled by wheels we didn't re-invent.
      Steven Lembark
      Workhorse Computing
      lembark@wrkhors.com
    • Meet the Utils
      ● Scalar::Util & List::Util were first written by the ancient Prophet of Barr (c. 1997).
      ● The modules provide often-requested features that were not worth modifying Perl itself to offer.
      ● Later, List::MoreUtils added features that List::Util does not include.
      ● If the Sound of Perl is an un-bloodied wall, the Utils are a superhighway traveled by truly lazy wheels.
    • Mixing old and new
      ● Several features in v5.10+ overlap Util features.
        – Smart matches are the most obvious, and are usually compared with List::Util::first.
        – The new features are not replacements, but they work well with the modules.
        – Examples here show how to use the modules with smart matching and switches.
      ● What's important to notice is that these modules remain relevant.
    • Scalar::Util
      ● Provides introspection for scalars:
        – Is a filehandle [still] open?
        – The address, type, and class of a variable.
        – Is a value “numeric” according to Perl?
        – Does the variable contain readonly or tainted data?
        – Tools for managing weak references or modifying prototypes.
      ● Handling these in Pure Perl is messy, slow, or error-prone.
    • Dealing with refs & objects
      ● Collectively these replace “ref” or stringified references with a simpler, cleaner interface.
      ● The problem with ref and stringified objects is that they return different data for objects than for “plain” refs.
        – Stringified refs look like “Foobar=ARRAY(0x29eba90)” for an object but “ARRAY(0x29eba90)” for a plain ref, unless overloading gets in the way.
        – ref returns the base type (e.g., “ARRAY”) for a plain ref, but the class name if the reference is blessed.
      ● blessed, refaddr, & reftype are consistent.
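The difference between ref and the Scalar::Util functions is easy to see by blessing and then re-blessing a single array. A minimal sketch; the class names Foo and Bletch are just placeholders borrowed from the slides:

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype refaddr );

my $thing   = [ 1, 2, 3 ];
my $address = refaddr $thing;        # the address as a plain integer

my $ref_plain   = ref $thing;        # 'ARRAY'
my $class_plain = blessed $thing;    # undef: not an object

bless $thing, 'Foo';

my $ref_obj   = ref $thing;          # 'Foo': ref now reports the class
my $class_obj = blessed $thing;      # 'Foo'
my $type_obj  = reftype $thing;      # 'ARRAY', blessed or not

bless $thing, 'Bletch';              # re-blessing ...
my $same_addr = refaddr( $thing ) == $address;    # ... keeps the address

print "$ref_plain $ref_obj $class_obj $type_obj\n";
```

Each Scalar::Util function answers exactly one question, so none of them changes its meaning when the data is blessed.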
    • Blessed is the Object
      ● blessed returns a class or undef.
      ● This simplifies sanity checks:
          blessed $_[0] or die "Non-object...";
      ● Constructors that accept either a class name or an object as the prototype:
          bless $x, blessed $proto || $proto;
        avoid classes like “ARRAY(0xab1234)”.
      ● Check for blessed before “can” to avoid errors:
          blessed $x && $x->can( $method ) or die ...
    • Blessed Structures
      ● ref does not return the base type of a blessed ref.
      ● reftype returns the data type, regardless of blessing.
      ● Works nicely with switches:
          given( reftype $thing )    # blessed or not, same reftype
          {
              when( undef )    { die "Not a reference: $thing" }
              when( 'ARRAY' )  { ... }
              when( 'HASH' )   { ... }
              when( 'SCALAR' ) { ... }
              die "Un-usable data type: $_";
          }
    • Blessed Matches
      ● Smart-matching an object requires overloading ~~.
      ● Developers would like to QA their modules to validate that the overload is available.
      ● A generic test is simple: blessed scalars with a ~~ overload are usable.
      ● Writing this test with only ref is a pain.
      ● With Scalar::Util it is blessedly simple:
          blessed $var && overload::Method( $var, '~~' ) or die ...
    • The guts of “inside out” classes
      ● Virtual addresses are unique among live variables during execution.
      ● That makes them useful keys for associating external data.
      ● The problem is that stringified refs include too much data:
        – Plain:      ARRAY(0xeaa750)
        – Blessed:    Foo=ARRAY(0xeaa750)
        – Re-blessed: Bletch=ARRAY(0xeaa750)
      ● The extra data makes them unusable as keys.
      ● Parsing the refs to extract the address is too slow.
    • The key to your guts: refaddr
      ● refaddr returns only the address portion of a ref, as a plain number:
        – The previous values all return 15378256 (0xeaa750).
      ● Note the lack of package or type.
      ● This is not affected by [re]blessing the variable.
      ● This leaves $data{ refaddr $ref } a stable key over the life cycle of a ref or object.
    • use Scalar::Util qw( refaddr );
          use NEXT;                      # provides $obj->NEXT::DESTROY
          my %obj2data = ();             # private cache for object data.
          sub set
          {
              my ( $obj, $data ) = @_;
              $obj2data{ refaddr $obj } = $data;
              return
          }
          sub get
          {
              $obj2data{ refaddr $_[0] }
          }
          # have to manually clear out the cache.
          sub DESTROY
          {
              my $obj = shift;
              delete $obj2data{ refaddr $obj };
              $obj->NEXT::DESTROY;
          }
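A complete, runnable version of the inside-out cache is sketched below. It assumes a hypothetical class named Tagged whose objects are blessed empty scalars, and it drops NEXT so that the script needs nothing but Scalar::Util. The extra `live` method is only there to show that DESTROY really empties the cache.

```perl
use strict;
use warnings;

package Tagged;                          # hypothetical class for illustration
use Scalar::Util qw( refaddr );

my %obj2data;                            # private: object address => data

sub new  { my $class = shift; return bless \do { my $anon }, $class }
sub set  { my ( $obj, $data ) = @_; $obj2data{ refaddr $obj } = $data; return }
sub get  { return $obj2data{ refaddr $_[0] } }
sub live { return scalar keys %obj2data }     # number of cached entries

# Without this the cache leaks, and a later object that happens to reuse
# the same address would inherit stale data.
sub DESTROY { delete $obj2data{ refaddr $_[0] }; return }

package main;

my $obj = Tagged->new;
$obj->set( 'some data' );
my $got    = $obj->get;                  # 'some data'
my $before = Tagged->live;               # 1
undef $obj;                              # DESTROY removes the entry
my $after  = Tagged->live;               # 0
print "$got $before $after\n";
```

The DESTROY cleanup matters because addresses are only unique among live variables, so Perl can hand the same address to a new object once the old one is gone.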
    • Circular references are not garbage
      ● In fact, with Perl's reference counting they are normally memory leaks.
      ● These are any case where a variable keeps alive some extra reference to itself:
        – Self reference: $a = \$a
        – Linked list: $a->[0] = [ [], $a, @data ]
      ● The first is probably a mistake; the second is a properly formed doubly-linked list.
      ● Both of them prevent $a from ever being released.
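    • Fix: Weak References
      ● Weak refs do not increment the variable's reference count.
      ● Here the weakened back-link does not prevent cleaning up $a:
          @$a = ( [], $a, @data );
          weaken $a->[1];
      ● Weaken the link after it is stored: a weakened copy placed into the array would become strong again.
      ● The weak back-link is set to undef automatically when the array it refers to is freed.
      ● isweak returns true for weak refs.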
    • Aside: Accidentally getting strong
      ● Copies are strong references unless they are explicitly weakened.
      ● This can leave you accidentally keeping items alive with things like:
          @a = grep { defined } @a;
        This leaves @a with strong references that have to be explicitly weakened again.
      ● See Scalar::Util's POD for dealing with this.
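The weak-reference rules above can be checked in a few lines. A minimal sketch that uses a hash pointing back at itself, assuming nothing beyond Scalar::Util's weaken and isweak:

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $node = { name => 'head' };
$node->{self} = $node;                   # circular: the hash keeps itself alive
weaken $node->{self};                    # the back-link no longer counts

my $link_is_weak = isweak $node->{self}; # true

my $copy = $node->{self};                # copying a weak ref ...
my $copy_is_weak = isweak $copy;         # ... gives a strong one (false)
undef $copy;

my $watch = $node;                       # a second weak ref, just to observe
weaken $watch;

undef $node;                             # last strong ref gone: the hash is freed
my $freed = !defined $watch;             # weak refs are reset to undef
```

If `$copy` were left alive, the hash would survive `undef $node`. That is the accidental-strong-reference trap from the aside.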
    • Knowing Your Numbers
      ● We've all seen code that checks for numeric values with a regex like /^\d+$/.
      ● Aside from being slow, this simply does not work: it rejects “-1”, “3.14”, and “1e6”.
        Exercise: Come up with a working regex that gracefully handles all of Perl's numeric formats, including ints, floats, and exponents, along with optional whitespace.
      ● Better yet, let Perl figure it out for you:
          if( looks_like_number $x ) { ... }
      ● looks_like_number follows Perl's own string-to-number rules, so a hex string such as “0x1f” is not numeric, and “017” is decimal 17.
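Running a handful of inputs through both checks shows how much the naive regex misses. A minimal sketch; the input list is arbitrary:

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Candidate inputs; ' 12' has leading whitespace.
my @inputs = ( '42', '-7', '3.14', '1e6', ' 12', '0x1f', 'abc', '' );

my ( @numeric, @digits_only );
for my $str ( @inputs )
{
    push @numeric,     $str if looks_like_number $str;
    push @digits_only, $str if $str =~ /^\d+$/;    # the naive check
}

print "looks_like_number: @numeric\n";
print "naive regex:       @digits_only\n";
```

Only '42' passes the regex. looks_like_number accepts the signed, decimal, exponent, and whitespace-padded forms, and it rejects '0x1f', 'abc', and the empty string.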
    • Switching on numerics
      ● Switches with looks_like_number help parsing and make the logic more readable:
          if( looks_like_number $_ ) { ... }
          elsif( /$regex/ )              # deal with text
          {
              ...
          }
    • Sorting and Sanity Checks
          sub generic_minimum
          {
              looks_like_number $_[0] ? min @_ : minstr @_
          }
          sub numeric_input
          {
              my $numstr = get_user_input();
              looks_like_number $numstr or die "Not a number: $numstr";
              $numstr
          }
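Here is generic_minimum as a standalone script, with the imports it relies on. The parentheses are only there to make the precedence obvious. Only the first argument is inspected, so a mixed list such as (1, 'abc') still goes through min and warns under `use warnings`.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );
use List::Util   qw( min minstr );

# Numeric minimum if the first argument looks numeric, string minimum
# otherwise. Only $_[0] is inspected.
sub generic_minimum
{
    looks_like_number( $_[0] ) ? min( @_ ) : minstr( @_ )
}

my $num = generic_minimum( 10, 9, 100 );           # 9, not the string-wise '10'
my $str = generic_minimum( qw( pear apple fig ) ); # 'apple'
print "$num $str\n";
```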
    • Anonymous Prototyping
      ● set_prototype adjusts the prototype on a subref.
        – This includes anonymous subroutines.
        – It allows installation of subs that handle block inputs or multiple arrays; think of import subs.
      ● Another use is removing or modifying misguided prototypes in wrappers that call them.
        – An example is a prototype of “$$” that prevents calling a wrapped sub with “@_”.
    • Bi-polar Variables
      ● dualvar is a fast handler for dealing with multimode string+numeric data.
      ● It returns the string or numeric portion depending on context:
          $a = dualvar( 90, '/var/tmp' );
          print $a if $a > 80;    # prints “/var/tmp”
        or
          sort { $a <=> $b or $a cmp $b } @list;
      ● dualvars are faster than blessed refs with overloads and offer better encapsulation.
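The two-key sort from the slide is more convincing with real dualvars. A minimal sketch; the mount names and percentages are made-up sample data:

```perl
use strict;
use warnings;
use Scalar::Util qw( dualvar );

# One scalar, two values: 90 as a number, '/var/tmp' as a string.
my $dir = dualvar( 90, '/var/tmp' );
print "$dir\n" if $dir > 80;          # the comparison sees 90, print sees the path

my @mounts = (
    dualvar( 90, '/var/tmp' ),
    dualvar( 40, '/home' ),
    dualvar( 90, '/tmp' ),
);

# Numeric part first; the string part breaks ties.
my @sorted = sort { $a <=> $b or $a cmp $b } @mounts;
print "@sorted\n";                    # /home /tmp /var/tmp
```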
    • But wait, there's more!!!
      ● Obvious sanity checks:
      ● openhandle returns true for an open filehandle.
        – Validate stdin for interactive sessions.
        – Check for [still] live sockets.
      ● isvstring returns true for a vstring (e.g., an unquoted v5.16.0 literal).
      ● tainted returns true for tainted values.
      ● readonly checks for readonly values or variables.
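Three of these checks can be tried without any special setup. tainted is left out because it needs taint mode (-T). A minimal sketch: an in-memory filehandle stands in for a socket or STDIN, and the helper arg_is_readonly is hypothetical, following the pattern in Scalar::Util's own documentation.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle readonly isvstring );

my $buffer = "line one\n";
open my $fh, '<', \$buffer or die "open: $!";   # in-memory filehandle

my $open_before = defined openhandle( $fh );    # true
close $fh;
my $open_after  = defined openhandle( $fh );    # false: closed

# Hypothetical helper: $_[0] aliases the caller's argument.
sub arg_is_readonly { return readonly( $_[0] ) }
my $var = 42;
my $ro_literal  = arg_is_readonly( 42 );        # true: a constant
my $ro_variable = arg_is_readonly( $var );      # false

my $vs_bare   = isvstring( v5.16.0 );           # true
my $vs_quoted = isvstring( 'v5.16.0' );         # false: just a string
```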
    • Managing lists
      ● List::Util provides mostly-obvious functions: sum, max, min, maxstr, minstr, shuffle, first, and reduce.
      ● max and min compare numbers; maxstr and minstr handle strings.
      ● shuffle randomizes the order of a list, which is useful for simulations (it draws on Perl's rand, so it is not a cryptographic shuffle).
      ● first & reduce take a bit more explanation...
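The numeric and string variants can disagree on the same data. A short illustration with an arbitrary list:

```perl
use strict;
use warnings;
use List::Util qw( sum max min maxstr minstr shuffle );

my @nums = ( 10, 9, 100 );

my $sum  = sum @nums;        # 119
my $max  = max @nums;        # 100
my $min  = min @nums;        # 9
my $maxs = maxstr @nums;     # '9', since '9' gt '100' gt '10'
my $mins = minstr @nums;     # '10'

my @deck = shuffle 1 .. 10;  # the same values in random order
print "$sum $max $min $maxs $mins\n";
```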
    • First Thing: Why Bother?
      ● These can all be written in Pure Perl.
      ● Why bother with Yet Another Module and XS?
        – Most people think of speed, which is true.
        – These all have simple, clean interfaces that Just Work.
        – XS encapsulates the in-work data.
        – The module provides them in one place, once, with POD.
      ● So, speed is not the only issue, but it doesn't hurt that these are fast.
    • Second Things first()
      ● first looks a lot like grep, with a block and list.
      ● Unlike grep, first stops after finding the first match.
      ● It returns the first list element for which the block is true, not the block's output!
      ● Lists don't have to be data: they can be anything.
          my $odd   = first { $_ % 2 } @itemz;
          my $valid = first { $input =~ $_ } @regexen;
          my $found = first { foo $_ } @inputz;
          my $obj   = first { $_->valid( $data ) } @objz
              or die "Invalid data...";
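The contrast with grep, and the use of a list of regexes as the "data", look like this in a runnable form. The values and patterns are arbitrary samples:

```perl
use strict;
use warnings;
use List::Util qw( first );

my @itemz = ( 4, 8, 15, 16, 23, 42 );

# first stops at the first odd value and returns the element itself.
my $odd = first { $_ % 2 } @itemz;                 # 15

# grep, by contrast, walks the whole list and returns every match.
my @all_odd = grep { $_ % 2 } @itemz;              # 15, 23

# The list can be regexes: find the first pattern an input matches.
my @regexen = ( qr/^\d+$/, qr/^[a-z]+$/, qr/./ );
my $input   = 'hello';
my $rx      = first { $input =~ $_ } @regexen;     # the qr/^[a-z]+$/ object

print "$odd @all_odd $rx\n";
```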
    • first with ~~ for validation
      ● Ever get sick of running through if-blocks for mutually exclusive switches?
      ● first with smart matching is declarative:
          my @bogus = ( [ qw( fork debug ) ], … );
          ...
          # a group is botched if every switch in it was supplied
          if( my $botched = first { $_ ~~ [ grep { exists $argz{$_} } @$_ ] } @bogus )
          {
              local $" = ' ';
              die "Mutually exclusive: @$botched";
          }
      ● Hash-slicing the arguments array allows comparing invalid values with the same structure.
    • Working smarter
      ● first saves overhead by stopping early.
      ● Returning a scalar simplifies the syntax for assigning a result.
      ● Depending on your data, first on an array may be faster than exists on a hash key.
      ● Useful for more than iterating data:
        – Use a list of regexes to determine what type of data is being processed.
        – Lists of objects can be iterated to find the correct parser for general input.
    • Smart Match ~~ first
      ● Unlike most Perly boolean operators, ~~ returns true or false, not the argument value that left it true.
      ● first returns the value that matched:
          my $found = first { $record ~~ $_ } @filterz;
      ● $found is the first entry from @filterz that matches the record.
      ● Filters can be regexen, arrays, hashes, or objects with overloaded ~~ matching valid or unusable data.
        – Use them to check edge-cases in testing data handlers.
    • Inside-out data for a regex
      ● Use an inside-out structure to associate arbitrary data or state with the regex.
      ● Smart matching handles blessed regexen properly: it works equally well with a standard regex or an object.
          my $regex1 = qr{ ... };
          my $regex2 = qr{ ... };
          $inside{ refaddr $regex1 } = [];
          my @filtrz = ( $regex1, $regex2 );
          my $found  = first { $input ~~ $_ } @filtrz;
          push @{ $inside{ refaddr $found } }, $input;
    • Use first to pick handlers
      ● Say you have records with a variety of fields.
      ● A set of arrays with the required fields for handlers makes it easy to pick the right one:
          my @keyz  = ( [ qw( ... ) ], [ qw( ... ) ] );
          my $found = first { $record ~~ $_ } @keyz
              or die "Record fails minimum key test";
      ● Add a bit of inside-out data and you can dispatch the record and its handler in a few lines of code.
    • Reducing your workload
      ● All of the min, max, and sum functions are canned versions of reduce.
      ● reduce looks like sort, with $a and $b.
      ● An empty list returns undef; a single-element list returns that element.
      ● Otherwise:
        – $a and $b are set to the first two list values.
        – The block's result is assigned to $a.
        – $b is cycled through the remaining list values.
    • Example: min, max, sum, prod
          my @list = ( 1 .. 100 );
          my $min = reduce { $a < $b ? $a : $b } @list;
          my $max = reduce { $a > $b ? $a : $b } @list;
          # sum, product roll the value forward:
          my $sum = reduce { $a += $b } @list;
          my $prd = reduce { $a *= $b } @list;
          # sum of x-squared uses a placeholder:
          my $sumx2 = reduce { $a += $b**2 } ( 0, @list );
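The reduce rules (empty list, single element, then $a accumulating while $b walks the list) can be confirmed by logging each step. A minimal sketch:

```perl
use strict;
use warnings;
use List::Util qw( reduce );

my $empty  = reduce { $a + $b } ();          # undef
my $single = reduce { $a + $b } ( 7 );       # 7: the block never runs

# Record each step to show $a carrying the running result.
my @steps;
my $sum = reduce { push @steps, "$a+$b"; $a + $b } ( 1, 2, 3, 4 );
print "sum=$sum steps=@steps\n";             # sum=10 steps=1+2 3+3 6+4
```

Because the block's value becomes the next $a, a plain `$a + $b` is enough. The placeholder 0 in the sum-of-squares example is still needed, because otherwise the first element would never be squared.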
    • But wait, there's more more!!!
      ● List::Util lacks a number of operations that are easy to implement in Pure Perl:
        – unique
        – interleave, every nth record, groups of N records.
      ● Using XS does have advantages, not least that none of us has to re-write the same Pure Perl.
      ● So... we have List::MoreUtils, written by Tassilo von Parseval, maintained by Adam Kennedy.
    • Taking laziness to XS
      ● This module is a kitchen sink of things you've done at least once:
          any all none notall true false
          firstidx first_index lastidx last_index
          insert_after insert_after_string apply indexes
          after after_incl before before_incl
          firstval first_value lastval last_value
          each_array each_arrayref pairwise natatime
          mesh zip uniq distinct minmax part
    • Indexes and last items
      ● first is nice, but to find the last item you need to reverse a list, which is expensive.
      ● Looking up indexes with first requires $ary[$_], which also gets expensive.
      ● lastval, last_index, and first_index do what you'd expect [novel idea, what?].
      ● before and after are more compact versions of slices using the results of first_index.
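For comparison, this is the Pure Perl workaround the slide alludes to: run List::Util's first over the indexes, reverse them to get the last match, and slice around the result. It is only a sketch of what first_index, last_index, before, and after save you from writing:

```perl
use strict;
use warnings;
use List::Util qw( first );

my @ary = qw( a b c b a );

# Search the indexes, not the values, then read $ary[$_] in the block.
my $first_idx = first { $ary[$_] eq 'b' } 0 .. $#ary;           # 1
my $last_idx  = first { $ary[$_] eq 'b' } reverse 0 .. $#ary;   # 3

# Slices around the first match, roughly what after/before return.
my @after  = @ary[ $first_idx + 1 .. $#ary ];                    # c b a
my @before = @ary[ 0 .. $first_idx - 1 ];                        # a
```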
    • If first is false, use any
      ● first returns a list value, which might be false.
      ● any() returns true the first time its block is true.
      ● This solves tests using first failing on a false list value:
          # $x is 0, $y is 1
          @list = ( 0, 1, 2 );
          $x = first { defined $_ } @list;
          $y = any   { defined $_ } @list;
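The failure mode only shows up when the result of first is used as a boolean. The sketch below takes any from List::Util, since recent versions of List::Util export it too. With List::MoreUtils, swap the import.

```perl
use strict;
use warnings;
use List::Util qw( first any );

my @list = ( 0, 1, 2 );

my $x = first { defined $_ } @list;    # 0: found, but false
my $y = any   { defined $_ } @list;    # true

# Using first as a test confuses "found 0" with "found nothing".
my $via_first = ( first { defined $_ } @list ) ? 'found' : 'not found';
my $via_any   = ( any   { defined $_ } @list ) ? 'found' : 'not found';
print "$via_first / $via_any\n";       # not found / found
```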
    • Unique lists
      ● MoreUtils' uniq returns the unique values in their original order (list context) or the number of unique values (scalar context):
          # 1 2 3 5 4
          my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
          # 5
          my $x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
      ● Taking the keys of a hash gives a random order.
      ● A Pure Perl approach that preserves order needs a %seen hash alongside the list, or lots of index operations.
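For reference, the usual order-preserving Pure Perl version is a %seen hash inside grep. A sketch with a hypothetical helper named uniq_pp. Because grep returns a count in scalar context, it even mirrors uniq's scalar-context behavior:

```perl
use strict;
use warnings;

# Hypothetical helper: order-preserving unique values in Pure Perl.
sub uniq_pp
{
    my %seen;
    return grep { ! $seen{$_}++ } @_;    # keep the first sighting of each value
}

my @x = uniq_pp( 1, 1, 2, 2, 3, 5, 3, 4 );   # 1 2 3 5 4
my $n = uniq_pp( 1, 1, 2, 2, 3, 5, 3, 4 );   # 5: grep counts in scalar context
print "@x / $n\n";
```

Unlike the XS version, this stringifies every value for the hash key, so undef warns and becomes ''.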
    • Relative locations
      ● insert_after places an item after the first item for which its block returns true.
      ● insert_after_string uses a string compare, avoiding the need for a block.
      ● Example: post-insert sentinel values into processed lists.
    • apply: map Without Side-effects
      ● One downside to map, sort, & grep is that they alias their block variables.
        – Updating $_ or $a/$b will alter the inputs.
      ● apply works like map: it runs a block on each element in a list.
        – The difference is that $_ is a copy, not an alias.
        – apply returns those copies as modified by the block, not the block's return value.
        – The inputs are safe from modification.
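The behavior described for apply can be imitated in Pure Perl, which makes the contrast with map concrete. A minimal sketch; apply_pp is a hypothetical name, not the List::MoreUtils implementation:

```perl
use strict;
use warnings;

# Hypothetical Pure Perl imitation of apply: run the block on a *copy*
# of each element and return the copies.
sub apply_pp (&@)
{
    my $code   = shift;
    my @copies = @_;                 # copy, so the caller's data is untouched
    $code->() for @copies;           # for aliases $_ to each copy
    return @copies;
}

my @words = ( ' alpha ', ' beta ' );

my @trimmed = apply_pp { s/^\s+|\s+$//g } @words;   # 'alpha', 'beta'

# The same substitution in map edits its input in place and returns
# the s/// results (substitution counts), not the edited strings.
my @orig   = @words;
my @counts = map { s/^\s+|\s+$//g } @orig;          # 2, 2
```

After this runs, @words still holds the padded strings, while @orig has been trimmed in place by map.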
    • Merging Lists
      ● Pairwise processing of lists uses prototypes to keep the syntax saner:
          @sum_xy = pairwise { $a + $b } @x, @y;
          @x      = pairwise { $a->( $b ) } @subz, @valz;
      ● Nice for merging key/value pairs, which is what mesh does without a block:
          %y = pairwise { ( $a, $b ) } @keyz, @valz;
          %y = mesh @keyz, @valz;
      ● Prototypes require arrays; arrayrefs have to use “@$arrayref” syntax.
    • Iterating Separate Lists
      ● each_array generates an iterator that cycles through successive values in multiple lists:
          my $each = each_array @a, @b, @c;
          while( my ( $x, $y, $z ) = $each->() ) { ... }
      ● This avoids having to destroy the lists with shift or the overhead of many index accesses.
      ● each_arrayref takes arrayref (vs. array) args.
      ● A limitation of prototypes: you can't mix arrays & refs.
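The iterator idea behind each_array is a small closure. A Pure Perl sketch under stated assumptions: each_array_pp is a hypothetical name, it takes array refs (like each_arrayref), and shorter arrays yield undef until the longest one runs out.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Hypothetical each_array-style iterator: returns one element from each
# array per call, and an empty list once the longest array is exhausted.
sub each_array_pp
{
    my @arrays = @_;
    my $len    = max( map { scalar @$_ } @arrays );
    my $i      = 0;
    return sub {
        return if $i >= $len;        # empty list ends the while loop
        my $idx = $i++;
        return map { $_->[$idx] } @arrays;
    };
}

my @nums    = ( 1, 2, 3 );
my @letters = qw( x y z );
my @tens    = ( 10, 20 );            # shorter: yields undef at the end

my $each = each_array_pp( \@nums, \@letters, \@tens );
my @rows;
while ( my ( $n, $letter, $ten ) = $each->() )
{
    push @rows, join ':', $n, $letter, $ten // '-';
}
print "@rows\n";                     # 1:x:10 2:y:20 3:z:-
```

The loop works because a list assignment in boolean context yields the number of values returned, which is zero only when the iterator returns nothing at all.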
    • Breaking up is easy to do
      ● Partitioning a list is quite doable in Pure Perl but gets messy when handling arbitrary lists.
      ● part uses a block to choose which sublist each element goes into, returning a list of array refs indexed by the block's output:
          # [ 1, 3, 5, 7 ], [ 2, 4, 6, 8 ]
          my $i = 0;
          my @partz = part { $i++ % 2 } ( 1 .. 8 );
      ● Using % 3 generates three lists.
      ● The block can use regexen (including parsing results), looks_like_number, error levels, whatever.
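The core of part fits in a few lines of Pure Perl. The sketch below is a simplified imitation under a hypothetical name, part_pp, and does not reproduce List::MoreUtils' edge-case handling.

```perl
use strict;
use warnings;

# Hypothetical imitation of part: the block returns an index, and each
# element is pushed onto the sublist at that index.
sub part_pp (&@)
{
    my ( $code, @list ) = @_;
    my @parts;
    for ( @list )                     # $_ is visible to the block
    {
        my $idx = $code->();
        push @{ $parts[$idx] }, $_;
    }
    return @parts;                    # array refs; unused slots stay undef
}

my $i = 0;
my @by_position = part_pp { $i++ % 2 } ( 1 .. 8 );   # [1,3,5,7], [2,4,6,8]
my @by_mod3     = part_pp { $_ % 3 }   ( 1 .. 8 );   # [3,6], [1,4,7], [2,5,8]
```

Sublists are indexed by the block's return value, not by order of appearance, so the first sublist produced by `% 3` holds the multiples of 3.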
    • POD is your friend
      ● Actually, the module authors are: all of these modules are well documented, with good examples.
      ● Especially for MoreUtils: take the time to run the POD code in a debugger to see what it does.
    • CPAN & the Power of Perl
      ● Code on CPAN isn't mouldy just because it's old.
        – The modules are kept up to date.
        – The guts of Perl have remained stable enough to keep the XS working.
      ● This is due to a lot of effort from module owners and Perl hackers.
    • Summary
      ● Smart matches did not obviate “first”; they work together.
      ● The Utils work with newer features like smart matching and switches.
      ● Any time you find yourself hacking indexes, it's probably time to think about these modules.
      ● POD is your friend: check the modules for examples (and good examples of writing XS).
      ● Truly lazy wheels are not re-invented.