Perl best practices v4

So, you don't have time to read Damian Conway's "Perl Best Practices" book, to understand his "256 guidelines on the art of coding to help you write better Perl code"? Hear Randal Schwartz provide the executive summary, including pointing out where Randal disagrees with Damian, and why. This high-speed overview will help you understand "code layout, naming conventions, choice of data and control structures, program decomposition, interface design and implementation, modularity, object orientation, error handling, testing, and debugging." But using shorter words.

Published in: Technology

  1. 1. Perl Best Practices Randal L. Schwartz merlyn@stonehenge.com version 4.3 on 11 Jun 2017 This document is copyright 2006,2007,2008, 2017 by Randal L. Schwartz, Stonehenge Consulting Services, Inc.
  2. 2. Introduction • This talk organized according to “Perl Best Practices” • With my own twist along the way • Perhaps I should call this “second Best Practices”?
  3. 3. Chapter 2. Code Layout • There’s more than one way to indent it • This isn’t Python • Blessing and a curse • You’re writing for two readers: • The compiler (and it doesn’t care) • Your maintenance programmer • Best to ensure the maintenance programmer doesn’t hunt you down with intent to do serious bodily harm • You are the maintenance programmer three months later
  4. 4. Section 2.1. Bracketing • Indent like K&R • Paren pairs at end of opening line, and beginning of close:
 my @items = (
 12,
 19,
 ); • Similar for control flow:
 if (condition) {
 true branch;
 } else {
 false branch;
 }
  5. 5. Section 2.2. Keywords • Put a space between keyword and following punctuation:
 if (...) { ... } else { ... }
 while (...) { ... } • Not “if(...)” • It’s easier to pick out the keyword from the rest of the code that way • Keywords are important!
  6. 6. Section 2.3. Subroutines and Variables • Keep subroutine names cuddled with args:
 foo(3, 5) # not foo (3, 5) • Keep variable indexing cuddled with variables:
 $x[3] # not $x [3] • If you didn’t know you could do that, forget that you can!
  7. 7. Section 2.4. Builtins • Avoid unnecessary parens for built-ins • Good:
 print $x; • Bad:
 print($x); • But don’t avoid the necessary parens:
 warn("something is wrong"), next if $x > 3; • Without the parens, that would have executed “next” too early
  8. 8. Section 2.5. Keys and Indices • Separate complex keys from enclosing brackets:
 $x[ complex() * expression() ] • But don’t use this for trivial keys:
 $x[3] • The extra whitespace helps the reader find the brackets
  9. 9. Section 2.6. Operators • Add whitespace around binary operators if it helps to see the operands:
 print $one_value + $two_value[3];
 print 3+4; # wouldn’t help much here • Don’t add whitespace after a prefix unary operator:
 $x = !$y; # no space after !
  10. 10. Section 2.7. Semicolons • Put a semicolon after every statement • Yeah, even the ones at the end of a block:
 if ($condition) {
 state1;
 state2;
 state3; # even here
 } • Very likely, this code will be edited • Occasionally, a missing semicolon is not a syntax error • Exception: blocks on a single line:
 if ($foo) { this } else { that }
  11. 11. Section 2.8. Commas • Include comma after every list element:
 my @days = (
 'Sunday',
 'Monday',
 ); • Easier to add new items • Don’t need two different behaviors • Easier to swap items
  12. 12. Section 2.9. Line Lengths • Damian says “use 78 character lines” • I’d say keep it even shorter • Long lines are hard to read • Long lines might wrap if sent in email • Long lines are harder to copypasta for reuse
  13. 13. Section 2.10. Indentation • Damian says “use 4-column indentation levels” • I use Emacs cperl-mode, and use whatever it does • Yeah, there’s probably some configuration to adjust it • Better still, write your code where the indentation is clear
  14. 14. Section 2.11. Tabs • Indent with spaces, not tabs • Tabs expand to different widths per user • If you must use tabs, ensure all tabs are at the beginning of line • You can generally tell your text editor to insert spaces • For indentation • When you hit the tab key • And you should do that!
  15. 15. Section 2.12. Blocks • Damian says “Never place two statements on the same line” • Except when it’s handy • Statements are a logical chunk • Whitespace break helps people • Also helps cut-n-paste • Also keeps the lines from getting too long (see earlier)
  16. 16. Section 2.13. Chunking • Code in commented paragraphs • Whitespace between chunks is helpful • Chunk according to logical steps • Interpreting @_ separated from the next step in a subroutine • Add leading comments in front of the paragraph for a logical-step description • Don’t use comments for user docs, use POD
  17. 17. Section 2.14. Elses • Damian says “Don’t cuddle an else” • Cuddled:
 } else { • Uncuddled:
 }
 else { • I disagree. I like cuddled elses • They save a line • I recognize “} else {“ as “switching from true to false” • Think of it as the “batwing else”
  18. 18. Section 2.15. Vertical Alignment • Align corresponding items vertically • (Hard to show in a variable-pitch font) • Something like:
 $foo         = $this + $that;
 $bartholomew = $the  + $other; • Yeah, that’s pretty • For me, it’d be more time fiddling in the editor than coding • Again, your choice is yours
  19. 19. Section 2.16. Breaking Long Lines • Break long expressions before an operator:
 print $offset
 + $rate * $time
 + $fudge_factor
 ; • The leading operator is “out of place” enough • Leads the reader to know this is a continuation • Allows for a nice comment on each line too • Lineup should be with the piece in the first line • Again, this seems like a lot of work in an editor • Emacs cperl-mode can line up on an open paren
  20. 20. Section 2.17. Non-Terminal Expressions • If it simplifies things, name a subexpression • Instead of:
 my $result = $foo + (very complicated thing) + $bar; • Use
 my $meaningful_name = very complicated thing;
 my $result = $foo + $meaningful_name + $bar; • Don’t use a meaningless name • This also helps in debugging • Set a breakpoint after the intermediate computation • Put a watchpoint on that value • There’s no winner in the “eliminate variables” contest
  21. 21. Section 2.18. Breaking by Precedence • Break the expression at lowest precedence • To break up: $offset + $rate * $time • Good:
 $offset
 + $rate * $time • Bad:
 $offset + $rate
 * $time • Keep same level of precedence at same visual level
  22. 22. Section 2.19. Assignments • Break long assignments before the operator • To fix:
 $new_offset = $offset + $rate * $time; • Bad:
 $new_offset = $offset
 + $rate * $time; • Good:
 $new_offset
 = $offset
 + $rate * $time; • Often I put the assignment on the previous line • Damian would disagree with me on that
  23. 23. Section 2.20. Ternaries • Use ?: in columns
 my $result
 = $test1 ? $test1_true
 : $test2 ? $test2_true
 : $test3 ? $test3_true
 : $else_value; • Of course, Damian also lines up the question marks • Conclusion: Damian must have a lot of time on his hands
  24. 24. Section 2.21. Lists • Parenthesize long lists • Helps to show beginning and ending of related things • Also tells Emacs how to group and indent it:
 print (
 $one,
 $two,
 $three,
 ); • Nice setting: (cperl-indent-parens-as-block t)
  25. 25. Section 2.22. Automated Layout • Use a consistent layout • Pick a style, and stick with it • Use perltidy to enforce it • But don’t waste time reformatting everything • Life’s too short to get into fights on this
  26. 26. Chapter 3. Naming Conventions • Syntactic consistency • Related things look similar • Semantic consistency • Names of things reflect their purpose
  27. 27. Section 3.1. Identifiers • Use grammar-based identifiers • Packages: Noun ::Adjective ::Adjective • Disk::DVD::Rewritable • Variables: adjective_adjective_noun • $estimated_net_worth • Lookup vars: adjective_noun_preposition • %blocks_of, @sales_for • Subs: imperative_adjective_noun[_prep] • get_next_cookie, eat_cake • get_cookie_of_type, eat_cake_using
  28. 28. Section 3.2. Booleans • Name booleans after their test • Scalars:
 $has_potential
 $has_been_checked
 $is_internal
 $loading_is_finished • Subroutines:
 is_excessive
 has_child_record • Often starts with is or has
  29. 29. Section 3.3. Reference Variables • Damian says “mark variables holding a ref to end in _ref”:
 my $items_ref = \@items; • I generally do this, but just use “ref”:
 my $itemsref = \@items; • Rationale: ordinary scalars and refs both go into scalar vars • Need some way of distinguishing typing information • use strict 'refs' helps here too
  30. 30. Section 3.4. Arrays and Hashes • Give arrays plural names:
 @items
 @records
 @input_files • Give hashes singular names:
 %option
 %highest_of
 %total_for • “Mapping” hashes should end in a connector
  31. 31. Section 3.5. Underscores • Use underscores to separate names • $total_paid instead of $totalpaid • Helps to distinguish items • What was “remembersthelens.com”? • Not “remembers the lens” as I thought • It’s “remember st helens”! • Not CamelCase (except for some package names)
  32. 32. Section 3.6. Capitalization • The more capitalized, the more global it is • All lowercase for local variables:
 $item @queries %result_of • Mixed case for package/class names:
 $File::Finder • Uppercase for constants:
 $MODE $HOURS_PER_DAY • Exception: uppercase acronyms and proper names:
 CGI::Prototype
  33. 33. Section 3.7. Abbreviations • Abbreviate identifiers by prefix • $desc rather than $dscn • $len rather than $lngth • But $ctrl rather than $con, since it’s familiar • $msg instead of $mess • $tty instead of $ter
  34. 34. Section 3.8. Ambiguous Abbreviations • Don’t abbreviate beyond meaning • Is $term_val • TerminationValid • TerminalValue • Is $temp • Temperature • Temporary • Context and comments are useful • con & com r u
  35. 35. Section 3.9. Ambiguous Names • Avoid ambiguous names • last (final or previous) • set (adjust or collection) • left (direction, what remains) • right (direction, correct, entitlement) • no (negative,“number”) • record (verb or noun) • second (time or position) • close (nearby or shut) • use (active usage, or category of function)
  36. 36. Section 3.10. Utility Subroutines • Use underscore prefix as “internal use only” • Obviously, this won’t stop anyone determined to do it • But it’s a clue that it’s maybe not a good idea • External interface:
 sub you_call_this {
 ... _internal_tweaking(@foo) ...;
 }
 sub _internal_tweaking { ... } • Historical note: broken in Perl4 • Made the subroutine a member of ALL packages
  37. 37. Chapter 4. Values and Expressions • How literals look • How expressions operate
  38. 38. Section 4.1. String Delimiters • Damian says “Use double quotes when you’re interpolating” • Example:“hello $person” • But ‘hello world’ (note single quotes) • In contrast, I always use double quotes • I’m lazy • I often edit code later to add something that needs it • There’s no efficiency difference • Perl compiles “foo” and ‘foo’ identically
  39. 39. Section 4.2. Empty Strings • Use q{} for the empty string • The rest of this page intentionally left blank • And empty
  40. 40. Section 4.3. Single-Character Strings • Likewise, use q{X} for the single character X. • Again, I can disagree with this • But it helps these stand out:
 my $result = join(q{,},
 'this',
 'that',
 ); • That first comma deserves a bit of special treatment.
  41. 41. Section 4.4. Escaped Characters • Use named escapes instead of magical constants • Bad: "\x7f\x06\x18Z" • Good:
 use charnames qw{:full};
 "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z" • For some meaning of “good” • Damian claims that is “self documenting” • Yeah, right
  42. 42. Section 4.5. Constants • Create constants with the Readonly module • Don’t use “constant”, even though it’s core • use constant PI => 3; • You can’t interpolate these easily: • print "In Indiana, pi is @{[PI]}\n"; • Instead, get Readonly from the CPAN: • Readonly my $PI => 3; • Now you can interpolate: • print "In Indiana, pi is $PI\n"; • Or consider Const::Fast or Attribute::Constant • 20 to 30 times faster than Readonly • neilb.org/reviews/constants.html for more
  43. 43. Section 4.6. Leading Zeros • Don’t pad decimal numbers with leading 0’s • They become octal • 001, 002, 004, 008 • Only barfs on 008! • Damian says “don’t use leading 0’s ever” • Use oct() for octal values: oct(600) instead of 0600 • Yeah, right • Could confuse some readers • But only those who haven’t read the Llama
  44. 44. Section 4.7. Long Numbers • Use underscores in long numbers • Instead of 123456, write 123_456 • What? You didn’t know? • Yeah, underscores in a number are ignored • 12_34_56 is also the same • As is 1________23456 • Works in hex/oct too:
 0xFF_FF_00_FF
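
A quick sketch of the underscore rule, using core Perl only. The variable names are made up for illustration. Note the catch: underscores are ignored in numeric literals in source code, not in strings that get converted to numbers at runtime.

```perl
use strict;
use warnings;

# Underscores inside numeric literals are ignored by the compiler.
my $plain   = 123456;
my $grouped = 123_456;
my $odd     = 12_34_56;
my $colour  = 0xFF_FF_00_FF;

print "all the same\n" if $plain == $grouped && $grouped == $odd;
printf "0x%X\n", $colour;

# A string is different: numeric conversion stops at the underscore.
my $from_string = do { no warnings 'numeric'; "123_456" + 0 };
print "from string: $from_string\n";   # 123, not 123456
```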
  45. 45. Section 4.8. Multiline Strings • Lay out multiline strings over multiple lines • Sure, you can do this:
 my $message = "You idiot!
 You should have enabled the frobulator!
 "; • Use this instead:
 my $message
 = "You idiot!\n"
 . "You should have enabled the frobulator!\n"
 ; • This text can be autoindented more nicely
  46. 46. Section 4.9. Here Documents • Use a here-doc for more than two lines • Such as:
 my $usage = <<"END_USAGE";
 1) insert plug
 2) turn on power
 3) wait 60 seconds for warm up
 4) pull plug if sparks appear during warm up
 END_USAGE
  47. 47. Section 4.10. Heredoc Indentation • Use a “theredoc” when a heredoc would have been indented:
 use Readonly;
 Readonly my $USAGE => <<"END_USAGE";
 1) insert plug
 2) turn on power
 END_USAGE • Now you can use it in any indented code
 if ($usage_error) {
 print $USAGE;
 }
  48. 48. Section 4.11. Heredoc Terminators • Make heredocs end the same • Suggestion:“END_” followed by the type of thing • END_MESSAGE • END_SQL • END_OF_WORLD_AS_WE_KNOW_IT • All caps to stand out
  49. 49. Section 4.12. Heredoc Quoters • Always use single or double quotes around terminator • Makes it clear what kind of interpolation is being used • <<"END_FOO" gets double-quote interpolation • <<'END_FOO' gets single-quote • Default is actually double-quote • Most people don’t know that
  50. 50. Section 4.13. Barewords • Grrr • Don’t use barewords • You can’t use them under strict anyway • Instead of: @months = (Jan, Feb, Mar); • Use: @months = qw(Jan Feb Mar);
  51. 51. Section 4.14. Fat Commas • Reserve => for pairs • As in:
 my %score = (
 win => 30,
 tie => 15,
 loss => 10,
 ); • Don’t use it to replace comma • Bad: rename $this => $that • What if $this is “FOO”: use constant FOO => ‘File’ • rename FOO => $that would be a literal “FOO” instead
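
Here is a small sketch of why the fat comma bites when the left side is a constant. It uses only the core constant pragma; the array names are hypothetical, and a two-element list stands in for the arguments to rename.

```perl
use strict;
use warnings;
use constant FOO => 'File';

# The fat comma quotes a bareword on its left, so FOO is NOT evaluated here.
my @with_fat_comma = (FOO => 'target');

# A plain comma leaves FOO alone, so the constant is evaluated.
my @with_comma = (FOO, 'target');

print "$with_fat_comma[0]\n";   # the literal string "FOO"
print "$with_comma[0]\n";       # the constant's value, "File"
```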
  52. 52. Section 4.15. Thin Commas • Don’t use commas in place of semicolons as sequence • Bad:
 ($a = 3), print "$a\n"; • Good:
 $a = 3; print "$a\n"; • If a sequence is desired where an expression must be... • Use a do-block:
 do { $a = 3; print "$a\n" };
  53. 53. Section 4.16. Low-Precedence Operators • Don’t mix and/or/not with && || ! in the same expression • Too much potential confusion: not $finished || $result • Not the same as !$finished || $result • Reserve and/or/not for outer control flow: print $foo if not $this; unlink $foo or die;
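
To see the precedence trap concretely, here is a minimal sketch with values picked so the two spellings disagree. The variable names are illustrative only.

```perl
use strict;
use warnings;

my ($finished, $result) = (0, 1);

# "not" binds looser than ||, so this is not($finished || $result).
my $low  = (not $finished || $result);

# "!" binds tighter than ||, so this is (!$finished) || $result.
my $high = (!$finished || $result);

print "low-precedence form:  ", ($low  ? "true" : "false"), "\n";   # false
print "high-precedence form: ", ($high ? "true" : "false"), "\n";   # true
```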
  54. 54. Section 4.17. Lists • Parenthesize every raw list • Bad: @x = 3, 5, 7; (broken!) • Good: @x = (3, 5, 7);
  55. 55. Section 4.18. List Membership • Consider any() from List::MoreUtils:
 if (any { $requested_slot == $_ } @allocated_slots) { ... } • Stops checking when a true value has been seen • However, for “eq” test, use a predefined hash:
 my @ACTIONS = qw(open close read write);
 my %ACTIONS = map { $_ => 1 } @ACTIONS;
 ....
 if ($ACTIONS{$first_word}) { ... } # seen an action
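
Both membership tests in one runnable sketch. One assumption to flag: this takes any() from List::Util (version 1.33 or later, which ships with recent Perls) rather than List::MoreUtils as on the slide, so it runs without a CPAN install. The slot numbers and the %IS_ACTION name are made up.

```perl
use strict;
use warnings;
use List::Util qw(any);   # assumption: List::Util 1.33+ instead of List::MoreUtils

my @allocated_slots = (3, 7, 11);
my $requested_slot  = 7;

# Numeric test: any() stops at the first true result.
my $slot_taken = any { $requested_slot == $_ } @allocated_slots;
print "slot $requested_slot is taken\n" if $slot_taken;

# String test: build the lookup hash once, then test by key.
my @ACTIONS   = qw(open close read write);
my %IS_ACTION = map { $_ => 1 } @ACTIONS;

for my $first_word (qw(read erase)) {
    my $verdict = $IS_ACTION{$first_word} ? "known" : "unknown";
    print "$first_word: $verdict\n";
}
```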
  56. 56. Chapter 5. Variables • A program without variables would be rather useless • Names are important
  57. 57. Section 5.1. Lexical Variables • Use lexicals, not package variables • Lexical access is guaranteed to be in the same file • Unless some code passes around a reference to it • Package variables can be accessed anywhere • Including well-intentioned but broken code
  58. 58. Section 5.2. Package Variables • Don’t use package variables • Except when required by law • Example: @ISA • Example: @EXPORT, @EXPORT_OK, etc • Otherwise, use lexicals
  59. 59. Section 5.3. Localization • Always localize a temporarily modified package var • Example:
 {
 local $BYPASS_AUDIT = 1;
 delete_any_traces(@items);
 } • Any exit from the block restores the global • You can’t always predict all exits from the block
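
A sketch of the "can't predict all exits" point: the block below is left by an exception, and local still puts the old value back. The body of delete_any_traces is a hypothetical stand-in that records what it saw and then dies.

```perl
use strict;
use warnings;

our $BYPASS_AUDIT = 0;
my @seen;

# Hypothetical stand-in: records the flag it saw, then fails.
sub delete_any_traces {
    push @seen, $BYPASS_AUDIT;
    die "disk full\n";
}

eval {
    local $BYPASS_AUDIT = 1;
    delete_any_traces();
};

# The exception unwound the block, and local restored the global anyway.
print "during: $seen[0], after: $BYPASS_AUDIT\n";
```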
  60. 60. Section 5.4. Initialization • Initialize any localized var • Without that, you get undef! • Example:
 local $YAML::Indent = $YAML::Indent;
 if ($local_indent) {
 $YAML::Indent = $local_indent;
 } • Weird first assignment creates local-but-same value
  61. 61. Section 5.5. Punctuation Variables • Damian says “use English” • I say “no - use comments” • How is $OUTPUT_FIELD_SEPARATOR better than $, • given that you probably won’t use it anyway • Use it, but comment it:
 {
 local $, = ","; # set output separator
 ...
 }
  62. 62. Section 5.6. Localizing Punctuation Variables • Always localize the punctuation variables • Example: slurping:
 my $file = do { local $/; <HANDLE> }; • Important to restore value at end of block • Items called within the block still see new value
  63. 63. Section 5.7. Match Variables • Don’t “use English” on $& or $` or $' • Always “use English qw(-no_match_vars)” • Otherwise, your program slows down • Consider Regexp::MatchContext • L-value PREMATCH(), MATCH(), POSTMATCH()
  64. 64. Section 5.8. Dollar-Underscore • Localize $_ if you’re using it in a subroutine • Example:
 sub foo {
 ...
 local $_ = "something";
 s/foo/bar/;
 } • Now the replacement doesn’t mess up the outer $_
  65. 65. Section 5.9. Array Indices • Use negative indices where possible • Negative values count from the end • $some_array[-1] same as $some_array[$#some_array] • $foo[-2] same as $foo[$#foo-1] • $foo[-3] same as $foo[$#foo-2] • Sadly, can’t use in range • @foo[1..$#foo] is all-but-first • Cannot replace with @foo[1..-1]
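
A short sketch of negative indices and of why the range trick fails: 1..-1 is simply an empty range, so the slice comes back empty. The array contents are made up.

```perl
use strict;
use warnings;

my @foo = qw(a b c d e);

my $last        = $foo[-1];     # same element as $foo[$#foo]
my $second_last = $foo[-2];     # same element as $foo[$#foo - 1]

my @all_but_first = @foo[1 .. $#foo];
my @oops          = @foo[1 .. -1];    # 1..-1 is an empty range

print "last: $last, second to last: $second_last\n";
print "all but first: @all_but_first\n";
print "1..-1 gives ", scalar(@oops), " elements\n";
```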
  66. 66. Section 5.10. Slicing • Use slices when retrieving and storing related hash/array items • Example:
 @frames[-1, -2, -3]
 = @active{'top', 'prev', 'backup'}; • Note that both array slice and hash slice use @ sigil
  67. 67. Section 5.11. Slice Layout • Line up your slicing • Example from previous slide:
 @frames[   -1,    -2,     -3      ]
 = @active{ 'top', 'prev', 'backup' }; • Yeah, far more typing than I have time for • Damian must have lots of spare time
  68. 68. Section 5.12. Slice Factoring • Match up keys and values for those last examples • For example:
 Readonly my %MAPPING = (
 top => -1,
 prev => -2,
 backup => -3,
 ); • Now use keys/values with that for the assignment:
 @frames[values %MAPPING]
 = @active{keys %MAPPING}; • Doesn’t matter that the result is unsorted. It Just Works.
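
A runnable sketch of the factored slice. It relies on keys and values returning their items in the same order for an unmodified hash, which is why the unsorted result still lines up. A plain `my` hash stands in for the Readonly one so this needs nothing from CPAN; the contents of %active and the size of @frames are made up.

```perl
use strict;
use warnings;

# Plain lexical here; the slide uses Readonly (a CPAN module).
my %MAPPING = (
    top    => -1,
    prev   => -2,
    backup => -3,
);

my %active = (top => 'T', prev => 'P', backup => 'B');   # sample data
my @frames = (undef) x 5;

# keys and values walk the hash in the same order, so the pairs line up.
@frames[values %MAPPING] = @active{keys %MAPPING};

print join(',', map { defined $_ ? $_ : '-' } @frames), "\n";
```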
  69. 69. Chapter 6. Control Structures • More than one way to control it • Some ways better than others
  70. 70. Section 6.1. If Blocks • Use block if, not postfix if. • Bad:
 $sum += $measurement if defined $measurement; • Good:
 if (defined $measurement) {
 $sum += $measurement;
 }
  71. 71. Section 6.2. Postfix Selectors • So when can we use postfix if? • For flow of control! • last FOO if $some_condition; • die “horribly” unless $things_worked_out; • next, last, redo, return, goto, die, croak, throw
  72. 72. Section 6.3. Other Postfix Modifiers • Don’t use postfix unless/for/while/until • Says Damian • If the controlled expression is simple enough, go ahead • Good:
 print "ho ho $_" for @some_list; • Bad:
 defined $_ and print "ho ho $_"
 for @some_list; • Replace that with:
 for (@some_list) {
 print "ho ho $_" if defined $_;
 }
  73. 73. Section 6.4. Negative Control Statements • Don’t use unless or until at all • Replace unless with if-not • Replace until with while-not • Says Damian • My exclusion: Don’t use “unless .. else” • unless(not $foo and $bar) { ... } else { ... } • Arrrgh! • And remember your DeMorgan’s laws • Replace not(this or that) with not this and not that • Replace not(this and that) with not this or not that • Often, that will simplify things
  74. 74. Section 6.5. C-Style Loops • Avoid C-style for loops • Says Damian • I say they’re ok for this: for (my $start = 0; $start <= 100; $start += 3) { ... }
  75. 75. Section 6.6. Unnecessary Subscripting • Avoid subscripting arrays and hashes in a loop • Compare:
 for my $n (0..$#items) { print $items[$n] } • With:
 for my $item (@items) { print $item } • And:
 for my $key (keys %mapper) { print $mapper{$key} } • With:
 for my $value (values %mapper) { print $value } • Can’t replace like this if you need the index or key
  76. 76. Section 6.7. Necessary Subscripting • Use Data::Alias or Lexical::Alias • Don’t subscript more than once in a loop:
 for my $k (sort keys %names) {
 if (is_bad($names{$k})) {
 print "$names{$k} is bad"; ... • Instead, alias the value:
 for my $k (sort keys %names) {
 alias my $name = $names{$k}; • The $name is read/write! • Requires Data::Alias or Lexical::Alias
  77. 77. Section 6.8. Iterator Variables • Always name your foreach loop variable • Unless $_ is more natural • for my $item (@items) { ... } • Example of $_: s/foo/bar/ for @items;
  78. 78. Section 6.9. Non-Lexical Loop Iterators • Always use “my” with foreach • Bad:
 for $foo (@bar) { ... } • Good:
 for my $foo (@bar) { ... }
  79. 79. Section 6.10. List Generation • Use map instead of foreach to transform lists • Bad:
 my @output;
 for (@input) {
 push @output, some_func($_);
 } • Good:
 my @output = map some_func($_), @input; • Concise, faster, easier to read at a glance • Can also stack them easier
  80. 80. Section 6.11. List Selections • Use grep and first to search a list • Bad:
 my @output;
 for (@input) {
 push @output, $_ if some_func($_);
 } • Good:
 my @output = grep some_func($_), @input; • But first:
 use List::Util qw(first);
 my $item = first { $_ > 30 } @input;
  81. 81. Section 6.12. List Transformation • Transform list in place with foreach, not map • Bad:
 @items = map { make_bigger($_) } @items; • Good:
 $_ = make_bigger($_) for @items; • Not completely equivalent • If make_bigger in a list context returns varying items • But generally what you’re intending
  82. 82. Section 6.13. Complex Mappings • If the map block is complex, make it a subroutine • Keeps the code easier to read • Names the code for easier recognition • Permits testing the subroutine separately • Allows reusing the same code for more than one map
  83. 83. Section 6.14. List Processing Side Effects • Modifying $_ in map or grep alters the input! • Don’t do that • Instead, make a copy:
 my @result = map {
 my $copy = $_;
 $copy =~ s/foo/bar/g;
 $copy;
 } @input; • Note that the last expression evaluated is the result • Don’t use return!
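
To make the aliasing visible, here is a sketch that runs the careless form and the copying form side by side. The sample words are made up. The careless version changes its input and returns substitution counts, which is rarely what anyone wanted.

```perl
use strict;
use warnings;

my @input = qw(food fool);

# Careful: copy $_ first, so @input is left alone.
my @result = map {
    my $copy = $_;
    $copy =~ s/foo/bar/g;
    $copy;
} @input;

# Careless: $_ is an alias, so this edits @aliased in place
# and returns the substitution counts instead of the strings.
my @aliased = @input;
my @counts  = map { s/foo/bar/g } @aliased;

print "input:   @input\n";
print "result:  @result\n";
print "aliased: @aliased\n";
print "counts:  @counts\n";
```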
  84. 84. Section 6.15. Multipart Selections • Avoid if-elsif-elsif-else if possible • But what to replace it with? • Many choices follow
  85. 85. Section 6.16. Value Switches • Use table lookup instead of cascaded equality tests • Rather than:
 if ($n eq 'open') { $op = 'o' }
 elsif ($n eq 'close') { $op = 'c' }
 elsif ($n eq 'erase') { $op = 'e' } • Use:
 my %OPS = (open => 'o', close => 'c', erase => 'e');
 ...
 if (my $op = $OPS{$n}) { ... } else { ... } • Table should be initialized once
  86. 86. Section 6.17. Tabular Ternaries • Sometimes, cascaded ?: are the best • Test each condition, return appropriate value • Example:
 my $result
 = condition1 ? result1
 : condition2 ? result2
 : condition3 ? result3
 : fallbackvalue
 ;
  87. 87. Section 6.18. do-while Loops • Don’t. • Do-while loops do not respect last/next/redo • If you need test-last, use a naked block:
 {
 ...
 ...
 redo if SOME CONDITION;
 } • last/next/redo works with this block nicely
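
A minimal sketch of the naked-block idiom as a test-last loop. The counter and the limit of three are invented for the example.

```perl
use strict;
use warnings;

my $tries = 0;

# A bare block is a loop that runs once; redo sends control back to the top.
{
    $tries++;
    print "attempt $tries\n";
    redo if $tries < 3;     # the test comes last, as in do-while
}

print "finished after $tries attempts\n";
```

Because this is a real loop block, `last` and `next` inside it also leave the block cleanly, which is the part do-while can't offer.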
  88. 88. Section 6.19. Linear Coding • Reject as many conditions as early as possible • Isolate disqualifying conditions early in the loop • Use “next” with separate conditions • Example:
 while (<....>) {
 next unless /\S/; # skip blank lines
 next if /^#/; # skip comments
 chomp;
 next if length > 20; # skip long lines
 ...
 }
  89. 89. Section 6.20. Distributed Control • Don’t throw everything into the loop control • “Every loop should have a single exit” is bogus • Exit early, exit often, as shown by previous item
  90. 90. Section 6.21. Redoing • Use redo with foreach to process the same item • Better than a while loop where the item count may or may not be incremented • No, I didn’t really understand the point of this, either
  91. 91. Section 6.22. Loop Labels • Label loops used with last/next/redo • last/next/redo provide a limited “goto” • But it’s still sometimes confusing • Always label your loops for these:
 LINE: while (<>) {
 ...
 next LINE if $cond;
 ...
 } • The labels should be the noun of the thing being processed • Makes “last LINE” read like English!
  92. 92. Chapter 7. Documentation • We don’t need any! • Perl is self-documenting! • Wrong
  93. 93. Section 7.1. Types of Documentation • Be clear about the audience • Is this for users? or maintainers? • User docs should go into POD • Maintainer docs can be in comments • Might also be good to have those in separate PODs
  94. 94. Section 7.2. Boilerplates • Create POD templates • Understand the standard POD • Enhance it with local standards • Don’t keep retyping your name and email • Consider Module::Starter
  95. 95. Section 7.3. Extended Boilerplates • Add customized POD headers: • Examples • Frequently Asked Questions • Common Usage Mistakes • See Also • Disclaimer of Warranty • Acknowledgements
  96. 96. Section 7.4. Location • Use POD for user docs • Keep the user docs near the code • Helps ensure the user docs are in sync
  97. 97. Section 7.5. Contiguity • Keep the POD together in the source file • Says Damian • I like the style of “little code, little pod” • Bit tricky though, since you have to get the right =cut:
 =item FOO

 text about FOO

 =cut

 sub FOO { ... }
  98. 98. Section 7.6. Position • Place POD at the end of the file • Says Damian • Nice though, because you can add __END__ ahead • This is true of the standard h2xs templates • Hard to do if it’s intermingled though • My preference: • mainline code • top of POD page • subroutines intermingled with POD • bottom of POD page • “1;” __END__ and <DATA> (if needed)
  99. 99. Section 7.7. Technical Documentation • Organize the tech docs • Make each section clear as to audience and purpose • Don’t make casual user accidentally read tech docs • Scares them off from using your code, perhaps
  100. 100. Section 7.8. Comments • Use block comments for major comments:
 #########################
 # Called with: $foo (int), $bar (arrayref)
 # Returns: $item (in scalar context), @items (in list) • Teach your editor a template for these
  101. 101. Section 7.9. Algorithmic Documentation • Use full-line comments to explain the algorithm • Prefix any “chunk” of code • Don’t exceed a single line • If longer, refactor the code to a subroutine • Put the long explanation as a block comment
  102. 102. Section 7.10. Elucidating Documentation • Use end-of-line comments to point out weird stuff • Obviously, definition of “weird” might vary • Example:
 local $/ = "\0"; # NUL, not newline
 chomp; # using modified $/ here
  103. 103. Section 7.11. Defensive Documentation • Comment anything that has puzzled or tricked you • Example:
 @options = map +{ $_ => 1 }, @input; # hash not block! • Think of the reader • Think of yourself three months from now • Or at 3am tomorrow
  104. 104. Section 7.12. Indicative Documentation • If you keep adding comments for “tricky” stuff... • ... maybe you should just change your style • That last example can be rewritten:
 @options = map { { $_ => 1 } } @input; • Of course, some might consider nested curlies odd
  105. 105. Section 7.13. Discursive Documentation • Use “invisible POD” for longer technical notes • Start with =for KEYWORD • Include paragraph with no embedded blank lines • End with blank line and =cut • As in:
 =for Clarity:
 Every time we reboot, this code gets called.
 Even if it wasn’t our fault. I mean really!
 Yeah, that’s our story and we’re sticking to it.

 =cut
  106. 106. Section 7.14. Proofreading • Check spelling, syntax, sanity • Include both user and internal docs • Better yet, get someone else to help you • Hard to tell your own mistakes with docs • Could make this part of code/peer review
  107. 107. Chapter 8. Built-in Functions • Use them! • But use them wisely
  108. 108. Section 8.1. Sorting • Don’t recompute sort keys inside sort • Sorting often compares the same item against many others • Recomputing means wasted work • Consider a caching solution • Simple cache • Schwartzian Transform
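
A sketch of the Schwartzian Transform from the slide: compute each key once, sort on the cached key, then throw the keys away. The expensive_key sub is a hypothetical stand-in (it just returns the length) with a counter so you can see how often it runs.

```perl
use strict;
use warnings;

my @words = qw(pear fig banana kiwi);
my $calls = 0;

# Hypothetical "expensive" key function; counts how often it is called.
sub expensive_key {
    my ($word) = @_;
    $calls++;
    return length $word;
}

my @sorted =
    map  { $_->[1] }                                       # 3. unwrap the item
    sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }    # 2. sort on cached key
    map  { [ expensive_key($_), $_ ] }                     # 1. compute key once
    @words;

print "@sorted\n";
print "key computed $calls times for ", scalar(@words), " items\n";
```

Read it bottom to top. The key function runs once per item, no matter how many comparisons the sort makes.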
  109. 109. Section 8.2. Reversing Lists • Use reverse to reverse a list • Don’t sort { $b cmp $a } - reverse sort! • Use reverse to make descending ranges: my @countdown = reverse 0..10;
  110. 110. Section 8.3. Reversing Scalars • Use scalar reverse to reverse a string • Add the “scalar” keyword to make it clear • Beware of:
 foo(reverse $bar) • This is likely list context • Reversing a single item list is a no-op! • Suggest: foo(scalar reverse $bar)
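
The context trap in runnable form. The show sub is a made-up helper that just joins whatever it receives.

```perl
use strict;
use warnings;

# Hypothetical helper: joins its arguments so we can see what arrived.
sub show { return join '|', @_ }

my $bar = 'hello';

# Argument lists are list context: reversing a one-item list changes nothing.
my $list_context = show(reverse $bar);

# scalar forces string reversal.
my $scalar_context = show(scalar reverse $bar);

print "list context:   $list_context\n";
print "scalar context: $scalar_context\n";
```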
  111. 111. Section 8.4. Fixed-Width Data • Use unpack to extract fixed-width fields • Example:
 my ($x, $y, $z)
 = unpack ‘@0 A6 @8 A10 @20 A8’, $input; • The @ ensures absolute position
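
Here is the slide's template run against a sample record. The record itself is invented, built with sprintf so the three fields start at columns 0, 8, and 20. The A format strips trailing spaces from each field.

```perl
use strict;
use warnings;

# Made-up fixed-width record: fields start at columns 0, 8 and 20.
my $input = sprintf '%-8s%-12s%-8s', 'ABC', 'Widget', '19.99';

# @N jumps to absolute column N; A6 reads 6 chars and strips trailing spaces.
my ($x, $y, $z)
    = unpack '@0 A6 @8 A10 @20 A8', $input;

print "[$x] [$y] [$z]\n";
```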
  112. 112. Section 8.5. Separated Data • Use split() for simple variable-width fields • First arg of string-space for awk-like behavior:
 my @columns = split ' ', $input; • Leading whitespace is ignored • Multiple whitespace is a single delimiter • Trailing whitespace is ignored • Otherwise, leading delimiters are significant:
 my @cols = split /\s+/, " foo bar"; • $cols[0] = empty string • $cols[1] = “foo”
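
Both split behaviours on the same input, as a sketch. The input string is made up. Only the leading empty field differs, since split drops trailing empty fields either way.

```perl
use strict;
use warnings;

my $input = "  foo  bar ";

# A single-space string as the pattern: awk-style, leading whitespace skipped.
my @awk_style = split ' ', $input;

# A regex pattern: the leading delimiter produces an empty first field.
my @regex_style = split /\s+/, $input;

print scalar(@awk_style),   " fields: ", join('|', @awk_style),   "\n";
print scalar(@regex_style), " fields: ", join('|', @regex_style), "\n";
```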
  113. 113. Section 8.6. Variable-Width Data • Text::CSV_XS is your friend • Handles quoting of delimiters and whitespace
  114. 114. Section 8.7. String Evaluations • Avoid string eval • Harder to debug • Slower (firing up compiler at runtime) • Might create security holes • Generally unnecessary • Runtime data should not affect program space
  115. 115. Section 8.8. Automating Sorts • Sort::Maker (CPAN) is your friend • Can build • Schwartzian Transform • Or-cache (Orcish) • Gutmann Rosler Transform
  116. 116. Section 8.9. Substrings • Use 4-arg substr() instead of lvalue substr() • Old way:
 substr($x, 10, 3) = "Hello"; # replace 10..12 with “Hello” • New way (for some ancient meaning of “new”):
 substr($x, 10, 3, "Hello"); # replace, and return old • Slightly faster
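
A sketch of the four-argument form on a made-up string, showing both effects: the string is edited in place and the replaced text comes back as the return value.

```perl
use strict;
use warnings;

my $x = 'abcdefghijklmnop';

# Replace the 3 characters at offsets 10..12 and get the old text back.
my $old = substr($x, 10, 3, 'Hello');

print "replaced: $old\n";
print "now:      $x\n";
```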
  117. 117. Section 8.10. Hash Values • Remember that values is an lvalue • Double all values:
 $_ *= 2 for values %somehash; • Same as:
 for (keys %somehash) {
 $somehash{$_} *= 2;
 } • But with less typing!
  118. 118. Section 8.11. Globbing • Use glob(), not <...> • Save <...> for filehandle reading • Works the same way:
 <* .*> same as glob '* .*' • Don’t use multiple arguments
  119. 119. Section 8.12. Sleeping • Avoid raw select for sub-second sleeps • use Time::HiRes qw(sleep);
 sleep 1.5; • Says Damian • I think “select undef, undef, undef, 1.5” is just fine • Far more portable • If you’re scared of select, hide it behind a subroutine
  120. 120. Section 8.13. Mapping and Grepping • Always use block-form of map/grep • Never use the expression form • Too easy to get the “expression” messed up • Works:
 map { join ":", @$_ } @some_values • Doesn’t work:
 map join ":", @$_, @some_values; • Block form has no trailing comma after the block • Might need disambiguating semicolon:
 map {; ... other stuff here } @input
  121. 121. Section 8.14. Utilities • Use the semi-builtins in Scalar::Util and List::Util • These are core with modern Perl • Scalar::Util has: blessed, refaddr, reftype, readonly, tainted, openhandle, weaken, unweaken, is_weak, dualvar, looks_like_number, isdual, isvstring, set_prototype • List::Util has: reduce, any, all, none, notall, first, max, maxstr, min, minstr, product, sum, sum0, pairs, unpairs, pairkeys, pairvalues, pairgrep, pairfirst, pairmap, shuffle, uniq, uniqnum, uniqstr • reduce is cool: • my $factorial = reduce { $a * $b } 1..20; • my $commy = reduce { “$a, $b” } @strings;
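
The two reduce one-liners as a runnable sketch, shrunk to 1..10 so the product is easy to check by hand. The list of strings is made up.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# reduce feeds the running result in as $a and the next item as $b.
my $factorial = reduce { $a * $b } 1 .. 10;

my @strings = qw(red green blue);
my $commy   = reduce { "$a, $b" } @strings;

print "10! = $factorial\n";
print "$commy\n";
```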
  122. 122. Chapter 9. Subroutines • Making your program modular • Nice unit of reuse • Giving name to a set of steps • Isolating local variables
  123. 123. Section 9.1. Call Syntax • Use parens • Don’t use & • Bad: &foo • Good: foo() • Parens help them stand out from builtins • Lack of ampersand means (rare) prototypes are honored • Still need the ampersand for a reference though: \&foo
  124. 124. Section 9.2. Homonyms • Don’t overload the built-in functions • Perl has weird rules about which have precedence • Sometimes your code is picked (lock) • Sometimes, the original is picked (link)
  125. 125. Section 9.3. Argument Lists • Unpack @_ into named variables • $_[3] is just ugly • Even compared to the rest of Perl • Unpack your args: • my ($name, $rank, $serial) = @_; • Use “shift” form for more breathing room: • my $name = shift; # “Flintstone, Fred” • my $rank = shift; # 1..10 • my $serial = shift; # 7-digit integer • Modern Perl can also have signatures • Built-in starting in 5.20 (but primitive) • Damian now recommends (his) Method::Signatures
  126. 126. Section 9.4. Named Arguments • Use named parameters if more than three • Three positional parameters is enough • For more than three, use a hash: • my %options = %{+shift}; • my $name = $options{name} || die “?”; • Pass them as a hashref: • my_routine({name => “Randal”}) • This catches the odd-number of parameters at the caller, not the callee.
  127. 127. Section 9.5. Missing Arguments • Use definedness rather than boolean to test for existence • Bad: if (my $directory = shift) { ... } • What if $directory is “0”? Legal! • Better: if (defined(my $directory = shift)) { ... } • Or: if (@_) { my $directory = shift; ... } • Automatically processed for item-at-a-time built-ins:
 while (my $f = readdir $d) { … }
  128. 128. Section 9.6. Default Argument Values • Set up default arguments early • Avoids temptation to use a quick boolean test later:
 my ($text, $arg_ref) = @_;
 my $thing = exists $arg_ref->{thing}
 ? $arg_ref->{thing} :“default thing”;
 my $other = exists $arg_ref->{other}
 ? $arg_ref->{other} :“default other”; • Or maybe:
 my %args = (thing => ‘default thing’,
 other => ‘default other’, %$arg_ref);
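The hash-merge form of defaults might look like this in a complete subroutine. The routine name and option names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical routine; "thing" and "other" are made-up option names.
sub describe {
    my ($text, $arg_ref) = @_;

    # Defaults come first; caller-supplied pairs come last and override them.
    my %args = (
        thing => 'default thing',
        other => 'default other',
        %{ $arg_ref || {} },
    );

    return "$text: $args{thing} / $args{other}";
}

my $plain    = describe('a');
my $override = describe('b', { thing => 0 });   # a false value still overrides

print "$plain\n$override\n";
```

Because the merge is based on key presence rather than truth, a caller passing 0 or an empty string gets what they asked for.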
  129. 129. Section 9.7. Scalar Return Values • Indicate scalar return with “return scalar” • Ensures that list context doesn’t ruin your day • Example: returning grep for count (true/false)
 sub big10 { return grep { $_ > 10 } @_ } • Works ok when used as boolean:
 if (big10(@input)) { ... } • Or even to get the count:
 my $big_uns = big10(@input); • But breaks weird when used in a list:
 some_other_sub($foo, big10(@input), $bar); • Fixed: sub big10 { return scalar grep { $_ > 10 } @_ }
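To see the difference, here is a sketch that calls both versions inside a list. The sub names big10_list and big10_scalar are made up to keep the two variants side by side.

```perl
use strict;
use warnings;

sub big10_list   { return grep { $_ > 10 } @_ }          # result depends on context
sub big10_scalar { return scalar grep { $_ > 10 } @_ }   # always a count

my @input = (5, 11, 42, 7, 99);

# Inside a list, both calls are made in list context.
my @from_list   = ('foo', big10_list(@input),   'bar');
my @from_scalar = ('foo', big10_scalar(@input), 'bar');

print scalar(@from_list), " items vs ", scalar(@from_scalar), " items\n";
```

The first list picks up every matching element, so anything after it shifts position; the second always contributes exactly one item.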
  130. 130. Section 9.8. Contextual Return Values • Make list-returning subs intuitive in scalar context • Consider the various choices: • Count of items (like grep/map/arrayname) • Serialized string representation (localtime) • First item (caller, each, select, unpack) • “next” item (glob, readline) • last item (splice, various slices) • undef (sort) • third(!) item (getpwnam) • See http://www.perlmonks.org/index.pl?node_id=347416
  131. 131. Section 9.9. Multi-Contextual Return Values • Consider Contextual::Return • Appropriately Damian-ized (regretfully discouraged now)
 use List::Util qw(first);
 use Contextual::Return;
 sub defined_samples_in {
 return (
 LIST { grep { defined $_} @_ }
 SCALAR { first { defined $_} @_ }
 );
 } • It does more, much more. So much more: • VOID, BOOL, NUM, STR, REF, FAIL, LAZY, LVALUE…
  132. 132. Section 9.10. Prototypes • Don’t • Spooky action at a distance • Not for mortals • Useful only by wizards • sub LIST (;&$) { ... }
  133. 133. Section 9.11. Implicit Returns • Always use explicit return • Perl returns the last expression evaluated • But make it explicit: • return $this; • Easier to see that the return value is expected • Protects against someone adding an extra line to the subroutine, breaking the return
  134. 134. Section 9.12. Returning Failure • use “return;” for undef/empty list return • Not “return undef;” • Thus it falls out of lists correctly:
 my @result = map { yoursub($_) } @input; • If the caller wants the undef, they can say so:
 my @result = map { scalar yoursub($_) } @input;
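A sketch of how the two styles behave under map. The lookup subs are hypothetical stand-ins for “yoursub”.

```perl
use strict;
use warnings;

# Hypothetical lookups: negative input counts as failure.
sub lookup_undef { my $n = shift; return undef if $n < 0; return $n * 2 }
sub lookup_bare  { my $n = shift; return       if $n < 0; return $n * 2 }

my @input = (1, -1, 2);

my @with_undef = map { lookup_undef($_) } @input;        # failure leaves an undef slot
my @clean      = map { lookup_bare($_) } @input;         # failure vanishes from the list
my @forced     = map { scalar lookup_bare($_) } @input;  # caller asks for the undef

print scalar(@with_undef), " ", scalar(@clean), " ", scalar(@forced), "\n";
```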
  135. 135. Chapter 10. I/O • It’s off to work we go • If a program didn’t have I/O • Could you still tell it ran?
  136. 136. Section 10.1. Filehandles • Don’t use bareword filehandles • Can’t easily pass them around • Can’t easily localize them • Can’t make them “go out of scope”
  137. 137. Section 10.2. Indirect Filehandles • Use indirect filehandles: • open my $foo,“<”, $someinput or die; • They close automatically • Can be passed to/from subroutines • Can be stored in aggregates • Might need readline() or print {...} though • my $line = readline $inputs[3]; • print { $output_for{$day} } @list;
  138. 138. Section 10.3. Localizing Filehandles • If you must use a package filehandle, localize it • local *HANDLE • Beware this also stomps on other package items: • $HANDLE • @HANDLE • %HANDLE • &HANDLE • HANDLE dirhandle • HANDLE format • So, use it sparingly • That’s why indirect handles are better
  139. 139. Section 10.4. Opening Cleanly • Use IO::File or 3-arg open • IO::File:
 use IO::File;
 my $in_handle = IO::File->new($name,“<”) or die; • 3-arg open (with indirect handle):
 open my $in_handle,“<”, $name or die; • 3-arg open ensures safety • Even if $name starts with < or > or | or ends with |. • Or whitespace (normally trimmed)
  140. 140. Section 10.5. Error Checking • Add error checking or die! • Include the $! as well • open my $handle,“<”, $that_file or die “Cannot open $that_file: $!”; • Also check close and print • Close and print can fail • If the file isn’t opened (oops!) • If the final flush causes an I/O error (oops!)
  141. 141. Section 10.6. Cleanup • Close filehandles explicitly as soon as possible • Says Damian • Or just let them fall out of scope and use a tight scope • my $line = do { open my $x,“<”,“file”; scalar <$x> };
  142. 142. Section 10.7. Input Loops • Use while(<>), not foreach(<>) • While - reads one line at a time • Exits at undef (no more lines) • Foreach - reads everything at once • No reason to bring everything into memory • Especially when you can’t reference it!
  143. 143. Section 10.8. Line-Based Input • Similar to previous hint • If you can grab it a line at a time, do it • Bad:
 print for grep /ITEM/, <INPUT>; • Good:
 while (<INPUT>) { print if /ITEM/ }
  144. 144. Section 10.9. Simple Slurping • Slurp in a do-block • Get the entire file: my $entire_file = do { local $/; <$in> }; • local here automatically provides undef • The localization protects nearby items • Dangerous if you had $/ = undef in normal code • Better/faster/stronger than: join “”, <$in>
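A self-contained sketch of the do-block slurp. It reads from an in-memory filehandle so that it runs without a real file, which is an assumption made purely for illustration.

```perl
use strict;
use warnings;

my $data = "line one\nline two\nline three\n";

# In-memory filehandle stands in for a real file here.
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

# local $/ makes it undef inside the block only, so <$in> reads everything.
my $entire_file = do { local $/; <$in> };

close $in or die "Cannot close: $!";

my @lines = split /^/, $entire_file;
print "read ", length($entire_file), " bytes, ", scalar(@lines), " lines\n";
```

Once the do-block exits, $/ is back to its previous value, so later line-by-line reads are unaffected.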
  145. 145. Section 10.10. Power Slurping • Slurp a stream with File::Slurp (from the CPAN) • Read an entire file:
 use File::Slurp qw(read_file);
 my $entire_file = read_file $in; • And a few other interesting options
  146. 146. Section 10.11. Standard Input • Avoid using STDIN unless you really mean it • It’s not necessarily the terminal (could be redirected) • It’s almost never the files on the command line • It means only “standard input” • Read from ARGV instead:
 my $line = <ARGV>; # explicit form
 my $line = <>; # implicit (preferred) form
  147. 147. Section 10.12. Printing to Filehandles • Always put filehandles in braces in any print statement • Says Damian • The rest of us do that when it’s complex:
 print $foo @data; # simple
 print { $hashref->{foo} } @data; # complex • If it’s a bareword or a simple scalar, no need for braces
  148. 148. Section 10.13. Simple Prompting • Always prompt for interactive input • Keeps the user from wondering: • Is it running? • Is it waiting for me? • Has it crashed? • Prompt only when interactive though • Don’t prompt if part of a pipeline
  149. 149. Section 10.14. Interactivity • Test for interactivity with -t • Simple test:
 if (-t) { ... STDIN is a terminal } • Damian makes it more complex • Suggests IO::Interactive from CPAN
  150. 150. Section 10.15. Power Prompting • Use IO::Prompt for prompting (in CPAN) • my $line = prompt “line, please? ”; • Automatically chomps (yeay!) • my $password = prompt “password:”,
 -echo => ‘*’; • my $choice = prompt ‘letter’, -onechar,
 -require => { ‘must be [a-e]’ => qr/[a-e]/ }; • Many more options • Damian followed up with IO::Prompter (even more options)
  151. 151. Section 10.16. Progress Indicators • Interactive apps need progress indicators for long stuff • People need to know “working? or hung?” • And “Coffee? Or take the afternoon off?” • Good estimates of completion time are handy • Spooky estimates can backfire • 90% of ticks come within 2 seconds • last 10% takes 5 minutes... oops
  152. 152. Section 10.17. Automatic Progress Indicators • Damianized Smart::Comments in CPAN • Example:
 use Smart::Comments;
 for my $path (split /:/, $ENV{PATH}) { ### Checking path... done
 for my $file (glob “$path/*”) { ### Checking files... done
 if (-x $file) { ... }
 }
 }
  153. 153. Section 10.18. Autoflushing • Avoid using select() to set autoflushing: select((select($fh), $| = 1)[0]); • Yeah, so I came up with that • I must have been, uh... confused • Use IO::Handle (not needed in Perl 5.14 and later) • Then you can call $fh->autoflush(1) • And even *STDOUT->autoflush(1)
  154. 154. Chapter 11. References • Can’t get jobs without them • Can’t get to indirect data without them either • Symbolic (soft) references are evil
  155. 155. Section 11.1. Dereferencing • Prefer arrows over circumfix dereferencing • Bad: ${$foo{bar}}[3] • Good: $foo{bar}->[3] • Can’t reduce: @{$foo{bar}}[3, 4] • Until Perl 5.20 • $foo{bar}->@[3, 4] • Might require enabling the feature
  156. 156. Section 11.2. Braced References • Always use braces for circumfix dereferencing • Bad: @$foo[3, 4] • Good: @{$foo}[3, 4] • Makes the deref item clearer • Needed anyway when deref item is complex • Or flip it around to postfix form
  157. 157. Section 11.3. Symbolic References • Don’t • use strict ‘refs’ prevents you anyway • If you think of the symbol table as a hash... • ... you can see how to move it down a level anyway • Bad: $x = “K”; ${$x} = 17; • Good: $x = “K”; $my_hash{$x} = 17;
  158. 158. Section 11.4. Cyclic References • Cycles cannot be reference-counted nicely • Break the cycle with weaken • Use weaken for every “up” link • When kids need to know about their parents • use Scalar::Util qw(weaken);
 my $root = {};
 my $leaf1 = { parent => $root };
 weaken $leaf1->{parent};
 my $leaf2 = { parent => $root };
 weaken $leaf2->{parent};
 push @{$root->{kids}}, $leaf1, $leaf2; • Now we can toss $root nicely
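A runnable sketch of the parent/child pattern, assuming a simple hash-based tree. The “name” keys are added only for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $root = { name => 'root', kids => [] };

for my $name (qw(leaf1 leaf2)) {
    my $leaf = { name => $name, parent => $root };
    weaken $leaf->{parent};             # "up" link does not keep $root alive
    push @{ $root->{kids} }, $leaf;     # "down" link is an ordinary strong ref
}

my $kid     = $root->{kids}[0];         # keep one leaf alive on its own
my $is_weak = isweak($kid->{parent});

undef $root;                            # drop the last strong reference

# The weak "up" link has become undef instead of dangling.
print defined $kid->{parent} ? "root still alive\n" : "root was freed\n";
```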
  159. 159. Chapter 12. Regular Expressions • Some people, when confronted with a problem, think:
 "I know, I'll use regular expressions".
 Now they have two problems.
 —Jamie Zawinski
  160. 160. Section 12.1. Extended Formatting • Always use /x • Internal whitespace is ignored • Comments are simple #-to-end-of-line • Hard: /^(.*?)(\w+).*?\2\1$/ • Easy(er):
 /^
 (.*?) (\w+) # some text and an alpha-word
 .*? # anything
 \2 \1 # the word followed by text again
 $/x • Beware: this will break old regex if you’re not careful
  161. 161. Section 12.2. Line Boundaries • Always use /m • ^ means • Start of string • Just after any embedded newline • $ means • End of string • Just before any embedded newline • Damian argues that this is closer to what people expect • I say use it when it’s what you want
  162. 162. Section 12.3. String Boundaries • Use \A and \z • \A is the old ^ • \z is the old $
  163. 163. Section 12.4. End of String • Use \z not \Z or $ • It’s really “end of string” • Not “end of string but maybe before \n” • Security hole: if ($value =~ /^\d+$/) { ... } • Value could be “35\n” • That might ruin a good day • Fix: if ($value =~ /\A\d+\z/) { ... }
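The trailing-newline hole can be checked directly. This sketch uses a made-up input value.

```perl
use strict;
use warnings;

my $value = "35\n";    # digits followed by a sneaky newline

# $ matches at end of string OR just before a final newline.
my $loose_ok  = $value =~ /^\d+$/   ? 1 : 0;

# \z matches only at the very end of the string.
my $strict_ok = $value =~ /\A\d+\z/ ? 1 : 0;

print "loose: $loose_ok, strict: $strict_ok\n";
```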
  164. 164. Section 12.5. Matching Anything • Always use /s • Turns . into “match anything” • Not “match anything (except the newline)” • This is more often what people want • Says Damian • Use it when you need it
  165. 165. Section 12.6. Lazy Flags • If you’re as crazy as Damian • And as lazy as Damian • Consider Regexp::Autoflags • Automatically adds /xms to all regexp • But then notice that the book lies • Because it’s actually Regexp::DefaultFlags in the CPAN
  166. 166. Section 12.7. Brace Delimiters • Prefer m{...} to /.../ for multiline regex • Open-close stands out easier • Especially when alone on a line • Heck, use ‘em even for single line regex • Particularly if the regex contains a slash • Balancing is easy:
 m{ abc{3,5} } # “ab” then 3-5 c’s
 m{ abc\{3,5\} } # “abc{3,5}” all literal
  167. 167. Section 12.8. Other Delimiters • Don’t use any other delimiters • Unless you’re sure that Damian’s not looking • But seriously... • m#foo# is cute • But annoying • And m,foo, is downright weird
  168. 168. Section 12.9. Metacharacters • Prefer singular character classes to escaped metachars • Instead of “\.”, use [.] • Instead of “\ ” (escaped space), use [ ] (space in []) • Advantage: brackets are balanced
  169. 169. Section 12.10. Named Characters • Prefer named characters to escaped metacharacters • Example:
 use charnames qw(:full);
 if ($escape_seq =~ m{
 \N{DELETE}
 \N{ACKNOWLEDGE}
 \N{CANCEL} \Z
 }xms) { ... } • Downside: now you have to know these names • Is anyone really gonna type \N{SPACE} ?
  170. 170. Section 12.11. Properties • Prefer properties to character classes • Works better with Unicode • Such as: qr{ \p{Uppercase} \p{Alphabetic}* }xms; • “perldoc perlunicode” lists the properties • You can even create your own
  171. 171. Section 12.12. Whitespace • Arbitrary whitespace better than constrained whitespace • So prefer \s* to \N{SPACE}*: \s matches even “weird” whitespace
  172. 172. Section 12.13. Unconstrained Repetitions • Be careful when matching “as much as possible” • For example:
 “Activation=This=that” =~ /(.*)=(.*)/ • $1 is “Activation=This”! • Consider .*? instead:
 “Activation=This=that” =~ /(.*?)=(.*)/ • Now we have $1 = “Activation”, $2 = “This=that” • Don’t do this blindly:
 “Activation=This=that” =~ /(.*?)=(.*?)/ • Now $2 has nothing!
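All three patterns can be compared side by side by capturing in list context. This is a sketch; the array names are made up.

```perl
use strict;
use warnings;

my $input = "Activation=This=that";

my @greedy     = $input =~ /(.*)=(.*)/;      # first group grabs as much as it can
my @lazy_first = $input =~ /(.*?)=(.*)/;     # first group stops at the first "="
my @lazy_both  = $input =~ /(.*?)=(.*?)/;    # second group is happy with nothing

print "greedy:     [$greedy[0]] [$greedy[1]]\n";
print "lazy first: [$lazy_first[0]] [$lazy_first[1]]\n";
print "lazy both:  [$lazy_both[0]] [$lazy_both[1]]\n";
```

Anchoring the end of the pattern (for example with \z) is one way to make a trailing lazy group useful again.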
  173. 173. Section 12.14. Capturing Parentheses • Use capturing parens only when capturing • Every ( ... ) in a regex is a capture regex • This affects $1, etc • But it also affects speed • When you don’t need to capture, don’t! • Use (?: ... ) • Yes, a little more typing • But it’s clear that you don’t need that value
  174. 174. Section 12.15. Captured Values • Use $1 only when you’re sure you had a match • Common error:
 $foo =~ /(.*?)=(.*)/;
 push @{$hash{$1}}, $2; • What if the match fails? • Pushes the previous $2 onto the previous $1 • Always use $1 with a conditional • Even if it can never fail: • Add “or die”:
 $foo =~ /(.*?)=(.*)/ or die “Should not happen”; • Then you know something broke somewhere
  175. 175. Section 12.16. Capture Variables • Name your captures • There can be quite a distance between • / (\w+) \s+ (\w+) /xms • $1, $2 • Easy to make mistake, or modify and break things • Instead, name your captures when you can: • my ($given, $surname) = /(\w+)\s+(\w+)/; • If this match fails in a scalar context, it returns false
  176. 176. Section 12.17. Piecewise Matching • Tokenize input using /gc • Also called “inchworming” • Match string against a series of /\G.../gc constructs • If one succeeds, pos() is updated • Example:
 {
 last if /\G\z/gc; # exit at end of string
 if (/\G(\w+)/gc) { push @tokens, “keyword $1” }
 elsif (/\G(“.*?”)/gcs) { push @tokens, “quoted $1” }
 elsif (/\G\s+/gc) { ... ignore whitespace ... }
 else { die “Cannot parse: ”, /\G(.*)/s } # grab rest
 redo;
 }
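The inchworm can be wrapped in a subroutine like this. It is a sketch that uses a while loop instead of the bare block with redo; the sub name tokenize is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: words, double-quoted strings, whitespace.
sub tokenize {
    my ($text) = @_;
    my @tokens;

    for ($text) {    # alias $_ so every match shares one pos()
        while (1) {
            last if /\G\z/gc;                         # end of string: done
            if    (/\G(\w+)/gc)     { push @tokens, "keyword $1" }
            elsif (/\G("[^"]*")/gc) { push @tokens, "quoted $1" }
            elsif (/\G\s+/gc)       { }               # skip whitespace
            else {
                # /c kept pos() where the last success ended.
                die "Cannot parse: " . substr($_, pos() // 0) . "\n";
            }
        }
    }
    return @tokens;
}

my @tokens = tokenize('say "hi there" now');
print "$_\n" for @tokens;
```

The /c flag is what makes this work: a failed match leaves pos() alone, so the next alternative starts from the same place.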
  177. 177. Section 12.18. Tabular Regexes • Build regex from tables • To replace keys of %FOO with values of %FOO • Construct the matching regex:
 my $regex = join “|”, map quotemeta,
 reverse sort keys %FOO;
 $regex = qr/$regex/; # if used more than once • Do the replacement:
 $string =~ s/($regex)/$FOO{$1}/g; • It works
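A complete run of the table-driven replacement. The contents of %FOO and the sample sentence are invented.

```perl
use strict;
use warnings;

# Made-up replacement table; note that "cat" is a prefix of "caterpillar".
my %FOO = (
    cat         => 'feline',
    caterpillar => 'larva',
    dog         => 'canine',
);

# Reverse sort puts "caterpillar" ahead of "cat", so the longer key wins.
my $alternatives = join '|', map { quotemeta $_ } reverse sort keys %FOO;
my $regex        = qr/$alternatives/;

my $string = "the caterpillar saw a cat and a dog";
$string =~ s/($regex)/$FOO{$1}/g;

print "$string\n";
```

Without the reverse sort, “cat” would be tried first and the caterpillar would come out as “felineerpillar”.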
  178. 178. Section 12.19. Constructing Regexes • Build complex regex from simpler pieces • Example:
 my $DIGITS = qr{ \d+ (?: [.] \d*)? | [.] \d+ }xms;
 my $SIGN = qr{ [+-] }xms;
 my $EXPONENT = qr{ [Ee] $SIGN? \d+ }xms;
 my $NUMBER = qr{ ( ($SIGN?) ($DIGITS) ($EXPONENT?) ) }xms; • Now you can use /$NUMBER/
  179. 179. Section 12.20. Canned Regexes • Use Regexp::Common • Don’t reinvent the common regex • Use the expert coding in Regexp::Common • $number =~ /$RE{num}{real}/ • $balanced_parens =~ /$RE{balanced}/ • $bad_words =~ /$RE{profanity}/ • Module comes with 2,315,992 tests (over 4 minutes to run the tests!)
  180. 180. Section 12.21. Alternations • Prefer character classes to single-char alternations • [abc] is far faster than (a|b|c) • As in, 10 times faster • Consider refactoring too • Replace: (a|b|ca|cb|cc) • With: (a|b|c[abc]) • Same job, and again faster
  181. 181. Section 12.22. Factoring Alternations • An advancement of previous hint • Slow: /with \s+ foo|with \s+ bar/x • Faster: /with \s+ (foo|bar)/x • Keep refactoring until it hurts • Beware changes to $1, etc • Also beware changes to order of attempted matches
  182. 182. Section 12.23. Backtracking • Prevent useless backtracking • Consider (?> ... ) • Once it has matched, it’s backtracked as a unit
  183. 183. Section 12.24. String Comparisons • Prefer eq to fixed-pattern regex matches • Don’t say $foo =~ /\ABAR\z/ • Use $foo eq “BAR” • Don’t say $foo =~ /\ABAR\z/i • Use (uc $foo) eq “BAR”
  184. 184. Chapter 13. Error Handling • Hopefully, everything works • But when things go bad, you need to know • Sometimes, it’s minor, and just needs repair • Sometimes, it’s major, and needs help • Sometimes, it’s catastrophic
  185. 185. Section 13.1. Exceptions • Throw exceptions instead of funny returns • People might forget to test for “undef” • But they have to work pretty hard to ignore an exception • Exceptions can be easily caught with eval {} • But better to use Try::Tiny • Or for more flexibility, TryCatch
  186. 186. Section 13.2. Builtin Failures • Use Fatal to turn failures into exceptions • Tired of forgetting “or die” on “open”? • use Fatal qw(:void open);
 ...
 open my $f,“BOGUS FILE”; # dies! • Yes, finally open does the right thing • Damian also suggests Lexical::Failure
  187. 187. Section 13.3. Contextual Failure • You can tell Fatal to be context-sensitive • So this still works: • unless (open my $f,“BOGUS”) { ... } • That’s dangerous though • People might use it in a non-void context without checking • Says Damian • I think you can use it with discipline
  188. 188. Section 13.4. Systemic Failure • Be careful with system() • Returns 0 (false) if everything is OK • Returns non-zero if something broke • Technically, the return value of wait(): • Exit status * 256 • Plus 128 if core was dumped • Plus signal number that killed it, if any • 0 = “good”, non-zero = “bad” • Damian suggests WIFEXITED:
 use POSIX qw(WIFEXITED); WIFEXITED(system ...) • Or just add “!” in front:
 !system “foo” or die “...”; # no access to $! here!
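The wait-status arithmetic can be unpacked with a few bit operations. This is a sketch; decode_status is a hypothetical helper and the status values below are constructed by hand rather than produced by a real child process.

```perl
use strict;
use warnings;

# Hypothetical helper: split a wait()-style status into its parts.
sub decode_status {
    my ($status) = @_;
    my %info = (
        exit_code   => $status >> 8,                # high byte: exit status
        signal      => $status & 127,               # low 7 bits: killing signal
        core_dumped => ($status & 128) ? 1 : 0,     # bit 7: core dump flag
    );
    return \%info;
}

my $ok     = decode_status(0);          # clean exit
my $failed = decode_status(2 * 256);    # exit status 2
my $killed = decode_status(9 + 128);    # signal 9, with core dump

print "exit $failed->{exit_code}, signal $killed->{signal}\n";
```

In real code the argument would typically be $? right after system returns.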
  189. 189. Section 13.5. Recoverable Failure • Throw exceptions on all failures • Including recoverable ones • Easy enough to put an eval {} around it • Don’t use undef • Developers don’t always check • Says Damian • I say, trust your developers a bit
  190. 190. Section 13.6. Reporting Failure • Carp blames someone else • use Carp qw(croak);
 ... open my $f, $file or croak “$file?”; • Now the caller is blamed • Makes sense if it’s the caller’s fault
  191. 191. Section 13.7. Error Messages • Error messages should speak user-based terminology • Nobody knows ‘@foo’ • Better chance of knowing ‘input files’ • Be as detailed as you can • Remember that this is all you’re likely to get in the bug report • And they will file bug reports
  192. 192. Section 13.8. Documenting Errors • Document every error message • Explain it in even more detail • Again in the user’s terminology • Nothing more frustrating than a confusing error message • ... that isn’t explained! • Good example:“perldoc perldiag”
  193. 193. Section 13.9. OO Exceptions • Throw an object instead of text • The object can contain relevant information • Exception catchers can pull info apart • Exception objects can also have nice stringifications • This helps naive catchers that print “Error: $@”
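A hand-rolled exception object might look like the sketch below. The class name My::Error::File and its fields are hypothetical; a module such as Exception::Class would normally generate this for you.

```perl
use strict;
use warnings;

package My::Error::File;    # hypothetical exception class

# Stringification keeps naive catchers that print "Error: $@" useful.
use overload
    '""'     => sub { my ($self) = @_; return "Cannot open '$self->{path}': $self->{reason}" },
    fallback => 1;

sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub throw { my ($class, @args) = @_; die $class->new(@args) }
sub path  { my ($self) = @_; return $self->{path} }

package main;

use Scalar::Util qw(blessed);

my $caught;
eval {
    My::Error::File->throw(path => '/no/such/file', reason => 'not found');
    1;
} or do {
    $caught = $@;    # grab $@ right away, before anything can clobber it
};

my $message = "Error: $caught";
my $path    = blessed($caught) && $caught->isa('My::Error::File')
            ? $caught->path
            : undef;

print "$message\n";
```

A catcher that understands the class can ask for the path directly; one that does not still gets a readable message.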
  194. 194. Section 13.10. Volatile Error Messages • Relying on catchers to parse text is dangerous • Throwing an object isolates text changes • Additional components can be added later • Likely won’t break existing code
  195. 195. Section 13.11. Exception Hierarchies • Use exception objects when exceptions are related • Give them different classes, but common inheritance • Catchers can easily test for categories of exceptions • Or distinguish specific items when it matters
  196. 196. Section 13.12. Processing Exceptions • Test the most specific exception first • For example, log file error, before generic file error
  197. 197. Section 13.13. Exception Classes • Use Exception::Class for nice exceptions • Exception::Class-based exceptions capture: • caller • current filename, linenumber, package • user data • Stringify nicely for naive displays (customized if needed) • Create hierarchies for classified handling • Consider “Throwable” role for Moo or Moose
  198. 198. Section 13.14. Unpacking Exceptions • Grab $@ early in your catcher • Then parse and act on it • One of your parsing steps may alter $@ accidentally • Best to capture it first! • Or consider Try::Tiny, which puts $@ safely into block- scoped $_
  199. 199. Chapter 14. Command-Line Processing • The great homecoming • In the beginning, everything was command-line • Perl still works well here • But you should follow some standards and conventions
  200. 200. Section 14.1. Command-Line Structure • Enforce a single consistent command-line structure • GNU tools • Need I say more? • Every tool, slightly different • Pick a standard, stick with it • Especially amongst a tool suite • If you use -i for input here, use it there too
  201. 201. Section 14.2. Command-Line Conventions • Use consistent APIs for arguments • Require a flag in front of every arg, except filenames • Use - prefix for single-letter flags • Use -- prefix for longer flags • Provide long flag aliases for every single-letter flag • Always allow “-” as a filename • Always use “--” to indicate “end of args”
  202. 202. Section 14.3. Meta-options • Standardize your meta options • --usage • --help • --version • --man
  203. 203. Section 14.4. In-situ Arguments • If possible, allow same file for both input and output • foo_command -i thisfile -o thisfile • Obviously might require some jiggery • Or IO::InSitu from the CPAN:
 my $in_name = ....;
 my $out_name = ....;
 my ($in, $out) = open_rw($in_name, $out_name);
 while (<$in>) { print $out “$.: $_” }
  204. 204. Section 14.5. Command-Line Processing • Standardize on a single approach to command-line processing • Look at Getopt::Long (core) • For a more Damianized approach, Getopt::Declare • Spell out your usage, and the args follow! • Manpage is longer than most program manpages
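A minimal Getopt::Long sketch following the conventions above: single-letter flags with long aliases, and “--” ending the options. The option names and the argument list are made up, and it parses an array rather than @ARGV so it runs standalone.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for @ARGV; the file names are made up.
my @args = ('-i', 'in.txt', '--output', 'out.txt', '--verbose', '--', 'extra.txt');

my %opt = (verbose => 0);

GetOptionsFromArray(
    \@args,
    'input|i=s'  => \$opt{input},      # --input FILE or -i FILE
    'output|o=s' => \$opt{output},     # --output FILE or -o FILE
    'verbose|v'  => \$opt{verbose},    # boolean flag
) or die "Usage: $0 [-i FILE] [-o FILE] [-v] [--] [FILES]\n";

# Whatever was not an option is left behind in @args.
print "input=$opt{input} output=$opt{output} verbose=$opt{verbose} rest=@args\n";
```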
  205. 205. Section 14.6. Interface Consistency • Ensure interface, run-time, and docs are consistent • Usage messages should match what the code does • Simple with Getopt::Declare! • Help docs should match what the manpage says • If Getopt::Declare wasn’t enough... • ... consider Getopt::Euclid • Constructs your parser from POD • Then the manpage definitely matches the code! • Again, suitably Damianized • Requires many “aha!” moments to fully understand
  206. 206. Section 14.7. Interapplication Consistency • Factor out common CLI items to shared modules • If every tool has -i and -o, don’t keep rewriting that • Getopt::Euclid “pod modules” can provide common code • Also permits readers to learn it once • Instead of having to recognize it each time • “This program implements L<Our::Standard::CLI>.”
  207. 207. Chapter 15. Objects • Used by practically every large practical program • Even used by many little programs • “There’s more than one way to implement it” • Hash-based? Array-based? Inside-out? • Using some framework, or hand-coded? • So what’s best? • Here we go...
  208. 208. Section 15.1. Using OO • Make OO a choice, not a default • Don’t introduce needless objects • Especially over-complex frameworks • You probably don’t need a different class for every token
  209. 209. Section 15.2. Criteria • Use objects if you have the right conditions • Large systems • Obvious large structures • Natural hierarchy (polymorphism and inheritance) • Many different operations on the same data • Same or similar operations on related data • Likely to add new types later • Operator overloading can be used logically • Implementations need to be hidden • System design is already OO • Large numbers of other programmers use your code
  210. 210. Section 15.3. Pseudohashes • Don’t • “We’re sorry” • Removed in 5.10 anyway (long time ago!)
  211. 211. Section 15.4. Restricted Hashes • Don’t • Not really restricted • For every lock, there’s a means to unlock • Inside-out objects handle most of the needs here
  212. 212. Section 15.5. Encapsulation • Don’t bleed your guts • Provide methods for all accessors • If you want to provide hashref access • Provide a hashref overload • Prepare to pay the price for such access • Consider inside-out objects to ensure guts are private
  213. 213. Section 15.6. Constructors • Give every constructor the same standard name • And that’s “new”, duh • Says Damian • Or, just use what’s natural • After all, DBI->connect does it. • Or rewrite that as DBI->new({ connect => [...] }) • Not likely any time soon
  214. 214. Section 15.7. Cloning • Don’t clone in your constructor • $instance->new should be an error • If you want an “object of the same type as...”, use ref:
 my $similar = (ref $instance)->new(...); • If you want a proper clone, write a clone/copy routine:
 my $clone = $instance->clone; • Note: please stop cargo-culting:
 sub new {
 my $self = shift;
 my $class = ref $self || $self; # bad
 ...
 }
  215. 215. Section 15.8. Destructors • Inside out objects need destructors • Or they leak • And be sure to destroy all the parent classes too:
 sub DESTROY {
 my $deadbody = shift;
 ... destroy my attributes here ...
 for (our @ISA) {
 my $can = $_->can(“DESTROY”);
 $deadbody->$can if $can;
 }
 } • Or use “NEXT” in the CPAN
  216. 216. Section 15.9. Methods • Methods are subroutines • Except methods can be same-named as built-ins • They’ll never be confused with subroutines
  217. 217. Section 15.10. Accessors • Provide separate read and write accessors • Same-named accessors might have spooky non-behavior:
 $person->name(“new name”);
 $person->rank(“new rank”);
 $person->serial_number(“new number”); # read only! • Of course, these double-duty routines should abort • But they often don’t • Also, differently named accessors easier to grep for • Help understand places a value could change
  218. 218. Section 15.11. Lvalue Accessors • Don’t use lvalue accessors • Require returning direct access to a variable • No place to do error checking • Or require returning tied var • Slow • Really slow • Like, slower than just a method call
  219. 219. Section 15.12. Indirect Objects • Indirect object syntax is broken • Bad: my $item = new Thing; • Good: my $item = Thing->new; • No chance of mis-parse • Mis-parse has bitten some Very Smart People • If you think you’re smarter than some Very Smart People • ... let us know how that’s working out for you
  220. 220. Section 15.13. Class Interfaces • Provide optimal interface, not minimal one • Consider common operations and implement them • As class author, you can optimize internally • Class users have to use your published interface • Worse, they might repeat the code, or build layers • Resulting in more wrappers or repeated code!
  221. 221. Section 15.14. Operator Overloading • Overload only for natural-mapping operator • Don’t just pick operators and map them • You’ll get bit by precedence, most likely • Or leave one out, and really confuse users • Don’t make the C++ mistake:“>>” meaning write-to
  222. 222. Section 15.15. Coercions • If possible, provide sane coercions: • Boolean (for tests) • Numeric • String • If your object is “logically empty”, then: if ($your_object) { ... } should return false else true • And “$your_object” shouldn’t provide a hex address • Overloaded numerics and strings are also used in sorting
  223. 223. Chapter 16. Class Hierarchies • Objects get their power from inheritance • Designing inheritance well takes practice
  224. 224. Section 16.1. Inheritance • Don’t manipulate @ISA directly • Prefer “use parent”: use parent qw(This That Other); • Nice side-effect of loading those packages automatically • Or use a framework that establishes this directly • Replaces the heavier-weight “use base”
  225. 225. Section 16.2. Objects • Use inside-out objects • Said Damian back in the day • Avoids attribute collisions • Encapsulates attributes securely • These days, somewhat discouraged • Makes objects a bit too opaque • Still useful for hostile subclassing though
  226. 226. Section 16.3. Blessing Objects • Never use one-argument bless • Blessing into “the current package” breaks subclassing:
 package Parent;
 sub new { bless {} }
 package Child;
 use base qw(Parent);
 package main;
 my $kid = Child->new;
 print ref $kid; # Parent?? • That should have been:
 package Parent; sub new { bless {}, shift }
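Both forms can be compared in one file. This is a sketch: the package names are made up, and “use parent -norequire” is used because the base classes live in the same file.

```perl
use strict;
use warnings;

package OneArg::Parent;                 # hypothetical: blesses into its own package
sub new { return bless {} }

package OneArg::Child;
use parent -norequire, 'OneArg::Parent';

package TwoArg::Parent;                 # hypothetical: blesses into the invoking class
sub new { my $class = shift; return bless {}, $class }

package TwoArg::Child;
use parent -norequire, 'TwoArg::Parent';

package main;

my $wrong = OneArg::Child->new;
my $right = TwoArg::Child->new;

print ref($wrong), "\n";
print ref($right), "\n";
```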
  227. 227. Section 16.4. Constructor Arguments • Pass constructor args as a hashref • Gives you key/value pairs easily • Broken hashref caught at the caller, not “new” • “Odd number of elements...” • Derived class constructor can pick out what it wants • Pass the remaining items up to base-class constructor
  228. 228. Section 16.5. Base Class Initialization • Distinguish initializers by class name • To avoid collisions:
 my $thing = Child->new(
 child => { this => 3, that => 5 },
 parent => { other => 7 }); • Says Damian • I say it reveals too much of the hierarchy • Gets complex if Parent/Child are refactored • Use logical names for attributes, not class-based names
  229. 229. Section 16.6. Construction and Destruction • Separate construction, initialization, and destruction • Multiple inheritance breaks if every class does its own bless • Instead, all classes should separate bless from initialize • Suggested name “BUILD” • Constructor can call base constructor, then all BUILD routines in hierarchy • Similar strategy for DESTROY • Or just use Moose or Moo
  230. 230. Section 16.7. Automating Class Hierarchies • Build standard class infrastructure automatically • Getting accessors and privacy correct can take special care • Use inside-out objects for safety and best development speed • Class::Std or Object::InsideOut
  231. 231. Section 16.8. Attribute Demolition • Use Class::Std or Object::InsideOut to ensure deallocation of attributes • Calls DESTROY in all the right places
  232. 232. Section 16.9. Attribute Building • Initialize and verify attributes automatically • Again, Class::Std or Object::InsideOut to the rescue • Calls BUILD in all the right places • Or attribute hashes can be annotated for the common case
  233. 233. Section 16.10. Coercions • Specify coercions as attributed methods • Again, provided with Class::Std/Object::InsideOut • For numeric context:
 sub count : NUMERIFY { return ... } • For string context:
 sub as_string : STRINGIFY { return ... } • For boolean context:
 sub is_true : BOOLIFY { return ... }
  234. 234. Section 16.11. Cumulative Methods • Use attributed cumulative methods instead of SUPER • Class::Std/Object::InsideOut provide CUMULATIVE:
 sub items : CUMULATIVE { ... } • When called in a list context, all calls are made • Result is the concatenated separate lists • Scalar context similar • Result is the string concatenation of all calls • No need for “SUPER::” • Works even with multiple inheritance
  235. 235. Section 16.12. Autoloading • Don’t use AUTOLOAD • Effectively creates a runtime interface • Not a compile time interface • Must be prepared for • Methods you don’t want to handle • DESTROY • Calling “NEXT” for either of those is tricky
  236. 236. Chapter 17. Modules • Reusable collections of subroutines and their data • Unit of exchange for the CPAN • Unit of testability internally
  237. 237. Section 17.1. Interfaces • Design the module’s interface first • A bad interface makes a module unusable • Think about how your module will be used • Name things from the user’s perspective • Not from how it’s implemented • Create examples, or even “use cases” for your interface • Keep the examples around for your documentation!
  238. 238. Section 17.2. Refactoring • Place original code inline • Place duplicated code in a subroutine • Place duplicated subroutines in a module
  239. 239. Section 17.3. Version Numbers • Use 3-part version numbers • But vstrings are broken • Didn’t get them right until 5.8.1 • Deprecating them for 5.10! • Use the “version” module from the CPAN: use version; our $VERSION = qv('1.0.3'); • Also supports “development versions”: use version; our $VERSION = qv('1.5_12');
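One reason to bother with version objects: three-part versions do not compare correctly as plain strings. A small sketch, assuming a Perl where the version module is available (it has shipped with core since 5.10):

```perl
use strict;
use warnings;
use version;

my $old = version->parse('1.0.3');
my $new = version->parse('1.0.10');

# String comparison goes character by character, so "3" sorts after "1".
print "as strings:  1.0.3 is ",
    ( '1.0.3' lt '1.0.10' ? 'older' : 'newer' ), " than 1.0.10\n";

# Version objects compare component by component.
print "as versions: 1.0.3 is ",
    ( $old < $new ? 'older' : 'newer' ), " than 1.0.10\n";
```

The string comparison reports "newer", which is wrong; the version comparison reports "older".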
  240. 240. Section 17.4. Version Requirements • Enforce version requirements programmatically • Enforce Perl version:
 use 5.008; # 5.8 or greater, only • Enforce package versions:
 use List::Util 1.13 qw(max); • Use “use only” for more precise control • Get “only” from the CPAN
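The version number on a "use" line is checked by calling the module's VERSION method, which dies if the installed module is too old. A sketch of making the same check by hand at runtime; the 9999 requirement is deliberately impossible, just to show the failure path.

```perl
use strict;
use warnings;
use List::Util ();

# Passes quietly if the installed List::Util is at least 1.13.
my $ok = eval { List::Util->VERSION(1.13); 1 };
print "List::Util >= 1.13: ", ( $ok ? "yes" : "no" ), "\n";

# An unsatisfiable requirement dies with a message naming both versions.
my $too_new = eval { List::Util->VERSION(9999); 1 };
my $error   = $@;
print "List::Util >= 9999: ", ( $too_new ? "yes\n" : "no ($error)" );
```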
  241. 241. Section 17.5. Exporting • Export carefully • Your default export list can’t change in the future • Might break something if new items added • Where possible, only on request • Flooding the namespace is counterproductive • Especially with common names, like “fail” or “display”
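A minimal sketch of "only on request" with the core Exporter module. The module name My::Report and its two subroutines are hypothetical, and the package is inlined here so the example runs as a single file.

```perl
use strict;
use warnings;

package My::Report;    # hypothetical module name

use Exporter 'import';
our @EXPORT    = ();                                   # nothing by default
our @EXPORT_OK = qw(report_failure report_success);    # available on request

sub report_failure { return "FAIL: $_[0]" }
sub report_success { return "PASS: $_[0]" }

package main;

# In a real program, with the module in its own file, this would be:
#     use My::Report qw(report_failure);
My::Report->import(qw(report_failure));

print report_failure('disk full'), "\n";
print "report_success imported? ",
    ( main->can('report_success') ? "yes" : "no" ), "\n";
```

Only the name that was asked for lands in the caller's namespace; everything else stays reachable as My::Report::report_success.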
  242. 242. Section 17.6. Declarative Exporting • Export declaratively • Perl6::Export::Attrs module:
 package Test::Utils; use Perl6::Export::Attrs;
 sub ok :Export(:DEFAULT, :TEST, :PASS) { ... }
 sub pass :Export(:TEST, :PASS) { ... }
 sub fail :Export(:TEST) { ... }
 sub skip :Export() { ... } # can leave () off • Now @EXPORT will contain only “ok” (:DEFAULT) • And @EXPORT_OK will contain all 4 • And the :TEST and :PASS groups will be set up:
 use Test::Utils qw(:PASS); # ok, pass
  243. 243. Section 17.7. Interface Variables • Export subroutines, not data • Yes, let’s create a module, but then reintroduce global variables • OK, that’s a bad idea • Variables have only get/set interface • And very little checking (unless you “tie”) • Export subroutines to give more control • And more flexibility for the future
  244. 244. Section 17.8. Creating Modules • Build new module frameworks automatically • The classic: h2xs • Modern: • ExtUtils::ModuleMaker • Module::Starter
  245. 245. Section 17.9. The Standard Library • Use core modules when possible • See “perldoc perlmodlib” • Lots and lots of things • The core gets bigger over time • See Module::CoreList to know when • Command line “corelist”
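A short sketch of asking Module::CoreList when a module first shipped with Perl. The three module names are just examples; the output depends on the data in your installed copy of Module::CoreList.

```perl
use strict;
use warnings;
use Module::CoreList;

for my $module (qw(List::Util Test::More Memoize)) {
    # first_release returns the Perl version that first bundled the module,
    # or undef if the module has never been in core.
    my $since = Module::CoreList->first_release($module);
    printf "%-12s %s\n", $module,
        defined $since ? "in core since perl $since" : "not in core";
}
```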
  246. 246. Section 17.10. CPAN • Use CPAN modules where feasible • Smarter people than you have hacked the code • Or at least had more time than you have • Reuse generally means higher quality code • Also means communities can form • Questions answered • Pool of talent available • Someone else might fix bugs • metacpan.org has the best interface
  247. 247. Chapter 18. Testing and Debugging • We all write perfect software! • In our dreams... • Every program has one bug or more • The trick is... where? • Testing and debugging for the win
  248. 248. Section 18.1. Test Cases • Write tests first • Turn your examples into tests • Create the tests before coding • Run the tests, which should fail • Code (correctly!) • Now the tests should succeed • Tests are also documentation • So, comment your tests
  249. 249. Section 18.2. Modular Testing • Use Test::More • Testing testing testing! • Don’t code your tests ad-hoc • Use the proven Perl framework • “perldoc Test::Tutorial”
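For anyone who has not seen Test::More, a minimal test file looks something like this. It tests List::Util's max purely as a convenient stand-in for your own code.

```perl
use strict;
use warnings;
use Test::More;
use List::Util qw(max);

# is(GOT, EXPECTED, DESCRIPTION) compares two scalars.
is( max(3, 9, 4), 9,  'max picks the largest of several values' );
is( max(-5, -2),  -2, 'max handles negative numbers' );
is( max(7),       7,  'max of a single value is that value' );

# is_deeply compares whole data structures.
is_deeply( [ sort { $a <=> $b } 3, 1, 2 ], [ 1, 2, 3 ],
    'numeric sort orders ascending' );

done_testing();    # no need to count the tests in advance
```

Save it as something like t/max.t and run it with "prove".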
  250. 250. Section 18.3. Test Suites • For local things, create additional Test::* mixins • Use Test::Harness as a basis for additional tests • New harness is being developed • Separates reporting from detecting
  251. 251. Section 18.4. Failure • Write test cases to make sure that failures actually fail • This is psychologically hard sometimes • Have to out-trick yourself • Good task for pairing with someone else
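Testing that failures fail usually means wrapping the call in eval and checking both that it died and why. A sketch, with a hypothetical parse_age function standing in for the code under test:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test: accepts only whole, non-negative ages.
sub parse_age {
    my ($text) = @_;
    die "age is required\n" unless defined $text;
    die "age must be a whole number: '$text'\n" unless $text =~ /\A\d+\z/;
    return $text + 0;
}

is( parse_age('42'), 42, 'valid age is accepted' );

# Each of these must be rejected, and for the right reason.
for my $bad ( '', 'abc', '-1', '4.5', "12\n" ) {
    my $lived = eval { parse_age($bad); 1 };
    ok( !$lived, 'bad input is rejected' );
    like( $@, qr/whole number/, '... with a helpful message' );
}

ok( !eval { parse_age(undef); 1 }, 'missing age is rejected' );
like( $@, qr/required/, '... with a helpful message' );

done_testing();
```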
  252. 252. Section 18.5. What to Test • Test the likely and the unlikely • Some ideas: • Minimum and maximum, and near those • Empty strings, multiline strings, control characters • undef and lists of undef, “0 but true” • Inputs that might never be entered • Non-numeric data for numeric parameters • Missing arguments, extra arguments • Using the wrong version of a module • And every bug you’ve ever encountered • The best testers have the most scars
  253. 253. Section 18.6. Debugging and Testing • Add new test cases before debugging • A bug report is a test case! • Add it to your test suite • (You do have a test suite, right?) • Then debug. • You’ll know the bug is gone when the test passes • And you’ll never accidentally reintroduce that bug!
  254. 254. Section 18.7. Strictures • Use strict • Barewords in the wrong place:
 my @months = (jan, feb, mar, apr, may, jun,
 jul, aug, sep, oct, nov, dec); • Variables that are probably typos:
 my $bamm_bamm = 3; .... $bammbamm; • Evil symbolic references:
 $data{fred} = "flintstone";
 ...
 $data{fred}{age} = 30; • You just set $flintstone{age} to 30!
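With strict in force, the symbolic reference above stops being a silent write to the wrong variable and becomes a runtime error you can see. A small sketch that traps the error with eval so the message can be printed:

```perl
use strict;
use warnings;

my %data;
$data{fred} = "flintstone";

# Under strict refs, using the string "flintstone" as a hash reference dies
# instead of quietly creating %flintstone.
my $ok    = eval { $data{fred}{age} = 30; 1 };
my $error = $@;

print "caught: $error" unless $ok;
```

The message names the offending string and says that "strict refs" was in use.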
  255. 255. Section 18.8. Warnings • Use warnings • “use warnings” catches most beginner mistakes • Beware “-w” on the command line • Unless you want to debug someone else’s mistakes when you use a module
  256. 256. Section 18.9. Correctness • Never assume that a warning-free compile implies correctness • Oh, if life were only that simple! • You have to be more clever at debugging than you were at coding
  257. 257. Section 18.10. Overriding Strictures • Reduce warnings for troublesome spots • Add “no warnings” in a block:
 {
 no warnings 'redefine';
 ...;
 } • The block will narrow down the influence • Note that this has lexical influence • Not dynamic influence
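A runnable sketch of the lexical scoping described above. The greet subroutine is made up, and warnings are collected through a __WARN__ handler only so the result can be printed.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };    # collect instead of print

sub greet { return 'hello' }

{
    no warnings 'redefine';          # in effect only inside this block
    *greet = sub { return 'hi' };    # replaced silently
}

*greet = sub { return 'hey' };       # same operation outside the block: warns

print "greet() now says: ", greet(), "\n";
print "warnings collected: ", scalar(@warnings), "\n";
print "first warning: $warnings[0]" if @warnings;
```

Only the second replacement, outside the block, produces a "redefined" warning.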
  258. 258. Section 18.11. The Debugger • Learn a subset of the debugger: • Starting and exiting • Setting breakpoints • Setting watchpoints • Step into, out of, around • Examining variables • Getting the methods of a class/instance • Getting help
  259. 259. Section 18.12. Manual Debugging • Use serialized warnings when debugging “manually” • Reserve “warn” for “if debugging” • You shouldn’t be using warn for anything else • Automatically goes to STDERR • Or use Log::Log4perl with debug levels
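"Serialized" here means dumping the whole data structure, not just a scalar. A minimal sketch using the core Data::Dumper module; the MYAPP_DEBUG environment variable, the dump_for_debug helper, and the %order data are all hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical debug switch; an environment variable is one common choice.
my $DEBUG = $ENV{MYAPP_DEBUG} || 0;

$Data::Dumper::Sortkeys = 1;    # stable key order makes dumps comparable
$Data::Dumper::Indent   = 1;    # compact indentation

# Hypothetical helper: label a structure and serialize it.
sub dump_for_debug {
    my ($label, $ref) = @_;
    return "$label: " . Dumper($ref);
}

my %order = ( id => 17, items => [ 'tea', 'scone' ], paid => 0 );

warn dump_for_debug( 'order', \%order ) if $DEBUG;    # goes to STDERR
```

Run with MYAPP_DEBUG=1 in the environment to see the dump; without it the program stays quiet.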
  260. 260. Section 18.13. Semi-Automatic Debugging • Consider “smart comments” rather than warn • As in, “Smart::Comments”:
 use Smart::Comments;
 ...
 for my $result (@some_messy_array) {
 ### $result
 } • Supports assertions too:
 ### check: ref $result
 ### require: $result->foo > 3 • No delivery penalty • Just don’t include Smart::Comments!
  261. 261. Chapter 19. Miscellanea • Sure, it’s gotta fit somewhere • Might as well be at the end
  262. 262. Section 19.1. Revision Control • Use revision control • Essential for team programming • But even good for one person • Great for sharing updates • Helpful for “when did it break” • And more importantly “who broke it” • Also prevents potential loss from hardware failures and human failures
  263. 263. Section 19.2. Other Languages • Integrate non-Perl code with Inline:: modules • Relatively simple • Compared to raw XS codewriting • Caches nicely • Can be shipped pre-expanded with a distro • Can be installed per-user as needed
  264. 264. Section 19.3. Configuration Files • Consider Config::Std • Simple to use:
 use Config::Std;
 read_config '~/.demorc' => my %config;
 $config{Interface}{Disclaimer} = 'Whatever, dude!'; • Allow user changes to be rewritten for the next run:
 write_config %config; • Whitespace and comments are preserved! • Config::General: all that and XML blocks, heredocs, includes
  265. 265. Section 19.4. Formats • Don’t • Consider Perl6::Form instead
  266. 266. Section 19.5. Ties • Don’t • Fairly inefficient • Can be spooky to the naive • Every tied operation can be replaced with an explicit method call
  267. 267. Section 19.6. Cleverness • Don’t • Example:
 my $max_result =
 [$result1=>$result2]->[$result2<=$result1]; • Can you spot the bug? • Yeah, it’s min, not max • Better:
 use List::Util qw(max);
 $max_result = max($result1, $result2);
  268. 268. Section 19.7. Encapsulated Cleverness • If you must be clever, at least hide it well! • Worked for the Wizard of Oz for many years • “Pay no attention to the man behind the curtain!” • Easy-to-understand interfaces better than scary comments • Be clever, and hide it
  269. 269. Section 19.8. Benchmarking • Don’t optimize until you benchmark • Get the code right first • Then make it go faster, if necessary • Use the tools • Benchmark • Devel::DProf • Devel::SmallProf • Devel::NYTProf
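A sketch of the core Benchmark module comparing two ways to find a maximum. The data size and iteration count are arbitrary choices for the example, and the numbers printed will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use List::Util qw(max);

my @data = map { int rand 1000 } 1 .. 1000;

# Run each candidate 2000 times; 'none' suppresses the per-test report.
my $results = timethese( 2000, {
    sort_based => sub { my $m = ( sort { $b <=> $a } @data )[0] },
    list_util  => sub { my $m = max(@data) },
}, 'none' );

# Print a comparison table of rates and relative speeds.
cmpthese($results);
```

Measure first, then decide whether the difference matters for your program.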
  270. 270. Section 19.9. Memory • Measure data before optimizing it • Use Devel::Size for details:
 size(\%hash) # measures overhead
 total_size(\%hash) # measures overhead + data • Overhead is often surprising
  271. 271. Section 19.10. Caching • Look for opportunities to use caches • Subroutines that return the same result • Example: SHA digest of data:
 BEGIN {
 my %cache;
 sub sha1 {
 my $input = shift;
 return $cache{$input} ||= do {
 ... expensive calculation here ...
 };
 }
 }
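Here is a runnable variant of the caching pattern above, with the core Digest::SHA module doing the work. The names cached_sha1 and cache_misses are made up, and the miss counter is there only to show that the cache is being hit. It uses //= (Perl 5.10 and later) where the slide uses ||=, so that a false-but-defined result would still be cached.

```perl
use strict;
use warnings;
use Digest::SHA ();

{
    my %cache;         # private to the two subroutines below
    my $misses = 0;

    sub cached_sha1 {
        my ($input) = @_;
        # Compute only when no defined value is cached for this input.
        return $cache{$input} //= do {
            $misses++;
            Digest::SHA::sha1_hex($input);
        };
    }

    sub cache_misses { return $misses }
}

cached_sha1('hello world') for 1 .. 5;
cached_sha1('goodbye');

print "digest computations: ", cache_misses(), "\n";    # prints 2
```

A bare block is enough here because it runs before the first call; the BEGIN block on the slide guarantees that no matter where the code sits in the file.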
  272. 272. Section 19.11. Memoization • Automate subroutine caching via Memoize • No that’s not a typo • No need to rewrite the same cache logic • Simpler example:
 use Memoize;
 memoize('sha1');
 sub sha1 {
 ... expensive calculation here ...
 } • Lots of options (defining key, time-based cache)
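A complete example of Memoize in action, with a call counter to show that the real subroutine runs only once. The slow_square subroutine is a trivial stand-in for something expensive.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

sub slow_square {
    my ($n) = @_;
    $calls++;    # counts real executions only
    return $n * $n;
}

memoize('slow_square');    # replaces slow_square with a caching wrapper

my @results = map { slow_square(12) } 1 .. 3;
print "results: @results, real calls: $calls\n";
# prints "results: 144 144 144, real calls: 1"
```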
  273. 273. Section 19.12. Caching for Optimization • Benchmark any caching • Cache is always a trade-off • time vs space • time to compute vs time to fetch/store cached value • time to compute vs time to figure out cache is fresh • Caching isn’t necessarily a win
  274. 274. Section 19.13. Profiling • Don’t optimize apps: profile them • Find the hotspots that are slow, and fix them • Use Devel::DProf for subroutine-level visibilities • Use Devel::SmallProf for statement-level
  275. 275. Section 19.14. Enbugging • Carefully preserve semantics when refactoring • The point of refactoring is not to break things
  276. 276. Bonus • Damian isn’t the only one with good ideas
  277. 277. Use Perl::Critic • Automatically test your code against various suggestions made here • Can be configured to require or ignore guidelines • Consider it “lint for Perl”
  278. 278. In summary • Learn from the ones who have the most scars • Buy the book... • Damian could use the money!
