Dealing with Legacy Perl Code - Peter Scott

Peter Scott, author of the O'Reilly School of Technology's Perl Programming Certificate series, talks about how to deal with "legacy" Perl code - written by someone else, or maybe even yourself when you were younger and less wise.

  • Thank you for coming. Ask questions before break so I have time to research.
  • Code written by someone else. Or by you, long enough ago. Say, a couple of weeks. Why is Perl so susceptible?
  • Perl’s motto is also a curse. Perl is like English: if you have William F. Buckley Jr., you also have Homer Simpson.
  • A 100-line Perl script may not get the same attention to coding standards, documentation, or other methodology as a 1,000-line C program, even though it deserves it just as much.
  • Slow - how often is that really a problem? Odds are that if it’s too slow in Perl it’s going to be too slow in any language. Or, it was written using a poor algorithm to begin with, which is something that likewise you can do in any language.
  • Books like PBP help with this problem by telling you what not to do as much as what to do. Brian Kernighan: Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are by definition, not smart enough to debug it.
  • How good a programmer are/were they? How fluent in Perl?
  • Too many people skip this step and assume they know what they should do. Are original optimization goals still valid?
  • Think about testing now. Test with same tools as for Test First. Tests inline make code harder to read though.
  • Sounds obvious, but if tests aren’t fun to write, haven’t followed. Look for more Test:: modules on CPAN.
  • Could do same as before - save HTML to file and compare. This is better.
  • Explain how works.
  • Pretty to look at means easy on the eyes - not in the usual sense but easy to read. Not formatted with Acme::EyeDrops to look like the Mona Lisa. Consistent layout is easy with editors - hit TAB in Emacs.
  • Common indentation style - none at all.
  • Not quite my style - braces in K&R style to use fewer lines.
  • Ok… here it is using my style. So what if it takes up more space. How many people prefer this style, anyway?
  • You probably want to know which parts of the code are being executed. You might want to know how fast those parts are being executed.
  • Analysis you do with eyeballs. Example app: 2.5 hrs/1GB -> 20min/12MB, code halved, using hashes. Regexes - except may be optimized for performance. So profile.
  • If you're working in a team, once you've settled on indentation style you ought to settle commenting style. Here you see the result of paying someone by the line. Note that
  • this is the only thing here that actually does anything. Dealing with the Documentation Hound is an insidious problem because of course many comments *are* useful and on the whole, people don't use enough of them. It’s like talking to a random group of people and saying, “Y’all need to eat more.” There might be a couple of anorexics in the audience, but for most people, that’s not the right message. But the point is that everything in the program, whether it's code or comments, should contribute towards getting the job done or understanding how it works, and anything that doesn't do that is getting in the way of the stuff that does. So some of these comments would be helpful, but the majority of them are just making it harder for you to read the code.
  • What can you do about it? Firstly, get rid of the ASCII art. If you want a gap, use a blank line. -click- Next, if the comments relate to how to call public methods or functions, then put them in documentation so anyone can see a proper interface document just by running perldoc. But if you're the kind of project that uses comments about the signature of a function, those are fine. Personally, I leave that information to POD. -click- The whole point of this pruning exercise is to expose the code itself. It was pretty obvious in this example that there was no point at all in having a function for adding one to a number since it would be shorter and clearer just to use ++ in line. Usually, you're not going to be able to inline a subroutine, but if you're taking over maintenance of a coding horror this is the first phase in a rewriting campaign and the next step is line by line editing that may shorten the code still further. Ultimately, you'd like every function and method to be no longer than one screen of your editor. Look at Brian Ingerson. His methods are so short that just the line "my $self = shift" accounted for 10% of their line counts, so he wrote a module to put it in automatically. -click- The comments that you really want are any that answer the question, "Why?" That is the whole point of all documentation: Why is this line of code so weird, why does this function have a bizarre name, why does my database not follow normalization rules, why am I using substr() and index() instead of a simpler regular expression, why, why, why? If you come back to a program you wrote six months ago without perfect documentation, I guarantee that the first question you ask when you look at it again is going to begin with "Why". Anything that answers that question needs to be preserved, enshrined, and embalmed.
  • I'm not saying you should get rid of all that information about who wrote the program and when, and what changes they made; just that it belongs in the right place. If the program is something you're using in your own environment then you can just use a source code control system. Everything from RCS to Subversion will keep track of all that stuff for you. That's where you'd expect to find that information if you were looking for it, so put it there. If you're making a distribution to send to someone else - such as CPAN - then you can stick that information in a README and a change log file. Okay, moving on to the next example…
  • Ok, next pop quiz: what does this do? (beat) Beep - time's up.
  • You just look at that and go, "Oh, of course, it replaces groups of consecutive non-digits with a single space."
  • Okay, here we have something I copied here verbatim from where I saw it so you could see how the author daringly defied the normal rules of layout for the sake of conciseness, alignment, and general prettiness, and it certainly is pretty. Too bad it's so darned repetitive. Repetitive code is a coding horror that's like a leech sucking on your brain, because part of your brain is going, "okay, skip all these, they're all the same," and another, wiser part of your brain is going, "Wait a minute, I have to check to make sure they *are* all the same." So is this like the regex example - is one of the digits missing? No! But the more cautious among you probably wondered, "Gee, should it include zero in the list? Well, it's a month, and they're usually numbered starting at one, so…" Too much stuff for your brain to think about! You need it for more important things, like how to explain to your wife why you just bought that 50” TV.
  • So repetitive code is a giant wake-up call that there's a better way of doing something, and of course, here it is. Speaking of repetitive code,
  • here's another example that's all too common… again, we have someone who's just not impatient enough. Either they like typing, or they like using their editor macros, either way, you've got better things to do than stare at repetitive code making sure all of those things are really the same. Trust me, it's happening, even if you don't realize it - it's the subconscious programmer's brain at work, the same thing that tells you when it's time to eat. Give it something better to do.
  • This kind of task is exactly why here docs were invented. You just type what you want with the line breaks the way you want them and you don't have to worry about quoting delimiters. "Oh, but Peter, I don't like the text being up against the left margin instead of indented to make it look separate." Well then,
  • indent it, if that's what you want to do. It's just a regex away. There are umpteen ways of doing this, and if you want to indent the heredoc terminator as well, you can do that too if you quote it right to begin with, or you can use one of a couple of nifty source code filters that do the job for you. The point is, don't be satisfied with repetitive code.
  • Especially something heinous like this, which always drives me crazy. Folks, when I'm reading a Perl program, I'm in Perl reading mode, and maybe sometimes in regex reading mode. I'm not in HTML reading mode, and I don't want to be. -click- I prefer not to look at the HTML at all, and there are a lot of ways of just getting it out into a separate file or files. If you put it in a separate file then you can use an HTML editor on it - no HTML editor is going to be able to validate HTML that’s intertwingled with Perl like this. My favorite HTML editor is - someone else. Because inevitably you get all kinds of requests to make this font two points bigger, or make this background a little bit pinker, and it’s not my kind of thing. But there are people who get off on that, and I’m only too happy to let them edit HTML without having to look at my code. HTML and Perl are like dogs and cats; they just shouldn't breed together. It's bad enough when the HTML is in a heredoc, but when they're mixed together on the same line like this over and over, it's nuts. If you really want to put the HTML in the same file - say because you are the HTML editor and you don’t want anyone else touching your HTML - then use Inline::Files so you can have it in a nice clearly marked separate section where it can't escape and start molesting the Perl code.
  • Does this look familiar to anyone? It should, because it is pasted verbatim from the perldoc documentation for localtime. You see this all the time, but then inevitably they only use a couple of the variables. And that part of your brain that acts like a little Perl interpreter is left wondering what those other variables are for and when they’re going to get used.
  • Just declare the ones you’re going to use.
  • And if you don’t like the numbers 4 and 3 here,
  • well you don't even have to do that either, if you don't mind using a module that's come with Perl since at least version 5.004. The anal-retentive part of me is forced to point out that there’s a small bug here in that if in between the first and the second call to localtime the system clock passes midnight at the end of the last day of the month, then the results will be inconsistent. If that should happen to bother you, then by all means use the previous example with the list slice and symbolic constants set to 4 and 3 so you know what they mean. Next…
  • How many people have seen a program that starts with a laundry list like this? How many of you liked it? This has already filled up the entire screen and we've hardly gotten to any executable code yet. Of course, the Documentation Hound cures this problem by declaring each one on a separate line with its own comment block preceding it saying what it's for, when it first appeared in the program, what values it's allowed to take, and other stuff that makes this ten times longer than it already is. So what do we do about it?
  • First and most important, variable declarations should go as late as possible. Remember, part of the reader's brain is going to be occupied with remembering every variable for as long as it's in scope, so push them to the innermost scope levels possible. The one exception to that is the variable that's used for specifying some global configuration setting, like the path to an important directory or the value of Avogadro's constant, or something. The main reason for putting those at the top of the code is so that a lesser programmer who comes in wanting to change one of those things in your program will find it quickly and not go spelunking through the rest of your code where they might break something. -click- You can help keep variables in the innermost scope possible by using the fact that 'my' can appear before a variable just about anywhere in Perl and in particular, you can declare a loop variable at the point you say it's a loop variable so that it'll go out of scope as soon as the loop finishes. -click- In fact, you can put 'my' in some pretty unlikely places. The second argument to the getopts() function exported by Getopt::Std is a reference to a hash to store the options specified by the user on the command line. You can even put the 'my' inside an enreferencing backslash there to save on having to declare it in a separate line. -click- This kind of coding horror is usually a symptom of another kind.
  • It's not really possible to show an example of this on the screen so you'll have to use your imagination. -click- But I'm sure it's familiar to many of you. (How many people know what JCL is?) -click- This is the usual excuse for how the program got that way. -click- And this is the inevitable defense when you point out that you need a machete to hack through the code.
  • So what do you do? It's not easy. You have to identify areas where variables are used only over a short part of the program, and then see whether that means that part can be taken out into a subroutine. -click- If you use Eclipse - I'm afraid I haven't yet - then you might want to use the Devel::Refactor module from CPAN, which can figure out what the subroutine for any given chunk of code should be. That doesn't necessarily mean it's going to be a good choice, mind you. -click- Because Monolithic Madness seldom uses strict, that means you're going to have to add it at some point if you want to stay sane. Or become sane, whichever applies to you. You can take a piecemeal approach to this if you want; first, turn on strictness for the whole program, but then embed the rest of the whole program in a naked block and immediately turn strictness off, which of course is going to be a giant no-op. But now you can start pulling code out of that no strictness block at the beginning and the end and fixing the code to be strict compliant as you go, and keep shrinking the size of the inner no strictness block.
  • I have taken some creative license with the spacing in these examples for the sake of some entertainment value. So this first example is code written by someone whose favorite language is - what?
  • Yep, C. This just prepends a root path to a bunch of relative paths. Of course, the C programmer is probably wondering at this point how Perl works without a malloc or realloc function.
  • Here's how a native Perl speaker would do that same task.
  • This next example is what might come from someone who was more used to… what?
  • Yep, FORTRAN. This prints a matrix of numbers to a file. I've tried to pick examples here that are sort of in keeping with the kinds of things those languages are generally used for. So a FORTRAN programmer when they want to print something might start looking for something called FORMAT and figure that's what they need to use for printing formatted values. And… yeessss, technically that's correct, but of course that's not how we'd really do that most of the time.
  • And here's the native Perl solution, or at least one way of doing it.
  • Okay, this should be a fairly easy one… this is Perl spoken by a what programmer?
  • That's right. This being a COBOL program, it just adds two numbers together, but because they're numbers representing money, that's okay.
  • And here's the native Perl solution. It's not really changed much because the original COBOL implies strict fixed-length formatted records, and we use unpack() for handling those in Perl. Of course, we're more used to writing applications that handle less structured input formats, or more modern structured input formats like XML.
  • Okay, this last example should be pretty easy, you've got the hang of this by now…
  • Right, BASIC. Okay, so this is the kind of program I used to write when I was learning BASIC: guess what number someone's thinking of, the hard way.
  • And here's a native Perl solution. The astute reader will note that since the original program never actually *did* anything with the right answer, in this version, it doesn't even stay in scope once we found it. CHANGE TO USE IO::Prompt
  • Okay, something a little more serious now. I'm going to assume that all of you have seen this code by now, because it's multiplying around the internet like a bad virus. -click- Of course, this is the code to parse CGI inputs - well, the beginning of the code, anyway, I couldn't bring myself to finish pasting it. This is just one instance of it I found somewhere - it wasn't hard, I just stuck out my foot and this tripped over it - and it's better than many I've seen, but still, isn't
  • *this* a lot easier to read? Not to mention the fact that it's come bundled with the Perl core since 5.004, and it gets the decoding right, unlike the previous code and virtually every other home-grown solution floating out there. It's just a lot harder than most people think to handle all the possibilities. Of course, it doesn't end with CGI.pm. There are still zillions of people parsing XML and HTML with simple regular expressions, people calling out to programs like 'date' and 'cal' instead of using a Date:: module, and people calling database programs like sqlplus instead of using DBI.
  • Mixed abstraction levels refers to code of different complexity in a method - it should all look the same level of abstraction. Time dependencies means the caller must call methods in a certain order - this pushes some of the responsibilities of the class onto the caller - create methods managing correct calling sequence
  • Restructuring is easier the less code you have. I recently compressed a subroutine from 200 lines to 12. Ideally all blocks fit on screen (mine is 50 lines high). Code that’s not covered: write a test for it or delete it (you have revision control).
  • Typical. Bug in this line, what is it? Quote missing after $zx4. Another bug. Says $x2 in the values line where the corresponding column name is x3. Tim wouldn’t settle - read DBI, discover placeholders
  • Still has a bug. $y2 and $y3 are reversed in the argument list. That's not the only bug. There's one too many question marks in the values line.
  • Maintaining associations between two sets is a hash. Now every time you need to insert values in a table, just call the subroutine.
  • Nothing ruins reading like a text lump. Lots of print statements -> here document -> separate file (config file). Usually the reason for the text is creating some web page etc. Candidates for templating; give the work to someone else.
  • People post code using symref unwittingly. Flamed ‘cos couldn't be using strict. strict subs stops calling sub without parens if no def yet, so define sub earlier, or declare stub, or put empty parens after call. strict vars requires lexicals or effort to use package var.
  • use warnings is relatively new - so probably see -w. And that's fine. use warnings is lexically scoped, so… -W forces warnings everywhere, regardless of attempts to turn off. So use it, decide, remove, use warnings.
  • Still zillions of people parsing XML and HTML with regexen, calling out to 'date' and 'cal' & sql+. Some code may have been pasted from third-party modules before was okay to trust CPAN.
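
The placeholder notes above (keep column names and values together in a hash, and call one subroutine for every insert) can be sketched roughly as follows. This is only an illustration, not the code from the slides: the helper name `build_insert`, the table name `widgets`, and the column names are made up, and the table and column names are interpolated directly, so they must come from trusted code rather than user input.

```perl
use strict;
use warnings;

# Hypothetical helper: build an INSERT statement with one placeholder
# per column, so names and values cannot drift out of step.
sub build_insert {
    my ($table, %value_for) = @_;
    my @columns = sort keys %value_for;    # one fixed order for both lists
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join(', ', @columns),
        join(', ', ('?') x @columns);
    return ($sql, @value_for{@columns});
}

my ($sql, @bind) = build_insert('widgets', x1 => 10, x2 => 'blue', x3 => 3.5);
print "$sql\n";
print join(', ', @bind), "\n";

# With DBI and a connected handle $dbh (not shown here), the pair
# would be executed as:
#   $dbh->do($sql, undef, @bind);
```

Because the column list and the bind values are both derived from the same sorted key list, the "reversed arguments" and "one too many question marks" bugs described in the notes have nowhere to hide.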

Presentation Transcript

  • Maintaining Code While Staying Sane Peter Scott O’Reilly School of Technology February 2011
  • Dealing With Legacy Perl
    • Legacy Perl can stink
    • Even when you wrote it
    • Or especially when you wrote it
    • But Why?
  • Why So Many Ugly Perl Programs?
    • Unfortunately, some of those ways stink
    • Or, people use more than one way of doing the same thing in the same program
  • The “DWIM” Myth
    • “Perl programming doesn’t require the same discipline as other languages”
    • Indeed; it may require more
      • So many WTDI
    • Cure: Adopt best practices
  • The “Prototyping Only” Myth
    • “Perl is too slow and/or unpredictable to be used for serious work”
    • Too slow - sometimes, not always when you’d expect it
    • Unpredictable - only when programming without discipline or understanding
    • Cure: Learn algorithms, profiling, benchmarking
  • The “$@%*!” Myth
    • “Perl is a write-only language”
    • Another product of insufficient discipline
    • The dark side of TMTOWTDI
    • Cure: Adopt best practices, eschew obfuscation
  • Find the Author(s)!
    • Are they a better programmer than you or worse?
    • Especially, better or worse at Perl?
    • This helps you evaluate code you don’t understand
      • If you find code you don’t understand, it may be wrong
      • Or it may be right, and over your head
    • What was their background?
      • A Shell programmer uses different idioms from a C++ programmer
  • What Are You Dealing With?
    • What was the code optimized for?
      • Maintainability
      • Performance
      • Brevity
      • Job security
      • Something else?
  • Maintainability
    • # Print words with an even number of letters, AND even
    • # number of each vowel, AND even position in the input
    • # (input is a dictionary that has one word per line)
    • OUTER: while (<>)
    • {
    • next if $. % 2;
    • chomp;
    • next if length() % 2;
    • for my $vowel (qw/a e i o u y/)
    • {
    • my @vowels = /$vowel/g;
    • next OUTER if @vowels %2;
    • }
    • print "$_\n";
    • }
  • Performance
    • while (<>)
    • {
    • next if ($. | length() - 1) % 2;
    • next if tr/e// % 2;
    • next if tr/a// % 2;
    • next if tr/i// % 2;
    • next if tr/o// % 2;
    • next if tr/u// % 2;
    • next if tr/y// % 2;
    • print;
    • }
  • Brevity
    • #!/usr/bin/perl -ln
    • ($x=aeiouy)=~s#.#y/$&//|#g;eval("$x$.|y///c")%2&&next;print
  • Job Security
    • @i = map { chop; $x++ %2 ? $_ : () } <>;
    • while ($i = shift @i)
    • {
    • ord(pack "w/a*", $i) & 1 and next;
    • $_ = "$i\n";
    • $i =~ s/$_(.*)$_/$1/ for qw/a e i o u y/;
    • print unless $i =~ /[aeiouy]/;
    • }
  • Testing
    • You can’t test too early
      • Or too much
      • Use Test::More
    • Also useful:
      • Test::Exception
      • Test::Inline
      • Test::NoWarnings
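
As a minimal sketch of what a first Test::More script for inherited code might look like: the routine `pad_month` here is a hypothetical stand-in for whatever legacy function you are pinning down, not something from the talk.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical legacy routine under test.
sub pad_month { return sprintf '%02d', $_[0] }

# Pin down the current behaviour before changing anything.
is( pad_month(1),  '01', 'single digit is zero-padded' );
is( pad_month(9),  '09', 'upper single-digit boundary' );
is( pad_month(12), '12', 'two digits pass through unchanged' );
```

In a real distribution this would live in a file under t/ and be run with prove.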
  • Tests are Real Programs, Too
    • Don’t abandon good indentation, variable naming, design, etc just because they’re “tests”
      • Follow good development practices
      • use strict and use warnings in them
      • Abstract common code to modules in t/
    • Keep tests small
    • They’ll grow anyway
    • Refactor as necessary
    • It’s fine for tests to prompt for passwords, etc
  • Testing Web Applications
    • Good design would mean you wouldn't have to go through a web server, of course
    • Start out simple by using WWW::Mechanize
      • Acts like a virtual browser
      • Easy to navigate and fill in forms
  • Web Testing Example
    • my $ua = WWW::Mechanize->new;
    • my $res = $ua->get("http://www.example.com/");
    • ok( $res->is_success, "Got first page")
    • or die $res->message;
    • $ua->set_visible($username, $password);
    • ok( $ua->submit->is_success, "Logged in" )
    • or die $ua->res->message;
  • Modern Web Testing
    • Now you can use Test::WWW::Mechanize
      • $mech->get_ok(...)
      • $mech->title_like(...)
      • $mech->content_contains(...)
      • $mech->follow_link_ok(...)
      • $mech->has_tag_like(...)
      • etc
  • Layout
    • Code should be pretty to look at
    • Add comments where you had to think a lot
    • What’s your role?
    • Don’t reformat if it doesn’t belong to you
    • Use perltidy to fix up even the worst layout
  • Before perltidy
    • for my $word (keys %{$word{$len}}){
    • chop(my $prefix = $word);if ($opt{g}){
    • while( $prefix ){
    • if(my $words=delete$chain{ $prefix} ){ $chain{$word} = [ @$words, $word ];
    • $maxcount=max ($maxcount,@$words+1); last;}
    • chop $prefix; }
    • }else{ if (my $words = delete
    • $chain{$prefix}){$chain{$word} = [@$words,
    • $word]; $changed = 1;} }}
  • After perltidy
    • for my $word (keys %{$word{$len}}) {
    •     chop(my $prefix = $word);
    •     if ($opt{g}) {
    •         while ($prefix) {
    •             if (my $words = delete $chain{$prefix}) {
    •                 $chain{$word} = [@$words, $word];
    •                 $maxcount = max($maxcount, @$words + 1);
    •                 last;
    •             }
    •             chop $prefix;
    •         }
    •     }
    •     else {
    •         if (my $words = delete $chain{$prefix}) {
    •             $chain{$word} = [@$words, $word];
    •             $changed = 1;
    •         }
    •     }
    • }
  • After perltidy
    • for my $word (keys %{$word{$len}})
    • {
    •     chop(my $prefix = $word);
    •     if ($opt{g})
    •     {
    •         while ($prefix)
    •         {
    •             if (my $words = delete $chain{$prefix})
    •             {
    •                 $chain{$word} = [@$words, $word];
    •                 $maxcount = max($maxcount, @$words + 1);
    •                 last;
    •             }
    •             chop $prefix;
    •         }
    •     }
    •     else
    •     {
    •         if (my $words = delete $chain{$prefix})
    •         {
    •             $chain{$word} = [@$words, $word];
    •             $changed = 1;
    •         }
    •     }
    • }
  • Analysis
    • Eliminate superfluous code through coverage analysis:
      • Devel::Coverage
      • Devel::Cover
    • Improve speed through profiling:
      • Devel::DProf
      • Devel::NYTProf
  • Devel::NYTProf
    • Very new
    • Incredibly flexible and accurate
    • Terrific reporting
  • Devel::NYTProf
  • What to Look Out For in Inherited Code
    • Apparent level of Perl expertise
      • Uses hashes? Regexes?
      • Uses parallel arrays/hashes instead of LoLs?
      • Calls unnecessary external programs?
    • What version of Perl was it apparently developed for?
      • Uses my? Or local? Uses use?
    • Cargo Cult Perl
  • The Documentation Hound
    • ##########################################
    • # Function name: increment_number
    • # Author: John Q. Lifer
    • # Date Created: 1996-07-14 13:45:22 PDT
    • # Last modified: 2005-03-21 11:09:32 PST
    • # Inputs: Number
    • # Outputs: None
    • # Returns: Input number plus one
    • # Exceptions: none
    • # Change history:
    • # 1996-07-21: Fixed off by one bug - jql
    • # 2002-10-23: Changed obfuscatory ++ operator - jql
    • ##########################################
    • sub increment_number {
    • ### Formal parameter list
    • my ($num) = @_;
    • ### Function body
    • # TODO: Throw exception on missing input, NaN, etc... - jql
    • $num = $num + 1; # Add one to $num
    • return $num;
    • }
  • The Documentation Hound Cure
    • s/^#+\n//mg
    • Move to POD later in this file
      • Or maybe another file
        • Such as /dev/null
      • But keep function/method signatures and descriptions
    • Make the code tell the story
    • Preserve comments that answer 'Why?'
  • The Documentation Hound Cure
    • For local programs, author and history information can be taken care of by a source code control system
    • For distributions:
      • Move author information to a README or POD AUTHOR section
      • Move history information to change log
  • String Manipulation, BASIC-Style
    • $repl = ' ';
    • for ($off = 0; $off < length($str); $off++) {
    • $c = substr($str, $off, 1);
    • if (index("012345789", $c) < 0) {
    • substr($str, $off, 1, $repl);
    • $off-- unless $repl;
    • $repl = '';
    • }
    • else {
    • $repl = ' ';
    • }
    • }
  • … i.e., Without Regexes
    • $str =~ s/\D+/ /g;
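
To see what the one-line substitution does, here is a small runnable sketch; the sample string is invented for illustration.

```perl
use strict;
use warnings;

my $str = 'tel: (555) 867-5309 x12';

# Copy the string, then replace each run of non-digits with one space.
(my $digits = $str) =~ s/\D+/ /g;

print "[$digits]\n";    # prints "[ 555 867 5309 12]"
```

Note the leading space: the run of non-digits at the start of the string is replaced like any other.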
  • Nice Formatting, But…
    • if ($month == "1") { $month = "0" . $month; }
    • if ($month == "2") { $month = "0" . $month; }
    • if ($month == "3") { $month = "0" . $month; }
    • if ($month == "4") { $month = "0" . $month; }
    • if ($month == "5") { $month = "0" . $month; }
    • if ($month == "6") { $month = "0" . $month; }
    • if ($month == "7") { $month = "0" . $month; }
    • if ($month == "8") { $month = "0" . $month; }
    • if ($month == "9") { $month = "0" . $month; }
  • Nice Formatting, But…
    • $month = sprintf "%02d", $month;
  • Too Much Time on Their Hands
    • $smtp->datasend("To: santa.claus@north.pole\n");
    • $smtp->datasend("From: johnny@home\n");
    • $smtp->datasend("Subject: I've Been Good\n");
    • $smtp->datasend("\n");
    • $smtp->datasend("Dear Santa\n");
    • $smtp->datasend("For Christmas I would like:\n");
    • $smtp->datasend(" Perl 6\n");
    • $smtp->datasend("Thank you\n");
  • Too Much Time on Their Hands
    • $smtp->datasend(<<'EOTEXT');
    • To: santa.claus@north.pole
    • From: johnny@home
    • Subject: I've Been Good
    • Dear Santa
    • For Christmas I would like:
    • Perl 6
    • Thank you
    • EOTEXT
  • Too Much Time on Their Hands
    • $smtp->datasend(<<'EOTEXT' =~ /[^\S\n]*(.*?\n)/g);
    • To: santa.claus@north.pole
    • From: johnny@home
    • Subject: I've Been Good
    • Dear Santa
    • For Christmas I would like:
    • Perl 6
    • Thank you
    • EOTEXT
    • Or use, say, Text::Outdent or similar
  • Way Too Much Time On Their hands
    • print "<HTML><HEAD>\n";
    • print "<TITLE>My Home Page</TITLE>\n";
    • print "</HEAD><BODY>\n";
    • print "<H1>My Home Page</H1>\n";
    • print "<H2>What I Did Last Summer</H2>\n";
    • print "<H3>by Cuthbert J. Bigglesworth</H3>\n";
    • print "<P>Me and my dog <I>Fang</I> went down ";
    • print "to the river and caught toads.</P>\n";
    • print "<P>P.S. I also learned Perl.</P>\n";
    • print "<P>Here is a scalar: <KBD>$x</KBD>.</P>\n";
    • print "</BODY></HTML>\n";
    • Use HTML::Template, Text::Template, the Template Toolkit, or Inline::Files
  • The Perils of Cut and Paste
    • my ($sec, $min, $hour,
    • $mday, $mon, $year,
    • $wday, $yday, $isdst)
    • = localtime(time);
    • # But now use only $mon and $mday...
  • The Perils of Cut and Paste
    • my ($mday, $mon)
    • = (localtime)[4,3];
    • # Oops: element 4 is the month and element 3 is the day of the month
  • The Perils of Cut and Paste
    • use Time::localtime;
    • my ($mday, $mon)
    • = (localtime->mday, localtime->mon);
  • Scope? What is This Thing You Call Scope?
    • my ($count, $ncount, @recs, @nrecs, %ccname, %ccphone, %ccaddr, %cccity);
    • my ($count2, $temp, @vbinfo, @pscan, $is_true);
    • my $temp2;
    • my $tempcount;
    • my $fudgeFactor;
    • my ($fname, $lname, $mi, $address1, $address2, $city, $state, $country, $c_code, $phone, $email, $email_valid);
    • my ($form1, $form2, $form3, $form4, $form4a, $form4b, $form4b_valid, @subtotals, $preTaxTotal, $postTaxTotal, $shipping, $TotalTotal);
    • my ($is_valid, $discount, $mealpref, $likes_pie, $whatisthisfor);
    • my ($PI, $PIE) = (3.14159265358979, "cherry");
    • $count2 = 3;
    • [...]
  • Scope Ignorance Cure
    • Move variable declaration to latest possible point
      • Exception: configuration settings
        • Put those in a separate file if appropriate
    • Use in-line declarations for loop variables:
      • foreach my $dog (@schnauzers)
      • while (my $imp = shift @demons)
    • You can carry this even further:
      • getopts('dq:v', \my %Opt);
    • Scope Ignorance is frequently combined with Monolithic Madness
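A small runnable sketch of these cures follows. The command line, the option meanings, and the variable names are invented for illustration. Each variable is declared at the point of first use, the loop variables are declared in-line, and `getopts` from the core Getopt::Std module fills a hash declared in the call itself.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical command line, set here so the sketch runs on its own.
@ARGV = ('-v', '-q', '3', 'fido', 'rex');

# %Opt is declared right where it is first needed.
getopts('dq:v', \my %Opt) or die "Usage: $0 [-d] [-q N] [-v] names...\n";

# Loop variable declared in-line; it does not exist outside the loop.
for my $dog (@ARGV) {
    my $greeting = $Opt{v} ? "Hello, $dog!" : $dog;
    print "$greeting\n";
}

# Same idea for while: $imp is scoped to the loop.
my @demons = qw(asmodeus belial);
while (my $imp = shift @demons) {
    print "Banishing $imp\n";
}
```

After `getopts` returns, `@ARGV` holds only the non-option arguments, so the `for` loop sees just the names.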
  • Monolithic Madness
    • (Visualize 2500 lines of code without the word 'sub')
    • Sufferers’ favorite language: JCL
    • “It started out at 30 lines… it just grew”
    • “I know where everything is”
      • Of course, no one else does
  • Monolithic Madness Cure
    • Look for variables with short scopes and evaluate the area for subroutine-ness
    • If you like Eclipse, try Devel::Refactor and the extract_subroutine method for the EPIC plug-in
    • use strict and turn it off over an ever-narrowing scope:
      • use strict;
      • [...]
      • {
      • no strict;
      • [...]
      • }
      • [...]
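As a sketch of the narrowing idea, the exemption below is confined to one subroutine and to one category, `'refs'`, rather than a bare `no strict`. The `legacy_lookup` name and the `$colour` variable are made up for illustration.

```perl
use strict;
use warnings;

our $colour = 'red';    # package variable that the legacy code reaches by name

sub legacy_lookup {
    my ($name) = @_;
    no strict 'refs';           # exemption ends at the closing brace of this sub
    return ${"main::$name"};    # symbolic reference, allowed only in here
}

print legacy_lookup('colour'), "\n";

# Outside that scope strict refs is still in force, so the same trick dies.
my $name  = 'main::colour';
my $value = eval { ${$name} };
print "Outside the exempt scope: $@" if $@;
```

Once the symbolic reference is replaced by a hash lookup, the `no strict 'refs'` line can go too.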
  • Perl from ???
    • $#abspaths = $num;
    • for ($i=0; $i<$num; $i++) {
    • my $newlen =
    • $ROOTLEN+1+length($paths[$i]);
    • $abspaths[$i] = ' ' x $newlen;
    • $abspaths[$i] = sprintf("%s/%s", $ROOT,
    • $dirs[$i]);
    • }
  • Perl from C
    • abspaths = realloc(abspaths,
    • num * sizeof(char*));
    • for (i=0; i<num; i++) {
    • int newlen =
    • ROOTLEN+1+strlen(paths[i]);
    • char temp[newlen];
    • abspaths[i] = malloc(newlen);
    • sprintf(abspaths[i], "%s/%s", ROOT,
    • dirs[i]);
    • }
  • Perl from C
    • @abspaths = map { "$ROOT/$_" } @paths;
  • Perl from ???
    • $file = "matrix.dat";
    • open (FH, ">$file");
    • for ($I = 1; $I <= 4; $I++) {
    • $value = $X[$I][$_],
    • write (FH) for 1..10;
    • }
    • format FH =
    • @<<<<<<<<<<<<<
    • $value
    • .
    • close (FH);
    • exit "Done";
  • Perl from FORTRAN
    • ofile = "matrix.dat"
    • OPEN (42, FILE=ofile)
    • DO 10 I = 1, 4
    • 10 WRITE (42,100) (X(I,J),J=1,10)
    • 100 FORMAT "(10F10.3)"
    • CLOSE (42)
    • STOP "Done"
  • Perl from FORTRAN
    • open my $fh, '>', $file or die $!;
    • for my $i (1 .. 4) {
    • printf {$fh} "%10.3f", $X[$i][$_]
    • for 1..10;
    • print {$fh} "\n";
    • }
  • Perl from ???
    • ######################
    • #Program name: Report.
    • ######################
    • sub decipher {
    • unpack $fmt, shift;
    • }
    • $fmt = "A7" . # base
    • "x4" . # filler
    • "A7"; # bonus
    • $rec = <>;
    • ($base, $bonus) = decipher($rec);
    • $salary = $base + $bonus;
    • printf "%7.2f\n", $salary;
    • exit;
  • Perl from COBOL
    • IDENTIFICATION DIVISION.
    • PROGRAM-ID. Report.
    • DATA DIVISION.
    • WORKING-STORAGE SECTION.
    • 01 salary PICTURE 99999V99
    • 01 rec
    • 02 base PICTURE 99999V99
    • 02 FILLER PICTURE X(4)
    • 02 bonus PICTURE 99999V99
    • PROCEDURE DIVISION.
    • READ rec
    • ADD bonus TO base GIVING salary
    • DISPLAY salary
    • STOP RUN.
  • Perl from COBOL
    • my $rec = <>;
    • my ($base, $bonus) = unpack "A7x4A7", $rec;
    • my $salary = $base + $bonus;
    • printf "%7.2f\n", $salary;
  • Perl from ???
    • #!/usr/bin/perl -l
    • print "Think of a number: "; $dummy = <>;
    • $I = 1;
    • AGAIN: print "Is it ",$I, "?";
    • $A = <>;
    • $X = substr($A,0,1);
    • if($X eq "Y" or $X eq "y") { goto DONE }
    • $I =$I + 1;
    • goto AGAIN;
    • DONE: exit;
    • # END
  • Perl from BASIC
    • 100 INPUT "Think of a number: ";D$
    • 110 LET I = 1
    • 120 PRINT "Is it ", I;
    • 130 INPUT "?"; A$
    • 140 LET X$ = LEFT$(A$,1)
    • 145 IF X$ = "Y" OR X$ = "y" THEN GOTO 149
    • 147 LET I = I + 1
    • 148 GOTO 120
    • 149 STOP
    • 150 END
  • Perl from BASIC
    • print "Think of a number\n";
    • my $ans = '';
    • for (my $guess = 1; $ans !~ /^y/i; $guess++) {
    • print "Is it $guess? ";
    • chomp($ans = <>);
    • }
  • Cargo Cult Perl
    • read(STDIN, $buffer, $ENV{CONTENT_LENGTH});
    • my @pairs = split(/&/, $buffer);
    • push(@pairs, map { split(/&/, $_) } $ENV{QUERY_STRING});
    • push(@pairs, map { split(/&/, $_) } @ARGV);
    • foreach my $pair (@pairs) {
    • my ($name, $value) = split(/=/, $pair);
    • $name =~ tr/+/ /;
    • $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    • $value =~ tr/+/ /;
    • $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    • [... You know the rest... ]
    • Stop the insanity!
  • Cargo Cult Perl
    • use CGI;
  • Common Code Smells
    • Non-O-O:
      • Wannabee Objects and Data Clumps*
      • Cut And Paste
    • O-O:
      • Mixed Abstraction Levels
      • Time Dependencies*
    • * Courtesy of “The Art of Agile Development”, Shore & Warden
  • Line Editing
    • Reduce bloat
      • Mothball code that coverage analysis indicates is not called
      • Shorten subroutines and main program to one screen’s length at most
      • Don’t exceed screen width
  • Consolidate Variables
    • my (%hits_by_client, %hits_by_method, %hits_by_ext,
    • %hits_by_protocol, %hits_by_uri);
    • $hits_by_client{$client}++;
    • $hits_by_method{$method}++;
    • $hits_by_ext{$extension}++;
    • $hits_by_protocol{$protocol}++;
    • $hits_by_uri{$uri}++;
  • Consolidate Variables
    • my %hits;
    • $hits{CLIENT}{$client}++;
    • $hits{METHOD}{$method}++;
    • $hits{EXTENSION}{$extension}++;
    • $hits{PROTOCOL}{$protocol}++;
    • $hits{URI}{$uri}++;
  • Consolidate Variables
    • my ($client, $method, $extension, $protocol, $uri) =
    • ($line =~ /^(\S+) - .../);
    • my %hits;
    • $hits{CLIENT}{$client}++;
    • $hits{METHOD}{$method}++;
    • $hits{EXTENSION}{$extension}++;
    • $hits{PROTOCOL}{$protocol}++;
    • $hits{URI}{$uri}++;
  • Consolidate Variables
    • my @KEYS = qw(CLIENT METHOD EXTENSION PROTOCOL URI);
    • my %access;
    • @access{@KEYS} = ($line =~ /^(\S+) - .../);
    • my %hits;
    • for my $key (@KEYS) {
    • $hits{$key}{ $access{$key} }++;
    • }
  • Line Editing
    • Remove rote stuff that the computer can figure out for you
      • Example:
      • $dbh->do("INSERT INTO perf (s, s1a, s1b, x1, x3, y1, y2, y3, zx, zx4, zx4a) VALUES ($s, $s1a, '$s1b', '$x1', $x2, $y1, $y2, $y3, $zx, '$zx4, '$zx4a')");
      • Arrgh
  • Line Editing
    • Try:
      • my $sth = $dbh->prepare("INSERT INTO perf (s, s1a, s1b, x1, x3, y1, y2, y3, zx, zx4, zx4a) VALUES (?,?,?,?,?,?,?,?,?,?,?,?)");
      • $sth->execute($s, $s1a, $s1b, $x1, $x3, $y1, $y3, $y2, $zx, $zx4, $zx4a);
  • Line Editing
    • Still too much work, too error-prone. Try:
      • $dbh->do(
      • make_insert(perf => keys %data),
      • undef, values %data);
      • sub make_insert
      • {
      • my ($table, @cols) = @_;
      • "INSERT INTO $table ("
      • . join(',' => @cols) . ") VALUES ("
      • . join(',' => ('?') x @cols) . ")";
      • }
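To see what the generated statement looks like without a database handle, here is a standalone sketch. The `%data` contents are invented for illustration, and the columns are sorted only to make the output repeatable. The table and column names are interpolated directly into the SQL, so they are assumed to come from trusted code rather than user input.

```perl
use strict;
use warnings;

# Build an INSERT statement with one placeholder per column.
sub make_insert {
    my ($table, @cols) = @_;
    return "INSERT INTO $table ("
         . join(',', @cols) . ") VALUES ("
         . join(',', ('?') x @cols) . ")";
}

# Hypothetical row; the keys double as column names.
my %data = (s1a => 2, s1b => 'abc', x1 => 'def');

my @cols = sort keys %data;          # fixed order for repeatable output
my $sql  = make_insert(perf => @cols);
my @bind = @data{@cols};             # hash slice keeps values in column order

print "$sql\n";
print "Bind values: @bind\n";
```

With a real handle, the pair would be passed along as `$dbh->do($sql, undef, @bind)`.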
  • Line Editing
    • This wheel has been invented several times, e.g.:
      • use DBIx::Recordset;
      • # ...
      • DBIx::Recordset->Insert( {
      • '!DataSource' => $dbh,
      • '!Table' => 'perf',
      • %data } );
  • Line Editing
    • Get rid of massive strings
    • Especially for HTML, use a templating system instead
      • HTML::Template works great, even for non-HTML
      • So do Text::Template and the Template Toolkit
  • use strict
    • Use it or get sand kicked in your face
    • Eliminate all errors in order to get the code to run
    • Eliminate unnecessary package variables
    • Declare lexical variables explicitly
    • Eliminate symbolic references
      • They’re hard to maintain anyway and just plain ugly
    • Turn strictness off with no strict
      • I’ve only ever needed no strict 'refs'
  • use warnings
    • Use it or get sand kicked in your face
      • Can use -w instead on older perls
      • On newer perls, use -W in testing to force warnings on across all modules
    • Leave warnings enabled in production
      • But if users might see the warnings, have them sent to you instead
      • Trap via $SIG{__WARN__} handler
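A minimal sketch of such a handler follows. The `@captured` array is a stand-in (an assumption of this example) for whatever destination suits your site, such as a log file or a mail queue. The handler receives the warning text as its first argument.

```perl
use strict;
use warnings;

my @captured;    # stand-in for a log file or mail queue

# Route every warning to our own store instead of STDERR.
$SIG{__WARN__} = sub {
    my ($message) = @_;
    push @captured, scalar(localtime) . " $message";
};

my %price;
my $total = 10 + $price{missing};    # triggers an "uninitialized value" warning

print "Captured ", scalar(@captured), " warning(s)\n";
print $_ for @captured;
```

The user sees nothing on STDERR, while the timestamped warning text remains available to the maintainer.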
  • Commonly Neglected Modules
    • Date::* - look for unnecessary calls to date or cal
    • DBI, DBD::* - look for unnecessary calls to database programs
    • LWP::* - look for unnecessary calls to lynx, wget or GET