Low maintenance perl notes


These are my notes from my talk "Low-Maintenance Perl"


3/3/12 No Title

Low-Maintenance Perl
"Optimizing for an Easy Life"

How many times have you been reading some blog and seen something like this?

"I would never use Perl for a large project."
"Perl is write-only."
"Perl is unmaintainable."

Most likely this was coming from someone who has an agenda and doesn't know much about Perl, but still, the idea gets out there. A typical response is that Perl code is no worse to maintain than any other code if you hire good programmers, and I believe this is true. However, this leads to an interesting question: what would these fabled good programmers do to make their Perl code more maintainable?

[ Slide: "What Would Good Programmers Do?" ]

Larry Wall famously said that Perl is intentionally flexible in its syntax in order to allow people to optimize for the things they care about -- performance, brevity (as in Perl golf), entertainment value, etc. Okay then, let's take a look at what we could do to optimize for maintainability. I want code that's easy to write, easy to read, and easy to debug. I won't pretend to have all the answers, but I have worked on some large, long-term projects written in Perl over the years, and I'll tell you what has worked for me.

Some Things I Shouldn't Have to Tell You

Let's get something out of the way up front. Perl provides some very good tools for avoiding common mistakes, and if you aren't using them, you really can't expect to have maintainable code. This should be second nature by now for anyone working on code with an expected lifespan of more than 15 minutes.

    use strict;
    use warnings;

There are no more excuses for not using these. I think of them as a directive to Perl meaning "I want this code to actually work." But I'm sure you already use them, and you don't need to listen to me going on about them. This is a no-brainer.

A somewhat less widely used tool for keeping your code healthy is perltidy.
[ Slide: before and after of tidied code ]

If you haven't heard of it, this is a code formatter for Perl that supports a very configurable style. It used to be that if you wanted consistent formatting for all the code in a project, you had to write up some formatting standards and then go around rapping people on the knuckles for not following them. Besides wasting a lot of time, this tended to cause a disproportionate amount of team friction. With perltidy, consistent formatting takes no effort at all. In fact, I save a lot of time by writing my code without bothering to format it and then tidying it when I pause to test it.

file://localhost/Users/perrinharkins/Conferences/low_maintenance_perl.html 1/10
Consistent formatting helps much more than you would imagine. It really removes some of the "alien" feeling of working on code written by someone else. And a big project means you will be working on other people's code.

Know Your Audience

Okay, with the preliminaries out of the way, let's get down to some juicier stuff. Consider this quote, from Andy Hunt of "Pragmatic Programmers" fame:

"Code is always read many more times than it is written."

Seems pretty obvious, right? So why do we spend so much time worrying about how fast we can write code, when we're going to spend more time trying to read, understand, and possibly debug it?

A programmer working on a large or long-lived piece of code is not so different from a newspaper writer working on an article. You want your writing to be elegant, but your primary goal is to convey information in a way that will be clear to your reader. Newspapers know something about their target audience, and they adjust the vocabulary and style of their writing to suit. Most US newspapers aim for around a 6th- to 8th-grade reading level.

Who is your target audience? I'll tell you who it's not. It's not you, fresh from a stroll through the perlfunc man page, with the full structure of your entire program clearly laid out in your mind. I like to think that my target audience is me, woken up by some emergency 2am phone call, still a little tipsy from the night before, trying to debug something over a flaky dial-up connection. Keeping this image in mind helps to curb daredevil coding.

Choosing a Dialect

Where am I going with this? Am I really suggesting that you should limit your Perl vocabulary in the name of code that's easier to maintain? I am. I'll admit that I have been called a crackpot for suggesting this, and you may agree by the time I'm done talking, but maybe it will still give you some things to chew on.
Limiting your vocabulary is a large part of optimization in a high-level language like Perl. If you want to optimize for speed, you might avoid certain file manipulation idioms, or make heavier use of the non-regex string testing functions. If you want to optimize for golf, you'd lean on the punctuation variables and default values. Optimizing for poetry would mean choosing functions whose names have the most resonance in English. And so on.

Optimizing Perl for easy maintenance is about making code harder to screw up and quicker to understand, and therefore quicker to debug. The philosophy of my Perl dialect is based on five principles:

Don't use something complex when something simple will work.

This one seems terribly obvious, doesn't it? Nevertheless, many Perl programmers, when confronted with a simple problem, reach for an extreme, whiz-bang solution. It may be neat to use AUTOLOAD to build a couple of accessor methods, but it's kind of like swatting a fly with a hand grenade.

Don't do things in a magical way when an explicit way will work.
Somewhat related to the last one, Perl provides many features that modify some aspect of the base language's behavior, or allow for sneaky code (like tied variables) where what appears to be happening is very different from what's really happening. This makes your code harder to understand. It also makes the error messages that show up when someone makes a mistake very confusing. They thought they were doing something simple, and suddenly they have this crazy error message about missing overload magic.

Perhaps more importantly, many of the magical features can lead to what's called "action at a distance" -- when code in one section of a program modifies how code in a different part of the program behaves, possibly by accident. To Perl's credit, many of the old magic-variable tricks that were the worst offenders in this area -- stuff like $[, which changes the index that arrays start at! -- are clearly discouraged now. Most of the features in programming languages for managing complexity have to do with isolating sections of code from each other, so that you don't have to keep the entire program in your head all the time just to write a line. This is a good thing, and we don't want to break it.

Don't make your code complex just so you can get a certain syntax.

Some Perl programmers really obsess over syntax. They want function prototypes so their sub calls can look like built-ins. (Guess what? They're not built-ins.) They spend lots of energy using overloading and abusing import and indirect object syntax to get certain effects.
For example (and I don't mean to pick on anyone here, but I need to show you something to make this point), the web-scraping module FEAR::API allows you to say this:

    fetch("search.cpan.org") > my @cont;

That's roughly equivalent to this:

    my $scraper = FEAR::API->fear();
    my $page = $scraper->fetch("search.cpan.org");
    push my @cont, $page->document->as_string;

Which one of these do you think is more likely to break when someone uses it in an unexpected way? Which one do you think will give clearer error messages if it breaks? To be fair to the author of FEAR::API, he was optimizing for something totally different from what I'm after: shorter code. Having less code is generally a good thing for maintainability, but not when you have to tweak Perl's syntax to get it, and not when it sacrifices readability.

Follow common conventions when you can.

There are few rules in Perl, but there are many things that people have come to expect after seeing them again and again in books, documentation, and other people's code. Every time you go against these, you're making the reader work harder. Don't do it when you don't have to.

Take regex syntax, for example. Damian Conway made some good arguments for writing all your regexes with alternative delimiters (s{foo}{bar}) in his book Perl Best Practices. However, most Perl programmers have learned to instinctively think "regex" when they see those forward slashes (s/foo/bar/). It saves them some think time if you follow the conventions.

Don't use an obscure language feature when a common one will work.
Perl has some dark corners. There are features that many people either don't know about at all or have just never used. There are even some things that seem to work, but you wonder if they were really intended to. This quote from a PerlMonks member is relevant here:

Dragonchild's Law
"If I have to ask if something is possible on PerlMonks, I probably should rethink my design."

Why avoid obscure features? Why not explore every nook and cranny of Perl?

Obscure features are more likely to be used incorrectly. Life would be pretty dull if you never tried anything new, but maybe the critical path of your shipping cost calculator is not the ideal place to break out.

The docs may not be as well reviewed. If fewer people use the feature, fewer people feel qualified to review the docs. They may not be as good as the ones for commonly used features.

Peers may not have enough experience with them to give good advice. Your questions on mailing lists may be answered with resounding silence, since fewer people actually know how to use the feature you're asking about.

Obscure features are not as widely tested. Perl has a great test suite, but the most commonly used features get more real-world testing, by definition.

Obscure features are more likely to change in future versions. I know, Perl has great backwards compatibility. Still, some things just don't make the cut. Remember pseudo-hashes? If you built a whole bunch of code based on them, I'll bet you do. Or what about this evil static variable trick:

    my $foo = 1 if $bar;   # like my $foo = 1 if 0

That now gives a warning in Perl 5.10, which is great, because I've seen it cause some really hard-to-find bugs when people did it by accident. If you used this sneaky way of getting a static variable instead of one of the more obvious ones, you're going to be in trouble when Perl stops supporting this behavior (which is the plan).

An Example Dialect

Enough talk.
You probably want to see where your pet feature falls in my low-maintenance dialect of Perl, so let's get on with it.

Never

[ Slide: Toaster oven, "Don't touch it! It's evil!" ]
These are the features that I just plain avoid.

Formats
Nobody remembers how to use these. Just use sprintf or a templating module.

Punctuation variables
As mentioned before, these are major offenders in the action-at-a-distance area. They can change a program's behavior globally. Having said that, there are some you just can't avoid. If you want to read files efficiently, you may need the common slurp idiom:

    my $text = do { local $/ = undef; <$fh>; };

Just don't forget that local! At one place I worked, someone forgot to localize a change to $/ and ended up killing all the order processing on a major e-commerce site for a few hours. When you see these punctuation variables in code, you should be on yellow alert.

import() functions that do something other than import
I think maybe Test::More led a lot of people down the primrose path with this one. It looks really inviting to stick some kind of configuration directives in there. Most people won't understand what's going on, though, and may accidentally break it, or be confused about alternative ways to call it. As an example, Catalyst has code in its synopsis that looks like this:

    use Catalyst qw/-Debug/;

What do you think happens if you make that an empty list, maybe to save some memory by not importing anything?

    use Catalyst ();

Your code blows up, because Catalyst was using that hook to add itself to the @ISA of your class. A neat trick, but totally unnecessary. If you just inherit from Catalyst in the normal way, it works fine.

Prototypes
What can I say that hasn't already been said? Lots of potential problems, all for some syntactic sugar. The Error module is a good example of the trouble prototypes can cause. The try/catch syntax that it supports has some very confusing behaviors (which are too long to go into here, but can be found by Googling) due to the use of code ref prototypes.
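The slurp idiom from the Punctuation variables entry is easy to get subtly wrong, so here is a minimal sketch of it in context. The helper name slurp_file and the temporary file are purely illustrative; File::Temp ships with Perl. The point is the local: it confines the change to $/ to the do-block, so the rest of the program still reads line by line afterwards.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string.
# "local" limits the change to $/ (the input record separator) to the
# do-block; when the block exits, $/ is restored to its previous value.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $text = do { local $/ = undef; <$fh> };
    close $fh;
    return $text;
}

# Usage: write a small temporary file, then read it back in one go.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh;

my $contents = slurp_file($tmp_path);
print "Read ", length($contents), " characters\n";
print "Record separator restored\n" if $/ eq "\n";
```

Drop the local and the assignment to $/ leaks out of the helper, so every later readline in the program silently returns whole files instead of lines: exactly the kind of action at a distance described above.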
The Error module also used to be a common cause of memory leaks because of its code ref prototypes, but that has been fixed in more recent versions of Perl.

Indirect object syntax
It can trip you up in various ways. Just get over it. It's Class->new(), not new Class.

UNIVERSAL::
Using the UNIVERSAL:: namespace is a pretty blatant violation of encapsulation. Shoving methods into other people's classes is not nice, and seeing other code call methods on your own class that you know aren't there will blow your mind when you're trying to debug something.

Run-time @ISA manipulation
Sometimes people decide that they don't like the way Perl does inheritance, and that the best solution is to mess with @ISA or otherwise change how inheritance works to suit them. I'd agree that other inheritance schemes might be better, but the extra risk is just not worth it to me.

Re-blessing existing objects

    bless $object, 'Some::Other::Class';

It's just asking for trouble, not to mention breaking encapsulation.

Objects that are not hashes
I know people will disagree with me on this one, but hashes are how objects are done in Perl. Using something else is going to make things more confusing for everyone who has to work on the code, and is likely to break some of their favorite tools and idioms. I do understand why some people like inside-out objects, though, and I think that Jerry Hedden's Object::InsideOut is your best option if you want them.

Overloading
This is another feature that obscures what's going on, and when it bites you, it bites hard. Take a look at this example code using Exception::Class, my favorite exception module:

    use Exception::Class qw(MyProject::InsufficientKarma);

    # in some method, the exception is triggered
    MyProject::InsufficientKarma->throw();

    # in the caller, we catch it with eval and then check the type
    if ($@ and $@->isa('MyProject::InsufficientKarma')) { ... }

Looks reasonable, right? It doesn't work. It doesn't work because Exception::Class overloads stringification, and since that throw() call didn't pass in a message, the test is checking the truth of an empty string. That's an hour of my life that I'll never get back.
You could argue (correctly) that I should have read the docs more closely, but who expects an exception to evaluate to false?

Multiple packages in one file
Have another file. They're free. And then I won't have to grep all the code looking for where on earth that package is hidden.

Source filters
I'm sure you've heard the warnings. They're easy to break, and can play havoc with your code.
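The Exception::Class story under Overloading can be reproduced without the module. The sketch below uses a hypothetical class, My::Hypothetical::Error, that overloads stringification the same way the talk describes; it is a stand-in, not Exception::Class itself, and newer releases of that module may behave differently, so check its documentation. The two packages share one file only so the sketch is self-contained. Scalar::Util ships with Perl.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical exception class that stringifies to its message.
# With fallback => 1, a boolean test on the object falls back to the
# overloaded "" conversion, so an empty message makes the object false.
package My::Hypothetical::Error;
use overload '""' => sub { $_[0]{message} }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless { message => $args{message} // '' }, $class;
}
sub throw { my $class = shift; die $class->new(@_) }

package main;

# Throw without a message, as in the example above.
eval { My::Hypothetical::Error->throw() };
my $err = $@;

# Naive check: the object stringifies to '', which is false.
my $naive = ($err and $err->isa('My::Hypothetical::Error')) ? 1 : 0;

# Asking "is this a blessed reference?" never consults the overloading.
my $robust = (blessed($err) and $err->isa('My::Hypothetical::Error')) ? 1 : 0;

print "naive check: $naive, robust check: $robust\n";
```

The blessed() check is one way to defend the caller; the other is for the exception class to overload bool explicitly so that an exception object is always true.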
use constant
It causes too many problems with string interpolation, especially in places where the stringification is less obvious, like with the fat comma operator. I just use normal package variables instead.

    # bad
    use constant TEA => 'Darjeeling';
    %beverages = (TEA => 1);     # the key is the string "TEA", not "Darjeeling"

    # good
    our $TEA = 'Darjeeling';
    %beverages = ($TEA => 1);

Rarely

These are things I use very rarely, when I'm out of other options.

DESTROY methods
The problem with DESTROY methods is that scoping in Perl is much more confusing than you realize, and if you put something important in a DESTROY it may not happen when you expect it to. It's not just a simple matter of lexical variables vs. package variables -- there are some strange corner cases with returns from blocks and the like that are really counter-intuitive. Tim Bunce showed us some doozies on the Class::DBI mailing list.

The module Apache::Session is a poster child for relying too much on DESTROY. It has a tied hash for an interface and uses the DESTROY method to save all changes and release locks. Sounds pretty simple, but Perl mailing lists are full of anguished souls asking why their data is not getting written to the session. An explicit method to commit changes would be a lot less error-prone.

Weak refs
There are some problems that are really hard to solve without them, but once you decide to use them you have a whole new set of things to worry about, like how to handle refs that point to objects that are gone. This is not a feature to use lightly.

AUTOLOAD
I was a huge fan of AUTOLOAD when I first found it. It seemed like the hammer for all of my nails. Some things really are a lot easier with AUTOLOAD, but don't be too quick on the draw. Writing an AUTOLOAD that handles failures well can be a challenge, and the invisibility of the magic AUTOLOAD methods will break things like can().
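To see the can() problem concretely, here is a minimal sketch with two hypothetical classes, Book::Autoloaded and Book::Explicit, that expose the same title accessor. Both answer the method call, but only the one with a real sub admits to it when asked through can(). The class names and fields are illustrative, and the two packages share one file only to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Hypothetical class whose accessors exist only via AUTOLOAD.
package Book::Autoloaded;
our $AUTOLOAD;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';    # object destruction is not an accessor call
    die "No such method: $name\n" unless ref $self && exists $self->{$name};
    return $self->{$name};
}

# The same class with plain, explicitly written accessors.
package Book::Explicit;

sub new    { my ($class, %args) = @_; return bless { %args }, $class }
sub title  { return $_[0]{title} }
sub author { return $_[0]{author} }

package main;

my $auto     = Book::Autoloaded->new(title => 'Perl Best Practices');
my $explicit = Book::Explicit->new(title => 'Perl Best Practices');

# Both objects answer title() ...
print $auto->title, "\n";
print $explicit->title, "\n";

# ... but can() only sees real subs, not the ones AUTOLOAD fakes.
printf "Autoloaded can('title'): %s\n", $auto->can('title')     ? 'yes' : 'no';
printf "Explicit can('title'):   %s\n", $explicit->can('title') ? 'yes' : 'no';
```

Note how much of the AUTOLOAD version is plumbing (name parsing, the DESTROY special case, error handling) compared with two one-line subs.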
As chromatic has pointed out before, you can write more code to make can() work again, but I consider that the programming equivalent of throwing good money after bad. If you don't break it, you don't have to fix it.

Tied variables
tie is another feature that sounds great, until you try to use it for something important. Then you discover all the caveats and limitations, and you get into confusing questions about whether you should be passing around references to the tied variable or the underlying object. The nail in the coffin is that it's actually slower than just calling methods directly. I think the fundamental problem with tie is that it's a feature that is intentionally misleading. It obscures what's really happening, and that's exactly what we're trying to get away from.

Sometimes

These are things I use sometimes, but try hard not to overuse, since I consider them accident-prone.

Closures
Sorry, but only MJD really knows how to use them right. Most of the little things I see people use closure variables for could just be done with package variables, and they'd be more obvious to most readers.

Sub refs
Not to pick on you Higher-Order Perl fans. I know you love them, but don't overuse them. I usually use them when I have a variation in behavior that seems too small to make a set of classes for. If they start to become the focal point of the code, or I need several of them to make something work, I switch to an OO design.

String eval
Sometimes you have to use it, but it's another yellow alert. Not something to do lightly.

Subroutine attributes
I use these a bit, in things like Test::Class, and they seem pretty reliable now. I just try to avoid situations where there's anything more than very simple data in them. In most cases, a little configuration hash would work just as well.

Exported subs
Those short subroutine calls look nice, but they just don't play well with OO code. I've had to go back and change things to class methods enough times that now I just start them out that way.

Chained map/grep
These are too useful to simply avoid, but those big chains of them that you have to read backwards are a nightmare.

wantarray
I know, people love their contextual returns. I prefer consistent return values. First, because wantarray makes testing your code a pain -- you have to test things in each context! Second, it can cause lots of hard-to-spot bugs like this Class::DBI one:

    @books = Book->search(author => $author) || die "book not found";

Class::DBI returns an iterator object in scalar context, and the || forces scalar context, breaking this code. It works if you use or instead.
I think most experienced coders know the contextual returns of the built-ins very well, but that's no reason to make your own code harder to use.
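The Class::DBI bug from the wantarray entry can be reproduced with a tiny stand-in. The sketch below uses a hypothetical search_books() that returns a list of matches in list context and, to keep it simple, a count in scalar context; Class::DBI returns an iterator object there instead, but the trap is the same. The book data is made up.

```perl
use strict;
use warnings;

# Hypothetical data and a context-sensitive search function.
my @BOOKS = (
    { title => 'First Book',  author => 'Wall' },
    { title => 'Second Book', author => 'Wall' },
    { title => 'Third Book',  author => 'Conway' },
);

sub search_books {
    my ($author) = @_;
    my @found = grep { $_->{author} eq $author } @BOOKS;
    return wantarray ? @found : scalar @found;   # list of books, or a count
}

# Buggy: the left operand of || is evaluated in scalar context, so the
# array ends up holding one element (the count 2) instead of two books.
my @by_wall = search_books('Wall') || die "book not found";

# Fixed: "or" binds more loosely than "=", so the list assignment runs
# first, and die only fires if it assigned no elements.
my @by_wall_fixed = search_books('Wall') or die "book not found";

printf "with ||: %d element(s); with or: %d element(s)\n",
    scalar @by_wall, scalar @by_wall_fixed;
```

A consistent alternative is to always return a list and let the caller write scalar(@results) when a count is wanted, so there is only one behavior to test.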
Ternary operator
It's just an uglier, harder-to-read version of if. I'll only use it when the test and results are very short. And that chained stuff makes my eyes bleed.

$_
Last, but not least. Sure, it's neat that Perl has pronouns, and it's unavoidable if you want to use map, but it's easy to break (by doing something that modifies $_ as a side effect), and almost always harder to read than a real variable name.

Some Anticipated Questions

Let's answer a few questions that you might have about my approach.

Doesn't this take all the fun out of programming?
No, it doesn't. Next question.

Okay, it does possibly shift the level you work at a bit, from being clever on individual lines to being clever in the larger scheme. There's a saying about writing that I like. It goes like this:

When you write a poem, you work at the word level. When you write a short story, you work at the sentence level. When you write a novel, you work at the paragraph level.

A large coding project is a novel. The higher-level view in programming is that of interface design, of data structures, of groups of classes, and finally of entire systems. If you keep the low-level stuff simple, you have more time to think about these things.

Won't this make your code longer?
Probably. Although writing less code is usually a good thing, there is a balance. Generally sound advice like "Don't Repeat Yourself" can sometimes lead people to obsessively shorten their code with techniques that make it more complex. Do you really need to run regexes on your POD to parse out the names of the arguments your script takes, just to avoid listing them again in your code? Probably not.

But AUTOLOAD is awesome!
Agreed. AUTOLOAD is awesome. Every time I drink a 40, I pour a little on the ground for my old friend AUTOLOAD. But I still don't use it.

Why don't you just use Java?
Who let that guy in here?
I have used Java, and I mostly like it, but I use Perl when I can for probably the same reason other people do: it lets me get things done faster. But it's not enough to get things done -- you need them to stay done, by writing them in a way that you can maintain when other things change around them.
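Going back to the $_ entry in the Sometimes list, here is a minimal sketch of the side-effect breakage it warns about. The two helpers are hypothetical; the behavior they show is standard Perl: inside a foreach, $_ is an alias to each array element, and while (<$fh>) assigns to the global $_ without localizing it, so a helper that reads with it silently overwrites the caller's data.

```perl
use strict;
use warnings;

# Hypothetical helper that counts lines using the implicit $_.
# "while (<$fh>)" assigns each line to the global $_ and leaves it
# undefined at end of file, without localizing it first.
sub count_lines_implicit {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while (<$fh>) { $count++ }
    close $fh;
    return $count;
}

# The same helper with a named lexical variable instead of $_.
sub count_lines_explicit {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while (my $line = <$fh>) { $count++ }
    close $fh;
    return $count;
}

# In each loop, $_ is aliased to the current array element, so the
# implicit helper's reads overwrite the elements themselves.
my @clobbered = ('alice', 'bob');
count_lines_implicit("one\ntwo\n") for @clobbered;

my @intact = ('alice', 'bob');
count_lines_explicit("one\ntwo\n") for @intact;

print "clobbered: ", (defined $clobbered[0] ? $clobbered[0] : 'undef'), "\n";
print "intact:    $intact[0]\n";
```

Nothing in the calling loop hints that the names are at risk; the damage happens two calls away, which is why a named variable is the safer default.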
Beyond the Code

I think we're about out of time, but let me just briefly mention a few huge topics, in order to get credit for having thought about them.

Configuration Management
You need configuration management. You need a predictable environment for your code to run in. You need to control your Perl version, web server, database version, CPAN module versions, and probably more. Without that, you'll spend all your time debugging strange compile problems that you can't reproduce on your own system. If you are installing all your dependencies by just grabbing the latest with the CPAN shell, this means you.

Revision Control with Branches
It's not enough to just have source code control. If you aren't using multiple branches, how can you work on a new feature that will take two months while also fixing bugs on your released version? Not in any good way.

Perl::Critic
This is a really interesting module that could form the basis for evaluating how close some source code is to your own local dialect. I haven't tried doing it yet, but it's on my list.

Tests Can Save Your Life
Over the past few years, it's become clear to me that large-scale coding is essentially impossible without automated tests.

Test::Class Can Save Your Tests
Your test code is important, and if you're doing it right, there will be lots of it. Don't let it turn into a mess. Test::Class allows you to write test code with better organization and code reuse.

Smolder
One of my co-workers, Michael Peters, wrote this great smoke-testing server that you can use. It makes pretty graphs and has some slick AJAX features that make it more fun than your smoke tester.

http://sourceforge.net/projects/smolder/
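As a starting point for "Tests Can Save Your Life", here is a minimal automated test using Test::More, which ships with Perl. The shipping_cost() function and its rates are hypothetical, a nod to the shipping cost calculator mentioned earlier; in a real project the function would live in a module and the test in its own .t file.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test: a flat fee plus a per-kilogram rate.
# The numbers are illustrative only.
sub shipping_cost {
    my ($weight_kg) = @_;
    die "weight must be non-negative\n" if $weight_kg < 0;
    return 5 + 2 * $weight_kg;
}

is( shipping_cost(0),   5,  'flat fee for an empty parcel' );
is( shipping_cost(2.5), 10, 'fee plus per-kilogram rate' );

# Error handling is part of the contract too, so test it explicitly.
eval { shipping_cost(-1) };
like( $@, qr/non-negative/, 'negative weight is rejected' );

done_testing();
```

Once a project has dozens of files like this, that is the point where organizing them with Test::Class, as suggested above, starts to pay off.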