
Perl web programming



A Beginner's Introduction to Perl Web Programming

By chromatic
September 5, 2008 | Comments: 17

So far, this series has talked about Perl as a language for mangling numbers, strings, and files -- the original purpose of the language. The previous installments covered flow control, math and string operations, and files. They were A Beginner's Introduction to Perl 5.10, A Beginner's Introduction to Files and Strings with Perl 5.10, and A Beginner's Introduction to Perl Regular Expressions. Now it's time to talk about what Perl does on the Web. This installment discusses CGI programming with Perl.

What is CGI?

The Web uses a client-server model: your browser (the client) makes requests of a Web server. Most of these are simple requests for documents or images, which the server delivers to the browser for display.

Sometimes you want the server to do more than just dump the contents of a file. You'd like to do something with a server-side program. That "something" might be reading and sending e-mail, looking up a phone number in a database, or ordering a copy of Perl Best Practices for your favorite techie. This means the browser must be able to send information (an e-mail address, a name to look up, shipping information for a book) to the server. The server, in turn, must be able to use that information and return the results to the user.

The standard for communication between a user's Web browser and a server-side program running on the Web server is called CGI, or Common Gateway Interface. All popular web server software supports it.

To get the most out of this article, you will need a server that supports CGI. This may be a server running on your desktop machine or an account with your ISP (though probably not a free Web-page service). If you don't know whether you have CGI capabilities, ask your ISP or a local sysadmin how to set things up.

Notice that I haven't described how CGI works; that's because you don't need to know. The standard Perl module CGI handles the protocol for you. This module was part of the core Perl distribution when this article was written. It was removed from the core in Perl 5.22, so on newer Perls you may need to install it from CPAN. Telling your CGI program that you want to use the CGI module is as simple as:

    use CGI;

CGI versus Everything Else

You may have heard that "CGI is slow" or "Perl is slow" for web programming. (A similar assertion is "Perl doesn't scale".) Strictly speaking, CGI only describes how server-side languages send and receive information to and from clients. What people usually mean is that the execution model associated with standalone CGI programs can be slow. Traditionally, a web server launches a new process to handle each CGI request. This often means loading Perl and recompiling the program for each incoming request.
Each launch may take only a fraction of a second. With hundreds of thousands of requests a day (or hundreds of requests within a few minutes), however, the overhead of launching new processes becomes significant. Other execution models exist. You can embed Perl in the web server (mod_perl), or run your Perl program as a persistent application and talk to it through another protocol (FastCGI).

CGI programming is still worth learning for two reasons. First, understanding the web's model of client-server programming, and the way Perl fits into that model, is important to all models of web programming with Perl. Second, persistence or acceleration models can be more complex in some ways. It's likely that your first few server-side Perl programs won't need the advanced features of the other execution models.

A Real CGI Program

It's time to write your first real CGI program. Instead of doing something complex, how about something that will simply echo back whatever you throw at it? Call this program backatcha.cgi:

    #!/usr/bin/perl -T

    use 5.010;
    use CGI;

    use strict;
    use warnings;

    my $q = CGI->new();
    say $q->header(), $q->start_html();

    say "<h1>Parameters</h1>";

    for my $param ($q->param()) {
        my $safe_param = $q->escapeHTML($param);

        say "<p><strong>$safe_param</strong>: ";

        for my $value ($q->param($param)) {
            say $q->escapeHTML($value);
        }

        say '</p>';
    }

    say $q->end_html();

Some of this syntax may look new to you, in particular the arrow operator (->). Here it represents a method call on an object. Object-oriented programming can be a deep subject, but using objects and methods is relatively simple.

An object is a self-contained bundle of data and behavior. In this example, CGI->new() returns the object and $q holds it. Think of it like a black box, or a little chunk of a program. You communicate with that object by sending it messages with the -> operator. Messages work a lot like functions: they have names, they can take arguments, and they can return values. In fact, their definitions look almost identical to Perl functions. They have two subtle differences, which is why they have a different name: methods. Calling a method and sending a message are basically the same thing. Thus:

    $q->header()

... sends the header() message to the CGI object in $q. The object performs some behavior and returns a string, in this case a valid HTTP header per the CGI protocol. Later in the program, the $q->param() and $q->param( $param ) messages appear. By now, you should be able to guess what they return, even if you don't know how they work or why.

If you've paid close attention, you may have noticed that CGI->new() follows the same form. In this case, it calls the new() method on something referred to by CGI, which returns a CGI object. This explanation is deliberately vague, because there's a little more to it than that. For now, all you need to know is that you can send $q any message named as a method in the CGI documentation.

If you've never used HTML, the <strong> and </strong> tags mean "begin strong emphasis" and "end strong emphasis", respectively. (A good paper reference to HTML is O'Reilly's HTML & XHTML: The Definitive Guide. Online, I like the Web Design Group.)

One method you may not have seen in other tutorials is escapeHTML().
There are a lot of subtleties to why this is necessary. For now it's enough to say that displaying anything from a client directly on the screen, without escaping, validation, or other scrubbing, is a very real security hole in your application. Start now by treating all incoming data as something that needs careful thought and analysis, and you will prevent many unpleasant surprises later.

Install this program on your server and do a test run. Here's where the real test starts; understanding how to set up a CGI program on your server can be frustrating. Here's a short list of the requirements:

• Place the program where your Web server will recognize it as a CGI program. This may be a special cgi-bin directory. Alternately (or even additionally), make sure the program's filename ends in .pl or .cgi. If you don't know where to place the program, your ISP or sysadmin should.
• Make sure the web server can run the program. On a Unix system, you may have to give the Web server user read and execute permission for the program. It's easiest to give these permissions to everybody by using chmod 755 filename.
• Make a note of the program's URL, which will probably be something like http://server name/cgi-bin/backatcha.cgi, and go to that URL in your browser. (Take a guess what you should do if you don't know the program's URL. Hint: It involves the words "ask," "your" and "ISP.")

If this works, your browser will show only the word "Parameters". Don't worry, this is what is supposed to happen. The backatcha.cgi program throws back what you throw at it, and you haven't thrown anything at it yet. It'll show more in a moment.

If it didn't work, you probably saw either an error message or the source code of the program. These problems are common, and you need to learn how to solve them.

Uh-Oh!

If you saw an error message, your Web server had a problem running the CGI program. The cause may be the program itself or its file permissions.

First, are you sure the program has the correct file permissions? Did you set them to 755? If not, do it now. (Windows Web servers will have a different way of doing this.) Try it again. If you now see the page with only the word "Parameters", you're good.

Second, are you sure the program actually works? (Don't worry, it happens to the best of us.) Change the use CGI line in the program to read:

    use CGI -debug;

Now run the program from the command line. Because its #! line uses -T, run it as perl -T backatcha.cgi; otherwise Perl refuses to run it. You should see:

    (offline mode: enter name=value pairs on standard input)

This message indicates that you're testing the program. Press Ctrl-D to tell the program to continue running without any form items. If Perl reports any errors in the program, you can fix them now. (The -debug option is incredibly useful. Use it whenever you have problems with a CGI program. Ignore it at your peril.)

The other common problem is seeing the source code of your program instead of the result of running it. There are two simple problems that can cause this.

First, are you sure you're going through your Web server? Suppose you use your browser's "load local file" option on something like /etc/httpd/cgi-bin/backatcha.cgi, instead of requesting something like http://localhost/cgi-bin/backatcha.cgi. Then you aren't even touching the Web server! Your browser is doing what you "wanted": loading the contents of a local file and displaying them.

Second, are you sure the Web server knows it's a CGI program? Most web servers have a special way of designating a file as a CGI program. It may be a special cgi-bin directory, the .cgi or .pl extension on a file, or something else. Unless you meet these expectations, the Web server will treat the program as a text file and serve up its source code as plain text. Ask your ISP for help.

CGI programs are unruly beasts at the best of times; don't worry if it takes a bit of work to make them run properly.

If you're still having problems with errors, consult your server's error log. On Unix-like systems with Apache httpd, look for a file called error_log.

If you don't have access to this file (or can't find it), add one more line to the start of your program:

    use CGI::Carp 'fatalsToBrowser';

This module, part of the same distribution as CGI, sends fatal error messages to the client as well as to the error log. That way they appear in your web browser, where you can read them. As you might expect, this is suboptimal behavior for a serious, public-facing application. It's fine for debugging -- just be sure to remove it when your application goes live.

Making the Form Talk Back

At this point, you should have a working copy of backatcha.cgi spitting out nearly blank pages. Want it to tell you something? Save this HTML code to a file:

    <form action="putyourURLhere" method="GET">
        <p>What is your favorite color? <input name="favcolor" /></p>
        <input type="submit" value="Send form" />
    </form>

Be sure to replace putyourURLhere with the actual URL of your copy of backatcha.cgi!

This is a simple form. It shows a text box where you can enter your favorite color, and a "submit" button that sends your information to the server. Load this form in your browser and submit a favorite color. You should see this returned from the server:

    favcolor: green

CGI Methods

As mentioned earlier, the CGI module provides several methods to CGI objects. What are these methods?

The first one, header(), produces the HTTP headers the program must send before it can display HTML output. Try taking this line out; you'll get an error from the Web server when you try to run the program. This is another common source of bugs!

The start_html() method is there for convenience. It returns a simple HTML header for you. You can pass parameters to it as a hash, like this:

    print $q->start_html( -title => "My document" );

(The end_html() method is similar, but outputs the footers for your page.)

Finally, the most important CGI method is param(). Call it with the name of a form item, and you'll get a list of all the values of that form item. If you ask for a scalar, you'll get only the first value, no matter how many there are in the list. (Newer versions of CGI.pm warn when you call param() with a name in list context. They offer multi_param() for fetching every value on purpose.)

    my $name = $q->escapeHTML( scalar $q->param('firstname') );
    say "<p>Hi, $name!</p>";
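
That list-or-single-value behavior comes from Perl's notion of context. The following minimal pure-Perl sketch shows a function that behaves the same way, using the built-in wantarray. The %form hash and param_like() are hypothetical stand-ins for illustration; this is not how CGI.pm actually implements param().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical form data: each form item name maps to its submitted values.
my %form = (
    firstname => ['Ada'],
    top       => [ 'pepperoni', 'mushrooms', 'ham' ],
);

# param_like() only illustrates the idea; it is not CGI.pm's implementation.
# wantarray is true in list context and false in scalar context.
sub param_like {
    my ($name) = @_;
    my @values = @{ $form{$name} || [] };
    return wantarray ? @values : $values[0];
}

my @toppings = param_like('top');    # list context: all three values
my $first    = param_like('top');    # scalar context: just the first value

say scalar @toppings;                # prints 3
say $first;                          # prints pepperoni
say join ', ', param_like('top');    # prints pepperoni, mushrooms, ham
```

The same call returns different things depending on what the caller asks for. That is why assigning to a scalar gives you only the first value.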

If you call param() without giving it the name of a form item, it returns a list of all the form items that are available. That form of param() drives the outer loop of the backatcha program, and the named form drives the inner loop:

    for my $param ($q->param()) {            # every form item name
        for my $value ($q->param($param)) {  # every value of that item
            say $q->escapeHTML($value);
        }
    }

Remember, a single form item can have more than one value. You might encounter code like this on the Web site of a pizza place that takes orders over the Web:

    <p>Pick your toppings!<br />
        <input type="checkbox" name="top" value="pepperoni" /> Pepperoni <br />
        <input type="checkbox" name="top" value="mushrooms" /> Mushrooms <br />
        <input type="checkbox" name="top" value="ham" /> Ham <br />
    </p>

Someone who wants all three toppings would submit a form where the form item top has three values: pepperoni, mushrooms, and ham. The server-side code might include:

    say "<p>You asked for the following pizza toppings: ";

    for my $top ($q->param( 'top' )) {
        say $q->escapeHTML($top), '. ';
    }

    say "</p>";

Here's something to watch out for. Take another look at the pizza-topping HTML code. Paste that little fragment into the backatcha form, just above the <input type="submit"...> tag. Enter a favorite color, and check all three toppings. You'll see this:

    favcolor: burnt sienna
    top: pepperoni mushrooms ham

Why did this happen? When you call $q->param($param), you get back a list of all the values for that form item. (Why? Because the call is in list context: the for loop in backatcha.cgi wants a list.) You could call this a bug in backatcha.cgi, but it's easy to fix by using join() to separate the item values:

    say "<p><strong>$safe_param</strong>: ",
        join(', ', map { $q->escapeHTML( $_ ) } $q->param($param)),
        "</p>";

... or call $q->param() in scalar context first to get only the first value:

    my $value = $q->param($param);
    say "$safe_param: ", $q->escapeHTML($value);

Always keep in mind that form items can have more than one value!

Okay, I lied about the list form being easy. Your eyes may have crossed as you wondered what exactly that map block does, and why I made you read it. This is actually a great time to discuss a very clever and useful part of Perl.

Remember that this code exists to handle a list of values. I explained earlier that the param() method returns a list of values when you want a list, and a single value when you want a single value. This notion of context is pervasive in Perl. It may sound strange, but think of it linguistically, in terms of noun-verb number agreement. It's obvious what's wrong with this sentence: "Perl are a nice language!" The subject, Perl, is singular, so the verb, to be, should also be singular. Getting to know Perl and its contexts means understanding which contexts are list contexts (plural) and which are scalar contexts (singular).

What about that map, though? Think of it as a device for transforming one list into another, sort of a pipeline. You can drop it in anywhere you have a list to transform. It's equivalent in behavior to:

    my @params = $q->param( $param );
    my @escaped_params;

    for my $p (@params) {
        push @escaped_params, $q->escapeHTML( $p );
    }

    say "<p><strong>$safe_param</strong>: ", join(', ', @escaped_params), "</p>";

... but it's significantly shorter. You can safely ignore the details of how it works for a few minutes.

Your Second Program

Now you know how to build a CGI program, thanks to a simple example. How about something useful? The previous article showed how to build a pretty good HTTP log analyzer. Why not Web-enable it? Then you can look at your usage figures from anywhere you can get to a browser.

Before starting on the revisions, decide what to do with the analyzer.
First, instead of showing all of the reports at once, show only those the user selects. Second, let the user choose whether each report shows the entire list of items, or the top 10, 20 or 50 sorted by access count.

The user interface can be a simple form:

    <form action="/cgi-bin/http-report.pl" method="post">
        <p>Select the reports you want to see:</p>

        <p><input type="checkbox" name="report" value="url" />URLs requested<br />
        <input type="checkbox" name="report" value="status" />Status codes<br />
        <input type="checkbox" name="report" value="hour" />Requests by hour<br />
        <input type="checkbox" name="report" value="type" />File types</p>

        <p><select name="number">
            <option value="ALL">Show all</option>
            <option value="10">Show top 10</option>
            <option value="20">Show top 20</option>
            <option value="50">Show top 50</option>
        </select></p>

        <input type="submit" value="Show report" />
    </form>

(Remember that you may need to change the URL!)

This HTML page contains two different types of form item. One is a series of checkbox widgets, which set values for the form item report. The other is a single drop-down list, which assigns one value to number: ALL, 10, 20 or 50.

Take a look at the original HTTP log analyzer. Start with two simple changes. First, the original program gets the filename of the usage log from a command-line argument:

    # We will use a command line argument to determine the log filename.
    my $logfile = shift;

This obviously can't work, because the Web server won't let anyone enter a command line for a CGI program! Instead, hard-code the value of $logfile. I've used /var/log/httpd/access_log as a sample value.

    my $logfile = '/var/log/httpd/access_log';

Second, make sure that you output all the necessary headers to the web server before printing anything else:

    my $q = CGI->new();

    say $q->header();
    say $q->start_html( -title => "HTTP Log report" );

Now look at the report() sub from the original program. Relative to the new goals, it has one problem: it outputs all the reports instead of only the selected ones. It's time to rewrite report() so that it cycles through all the values of the report form item and shows the appropriate report for each.

    sub report {
        my $q = shift;

        for my $type ( $q->param('report') ) {
            my @report_args;

            given ($type) {
                when ('url')    { @report_args = ( "URL requests",          %url_requests    ) }
                when ('status') { @report_args = ( "Status code requests",  %status_requests ) }
                when ('hour')   { @report_args = ( "Requests by hour",      %hour_requests   ) }
                when ('type')   { @report_args = ( "Requests by file type", %type_requests   ) }
            }

            report_section( $q, @report_args );
        }
    }

You probably haven't seen given/when before. (It arrived in Perl 5.10. Starting with Perl 5.18 it is marked experimental, so newer Perls emit warnings when you use it.) It works like you might expect from reading the code out loud. Given a variable or expression, when it's a specific value, perform the associated action.
When the report type is url, produce the "URL requests" section of the report.

Finally, rewrite the report_section() sub to output HTML instead of plain text:

    sub report_section {
        my ( $q, $header, %type ) = @_;
        my @type_keys;

        # Are we sorting by the KEY, or by the NUMBER of accesses?
        if ( $q->param('number') eq 'ALL' ) {
            @type_keys = sort keys %type;
        }
        else {
            my $number = $q->param('number');
            @type_keys = sort { $type{$b} <=> $type{$a} } keys %type;

            # truncate the list if we have too many results
            splice @type_keys, $number if @type_keys > $number;
        }

        # Begin an HTML table
        say "<table>";

        # Print a table row containing a header for the table
        say '<tr><th colspan="2">', $header, '</th></tr>';

        # Print a table row containing each item and its value
        # (keys come from the log, so escape them before display)
        for my $key (@type_keys) {
            say "<tr><td>", $q->escapeHTML($key), "</td><td>", $type{$key}, "</td></tr>";
        }

        # Finish the table
        say "</table>";
    }

Sorting

Perl lets you sort lists with the sort keyword. By default, sort compares items as strings, which puts numbers before letters and uppercase before lowercase. This is sufficient 99 percent of the time. The other 1 percent of the time, you can write a custom sorting routine for Perl to use.

This sorting routine is just like a small sub. In it, you compare two special variables, $a and $b, and return one of three values depending on how you want them to appear in the list:

• Returning -1 means "$a should come before $b in the sorted list."
• Returning 1 means "$b should come before $a in the sorted list."
• Returning 0 means "they're equal, so I don't care which comes first."

Perl runs this routine on pairs of items from your list as often as it needs to produce the sorted result.

For example, if you have a hash called %type, here's how you might sort its keys in descending order of their values:

    sort {
        return 1  if $type{$b} > $type{$a};
        return -1 if $type{$b} < $type{$a};
        return 0;
    } keys %type;
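
Because the sorting routine really is like a small sub, Perl also lets you give it a name and pass that name to sort (the sort SUBNAME LIST form). Here is a minimal runnable sketch of the same descending-by-value comparison. The %type counts are invented sample data, and by_count_desc is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Invented access counts; all distinct, so the sorted order is fully determined.
my %type = ( html => 120, png => 45, css => 30, cgi => 7 );

# A named comparison routine. sort sets the package variables $a and $b;
# returning 1 puts $b first, -1 puts $a first, and 0 means "equal".
sub by_count_desc {
    return 1  if $type{$b} > $type{$a};
    return -1 if $type{$b} < $type{$a};
    return 0;
}

my @names = keys %type;
my @keys  = sort by_count_desc @names;

say "@keys";    # prints html png css cgi
```

Naming the routine is handy when several reports share the same ordering. The comparison logic is identical to the inline block.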

In fact, numeric sorting happens so often that Perl gives you a convenient shorthand for it: the <=> (spaceship) operator. This operator performs the above comparison between two values for you and returns the appropriate value. That means you can rewrite that test as:

    sort { $type{$b} <=> $type{$a} } keys %type;

You can also compare strings with sort. The lt and gt operators are the string equivalents of < and >, and cmp performs the same test for strings that <=> performs for numbers. (Remember, string comparisons sort numbers before letters and uppercase before lowercase.)

For example, suppose you have a list of names and phone numbers in the format "John Doe 555-1212." You want to sort this list by the person's last name, and by first name when the last names are the same. This is a job made for cmp!

    my @sorted = sort {
        my ($left_surname)  = ($a =~ / (\w+)/);
        my ($right_surname) = ($b =~ / (\w+)/);

        # Last names are the same, sort on first name
        if ($left_surname eq $right_surname) {
            my ($left_first)  = ($a =~ /^(\w+)/);
            my ($right_first) = ($b =~ /^(\w+)/);
            return $left_first cmp $right_first;
        }
        else {
            return $left_surname cmp $right_surname;
        }
    } @phone_numbers;

    say $_ for @sorted;

Look closely at the regexp assignment lines and you'll see list context. Where? The parentheses around the variable name are not just decoration. They group a single scalar into a one-element list, which is enough to provide list context on the right-hand side of the assignment.

In scalar context (without the parentheses), the match returns true or false, depending on whether it matched. In list context (as written), it returns the captured values. This is the Perl idiom for performing a regexp match, capture, and assignment in a single line.

Trust No One

Now that you know how CGI programs can do what you want, you need to make sure they won't do what you don't want. This is harder than it looks, because you can't trust anyone to do what you expect.

Here's a simple example. You want to make sure the HTTP log analyzer never shows more than 50 items per report, because larger reports take too long to send. The easy thing to do would be to eliminate the "ALL" line from the HTML form, leaving only 10, 20, and 50. It would be very easy -- and wrong.

Download the source code for the HTTP analyzer with security enhancements.

You already modified an HTML form when you pasted the pizza-topping sample code into the backatcha page. You can also pass form items to a program in the URL. Try going to http://example.com/backatcha.cgi?itemsource=URL&typedby=you in your browser. If someone can do this with the backatcha program, they can also do it with your log analyzer. They can stick any value they want into number: "ALL" or "25000", or "four score and seven years ago."

Your form doesn't allow this, you say. Who cares? People will write custom HTML forms to exploit weaknesses in your programs, or will just pass bad form items to your program directly. You cannot trust anything users or their browsers tell you. They might not even use a browser at all. Anything that can speak HTTP can contact your program, whether or not it has ever seen your form (or cares what your form allows and disallows).

Eliminate these problems by knowing what you expect from the user, and disallowing everything else. Whatever you do not expressly permit is totally forbidden. Secure CGI programs consider everything guilty until it is made innocent.

For example, suppose you want to limit the size of reports from the HTTP log analyzer. You decide that the number form item must have a value between 10 and 50. Verify it like this:

    # Make sure that the "number" form item has a reasonable value
    my ($number) = ( ( $q->param('number') // '' ) =~ /(\d+)/ );

    if ( !defined $number || $number < 10 ) {
        $number = 10;
    }
    elsif ( $number > 50 ) {
        $number = 50;
    }

Of course, you also have to change the report_section() sub so it uses the $number variable. Now your user may tell the log analyzer that number is "10," "200," "432023," "ALL" or "redrum." Whatever the value, your program will restrict it to a reasonable value.

You don't need to do anything with report, because it only acts when one of its values is something expected. If the user tries to enter anything other than the expressly permitted values ("url," "status," "hour" or "type"), the code just ignores it.

Note, however, that report_section must be a little smarter, so that it doesn't print an empty table when there's nothing to print. If the user entered an invalid value, report calls report_section with only the CGI object $q. In that case report_section should return early, without printing anything.

Use this sort of logic everywhere you know what the user should enter. You might use s/\D//g to remove non-numeric characters from items that should be numbers, and then test that what's left is within your range of allowable numbers! Or use /^\w+$/ to make sure that the user entered a single word.

All of this has two significant benefits. First, you simplify your error-handling code, because you make sure as early in your program as possible that you're working with valid data. Second, you increase security by reducing the number of "impossible" values that might help an attacker compromise your system or mess with other users of your Web server.
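
Applied to the log analyzer's two form items, the whitelist approach might look like the following minimal sketch. clamp_number(), allowed_reports() and %report_titles are hypothetical names, not code from the downloadable analyzer. Using a hash as the whitelist is also a common alternative to given/when when the set of permitted values is fixed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Clamp a raw "number" form value into the range 10..50.
# Values with no digits at all (undef, "ALL", "redrum") fall back to 10.
sub clamp_number {
    my ($raw) = @_;
    my ($number) = ( ( $raw // '' ) =~ /(\d+)/ );
    return 10 if !defined $number || $number < 10;
    return 50 if $number > 50;
    return $number;
}

# Whitelist of report types; anything not listed here is ignored.
my %report_titles = (
    url    => 'URL requests',
    status => 'Status code requests',
    hour   => 'Requests by hour',
    type   => 'Requests by file type',
);

# Keep only the requested report types that appear in the whitelist.
sub allowed_reports {
    return grep { exists $report_titles{$_} } @_;
}

say clamp_number($_) for '10', '25', '432023', 'ALL', 'redrum';
say join ', ', map { $report_titles{$_} } allowed_reports( 'url', 'rm -rf /', 'hour' );
```

Every value gets checked against what the program expects, in one place, before any report code runs.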

Don't just take my word for it, though. The CGI Security FAQ has more information about safe CGI programming in Perl than you ever thought could possibly exist. It even includes a section listing some security holes in real CGI programs.

Play Around!

You should now know enough about CGI programming to write a useful Web application. (Oh, and you learned a little bit more about sorting and comparison.) Now for some assignments:

• Write the quintessential CGI program: a guestbook. Users enter their name, e-mail address and a short message. Append these to an HTML file for all to see. Be careful! Never trust the user! A good beginning precaution is to disallow all HTML. Either remove the < and > characters from all of the user's information, or replace them with the &lt; and &gt; character entities. The escapeHTML method in the CGI module is very good for this. Use substr(), too, to cut anything the user enters down to a reasonable size. Asking for a "short" message will do nothing to stop the user from dumping a 500k file into the message field!
• Write a program that plays tic-tac-toe against the user. Put the computer AI in a sub so it can be easily upgraded. (You'll probably need to study HTML a bit to see how to output the tic-tac-toe board.)
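
For the guestbook assignment, the two precautions above (escaping HTML and limiting length with substr()) might be combined as in this minimal sketch. escape_html(), clean_field() and the limits in %max_length are hypothetical. In a real CGI program you would call $q->escapeHTML() instead of hand-rolling the escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical maximum lengths for each guestbook field.
my %max_length = ( name => 40, email => 80, message => 500 );

# Replace characters that are special in HTML with character entities.
# In a real CGI program, $q->escapeHTML() does this for you.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # first, so the entities added below stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Cut a field down to its maximum length with substr(), then escape it.
sub clean_field {
    my ( $field, $value ) = @_;
    $value //= '';
    my $max = $max_length{$field} // 100;    # assumed default for unknown fields
    $value = substr( $value, 0, $max ) if length $value > $max;
    return escape_html($value);
}

say clean_field( name => '<b>Ada</b>' );    # prints &lt;b&gt;Ada&lt;/b&gt;
say length clean_field( message => 'x' x 10_000 );    # prints 500
```

Trimming before escaping means substr() can never cut an entity such as &lt; in half. The trade-off is that the stored text may end up slightly longer than the limit once escaped.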
