Practical Web Programming

from brian_d_foy, 9 months ago

This is a Stonehenge Consulting Services (www.stonehenge.com) talk from 2002. Its content may not be current, and its techniques may have been superseded by better ones.

Presentation Transcript

  1. Slide 1: STONEHENGE CONSULTING SERVICES, Inc. 0333 SW Flower St, Portland, Oregon 97201 TM Practical Web Programming: a very short course by Randal L. Schwartz, Stonehenge Consulting Services. Version 1.5.0 (7/30/02)[S]. Copyright ©1998-2002 by Randal L. Schwartz, Stonehenge Consulting Services, Inc.
  2. Slide 2: Table of Contents
      Introduction
        What this course is about
      CASE STUDY: returning a long response with “more...”
        The problem
        The solution
        The code
      CASE STUDY: making searchable web pages
        The problem
        The solution
        The code
      CASE STUDY: when an action takes a long time
        The problem
        The solution
        The code
      CASE STUDY: handling multi-page forms
        The problem
        The solution
        The code
      CASE STUDY: stateful transactions via a single-threaded mini-webserver
        The problem
        The solution
        The code
      CASE STUDY: stateful transactions via a multi-threaded mini-webserver
        The problem
        The solution
        The code
      CASE STUDY: an anonymous proxy server
        The problem
        The solution
  3. Slide 3: Table of Contents (continued)
      CASE STUDY: an anonymous proxy server (continued)
        The code
      CASE STUDY: reorganizing data from other servers
        The problem
        The solution
        The code
      CASE STUDY: finding out how they got here
        The problem
        The solution
        The code
      CASE STUDY: finding out where they went from here
        The problem
        The solution
        The code (part one: handle the redirections)
        The code (part two: rewriting existing pages)
      CASE STUDY: extracting a portion of website to a tar file
        The problem
        The solution
        The code
      CASE STUDY: embedding dynamic graphics in your CGI output
        The problem
        The solution
        The code
      CASE STUDY: serializing an expensive CGI script
        The problem
        The solution
        The code
      CASE STUDY: basic session management with cookies
        The problem
        The solution
        The code
      CASE STUDY: one-click processing
        The problem
        The solution
        The code
      Conclusion
        Questions and answers
  4. Slide 4: Practical Web Programming: a very short course by Randal L. Schwartz, Stonehenge Consulting Services. Version 1.5.0 (7/30/02)[S]. Copyright ©1998-2002 by Randal L. Schwartz, Stonehenge Consulting Services, Inc.
  5. Slide 5: Introduction
  6. Slide 6: What this course is about
      • Practical web programming: simple web-related techniques illustrated in case-study form
      • Not the only way to do it (remember the Perl motto!)
      • Some examples are updated versions of programs I’ve written for WebTechniques and other magazines
      • For more examples like this, see the column archives at http://www.stonehenge.com/merlyn/columns.html
      • Each is presented in “problem...solution” form
      • The CPAN is your friend: nearly all examples here use CGI.pm and the LWP library, definitely “must haves” for any serious web programming
      • Most of these programs will run as coded, but they’re really meant more to be representative: steal the technique, not the code!
  7. Slide 7: CASE STUDY: returning a long response with “more...”
  8. Slide 8: The problem
      • You’ve got a query page, and the result could return a lot of hits
      • If you returned them all as one page, it might take uncomfortably long to download
      • This will probably lead to a user-initiated abort on a slow link (at least, it does for me)
  9. Slide 9: The solution
      • Do the query, then return the hits in a series of pages
      • On the first page, show the first 25 (or whatever) hits
      • Include a link to later hits
      • Later pages can also include a link to earlier hits
      • The links actually call back to the same script with a session number and starting point
      • The session number is a pointer to data in /tmp or whatever (should be cleaned up later)
      • For our example, we’ll extract entries from /usr/dict/words that match a pattern
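The session-number idea above can be sketched in core Perl. This is an illustration under assumptions rather than the course's code: `new_session_id` and `valid_session_id` are hypothetical helper names, the ID layout (timestamp plus process ID packed into 12 hex characters) mirrors the listing that follows, and the validation step is an added precaution so that a session parameter supplied by the browser cannot name an arbitrary file.

```perl
use strict;
use warnings;

# Hypothetical helper: build a 12-hex-character ID from the time and PID.
# The PID is masked to 16 bits because pack's "n" holds an unsigned short.
sub new_session_id {
    return unpack("H*", pack("Nn", time, $$ & 0xFFFF));
}

# Hypothetical helper: return the ID if it has exactly the expected shape,
# otherwise undef. Using the capture also untaints the value under -T.
sub valid_session_id {
    my ($candidate) = @_;
    return undef unless defined $candidate;
    return $candidate =~ /\A([0-9a-f]{12})\z/ ? $1 : undef;
}
```

A script would call `valid_session_id(param('session'))` before building the session file name, and show the search form again when it gets undef back.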
  10. Slide 10: The code
      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;
      use CGI::Carp qw(fatalsToBrowser);
      my $MAXHITS = 25;        # constant: number of hits per page
      my $TMP = "/tmp/more.";  # constant: location of session files
      use HTML::Entities qw(encode_entities);
      use CGI ":all";
      my $session;  # global: session-ID
      my $search;   # global: search string
      my @found;    # global: array of valid hits
  11. Slide 11:
      print header, start_html(
        'So you want more???',
        'merlyn@stonehenge.com'
      );
      print h1("Query the dictionary");
      if ($session = param('session')) {
        ## we are in the midst of a session
        &load_session(); # sets $search, @found
        &display(param('start'));
      } elsif ($search = param('search')) {
        ## we are beginning the query
        ## perform the query, and set up for session if necessary
        open WORDS,"/usr/dict/words";
        chomp(@found = grep /$search/o, <WORDS>);
        close WORDS;
        $session = unpack("H*", pack("Nn", time, $$)); # 12 hex chars
  12. Slide 12:
        &save_session();
        &display(0);
      } else {
        ## we are being invoked initially
        ## print the basic search form
        print hr, startform, p, "Search for:", textfield('search'),
          submit('Search'), endform, hr;
      }
      print end_html;
      exit 0;
      sub load_session {
  13. Slide 13:
        open TMP, "<$TMP$session" or
          die "missing session file $TMP$session: $!";
        chop(($search, @found) = <TMP>);
        close TMP;
      }
      sub save_session {
        open TMP,">$TMP$session" or
          die "Cannot create $TMP$session: $!";
        print TMP map "$_\n", $search, @found;
        close TMP;
      }
      sub display {
        my $start = shift; # where to start (undef/0 if beginning)
        print "You are searching for: ", encode_entities($search), "\n";
        my $low = $start;
  14. Slide 14:
        ## sanity checking... won't happen unless user fakes us out
        $low = 0 if ($low || 0) <= 0;
        $low = $#found if $low > $#found;
        my $high = $low + $MAXHITS - 1;
        $high = $#found if $high > $#found;
        print br, "Hits ", $low + 1, "..", $high + 1,
          " (of ".@found.") hits:\n",
          pre(join "\n", map { encode_entities($_) } @found[$low..$high]),
          hr;
        if ($high < $#found) {
          print br, a({Href => join "", script_name(),
            "?session=$session&start=", $low + $MAXHITS},
            "See next $MAXHITS hits...");
  15. Slide 15:
        }
        if ($low > 0) {
          print br, a({Href => join "", script_name(),
            "?session=$session&start=", $low - $MAXHITS},
            "See previous $MAXHITS hits...");
        }
      }
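The window arithmetic in `display` can be isolated into a small function. The sketch below is one possible refactoring, not the author's code: `page_window` is a hypothetical name, and unlike the listing it clamps the "previous" start at zero and returns undef when a neighbouring page does not exist.

```perl
use strict;
use warnings;

# Hypothetical helper: given the hit count, a requested start index and a
# page size, return the clamped window plus the neighbouring start indexes
# (undef when there is no such page). Returns undef when there are no hits.
sub page_window {
    my ($total, $start, $per_page) = @_;
    return undef if $total <= 0;
    # Anything that is not a plain non-negative integer starts at the top.
    $start = 0 unless defined $start && $start =~ /\A\d+\z/;
    $start = $total - 1 if $start > $total - 1;
    my $high = $start + $per_page - 1;
    $high = $total - 1 if $high > $total - 1;
    my $next = $high < $total - 1 ? $start + $per_page : undef;
    my $prev = $start > 0
        ? ($start > $per_page ? $start - $per_page : 0)
        : undef;
    return { low => $start, high => $high, next => $next, prev => $prev };
}
```

The caller would print `@found[$w->{low} .. $w->{high}]` and emit the "next" and "previous" links only when the corresponding field is defined.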
  16. Slide 16: CASE STUDY: making searchable web pages
  17. Slide 17: The problem
      • You’ve got a lot of information on a series of similar web pages
      • You can provide interesting table of contents pages
      • But people sometimes want to search for words of interest
  18. Slide 18: The solution
      • Provide a search web form
      • Searching can be done natively with Perl, or with more advanced tools (like Glimpse at http://glimpse.cs.arizona.edu/)
      • The example here is a native Perl solution searching my online WebTechniques column archive
  19. Slide 19: The code
      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;
      use CGI::Carp qw(fatalsToBrowser);
      use HTML::Entities ();
      BEGIN { *ent = \&HTML::Entities::encode_entities; }
      use CGI ":all";
      my $DIR = "/home/merlyn/Html/merlyn/WebTechniques";
      my $URL = "http://www.stonehenge.com/merlyn/WebTechniques";
      my $FILEPAT = "\\.listing\\.txt\$";
      print header,
  20. Slide 20:
        start_html("-title" => "Search WebTechniques Perl Scripts"),
        h1("Search WebTechniques Perl Scripts"),
        "Search the ",
        a({Href => $URL}, "Perl WebTechniques programs"),
        " by submitting this form:\n",
        hr, start_form,
        p, "Search for: ", textfield("-name" => "search"),
        p, checkbox("-name" => "regex", "-label" => "Use Regular Expressions"),
        p, checkbox("-name" => "ignore", "-label" => "Ignore case"),
        p, submit,
        end_form, hr;
      my $searchstring = param("search"); # the search item
      if (defined $searchstring and length $searchstring) {
        chdir $DIR or die "Cannot chdir $DIR: $!";
        opendir DIR, "." or die "Cannot opendir $DIR: $!";
        @ARGV = grep /$FILEPAT/o, readdir DIR; # get matching filenames for <>
        closedir DIR;
  21. Slide 21:
        unless (param("regex")) { # if ordinary string...
          $searchstring = quotemeta $searchstring; # make ordinary.
        }
        my $ignore = param("ignore") ? "(?i)" : ""; # make case insensitive
        print p, "Follow the link to get the full listing:\n", "<PRE>\n";
        my $per_file = 0; # how many hits this file?
        while (<>) {
          if (eof) {
            close ARGV; # resets $.
            $per_file = 0;
          }
          chomp;
          my $per_line = 0; # how many hits this line?
          while (s/$ignore$searchstring//o) {
            print a({Href => "$URL/$ARGV"}, ent($ARGV)), ":$.: "
  22. Slide 22:
              unless $per_line++; # first time, print prefix
            print ent($`), b(ent $&);
            $_ = $';
            last if $per_line >= 5; # only five hits max per line
          }
          if ($per_line) { # at least one hit?
            print ent($_),"\n"; # finish line off
            if (++$per_file >= 5) { # only five lines max per file
              print "[skipping to next file]\n";
              close ARGV; # force EOF
              $per_file = 0;
            }
          }
        }
        print "</PRE>\n";
      }
      print end_html;
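The per-line highlighting step can also be written as a standalone function. The sketch below is an alternative under assumptions, not the listing's method: `highlight` is a hypothetical name, `ent` is a tiny stand-in for `HTML::Entities::encode_entities` so the example needs no CPAN modules, and match offsets come from `@-` and `@+` instead of the `` $` `` / `$&` / `$'` variables.

```perl
use strict;
use warnings;

# Minimal stand-in for HTML::Entities::encode_entities (assumption: only
# these four characters need escaping for the sketch).
sub ent {
    my ($text) = @_;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$map{$1}/g;
    return $text;
}

# Hypothetical helper: escape a line and wrap up to $opt{max} matches in <b>.
# Options: regex => treat $search as a pattern, ignore => ignore case.
# Returns (number of hits, HTML string).
sub highlight {
    my ($line, $search, %opt) = @_;
    my $pattern = $opt{regex} ? $search : quotemeta $search;
    my $re      = $opt{ignore} ? qr/$pattern/i : qr/$pattern/;
    my $max     = $opt{max} || 5;
    my ($out, $hits, $pos) = ('', 0, 0);
    while ($hits < $max and $line =~ /$re/g) {
        last if $+[0] == $-[0];    # stop at an empty match
        $out .= ent(substr($line, $pos, $-[0] - $pos))
              . '<b>' . ent(substr($line, $-[0], $+[0] - $-[0])) . '</b>';
        $pos = $+[0];
        $hits++;
    }
    return ($hits, $out . ent(substr($line, $pos)));
}
```

Compiling the pattern once with `qr//` gives the same "compile once" effect the listing gets from `/o`, while keeping the pattern local to each call.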
  23. Slide 23: CASE STUDY: when an action takes a long time
  24. Slide 24: The problem
      • Sometimes, a query or some other action can take a long time
      • People don’t want to wait
      • They want an immediate response, but you’re not ready to give one immediately
  25. Slide 25: The solution
      • Provide an “action in progress” page immediately, then continue the search in the background
      • Make the page do “auto-reload” via client-pull
      • Also include text that says “reload this”, so non-client-pull browsers will also work
      • When the action completes, replace the “in progress” page with the results
      • Might want to combine this with the “more” solution earlier, if there are many hits
      • You could even provide a partial solution this way: hits 1-25 available now, but others are showing up later
      • Sample solution here does a “traceroute”
  26. Slide 26: The code
      #!/usr/bin/perl -T
      use strict;
      $|++;
      $ENV{PATH} = "/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin";
      use CGI qw(:all delete_all escapeHTML);
      if (my $session = param('session')) { # returning to pick up session data
        my $cache = get_cache_handle();
        my $data = $cache->get($session);
        unless ($data and ref $data eq "ARRAY") { # something is wrong
          show_form();
          exit 0;
        }
  27. Slide 27:
        print header;
        print start_html(-title => "Traceroute Results",
                         ($data->[0] ? () :
                          (-head => ["<meta http-equiv=refresh content=5>"])));
        print h1("Traceroute Results");
        print pre(escapeHTML($data->[1]));
        print p(i("... continuing ...")) unless $data->[0];
        print end_html;
      } elsif (my $host = param('host')) { # returning to select host
        if ($host =~ /^([a-zA-Z0-9.\-]{1,100})\z/) { # create a session
          $host = $1; # untainted now
          my $session = get_session_id();
          my $cache = get_cache_handle();
          $cache->set($session, [0, ""]); # no data yet
          if (my $pid = fork) { # parent does
            delete_all(); # clear parameters
            param('session', $session);
  28. Slide 28:
            print redirect(self_url());
          } elsif (defined $pid) { # child does
            close STDOUT; # so parent can go on
            unless (open F, "-|") {
              open STDERR, ">&=1";
              exec "/usr/sbin/traceroute", $host;
              die "Cannot execute traceroute: $!";
            }
            my $buf = "";
            while (<F>) {
              $buf .= $_;
              $cache->set($session, [0, $buf]);
            }
            $cache->set($session, [1, $buf]);
            exit 0;
          } else {
            die "Cannot fork: $!";
          }
  29. Slide 29:
        } else {
          show_form();
        }
      } else { # display form
        show_form();
      }
      exit 0;
      sub show_form {
        print header, start_html("Traceroute"), h1("Traceroute");
        print start_form;
        print submit('traceroute to this host:'), " ", textfield('host');
        print end_form, end_html;
      }
  30. Slide 30:
      sub get_cache_handle {
        require Cache::FileCache;
        Cache::FileCache->new
          ({
            namespace => 'tracerouter',
            username => 'nobody',
            default_expires_in => '30 minutes',
            auto_purge_interval => '4 hours',
          });
      }
      sub get_session_id {
        require Digest::MD5;
        Digest::MD5::md5_hex(Digest::MD5::md5_hex(time().{}.rand().$$));
      }
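The fork-and-poll pattern can be tried outside a web server with core modules only. The following is a sketch under stated assumptions, not the course's implementation: a Storable file stands in for `Cache::FileCache`, and `set_state`, `get_state`, and `run_in_background` are hypothetical names. The stored value keeps the listing's `[done_flag, text]` shape.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(store retrieve);

# Write [done_flag, text] via a temp file and rename, so a poller never
# sees a half-written state file.
sub set_state {
    my ($file, $done, $text) = @_;
    store([$done, $text], "$file.tmp");
    rename "$file.tmp", $file or die "Cannot rename $file.tmp: $!";
}

# Return the stored [done_flag, text] pair, or undef if nothing is stored.
sub get_state {
    my ($file) = @_;
    return -e $file ? retrieve($file) : undef;
}

# Fork a child that runs @cmd and records its output line by line.
# The parent gets the child's PID back immediately.
sub run_in_background {
    my ($file, @cmd) = @_;
    set_state($file, 0, '');                 # no data yet
    defined(my $pid = fork) or die "Cannot fork: $!";
    return $pid if $pid;                     # parent returns at once
    open my $out, '-|', @cmd or die "Cannot run $cmd[0]: $!";
    my $buf = '';
    while (my $line = <$out>) {
        $buf .= $line;
        set_state($file, 0, $buf);           # partial result
    }
    close $out;
    set_state($file, 1, $buf);               # finished
    POSIX::_exit(0);                         # leave without running the parent's END blocks
}
```

In a CGI setting the parent would redirect to a URL carrying the session ID, and each reload would call `get_state` and keep the refresh header only while the first element is false.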
  31. Slide 31: CASE STUDY: handling multi-page forms
  32. Slide 32: The problem
      • You’ve got a form that’s too long to comfortably fit on one page
      • People make mistakes while entering forms, so they like to back up if there are multiple pages
      • You suspect there’s a way to make a data-driven solution rather than ad-hoc code for each page
  33. Slide 33: The solution
      • Use a Perl data structure to define the pages
      • Have a common form element (like textfield) that can be overridden for odd forms
      • Generate hidden fields as you go forward to remember previous entries
      • Include a “back up” button that generates the hidden fields and fills in previous values
      • Sticky values are easy with CGI.pm, as long as we know what the values need to be
  34. Slide 34: The code
      #!/usr/local/bin/perl -Tw
      use strict;
      $| = 1;
      use CGI ":all";
      ### configuration
      ## all-capital names are used in code, so don't change them
      ## type of default fields unless overridden; $_[0] is fieldname
      sub DEFAULT_FIELD { textfield(shift, "", 60) }
  35. Slide 35:
      my @PAGES = (
        ## first and second pages have no back buttons
        ["Introduction", \&intro_page],
        ["Tell us about yourself",
          [Name => "Name"],
          [Address => "Address"],
          [City => "City"],
          [State => "State"],
          [Zipcode => "Zip Code"],
        ],
        ["Tell us about your experience",
          [Movie => "Movie you saw"],
          [Review => "What you thought of it", sub {
            radio_group(shift, ['No opinion', qw(Excellent Good Fair Poor Bombed)]);
          }],
        ],
  36. Slide 36:
        ["Tell us anything else",
          [Theatre => "Theatre conditions"],
          [Snackbar => "Snack bar"],
          [General => "Any other comments?", sub { textarea(shift) }],
        ],
        ["Thank you!", \&thank_you_page],
        ["Goodbye!", \&submit_page],
        ## last page has no back or forward buttons
      );
      ## Internal use: these must not collide with fieldnames above
      my $PREVIOUS = "Previous Page";
      my $NEXT = "Next Page";
      my $PAGE = "__page__";
  37. Slide 37:
      sub intro_page {
        print p(["Thank you for visiting ScratchySound Theatres!",
          "Please take a moment to tell us about your experience.",
        ]);
      }
      sub thank_you_page {
        print p(["Thank you for taking the time to fill out our survey!",
          "Go back now to change any entries you need, or forward to send to us!",
        ]);
      }
  38. Slide 38:
      sub submit_page {
        ## code to save params would go here
        ## be sure to ignore param($PAGE), param($NEXT), param($PREVIOUS)
        print p(["Your entry has been submitted to us!",
          "Thank you!"]);
        print $CGI::Q->dump; # debugging
      }
      ### end
      my $previous = param($PREVIOUS);
      my $next = param($NEXT);
      my $page = param($PAGE);
  39. Slide 39:
      if (defined $page and $page =~ /^\d+$/ and $page >= 0 and $page <= $#PAGES) {
        $page += defined($previous) ? -1 : +1; # default to forward
      } else {
        $page = 0;
      }
      param($PAGE, $page); # set it for hidden
      my @info = @{$PAGES[$page]};
      my $title = shift @info;
      print header, start_html($title), h1($title), start_form;
      if (ref $info[0] eq "CODE") {
        $info[0]->();
      } else {
        print table({ Border => 1, Cellpadding => 5 },
          map { Tr(
            th($_->[1]),
            td(($_->[2] || \&DEFAULT_FIELD)->($_->[0]))
          )} @info
        );
      }
  40. Slide 40:
      ## no backing up from first or second or last page:
      print submit($PREVIOUS) if $page > 1 and $page < $#PAGES;
      ## no going forward from last page:
      print submit($NEXT) if $page < $#PAGES;
      ## generate hidden fields
      print hidden($PAGE);
      for my $other (0..$#PAGES) {
        next if $other == $page; # don't dump hiddens for current
        my @info = @{$PAGES[$other]};
        shift @info; # toss title
        next if ref($info[0]) eq "CODE"; # code page
        for (map {$_->[0]} @info) {
          print hidden($_) if defined param($_);
        }
      }
      print end_form;
      print end_html;
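The page-stepping rule can be pulled out and tested on its own. In the listing, the buttons are hidden on the first and last pages, but a hand-crafted request that sends page 0 with "Previous Page", or the last page with "Next Page", appears to step outside the page list. The sketch below is one possible refinement, with `next_page` as a hypothetical helper name: it clamps the result to the valid range.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the index of the page to show next.
#   $current  page index submitted by the form (may be undef or garbage)
#   $go_back  true when the "previous" button was pressed
#   $last     index of the final page
sub next_page {
    my ($current, $go_back, $last) = @_;
    # Missing, malformed, or out-of-range input restarts at the first page.
    return 0 unless defined $current
        && $current =~ /\A\d+\z/
        && $current <= $last;
    my $page = $current + ($go_back ? -1 : 1);    # default to forward
    $page = 0     if $page < 0;
    $page = $last if $page > $last;
    return $page;
}
```

With the names from the listing, the call would look like `next_page(param($PAGE), defined param($PREVIOUS), $#PAGES)`.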
  41. Slide 41: CASE STUDY: stateful transactions via a single-threaded mini-webserver
  42. Slide 42: The problem
      • HTTP is essentially stateless: there’s no necessary correlation between one hit and the next
      • You can pass session identifiers via cookies or extended URLs or hidden fields
      • However, the session data is sometimes too large to pass through these mechanisms
      • So pass a session ID instead, and let that index the real data being held on the server
      • If the data is difficult to save and reload into a CGI script, we’re in trouble though
      • For example, if the data is an object tree, marshalling the data between various CGI invocations can get difficult or expensive
  43. Slide 43: The solution
      • Use a mini-web-server created with HTTP::Daemon
      • Fire off a separate server for each stateful conversation
      • Have the initial CGI script redirect the client to this server
      • The server stays up as long as queries come in “often enough”
      • You’ll need to figure out a reasonable timeout value
          • Too short, and we lose track of a conversation
          • Too long, and we waste a lot of processes
      • This example uses the Eliza module from the CPAN, simulating a psychotherapist
      • The data in this case is “office visit notes”: the doctor replays previous comments when nothing of interest is spoken in the current statement
      • Use the CPAN: lots of good stuff in there
  44. Slide 44: The code
      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;
      use CGI::Carp qw(fatalsToBrowser);
      use CGI ":all";
      use HTTP::Daemon;
      use HTTP::Status;
      use Chatbot::Eliza;
      my $HOST = "www.stonehenge.com"; # where are we?
      my $TIMEOUT = 120; # number of seconds until this doc dies
      my $d = new HTTP::Daemon (LocalAddr => $HOST);
      my $unique = join ".", time, $$, int(rand 1000);
  45. Slide 45:
      my $url = $d->url.$unique;
      defined(my $pid = fork) or die "Cannot fork: $!";
      if ($pid) { # I am, apparently, the parent
        print redirect($url);
        exit 0;
      }
      close(STDOUT); # to let the kid live on
      my $eliza = new Chatbot::Eliza;
      {
        alarm($TIMEOUT); # (re-)set the deadman timer
        my $c = $d->accept; # $c is a connection
        my $r = $c->get_request; # $r is a request
        if ($r->url->epath ne "/$unique") {
          $c->send_error(RC_FORBIDDEN,
            "I don't think we've made an appointment!");
          close $c;
  46. Slide 46:
          redo;
        }
        $c->send_basic_header;
        $CGI::Q = new CGI $r->content;
        my $eliza_says = "How do you do? Please tell me your problem.";
        my $message = param("message") || "";
        if ($message) {
          param("message","");
          $eliza_says = $eliza->transform($message);
        }
        print $c header,
          start_html("The doctor is in!"),
          h1("The doctor is in!"),
          hr,
          startform("POST", $url),
          p($eliza_says),
          p, textfield(-name => "message", -size => 60),
  47. Slide 47:
          p, submit("What do you say, doc?"),
          p("Note: the doctor is patient, but waits only $TIMEOUT seconds, so hurry!"),
          endform,
          hr,
          end_html;
        close $c;
        redo;
      }
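The "stay up while requests keep arriving" loop can be sketched with core modules only, so it runs without HTTP::Daemon. This is not the course's implementation and rests on several assumptions: `serve_until_idle` is a hypothetical name, the idle timeout uses `IO::Select->can_read` in place of `alarm`, and HTTP handling is reduced to reading the request line and skipping the headers.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Serve requests one at a time until no client connects for $timeout
# seconds. $handler receives the request path and returns the response
# body. Returns the number of requests served.
sub serve_until_idle {
    my ($listener, $timeout, $handler) = @_;
    my $select = IO::Select->new($listener);
    my $served = 0;
    # can_read returns an empty list once the idle timeout expires.
    while (my @ready = $select->can_read($timeout)) {
        my $conn = $listener->accept or next;
        my $request_line = <$conn> // '';
        while (defined(my $header = <$conn>)) {   # skip headers up to the blank line
            last if $header =~ /^\r?\n\z/;
        }
        my ($path) = $request_line =~ m{^\w+\s+(\S+)};
        my $body = $handler->(defined $path ? $path : '/');
        print $conn "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n$body";
        close $conn;
        $served++;
    }
    return $served;
}
```

As in the listing, per-conversation state would simply live in variables that the handler closes over, since the process survives between requests. A slow client can still stall this loop while its request is being read, which is the same single-connection limitation the original has.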
  48. Slide 48: CASE STUDY: stateful transactions via a multi-threaded mini-webserver
  49. Slide 49: The problem
      • Like before, we want a stateful conversation in the stateless HTTP world
      • But firing off a separate webserver for each conversation might be too expensive
      • Or perhaps the threads should interact somehow (like for a chat server)
  50. Slide 50: The solution
      • Use a single, multi-threaded mini-web-server
      • Again, created with HTTP::Daemon
      • On the first CGI invocation, fire off a daemon, and redirect to it as before
      • The unique session ID serves as a key to the mini web-server’s per-thread data
      • Once again, we’ll show this with Eliza
      • Downside: each response must be fast, because we’ve got only one thread active at a time
  51. Slide 51: STONEHENGE CONSULTING SERVICES, Inc. 0333 SW Flower St, Portland, Oregon 97201 TM The code #!/home/merlyn/bin/perl -Tw use strict; $|++; use CGI::Carp qw(fatalsToBrowser); use CGI \":all\"; use HTTP::Daemon; use HTTP::Status; use Chatbot::Eliza; my $HOST = \"www.stonehenge.com\"; # where are we? my $PORT = 42001; # at what port my $TIMEOUT = 300; # number of seconds until this doc dies my $d = do { CASE STUDY: stateful transactions via a multi-threaded mini-webserver Page 48 of 120
  52. Slide 52: STONEHENGE CONSULTING SERVICES, Inc. 0333 SW Flower St, Portland, Oregon 97201 TM local($^W) = 0; new HTTP::Daemon (LocalAddr => $HOST, LocalPort => $PORT) }; my $unique = join \".\", time, $$, int(rand 1000); my $url_prefix = \"http://$HOST:$PORT\"; my $url = \"$url_prefix/$unique\"; print redirect($url); exit 0 unless defined $d; # do we need to become the server? defined(my $pid = fork) or die \"Cannot fork: $!\"; exit 0 if $pid; # I am the parent close(STDOUT); my %eliza; # Chatbot::Eliza objects, keyed on session my %when; # most recent activity time, keyed on session { CASE STUDY: stateful transactions via a multi-threaded mini-webserver Page 49 of 120
  53. Slide 53:
        alarm($TIMEOUT); # (re-)set the deadman timer
        my $c = $d->accept; # $c is a connection
        my $r = $c->get_request; # $r is a request
        (my $session = $r->url->epath) =~ s{^/}{};
        unless ($session =~ /^\d+\.\d+\.\d+$/) {
          $c->send_error(RC_FORBIDDEN, "I don't think we've made an appointment!");
          close $c;
          redo;
        }
        $c->send_basic_header;
        $CGI::Q = new CGI $r->content;
        my $eliza_says = "How do you do? Please tell me your problem.";
        my $message = param("message") || "";
        if ($message) {
          param("message","");
          $eliza{$session} ||= new Chatbot::Eliza;
  54. Slide 54:
          $eliza_says = $eliza{$session}->transform($message);
          $when{$session} = time;
        }
        print $c header, start_html("The doctor is in!"), h1("The doctor is in!"), hr,
          startform("POST", "$url_prefix/$session"),
          $eliza_says && p($eliza_says),
          p, textfield(-name => "message", -size => 60),
          p, submit("What do you say, doc?"),
          p("Note: the doctor is patient, but waits only $TIMEOUT seconds, so hurry!"),
          endform, hr, end_html;
        close $c;
  55. Slide 55:
        for (keys %when) {
          next if $when{$_} > time - $TIMEOUT;
          delete $eliza{$_};
          delete $when{$_};
        }
        redo;
      }
  56. Slide 56: CASE STUDY: an anonymous proxy server
  57. Slide 57: The problem
      • You hate cookies, or having a transaction tracked to a particular IP address
      • Or you wanna process incoming pages to strip ads, or ugly specific-browser formatting (www.cnn.com, ugh!)
      • Or, you just wanna say “I can write a full-functioning anonymous proxy web server in 90 lines of Perl”
  58. Slide 58: The solution
      • Create a mini-web-server with HTTP::Daemon
      • Point your browser at it as a proxy server
      • Have each request turn into an LWP request to be fetched
      • Alter the request and response headers and content to your heart’s content
      • If you’re brave, even implement pre-forking (à la Apache) and caching (not illustrated here)
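The "alter the headers" step is the heart of the anonymizing. Here is a minimal sketch of that one step, working on a raw header block in plain Perl rather than on LWP request objects, so it runs with no modules installed. The helper name `anonymize_headers` and the sample header values are hypothetical; the list of header names to strip is an assumption about what identifies a user.

```perl
use strict;
use warnings;

# Header names treated as identifying (an assumption for this sketch).
my %STRIP = map { lc($_) => 1 } qw(User-Agent From Referer Cookie);

# Hypothetical helper: drop identifying lines from a raw header block.
sub anonymize_headers {
    my ($raw) = @_;
    my @kept = grep {
        my ($name) = /^([^:\s]+):/;
        not(defined $name and $STRIP{lc $name});
    } split /\r?\n/, $raw;
    return join("\n", @kept) . "\n";
}

my $raw = join("\n",
    "Host: www.example.com",
    "User-Agent: SomeBrowser/1.0",
    "Accept: text/html",
    "Cookie: id=12345",
    "Referer: http://www.example.com/start.html",
) . "\n";

print anonymize_headers($raw);
```

Header names are compared case-insensitively, since HTTP header names are not case-sensitive. A response would get the same treatment with its own list of names, such as Set-Cookie.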
  59. Slide 59: The code
      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      my $HOST = "www.stonehenge.com";
      my $PORT = "4242";

      sub prefix {
        my $now = localtime;
        join "", map { "[$now] [${$}] $_\n" } split /\n/, join "", @_;
      }
      $SIG{__WARN__} = sub { warn prefix @_ };
      $SIG{__DIE__} = sub { die prefix @_ };
  60. Slide 60:
      $SIG{CLD} = $SIG{CHLD} = sub { wait; };

      my $AGENT; # global user agent (for efficiency)
      use LWP::UserAgent;
      $AGENT = LWP::UserAgent->new;
      $AGENT->agent("anon/0.07");
      $AGENT->env_proxy;

      { ### MAIN ###
        use HTTP::Daemon;

        my $master = new HTTP::Daemon LocalAddr => $HOST, LocalPort => $PORT;
        warn "set your proxy to <URL:", $master->url, ">";
        my $slave;
        &handle_connection($slave) while $slave = $master->accept;
        exit 0;
      } ### END MAIN ###
  61. Slide 61:
      sub handle_connection {
        my $connection = shift; # HTTP::Daemon::ClientConn

        my $pid = fork;
        if ($pid) { # spawn OK, and I'm the parent
          close $connection;
          return;
        }
        ## spawn failed, or I'm a good child
        my $request = $connection->get_request;
        if (defined($request)) {
          my $response = &fetch_request($request);
          $connection->send_response($response);
          close $connection;
        }
        exit 0 if defined $pid; # exit if I'm a good child with a good parent
      }
  62. Slide 62:
      sub fetch_request {
        my $request = shift; # HTTP::Request

        use HTTP::Response;

        my $url = $request->url;
        warn "fetching $url";
        if ($url->scheme !~ /^(http|gopher|ftp)$/) {
          my $res = HTTP::Response->new(403, "Forbidden");
          $res->content("bad scheme: @{[$url->scheme]}\n");
          $res;
        } elsif (not $url->rel->netloc) {
          my $res = HTTP::Response->new(403, "Forbidden");
          $res->content("relative URL not permitted\n");
          $res;
        } else {
          &fetch_validated_request($request);
  63. Slide 63:
        }
      }

      sub fetch_validated_request { # return HTTP::Response
        my $request = shift; # HTTP::Request

        ## uses global $AGENT

        ## warn "orig request: <<<", $request->headers_as_string, ">>>";
        $request->remove_header(qw(User-Agent From Referer Cookie));
        ## warn "anon request: <<<", $request->headers_as_string, ">>>";
        my $response = $AGENT->simple_request($request);
        ## warn "orig response: <<<", $response->headers_as_string, ">>>";
        $response->remove_header(qw(Set-Cookie));
        ## warn "anon response: <<<", $response->headers_as_string, ">>>";
        $response;
      }
  64. Slide 64: CASE STUDY: reorganizing data from other servers
  65. Slide 65: The problem
      • That darn Dilbert [1] archive
      • If you miss a week’s worth of Dilbert, you’ve got to click all over the place
      • It’d sure be neat to have a page where I could just say “give me the last week of Dilberts”, and it shows up as a single page that I could just wait for a short while to load
      [1] Dilbert is clearly a trademark of someone, and copyrighted by them. You knew that, so I’m not repeating it here.
  66. Slide 66: The solution
      • One of my most frequently used programs!
      • Create a CGI that asks for “how many back dates”
      • With the response, go out to the master archive page, extract all the sub archive pages (with the actual GIF links) and put all those link URLs into one table
      • Respond with that, which means the browser will ultimately suck down all the GIFs
      • Downside: if they change the precise format of the pages, you’ll have to change the script to keep up (this happened already with Dilbert)
      • Other downside: check with a lawyer before running this kind of program, because the source server may consider this a repackaging or derivation of copyrighted stuff... bad news!
      • Certainly would be bad to claim the data was your own
      • On the other hand, places like www.metacrawler.com use this technique to consolidate data from many search engines
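The extraction step boils down to one pattern match in list context plus an array slice. This is a minimal sketch under stated assumptions: the page text, the `$LINK_RE` pattern, and the `/comics/strip/` paths are made up for illustration, and the page is a string in the script rather than something fetched from a server.

```perl
use strict;
use warnings;

# Stand-in for a fetched archive page (hypothetical layout).
my $page = <<'HTML';
<a href="/comics/strip/archive/strip01.html">Mon</a>
<a href="/comics/strip/archive/strip02.html">Tue</a>
<a href="/about.html">About</a>
<a href="/comics/strip/archive/strip03.html">Wed</a>
HTML

my $LINK_RE = '/comics/strip/archive/strip\d+\.html';   # hypothetical pattern
my $max = 2;                                            # how many to keep

my @links = $page =~ m!($LINK_RE)!g;             # every match, in page order
@links = @links[-$max .. -1] if @links > $max;   # keep only the last $max

print "$_\n" for @links;
```

This is also where the "they changed the page format" downside bites: everything depends on the pattern, so keeping it in one configuration variable makes the inevitable fix a one-line change.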
  67. Slide 67: The code
      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      use CGI::Carp qw(fatalsToBrowser);
      use LWP::Simple "get";
      use URI::URL;
      use CGI qw/:form :html param header/;
      use HTML::Entities;

      ## configure
      my $TOP = "http://www.unitedmedia.com/comics/dilbert/archive/";
      my $HTML_RE = '/comics/dilbert/archive/dilbert\d+\.html'; # dot escaped, as in $GIF_RE
      my $GIF_RE = '/comics/dilbert/archive/images/dilbert\d+\.gif';
      my $KEEP = 99;
  68. Slide 68:
      ## end configure

      sub td_center {
        td({ align => "center" }, @_);
      }

      BEGIN {
        my $notes = "";
        sub add_note { $notes .= join "", @_; }
        sub get_notes { $notes; }
        sub get_ent_notes { encode_entities $notes; }
      }

      print header, start_html("Dilbert"), h1("Recent Dilberts"), "\n";

      my $max = param("max");
      if (defined $max and $max =~ /^\d+$/) {
  69. Slide 69:
        $max = $KEEP if $max > $KEEP;
        my $top = get $TOP;
        my @gif_urls = ();
        if (not defined $top) {
          add_note "cannot get $TOP";
        } else {
          my @old_urls = map url($_,$TOP)->abs, $top =~ m!($HTML_RE)!og;
          @old_urls = @old_urls[-$max..-1] if @old_urls > $max;
          for my $url (@old_urls) {
            my $content = get $url or (add_note "cannot get $url\n"), next;
            my ($gif) = $content =~ m!($GIF_RE)!o;
            push @gif_urls, url($gif,$url)->abs;
          }
        }
        print table(
  70. Slide 70:
          (map { TR(td_center($_)) }
           map { table(TR(td_center(encode_entities $_)),
                       TR(td_center(img{-src => $_}))) } @gif_urls),
          p(get_ent_notes())
        );
      } else {
        print hr, start_form,
          p(submit("get this many days of back-images:"),
            popup_menu("max", [1..45], "14")),
          end_form, hr;
      }
      print "\n", end_html;
  71. Slide 71: CASE STUDY: finding out how they got here
  72. Slide 72: The problem
      • You have an interesting web site
      • People are coming to your site
      • A lot of hits are likely to be from search engines
      • If you maintain a referer (sic) log, you may have noticed that some of the hits are from big search engines, and it looks like some of the search strings are in the URLs
      • You’re curious about what these search strings are
  73. Slide 73: The solution
      • Write a program that parses the referer log
      • Look for the URLs depicting the big search engines
      • Parse the “GET” string with URI::URL
      • Pick out the form elements (different for each search engine)
      • Search engines change over time, so this program will require updating and tuning
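To see what "pick out the form elements" means for a single referer, here is a minimal sketch that does the parsing by hand in plain Perl instead of with URI::URL, so it runs with no modules installed. The helper name `parse_query`, the sample referer URL, and the two-entry `%field_for` table are assumptions for illustration; a real table needs one entry per search engine.

```perl
use strict;
use warnings;

# Hypothetical helper: split "k=v&k2=v2" into a hash, decoding + and %XX.
sub parse_query {
    my ($query) = @_;
    my %form;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key and length $key;
        $value = "" unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # %XX encodes one byte
        }
        $form{$key} = $value;
    }
    return %form;
}

# Which form field holds the search string, per engine (illustrative).
my %field_for = (altavista => "q", yahoo => "p");

my $referer = "http://www.altavista.com/cgi-bin/query?pg=q&q=perl+training%21";
my ($host, $query) = $referer =~ m{^http://([^/:?]+)[^?]*\?(.*)$};
my %form = parse_query($query);

for my $engine (sort keys %field_for) {
    next unless $host =~ /\b\Q$engine\E\b/;
    print "$host: $form{ $field_for{$engine} }\n";
}
```

The same referer carries other fields too (here `pg`), which is why the per-engine table of interesting field names matters: without it there is no way to know which value is the search string.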
  74. Slide 74: The code
      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      use URI::URL;

      my %count = ();
      while (<>) {
        my ($ref) = split; ## may require adjustment
        my $url = url $ref;
        next unless ($url->scheme || "") eq "http";
        next unless my %form = eval { $url->query_form };
        my @search_fields = do {
          local $_ = lc $url->host;
          if (0) { () }
  75. Slide 75:
          elsif (/\baltavista\b/) { "q" }
          elsif (/\bnetfind\.aol\.com$/) { qw(s search) }
          elsif (/\baskjeeves\.com$/) { "ask" }
          elsif (/\bdejanews\.com$/) { () }
          elsif (/\bdigiweb\.com$/) { "string" }
          elsif (/\bdogpile\.com$/) { "q" }
          elsif (/\bexcite\.com$/) { qw(s search) }
          elsif (/\bhotbot\.com$/) { "mt" }
          elsif (/\binference\.com$/) { "query" }
          elsif (/\binfoseek\.com$/) { qw(oq qt) }
          elsif (/\blooksmart\.com$/) { "key" }
          elsif (/\blycos\b/) { "query" }
          elsif (/\bmckinley\.com$/) { "search" }
          elsif (/\bmetacrawler\b/) { "general" }
          elsif (/\bnlsearch\.com$/) { "qr" }
          elsif (/\bprodigy\.net$/) { "query" }
          elsif (/\bsearch\.com$/) { qw(oldquery query) }
          elsif (/\bsenrigan\.ascii\.co\.jp$/) { "word" }
  76. Slide 76:
          elsif (/\bswitchboard\.com$/) { "sp" }
          elsif (/\bwebcrawler\.com$/) { qw(search searchtext text) }
          elsif (/\bedit\.my\.yahoo\.com$/) { () } ## must come before yahoo.com
          elsif (/\byahoo\b/) { "p" }
          else { "UNKNOWN" }
        };
        next unless @search_fields;
        my %wanted = map { $_, 1 } @search_fields;
        my @show_fields = grep { $wanted{lc $_} } keys %form;
        if (@show_fields) {
          for (@show_fields) {
            $count{$url->host}{$form{$_}}++;
          }
        } else {
          print $url->host, "\n";
          for (sort keys %form) {
            print "?? $_ => $form{$_}\n";
          }
  77. Slide 77:
        }
      }

      for my $host (sort keys %count) {
        my $hostinfo = $count{$host};
        for my $text (sort keys %$hostinfo) {
          my $times = $hostinfo->{$text};
          print "$host: $text";
          print " ($times times)" if $times > 1;
          print "\n";
        }
      }
  78. Slide 78: CASE STUDY: finding out where they went from here
  79. Slide 79: The problem
      • You’ve created a website with a lot of interesting links to other places
      • With the server logs, you can see how people bounce around within your site
      • But you can’t tell where they go, so you can’t tell whether the links you’re providing are useful
  80. Slide 80: The solution
      • Change all outbound links so that they invoke a CGI script
      • The parameter of the real URL gets passed as the PATH_INFO value
        <A HREF="/cgi/go/http://www.stonehenge.com/perltraining/">Learn!</A>
      • The CGI script records the referer (your page) and the destination into a special log file
      • Then it redirects the browser to the actual location
      • Downside: this adds one CGI hit per outbound link
      • Downside: also means you must rewrite your outbound links, but you can do this programmatically
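The redirect decision can be exercised on its own, away from a web server. The following is a minimal sketch, assuming hypothetical helper names `destination` and `response_for`; it takes the PATH_INFO and QUERY_STRING values as ordinary arguments rather than reading `%ENV`, and returns the CGI header block as a string instead of printing it to a browser.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the destination URL from the two CGI
# values, or return undef when there is nothing usable.
sub destination {
    my ($path_info, $query) = @_;
    return undef unless defined $path_info;
    (my $url = $path_info) =~ s{^/}{} or return undef;   # must start with "/"
    $url .= "?$query" if defined $query and length $query;
    return $url;
}

# Hypothetical helper: the CGI header block for a given destination.
sub response_for {
    my ($url) = @_;
    return defined $url ? "Location: $url\n\n" : "Status: 404 Not Found\n\n";
}

print response_for(destination("/http://www.stonehenge.com/perltraining/", ""));
print response_for(destination("/http://www.example.com/search", "q=perl"));
print response_for(destination(undef, undef));
```

The query string has to be glued back on because the web server splits it off before the script sees PATH_INFO. A production script would likely also want to check the destination against the links it actually publishes, so that it cannot be used to bounce visitors to arbitrary sites; that check is not shown here.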
  81. Slide 81: The code (part one: handle the redirections)
      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      my $GO_LOG = "/home/merlyn/Web/golog";

      my $result = eval {
        die unless defined (my $res = $ENV{PATH_INFO});
        die unless $res =~ s/^\///;
        my $query = $ENV{QUERY_STRING};
        if (defined $query and length $query) {
          $res .= "?$query";
        }
        $res;
      };
  82. Slide 82:
      if ($@) {
        print "Status: 404 Not Found\n\n";
        exit 0;
      }

      print "Location: $result\n\n";
      my $pid = fork;
      $pid = 0 unless defined $pid; # be the kid if fork failed
      exit 0 if $pid;
      ## child...
      close(STDOUT);
      open GOLOG, ">>$GO_LOG" or die "Cannot open $GO_LOG: $!";
      flock(GOLOG,2); # wait for exclusive
      seek GOLOG, 0, 2; # seek to end, refresh buffers
      print GOLOG join("\t", scalar localtime, $result, ($ENV{HTTP_REFERER} || "[unknown]")), "\n";
      close GOLOG;
  83. Slide 83: The code (part two: rewriting existing pages)
      #!/home/merlyn/bin/perl -w
      use strict;
      $|++;

      use File::Find;

      unless (@ARGV) {
        find sub {
          push @ARGV, $File::Find::name if /\.html/;
        }, "/home/merlyn/Html/";
      }

      undef $/;
      $^I = "~";
      while (<>) {
        s{(href="(.*?)")}{
  84. Slide 84:
          my ($old,$url,$new) = ($1,$2);
          if ($url =~ /^http:(?!.*cgi\/go)/) {
            $new = qq{href="/cgi/go/$url"};
            print STDOUT "$ARGV: changing $old to $new\n";
          } else {
            $new = $old;
          }
          $new;
        }egi;
        print if defined $^I;
      }
  85. Slide 85: CASE STUDY: extracting a portion of website to a tar file
  86. Slide 86: STONEHEN