Presentation Transcript
- Slide 1: STONEHENGE CONSULTING SERVICES, Inc. 0333 SW Flower St, Portland, Oregon 97201 TM Practical Web Programming a very short course by Randal L. Schwartz Stonehenge Consulting Services Version 1.5.0 (7/30/02)[S] Copyright ©1998-2002 by Randal L. Schwartz, Stonehenge Consulting Services, Inc. Page 1 of 1
- Slide 2: Table of Contents
  • Introduction
    • What this course is about
  • CASE STUDY: returning a long response with “more...” (The problem, The solution, The code)
  • CASE STUDY: making searchable web pages (The problem, The solution, The code)
  • CASE STUDY: when an action takes a long time (The problem, The solution, The code)
  • CASE STUDY: handling multi-page forms (The problem, The solution, The code)
  • CASE STUDY: stateful transactions via a single-threaded mini-webserver (The problem, The solution, The code)
  • CASE STUDY: stateful transactions via a multi-threaded mini-webserver (The problem, The solution, The code)
  • CASE STUDY: an anonymous proxy server (The problem, The solution, The code)
- Slide 3: Table of Contents (continued)
  • CASE STUDY: reorganizing data from other servers (The problem, The solution, The code)
  • CASE STUDY: finding out how they got here (The problem, The solution, The code)
  • CASE STUDY: finding out where they went from here (The problem, The solution, The code part one: handle the redirections, The code part two: rewriting existing pages)
  • CASE STUDY: extracting a portion of website to a tar file (The problem, The solution, The code)
  • CASE STUDY: embedding dynamic graphics in your CGI output (The problem, The solution, The code)
  • CASE STUDY: serializing an expensive CGI script (The problem, The solution, The code)
  • CASE STUDY: basic session management with cookies (The problem, The solution, The code)
  • CASE STUDY: one-click processing (The problem, The solution, The code)
  • Conclusion
    • Questions and answers
- Slide 5: Introduction
- Slide 6: What this course is about
  • Practical web programming: simple web-related techniques illustrated in case-study form
  • Not the only way to do it (remember the Perl motto!)
  • Some examples are updated versions of programs I’ve written for WebTechniques and other magazines
  • For more examples like this, see the column archives at http://www.stonehenge.com/merlyn/columns.html
  • Each is presented in “problem...solution” form
  • The CPAN is your friend: nearly all examples here use CGI.pm and the LWP library, definitely “must haves” for any serious web programming
  • Most of these programs will run as coded, but they’re really meant more to be representative. Steal the technique, not the code!
- Slide 7: CASE STUDY: returning a long response with “more...”
- Slide 8: The problem
  • You’ve got a query page, and the result could return a lot of hits
  • If you returned them all as one page, it might take uncomfortably long to download
  • This will probably lead to a user-initiated abort on a slow link (at least, it does for me)
- Slide 9: The solution
  • Do the query, then return the hits in a series of pages
  • On the first page, show the first 25 (or whatever) hits
  • Include a link to later hits
  • Later pages can also include a link to earlier hits
  • The links actually call back to the same script with a session number and starting point
  • The session number is a pointer to data in /tmp or whatever (should be cleaned up later)
  • For our example, we’ll extract entries from /usr/dict/words that match a pattern
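The paging arithmetic behind the "next" and "previous" links can be sketched on its own. This is a minimal illustration, not the course's code: `page_window` is a hypothetical helper name, and it assumes the start offset arrives as an untrusted string that should fall back to 0 when it is missing or not a plain non-negative integer.

```perl
use strict;
use warnings;

# Hypothetical helper: clamp a requested start offset and return the
# zero-based (low, high) indexes of the page to show.
sub page_window {
    my ($start, $total, $per_page) = @_;
    return () if $total <= 0;                       # nothing to show
    my $low = (defined($start) && $start =~ /^\d+\z/) ? $start : 0;
    $low = $total - 1 if $low > $total - 1;         # past the end: last hit
    my $high = $low + $per_page - 1;
    $high = $total - 1 if $high > $total - 1;       # short final page
    return ($low, $high);
}

my ($low, $high) = page_window(25, 60, 25);
print "Hits ", $low + 1, "..", $high + 1, " of 60\n";   # Hits 26..50 of 60
```

The "next" link would then carry `$low + $per_page` as its start value and the "previous" link `$low - $per_page`; a negative or otherwise odd value simply falls back to the first page.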
- Slide 10: The code

      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      use CGI::Carp qw(fatalsToBrowser);

      my $MAXHITS = 25;          # constant: number of hits per page
      my $TMP = "/tmp/more.";    # constant: location of session files

      use HTML::Entities qw(encode_entities);
      use CGI ":all";

      my $session;               # global: session-ID
      my $search;                # global: search string
      my @found;                 # global: array of valid hits
- Slide 11: The code (continued)

      print header, start_html(
        'So you want more???', 'merlyn@stonehenge.com'
      );
      print h1("Query the dictionary");

      if ($session = param('session')) {
        ## we are in the midst of a session
        &load_session();         # sets $search, @found
        &display(param('start'));
      } elsif ($search = param('search')) {
        ## we are beginning the query
        ## perform the query, and set up for session if necessary
        open WORDS, "/usr/dict/words" or
          die "Cannot open /usr/dict/words: $!";
        chomp(@found = grep /$search/o, <WORDS>);
        close WORDS;
        $session = unpack("H*", pack("Nn", time, $$)); # 12 hex chars
- Slide 12: The code (continued)

        &save_session();
        &display(0);
      } else {
        ## we are being invoked initially
        ## print the basic search form
        print hr, startform, p, "Search for:", textfield('search'),
          submit('Search'), endform, hr;
      }
      print end_html;
      exit 0;

      sub load_session {
- Slide 13: The code (continued)

        open TMP, "<$TMP$session" or
          die "missing session file $TMP$session: $!";
        chop(($search, @found) = <TMP>);
        close TMP;
      }

      sub save_session {
        open TMP, ">$TMP$session" or
          die "Cannot create $TMP$session: $!";
        print TMP map "$_\n", $search, @found;
        close TMP;
      }

      sub display {
        my $start = shift;       # where to start (undef/0 if beginning)

        print "You are searching for: ", encode_entities($search), "\n";
        my $low = $start;
- Slide 14: The code (continued)

        ## sanity checking... won't happen unless user fakes us out
        $low = 0 if ($low || 0) <= 0;
        $low = $#found if $low > $#found;
        my $high = $low + $MAXHITS - 1;
        $high = $#found if $high > $#found;
        print
          br, "Hits ", $low + 1, "..", $high + 1, " (of ".@found.") hits:\n",
          pre(join "\n", map { encode_entities($_) } @found[$low..$high]),
          hr;
        if ($high < $#found) {
          print br,
            a({Href => join "", script_name(),
               "?session=$session&start=", $low + $MAXHITS},
              "See next $MAXHITS hits...");
- Slide 15: The code (continued)

        }
        if ($low > 0) {
          print br,
            a({Href => join "", script_name(),
               "?session=$session&start=", $low - $MAXHITS},
              "See previous $MAXHITS hits...");
        }
      }
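The listing interpolates the `session` parameter directly into a file name under /tmp. One cautious variation, offered here as an assumption and not as part of the original program, is to accept only the exact shape the script itself generates (12 lowercase hex digits) before touching the file system. The names `new_session_id` and `checked_session_id` are hypothetical.

```perl
use strict;
use warnings;

# Same recipe as the listing: 4 bytes of time plus 2 bytes of PID,
# rendered as 12 hex characters.
sub new_session_id {
    return unpack("H*", pack("Nn", time, $$));
}

# Hypothetical check: return the captured ID if it has the expected
# shape, otherwise undef. The capture is also what untaints it under -T.
sub checked_session_id {
    my ($id) = @_;
    return undef unless defined $id;
    return $id =~ /\A([0-9a-f]{12})\z/ ? $1 : undef;
}

my $id = new_session_id();
print "$id is ", (defined checked_session_id($id) ? "ok" : "bad"), "\n";
print "../../etc/passwd is ",
    (defined checked_session_id("../../etc/passwd") ? "ok" : "bad"), "\n";
```

With a check like this, a request carrying an unexpected session value could be treated the same as a request with no session at all, which simply shows the search form again.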
- Slide 16: CASE STUDY: making searchable web pages
- Slide 17: The problem
  • You’ve got a lot of information on a series of similar web pages
  • You can provide interesting table of contents pages
  • But people sometimes want to search for words of interest
- Slide 18: The solution
  • Provide a search web form
  • Searching can be done natively in Perl, or with more advanced tools (like Glimpse at http://glimpse.cs.arizona.edu/)
  • The example here is a native Perl solution searching my online WebTechniques column archive
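The core of a native Perl search can be sketched without any web plumbing: turn the user's text into a pattern (quoting it unless regular expressions were requested), then mark a limited number of hits per line. Both helper names are hypothetical, and square brackets stand in for the bold tags a web page would use.

```perl
use strict;
use warnings;

# Hypothetical helper: build a compiled pattern from user input.
# Plain strings are quoted so that "." and "+" match literally.
sub build_pattern {
    my ($text, $is_regex, $ignore_case) = @_;
    $text = quotemeta $text unless $is_regex;
    return $ignore_case ? qr/$text/i : qr/$text/;
}

# Wrap up to $max matches in [brackets]; later matches are left alone.
sub mark_hits {
    my ($line, $pattern, $max) = @_;
    my $count = 0;
    $line =~ s/($pattern)/ ++$count <= $max ? "[$1]" : $1 /ge;
    return $line;
}

my $pat = build_pattern("a.b", 0, 1);
print mark_hits("xA.By a.b aab", $pat, 5), "\n";   # x[A.B]y [a.b] aab
```

An invalid user-supplied regular expression makes `qr//` die, so a real form handler would probably want to wrap `build_pattern` in `eval` and report the error to the user.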
- Slide 19: The code

      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      use CGI::Carp qw(fatalsToBrowser);
      use HTML::Entities ();
      BEGIN {
        *ent = \&HTML::Entities::encode_entities;
      }
      use CGI ":all";

      my $DIR = "/home/merlyn/Html/merlyn/WebTechniques";
      my $URL = "http://www.stonehenge.com/merlyn/WebTechniques";
      my $FILEPAT = "\\.listing\\.txt\$";

      print header,
- Slide 20: The code (continued)

        start_html("-title" => "Search WebTechniques Perl Scripts"),
        h1("Search WebTechniques Perl Scripts"),
        "Search the ",
        a({Href => $URL}, "Perl WebTechniques programs"),
        " by submitting this form:\n",
        hr,
        start_form,
        p, "Search for: ", textfield("-name" => "search"),
        p, checkbox("-name" => "regex", "-label" => "Use Regular Expressions"),
        p, checkbox("-name" => "ignore", "-label" => "Ignore case"),
        p, submit,
        end_form,
        hr;

      my $searchstring = param("search"); # the search item
      if (defined $searchstring and length $searchstring) {
        chdir $DIR or die "Cannot chdir $DIR: $!";
        opendir DIR, "." or die "Cannot opendir $DIR: $!";
        @ARGV = grep /$FILEPAT/o, readdir DIR; # get matching filenames for <>
        closedir DIR;
- Slide 21: The code (continued)

        unless (param("regex")) {                   # if ordinary string...
          $searchstring = quotemeta $searchstring;  # make ordinary.
        }
        my $ignore = param("ignore") ? "(?i)" : ""; # make case insensitive
        print p, "Follow the link to get the full listing:\n", "<PRE>\n";
        my $per_file = 0;        # how many hits this file?
        while (<>) {
          if (eof) {
            close ARGV;          # resets $.
            $per_file = 0;
          }
          chomp;
          my $per_line = 0;      # how many hits this line?
          while (s/$ignore$searchstring//o) {
            print a({Href => "$URL/$ARGV"}, ent($ARGV)), ":$.: "
- Slide 22: The code (continued)

              unless $per_line++;     # first time, print prefix
            print ent($`), b(ent $&);
            $_ = $';
            last if $per_line >= 5;   # only five hits max per line
          }
          if ($per_line) {            # at least one hit?
            print ent($_), "\n";      # finish line off
            if (++$per_file >= 5) {   # only five lines max per file
              print "[skipping to next file]\n";
              close ARGV;             # force EOF
              $per_file = 0;
            }
          }
        }
        print "</PRE>\n";
      }
      print end_html;
- Slide 23: CASE STUDY: when an action takes a long time
- Slide 24: The problem
  • Sometimes, a query or some other action can take a long time
  • People don’t want to wait
  • They want an immediate response, but you’re not ready to give one immediately
- Slide 25: The solution
  • Provide an “action in progress” page immediately, then continue the search in the background
  • Make the page do “auto-reload” via client-pull
  • Also include text that says “reload this”, so non-client-pull browsers will also work
  • When the action completes, replace the “in progress” page with the results
  • Might want to combine this with the “more” solution earlier, if there are many hits
  • You could even provide a partial solution this way: hits 1-25 available now, but others are showing up later
  • Sample solution here does a “traceroute”
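The "respond now, finish in the background" pattern can be sketched with core modules only. This is an illustration under stated assumptions: a plain file per session stands in for whatever cache a real script would use, the slow work is faked with a list of strings, and `put_status`, `get_status`, and `start_job` are hypothetical names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX ();

my $dir = tempdir(CLEANUP => 1);

# Hypothetical store: one file per session, first line is the done flag,
# the rest is the output so far. Write-then-rename keeps readers from
# ever seeing a half-written file.
sub put_status {
    my ($session, $done, $text) = @_;
    my $tmp  = "$dir/$session.tmp";
    my $flag = $done ? 1 : 0;
    open my $fh, ">", $tmp or die "Cannot write $tmp: $!";
    print {$fh} "$flag\n$text";
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, "$dir/$session" or die "Cannot rename $tmp: $!";
}

sub get_status {
    my ($session) = @_;
    open my $fh, "<", "$dir/$session" or return;
    my $done = <$fh>;
    chomp $done;
    local $/;                           # slurp the remainder
    my $text = <$fh>;
    return ($done, defined $text ? $text : "");
}

sub start_job {
    my ($session, @steps) = @_;
    put_status($session, 0, "");        # no data yet
    defined(my $pid = fork) or die "Cannot fork: $!";
    return $pid if $pid;                # parent: free to answer the browser
    my $buf = "";
    for my $step (@steps) {             # child: pretend each step is slow
        $buf .= "$step\n";
        put_status($session, 0, $buf);
    }
    put_status($session, 1, $buf);      # mark the job finished
    POSIX::_exit(0);                    # skip the parent's END blocks
}

my $pid = start_job("demo", "hop 1", "hop 2", "hop 3");
waitpid $pid, 0;                        # only for the demo; a CGI parent would not wait
my ($done, $text) = get_status("demo");
print "done=$done\n$text";
```

Each reload of the "in progress" page would call `get_status`, show whatever text has accumulated, and keep asking the browser to refresh until the done flag is set.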
- Slide 26: The code

      #!/usr/bin/perl -T
      use strict;
      $|++;

      $ENV{PATH} = "/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin";

      use CGI qw(:all delete_all escapeHTML);

      if (my $session = param('session')) { # returning to pick up session data
        my $cache = get_cache_handle();
        my $data = $cache->get($session);
        unless ($data and ref $data eq "ARRAY") { # something is wrong
          show_form();
          exit 0;
        }
- Slide 27: The code (continued)

        print header;
        print start_html(-title => "Traceroute Results",
                         ($data->[0] ? () :
                          (-head => ["<meta http-equiv=refresh content=5>"])));
        print h1("Traceroute Results");
        print pre(escapeHTML($data->[1]));
        print p(i("... continuing ...")) unless $data->[0];
        print end_html;
      } elsif (my $host = param('host')) { # returning to select host
        if ($host =~ /^([a-zA-Z0-9.\-]{1,100})\z/) { # create a session
          $host = $1;            # untainted now
          my $session = get_session_id();
          my $cache = get_cache_handle();
          $cache->set($session, [0, ""]); # no data yet
          if (my $pid = fork) {  # parent does
            delete_all();        # clear parameters
            param('session', $session);
- Slide 28: The code (continued)

            print redirect(self_url());
          } elsif (defined $pid) { # child does
            close STDOUT;        # so parent can go on
            unless (open F, "-|") {
              open STDERR, ">&=1";
              exec "/usr/sbin/traceroute", $host;
              die "Cannot execute traceroute: $!";
            }
            my $buf = "";
            while (<F>) {
              $buf .= $_;
              $cache->set($session, [0, $buf]);
            }
            $cache->set($session, [1, $buf]);
            exit 0;
          } else {
            die "Cannot fork: $!";
          }
- Slide 29: The code (continued)

        } else {
          show_form();
        }
      } else {                   # display form
        show_form();
      }

      exit 0;

      sub show_form {
        print header, start_html("Traceroute"), h1("Traceroute");
        print start_form;
        print submit('traceroute to this host:'), " ", textfield('host');
        print end_form, end_html;
      }
- Slide 30: The code (continued)

      sub get_cache_handle {
        require Cache::FileCache;

        Cache::FileCache->new
            ({
              namespace => 'tracerouter',
              username => 'nobody',
              default_expires_in => '30 minutes',
              auto_purge_interval => '4 hours',
             });
      }

      sub get_session_id {
        require Digest::MD5;

        Digest::MD5::md5_hex(Digest::MD5::md5_hex(time().{}.rand().$$));
      }
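The host check in the listing does two jobs at once: it restricts the input to hostname-like characters, and the capture untaints the value so that `exec` will accept it under `-T`. The sketch below isolates that step in a hypothetical `clean_host` helper. The extra rule rejecting a leading hyphen is an assumption added here (so the value cannot be mistaken for a command-line option) and is not in the original.

```perl
use strict;
use warnings;

# Hypothetical helper: return a vetted host name, or undef.
sub clean_host {
    my ($host) = @_;
    return undef unless defined $host;
    return undef unless $host =~ /^([a-zA-Z0-9.\-]{1,100})\z/;
    my $clean = $1;                     # captured text is untainted
    return undef if $clean =~ /^-/;     # assumption: refuse option-like input
    return $clean;
}

for my $try ("www.stonehenge.com", "example.com; rm -rf /", "-h") {
    my $host = clean_host($try);
    print defined $host ? "accept: $host\n" : "reject: $try\n";
}
```

Because the listing calls `exec` with a list rather than a single string, no shell is involved, so the character check is a second line of defense and not the only one.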
- Slide 31: CASE STUDY: handling multi-page forms
- Slide 32: The problem
  • You’ve got a form that’s too long to comfortably fit on one page
  • People make mistakes while entering forms, so they like to back up if there are multiple pages
  • You suspect there’s a way to make a data-driven solution rather than ad-hoc code for each page
- Slide 33: The solution
  • Use a Perl data structure to define the pages
  • Have a common form element (like textfield) that can be overridden for odd forms
  • Generate hidden fields as you go forward to remember previous entries
  • Include a “back up” button that generates the hidden fields and fills in previous values
  • Sticky values are easy with CGI.pm, as long as we know what the values need to be
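The hidden-field bookkeeping can be sketched separately from the HTML generation. Given a page table and the parameters received so far, the question is which field names belong to other pages and already have values. The three-page table and the `hidden_fields_for` helper are illustrative assumptions, and a plain hash stands in for CGI.pm's `param`.

```perl
use strict;
use warnings;

# Illustrative page table: a title followed by either a code reference
# (free-form page) or [fieldname => label] pairs.
my @PAGES = (
    ["Intro",      sub { "welcome" }],
    ["About you",  [Name => "Name"], [City => "City"]],
    ["Your visit", [Movie => "Movie you saw"]],
);

# Hypothetical helper: names of fields that live on pages other than
# $page and already have a value, i.e. the ones to carry as hidden fields.
sub hidden_fields_for {
    my ($page, $params) = @_;
    my @names;
    for my $other (0 .. $#PAGES) {
        next if $other == $page;                 # current page shows its own fields
        my ($title, @info) = @{ $PAGES[$other] };
        next if ref($info[0]) eq "CODE";         # code pages have no fields
        push @names, grep { defined $params->{$_} } map { $_->[0] } @info;
    }
    return @names;
}

my %params = (Name => "Fred", Movie => "Jaws");
print join(",", hidden_fields_for(1, \%params)), "\n";   # Movie
print join(",", hidden_fields_for(2, \%params)), "\n";   # Name
```

Fields on the current page are left out because the visible form elements already carry them, which is what lets sticky values fill themselves in when the user backs up.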
- Slide 34: The code

      #!/usr/local/bin/perl -Tw
      use strict;
      $| = 1;

      use CGI ":all";

      ### configuration
      ## all-capital names are used in code, so don't change them

      ## type of default fields unless overridden with $_[0] is fieldname
      sub DEFAULT_FIELD { textfield(shift, "", 60) }
- Slide 35: The code (continued)

      my @PAGES =
        (
         ## first and second pages have no back buttons
         ["Introduction", \&intro_page],
         ["Tell us about yourself",
          [Name => "Name"],
          [Address => "Address"],
          [City => "City"],
          [State => "State"],
          [Zipcode => "Zip Code"],
         ],
         ["Tell us about your experience",
          [Movie => "Movie you saw"],
          [Review => "What you thought of it", sub {
             radio_group(shift, ['No opinion',
                                 qw(Excellent Good Fair Poor Bombed)]);
           }],
         ],
- Slide 36: The code (continued)

         ["Tell us anything else",
          [Theatre => "Theatre conditions"],
          [Snackbar => "Snack bar"],
          [General => "Any other comments?", sub { textarea(shift) }],
         ],
         ["Thank you!", \&thank_you_page],
         ["Goodbye!", \&submit_page],
         ## last page has no back or forward buttons
        );

      ## Internal use: these must not collide with fieldnames above
      my $PREVIOUS = "Previous Page";
      my $NEXT = "Next Page";
      my $PAGE = "__page__";
- Slide 37: The code (continued)

      sub intro_page {
        print p(["Thank you for visiting ScratchySound Theatres!",
                 "Please take a moment to tell us about your experience.",
                ]);
      }

      sub thank_you_page {
        print p(["Thank you for taking the time to fill out our survey!",
                 "Go back now to change any entries you need, or forward to send to us!",
                ]);
      }
- Slide 38: The code (continued)

      sub submit_page {
        ## code to save params would go here
        ## be sure to ignore param($PAGE), param($NEXT), param($PREVIOUS)
        print p(["Your entry has been submitted to us!",
                 "Thank you!"]);
        print $CGI::Q->dump;     # debugging
      }

      ### end

      my $previous = param($PREVIOUS);
      my $next = param($NEXT);
      my $page = param($PAGE);
- Slide 39: The code (continued)

      if (defined $page and $page =~ /^\d+$/
          and $page >= 0 and $page <= $#PAGES) {
        $page += defined($previous) ? -1 : +1; # default to forward
      } else {
        $page = 0;
      }
      param($PAGE, $page);       # set it for hidden

      my @info = @{$PAGES[$page]};
      my $title = shift @info;

      print header, start_html($title), h1($title), start_form;

      if (ref $info[0] eq "CODE") {
        $info[0]->();
      } else {
        print table({ Border => 1, Cellpadding => 5 },
                    map { Tr(
                             th($_->[1]),
                             td(($_->[2] || \&DEFAULT_FIELD)->($_->[0]))
                            )} @info
                   );
      }
- Slide 40: The code (continued)

      ## no backing up from first or second or last page:
      print submit($PREVIOUS) if $page > 1 and $page < $#PAGES;
      ## no going forward from last page:
      print submit($NEXT) if $page < $#PAGES;

      ## generate hidden fields
      print hidden($PAGE);
      for my $other (0..$#PAGES) {
        next if $other == $page; # don't dump hiddens for current
        my @info = @{$PAGES[$other]};
        shift @info;             # toss title
        next if ref($info[0]) eq "CODE"; # code page
        for (map {$_->[0]} @info) {
          print hidden($_) if defined param($_);
        }
      }
      print end_form;
      print end_html;
- Slide 41: CASE STUDY: stateful transactions via a single-threaded mini-webserver
- Slide 42: The problem
  • HTTP is essentially stateless: there’s no necessary correlation between one hit and the next
  • You can pass session identifiers via cookies or extended URLs or hidden fields
  • However, the session data is sometimes too large to pass through these mechanisms
  • So pass a session ID instead, and let that index the real data being held on the server
  • If the data is difficult to save and reload into a CGI script, we’re in trouble though
  • For example, if the data is an object tree, marshalling the data between various CGI invocations can get difficult or expensive
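For comparison, this is roughly what "session ID indexes data held on the server" looks like when the data is simple enough to marshal. The sketch uses the core Storable module and a temporary directory; `save_state` and `load_state` are hypothetical names. The cost the slide warns about is that every hit repeats the full save and reload, and some structures (open handles, code references) do not serialize this way without extra work.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helpers: one Storable file per session ID.
sub save_state {
    my ($session, $data) = @_;
    nstore($data, "$dir/$session") or die "Cannot store session $session";
}

sub load_state {
    my ($session) = @_;
    my $file = "$dir/$session";
    return -e $file ? retrieve($file) : undef;
}

# A small nested structure standing in for a conversation's state.
save_state("abc123", { notes => ["first visit"], mood => { tired => 1 } });
my $state = load_state("abc123");
print "restored note: $state->{notes}[0]\n";   # restored note: first visit
```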
- Slide 43: The solution
  • Use a mini-web-server created with HTTP::Daemon
  • Fire off a separate server for each stateful conversation
  • Have the initial CGI script redirect the client to this server
  • The server stays up as long as queries come in “often enough”
  • You’ll need to figure out a reasonable timeout value
  • Too short, and we lose track of a conversation
  • Too long, and we waste a lot of processes
  • This example uses the Eliza module from the CPAN, simulating a psychotherapist
  • The data in this case is “office visit notes”: the doctor replays previous comments when nothing of interest is spoken in the current statement
  • Use the CPAN, lots of good stuff in there
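The "stays up as long as queries come in often enough" rule is a deadman timer: re-arm `alarm` before each wait, and let the signal end the server if nothing arrives in time. The sketch below shows only that loop, with code references standing in for the network accept and the request handler. It is a variation made for illustration: it installs a handler that turns the alarm into an exception so the outcome can be reported, and `serve_with_timeout` is a hypothetical name.

```perl
use strict;
use warnings;

# $next_request is any code reference that blocks until work arrives
# and returns undef to stop; $handle processes one request.
sub serve_with_timeout {
    my ($timeout, $next_request, $handle) = @_;
    my $served = 0;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "idle timeout\n" };
        while (1) {
            alarm $timeout;                 # (re-)set the deadman timer
            my $req = $next_request->();
            last unless defined $req;
            $handle->($req);
            $served++;
        }
        alarm 0;                            # disarm before leaving the eval
        1;
    };
    my $err = $@;
    alarm 0;
    return ($served, "closed") if $ok;
    die $err unless $err eq "idle timeout\n";   # unrelated errors propagate
    return ($served, "timed out");
}

my @queue = ("hello", "I feel tired");
my ($count, $why) = serve_with_timeout(5, sub { shift @queue }, sub { });
print "$count requests, $why\n";    # 2 requests, closed
```

Without a handler installed, an expiring alarm simply terminates the process, which is all a throwaway per-conversation server needs.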
- Slide 44: The code

      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      use CGI::Carp qw(fatalsToBrowser);
      use CGI ":all";
      use HTTP::Daemon;
      use HTTP::Status;
      use Chatbot::Eliza;

      my $HOST = "www.stonehenge.com"; # where are we?
      my $TIMEOUT = 120;         # number of seconds until this doc dies

      my $d = new HTTP::Daemon (LocalAddr => $HOST);
      my $unique = join ".", time, $$, int(rand 1000);
- Slide 45: The code (continued)

      my $url = $d->url.$unique;

      defined(my $pid = fork) or die "Cannot fork: $!";
      if ($pid) {                # I am, apparently, the parent
        print redirect($url);
        exit 0;
      }
      close(STDOUT);             # to let the kid live on

      my $eliza = new Chatbot::Eliza;

      {
        alarm($TIMEOUT);         # (re-)set the deadman timer
        my $c = $d->accept;      # $c is a connection
        my $r = $c->get_request; # $r is a request
        if ($r->url->epath ne "/$unique") {
          $c->send_error(RC_FORBIDDEN,
                         "I don't think we've made an appointment!");
          close $c;
- Slide 46: The code (continued)

          redo;
        }
        $c->send_basic_header;
        $CGI::Q = new CGI $r->content;

        my $eliza_says = "How do you do? Please tell me your problem.";
        my $message = param("message") || "";
        if ($message) {
          param("message", "");
          $eliza_says = $eliza->transform($message);
        }

        print $c
          header,
          start_html("The doctor is in!"),
          h1("The doctor is in!"),
          hr,
          startform("POST", $url),
          p($eliza_says),
          p, textfield(-name => "message", -size => 60),
- Slide 47: The code (continued)

          p, submit("What do you say, doc?"),
          p("Note: the doctor is patient, but waits only $TIMEOUT seconds, so hurry!"),
          endform,
          hr,
          end_html;
        close $c;
        redo;
      }
- Slide 48: CASE STUDY: stateful transactions via a multi-threaded mini-webserver
- Slide 49: The problem
  • Like before, we want a stateful conversation in the stateless HTTP world
  • But firing off a separate webserver for each conversation might be too expensive
  • Or perhaps the threads should interact somehow (like for a chat server)
- Slide 50: The solution
  • Use a single, multi-threaded mini-web-server
  • Again, created with HTTP::Daemon
  • On the first CGI invocation, fire off a daemon, and redirect to it as before
  • The unique session ID serves as a key to the mini web-server’s per-thread data
  • Once again, we’ll show this with Eliza
  • Downside: each response must be fast, because we’ve got only one thread active at a time
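Keeping many conversations in one process comes down to two hashes keyed on the session ID: one for the per-conversation state and one for the time of last activity, plus a sweep that discards idle entries. A minimal sketch follows; the names `touch_session` and `expire_sessions` and the shape of the state are assumptions, and the clock is passed in explicitly so the behavior is easy to follow.

```perl
use strict;
use warnings;

my %state;   # per-conversation data, keyed on session ID
my %when;    # most recent activity time, keyed on session ID

# Fetch (creating if needed) the state for a session and note the activity.
sub touch_session {
    my ($session, $now) = @_;
    $state{$session} ||= { notes => [] };
    $when{$session} = $now;
    return $state{$session};
}

# Drop every session idle for $timeout seconds or more;
# return how many sessions remain.
sub expire_sessions {
    my ($now, $timeout) = @_;
    for my $session (keys %when) {
        next if $when{$session} > $now - $timeout;
        delete $state{$session};
        delete $when{$session};
    }
    return scalar keys %when;
}

push @{ touch_session("1.2.3", 100)->{notes} }, "patient seems tired";
touch_session("4.5.6", 350);
print expire_sessions(400, 300), " session(s) still active\n";   # 1 session(s) still active
```

Running the sweep after each response, as a single-process server can do, keeps memory bounded without any separate cleanup job.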
- Slide 51: The code

      #!/home/merlyn/bin/perl -Tw
      use strict;
      $|++;

      use CGI::Carp qw(fatalsToBrowser);
      use CGI ":all";
      use HTTP::Daemon;
      use HTTP::Status;
      use Chatbot::Eliza;

      my $HOST = "www.stonehenge.com"; # where are we?
      my $PORT = 42001;          # at what port
      my $TIMEOUT = 300;         # number of seconds until this doc dies

      my $d = do {
- Slide 52: The code (continued)

        local($^W) = 0;
        new HTTP::Daemon (LocalAddr => $HOST, LocalPort => $PORT)
      };

      my $unique = join ".", time, $$, int(rand 1000);
      my $url_prefix = "http://$HOST:$PORT";
      my $url = "$url_prefix/$unique";

      print redirect($url);
      exit 0 unless defined $d;  # do we need to become the server?
      defined(my $pid = fork) or die "Cannot fork: $!";
      exit 0 if $pid;            # I am the parent
      close(STDOUT);

      my %eliza;                 # Chatbot::Eliza objects, keyed on session
      my %when;                  # most recent activity time, keyed on session

      {
- Slide 53: The code (continued)

        alarm($TIMEOUT);         # (re-)set the deadman timer
        my $c = $d->accept;      # $c is a connection
        my $r = $c->get_request; # $r is a request
        (my $session = $r->url->epath) =~ s{^/}{};
        unless ($session =~ /^\d+\.\d+\.\d+$/) {
          $c->send_error(RC_FORBIDDEN,
                         "I don't think we've made an appointment!");
          close $c;
          redo;
        }
        $c->send_basic_header;
        $CGI::Q = new CGI $r->content;

        my $eliza_says = "How do you do? Please tell me your problem.";
        my $message = param("message") || "";
        if ($message) {
          param("message", "");
          $eliza{$session} ||= new Chatbot::Eliza;
- Slide 54: The code (continued)

          $eliza_says = $eliza{$session}->transform($message);
          $when{$session} = time;
        }

        print $c
          header,
          start_html("The doctor is in!"),
          h1("The doctor is in!"),
          hr,
          startform("POST", "$url_prefix/$session"),
          $eliza_says && p($eliza_says),
          p, textfield(-name => "message", -size => 60),
          p, submit("What do you say, doc?"),
          p("Note: the doctor is patient, but waits only $TIMEOUT seconds, so hurry!"),
          endform,
          hr,
          end_html;
        close $c;
- Slide 55:
      for (keys %when) {
        next if $when{$_} > time - $TIMEOUT;
        delete $eliza{$_};
        delete $when{$_};
      }
      redo;
    }
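The expiry sweep at the end of the server loop can be tried on its own. The following is a minimal sketch, not the course's code: `expire_sessions` is a hypothetical helper name, the session ids and timestamps are made up, and the current time is passed in as an argument so the behaviour is repeatable.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every session whose last activity is at least
# $timeout seconds before $now. %$when maps session id => last activity time;
# %$state holds the per-session data (Chatbot::Eliza objects in the slides).
sub expire_sessions {
    my ($when, $state, $timeout, $now) = @_;
    my @expired = grep { $when->{$_} <= $now - $timeout } keys %$when;
    delete @{$when}{@expired};
    delete @{$state}{@expired};
    return sort @expired;
}

# Made-up sample data: one stale session and one fresh session.
my %when  = ("1.2.3" => 100, "4.5.6" => 450);
my %state = ("1.2.3" => "old chat", "4.5.6" => "fresh chat");
my @gone  = expire_sessions(\%when, \%state, 300, 500);
print "expired: @gone\n";
```

Passing the clock in, rather than calling `time` inside the helper, is only a convenience for testing; the slide code reads `time` directly.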
- Slide 56: CASE STUDY: an anonymous proxy server
- Slide 57: The problem
    • You hate cookies, or having a transaction tracked to a particular IP address
    • Or you wanna process incoming pages to strip ads, or ugly specific-browser formatting (www.cnn.com, ugh!)
    • Or, you just wanna say “I can write a full-functioning anonymous proxy web server in 90 lines of Perl”
- Slide 58: The solution
    • Create a mini-web-server with HTTP::Daemon
    • Point your browser at it as a proxy server
    • Have each request turn into an LWP request to be fetched
    • Alter the request and response headers and content to your heart’s content
    • If you’re brave, even implement pre-forking (à la Apache) and caching (not illustrated here)
- Slide 59: The code
    #!/home/merlyn/bin/perl -Tw
    use strict;
    $|++;
    my $HOST = "www.stonehenge.com";
    my $PORT = "4242";
    sub prefix {
      my $now = localtime;
      join "", map { "[$now] [${$}] $_\n" } split /\n/, join "", @_;
    }
    $SIG{__WARN__} = sub { warn prefix @_ };
    $SIG{__DIE__} = sub { die prefix @_ };
- Slide 60:
    $SIG{CLD} = $SIG{CHLD} = sub { wait; };
    my $AGENT; # global user agent (for efficiency)
    use LWP::UserAgent;
    $AGENT = LWP::UserAgent->new;
    $AGENT->agent("anon/0.07");
    $AGENT->env_proxy;
    { ### MAIN ###
      use HTTP::Daemon;
      my $master = new HTTP::Daemon LocalAddr => $HOST, LocalPort => $PORT;
      warn "set your proxy to <URL:", $master->url, ">";
      my $slave;
      &handle_connection($slave) while $slave = $master->accept;
      exit 0;
    } ### END MAIN ###
- Slide 61:
    sub handle_connection {
      my $connection = shift; # HTTP::Daemon::ClientConn
      my $pid = fork;
      if ($pid) { # spawn OK, and I'm the parent
        close $connection;
        return;
      }
      ## spawn failed, or I'm a good child
      my $request = $connection->get_request;
      if (defined($request)) {
        my $response = &fetch_request($request);
        $connection->send_response($response);
        close $connection;
      }
      exit 0 if defined $pid; # exit if I'm a good child with a good parent
    }
- Slide 62:
    sub fetch_request {
      my $request = shift; # HTTP::Request
      use HTTP::Response;
      my $url = $request->url;
      warn "fetching $url";
      if ($url->scheme !~ /^(http|gopher|ftp)$/) {
        my $res = HTTP::Response->new(403, "Forbidden");
        $res->content("bad scheme: @{[$url->scheme]}\n");
        $res;
      } elsif (not $url->rel->netloc) {
        my $res = HTTP::Response->new(403, "Forbidden");
        $res->content("relative URL not permitted\n");
        $res;
      } else {
        &fetch_validated_request($request);
- Slide 63:
      }
    }
    sub fetch_validated_request { # return HTTP::Response
      my $request = shift; # HTTP::Request
      ## uses global $AGENT
      ## warn "orig request: <<<", $request->headers_as_string, ">>>";
      $request->remove_header(qw(User-Agent From Referer Cookie));
      ## warn "anon request: <<<", $request->headers_as_string, ">>>";
      my $response = $AGENT->simple_request($request);
      ## warn "orig response: <<<", $response->headers_as_string, ">>>";
      $response->remove_header(qw(Set-Cookie));
      ## warn "anon response: <<<", $response->headers_as_string, ">>>";
      $response;
    }
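The anonymizing step itself is just header filtering. As a rough sketch that runs without LWP installed, the same idea can be shown on a plain hash of headers; `strip_headers` is a hypothetical helper, the sample request is invented, and the real proxy above works on HTTP::Request and HTTP::Response objects instead.

```perl
use strict;
use warnings;

# Header names to drop, mirroring the lists passed to remove_header above.
my @REQUEST_BLOCKLIST  = qw(User-Agent From Referer Cookie);
my @RESPONSE_BLOCKLIST = qw(Set-Cookie);

# Hypothetical helper: return a copy of %$headers without the named fields.
# HTTP header names are case-insensitive, so compare them in lower case.
sub strip_headers {
    my ($headers, @names) = @_;
    my %drop = map { (lc($_) => 1) } @names;
    my %kept;
    for my $name (keys %$headers) {
        $kept{$name} = $headers->{$name} unless $drop{lc $name};
    }
    return \%kept;
}

# Invented sample request headers.
my %request = (
    "Accept"     => "text/html",
    "user-agent" => "SomeBrowser/1.0",
    "Cookie"     => "id=42",
);
my $anon = strip_headers(\%request, @REQUEST_BLOCKLIST);
print join(", ", sort keys %$anon), "\n";
```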
- Slide 64: CASE STUDY: reorganizing data from other servers
- Slide 65: The problem
    • That darn Dilbert[1] archive
    • If you miss a week’s worth of Dilbert, you’ve got to click all over the place
    • It’d sure be neat to have a page where I could just say “give me the last week of Dilberts”, and it shows up as a single page that I could just wait for a short while to load
    [1] Dilbert is clearly a trademark of someone, and copyrighted by them. You knew that, so I’m not repeating it here.
- Slide 66: The solution
    • One of my most frequently used programs!
    • Create a CGI that asks for “how many back dates”
    • With the response, go out to the master archive page, extract all the sub archive pages (with the actual GIF links) and put all those link URLs into one table
    • Respond with that, which means the browser will ultimately suck down all the GIFs
    • Downside: if they change the precise format of the pages, you’ll have to change the script to keep up (this happened already with Dilbert)
    • Other downside: check with a lawyer before running these kinds of programs, because the source server may consider this a repackaging or derivation of copyrighted stuff... bad news!
    • Certainly would be bad to claim the data was your own
    • On the other hand, places like www.metacrawler.com use this technique to consolidate data from many search engines
- Slide 67: The code
    #!/home/merlyn/bin/perl -Tw
    use strict;
    $|++;
    use CGI::Carp qw(fatalsToBrowser);
    use LWP::Simple "get";
    use URI::URL;
    use CGI qw/:form :html param header/;
    use HTML::Entities;
    ## configure
    my $TOP = "http://www.unitedmedia.com/comics/dilbert/archive/";
    my $HTML_RE = '/comics/dilbert/archive/dilbert\d+\.html';
    my $GIF_RE = '/comics/dilbert/archive/images/dilbert\d+\.gif';
    my $KEEP = 99;
- Slide 68:
    ## end configure
    sub td_center { td({ align => "center" }, @_); }
    BEGIN {
      my $notes = "";
      sub add_note { $notes .= join "", @_; }
      sub get_notes { $notes; }
      sub get_ent_notes { encode_entities $notes; }
    }
    print header, start_html("Dilbert"), h1("Recent Dilberts"), "\n";
    my $max = param("max");
    if (defined $max and $max =~ /^\d+$/) {
- Slide 69:
      $max = $KEEP if $max > $KEEP;
      my $top = get $TOP;
      my @gif_urls = ();
      if (not defined $top) {
        add_note "cannot get $TOP";
      } else {
        my @old_urls = map url($_,$TOP)->abs, $top =~ m!($HTML_RE)!og;
        @old_urls = @old_urls[-$max..-1] if @old_urls > $max;
        for my $url (@old_urls) {
          my $content = get $url or (add_note "cannot get $url\n"), next;
          my ($gif) = $content =~ m!($GIF_RE)!o;
          push @gif_urls, url($gif,$url)->abs;
        }
      }
      print table(
- Slide 70:
        (map { TR(td_center($_)) }
         map { table(TR(td_center(encode_entities $_)),
                     TR(td_center(img{-src => $_}))) } @gif_urls),
        p(get_ent_notes())
      );
    } else {
      print hr, start_form,
        p(submit("get this many days of back-images:"),
          popup_menu("max", [1..45], "14")),
        end_form, hr;
    }
    print "\n", end_html;
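The heart of the script is "match every archive link, then keep only the last few". A small sketch of that step follows, using an invented page and an assumed link pattern rather than the real archive; `last_links` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed link pattern and invented markup, for illustration only.
my $HTML_RE = '/archive/strip\d+\.html';
my $page = "";
$page .= qq{<a href="/archive/strip$_.html">day $_</a>\n} for 1 .. 5;

# Hypothetical helper: every match of $re in $html, trimmed to the last $max.
sub last_links {
    my ($html, $re, $max) = @_;
    my @links = $html =~ m!($re)!g;                  # list context: all matches
    @links = @links[-$max .. -1] if @links > $max;   # negative slice keeps the tail
    return @links;
}

my @recent = last_links($page, $HTML_RE, 2);
print "$_\n" for @recent;
```

The negative array slice is the same trick the slide code uses to keep the most recent entries, on the assumption that the archive page lists strips oldest first.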
- Slide 71: CASE STUDY: finding out how they got here
- Slide 72: The problem
    • You have an interesting web site
    • People are coming to your site
    • A lot of hits are likely to be from search engines
    • If you maintain a referer (sic) log, you may have noticed that some of the hits are from big search engines, and it looks like some of the search strings are in the URLs
    • You’re curious about what these search strings are
- Slide 73: The solution
    • Write a program that parses the referer log
    • Look for the URLs depicting the big search engines
    • Parse the “GET” string with URI::URL
    • Pick out the form elements (different for each search engine)
    • Search engines change over time, so this program will require updating and tuning
- Slide 74: The code
    #!/home/merlyn/bin/perl -Tw
    use strict;
    $|++;
    use URI::URL;
    my %count = ();
    while (<>) {
      my ($ref) = split; ## may require adjustment
      my $url = url $ref;
      next unless ($url->scheme || "") eq "http";
      next unless my %form = eval { $url->query_form };
      my @search_fields = do {
        local $_ = lc $url->host;
        if (0) { () }
- Slide 75:
        elsif (/\baltavista\b/) { "q" }
        elsif (/\bnetfind\.aol\.com$/) { qw(s search) }
        elsif (/\baskjeeves\.com$/) { "ask" }
        elsif (/\bdejanews\.com$/) { () }
        elsif (/\bdigiweb\.com$/) { "string" }
        elsif (/\bdogpile\.com$/) { "q" }
        elsif (/\bexcite\.com$/) { qw(s search) }
        elsif (/\bhotbot\.com$/) { "mt" }
        elsif (/\binference\.com$/) { "query" }
        elsif (/\binfoseek\.com$/) { qw(oq qt) }
        elsif (/\blooksmart\.com$/) { "key" }
        elsif (/\blycos\b/) { "query" }
        elsif (/\bmckinley\.com$/) { "search" }
        elsif (/\bmetacrawler\b/) { "general" }
        elsif (/\bnlsearch\.com$/) { "qr" }
        elsif (/\bprodigy\.net$/) { "query" }
        elsif (/\bsearch\.com$/) { qw(oldquery query) }
        elsif (/\bsenrigan\.ascii\.co\.jp$/) { "word" }
- Slide 76:
        elsif (/\bswitchboard\.com$/) { "sp" }
        elsif (/\bwebcrawler\.com$/) { qw(search searchtext text) }
        elsif (/\bedit\.my\.yahoo\.com$/) { () } ## must come before yahoo.com
        elsif (/\byahoo\b/) { "p" }
        else { "UNKNOWN" }
      };
      next unless @search_fields;
      my %wanted = map { $_, 1 } @search_fields;
      my @show_fields = grep { $wanted{lc $_} } keys %form;
      if (@show_fields) {
        for (@show_fields) {
          $count{$url->host}{$form{$_}}++;
        }
      } else {
        print $url->host, "\n";
        for (sort keys %form) {
          print "?? $_ => $form{$_}\n";
        }
- Slide 77:
      }
    }
    for my $host (sort keys %count) {
      my $hostinfo = $count{$host};
      for my $text (sort keys %$hostinfo) {
        my $times = $hostinfo->{$text};
        print "$host: $text";
        print " ($times times)" if $times > 1;
        print "\n";
      }
    }
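To see what the referer parsing boils down to, here is a rough sketch that uses only core Perl instead of URI::URL. The two-entry engine table, the helper names `url_decode` and `search_terms`, and the sample referer are all assumptions made for illustration; the script above handles many more hosts and leans on `query_form` for the decoding.

```perl
use strict;
use warnings;

# Hypothetical table: host pattern => query fields that carry the search text.
my @ENGINES = (
    [ qr/\baltavista\b/ => ["q"] ],
    [ qr/\byahoo\b/     => ["p"] ],
);

# Decode one URL-encoded component ("+" means space, %XX is a hex byte).
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Return the search strings found in a referer URL, or an empty list.
sub search_terms {
    my ($referer) = @_;
    my ($host, $query) = $referer =~ m{^http://([^/:?#]+)[^?#]*\?([^#]*)}i
        or return;
    my %form;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key and length $key;    # skip empty segments
        $value = "" unless defined $value;
        $form{ lc(url_decode($key)) } = url_decode($value);
    }
    for my $engine (@ENGINES) {
        my ($pattern, $fields) = @$engine;
        next unless lc($host) =~ $pattern;
        return map { $form{$_} } grep { exists $form{$_} } @$fields;
    }
    return;
}

# Invented sample referer.
my @found = search_terms(
    "http://www.altavista.com/cgi-bin/query?pg=q&q=perl+%22web+programming%22");
print "$_\n" for @found;
```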
- Slide 78: CASE STUDY: finding out where they went from here
- Slide 79: The problem
    • You’ve created a website with a lot of interesting links to other places
    • With the server logs, you can see how people bounce around within your site
    • But you can’t tell where they go, so you can’t tell whether the links you’re providing are useful
- Slide 80: The solution
    • Change all outbound links so that they invoke a CGI script
    • The parameter of the real URL gets passed as the PATH_INFO value
        <A HREF="/cgi/go/http://www.stonehenge.com/perltraining/">Learn!</A>
    • The CGI script records the referer (your page) and the destination into a special log file
    • Then it redirects the browser to the actual location
    • Downside: this adds one CGI hit per outbound link
    • Downside: it also means you must rewrite your outbound links, but you can do this programmatically
- Slide 81: The code (part one: handle the redirections)
    #!/home/merlyn/bin/perl -Tw
    use strict;
    $|++;
    my $GO_LOG = "/home/merlyn/Web/golog";
    my $result = eval {
      die unless defined (my $res = $ENV{PATH_INFO});
      die unless $res =~ s/^\///;
      my $query = $ENV{QUERY_STRING};
      if (defined $query and length $query) {
        $res .= "?$query";
      }
      $res;
    };
- Slide 82:
    if ($@) {
      print "Status: 404 Not Found\n\n";
      exit 0;
    }
    print "Location: $result\n\n";
    my $pid = fork;
    $pid = 0 unless defined $pid; # be the kid if fork failed
    exit 0 if $pid;
    ## child...
    close(STDOUT);
    open GOLOG, ">>$GO_LOG" or die "Cannot open $GO_LOG: $!";
    flock(GOLOG,2); # wait for exclusive
    seek GOLOG, 0, 2; # seek to end, refresh buffers
    print GOLOG join("\t", scalar localtime, $result, ($ENV{HTTP_REFERER} || "[unknown]")), "\n";
    close GOLOG;
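The redirect logic can be exercised without a web server by passing the CGI variables in as ordinary arguments. This is a minimal sketch under that assumption; `redirect_target` and `log_line` are hypothetical helper names, and nothing is forked or written to a log file here.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the destination URL from CGI-style variables.
# Returns undef when PATH_INFO is missing or does not start with "/".
sub redirect_target {
    my (%env) = @_;
    my $target = $env{PATH_INFO};
    return undef unless defined $target and $target =~ s{^/}{};
    my $query = $env{QUERY_STRING};
    $target .= "?$query" if defined $query and length $query;
    return $target;
}

# Hypothetical helper: one tab-separated log record, as in the script above.
sub log_line {
    my ($when, $target, $referer) = @_;
    return join("\t", $when, $target, $referer || "[unknown]") . "\n";
}

my $target = redirect_target(
    PATH_INFO    => "/http://www.stonehenge.com/perltraining/",
    QUERY_STRING => "",
);
print defined $target ? "Location: $target\n\n" : "Status: 404 Not Found\n\n";
print log_line(scalar(localtime), $target, undef) if defined $target;
```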
- Slide 83: The code (part two: rewriting existing pages)
    #!/home/merlyn/bin/perl -w
    use strict;
    $|++;
    use File::Find;
    unless (@ARGV) {
      find sub {
        push @ARGV, $File::Find::name if /\.html/;
      }, "/home/merlyn/Html/";
    }
    undef $/;
    $^I = "~";
    while (<>) {
      s{(href="(.*?)")}{
- Slide 84:
        my ($old,$url,$new) = ($1,$2);
        if ($url =~ /^http:(?!.*cgi\/go)/) {
          $new = qq{href="/cgi/go/$url"};
          print STDOUT "$ARGV: changing $old to $new\n";
        } else {
          $new = $old;
        }
        $new;
      }egi;
      print if defined $^I;
    }
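The substitution can also be tried on a string before letting it edit files in place. The sketch below wraps the same idea in a hypothetical `rewrite_links` function and runs it on an invented two-link page; unlike the slide code it keeps the original spelling of the `href` attribute and does not print a change report.

```perl
use strict;
use warnings;

# Hypothetical helper: route absolute http: links through /cgi/go/,
# leaving relative links and already-rewritten links untouched.
sub rewrite_links {
    my ($html) = @_;
    $html =~ s{(href=")(.*?)"}{
        my ($attr, $url) = ($1, $2);
        $url = "/cgi/go/$url" if $url =~ m{^http:(?!.*cgi/go)}i;
        qq{$attr$url"};
    }egi;
    return $html;
}

# Invented sample page.
my $page = <<'HTML';
<a href="http://www.perl.org/">Perl</a>
<a href="local.html">Local</a>
HTML
print rewrite_links($page);
```

Because a rewritten link no longer starts with `http:`, running the function a second time over its own output should leave it unchanged, which matters if the rewriting script is run repeatedly over the same tree.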
- Slide 85: CASE STUDY: extracting a portion of website to a tar file
- Slide 86: STONEHEN

