A Strand of Perls: Some Home Grown Utilities

Detailing some locally written Perl utilities
Presented at EUGM 2003

  1. 1. A Strand of Perls: Some Home Grown Utilities
  2. 2. Syllabus: Our New Books List; Call Number Sorting; Getting Operator Profiles; QPID – Quick Patron Information Dump (cupid…)
  7. 7. Why present another new books list? Different strokes for different folks… Given:
       • Professors don’t care about call numbers; they just want to go to their area and see what’s new, so the information is sorted by department.
       • The information is needed on a monthly basis.
       • It should be possible to go back through data for several previous months.
       • Here’s an overview…
  13. 13. New Books List Process at WMU: get last month’s acquisitions → break up by department (Department A, Department B, Department C, Department X) → one text output file → ftp to Batch PC → put on library LAN for the Web Office.
  14. 14. New Books List Process at WMU: Our production-type jobs get the database password from a file, for easy maintenance. We then use DBI to set up access to the database.
  15. 15. New Books List Process at WMU: The query (sprintf wrapper removed for clarity).
  16. 16. New Books List Process at WMU: Get the data from the query in a loop and put it in an array.
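The code shown on these slides is not part of this transcript. As a rough illustration of the same three steps (password from a file, a date-bounded query, rows collected in a loop), here is a minimal sketch in Python rather than the Perl and DBI used in the talk. It assumes an in-memory SQLite database as a stand-in for the Oracle database, and the table name, column names, file name, and sample rows are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_password(path):
    """Return the password stored on the first line of a file."""
    return Path(path).read_text().splitlines()[0].strip()


def fetch_new_books(conn, start, end):
    """Run the acquisitions query and collect every row in a list."""
    sql = (
        "SELECT call_no, title FROM acquisitions "  # hypothetical table/columns
        "WHERE added >= ? AND added < ? ORDER BY call_no"
    )
    rows = []
    for row in conn.execute(sql, (start, end)):  # fetch in a loop
        rows.append(row)
    return rows


# Keeping the password in its own file means only that file changes
# when the password does.
with tempfile.TemporaryDirectory() as tmp:
    pw_file = Path(tmp) / "dbpass.txt"  # hypothetical file name
    pw_file.write_text("s3cret\n")
    password = read_password(pw_file)  # a real database driver would receive this

# Stand-in database: SQLite takes no password, so it is not used below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acquisitions (call_no TEXT, title TEXT, added TEXT)")
conn.executemany(
    "INSERT INTO acquisitions VALUES (?, ?, ?)",
    [
        ("QA76.73 .P22", "Sample Book A", "2003-03-14"),
        ("PS3545 .I5", "Sample Book B", "2003-03-02"),
        ("Z699 .L5", "Sample Book C", "2003-01-20"),  # outside the month
    ],
)
new_books = fetch_new_books(conn, "2003-03-01", "2003-04-01")
```

The sketch binds the dates as query parameters instead of formatting them into the SQL text; that is a choice made here, not something the slides state.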
  17. 17. New Books List Process at WMU: Get rid of headphones!
  18. 18. New Books List Process at WMU: Create the sort vector and put it in an array.
  19. 19. New Books List Process at WMU: We got the deduping code from one of the O’Reilly Perl books. The data will implicitly be in call number order because of the sort vector structure. … line noise…?
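To make the sort vector idea concrete, here is a minimal sketch in Python (the talk's own code is Perl and is not reproduced in this transcript). It assumes a sort vector is the normalized call number followed by the display fields, joined with tabs; that layout, the helper names, and the sample records are assumptions for illustration.

```python
def make_sort_vector(normalized_key, fields):
    """Prefix the display fields with the normalized call number."""
    return "\t".join([normalized_key, *fields])


def dedupe(items):
    """Drop repeated items, keeping the first occurrence of each."""
    return list(dict.fromkeys(items))


# Hypothetical records: (normalized key, [call number, title]).
records = [
    ("QA 000076", ["QA76 .P22", "Sample Book A"]),
    ("PS 003545", ["PS3545 .I5", "Sample Book B"]),
    ("QA 000076", ["QA76 .P22", "Sample Book A"]),  # duplicate row
    ("QA 000009", ["QA9 .B3", "Sample Book C"]),
]

# Because each vector starts with the normalized key, a plain string sort
# puts the records in call number order (QA9 before QA76).
vectors = sorted(dedupe(make_sort_vector(k, f) for k, f in records))
call_numbers = [v.split("\t")[1] for v in vectors]
```

Sorting the raw call numbers as strings would put QA76 ahead of QA9, which is why the normalized key leads the vector.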
  20. 20. Digression… Speaking of line noise… (Broken up for clarity.) This puts the line count of a MARC file into a shell script.
  21. 21. New Books List Process at WMU: Now we need to get the results classified by department, going by call number ranges. (Raw ranges file…)
  23. 23. New Books List Process at WMU: The call number range specifications are normalized in the same manner used for sorting. That is great for the computer, but not so easy for us humans, so we created a utility to make a human-readable version. (Formatted ranges file…)
  25. 25. New Books List Process at WMU: The range data is read into arrays. (If a syntax error is found, the program stops and shows where it is.) We then loop over each department: if the current call number falls within the current range, it goes into the current department file.
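The range check described above can be sketched as follows, in Python rather than the Perl used in the talk. The ranges file format is not shown in this transcript, so the sketch assumes a hypothetical "department|low|high" line format with already-normalized keys; the department names and keys are invented.

```python
def parse_ranges(lines):
    """Read 'department|low|high' lines; stop at the first malformed one."""
    ranges = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3 or not all(fields) or fields[1] > fields[2]:
            raise ValueError(f"bad range specification on line {lineno}: {line!r}")
        ranges.append(tuple(fields))
    return ranges


def classify(keys, ranges):
    """Put each normalized call number into every department whose range holds it."""
    by_dept = {dept: [] for dept, _low, _high in ranges}
    for dept, low, high in ranges:  # loop for each department range
        for key in keys:
            if low <= key <= high:  # plain string comparison on normalized keys
                by_dept[dept].append(key)
    return by_dept


ranges = parse_ranges([
    "Computer Science|QA 0075|QA 0077",
    "Mathematics|QA 0001|QA 0999",
    "Physics|QC 0001|QC 0999",
])
by_dept = classify(["QA 0076 73", "QA 0300", "QC 0174"], ranges)
```

A call number can land in more than one department when ranges overlap, as "QA 0076 73" does here.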
  26. 26. New Books List Process at WMU: The output files are sorted. For our final processing, we loop through each of these sorted files of raw data. We ignore the call number chunks created during the normalization process. The desired fields are concatenated (field1 | field2 | etc.) and line-wrapped. (How this was done…)
  27. 27. New Books List Process at WMU: As we loop through the contents of each departmental file, we split up the sort vector and store the output fields with a vertical bar in between.
  28. 28. New Books List Process at WMU: Some additional processing is done, including the always visually entertaining regular expression manipulations.
  30. 30. New Books List Process at WMU: The output is line-wrapped prior to writing to the file. You’ll need some initial setup for the wrap to work.
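As an illustration of the split, join, and wrap steps, here is a minimal sketch in Python (not the author's Perl, which this transcript does not include). It assumes a tab-separated sort vector whose first field is the normalized key; the width of 60, the indent, and the sample record are arbitrary choices for the example.

```python
import textwrap

# The "initial setup": line width and the indent for continuation lines.
WRAPPER = textwrap.TextWrapper(width=60, subsequent_indent="    ")


def format_record(sort_vector):
    """Drop the normalized key, join the fields with ' | ', and wrap."""
    _key, *fields = sort_vector.split("\t")
    return WRAPPER.fill(" | ".join(fields))


# Invented sample record.
vector = "\t".join([
    "QA 000076 730000",
    "QA76.73 .P22 W35 2000",
    "A Sample Title That Is Long Enough To Need Wrapping",
    "Author, Ann",
    "Sample Press, 2000",
])
entry = format_record(vector)
lines = entry.splitlines()
```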
  31. 31. New Books List Process at WMU: Now we have our output file… When we first implemented our list, this was the whole process. The file was handed off to the library, where staff separated the departmental data out of the file, added the HTML, and put it on our web site. It took several hours to do this!
  32. 32. New Books List Process at WMU: Once I knew this, I looked into further automation. Now we have an additional Perl script that takes care of the rest of the story. I looked at the new books web pages the library had created and figured out that I could break out three sections of static HTML.
  35. 35. New Books List Process at WMU: We read the previous output file, paying attention to which department we’re “in”. Next, we create a separate .html file for each department, incorporating the static HTML sections and adding date information where necessary. Finally, these files are put on the library LAN and a reminder email is sent out.
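The per-department page generation might be sketched like this, in Python rather than the Perl script described in the talk. The three static HTML sections, the "## " marker for department heading lines, and the output file names are all hypothetical; the real output file layout is not shown in this transcript.

```python
import html

# Hypothetical stand-ins for the three sections of static HTML.
STATIC_TOP = "<html>\n<head><title>New Books</title></head>\n<body>\n"
STATIC_MIDDLE = "<ul>\n"
STATIC_BOTTOM = "</ul>\n</body>\n</html>\n"
MARKER = "## "  # hypothetical prefix marking a department heading line


def split_by_department(lines):
    """Group entries by the department we are currently 'in'."""
    by_dept = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(MARKER):
            current = line[len(MARKER):].strip()
            by_dept[current] = []
        elif current is None or not line.strip():
            continue
        elif line[0].isspace() and by_dept[current]:
            # An indented line continues the previous wrapped entry.
            by_dept[current][-1] += " " + line.strip()
        else:
            by_dept[current].append(line.strip())
    return by_dept


def build_pages(lines, month):
    """Return {file name: HTML text}, one page per department."""
    pages = {}
    for dept, entries in split_by_department(lines).items():
        heading = f"<h1>{html.escape(dept)}: new books for {html.escape(month)}</h1>\n"
        items = "".join(f"<li>{html.escape(e)}</li>\n" for e in entries)
        pages[dept + ".html"] = STATIC_TOP + heading + STATIC_MIDDLE + items + STATIC_BOTTOM
    return pages


pages = build_pages(
    [
        "## Biology",
        "QH1 | Sample Title One | Author, A.",
        "    continued line",
        "",
        "## Chemistry",
        "QD2 | Sample <Title> Two | Author, B.",
    ],
    "March 2003",
)
```

Writing the pages to the library LAN and sending the reminder email are left out of the sketch.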
  36. 36. New Books List Process at WMU: get last month’s acquisitions → break up by department (Department A, Department B, Department C, Department X) → add static HTML to each department → separate HTML file for each department, ready to be incorporated into the library web pages → ftp to Batch PC → put on library LAN for the Web Office.
  37. 37. New Books List Process at WMU: See the results at: http://www.wmich.edu/library/newbooks/index.html
  38. 38. Our New Books List; Call Number Sorting; Getting Operator Profiles; QPID – Quick Patron Information Dump (cupid…)
  43. 43. Call Number Sorting: It seems right to call it sorting, but it’s really in the normalization process that the “magic” occurs. It uses intelligent parsing, not a quick regular expression implementation. It was designed with LC call numbers in mind, but pretty much handles everything, including locally generated call numbers. The resulting sorts appear to be about 99% accurate (my estimate). The algorithm divides call numbers into chunks, based on separators.
  51. 51. Call Number Sorting: Explicit separators: colon (:), semicolon (;), comma (,), period (.), space ( ), forward slash (/). Implicit separators: alpha->numeric and numeric->alpha transitions. During parsing, separators are absorbed, but the period may be uniquely retained.
  54. 54. Call Number Sorting: Further processing and normalization include the following. Whole numbers are treated differently from decimal numbers. A decimal number may affect several of the chunks that follow it. Look-ahead and look-back for one or more chunks are also employed.
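The following is a deliberately simplified sketch of chunk-based normalization, written in Python; it is not the author's algorithm, which is Perl and does more (including look-ahead and look-back). It assumes the separators listed above, remembers only whether a period preceded a chunk, pads whole numbers on the left and decimal numbers on the right, and treats digits after a period-introduced letter chunk as decimal. The function names and the pad width of 6 are assumptions.

```python
SEPARATORS = set(":;,. /")


def chunk_call_number(call_number):
    """Split into (text, after_period) chunks at separators and at
    alpha/numeric transitions. Separators are absorbed; only the fact
    that a period preceded a chunk is retained."""
    chunks = []
    current = ""
    after_period = False    # did a period precede the chunk being built?
    pending_period = False  # period seen since the last chunk was started
    for ch in call_number.upper():
        if ch in SEPARATORS:
            if current:
                chunks.append((current, after_period))
                current = ""
            if ch == ".":
                pending_period = True
            continue
        if current and ch.isdigit() != current[-1].isdigit():
            chunks.append((current, after_period))  # implicit separator
            current = ""
        if not current:
            after_period = pending_period
            pending_period = False
        current += ch
    if current:
        chunks.append((current, after_period))
    return chunks


def normalize(call_number, width=6):
    """Build a key that sorts correctly as a plain string."""
    parts = []
    in_cutter = False  # previous chunk was a letter chunk introduced by a period
    for text, after_period in chunk_call_number(call_number):
        if text.isdigit():
            if after_period or in_cutter:
                parts.append(text.ljust(width, "0"))  # decimal: pad on the right
            else:
                parts.append(text.rjust(width, "0"))  # whole: pad on the left
            in_cutter = False
        else:
            parts.append(text.ljust(width))
            in_cutter = after_period
    return " ".join(parts)


shelf = sorted(["QA76.9 .C65", "QA9 .B3", "QA76.73 .P22", "Q100"], key=normalize)
```

Here `shelf` comes out as Q100, QA9 .B3, QA76.73 .P22, QA76.9 .C65, whereas a plain string sort would place QA76 before QA9.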
  55. 55. Call Number Sorting: demo… (democall.pl, democall.lst)
  56. 56. Call Number Sorting: This code is available at: http://homepages.wmich.edu/~zimmer
  57. 57. Our New Books List; Call Number Sorting; Getting Operator Profiles; QPID – Quick Patron Information Dump (cupid…)
  58. 58. Preface: The next two programs are designed to run on PCs, not on a Voyager box. In order to run them, you will need: Perl (I use ActiveState); DBI and DBD for Oracle (get from ActiveState); Oracle Client software (from Oracle).
  59. 59. Preface: Get ActiveState Perl at: http://www.activestate.com/Products/Download/Register.plex?id=ActivePerl This puts you at the (optional) registration screen for the download. On the next page, you’ll probably want to select the “MSI” installation for Windows. Get version 5.6.1.
  60. 60. Preface: How to get DBI and DBD: once ActivePerl is installed, (1) open a command prompt window (DOS prompt); (2) run PPM; (3) once in PPM, install DBI and DBD; (4) exit PPM.
  61. 61. Preface: Oracle Client software is required! DBI and DBD rely on it. If you do not already have a suitably equipped PC available, check that the Oracle licensing arrangement at your site allows you to install the client software. I used 8.1.6. The stated combination of versions is the only one I got to work. This is on machines running Windows 2000.
  62. 62. Getting Operator Profiles: Demo getprof…
  63. 63. Getting Operator Profiles: Program outline: (1) Setup and initialization. (2) Look for each possible profile; when found, get the count of affected locations, get the profile info, format the Y/N boolean values, and output the info to a file in HTML format. (3) Invoke the browser to display the profile data.
  64. 64. Getting Operator Profiles: Query example using the master profile.
  65. 65. Getting Operator Profiles: Do some data massaging, then use tables in HTML for formatting.
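The "data massaging" and table formatting could look roughly like the sketch below, written in Python rather than the Perl of getprof. The setting names, the heading layout, and the helper names are hypothetical; the actual profile columns are not listed in this transcript.

```python
import html


def yes_no(value):
    """Spell out Y/N flags; leave any other value untouched."""
    return {"Y": "Yes", "N": "No"}.get(value, value)


def profile_table(name, location_count, settings):
    """Format one profile as an HTML heading plus a two-column table."""
    rows = "".join(
        f"<tr><td>{html.escape(key)}</td><td>{html.escape(yes_no(value))}</td></tr>\n"
        for key, value in settings.items()
    )
    heading = f"<h2>{html.escape(name)} ({location_count} locations)</h2>\n"
    return heading + "<table border='1'>\n" + rows + "</table>\n"


# Hypothetical profile data.
table = profile_table("Master", 3, {"Charge": "Y", "Renew": "N", "Level": "2"})
```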
  66. 66. Our New Books List; Call Number Sorting; Getting Operator Profiles; QPID – Quick Patron Information Dump (cupid…)
  67. 67. Patron Information Dump: Demo patdump…
  68. 68. Patron Information Dump: Program outline: (1) Setup and initialization. (2) Run a series of queries to get all the patron data. (3) Take the resulting data from each query and format it in HTML, creating a file. (4) Invoke the browser to display the patron data.
  69. 69. Both the operator profile and patron dump programs allow for choice of browser, via an .ini file, so that different users can use different browsers.
  70. 70. Sample .ini file
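The sample .ini file itself is not reproduced in this transcript. As a hedged illustration of the idea, the Python sketch below (the programs in the talk are Perl) assumes a hypothetical `[browser]` section with a `path` key and builds the command that would open the generated HTML file.

```python
import configparser

# Hypothetical .ini contents; the real file's section and key names are not shown.
SAMPLE_INI = r"""
[browser]
path = C:\Program Files\Internet Explorer\iexplore.exe
"""


def browser_command(ini_text, html_file):
    """Return the command line that opens html_file in the configured browser."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return [parser.get("browser", "path"), html_file]


command = browser_command(SAMPLE_INI, "patron.html")
# A real program could then launch it, for example with subprocess.Popen(command).
```

Because each user has their own .ini file, changing the browser means editing one line rather than the program.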
  71. 71. Some users like to explore… Slow down or stop them by associating an icon with the .bat file and hiding the other files. Files: .bat (normal file); .ini (your choice of browser; hidden, read-only); .pl (hidden, read-only); .html (normal file).
  72. 72. The code for these two programs is available at: http://homepages.wmich.edu/~zimmer
  73. 73. Resources:
       • http://www.tek-tips.com/gfaqs.cfm/pid/219/fid/1711 (for general installation issues on a PC)
       • http://metalink.oracle.com (log in, then search for 131299.1 for Pentium IV problems with the client software install)
       • http://www.activestate.com/Products/Download/Register.plex?id=ActivePerl (get your ActivePerl here)
       • http://www.wmich.edu/library/newbooks/index.html (Western Michigan University new books list)
       • http://homepages.wmich.edu/~zimmer (some of the code from this presentation)
  74. 74. Questions? Email: zimmer@wmich.edu. Phone: 269.387.3885. Thanks for listening.
