The LITA Forum & library data in Python
Library and Information Technology Association (LITA)
Nov 5-8, LITA Forum, Albuquerque
Learn Python by Playing with Library Data
By Francis Kayiwa & Eric Phetteplace
GitHub
Bitbucket
Main class
https://bitbucket.org/fkayiwa/litaconf/overview
PyMARC scripts
By Eric Phetteplace
https://github.com/phette23/pymarc-ebooks-scripts
• count-tag.py find out how many records have a particular tag
• dual856.py find all your records with multiple 856 (electronic location) tags
• ebooks-to-csv.py save all your ebook (defined as anything with an 856 $u) titles to a CSV file
• gmd-counter.py count number of occurrences of different General Material Designations (245 $h) in a collection of records. Example JSON output included.
• pymarc-notes.md some very minimal notes on using pymarc, mostly links to documentation
• python-on-windows.md notes on getting set up on a Windows machine
• proxy-ebooks.py the main script I wrote, others were basically tests leading up to this. We were implementing a proxy server and this cleaned up our 856 fields while proxying appropriate vendor URLs.
• search-gmd.py find titles of records with a certain GMD
• subfield-counter.py count subfields used in all records? I actually don't know, this is horrible code, Eric.
• web-links.py output stats on 856 fields in records
• webfeet.py find records with "[selected by Web Feet]" in the title since at some point we imported one of these misguided attempts to catalog "the good parts" of the Internet
• write856s.py write records with multiple 856 fields out to a separate MARC file
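To give a feel for what scripts like count-tag.py and dual856.py do, here is a minimal sketch. It is not the code from the repository. The helper names (read_records, count_tag, records_with_multiple) are made up for illustration, and the only pymarc calls assumed are MARCReader and Record.get_fields.

```python
def read_records(path):
    """Yield records from a binary MARC file (needs the third-party pymarc package)."""
    # Imported lazily so the counting helpers below work without pymarc installed.
    from pymarc import MARCReader

    with open(path, "rb") as handle:
        for record in MARCReader(handle):
            # Some pymarc versions yield None for a record they cannot parse.
            if record is not None:
                yield record


def count_tag(records, tag):
    """Return how many records contain at least one field with the given tag."""
    return sum(1 for record in records if record.get_fields(tag))


def records_with_multiple(records, tag="856"):
    """Return the records that carry more than one field with the given tag."""
    return [record for record in records if len(record.get_fields(tag)) > 1]


# Hypothetical usage, assuming a file named ebooks.mrc exists:
#   records = list(read_records("ebooks.mrc"))
#   print(count_tag(records, "856"), len(records_with_multiple(records)))
```

The helpers only rely on each record having a get_fields method, so they can be tried out on small stand-in objects before pointing them at a real MARC file.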
MARCkbart 
https://github.com/lpmagnuson
EZproxy Analysis
https://github.com/robincamille/ezproxy-analysis
Analyzes EZproxy-generated log files and spits out a CSV with this info:
• Filename of log being analyzed
• # total connections
• # on-campus connections (as determined by IP addresses starting with "10." -- may be different for your campus)
• % on-campus connections of total
• # off-campus connections
• % off-campus connections of total
• # library connections (as determined by IP addresses starting with "10.11" and "10.12" -- will almost certainly be different for your campus)
• % library of on-campus connections
• % library of total connections
• # student sessions off-campus
• % student sessions of total off-campus
• # fac/staff sessions off-campus
• % fac/staff sessions of total off-campus
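The connection counts above can be sketched with the standard library alone. This is a rough illustration, not the code from the ezproxy-analysis repository. It assumes the client IP is the first whitespace-separated field of each log line, which depends on how your EZproxy LogFormat is configured. The function names and CSV column names are invented here, and the student and fac/staff session counts are left out because they depend on local login data.

```python
import csv
import io

# Assumed prefixes, taken from the examples on the slide; adjust for your campus.
# The trailing dots keep "10.11." from also matching addresses like 10.110.x.x.
ON_CAMPUS_PREFIXES = ("10.",)
LIBRARY_PREFIXES = ("10.11.", "10.12.")


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal (0.0 if whole is 0)."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def summarize(lines, name="ezproxy.log"):
    """Count connections by origin, taking the first field of each line as the IP."""
    total = on_campus = library = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ip = fields[0]
        total += 1
        if ip.startswith(ON_CAMPUS_PREFIXES):
            on_campus += 1
            if ip.startswith(LIBRARY_PREFIXES):
                library += 1
    off_campus = total - on_campus
    return {
        "filename": name,
        "total": total,
        "on_campus": on_campus,
        "pct_on_campus": percent(on_campus, total),
        "off_campus": off_campus,
        "pct_off_campus": percent(off_campus, total),
        "library": library,
        "pct_library_of_on_campus": percent(library, on_campus),
        "pct_library_of_total": percent(library, total),
    }


def to_csv(rows):
    """Render a list of summary dicts as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling summarize once per log file and passing the collected dicts to to_csv gives one row per file, which is the shape of output the slide describes.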
Beautiful Soup
Real world
Real world
TIPS:
Don’t use Python 3
Albuquerque is lovely and small

Code4Lib 20141129 Python

Editor's Notes

  • #2 TREV
  • #3 TREV: ALA division looking at IT
  • #4 TREV
  • #5 Intro to Python with some library data. Follow-up to Alex’s class in the spring. A new and very active programming language. Very readable. Learned how to write and use it at a basic level. How to share code with Bitbucket and GitHub.
  • #6 How many people already use this? A place where you can store and share your code.
  • #7 “Bitbucket is where the action is for Government and education as we can have an unlimited number of repositories to use.” If you are in the US and have an EDU account, you automatically get unlimited repositories. I had to email them, and they gave me the US EDU abilities.
  • #8 Nice intro to Python. The real winning part was learning what else is being done in the library community.
  • #9 How to edit bulk ebooks for Polaris
  • #11 MARC to KBART for OCLC uploads. Also has scripts for DSpace batch ingestions. The blog post is very descriptive.
  • #13 Did a dry run of the presentation with George. Saw this and realized it would answer a lot of the questions he had recently. And we share an office!
  • #17 GEORGE