NISO Patron Privacy VM#2-Richard Entlich anonymization of Voyager circulation transaction records

Published on

May 21, 2013
NISO Patron Privacy in Digital Library and Information Systems
http://www.niso.org/topics/tl/patron_privacy/

  1. Anonymization of Voyager Circulation Transaction Records
     NISO Patron Privacy Virtual Forum #2, May 21, 2015
     Richard Entlich, Collection Analyst Librarian, Cornell University Library
  2. Background
     • Cornell has used Ex Libris Voyager as its LMS since 2000
     • Cornell has maintained all historical circulation transaction data since inception
     • Voyager maintains transaction-level circulation data with different privacy options; however:
       ◦ For currently circulating items, there is access to full bibliographic and user data
  3. Voyager privacy options for historical circulation transactions
     • Retain full user data via PATRON_ID
     • Assign demographic category codes to patrons to allow linking to selected user characteristics via PATRON_STAT_ID (e.g., college, department, graduate field)
     • Break link to user, retaining only a “patron group” identifier via PATRON_GROUP_ID (e.g., faculty, undergrad, grad, staff, ILL)
  4. Cornell policy
     • Sever link to user upon return of item
     • Limited use of demographic codes (e.g., for Borrow Direct transactions, to store the name of the borrowing institution within Voyager)
  5. The need for better user data
     • Challenges faced in 2009
       ◦ Fiscal crisis of 2008
       ◦ Overcrowded stacks in some unit libraries
       ◦ Pressure from departments to free up library space to accommodate need for office space for faculty/staff
     • Needed user-level circulation data to help
       ◦ Make library closing/consolidation decisions with smallest impact on users
       ◦ Identify least disruptive destinations for collections being moved
  6. Snapshots of currently circulating items
     • Pull circulation transactions from Voyager
     • Combine with demographic data from “patron feed”
     • Process and maintain outside of Voyager
     • Decide which bibliographic and demographic data to keep, and at what level of granularity
     • Anonymize user IDs
  7. User IDs: Balancing Confidentiality vs. Analytical Value
     • Discard all unique identifiers
       ◦ Safest, but limits analysis options
     • Retain unique identifiers
       ◦ Maximizes analysis potential; unacceptably intrusive
     • The middle ground: anonymize unique identifiers
       ◦ Balance risk and benefit
       ◦ Support analysis of individual borrower behavior without revealing identity
       ◦ e.g., We notice that Romance Studies faculty are borrowing a lot of physics books. Is it one borrower, or an important new trend?
  8. Anonymization Technique and Example
     • Use a cryptographic one-way hash (e.g., MD5 or SHA-1)
     • Characteristics
       ◦ irreversible
       ◦ unique input → unique output
       ◦ minor change to input → major change to output
     • Original user ID: 12345
     • ID after random transformation (to discourage “forward engineering”): 123&zQ?45
     • ID after transformation and hashing: 94D51D75B7AFBCD0F85D1844F06BE73
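
The technique on slide 8 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure Cornell actually used: it swaps the slide's "transform, then MD5/SHA-1" step for a keyed hash (HMAC with SHA-256), which serves the same purpose of discouraging anyone from hashing known IDs to rebuild the mapping. The key value and the function name `anonymize_user_id` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch). In practice it
# would be generated randomly and stored separately from the circulation data.
SECRET_KEY = b"example-secret-not-for-production"


def anonymize_user_id(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way surrogate for user_id as uppercase hex."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest().upper()


surrogate_a = anonymize_user_id("12345")
surrogate_b = anonymize_user_id("12346")

# The same input always yields the same surrogate, so one borrower's
# loans can still be grouped without knowing who the borrower is.
print(surrogate_a == anonymize_user_id("12345"))
# A one-character change in the input yields an unrelated surrogate.
print(surrogate_a != surrogate_b)
```

Because the surrogate is stable for a given key, a question like the Romance Studies example on slide 7 (one borrower, or many?) can be answered by counting distinct surrogates. Discarding the key afterwards would make it impractical to recompute the mapping from a list of known IDs.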
  9. Creating bibliographic and demographic surrogates
     • What level of granularity?
     • If detailed bibliographic data, demographic data should be very broad
     • If detailed demographic data, bibliographic data should be very broad
     • Could do both ways, as long as there are no data points in common
     • To be useful, best practices should address this issue with considerable specificity
  10. Example: Emphasizing bibliographic detail
     • Specific bibliographic data points
       ◦ Title
       ◦ Author
       ◦ Full call number
       ◦ Detailed subject classification
     • Corresponding demographic data points
       ◦ Broad status classification (faculty, student, etc.)
       ◦ Broad disciplinary classification (STEM, Humanities, Social Sciences)
       ◦ College (e.g., Engineering, Veterinary Medicine, Arts & Sciences) depending on
  11. Example: Emphasizing demographic detail
     • Specific demographic data points
       ◦ Department
       ◦ Undergraduate major
       ◦ Graduate field of study
       ◦ Job title
     • Corresponding bibliographic data points
       ◦ Broad subject classification (e.g., LC subclass only)
       ◦ Language of publication, or language family
       ◦ Decade of publication
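
The two surrogate styles on slides 10 and 11 can be sketched as two projections of one circulation record. This is a minimal illustration, not Cornell's implementation: every field name, the sample record, and the helper functions are hypothetical, and the user ID is left out of both views for simplicity. The point it shows is the slide 9 rule that the two views share no data points.

```python
import re


def lc_subclass(call_number: str) -> str:
    """Leading letters of an LC call number, e.g. 'QC174.12 .E93' gives 'QC'."""
    match = re.match(r"[A-Za-z]{1,3}", call_number.strip())
    return match.group(0).upper() if match else ""


def decade(year: int) -> str:
    """Decade of publication as a label, e.g. 1995 gives '1990s'."""
    return f"{year // 10 * 10}s"


def bibliographic_view(rec: dict) -> dict:
    """Detailed bibliographic data paired with very broad demographics."""
    return {
        "title": rec["title"],
        "author": rec["author"],
        "call_number": rec["call_number"],
        "patron_status": rec["patron_status"],  # e.g. faculty, student
        "discipline": rec["discipline"],        # e.g. STEM, Humanities
    }


def demographic_view(rec: dict) -> dict:
    """Detailed demographic data paired with very broad bibliographic data."""
    return {
        "department": rec["department"],
        "job_title": rec["job_title"],
        "lc_subclass": lc_subclass(rec["call_number"]),
        "language": rec["language"],
        "decade": decade(rec["pub_year"]),
    }


# Hypothetical record, invented for illustration only.
record = {
    "title": "Example Physics Title",
    "author": "Example, Author",
    "call_number": "QC174.12 .E93",
    "pub_year": 1995,
    "language": "English",
    "patron_status": "faculty",
    "discipline": "Humanities",
    "department": "Romance Studies",
    "job_title": "Professor",
}

bib = bibliographic_view(record)
demo = demographic_view(record)
shared_fields = set(bib) & set(demo)
print(demo["lc_subclass"], demo["decade"], shared_fields)
```

Keeping the two field sets disjoint means the detailed title in one view cannot be joined to the detailed department in the other by matching on a common column.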
  12. Recommended reading
     Nicholson, Scott, and Catherine Arnott Smith. “Using Lessons from Health Care to Protect the Privacy of Library Users: Guidelines for the De-Identification of Library Data Based on HIPAA.” Journal of the American Society for Information Science and Technology 58, no. 8 (2007): 1198–1206. http://bibliomining.com/nicholson/nicholsonpdfs/hipaalibraryfinal.pdf