Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations
Presentation for course 'Understanding Library Systems and Software Applications' School of Communication and Information, Rutgers University,
New Brunswick, New Jersey, Thursday, October 25, 2012

  1. Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations
     Understanding Library Systems and Software Applications
     School of Communication and Information, Rutgers University, New Brunswick, New Jersey
     Thursday, October 25, 2012
     Ray Schwartz, Systems Specialist Librarian
     Cheng Library, William Paterson University, Wayne, New Jersey, USA
     schwartzr2 @ wpunj.edu
  2. Outline
     • What is Data Mining and Data Warehousing and Why Do We Do It?
     • Our Library and University
     • Patron Statistical Categories
     • Application Server
     • Reporting
  3. What is Data Mining and Data Warehousing
     • Extracting data from legacy systems and other resources;
     • cleaning, scrubbing and preparing data for decision support;
     • maintaining data in appropriate data stores;
     • accessing and analysing data using a variety of end user tools;
     • and mining data for significant relationships.
     Source: Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
  4. • The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining.
     Source: Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
  5. Our University
     • 9000 undergraduates
     • 1000 graduates (mostly education majors)
     • 400 faculty
     • 800 adjuncts
     • 1000 staff
  6. Our Library
     • 19 librarians and 26 library staff
     • 350,000 volumes
     • 18,000 audiovisual items
     • 47,000 print and electronic periodicals
     • 124 general and subject specific databases
     • $1,100,000 Non-Salary Allocations
  7. Our Transactions
     • 600,000 Database Searches
     • 413,000 Gate Counts
     • 40,000 Library Materials Circulation
     • 34,000 Equipment Circulation
     • 19,000 Reference Queries
     • 3,000 Interlibrary Loans
     • 5,000 Documents Delivered
  8. Our Systems
     • Voyager ILS
     • Clio ILL Software
     • EZProxy Server
     • Banner – University ERP
     • University Networked Drive K:
     • University Email Server
     • University Web Server
  9. Vendor Services
     • Serials Solutions
       • A to Z list
       • MARC Record Service
       • Link Resolver
     • OCLC – Bibliographic Utility
       • WorldCat Collection Analysis
     • Coutts (was Blackwell) – Book Jobber
     • EBSCO – Subscription Agent
     • Marcive – Authority Control
     • Database Vendors
  10. Voyager Overdue and Fine Notices - Daily
  11. Quarterly Extract for Serials Solutions AtoZ Service
  12. What would we like to see?
      • Breakdowns by department and majors.
      • Combined usage by department/majors of more than one library service.
      • Which categories of patrons are accessing which services?
  13. Patron Statistical Categories
      • Voyager Patron Database allows a maximum of 10 statistical categories per patron record.
      • Worked with our University Information Systems Department to extract the relevant data from the relevant sources.
      • Weekly extract from SIS and HRS to load into Voyager
  14. Groups and Services
      Groups:
      • Major
      • Status
        – Undergrad or Grad
        – Faculty, Adjunct Faculty or Staff
      • Department
      • College
      • Degree
      • No. of Credits
      • Year of Study
      • Campus Location
      Services:
      • Circulation
        – Books
        – Media
        – Reserve
        – By Fund Code
        – Location
      • ILL / Document Delivery
      • Databases
      • Library Web Pages
        – Subject Area Resource Guides
        – Reference Requests
      • Catalog
      • Other Vendor Services
        – Serials Solutions
  15. From Students
      • College and Mercer Identifier
      • Class Level (Freshman, Sophomore, Junior, Senior, Graduate)
      • Total Hours Registered for Current Semester
      • Major
      • 2nd Major
      • Degree
      • CA-Collection Agency
      • SOILS
      • Student Entrance Level (New Non-Traditional Freshman, New First Time Transfer, etc.)
      • Department
  16. From Faculty / Staff / Adjuncts
      • College
      • Full or Part-Time
      • Status (Faculty, Adjunct, Staff, Professional Staff, Tenured, Tenure-Track)
      • Division
      • Departments
  17. History Department - 12 months - Feb. 2008

      PATRON STATUS           BOOK CIRC  MEDIA CIRC  EQUIP CIRC  TOTAL CIRC  MEMBERS  BORROWERS  % BORROWING  CIRC/MEMBER  CIRC/BORROWER
      UNDERGRADUATE STUDENTS      2,715         250         698       3,663      238        186          78%        15.39          19.69
      GRADUATE STUDENTS             419          13          76         508       14         13          93%        36.29          39.08
      ADJUNCT FACULTY               100          65          20         185       32         20          63%         5.78           9.25
      FULL-TIME FACULTY             159         115         194         468       24         23          96%        19.50          20.35
      HISTORY TOTALS              3,393         443         988       4,824      308        242          79%        15.66          19.93
      LIBRARY TOTALS             23,370       8,713      20,703      52,756    7,418      4,981          67%         7.11          10.59

      DEFINITIONS:
      BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
      MEDIA CIRCULATION = audio & video materials, including media reserves
      EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
      MEMBER = declared major or department member
      BORROWER = any member who borrowed materials
      Library Total = declared undergrad & grad majors, adjuncts & full time faculty borrowers
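The derived columns in the table above appear to follow directly from the raw counts: total circulation is the sum of book, media and equipment circulation, "% borrowing" is borrowers divided by members, and the last two columns divide total circulation by members and by borrowers. A minimal Python sketch of that arithmetic, using the undergraduate row; the class and method names are illustrative and not part of any library system:

```python
from dataclasses import dataclass


@dataclass
class CircRow:
    """One patron-status row of the circulation table (names are illustrative)."""
    status: str
    book: int
    media: int
    equip: int
    members: int
    borrowers: int

    @property
    def total(self) -> int:
        # TOTAL CIRC = book + media + equipment circulation
        return self.book + self.media + self.equip

    def pct_borrowing(self) -> float:
        # Share of members who borrowed at least one item, in percent
        return 100 * self.borrowers / self.members

    def circ_per_member(self) -> float:
        return self.total / self.members

    def circ_per_borrower(self) -> float:
        return self.total / self.borrowers


# Raw counts copied from the UNDERGRADUATE STUDENTS row
undergrads = CircRow("UNDERGRADUATE STUDENTS", 2715, 250, 698, 238, 186)
print(f"{undergrads.total:,} {undergrads.pct_borrowing():.0f}% "
      f"{undergrads.circ_per_member():.2f} {undergrads.circ_per_borrower():.2f}")
# prints: 3,663 78% 15.39 19.69
```

The printed values match the undergraduate row, which suggests the slide's ratios were computed this way.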
  18. Communications Majors FY08/09

      Columns: Statistical Categories // Item Type / Location / Call No Type / Call No; Communications Majors; Freshman; Sophomore; Junior; Senior

      M- DVD / Media Services / Other / DVD   194 17 31 52 94
      M- VideoCass / Media Services / Other / VC   228 11 40 67 110
      T- Book / 2nd Floor - Circulating / Library of Congress / B   34 9 8 11 6
      T- Book / 2nd Floor - Circulating / Library of Congress / BD   3 1 2
      T- Book / 2nd Floor - Circulating / Library of Congress / BF   30 5 5 12 8
      ...
      2nd Floor Circulating   1531 222 310 403 596
      T- Juvenile / CMC /   125 14 26 20 35
      T- NJDoc / Askew Documents Room / Other /   1 1
      New Jersey History   10 0 2 7 1
      T- ReserveBk / Reserves Desk /   189 13 46 68 62
      T- SpecColl / Special Collection / Library of Congress / LC   3 3
      T- Book-McNaughton / Leisure Lounge / Library of Congress / F   2 1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / HF   1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / HS   2 2
      T- Book-McNaughton / Leisure Lounge / Library of Congress / HV   5 1 2 2
      T- Book-McNaughton / Leisure Lounge / Library of Congress / ML   1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / PN   3 3
      T- Book-McNaughton / Leisure Lounge / Library of Congress / PS   29 4 10 15
      T- Book-McNaughton / Leisure Lounge / Library of Congress / RC   2 1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / TL   1 1
      Leisure Lounge   49 9 1 19 20
  19. Challenges with combining data from various services
      • Little to no linkage of data
      • Multiple user IDs for authentication
  20. Application Server
      • A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database. From http://www.webtools.ca.gov/help/Glossary.asp
      • Web Server Software (Apache or IIS)
      • Database Management System – DBMS (MySQL, Oracle, MS SQL Server)
      • Scripting Language (Perl, PHP, ColdFusion, ASP)
  21. Why an Application Server?
      • Relevant data in logfiles needs to be in a database to be analyzed.
      • Need your own DBMS to create new tables and queries.
  22. Authentication of ILL and other forms is routed through the EZProxy server
  23. Daily and Weekly Email Reports from the Application Server
      • Circ Fines Audit Daily Report - Daily at 6:05 AM.
      • Dupe Patron Record Report - Daily at 5:56 AM.
      • Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM.
      • Media Service Scheduling Rooms Report - Daily at 6:02 AM.
      • Media Services Equipment Pickup Summary - Daily at 7:00 AM.
      • Received Title Alert - Daily at 6:59 AM.
      • Reserves Overdues - Daily at 5:59 AM.
      • Scheduled LIS Tasks - Daily at 6:00 AM.
      • ILL Borrowing Overdues Report - Weekly at 5:59 AM.
      • ILL Lending Reports - Weekly at 6:15 AM.
  24. Monthly Email Reports from the Application Server
      • Circ Fines Audit - Monthly at 6:10 AM.
      • Circulation by Location and Item Type - Monthly at 6:21 AM.
      • Circulation Lost and Paid - Monthly at 6:25 AM.
      • Circulation Online Renewal Count - Monthly at 6:30 AM.
      • Media Circulation - Monthly at 6:35 AM.
      • Reserve Circulation - Monthly at 6:40 AM.
  26. On Demand Reports
  27. Lending Services Reports
      Lists of patrons with fines between $10 and $19.99
      • Student and Alumni fines list - Sorted by either Name, Amount or Notice Date.
      • PALS and Courtesy Patron fines list - Sorted by Name.
      • All other Patron fines list - Sorted by Name.
      Lists of patrons with fines over $19.99
      • Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or Notes.
      • PALS and Courtesy Patron fines list - Sorted by Name.
      • VALE Patron fines list - Sorted by Name.
      • All other Patron fines list - Sorted by Name.
      Lists of patrons with overdues older than 30 days
      • Student and Alumni overdues list - Sorted by either Name, IID or Notes.
      • PALS and Courtesy Patron overdues list - Sorted by Name.
      • All other Patron overdues list except VALE - Sorted by Name.
  28. Lending Services Reports, cont.
      Lists of VALE patrons with overdues older than 6 months
      • VALE patron overdues list - Sorted by Name.
      Miscellaneous Reports
      • Patrons with the word "Collection Agency" or "CA" in their notes.
      • Patrons with the word "FINE" in one of their notes.
      • Patrons with the word "SOILS" in their notes.
      • Patrons with the word "FALL07 SOILS" in their notes.
      • Patrons with the word "HOLD" in their notes.
      • Combined list of HOLD, FINE, and CA.
      Circulation Reports by Item Type from 2003 to the present
      • All Staff.
      • All Colleges
      • Undergraduates by Major.
      • Graduates by Major
      • Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009 and 30-Nov-2009
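The fines lists above amount to filtering patrons by a fine range and sorting by a chosen column. A rough Python sketch of that selection follows. The record layout, the group labels and the sample patrons are all invented for illustration; the real reports run against the Voyager data on the application server, whose queries are not shown in the slides.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class PatronFine(NamedTuple):
    name: str
    group: str       # hypothetical patron-group label, e.g. "Student" or "Alumni"
    amount: Decimal  # total fine balance in dollars


def fines_report(fines, low: Decimal, high: Optional[Decimal] = None,
                 sort_key: str = "name"):
    """Return fines with low <= amount <= high (no upper bound if high is None),
    sorted by the named field."""
    selected = [f for f in fines
                if f.amount >= low and (high is None or f.amount <= high)]
    return sorted(selected, key=lambda f: getattr(f, sort_key))


# Invented sample records, for illustration only
fines = [
    PatronFine("Rivera, A.", "Student", Decimal("12.50")),
    PatronFine("Chen, B.", "Alumni", Decimal("25.00")),
    PatronFine("Okafor, C.", "Student", Decimal("19.99")),
    PatronFine("Smith, D.", "Student", Decimal("4.00")),
]

# "Fines between $10 and $19.99", sorted by amount
mid_tier = fines_report(fines, Decimal("10.00"), Decimal("19.99"), sort_key="amount")
# "Fines over $19.99", sorted by name (the default)
top_tier = fines_report(fines, Decimal("20.00"))
```

Using `Decimal` rather than floats keeps the $19.99 boundary exact, which matters when a patron sits right on the threshold.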
  29. One of Our Projects
      • Mining EZProxy logfiles and linking to patron statistical categories from the Voyager Patron Database
        – What majors and departments are accessing which database services?
        – What majors and departments are accessing the ILL services?
  30. EZProxy via LDAP authenticates access to:
      • Databases
      • Electronic journals
      • ILL/Doc Delivery forms
  31. Example of EZProxy log entry
      • IP address: nj.dhcp.embarqhsd.net
      • (Not used): -
      • User ID: theuser
      • Date/time: 1/1/2008 4:25:15 AM
      • Method: GET
      • Page retrieved: http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
      • Version: HTTP/1.1
      • Response code: 302
      • No. of bytes: 537
      • Referring URL: http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
      • User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
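The fields listed above line up with the NCSA combined log format, so a single regular expression can split one raw line into named fields before it is loaded into a database. The sketch below is an assumption-laden illustration: it assumes the server is configured to write the combined layout, and the sample line is reassembled from the slide's field values with an assumed bracketed timestamp and UTC offset. The actual raw line is not shown in the slides.

```python
import re

# Assumed layout: NCSA combined log format
#   host ident user [time] "method url version" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_entry(line):
    """Split one log line into a dict of fields; return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" byte count means no body was sent
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Sample line rebuilt from the slide's values; timestamp layout and offset are assumed
sample = (
    'nj.dhcp.embarqhsd.net - theuser [01/Jan/2008:04:25:15 -0500] '
    '"GET http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa'
    '&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr HTTP/1.1" '
    '302 537 '
    '"http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)
entry = parse_entry(sample)
```

Lines that do not match return `None` rather than raising, so a monthly load can count and skip malformed entries instead of stopping.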
  32. Patron Privacy and Standards
  33. Using Voyager as the model for Patron Privacy
  34. • Active Circ transactions are stored in a table with patron ID and linked to statistical categories.
      • Completed Circ transactions are moved to another table without the patron ID, but still linked with the patron statistical categories.
      • The Patron Table contains the total counts of transactions for each patron, but no link to which transactions they are.
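The three points above describe a pattern that can be sketched in a few lines: while an item is out, the transaction carries the patron ID; at check-in the row is copied to an archive table without it, and only a running count is kept on the patron record. The Python sketch below uses an in-memory SQLite database purely as a stand-in. The table names, column names and functions are hypothetical and are not Voyager's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables illustrating the pattern, not Voyager's schema
    CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, major TEXT,
                         total_circ INTEGER DEFAULT 0);
    CREATE TABLE active_circ (txn_id INTEGER PRIMARY KEY, patron_id INTEGER,
                              item TEXT, major TEXT);
    CREATE TABLE completed_circ (txn_id INTEGER PRIMARY KEY, item TEXT, major TEXT);
""")


def check_out(conn, patron_id, item):
    """Record an active loan: patron ID plus the patron's statistical category."""
    with conn:
        (major,) = conn.execute(
            "SELECT major FROM patron WHERE patron_id = ?", (patron_id,)).fetchone()
        cur = conn.execute(
            "INSERT INTO active_circ (patron_id, item, major) VALUES (?, ?, ?)",
            (patron_id, item, major))
        return cur.lastrowid


def check_in(conn, txn_id):
    """Archive a loan without the patron ID and bump the patron's running count."""
    with conn:
        patron_id, item, major = conn.execute(
            "SELECT patron_id, item, major FROM active_circ WHERE txn_id = ?",
            (txn_id,)).fetchone()
        conn.execute(
            "INSERT INTO completed_circ (txn_id, item, major) VALUES (?, ?, ?)",
            (txn_id, item, major))
        conn.execute(
            "UPDATE patron SET total_circ = total_circ + 1 WHERE patron_id = ?",
            (patron_id,))
        conn.execute("DELETE FROM active_circ WHERE txn_id = ?", (txn_id,))


conn.execute("INSERT INTO patron (patron_id, major) VALUES (1, 'History')")
txn = check_out(conn, 1, "book-001")
check_in(conn, txn)
```

After check-in, the archive can still answer "how many loans by History majors?" but can no longer answer "which patron borrowed this item?".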
  35. MySQL operations
      • Extract patron statistical categories out of Voyager and build them into the MySQL database.
      • EZProxy transactions would be stored in one table and linked to patron statistical categories via the user ID.
      • Once completed, user ids are deleted.
      • Logs are collected monthly and loaded and deleted monthly.
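The load, link and delete cycle described above could look roughly like the following Python sketch. It uses an in-memory SQLite database as a stand-in for MySQL so the example is self-contained; the table names, column names, the `load_month` function and the sample user IDs are all hypothetical and not taken from the actual system.

```python
import sqlite3


def load_month(conn, stat_rows, log_rows):
    """Load one month of hits, attach each user's major, then drop the user IDs."""
    with conn:
        # 1. Refresh the statistical categories extracted from the patron database
        conn.executemany(
            "INSERT OR REPLACE INTO patron_stats (user_id, major) VALUES (?, ?)",
            stat_rows)
        # 2. Load the parsed log transactions
        conn.executemany(
            "INSERT INTO ezproxy_hits (user_id, service) VALUES (?, ?)", log_rows)
        # 3. Link each transaction to a statistical category via the user ID
        conn.execute("""
            UPDATE ezproxy_hits
               SET major = (SELECT major FROM patron_stats
                             WHERE patron_stats.user_id = ezproxy_hits.user_id)
             WHERE user_id IS NOT NULL""")
        # 4. Once linked, delete the user IDs from the transactions
        conn.execute("UPDATE ezproxy_hits SET user_id = NULL")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patron_stats (user_id TEXT PRIMARY KEY, major TEXT);
    CREATE TABLE ezproxy_hits (user_id TEXT, service TEXT, major TEXT);
""")
load_month(
    conn,
    [("user_a", "History"), ("user_b", "Psychology")],            # invented sample data
    [("user_a", "ILL"), ("user_b", "ILL"), ("user_a", "Databases")],
)
counts = conn.execute(
    "SELECT major, service, COUNT(*) FROM ezproxy_hits "
    "GROUP BY major, service ORDER BY major, service").fetchall()
```

Running the whole cycle inside one transaction (`with conn:`) means a failed load leaves no half-linked rows that still carry user IDs.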
  36. Slide removed for Privacy Reasons
  37. Slide removed for Privacy Reasons
  38. ILL request form authentications by major

      Article                                 Book
      Count  Major                            Count  Major
         62  M- Psychology                       90  M- History
         60  M- Sociology                        28  M- Non-Degree
         42  M- Applied Clinical Psych           25  M- Pub Pol & Intl Affairs
         35  M- Education                        20  M- Spanish
         31  M- History                          18  M- English
         30  M- Spanish                          16  M- Undecided
         29  M- Nursing                          14  M- Art
         19  M- Communication Disorders          14  M- Education
         19  M- Communication                    11  M- Sociology
         14  M- Biotechnology                    10  M- Biology
         14  M- Counseling                        9  M- Music
         14  M- English                           9  M- Special Programs
         12  M- Non-Degree                        8  M- Psychology
         10  M- Community/Sch Health              7  M- Biotechnology
          7  M- Biology                           7  M- Political Science
          7  M- Political Science                 6  M- Anthropology
          6  M- Undecided                         6  M- Music - Jazz Studies
          5  M- Comm Media Studies                4  M- Business
          5  M- Reading                           4  M- Communication
          4  M- Business                          4  M- Nursing
  40. Reporting and Standards
      • Reporting
        – Emailed periodically - e.g., daily dossiers, and other event-triggered reports.
        – On demand, via email, web pages or a printer.
      • Standards
        – Share data for comparative research.
        – Groups of libraries and consortia
  44. Further Reading
      • Coombs, Karen A. (2005). Lessons learned from analyzing library database usage data. Library Hi Tech, 23:4, 598.
      • Diana, Birkin James. dashboard_beta. http://library.brown.edu/dashboard/info/
      • Metridoc. http://code.google.com/p/metridoc/
      • Morton-Owens, Emily (2011). Trends at a glance. LITA 2011. http://connect.ala.org/files/79651/trends_at_a_glance_dashboards_pdf_12068.pdf
  45. Questions?
      Ray Schwartz, Systems Specialist Librarian
      Cheng Library, William Paterson University, Wayne, New Jersey, USA
      schwartzr2 @ wpunj.edu