Crushing, Blending, and Stretching Data

Data Warehousing and Mining Data from Library and University Information Systems for Assessment of Library Operations: A Case Study in Progress

Usage Rights: CC Attribution License

  • In the end, we are developing various types of reports to support the management of library services. Many of the reports are emailed periodically (e.g., daily dossiers and event-triggered reports). Other reports are produced on demand, with output delivered by email, web page, or printer. However, standards are needed to share data for comparative research. It is important to work with other groups of libraries and consortia to develop the necessary data-sharing standards and to comply with them.

Presentation Transcript

  • Crushing, Blending, and Stretching Data: Data Warehousing and Mining Data from Library and University Information Systems for Assessment of Library Operations: A Case Study in Progress
    École des sciences de l'information, Rabat, Morocco, Monday, April 13, 2009
    Ray Schwartz, Systems Specialist Librarian, Cheng Library, William Paterson University, Wayne, New Jersey, USA
    schwartzr2 @ wpunj.edu
  • Outline
    • Why Assessment and Why Now?
    • What Are Data Mining and Data Warehousing, and Why Do We Do Them?
    • Our Library and University
    • Groups and Services
    • Steps
    • Reporting
  • Have We Always Assessed?
    • Anecdotally: yes.
    • Systematically: not usually.
      – Large-scale assessment of manual systems (such as serials check-in, card catalogs, and circulation files) is not practical.
      – Smaller-scale, directed assessment is possible.
  • What has changed since the days of manual systems?
  • For many institutions in the West, the Integrated Library System (ILS) has been in use for over 20 years.
  • Larger-scale assessment is now possible with electronic systems:
    – Counts of circulation transactions
    – Fund codes for purchases of library materials
  • Reports from vendor services:
    – Bibliographic utilities
    – Subscription agents
    – Book jobbers
  • What Is Different Now?
    • New services have come into existence.
      – Inside libraries
        • Full-Text Databases
        • Link Resolvers
      – Outside of libraries
        • Google
        • Amazon
  • What Are Data Mining and Data Warehousing?
    • Extracting data from legacy systems and other resources;
    • cleaning, scrubbing and preparing data for decision support;
    • maintaining data in appropriate data stores;
    • accessing and analysing data using a variety of end-user tools;
    • and mining data for significant relationships.
    Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
  • The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining.
    Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
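The extract, clean, store, access, and mine steps listed above can be sketched in a few lines. This is a minimal illustration in Python, assuming a hypothetical pipe-delimited export (`RAW_EXPORT`) and an in-memory SQLite database standing in for the data store; the field layout and the functions `extract`, `clean`, and `load` are illustrative and are not part of any system named in these slides.

```python
import sqlite3

# Hypothetical legacy export, one record per line: patron_id|major|checkout_date
RAW_EXPORT = """\
1001| history |2008-02-04
1002|Psychology|2008-02-05
1003||2008-02-05
1001|HISTORY|2008-02-04
bad line
"""


def extract(text):
    """Extract: split each non-blank line of the export into raw fields."""
    return [line.split("|") for line in text.splitlines() if line.strip()]


def clean(rows):
    """Clean/scrub: drop malformed or incomplete rows, normalize case, drop duplicates."""
    seen = set()
    for fields in rows:
        if len(fields) != 3 or not all(f.strip() for f in fields):
            continue
        record = (int(fields[0]), fields[1].strip().title(), fields[2].strip())
        if record not in seen:
            seen.add(record)
            yield record


def load(records):
    """Store: load the prepared records into a database (in-memory SQLite here)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE circ (patron_id INTEGER, major TEXT, checkout_date TEXT)")
    db.executemany("INSERT INTO circ VALUES (?, ?, ?)", records)
    return db


db = load(clean(extract(RAW_EXPORT)))
# Access/analyse: a simple decision-support query over the cleaned data
counts = dict(db.execute("SELECT major, COUNT(*) FROM circ GROUP BY major"))
print(counts)
```

Of the five raw lines, only two survive cleaning: the blank major, the duplicate "HISTORY" row, and the malformed line are all dropped before anything reaches the store.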
  • Of course, there are many ways to measure: Scott Nicholson's Measurement Model
  • Measurement Matrix with Methodologies
    Topic: Library System or Use. Perspective: Internal (Library System) or External (User).
    • Internal perspective
      – Library System: Procedures and Standards (staff survey and interviews; audits of collections, systems, or staff)
      – Use: Recorded interactions with interface & materials (bibliomining; transaction/web log analysis; observation of user behavior)
    • External perspective
      – Library System: Usability (effectiveness of the system for the staff and institution; how useful is the library system?; focus groups)
      – Use: Knowledge states and citations to materials (user citation tracking)
    Nicholson, Scott (2004). A conceptual framework for the holistic measurement and cumulative evaluation of library services. Journal of Documentation 60(2), pp. 164-181.
  • Our University
    • 9,000 undergraduates
    • 1,000 graduate students (mostly education majors)
    • 400 faculty
    • 800 adjuncts
    • 1,000 staff
  • Our Library
    • 19 librarians and 26 library staff
    • 350,000 volumes
    • 18,000 audiovisual items
    • 22,000 print and electronic periodicals
    • 100 general and subject-specific databases
  • Our Systems since 2005
    • Voyager ILS
    • Online Periodical Database (OPD)
    • Clio ILL Software
    • EZProxy Server
    • Banner – University ERP
    • University Networked Drive K:
    • University Email Server
    • University Web Server
  • Systems Chart – ca. 2005 (diagram of the library and university systems, vendor services, and the data relationships among them; not reproduced)
  • Vendor Services
    • Serials Solutions
    • OCLC – Bibliographic Utility
    • Blackwell – Book Jobber
    • Ebsco – Subscription Agent
    • Marcive – Authority Control
    • Database Vendors
  • The Question: Which categories of patrons are accessing which services?
  • First Step – Patron Statistical Categories
  • The Voyager Patron Database allows a maximum of 10 statistical categories per patron record.
  • Decide which statistical categories are needed for each defined patron group.
  • Work with your University Information Systems Department to extract the relevant data from the appropriate sources.
  • Groups and Services
    • Groups
      – Major
      – Status: Undergrad or Grad; Faculty, Adjunct Faculty or Staff
      – Department
      – College
      – Degree
      – No. of Credits
      – Year of Study
      – Campus Location
    • Services
      – Circulation: Books, Media, Reserve, By Fund Code, Location
      – ILL / Document Delivery
      – Databases
      – Library Web Pages: Subject Area Resource Guides, Reference Requests
      – Catalog
      – Other Vendor Services: Serials Solutions
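One way to plan the categories before loading them into Voyager is to write the plan down per patron group and check it against the 10-category limit. The Python sketch below assumes a hypothetical `CATEGORY_PLAN` built from the group attributes on the Groups and Services slide; the category names are illustrative and no Voyager API is involved.

```python
# Voyager's limit on statistical categories per patron record (from the slide)
MAX_STAT_CATEGORIES = 10

# Hypothetical plan: which categories to record for each patron group, drawn from
# the "Groups and Services" list (major, status, department, college, degree, ...)
CATEGORY_PLAN = {
    "Undergraduate": ["major", "status", "college", "degree", "credits",
                      "year_of_study", "campus_location"],
    "Graduate": ["major", "status", "college", "degree", "credits", "campus_location"],
    "Faculty": ["department", "status", "college"],
    "Adjunct": ["department", "status", "college"],
    "Staff": ["department", "status"],
}


def validate_plan(plan, limit=MAX_STAT_CATEGORIES):
    """Return {group: problem} for groups that exceed the limit or repeat a category."""
    problems = {}
    for group, cats in plan.items():
        if len(cats) > limit:
            problems[group] = f"{len(cats)} categories (limit {limit})"
        elif len(set(cats)) != len(cats):
            problems[group] = "duplicate category"
    return problems


def categories_for(group, values, plan=CATEGORY_PLAN):
    """Category values to store on one patron record, in plan order ('' if unknown)."""
    return [values.get(name, "") for name in plan[group]]


print(validate_plan(CATEGORY_PLAN))  # {}
print(categories_for("Staff", {"department": "Library", "status": "Staff"}))  # ['Library', 'Staff']
```

Keeping the plan in one place makes it easy to see how many of the 10 slots each group still has free before new categories are requested from the University Information Systems Department.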
  • History Department, 12 months, Feb. 2008
    Patron status | Book circ | Media circ | Equip circ | Total circ | Members | Borrowers | % borrowing | Circ/member | Circ/borrower
    Undergraduate students | 2,715 | 250 | 698 | 3,663 | 238 | 186 | 78% | 15.39 | 19.69
    Graduate students | 419 | 13 | 76 | 508 | 14 | 13 | 93% | 36.29 | 39.08
    Adjunct faculty | 100 | 65 | 20 | 185 | 32 | 20 | 63% | 5.78 | 9.25
    Full-time faculty | 159 | 115 | 194 | 468 | 24 | 23 | 96% | 19.50 | 20.35
    History totals | 3,393 | 443 | 988 | 4,824 | 308 | 242 | 79% | 15.66 | 19.93
    Library totals | 23,370 | 8,713 | 20,703 | 52,756 | 7,418 | 4,981 | 67% | 7.11 | 10.59
    Definitions:
      – Book circulation = books, book disks, maps, oversize, curriculum materials, reserve books, NJ History, Leisure Lounge
      – Media circulation = audio & video materials, including media reserves
      – Equipment circulation = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
      – Member = declared major or department member
      – Borrower = any member who borrowed materials
      – Library totals = declared undergrad & grad majors, adjuncts & full-time faculty borrowers
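The computed columns in the table follow from the raw counts: % borrowing is borrowers divided by members, and the two circulation ratios divide total circulation by members and by borrowers. The Python sketch below recomputes the History rows from the counts shown. It rounds half-up with `decimal` because Python's built-in `round` rounds halves to even and would report the adjunct figure (62.5%) as 62% rather than the 63% in the table; the function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def ratio(numerator, denominator, places):
    """numerator / denominator rounded half-up to `places` decimals (62.5 -> 63)."""
    q = Decimal(numerator) / Decimal(denominator)
    return q.quantize(Decimal(10) ** -places, rounding=ROUND_HALF_UP)


def summarize(book, media, equip, members, borrowers):
    """Derive the table's computed columns from the raw counts."""
    total = book + media + equip
    return {
        "total_circ": total,
        "pct_borrowing": ratio(100 * borrowers, members, 0),  # borrowers / members
        "circ_per_member": ratio(total, members, 2),
        "circ_per_borrower": ratio(total, borrowers, 2),
    }


# (book, media, equipment, members, borrowers) from the History Department table
ROWS = {
    "Undergraduate students": (2715, 250, 698, 238, 186),
    "Graduate students": (419, 13, 76, 14, 13),
    "Adjunct faculty": (100, 65, 20, 32, 20),
    "Full-time faculty": (159, 115, 194, 24, 23),
}
# History totals are the column sums of the four rows
totals = tuple(sum(col) for col in zip(*ROWS.values()))

for status, counts in ROWS.items():
    print(status, summarize(*counts))
print("History totals", summarize(*totals))
```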
  • Problems with the Configuration of Services
    • Little to no linkage of data
    • Need to search multiple services to get a complete picture of serial holdings
    • Multiple user IDs for authentication
  • Retirement of the OPD
    • Serials holdings data was extracted from the OPD and added to the Voyager catalog.
    • From the Voyager catalog, serials holdings data is extracted and added to the Serials Solutions A to Z list.
  • Authentication of the ILL form is routed through the EZProxy server.
  • A web bug is placed in the microform request page to record each submission in the Voyager web server's log file.
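Counting web-bug hits comes down to filtering the web server log for requests to the bug's URL. The Python sketch below assumes the server writes Common Log Format and that the bug is an image at a hypothetical path `/images/microform_bug.gif`; neither detail is given in the slides, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical path of the "web bug" image embedded in the microform request page
WEB_BUG_PATH = "/images/microform_bug.gif"

# Common Log Format: host ident user [dd/Mon/yyyy:hh:mm:ss zone] "METHOD path PROTOCOL" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')


def count_submissions(lines, bug_path=WEB_BUG_PATH):
    """Count fetches of the web bug per day; each fetch marks one form submission."""
    per_day = Counter()
    for line in lines:
        m = CLF.match(line)
        # Ignore any query string (e.g. a cache-busting parameter) when comparing paths
        if m and m.group(4).split("?")[0] == bug_path and m.group(5) in ("200", "304"):
            per_day[m.group(2)] += 1
    return per_day


sample = [
    '10.0.0.5 - - [03/Mar/2008:10:15:02 -0500] "GET /images/microform_bug.gif HTTP/1.1" 200 43',
    '10.0.0.7 - - [03/Mar/2008:11:02:40 -0500] "GET /images/microform_bug.gif?t=1 HTTP/1.1" 304 -',
    '10.0.0.7 - - [03/Mar/2008:11:02:41 -0500] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.9 - - [04/Mar/2008:09:00:00 -0500] "GET /images/microform_bug.gif HTTP/1.1" 200 43',
]
print(count_submissions(sample))  # Counter({'03/Mar/2008': 2, '04/Mar/2008': 1})
```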
  • New Services Added
    • Serials Solutions MARC Record Service
    • Serials Solutions Link Resolver
    • OCLC WorldCat Collection Analysis
  • Second Step – Set Up an Application Server
  • Our Systems in 2008
    • Voyager ILS
    • Shared Application Server
    • Clio ILL Software
    • EZProxy Server
    • Banner – University ERP
    • University Networked Drive K:
    • University Email Server
    • University Web Server
  • Systems Chart – 2008 (diagram of the library and university systems, now including the application server, vendor services, and the data relationships among them; not reproduced)
  • What Is an Application Server?
    • "A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database." From http://www.webtools.ca.gov/help/Glossary.asp
    • Web server software (Apache or IIS)
    • Database management system, DBMS (MySQL, Oracle, MS SQL Server)
    • Scripting language (Perl, PHP, ColdFusion, ASP)
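The three components can be illustrated with Python's standard library: SQLite stands in for the DBMS, a WSGI function plays the scripting-language role, and a web server (Apache or IIS in the slide; the `wsgiref` development server here) calls that function for each request. This is a minimal sketch with a hypothetical `db_hits` table and sample rows, not the library's actual configuration.

```python
import sqlite3
from html import escape
from wsgiref.util import setup_testing_defaults

# The DBMS role: an in-memory SQLite table (hypothetical sample rows)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE db_hits (host TEXT, hits INTEGER)")
db.executemany("INSERT INTO db_hits VALUES (?, ?)",
               [("ebscohost.com", 120), ("lexis-nexis.com", 45)])


def app(environ, start_response):
    """The scripting role: a WSGI app that builds an HTML page from database rows per request."""
    rows = db.execute("SELECT host, hits FROM db_hits ORDER BY hits DESC").fetchall()
    items = "".join(f"<li>{escape(host)}: {hits}</li>" for host, hits in rows)
    body = f"<html><body><h1>Database hits</h1><ul>{items}</ul></body></html>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Call the app the way a web server would, without opening a network socket
environ = {}
setup_testing_defaults(environ)
statuses = []
page = b"".join(app(environ, lambda status, headers: statuses.append(status)))
print(statuses[0])
print(page.decode("utf-8"))
```

To serve the page locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` runs the same function behind the standard library's development server. Each request re-queries the database, which is the "dynamic creation of a webpage" the glossary definition describes.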
  • Why an Application Server?
    • Relevant data in log files must be loaded into a database before it can be analyzed.
    • You need your own DBMS to create new tables and queries.
  • Decide how you will use the application server.
  • Decide on the best and most feasible configuration.
  • One of Our Projects
    • Mining EZProxy logfiles and linking to patron statistical categories from the Voyager Patron Database
      – What majors and departments are accessing which database services?
      – What majors and departments are accessing the ILL services?
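Answering "which majors use which database services" is a join between the loaded EZProxy hits and the patron categories, grouped by major and host, as in the By Major and Host report. The Python/SQLite sketch below uses hypothetical table names (`ezproxy_hits`, `patron_major`) and invented sample rows. In the slides, the link between EZProxy user IDs and patrons is made through an email-prefix table built from Voyager's PATRON_ADDRESS.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- One row per EZProxy request: the user id and the database host requested
    CREATE TABLE ezproxy_hits (usr TEXT, host TEXT);
    -- Each user id linked to the patron's major (a statistical category)
    CREATE TABLE patron_major (usr TEXT PRIMARY KEY, major TEXT);
""")
db.executemany("INSERT INTO ezproxy_hits VALUES (?, ?)", [
    ("asmith", "ebscohost.com"), ("asmith", "ebscohost.com"),
    ("bjones", "lexis-nexis.com"), ("bjones", "ebscohost.com"),
    ("cdoe", "ebscohost.com"), ("guest", "csa.com"),
])
db.executemany("INSERT INTO patron_major VALUES (?, ?)", [
    ("asmith", "M- Nursing"), ("bjones", "M- Business"), ("cdoe", "M- Nursing"),
])

# Hits per (major, host), most-used first, in the report's Major / Count / Host layout
report = db.execute("""
    SELECT p.major, COUNT(*) AS cnt, h.host
    FROM ezproxy_hits AS h
    JOIN patron_major AS p ON p.usr = h.usr
    GROUP BY p.major, h.host
    ORDER BY cnt DESC, p.major, h.host
""").fetchall()

for major, cnt, host in report:
    print(f"{major:<15} {cnt:>5}  {host}")
```

Hits from users with no matching patron row (here `guest`) drop out of the inner join. A `LEFT JOIN` would keep them under a NULL major, which is useful for measuring how much traffic could not be linked.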
  • Systems Chart – 2008, annotated with "ILL Collection and Patron Group Analyses" and "Off Campus Database Hits by Patron Group" (diagram not reproduced)
  • ILL request form authentications by major, academic year 07/08
    Article count | Major | Book count | Major
    62 | M- Psychology | 90 | M- History
    60 | M- Sociology | 28 | M- Non-Degree
    42 | M- Applied Clinical Psych | 25 | M- Pub Pol & Intl Affairs
    35 | M- Education | 20 | M- Spanish
    31 | M- History | 18 | M- English
    30 | M- Spanish | 16 | M- Undecided
    29 | M- Nursing | 14 | M- Art
    19 | M- Communication Disorders | 14 | M- Education
    19 | M- Communication | 11 | M- Sociology
    14 | M- Biotechnology | 10 | M- Biology
    14 | M- Counseling | 9 | M- Music
    14 | M- English | 9 | M- Special Programs
    12 | M- Non-Degree | 8 | M- Psychology
    10 | M- Community/Sch Health | 7 | M- Biotechnology
    7 | M- Biology | 7 | M- Political Science
    7 | M- Political Science | 6 | M- Anthropology
    6 | M- Undecided | 6 | M- Music - Jazz Studies
    5 | M- Comm Media Studies | 4 | M- Business
    5 | M- Reading | 4 | M- Communication
    4 | M- Business | 4 | M- Nursing
  • Which Databases Are Accessed by Majors and Departments? (07/29/08)
  • By Major and Host
    Major | Count | Host
    M- Nursing | 3377 | ebscohost.com
    M- Non-Degree | 3010 | ebscohost.com
    M- Psychology | 2303 | ebscohost.com
    M- Counseling | 1487 | ebscohost.com
    M- Communication | 1359 | ebscohost.com
    M- Education | 1267 | ebscohost.com
    M- Business | 1246 | proquest.umi.com
    M- Sociology | 1152 | ebscohost.com
    M- Business | 1145 | lexis-nexis.com
    M- Undecided | 1100 | ebscohost.com
    M- Applied Clinical Psych | 1075 | ebscohost.com
    M- English | 1034 | ebscohost.com
    M- Sociology | 916 | csa.com
    M- Business | 794 | ebscohost.com
    M- Accounting | 738 | lexis-nexis.com
    M- Reading | 683 | ebscohost.com
    M- Physical Education | 653 | ebscohost.com
    M- Special Programs | 600 | ebscohost.com
    M- Non-Degree | 463 | ereserve.wpunj.edu
  • By Dept and Host
    Department | Count | Host
    S- Information Systems | 933 | webscript.exe?fs.scr
    S- Psychology Dept. | 742 | ebscohost.com
    S- Accounting and Law | 559 | lexis-nexis.com
    S- Political Sci Dept. | 308 | lexis-nexis.com
    S- Nursing Dept. | 204 | ebscohost.com
    S- Market & Mgt. Dept. | 175 | proquest.umi.com
    S- Library | 167 | ebscohost.com
    S- Sociology Dept. | 151 | ebscohost.com
    S- Sociology Dept. | 134 | csa.com
    S- History Dept. | 121 | serials.abc-clio.com
    S- Exercise & Mov Sci | 110 | ebscohost.com
    S- Political Sci Dept. | 104 | ebscohost.com
    S- Library | 103 | ILL_article.cfm
    S- Library | 100 | webscript.exe?fs.scr
    S- History Dept. | 94 | webscript.exe?fs.scr
  • By Dept and Service
    Department | Count | Service
    S- Information Systems | 933 | http://www.wpunj.edu/scripts/webscript.exe?fs.scr
    S- Accounting and Law | 549 | http://www.lexis-nexis.com/universe
    S- Psychology Dept. | 364 | http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=psych
    S- Nursing Dept. | 114 | http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=c8h
    S- Sociology Dept. | 96 | http://www.csa.com/htbin/dbrng.cgi?&db=socioabs-set-c&adv=1
    S- Sociology Dept. | 75 | http://search.ebscohost.com/login.asp?profile=asp
    S- Philosophy Dept. | 74 | http://webspirs4.silverplatter.com:8900/c119646?sp.form.first.p=srchmain.htm&sp.dbid.p=S(PHIL
    S- Library | 65 | http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=asp
    S- Anthropology Dept. | 62 | http://www.sciencedirect.com/
    S- History Dept. | 61 | http://serials.abc-clio.com/active/start?_appname=serials&initialdb=AHL
    S- Psychology Dept. | 61 | http://search.ebscohost.com/login.asp?profile=psyart
    S- History Dept. | 58 | http://serials.abc-clio.com/active/start?_appname=serials&initialdb=HA
    S- Psychology Dept. | 54 | http://search.ebscohost.com/login.asp?profile=psych
    S- Psychology Dept. | 42 | http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=psyart
    S- English Dept. | 42 | http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=mzh
  • IP Address Location = 149.151.VlanID.*
    Admin VLAN ID | Admin VLAN name | Labs VLAN ID | Labs VLAN name
    2 | Servers | 3 | Lab Servers
    4 | Admin | 9 | Imaging
    5 | Science | 160 | Lib Labs
    6 | Test Servers | 174 | STU VPN
    7 | NAS | 175 | Ben Shahn Lab
    101 | Energy Management | 178 | Hobart Lab
    102 | Diebold | 179 | SCI Lab
    104 | Xerox | 187 | CS Lab
    150 | Media Services | 192 | Atrium
    161 | Dorms Offices | 209 | Labs
    162 | RBI | 212 | Resnet Labs
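Because the third octet of a campus address is the VLAN ID, an address from the EZProxy or web server logs can be mapped to a location with a simple lookup. A minimal Python sketch using the standard `ipaddress` module and the VLAN IDs from the table above; treating non-numeric hosts (the sample EZProxy entry records a resolved host name) and non-campus addresses as off-campus is an assumption, as is the "Unlisted" label for VLANs not in the table.

```python
import ipaddress

CAMPUS_NET = ipaddress.ip_network("149.151.0.0/16")  # 149.151.VlanID.*

ADMIN_VLANS = {2: "Servers", 4: "Admin", 5: "Science", 6: "Test Servers", 7: "NAS",
               101: "Energy Management", 102: "Diebold", 104: "Xerox",
               150: "Media Services", 161: "Dorms Offices", 162: "RBI"}
LAB_VLANS = {3: "Lab Servers", 9: "Imaging", 160: "Lib Labs", 174: "STU VPN",
             175: "Ben Shahn Lab", 178: "Hobart Lab", 179: "SCI Lab", 187: "CS Lab",
             192: "Atrium", 209: "Labs", 212: "Resnet Labs"}


def vlan_for(host):
    """Return (vlan_group, vlan_id, vlan_name) for a campus IPv4 address, else None."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None  # not a numeric address (e.g. a resolved ISP host name)
    if addr not in CAMPUS_NET:
        return None  # off-campus
    vlan_id = addr.packed[2]  # third octet of 149.151.VlanID.*
    if vlan_id in ADMIN_VLANS:
        return ("Admin", vlan_id, ADMIN_VLANS[vlan_id])
    if vlan_id in LAB_VLANS:
        return ("Labs", vlan_id, LAB_VLANS[vlan_id])
    return ("Unlisted", vlan_id, None)


print(vlan_for("149.151.160.23"))  # ('Labs', 160, 'Lib Labs')
print(vlan_for("nj.dhcp.embarqhsd.net"))  # None
```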
  • Some Concerns: Patron Privacy and Standards
  • Using Voyager as the Model for Patron Privacy
  • Active circulation transactions are stored in a table with the patron ID and statistical categories.
  • Completed circulation transactions are stored in a table without the patron ID, but still with the patron statistical categories.
  • The Patron table contains the total count of transactions for each patron, but no link to the individual transactions.
  • EZProxy transactions would be stored in one table with patron statistical categories, but without the user ID.
  • User IDs would be stored in another table with counts for each service, divided by academic year.
  • Logs are collected, loaded, and deleted monthly.
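The proposed EZProxy storage can be sketched as splitting each event into two records: one with statistical categories but no user ID, and one per-user count with no timestamp. This Python sketch is illustrative only; the July 1 academic-year boundary and the `"Unknown"` category for unmatched users are assumptions, and the monthly deletion of the raw logs after loading is not shown.

```python
from collections import Counter
from datetime import datetime


def academic_year(ts):
    """Label like '07/08'. ASSUMPTION: the academic year starts July 1 (not stated on the slide)."""
    start = ts.year if ts.month >= 7 else ts.year - 1
    return f"{start % 100:02d}/{(start + 1) % 100:02d}"


def split_for_privacy(events, categories):
    """Split EZProxy events into an anonymous transaction table and per-user yearly counts.

    events:     iterable of (user_id, service, timestamp)
    categories: user_id -> tuple of that patron's statistical categories
    """
    transactions = []     # statistical categories + service + time, but no user id
    per_user = Counter()  # (user_id, service, academic year) -> count, but no timestamps
    for user_id, service, ts in events:
        cats = categories.get(user_id, ("Unknown",))
        transactions.append((*cats, service, ts.isoformat()))
        per_user[(user_id, service, academic_year(ts))] += 1
    return transactions, per_user


events = [
    ("theuser", "ebscohost.com", datetime(2008, 1, 1, 4, 25, 15)),
    ("theuser", "ebscohost.com", datetime(2008, 3, 2, 14, 0, 0)),
    ("other", "lexis-nexis.com", datetime(2008, 9, 15, 10, 30, 0)),
]
categories = {"theuser": ("M- History", "Undergrad"), "other": ("M- Business", "Grad")}
transactions, per_user = split_for_privacy(events, categories)
print(transactions[0])  # ('M- History', 'Undergrad', 'ebscohost.com', '2008-01-01T04:25:15')
print(per_user)
```

As with Voyager's completed circulation table, the transaction rows can still be grouped by major or status, but nothing in them leads back to an individual patron.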
  • Example of an EZProxy log entry
    • IP address: nj.dhcp.embarqhsd.net
    • (Not used): -
    • User ID: theuser
    • Date/time: 1/1/2008 4:25:15 AM
    • Method: GET
    • Page retrieved: http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
    • Version: HTTP/1.1
    • Response code: 302
    • No. of bytes: 537
    • Referring URL: http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
    • User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
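The fields listed above are the standard combined log format fields. A Python sketch that parses one line into those named fields; the raw line is a reconstruction of the example entry, and its time-zone offset (`-0500`) is a guess because the slide shows only a local date and time.

```python
import re
from datetime import datetime

# Reconstructed raw form of the example entry (combined log format; -0500 is assumed)
LINE = ('nj.dhcp.embarqhsd.net - theuser [01/Jan/2008:04:25:15 -0500] '
        '"GET http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa'
        '&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr HTTP/1.1" 302 537 '
        '"http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"')

COMBINED = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) (?P<bytes>-|\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$')


def parse(line):
    """Return a dict of the log fields, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = None if entry["bytes"] == "-" else int(entry["bytes"])  # "-" means no body
    return entry


entry = parse(LINE)
print(entry["user"], entry["status"], entry["bytes"], entry["time"].isoformat())
```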
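  • Perl script for loading the EZProxy log into MySQL

    #!/usr/bin/perl
    # Convert EZProxy log lines (combined log format) into MySQL INSERT statements.
    # Usage: perl ezproxy2sql.pl ezproxy.log > ezproxy.sql
    use strict;
    use warnings;

    # Quoted strings: bare 08 and 09 are illegal octal literals in Perl.
    my %month = (Jan=>'01', Feb=>'02', Mar=>'03', Apr=>'04', May=>'05', Jun=>'06',
                 Jul=>'07', Aug=>'08', Sep=>'09', Oct=>'10', Nov=>'11', Dec=>'12');

    # host ident user [dd/Mon/yyyy:hh:mm:ss zone] "method url protocol" status bytes "referer" "agent"
    my $pattern = qr{^(\S*) (\S*) (\S*) \[(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) [^\]]*\] "(\S*) (\S*) (\S*)" (\d+) (-|\d+) "([^"]*)" "([^"]*)"};

    # Quote a value as a MySQL string literal.
    sub sqlstr {
        my ($s) = @_;
        $s =~ s/\\/\\\\/g;   # MySQL treats backslash as an escape character
        $s =~ s/'/''/g;      # double any single quotes
        return "'$s'";
    }

    while (my $line = <>) {
        if (my ($host, $ident, $user, $dd, $mon, $yyyy, $hh, $mi, $ss,
                $method, $url, $proto, $status, $bytes, $ref, $agent) = $line =~ $pattern) {
            my @values = (sqlstr($host), sqlstr($ident), sqlstr($user),
                          "TIMESTAMP '$yyyy-$month{$mon}-$dd $hh:$mi:$ss'",
                          sqlstr($method), sqlstr($url), sqlstr($proto),
                          $status, ($bytes eq '-' ? 'NULL' : $bytes),
                          sqlstr($ref), sqlstr($agent));
            print 'INSERT INTO ezproxylogs VALUES (', join(',', @values), ");\n";
        } else {
            print "-- Skipped line $.\n";
        }
    }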
  • Created a table to assist with the linking

    SELECT PATRON_ADDRESS.ADDRESS_TYPE,
           Left([ADDRESS_LINE1], InStr([ADDRESS_LINE1], "@") - 1) AS usr,
           PATRON_ADDRESS.PATRON_ID,
           PATRON_ADDRESS.ADDRESS_STATUS,
           PATRON_ADDRESS.EFFECT_DATE,
           PATRON_ADDRESS.EXPIRE_DATE,
           PATRON_ADDRESS.MODIFY_DATE,
           PATRON_ADDRESS.MODIFY_OPERATOR_ID
    INTO emailprefix
    FROM PATRON_ADDRESS
    WHERE (((PATRON_ADDRESS.ADDRESS_TYPE)="3"));
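The query keeps the part of each email address (address type "3") before the `@`, so it can be matched against the user IDs recorded by EZProxy. The same idea in Python, with hypothetical sample rows shaped like the PATRON_ADDRESS columns used above. Unlike the Access expression, it skips addresses without an `@` instead of passing a length of -1 to `Left` (InStr returns 0 when there is no match).

```python
# Rows shaped like the PATRON_ADDRESS columns used in the query (hypothetical values)
patron_address = [
    {"PATRON_ID": 101, "ADDRESS_TYPE": "3", "ADDRESS_LINE1": "theuser@example.edu"},
    {"PATRON_ID": 101, "ADDRESS_TYPE": "1", "ADDRESS_LINE1": "123 Main St"},
    {"PATRON_ID": 102, "ADDRESS_TYPE": "3", "ADDRESS_LINE1": "no-at-sign"},
]


def email_prefix(address):
    """Text before the first '@' (like Left(x, InStr(x, "@") - 1)); None if there is no '@'."""
    local, sep, _ = address.partition("@")
    return local if sep else None


def build_emailprefix(rows):
    """Map email prefix -> PATRON_ID for email addresses (ADDRESS_TYPE "3")."""
    table = {}
    for row in rows:
        if row["ADDRESS_TYPE"] != "3":
            continue
        usr = email_prefix(row["ADDRESS_LINE1"])
        if usr:  # skip addresses with no '@' or nothing before it
            table[usr] = row["PATRON_ID"]
    return table


emailprefix = build_emailprefix(patron_address)
# Link an EZProxy user id (e.g. "theuser" from the log example) to a patron record
print(emailprefix.get("theuser"))  # 101
```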
  • Reporting and Standards
    • Reporting
      – Emailed periodically, e.g., daily dossiers and other event-triggered reports
      – On demand, via email, web pages, or a printer
    • Standards
      – Share data for comparative research
      – Work with groups of libraries and consortia
  • Questions?
    Ray Schwartz, Systems Specialist Librarian, Cheng Library, William Paterson University, Wayne, New Jersey, USA
    schwartzr2 @ wpunj.edu