Crushing, Blending, and Stretching Data
1. Crushing, Blending, and
Stretching Data
Data Warehousing and Mining Data
from Voyager and Other Library
and University Systems for
Assessment of Library Operations
ELUNA Conference 2008, Long Beach, CA,
Friday, August 1, 2008
Ray Schwartz,
Systems Specialist Librarian
Cheng Library, William Paterson University,
Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu
2. Outline
• Why Assessment and Why Now?
• What is Data Mining and Data Warehousing and Why Do We Do It?
• Our Context
• Groups and Services
• Steps
• Reporting
3. Outline
• What is Data Mining and Data Warehousing?
• Our Context
• Groups and Services
• Steps
• Reporting
4. Have We Always Assessed?
• Anecdotally: Yes.
• Systematically: Not usually.
  – Large-scale assessment of manual systems (such as serials check-in, card catalogs, and circulation files) is not practical.
  – Smaller-scale, directed assessment is possible.
5. What changed since the days of manual systems?
• For many institutions in the West, the Integrated Library System has been in use for over 20 years.
• Larger scale assessment is now possible with the electronic systems.
8. What is different now?
• New services have come into existence.
  – Inside libraries
    • Full-Text Databases
    • Link Resolvers
  – Outside of libraries
    • Google
    • Amazon
10. What is Data Mining and Data Warehousing?
• Extracting data from legacy systems and other resources;
• cleaning, scrubbing and preparing data for decision support;
• maintaining data in appropriate data stores;
• accessing and analysing data using a variety of end user tools;
• and mining data for significant relationships.
Source: Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
11.
• The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining.
Source: Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
12. Of course there are many ways to measure
– Scott Nicholson's Measurement Model
13. Measurement Matrix with methodologies
Perspective: Internal (Library System)
  Topic: Library System (Procedures and Standards)
    • Staff survey and interviews
    • Audits of collections, systems, or staff
  Topic: Use (Recorded interactions with interface & materials)
    • Bibliomining
    • Transaction/Web Log Analysis
    • Observation of User Behavior
Perspective: External (User)
  Topic: Library System (Aboutness and Usability)
    • Surveys and interviews
    • Talk-alouds and in-process feedback mechanisms
    • Focus groups
  Topic: Use (Knowledge states and User citations to materials)
    • Surveys and interviews
    • Focus groups
    • User Citation tracking
Source: Nicholson, Scott (2004). A Conceptual framework for the holistic measurement and cumulative evaluation of library services. Journal of Documentation 60(2), pp. 164-181.
16. Our Library
• 19 librarians and 26 library staff
• 350,000 volumes
• 18,000 audiovisual items
• 22,000 print and electronic periodicals
• 100 general and subject specific databases
17. Our Systems circa 2005
• Voyager ILS – Cheng Server
• Online Periodical Database (OPD)
• Clio ILL Software
• EZProxy Server – Zeus
• Banner – University ERP
• University Networked Drive K:
• University Email Server
• University Web Server
21.
• Voyager Patron Database allows a maximum of 10 statistical categories per patron record.
• Decide which statistical categories are needed for each patron group defined.
• Work with your University Information Systems Department to extract the relevant data from the relevant sources.
22. Groups and Services
Groups
• Major
• Status
  – Undergrad or Grad
  – Faculty, Adjunct Faculty or Staff
• Department
• College
• Degree
• No. of Credits
• Year of Study
• Campus Location
Services
• Circulation
  – Books
  – Media
  – Reserve
  – By Fund Code
  – Location
• ILL / Document Delivery
• Databases
• Library Web Pages
  – Subject Area Resource Guides
  – Reference Requests
• Catalog
• Other Vendor Services
  – Serials Solutions
23. History Department - 12 months - Feb. 2008
PATRON STATUS           BOOK CIRC  MEDIA CIRC  EQUIP CIRC  TOTAL CIRC  MEMBERS  BORROWERS  % BORROWING  CIRC/MEMBER  CIRC/BORROWER
UNDERGRADUATE STUDENTS      2,715         250         698       3,663      238        186          78%        15.39          19.69
GRADUATE STUDENTS             419          13          76         508       14         13          93%        36.29          39.08
ADJUNCT FACULTY               100          65          20         185       32         20          63%         5.78           9.25
FULL-TIME FACULTY             159         115         194         468       24         23          96%        19.50          20.35
HISTORY TOTALS              3,393         443         988       4,824      308        242          79%        15.66          19.93
LIBRARY TOTALS             23,370       8,713      20,703      52,756    7,418      4,981          67%         7.11          10.59
DEFINITIONS:
BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
MEDIA CIRCULATION = audio & video materials, including media reserves
EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
MEMBER = declared major or department member
BORROWER = any member who borrowed materials
LIBRARY TOTAL = declared undergrad & grad majors, adjuncts & full time faculty borrowers
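The three derived columns in the History Department table follow from the counts: % borrowing is borrowers divided by members, and the two circulation ratios divide total circulation by members and by borrowers. The sketch below recomputes them from figures copied from the table. The function name `derived` and the round-half-up rule for the percentage are assumptions chosen for illustration, not something the slides state.

```python
import math

# Figures copied from the History Department table (12 months to Feb. 2008).
rows = {
    "Undergraduate students": {"total_circ": 3663, "members": 238, "borrowers": 186},
    "Graduate students": {"total_circ": 508, "members": 14, "borrowers": 13},
    "Adjunct faculty": {"total_circ": 185, "members": 32, "borrowers": 20},
    "Full-time faculty": {"total_circ": 468, "members": 24, "borrowers": 23},
}


def derived(total_circ, members, borrowers):
    """Return (% borrowing, circulation per member, circulation per borrower)."""
    # Assumption: percentages are rounded half up (62.5 becomes 63).
    pct_borrowing = math.floor(100 * borrowers / members + 0.5)
    return (
        pct_borrowing,
        round(total_circ / members, 2),
        round(total_circ / borrowers, 2),
    )


for status, counts in rows.items():
    pct, per_member, per_borrower = derived(**counts)
    print(f"{status:24} {pct:3d}% {per_member:6.2f} {per_borrower:6.2f}")
```

Comparing circulation per member with circulation per borrower separates how widely a group uses the collection from how heavily its active borrowers use it.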
24. Problems with Configuration of Services
• Little to no linkage of data
• Need to search multiple services to get complete picture of serial holdings
• Multiple user IDs for authentication
25. Systems Chart – ca. 2005
[Diagram of the library's systems and the data held in each. Components shown: Cheng Server (Online Periodicals Database, Perl, web servers, Oracle, Voyager: patrons, materials, circulation); Zeus (EZProxy log: off-campus database hits, patrons, searches and ILL form); www.wpunj.edu (ColdFusion web server with the serials, ILL, microform and media scheduling forms and the ER page); Banner, the university ERP system (SIS, HRS: patrons); University Networked Drive K: (ILL Cliodata: patrons, materials); University Email Server; Serials Solutions (A to Z); OCLC (WorldCat, ILL); other vendors' database services and usage reports. Legend: current relationships; internal WPUNJ server; externally only accessible WPUNJ server; non-WPUNJ server.]
26. Retirement of the OPD
• Serials holdings data was extracted from the OPD and added to the Voyager catalog.
• From the Voyager catalog, serials holdings data is extracted and added to the Serials Solutions A to Z list.
27. Retirement of the OPD cont.
• Authentication of ILL form is routed through the EZProxy server
• A web bug is placed in the microform request page to record submission in the Voyager server's web logfile.
28. Systems Chart – ca. 2005 – Retiring the OPD
[Diagram: the ca. 2005 systems chart repeated for the retirement of the OPD. Components shown: Cheng Server (Online Periodicals Database, Perl, web servers, Oracle, Voyager: patrons, materials, circulation); Zeus (EZProxy log: off-campus database hits, patrons, searches and ILL form); www.wpunj.edu (ColdFusion web server with the serials, ILL, microform and media scheduling forms and the ER page); Banner, the university ERP system (SIS, HRS: patrons); University Networked Drive K: (ILL: patrons, materials); University Email Server; Serials Solutions (A to Z); OCLC (WorldCat, ILL); other vendors' database services and usage reports. Legend: current relationships; internal WPUNJ server; externally only accessible WPUNJ server; non-WPUNJ server.]
29. New Services Added
• Serials Solutions MARC Record Service
• Serials Solutions Link Resolver
• OCLC WorldCat Collection Analysis
30. Systems Chart – ca. 2005 – New Services Added
[Diagram: the systems chart after the OPD was retired and new services were added. Components shown: Cheng Server (Perl, web servers, Voyager: circulation); Zeus (EZProxy log: off-campus database hits, patrons, searches and ILL form); www.wpunj.edu (ColdFusion web server with the serials, ILL, microform and media scheduling forms and the ER page); Banner, the university ERP system (SIS, HRS: patrons); University Networked Drive K: (ILL Cliodata: patrons, materials); University Email Server; Serials Solutions (A to Z, MARC Records, Link Resolver); OCLC (WorldCat, WCA, ILL); other vendors' database services and usage reports. Legend: current relationships; internal WPUNJ server; externally only accessible WPUNJ server; non-WPUNJ server.]
31. Our Systems in 2008
• Voyager ILS – Cheng Server
• Shared Application Server
• Clio ILL Software
• EZProxy Server – Zeus
• Banner – University ERP
• University Networked Drive K:
• University Email Server
• University Web Server
32. Systems Chart - 2008
[Diagram: the 2008 systems chart with the Application Server added. Components shown: Cheng Server (Perl, web server, Voyager: circulation); Application Server (ColdFusion, web server, DBMS holding off-campus database usage by patron groups, ILL patrons/materials requested and ILL patrons/materials received); Zeus (EZProxy log: off-campus database hits, patrons, searches and ILL form); www.wpunj.edu (ColdFusion web server with the serials, ILL, microform and media scheduling forms and the ER page); Banner, the university ERP system (SIS, HRS: patrons); University Email Server; University Networked Drive K: (ILL Cliodata: patrons, materials); Serials Solutions (A to Z, MARC Records, Link Resolver); OCLC (WorldCat, WCA, ILL); other vendors' database services and usage reports. Legend: current relationships; internal WPUNJ server; externally only accessible WPUNJ server; non-WPUNJ server.]
34. What is an Application Server?
• A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database. From http://www.webtools.ca.gov/help/Glossary.asp
• Web Server Software (Apache or IIS)
• Database Management System – DBMS (MySQL, Oracle, MS SQL Server)
• Scripting Language (Perl, PHP, ColdFusion, ASP)
35. Why an Application Server?
• Relevant data in logfiles needs to be in a database to be analyzed.
• Need your own DBMS to create new tables and queries.
36.
• Decide how you will use the Application Server.
• Decide on the best and most plausible configuration.
37. The Projects
• Mining EZProxy logfiles and linking to patron statistical categories from the Voyager Patron Database
  – What majors and departments are accessing which database services?
  – What majors and departments are accessing the ILL services?
38. Systems Chart - 2008
[Diagram: the 2008 systems chart with generic component names. Components shown: Integrated Library System (scripting language, web server, Voyager: circulation); Application Server (scripting language, web server, DBMS holding off-campus database usage by patron groups, ILL patrons/materials requested and ILL patrons/materials received); Proxy Server (EZProxy log: off-campus database hits, patrons, searches and ILL form); www.wpunj.edu (scripting language, web server with the serials, ILL, microform and media scheduling forms and the ER page); Banner, the university ERP system (SIS, HRS: patrons); University Email Server; University Networked Drive K: (ILL Cliodata: patrons, materials); Serials Solutions (A to Z, MARC Records, Link Resolver); OCLC (WorldCat, WCA, ILL); other vendors' database services and usage reports. Outputs labelled: ILL Collection and Patron Group Analyses; Off Campus Database Hits by Patron Group. Legend: current relationships; internal WPUNJ server; externally only accessible WPUNJ server; non-WPUNJ server.]
39. ILL request form authentications by major – Academic year 07/08
Article                               Book
Count  Major                          Count  Major
   62  M- Psychology                     90  M- History
   60  M- Sociology                      28  M- Non-Degree
   42  M- Applied Clinical Psych         25  M- Pub Pol & Intl Affairs
   35  M- Education                      20  M- Spanish
   31  M- History                        18  M- English
   30  M- Spanish                        16  M- Undecided
   29  M- Nursing                        14  M- Art
   19  M- Communication Disorders        14  M- Education
   19  M- Communication                  11  M- Sociology
   14  M- Biotechnology                  10  M- Biology
   14  M- Counseling                      9  M- Music
   14  M- English                         9  M- Special Programs
   12  M- Non-Degree                      8  M- Psychology
   10  M- Community/Sch Health            7  M- Biotechnology
    7  M- Biology                         7  M- Political Science
    7  M- Political Science               6  M- Anthropology
    6  M- Undecided                       6  M- Music - Jazz Studies
    5  M- Comm Media Studies              4  M- Business
    5  M- Reading                         4  M- Communication
    4  M- Business                        4  M- Nursing
46.
• Active Circ transactions are stored in a table with patron ID and statistical categories.
• Completed Circ transactions are stored in a table without the patron ID, but still with the patron statistical categories.
• The Patron Table contains the total counts of transactions for each patron, but no link to which transactions they are.
07/29/08
47.
• EZProxy transactions would be stored in one table with patron statistical categories, but without the user ID.
• User IDs would be stored in another table with counts for each service divided by academic year.
• Logs are collected, loaded, and deleted monthly.
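The two-table design described here can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table names, column names, the `record` helper, and the rule that an academic year starts in September are all hypothetical, and the slides name MySQL rather than SQLite as the DBMS.

```python
import sqlite3


def academic_year(iso_date):
    """Map 'YYYY-MM-DD' to a label such as '07/08'.

    Assumption: the academic year starts in September.
    """
    year, month = int(iso_date[:4]), int(iso_date[5:7])
    start = year if month >= 9 else year - 1
    return f"{start % 100:02d}/{(start + 1) % 100:02d}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ezproxy_transactions (   -- statistical categories, no user ID
    accessed_on TEXT, service TEXT, major TEXT, patron_status TEXT
);
CREATE TABLE user_service_counts (    -- user IDs, but only yearly counts
    user_id TEXT, service TEXT, academic_year TEXT, hits INTEGER,
    PRIMARY KEY (user_id, service, academic_year)
);
""")


def record(conn, user_id, accessed_on, service, major, patron_status):
    """Store one transaction anonymously and bump the per-user counter."""
    conn.execute(
        "INSERT INTO ezproxy_transactions VALUES (?, ?, ?, ?)",
        (accessed_on, service, major, patron_status),
    )
    key = (user_id, service, academic_year(accessed_on))
    cur = conn.execute(
        "UPDATE user_service_counts SET hits = hits + 1 "
        "WHERE user_id = ? AND service = ? AND academic_year = ?",
        key,
    )
    if cur.rowcount == 0:  # first hit for this user, service and year
        conn.execute("INSERT INTO user_service_counts VALUES (?, ?, ?, 1)", key)


# Hypothetical transactions for illustration only.
record(conn, "theuser", "2008-01-01", "ILL form", "History", "Undergraduate")
record(conn, "theuser", "2008-02-10", "ILL form", "History", "Undergraduate")
print(conn.execute("SELECT * FROM user_service_counts").fetchall())
```

Keeping the user ID out of the transaction table mirrors how completed Circ transactions are stored: group-level analysis stays possible, while an individual's specific transactions cannot be reconstructed.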
48. Example of EZProxy log entry
• IP address: nj.dhcp.embarqhsd.net
• (Not used): -
• User ID: theuser
• Date/time: 1/1/2008 4:25:15 AM
• Method: GET
• Page retrieved: http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
• Version: HTTP/1.1
• Response code: 302
• No. of bytes: 537
• Referring URL: http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
• User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
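As a rough illustration of how such an entry breaks into these fields, the sketch below parses one line with a regular expression. The raw line is reassembled from the field values on this slide and assumes the common NCSA combined layout with three leading fields. The `-0500` zone offset, the field names, and the `parse_line` helper are assumptions, and a site's actual EZProxy LogFormat may differ.

```python
import re
from datetime import datetime

# Assumed layout: host, unused field, user, [timestamp], "request",
# status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<unused>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    entry["status"] = int(entry["status"])
    # A byte count logged as "-" means no body was sent.
    entry["bytes"] = None if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Line reassembled from the slide's field values; the zone offset is assumed.
sample = (
    'nj.dhcp.embarqhsd.net - theuser [01/Jan/2008:04:25:15 -0500] '
    '"GET http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa'
    '&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr HTTP/1.1" '
    '302 537 '
    '"http://ezproxy.wpunj.edu:2048/login?'
    'url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)

entry = parse_line(sample)
print(entry["user"], entry["status"], entry["bytes"], entry["timestamp"])
```

Lines that do not match return `None`, so a loader can count or log them rather than silently dropping them.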
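49. Perl Script for loading ezproxy log into MySQL
use strict;
use warnings;

# Map the month abbreviation in the log timestamp to a two-digit number.
my %month = (Jan=>'01', Feb=>'02', Mar=>'03', Apr=>'04', May=>'05', Jun=>'06',
             Jul=>'07', Aug=>'08', Sep=>'09', Oct=>'10', Nov=>'11', Dec=>'12');

# Four leading fields, [dd/Mon/yyyy:hh:mm:ss zone], "method target version",
# status, bytes, "referrer", "user agent". Only the first three leading
# fields are inserted; adjust the pattern if your LogFormat differs.
my $pattern =
    '^(\S*) (\S*) (\S*) (\S*) '.
    '\[(..)/(...)/(....):(..):(..):(..) .....\]'.
    ' "(\S*) (\S*) (\S*)" '.
    '(\d*) (-|\d*) "([^"]*)" "([^"]*)"';

while (<>) {
    if (m/$pattern/) {
        # Escape single quotes in the URL, referrer and user agent.
        my ($tgt, $ref, $agt) = (esc($12), esc($16), esc($17));
        # A byte count logged as '-' becomes SQL NULL.
        my $byt = $15 eq '-' ? 'NULL' : $15;
        print "INSERT INTO ezproxylogs VALUES ('$1','$2','$3',".
              " TIMESTAMP '$7/$month{$6}/$5 $8:$9:$10','$11','$tgt',".
              "'$13',$14,$byt,'$ref','$agt');\n";
    } else {
        # MySQL needs a space after "--" to treat the line as a comment.
        print "-- Skipped line $.\n";
    }
}

# Double any single quote so the value is safe inside a quoted SQL string.
sub esc {
    my ($p) = @_;
    $p =~ s/'/''/g;
    return $p;
}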
50. Created table to assist the linking
SELECT PATRON_ADDRESS.ADDRESS_TYPE,
       Left([ADDRESS_LINE1], InStr([ADDRESS_LINE1], "@") - 1) AS usr,
       PATRON_ADDRESS.PATRON_ID,
       PATRON_ADDRESS.ADDRESS_STATUS,
       PATRON_ADDRESS.EFFECT_DATE,
       PATRON_ADDRESS.EXPIRE_DATE,
       PATRON_ADDRESS.MODIFY_DATE,
       PATRON_ADDRESS.MODIFY_OPERATOR_ID
INTO emailprefix
FROM PATRON_ADDRESS
WHERE PATRON_ADDRESS.ADDRESS_TYPE = "3";
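The query takes the part of each email address before the "@" as `usr`, which can then be matched against the user ID recorded in the EZProxy log. A minimal sketch of that linking step follows. The sample rows, the category values, and the helper names are hypothetical. Lowercasing the prefix and skipping addresses without an "@" are additions for illustration that the query above does not make.

```python
def email_prefix(address):
    """Return the part of an email address before '@', lowercased.

    Returns None when there is no '@' or nothing precedes it.
    """
    local, at, _domain = address.partition("@")
    return local.lower() if at and local else None


# Hypothetical rows standing in for PATRON_ADDRESS (type "3" = email).
patron_addresses = [
    {"patron_id": 101, "address_type": "3", "address_line1": "theuser@example.edu"},
    {"patron_id": 102, "address_type": "1", "address_line1": "1 Example Street"},
    {"patron_id": 103, "address_type": "3", "address_line1": "Another.User@example.edu"},
]

# Hypothetical statistical categories keyed by patron ID.
patron_categories = {
    101: {"major": "History", "status": "Undergraduate"},
    103: {"major": "Psychology", "status": "Graduate"},
}


def build_emailprefix(addresses):
    """Build the usr -> patron_id lookup from email-type address rows."""
    table = {}
    for row in addresses:
        if row["address_type"] != "3":
            continue
        usr = email_prefix(row["address_line1"])
        if usr is not None:
            table[usr] = row["patron_id"]
    return table


def categories_for(user_id, emailprefix, categories):
    """Look up a log user ID and return that patron's categories, if any."""
    patron_id = emailprefix.get(user_id.lower())
    return categories.get(patron_id) if patron_id is not None else None


emailprefix = build_emailprefix(patron_addresses)
print(categories_for("theuser", emailprefix, patron_categories))
```

Once the categories are attached to a log entry, the user ID itself can be dropped before the transaction is stored.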
51. The question of standards
Need standards to share data for comparative research
52. Types of Reporting
Email Reports
Periodic - e.g., Daily Dossiers
Event Triggered
On Demand
Email, web or print
Use by Dept/Major
Use by Fund Code Purchases
53. Questions?
Ray Schwartz,
Systems Specialist Librarian
Cheng Library, William Paterson University,
Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu