3. GIVE THEM A FISH OR
TEACH THEM TO FISH
(OR SOMETHING ELSE?)
THREE MODELS FOR REPORTING FROM
SIERRA’S POSTGRESQL DATABASE
4. •Migrated to Sierra in late 2012
•Metropolitan public library
•Main downtown library and 15 branches
•Serving > 1,000,000 Orange County Residents
•~ 1.5 million items
•~ 0.5 million checkouts a month
•~10% of that are requests delivered
•~350 employees
•Annual budget totals approximately $35.3 million
Orange County Library System
6. •1 minute 50 times/day = 8 weeks
((50 x 365 x 5)/60)/24 = 63.368 days
•6 hours/week = 2 months
(6 x 52 x 5)/24 = 65 days
•(56 x 24)/8 = 168 eight-hour work days
ARITHMETIC AND STUFF
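The arithmetic on this slide can be re-checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the factor of 5 in the slide's formulas is read here as a five-year horizon, which is an assumption.

```python
# Time spent on a recurring task, totalled over several years and
# expressed in 24-hour days. The five-year horizon is an assumption
# taken from the "x 5" in the slide's formulas.
YEARS = 5


def days_from_minutes_per_day(minutes_per_day, years=YEARS):
    """Minutes spent every day, totalled over `years`, as 24-hour days."""
    return minutes_per_day * 365 * years / 60 / 24


def days_from_hours_per_week(hours_per_week, years=YEARS):
    """Hours spent every week, totalled over `years`, as 24-hour days."""
    return hours_per_week * 52 * years / 24


def workdays(days_24h, hours_per_workday=8):
    """Convert 24-hour days into work days of `hours_per_workday` hours."""
    return days_24h * 24 / hours_per_workday


one_minute_fifty_times = days_from_minutes_per_day(50)  # 1 minute, 50 times a day
six_hours_weekly = days_from_hours_per_week(6)
eight_weeks_as_workdays = workdays(56)                   # 8 weeks = 56 days

print(round(one_minute_fifty_times, 3))  # 63.368
print(six_hours_weekly)                  # 65.0
print(eight_weeks_as_workdays)           # 168.0
```

Both tasks come out a little above the rounded "8 weeks" and "2 months" figures, which is consistent with the chart rounding its numbers down.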
7. "Don't forget the time you spend finding
the chart to look up what you save. And
the time spent reading this reminder
about the time spent. And the time trying
to figure out if either of those actually
make sense. Remember, every second
counts toward your life total, including
these right now."
8. …or the time you are spending in this
presentation regarding evaluating how
much time you will spend determining
whether or not it is worth automating.
9. •Give them a fish
–Just do it for them
•Bait the hook and drop the line for them
–Build an application (which is basically giving them
the fish)
•Teach them to fish
–Write queries for them
–Teach them how it works
–Let them run with it
THREE WAYS TO DELIVER DATA USING SQL
10. •Just do it for them?
–Write the query
–SAVE THE QUERY
– --COMMENT YOUR CODE!
–Export the data, save it as Excel, and give it to the
(information) needy
GIVE THEM A FISH
12. •Build an application for them?
–Frequently performed process
–Manual process disrupts high volume workflow
–Takes *significant time (*relatively)
–Needed by
•Numerous people
•non-professional staff
–Introduces significant chance for human error
Maybe you should build it?
BAIT THE HOOK AND DROP THE LINE FOR THEM
13. •Held Items Delivery
•~40,000 items delivered each month
•Circumvent traditional holds in WebPac
•Traditional holds only in Sierra
•Staff had to perform an absurd process
WHO HAS IT?
15. WHO HAS IT?
SELECT (CASE pe.index_entry WHEN ' '
THEN 'no data' ELSE pe.index_entry
END) AS pid
FROM iii.sierra_view.hold hold
LEFT JOIN sierra_view.phrase_entry pe
ON hold.patron_record_id = pe.record_id
LEFT JOIN
sierra_view.item_record_property irp
ON hold.record_id = irp.item_record_id
WHERE pe.index_tag = 'b'
AND irp.barcode = [item barcode];
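To show the scan-a-barcode, get-a-patron-barcode idea end to end, here is a minimal sketch that runs a simplified version of the query against an in-memory SQLite database. It is not the application itself: the three tables are toy stand-ins for the Sierra views, the barcodes are invented, `who_has_it` is a hypothetical helper name, and a real version would connect to Sierra's PostgreSQL database instead.

```python
import sqlite3


def build_demo_db():
    """Create toy stand-ins for the three Sierra views (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hold (record_id INTEGER, patron_record_id INTEGER);
        CREATE TABLE phrase_entry (record_id INTEGER, index_tag TEXT, index_entry TEXT);
        CREATE TABLE item_record_property (item_record_id INTEGER, barcode TEXT);
        INSERT INTO item_record_property VALUES
            (100, '31234000000001'),
            (101, '31234000000002');
        INSERT INTO hold VALUES (100, 900);
        INSERT INTO phrase_entry VALUES
            (900, 'b', '21234000000009'),
            (900, 'n', 'doe, jane');
    """)
    return conn


# The barcode is passed as a bound parameter (?) rather than pasted into the SQL.
WHO_HAS_IT = """
    SELECT pe.index_entry AS pid
    FROM hold
    JOIN item_record_property irp ON hold.record_id = irp.item_record_id
    JOIN phrase_entry pe ON hold.patron_record_id = pe.record_id
    WHERE pe.index_tag = 'b'
      AND irp.barcode = ?
"""


def who_has_it(conn, item_barcode):
    """Return the patron barcode(s) holding the item with this barcode."""
    # strip() drops the carriage return a scanner typically appends
    rows = conn.execute(WHO_HAS_IT, (item_barcode.strip(),)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
held = who_has_it(conn, "31234000000001\r")
not_held = who_has_it(conn, "31234000000002")
print(held)      # ['21234000000009']
print(not_held)  # []
```

Binding the scanned value as a parameter keeps staff input out of the SQL text, which matters once a query sits behind an application used by many people.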
17. •Teach them to do it for themselves?
–Needed by
•Numerous people
•Professional staff
–Frequently performed process
–Monotonous and labor intensive
–Introduces significant chance for human error
–More complex data
TEACH THEM TO FISH (but give them the fish first)
18. •Number of users
•Time of day
•Complexity of queries
•You may have to first convince them that they will
really like fishing.
PRACTICAL CONSIDERATIONS FOR FISHING
19. DOCUMENTATION
•Prepared three documents
–Connecting with pgAdmin
–Executing Saved SQL Queries in pgAdmin
–Basic SQL Concepts Pertaining to Sierra
•Most basic parts of a query
•How changing predicates impacts a query
•Comments
•Operators and functions
•Pattern Matching
•SierraDNA
23. Change predicate to meet data need.
For example, change
“HAVING brp.best_author LIKE 'Patterson, Jam%'”
to
“HAVING brp.best_author LIKE 'Atwood, Marg%'”
WORKFLOW (cont’d)
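The predicate swap can be tried on toy data. The sketch below assumes a made-up `bib` table with placeholder titles and runs in SQLite through Python; only the LIKE pattern changes between the two runs. One caveat: SQLite's LIKE ignores case for ASCII letters, whereas LIKE in PostgreSQL is case sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- toy stand-in for bib-level data; titles and totals are invented
    CREATE TABLE bib (best_author TEXT, best_title TEXT, checkout_total INTEGER);
    INSERT INTO bib VALUES
        ('Patterson, James', 'Title A', 40),
        ('Patterson, James', 'Title B', 25),
        ('Atwood, Margaret', 'Title C', 30),
        ('Atwood, Margaret', 'Title D', 12);
""")

# The query text stays the same; only the pattern bound to ? changes.
QUERY = """
    SELECT best_author, COUNT(*) AS titles, SUM(checkout_total) AS checkouts
    FROM bib
    GROUP BY best_author
    HAVING best_author LIKE ?
"""


def report(pattern):
    """Run the saved query with a different author pattern."""
    return conn.execute(QUERY, (pattern,)).fetchall()


patterson = report("Patterson, Jam%")
atwood = report("Atwood, Marg%")
print(patterson)  # [('Patterson, James', 2, 65)]
print(atwood)     # [('Atwood, Margaret', 2, 42)]
```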
27. --2015-11-06 please direct questions to David Noe
--Collection Analysis By 710$a
--
--SEARCH BY ADDED ENTRY-CORPORATE NAME IN "AND CONTENT ~* '.*[STRING].*'"
--(IN JOIN WITH SUBQUERY ALIAS AddCorpName)
--SEARCH BY MULTIPLE MATERIAL TYPES IN "WHERE BRP.MATERIAL_CODE = ' '"
--SEARCH BY YEAR OF PUBLICATION IN "AND BRP.PUBLISH_YEAR > '2010'"
--
SELECT
md.record_type_code||md.record_num||'a' AS record_number,
brp.best_author,
brp.best_title,
brp.publish_year,
DATE(br.cataloging_date_gmt) AS Cat_Date,
call.field_content AS Call,
SUM(ir.checkout_total) AS checkouts,
SUM(ir.renewal_total) AS renewals,
SUM(ir.checkout_total + ir.renewal_total) AS total,
COUNT(ir.record_id) AS items,
SUM(CASE WHEN ir.item_status_code='-' OR ir.item_status_code='t' THEN 1 ELSE 0 END)
AS active,
ROUND(SUM(ir.checkout_total + ir.renewal_total)/COUNT(ir.record_id)::decimal,2) AS ratio,
SUM(CASE WHEN ir.location_code LIKE 'a%' THEN ir.checkout_total ELSE 0 END) AS A,
SUM(CASE WHEN ir.location_code LIKE 'c%' THEN ir.checkout_total ELSE 0 END) AS C,
SUM(CASE WHEN ir.location_code LIKE 'd%' THEN ir.checkout_total ELSE 0 END) AS D,
SUM(CASE WHEN ir.location_code LIKE 'e%' THEN ir.checkout_total ELSE 0 END) AS E,
SUM(CASE WHEN ir.location_code LIKE 'g%' THEN ir.checkout_total ELSE 0 END) AS G,
SUM(CASE WHEN ir.location_code LIKE 'h%' THEN ir.checkout_total ELSE 0 END) AS H,
SUM(CASE WHEN ir.location_code LIKE 'k%' THEN ir.checkout_total ELSE 0 END) AS K,
SUM(CASE WHEN ir.location_code LIKE 'm%' THEN ir.checkout_total ELSE 0 END) AS M,
SUM(CASE WHEN ir.location_code LIKE 'n%' THEN ir.checkout_total ELSE 0 END) AS N,
SUM(CASE WHEN ir.location_code LIKE 'p%' THEN ir.checkout_total ELSE 0 END) AS P,
SUM(CASE WHEN ir.location_code LIKE 'r%' THEN ir.checkout_total ELSE 0 END) AS R,
SUM(CASE WHEN ir.location_code LIKE 's%' THEN ir.checkout_total ELSE 0 END) AS S,
SUM(CASE WHEN ir.location_code LIKE 't%' THEN ir.checkout_total ELSE 0 END) AS T,
SUM(CASE WHEN ir.location_code LIKE 'v%' THEN ir.checkout_total ELSE 0 END) AS V,
SUM(CASE WHEN ir.location_code LIKE 'w%' THEN ir.checkout_total ELSE 0 END) AS W,
SUM(CASE WHEN ir.location_code LIKE 'y%' THEN ir.checkout_total ELSE 0 END) AS Y
FROM sierra_view.bib_record br
LEFT JOIN sierra_view.bib_record_property brp
ON br.id = brp.bib_record_id
LEFT JOIN sierra_view.record_metadata md
ON brp.bib_record_id = md.id
LEFT JOIN
(SELECT
record_id,
marc_tag,
field_content
FROM sierra_view.varfield
WHERE marc_tag = '092'
GROUP BY
record_id,
marc_tag,
field_content
) call
ON br.record_id = call.record_id
--INNER JOIN SO THAT ONLY BIBS WITH A MATCHING 710$a ARE RETURNED
JOIN
(SELECT
record_id,
marc_tag,
tag,
content
FROM sierra_view.subfield
--CHANGE THE TAG AGAINST WHICH YOU ARE SEARCHING HERE
WHERE marc_tag = '710'
--CHANGE THE SUBFIELD HERE
AND tag = 'a'
--CHANGE THE QUERY FOR THE FIELD CONTENT
--enter Added Entry-Corporate Name after ~ (POSIX regex) inside quotes
--use '~*' for case insensitive
--use ".*" to truncate or "." for a wildcard
AND content ~* '^History.*'
GROUP BY
record_id,
marc_tag,
tag,
content
) AddCorpName
ON br.record_id = AddCorpName.record_id
LEFT JOIN
(
SELECT
bib_record_id,
item_record_id
FROM
sierra_view.bib_record_item_record_link
GROUP BY
bib_record_id,
item_record_id
) svl
ON brp.bib_record_id = svl.bib_record_id
LEFT JOIN
(
SELECT
record_id,
checkout_total,
renewal_total,
item_status_code,
location_code
FROM
sierra_view.item_record
GROUP BY
record_id,
checkout_total,
renewal_total,
item_status_code,
location_code
) ir
ON svl.item_record_id = ir.record_id
--LIMIT BY BIB RECORD MATERIAL TYPE
--to include multiple material types, replace "= 'x'" with list, e.g. "IN ('x','y','z')"
WHERE brp.material_code = 'n'
--LIMIT BY YEAR PUBLISHED
AND brp.publish_year > '2004'
GROUP BY
record_number,
brp.best_author,
brp.best_title,
brp.publish_year,
cataloging_date_gmt,
call.marc_tag,
call.field_content,
AddCorpName.marc_tag,
AddCorpName.tag,
AddCorpName.content
--ORDER BY RECORD NUMBER
ORDER BY record_number
28. SUM(ir.checkout_total) AS checkouts,
SUM(ir.renewal_total) AS renewals,
SUM(ir.checkout_total + ir.renewal_total) AS total,
COUNT(ir.record_id) AS items,
SUM(CASE WHEN ir.item_status_code='-' OR ir.item_status_code='t' THEN 1 ELSE 0
END) AS active,
ROUND(SUM(ir.checkout_total + ir.renewal_total)/COUNT(ir.record_id)::decimal,2) AS
ratio,
SUM(CASE WHEN ir.location_code LIKE 'a%' THEN ir.checkout_total ELSE 0 END)
AS A,
SUM(CASE WHEN ir.location_code LIKE 'c%' THEN ir.checkout_total ELSE 0 END)
AS C, …
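The SUM(CASE WHEN ...) columns above are conditional aggregation: every item row is counted once, but each column only adds up the rows that match its condition. A minimal sketch on invented data, assuming a toy `item` table and using SQLite through Python (the `* 1.0` plays the role of the `::decimal` cast so the ratio is not integer division):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- toy stand-in for item-level data; all values are invented
    CREATE TABLE item (bib_id INTEGER, location_code TEXT, item_status_code TEXT,
                       checkout_total INTEGER, renewal_total INTEGER);
    INSERT INTO item VALUES
        (1, 'a1', '-', 10, 2),
        (1, 'c3', '-', 4, 1),
        (1, 'c5', 'm', 6, 0),
        (2, 'a2', 't', 3, 3);
""")

rows = conn.execute("""
    SELECT bib_id,
           SUM(checkout_total + renewal_total) AS total,
           COUNT(*) AS items,
           -- count only items whose status is '-' or 't'
           SUM(CASE WHEN item_status_code IN ('-', 't') THEN 1 ELSE 0 END) AS active,
           ROUND(SUM(checkout_total + renewal_total) * 1.0 / COUNT(*), 2) AS ratio,
           -- one column per location prefix
           SUM(CASE WHEN location_code LIKE 'a%' THEN checkout_total ELSE 0 END) AS a,
           SUM(CASE WHEN location_code LIKE 'c%' THEN checkout_total ELSE 0 END) AS c
    FROM item
    GROUP BY bib_id
    ORDER BY bib_id
""").fetchall()

for row in rows:
    print(row)
# (1, 23, 3, 2, 7.67, 10, 10)
# (2, 6, 1, 1, 6.0, 3, 0)
```

This is the step that replaces moving each item's figures into one row of a spreadsheet by hand.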
31. "I need an extension for my research project because I
spent all month trying to figure out whether learning
Dvorak would help me type it faster."
https://xkcd.com/1445/
EFFICIENCY
Hello,
Who reads XKCD?
Randall Munroe “A webcomic of romance, sarcasm, math, and language.”
Highly recommend Truth in jest
Laugh to keep from crying
How long can you …
Is everyone having a great IUG?
Who’s from east of the Mississippi?
Been out here seldom. Feels like a different continent.
Introduction to presentation:
Name Library
Library system working within confines of older systems with limited access to data. Millennium
Now that we have direct SQL access (indirect due to views), how do we avail ourselves of such POWER?
Give them a fish: just do it for them. Many of us are used to this.
Bait the hook and drop the line for them: build an application, which is sort of like giving them the fish.
Teach them to fish: this is the focus, and how we have done it.
Frame of reference: my name, and my background with OCLS and before.
Orange County Library System
Migrated to Sierra in late 2012
Metropolitan public library
Main downtown library and 15 branches
Main - downtown Orlando
Serving > 1,000,000 Orange County Residents
~ 1.5 million items
~ 0.5 million checkouts a month
~10% of that are requests delivered BBM
~350 employees
Annual budget totals approximately $35.3 million
Describe table
Numbers are rounded down quite a bit
24 hour days
But let’s verify this. Rigor is good, right?
56 days in 8 weeks – 2 months
Don't forget the time you spend finding the chart to look up what you save. And the time spent reading this reminder about the time spent. And the time trying to figure out if either of those actually make sense. Remember, every second counts toward your life total, including these right now.
…or the time you are spending in this presentation regarding evaluating how much time you will spend determining whether or not it is worth automating.
And I am very sorry to tell you that choosing to listen to me may have been a poor choice.
Again, we have a really powerful tool at our disposal. What do we do with it?
Give them a fish: this is what we do most of the time.
Bait the hook and drop the line for them: automation is good, and it’s what we try to do when practical.
Teach them to fish
Write queries for them
Teach them how it works
Let them run with it
(but give them the fish first)
GIVE THEM A FISH
These are examples of what I am here to not talk about today.
One off requests
stuck holds
Patron stats
Easiest to just write the thing. Remember to save your work, especially if you spent some serious time getting it to work.
Though writing SQL queries for reporting may not be considered “PROGRAMMING,” comment your code. Seriously. Comment that stuff.
Unless you are a beginner, you know the routine from here: export it, send it. But don’t forget about it.
To recap:
If it appears one iteration of the output is all that is needed, it’s not worth the effort to develop a script or an application
These are more examples of what I am here to not talk about today.
It helps to know when folks cannot really help themselves.
Again, ask these questions
How often is the task performed?
Does it disrupt an otherwise smooth workflow?
How long does it take to perform the task?
Who needs it?
Not in the sense of whether their work is important. Everyone’s work is important.
Think of who it is and what is being asked of them.
Which leads to the question of human error.
OCLS uses Held Items Delivery
to the tune of about 40,000 items delivered each month
Circumvent traditional holds in WebPac
Traditional holds only in Sierra
When an item with a traditional hold comes up, staff had to perform an absurd process
Warning in Held Items Delivery
Check In to get patron info macro
Message for item hold patron barcode written (1st opportunity for human error)
Check Out macro type in patron barcode (2nd opportunity for human error)
Scan item barcode
Simple application
Scan a barcode, carriage return sent, patron barcode output
One to one relationship between query and result
Frequently performed process: 100 – 500 daily perhaps
Disrupts high volume workflow
Takes significant time
Needed by
Numerous people
non-professional staff
Introduces significant chance for human error
So, yeah. This was worth the time. Everyone is happy.
The thing is, we have knocked out automating *most* of the stuff they had in mind when I came on
I presented on some of this last year. Sooooo what next?
My aim is to look for places to improve efficiency anywhere I can in the organization
5 acquisitions librarians plus one more experienced para
Generally doing collection analysis one day a week each.
Generally involves running list(s) of titles or authors, collecting item circ info for each result, entering it into a spreadsheet, and performing calculations.
But you may have to first convince them that they will really like fishing.
One member of La Résistance remains. What I am trying to make him understand is that the aim is not to remove the necessity of his work; it’s to make it possible to do more collection development work that requires critical thought. You know, the fun part of the job?
Server load
Number of users
Watch for slowness
Have them communicate with one another
Time of day
Early morning for testing
Complexity of queries
Make sure they know it’s fine to kill a query
Give them a fish, and show them how awesome going fishing really is.
Prepared three documents
Basic SQL Concepts Pertaining to Sierra
Connecting
Executing Saved SQL Queries in pgAdmin: navigate to shared folder
Most basic parts of a query
How changing predicates impacts a query
Comments
Operators and functions most relevant
Pattern Matching
TechDocs SierraDNA
Data need presented
Existing query requiring minimal changes: copy to new file, edit to meet need, thoroughly comment predicate statements changed by user.
Review results and test with user. If it does not meet needs, revise and test again.
If results are not the droids they are looking for, start over and clarify the need.
No existing query: interview the user, define needs, thoroughly comment predicate statements changed by user.
Ask what else they are working on? What else would they LIKE to do?
Focusing on LIKE instead of POSIX where possible
Written with example data
They are librarians, so they know when this does not meet their needs.
Tell them “DON’T SAVE.” If you saved, don’t worry. I backed it up.
So what does this look like in practice?
Example problem
~200 titles in about 16 hours assuming no errors
~200 titles 4 hours
Totaled up total items + total renewal
Totaled up items
Calculated circ / copies
Manually moved all item information to one row.
I was able to replicate all of this work with one query.
Not going to go into the query.
You can learn more about it from http://www.postgresql.org/, Sierra DNA, and Stack Exchange than you can from me.
This is just to illustrate what it is I am sharing with them and what I am asking of them.
These are people who can learn this. None of them have written anything on their own yet.
I suspect they will in not too long.
Replaced a routine workflow.
With the time freed up, more time can be spent on understanding the data, communicating information
They need help from time to time adjusting queries, mostly regex, to evaluate new criteria
Conservative estimate:
~200 titles in about 16 hours assuming no errors
~200 titles 4 hours
6 hours/week = 2 months
(6 x 52 x 5)/24 = 65 days
(56 x 24)/8 = 168 8 hour work days
I learned what I know about SQL from looking at other people’s work and applying it.
I think this model is working for us.
Will have to track over time.