https://xkcd.com/1205/
GIVE THEM A FISH OR
TEACH THEM TO FISH
(OR SOMETHING ELSE?)
THREE MODELS FOR REPORTING FROM
SIERRA’S POSTGRESQL DATABASE
•Migrated to Sierra in late 2012
•Metropolitan public library
•Main downtown library and 15 branches
•Serving > 1,000,000 Orange County Residents
•~ 1.5 million items
•~ 0.5 million checkouts a month
•~10% of that are requests delivered
•~350 employees
•Annual budget totals approximately $35.3 million
Orange County Library System
https://xkcd.com/1205/
•1 minute 50 times/day = 8 weeks
((50 x 365 x 5)/60)/24 = 63.368 days
•6 hours/week = 2 months
(6 x 52 x 5)/24 = 65 days
•8 weeks = 56 days: (56 x 24)/8 = 168 eight-hour work days
ARITHMETIC AND STUFF
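The checks on this slide are easy to reproduce. The short Python sketch below redoes them, assuming (as the slide does) a five-year horizon and 24-hour days; the variable names are illustrative only.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
WORK_HOURS_PER_DAY = 8
YEARS = 5

# 1 minute, 50 times a day, every day for five years, in 24-hour days
minutes = 1 * 50 * 365 * YEARS
days_from_minutes = minutes / MINUTES_PER_HOUR / HOURS_PER_DAY

# 6 hours a week, 52 weeks a year, for five years, in 24-hour days
days_from_hours = 6 * 52 * YEARS / HOURS_PER_DAY

# 8 weeks = 56 days of 24 hours, expressed as 8-hour work days
work_days = 56 * HOURS_PER_DAY / WORK_HOURS_PER_DAY

print(round(days_from_minutes, 2), days_from_hours, work_days)
# prints: 63.37 65.0 168.0
```

Both savings land a little over the rounded "8 weeks" and "2 months" in the chart, which is why 56 days is used for the conversion to work days.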
"Don't forget the time you spend finding
the chart to look up what you save. And
the time spent reading this reminder
about the time spent. And the time trying
to figure out if either of those actually
make sense. Remember, every second
counts toward your life total, including
these right now."
…or the time you are spending in this
presentation regarding evaluating how
much time you will spend determining
whether or not it is worth automating.
•Give them a fish
–Just do it for them
•Bait the hook and drop the line for them
–Build an application (which is basically giving them
the fish)
•Teach them to fish
–Write queries for them
–Teach them how it works
–Let them run with it
THREE WAYS TO DELIVER DATA USING SQL
•Just do it for them?
–Write the query
–SAVE THE QUERY
– --COMMENT YOUR CODE!
–Export the data, save it as Excel, and give it to the
(information) needy
GIVE THEM A FISH
IS IT WORTH AUTOMATING?
Probably not.
•Build an application for them?
–Frequently performed process
–Manual process disrupts high volume workflow
–Takes *significant time (*relatively)
–Needed by
•Numerous people
•non-professional staff
–Introduces significant chance for human error
Maybe you should build it?
BAIT THE HOOK AND DROP THE LINE FOR THEM
•Held Items Delivery
•~40,000 items delivered each month
•Circumvent traditional holds in WebPac
•Traditional holds only in Sierra
•Staff had to perform an absurd process
WHO HAS IT?
OLD PROCESS
WHO HAS IT?
SELECT (CASE pe.index_entry WHEN ' '
             THEN 'no data' ELSE pe.index_entry
        END) AS pid
FROM iii.sierra_view.hold hold
LEFT JOIN sierra_view.phrase_entry pe
  ON hold.patron_record_id = pe.record_id
LEFT JOIN sierra_view.item_record_property irp
  ON hold.record_id = irp.item_record_id
WHERE pe.index_tag = 'b'
  AND irp.barcode = [item barcode];
IS IT WORTH AUTOMATING?
…uh, yeah.
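A minimal Python sketch of the "scan an item barcode, get the patron barcode" idea behind this application. The function name `who_has_it`, the sample barcodes, and the in-memory SQLite stand-in for the three `sierra_view` views are all assumptions made so the example runs anywhere. The real lookup runs against Sierra's PostgreSQL database, where a driver such as psycopg2 would use `%s` instead of `?` as the placeholder.

```python
import sqlite3

# Same joins and filters as the slide's query. The item barcode is a bound
# parameter rather than text pasted into the SQL string.
WHO_HAS_IT = """
SELECT CASE pe.index_entry WHEN ' ' THEN 'no data' ELSE pe.index_entry END AS pid
FROM sierra_view.hold AS h
LEFT JOIN sierra_view.phrase_entry AS pe
       ON h.patron_record_id = pe.record_id
LEFT JOIN sierra_view.item_record_property AS irp
       ON h.record_id = irp.item_record_id
WHERE pe.index_tag = 'b'
  AND irp.barcode = ?
"""


def who_has_it(conn, item_barcode):
    """Return the patron barcodes with a hold on the scanned item (may be empty)."""
    # A scanner typically appends a carriage return; strip() drops it.
    rows = conn.execute(WHO_HAS_IT, (item_barcode.strip(),)).fetchall()
    return [row[0] for row in rows]


def demo_connection():
    """Build a throwaway stand-in for the three views (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS sierra_view")
    conn.executescript("""
        CREATE TABLE sierra_view.hold (patron_record_id INTEGER, record_id INTEGER);
        CREATE TABLE sierra_view.phrase_entry (record_id INTEGER, index_tag TEXT, index_entry TEXT);
        CREATE TABLE sierra_view.item_record_property (item_record_id INTEGER, barcode TEXT);
        INSERT INTO sierra_view.hold VALUES (100, 900);
        INSERT INTO sierra_view.phrase_entry VALUES (100, 'b', '21234000000001');
        INSERT INTO sierra_view.item_record_property VALUES (900, '31234000000009');
    """)
    return conn


conn = demo_connection()
print(who_has_it(conn, "31234000000009\r"))  # prints: ['21234000000001']
```

Binding the barcode as a parameter keeps whatever the scanner sends out of the SQL text, which removes one more opportunity for error.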
•Teach them to do it for themselves?
–Needed by
•Numerous people
•Professional staff
–Frequently performed process
–Monotonous and labor intensive
–Introduces significant chance for human error
–More complex data
TEACH THEM TO FISH (but give them the fish first)
•Number of users
•Time of day
•Complexity of queries
•You may have to first convince them that they will
really like fishing.
PRACTICAL CONSIDERATIONS FOR FISHING
DOCUMENTATION
•Prepared three documents
–Connecting with pgAdmin
–Executing Saved SQL Queries in pgAdmin
–Basic SQL Concepts Pertaining to Sierra
•Most basic parts of a query
•How changing predicates impacts a query
•Comments
•Operators and functions
•Pattern Matching
•SierraDNA
WRITE IT FOR THEM AND SHARE
WORKFLOW
WORKFLOW (cont’d)
Open saved query. “Search by” comments are at the top
Change predicate to meet data need.
For example change
“HAVING brp.best_author LIKE 'Patterson, Jam%'”
to
“HAVING brp.best_author LIKE 'Atwood, Marg%'”
WORKFLOW (cont’d)
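To show what swapping a predicate does, here is a small Python sketch that mimics the two kinds of pattern the saved queries use: SQL `LIKE` and PostgreSQL's case-insensitive POSIX match `~*`. The helper names `like` and `regex_ci` and the sample values are made up for illustration. Python's `re` module stands in for PostgreSQL's regex engine, which behaves the same way for simple patterns like these but is not identical in every detail.

```python
import re


def like(value, pattern):
    """Emulate SQL LIKE: '%' is any run of characters, '_' is exactly one,
    and the pattern must cover the whole value (case-sensitive)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def regex_ci(value, pattern):
    """Approximate '~*': case-insensitive, matches anywhere in the value
    unless the pattern is anchored with ^ or $."""
    return re.search(pattern, value, flags=re.IGNORECASE) is not None


authors = ["Patterson, James", "Atwood, Margaret", "patterson, james"]
print([a for a in authors if like(a, "Patterson, Jam%")])
# prints: ['Patterson, James']

publishers = ["History Channel (Television network)", "A&E History", "HISTORY"]
print([p for p in publishers if regex_ci(p, "^History.*")])
# prints: ['History Channel (Television network)', 'HISTORY']
print([p for p in publishers if regex_ci(p, "History")])
# prints all three publishers
```

`LIKE` has to match the whole value, so the trailing `%` matters. The regex form matches anywhere unless anchored, which is more flexible but easier to get wrong.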
WORKFLOW (cont’d)
• DVD collection analysis by publisher
EXAMPLE PROBLEM
EXAMPLE PROBLEM (cont’d)
--2015-11-06 please direct questions to David Noe
--Collection Analysis By 710$a
--
--SEARCH BY SUBJECT IN "HAVING -Added Entry-Corporate Name.CONTENT ~*
--'.*[STRING].*'" (IN JOIN WITH SUBQUERY ALIAS AddCorpName)
--SEARCH BY MULTIPLE MATERIAL TYPES IN "WHERE BRP.MATERIAL_CODE = ' '"
--SEARCH BY YEAR OF PUBLICATION IN "AND BRP.PUBLISH_YEAR > '2010'"
--
SELECT
md.record_type_code||md.record_num||'a' AS record_number,
brp.best_author,
brp.best_title,
brp.publish_year,
DATE(br.cataloging_date_gmt) AS Cat_Date,
call.field_content AS Call,
SUM(ir.checkout_total) AS checkouts,
SUM(ir.renewal_total) AS renewals,
SUM(ir.checkout_total + ir.renewal_total) AS total,
COUNT(ir.record_id) AS items,
SUM(CASE WHEN ir.item_status_code='-' OR ir.item_status_code='t' THEN 1 ELSE 0 END)
AS active,
ROUND(SUM(ir.checkout_total + ir.renewal_total)/COUNT(ir.record_id)::decimal,2) AS ratio,
SUM(CASE WHEN ir.location_code LIKE 'a%' THEN ir.checkout_total ELSE 0 END) AS A,
SUM(CASE WHEN ir.location_code LIKE 'c%' THEN ir.checkout_total ELSE 0 END) AS C,
SUM(CASE WHEN ir.location_code LIKE 'd%' THEN ir.checkout_total ELSE 0 END) AS D,
SUM(CASE WHEN ir.location_code LIKE 'e%' THEN ir.checkout_total ELSE 0 END) AS E,
SUM(CASE WHEN ir.location_code LIKE 'g%' THEN ir.checkout_total ELSE 0 END) AS G,
SUM(CASE WHEN ir.location_code LIKE 'h%' THEN ir.checkout_total ELSE 0 END) AS H,
SUM(CASE WHEN ir.location_code LIKE 'k%' THEN ir.checkout_total ELSE 0 END) AS K,
SUM(CASE WHEN ir.location_code LIKE 'm%' THEN ir.checkout_total ELSE 0 END) AS M,
SUM(CASE WHEN ir.location_code LIKE 'n%' THEN ir.checkout_total ELSE 0 END) AS N,
SUM(CASE WHEN ir.location_code LIKE 'p%' THEN ir.checkout_total ELSE 0 END) AS P,
SUM(CASE WHEN ir.location_code LIKE 'r%' THEN ir.checkout_total ELSE 0 END) AS R,
SUM(CASE WHEN ir.location_code LIKE 's%' THEN ir.checkout_total ELSE 0 END) AS S,
SUM(CASE WHEN ir.location_code LIKE 't%' THEN ir.checkout_total ELSE 0 END) AS T,
SUM(CASE WHEN ir.location_code LIKE 'v%' THEN ir.checkout_total ELSE 0 END) AS V,
SUM(CASE WHEN ir.location_code LIKE 'w%' THEN ir.checkout_total ELSE 0 END) AS W,
SUM(CASE WHEN ir.location_code LIKE 'y%' THEN ir.checkout_total ELSE 0 END) AS Y
FROM sierra_view.bib_record br
LEFT JOIN sierra_view.bib_record_property brp
ON br.id = brp.bib_record_id
LEFT JOIN sierra_view.record_metadata md
ON brp.bib_record_id = md.id
LEFT JOIN
(SELECT
record_id,
marc_tag,
field_content
FROM sierra_view.varfield
WHERE marc_tag = '092'
GROUP BY
record_id,
marc_tag,
field_content
) call
ON br.record_id = call.record_id
LEFT JOIN
(SELECT
record_id,
marc_tag,
tag,
content
FROM sierra_view.subfield
--CHANGE THE TAG AGAINST WHICH YOU ARE SEARCHING HERE
WHERE marc_tag = '710'
--CHANGE THE SUBFIELD HERE
AND tag = 'a'
--CHANGE THE QUERY FOR THE FIELD CONTENT
--enter Added Entry-Corporate Name after ~ (POSIX regex) inside quotes
--use '~*' for case insensitive
--use ".*" to truncate or "." for a wildcard
AND content ~* '^History.*'
GROUP BY
record_id,
marc_tag,
tag,
content
) AddCorpName
ON br.record_id = AddCorpName.record_id
LEFT JOIN
(
SELECT
bib_record_id,
item_record_id
FROM
sierra_view.bib_record_item_record_link
GROUP BY
bib_record_id,
item_record_id
) svl
ON brp.bib_record_id = svl.bib_record_id
LEFT JOIN
(
SELECT
record_id,
checkout_total,
renewal_total,
item_status_code,
location_code
FROM
sierra_view.item_record
GROUP BY
record_id,
checkout_total,
renewal_total,
item_status_code,
location_code
) ir
ON svl.item_record_id = ir.record_id
--LIMIT BY BIB RECORD MATERIAL TYPE
--to include multiple material types, replace "= 'x'" with list, e.g. "IN ('x','y','z')"
WHERE brp.material_code = 'n'
--LIMIT BY YEAR PUBLISHED
AND brp.publish_year > '2004'
GROUP BY
record_number,
brp.best_author,
brp.best_title,
brp.publish_year,
cataloging_date_gmt,
call.marc_tag,
call.field_content,
AddCorpName.marc_tag,
AddCorpName.tag,
AddCorpName.content
-- ORDER BY RECORD NUMBER
ORDER BY record_number
SUM(ir.checkout_total) AS checkouts,
SUM(ir.renewal_total) AS renewals,
SUM(ir.checkout_total + ir.renewal_total) AS total,
COUNT(ir.record_id) AS items,
SUM(CASE WHEN ir.item_status_code='-' OR ir.item_status_code='t' THEN 1 ELSE 0
END) AS active,
ROUND(SUM(ir.checkout_total + ir.renewal_total)/COUNT(ir.record_id)::decimal,2) AS
ratio,
SUM(CASE WHEN ir.location_code LIKE 'a%' THEN ir.checkout_total ELSE 0 END)
AS A,
SUM(CASE WHEN ir.location_code LIKE 'c%' THEN ir.checkout_total ELSE 0 END)
AS C, …
EXAMPLE PROBLEM (cont’d)
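For readers who want to see what the aggregate columns in the query work out to, here is a small Python sketch of the same per-title calculation. The item rows, the location codes, and the status code 'm' are hypothetical sample data. The arithmetic mirrors the SELECT list: totals, item count, active count, circulation per copy, and checkouts bucketed by the first letter of the location code.

```python
from collections import defaultdict

# Hypothetical item rows for one title:
# (location_code, checkout_total, renewal_total, item_status_code)
items = [
    ("amain", 40, 10, "-"),
    ("cbr", 25, 5, "-"),
    ("cbr", 12, 3, "m"),
]


def summarize(rows):
    """Mirror the query's aggregate columns for a single bib record."""
    checkouts = sum(r[1] for r in rows)
    renewals = sum(r[2] for r in rows)
    by_branch = defaultdict(int)
    for location, checkout_total, _renewals, _status in rows:
        # Same idea as SUM(CASE WHEN location_code LIKE 'a%' ...) AS A
        by_branch[location[0].upper()] += checkout_total
    return {
        "checkouts": checkouts,
        "renewals": renewals,
        "total": checkouts + renewals,
        "items": len(rows),
        # The query counts status '-' or 't' as active
        "active": sum(1 for r in rows if r[3] in ("-", "t")),
        "ratio": round((checkouts + renewals) / len(rows), 2),
        "by_branch": dict(by_branch),
    }


print(summarize(items))
```

With the sample rows this gives 77 checkouts, 18 renewals, a total of 95 across 3 items (2 active), and a ratio of 31.67.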
WAS IT WORTH THE TIME?
"I need an extension for my research project because I
spent all month trying to figure out whether learning
Dvorak would help me type it faster."
https://xkcd.com/1445/
EFFICIENCY
QUESTIONS?
SUGGESTIONS?
OUTRAGE?
David Noe
noe.david@ocls.info

IUG2016


Editor's Notes

  1. Hello, Who reads XKCD? Randall Munroe  “A webcomic of romance, sarcasm, math, and language.” Highly recommend  Truth in jest Laugh to keep from crying How long can you …
  2. Is everyone having a great IUG? Who’s from east of the Mississippi? Been out here seldom. Feels like a different continent.
  3. Introduction to presentation: Name  Library Library system working within confines of older systems with limited access to data. Millennium Now that we have direct SQL access (indirect due to views), how do we avail ourselves of such POWER? Give them a fish  Just do it for them  many of us used to this Bait the hook and drop the line for them  Build an application  sort of like giving them the fish Teach them to fish  focusing on this  how we have done this
  4. Frame of reference  My name  My background with OCLS and before Orange County Library System Migrated to Sierra in late 2012 Metropolitan public library Main downtown library and 15 branches Main - downtown Orlando Serving > 1,000,000 Orange County Residents ~ 1.5 million items ~ 0.5 million checkouts a month ~10% of that are requests delivered  BBM ~350 employees Annual budget totals approximately $35.3 million
  5. Describe table Numbers are rounded down quite a bit 24 hour days
  6. But let’s verify this. Rigor is good, right? 56 days in 8 weeks – 2 months
  7. Don't forget the time you spend finding the chart to look up what you save. And the time spent reading this reminder about the time spent. And the time trying to figure out if either of those actually make sense. Remember, every second counts toward your life total, including these right now.
  8. …or the time you are spending in this presentation regarding evaluating how much time you will spend determining whether or not it is worth automating. And I am very sorry to tell you that choosing to listen to me may have been a poor choice.
  9. Again, we have a really powerful tool at our disposal. What do we do with it? Give them a fish  this is what we do most of the time Bait the hook and drop the line for them  automation is good  it’s what we try to do when practical Teach them to fish Write queries for them Teach them how it works Let them run with it (but give them the fish first)
  10. GIVE THEM A FISH These are examples of what I am here to not talk about today. One off requests stuck holds Patron stats Easiest to just write the thing  Remember to save your work, especially if you spent some serious time getting it to work Though writing sql queries for reporting may not be considered “PROGRAMMING,” comment your code. Seriously. Comment that stuff Unless you are a beginner, you know the routine from here: export it  send it. But don’t forget about it.
  11. To recap: If it appears one iteration of the output is all that is needed, it’s not worth the effort to develop a script or an application
  12. These are more examples of what I am here to not talk about today. It helps to know when folks cannot really help themselves. Again, ask these questions: How often is the task performed? Does it disrupt an otherwise smooth workflow? How long does it take to perform the task? Who needs it? Not in the sense of whether their work is important. Everyone’s work is important. Think of who it is and what is being asked of them. Which leads to the question of human error.
  13. OCLS uses Held Items Delivery  to the tune of about 40,000 items delivered each month Circumvent traditional holds in WebPac Traditional holds only in Sierra When an item with a traditional hold comes up, staff had to perform an absurd process
  14. Warning in Held Items Delivery Check In to get patron info  macro Message for item hold  patron barcode  written (1st opportunity for human error) Check Out  macro  type in patron barcode (2nd opportunity for human error) Scan item barcode
  15. Simple application Scan a barcode  carriage return sent  patron barcode output
  16. One to one relationship between query and result Frequently performed process: 100 – 500 daily perhaps Disrupts high volume workflow Takes significant time Needed by Numerous people non-professional staff Introduces significant chance for human error So, yeah. This was worth the time. Everyone is happy. The thing is, we have knocked out automating *most* of the stuff they had in mind when I came on  I presented on some this last year  Sooooo what next?
  17. My aim is to look for places to improve efficiency anywhere I can in the organization 5 acquisitions librarians plus one more experienced para Generally doing collection analysis one day a week each  Generally involves running list(s) of titles or authors  collecting item circ info for each result  Entering into spreadsheet and performing calculations But you may have to first convince them that they will really like fishing. One member of La Résistance remains.  What I am trying to make him understand is that the aim is not to remove the necessity of his work; it’s to make it possible to do more collection development work that requires critical thought. You know, the fun part of the job?
  18. Server load  Number of users Watch for slowness Have them communicate with one another Time of day Early morning for testing Complexity of queries Make sure they know it’s fine to kill a query Give them a fish, and show them how awesome going fishing really is.
  19. Prepared three documents Basic SQL Concepts Pertaining to Sierra Connecting Executing Saved SQL Queries in pgAdmin Navigate to shared folder Most basic parts of a query How changing predicates impact query Comments Operators and functions most relevant Pattern Matching TechDocs  SierraDNA
  20. Data need presented Existing query requiring minimal changes  Copy to new file  Edit to meet need  Thoroughly comment predicate statements changed by user Review results  Test with user  If does not meet needs revise and test again If results are not the droids they are looking for start over  clarify need No existing query  Interview the user  Define needs  Thoroughly comment predicate statements changed by user Ask what else they are working on?  What else would they LIKE to do?
  21. Prepared three documents Basic SQL Concepts Pertaining to Sierra Connecting Executing Saved SQL Queries in pgAdmin Navigate to shared folder
  22. Focusing on LIKE instead of POSIX where possible
  23. Written with example data Focusing on LIKE instead of POSIX where possible They are librarians, so they know when this does not meet their needs. Tell them “DON’T SAVE”  If you saved, don’t worry. I backed it up.
  24. So what does this look like in practice? Example problem ~200 titles in about 16 hours assuming no errors ~200 titles 4 hours
  25. Totaled up total items + total renewal  Totaled up items  Calculated circ / copies  Manually moved all item information to one row.
  26. I was able to replicate all of this work with one query. Not going to go into the query. You can learn more about it from http://www.postgresql.org/, Sierra DNA, and Stackexchange than you can from me. This is just to illustrate what it is I am sharing with them and what I am asking of them. These are people who can learn this.  None of them have written anything on their own yet. I suspect they will in not too long.
  27. I was able to replicate all of this work with one query. Not going to go into the query. You can learn more about it from http://www.postgresql.org/, Sierra DNA, and Stackexchange than you can from me. This is just to illustrate what it is I am sharing with them and what I am asking of them. These are people who can learn this.  None of them have written anything on their own yet. I suspect they will in not too long.
  28. Replaced a routine workflow. With the time freed up, more time can be spent on understanding the data, communicating information They need help from time to time adjusting queries, mostly regex, to evaluate new criteria Conservative estimate: ~200 titles in about 16 hours assuming no errors ~200 titles 4 hours
  29. 6 hours/week = 2 months (6 x 52 x 5)/24 = 65 days (56 x 24)/8 = 168 eight-hour work days. I learned what I know about SQL from looking at other people’s work and applying it. I think this model is working for us. Will have to track over time.