Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation Roy Zimmer Western Michigan University
About 5 or 6 years ago… No more SSN; switch to using WIN. Banner. New campus ID cards. (WIN is our Western Identification Number.)
A few less years ago… Rewrote the patron update process to use Banner. Started thinking about not being SSN-based.
2007-2008: The WIN had become available in the data feeds for our patron update. Needed to change the Institution ID. Interim step: arbitrary 14 digits -> WIN. Final step: WIN -> BroncoNetID. Patron update was switched from being SSN-based to WIN-based. (BroncoNetID is our single sign-on ID.)
Summer 2008: What we started with. Data for about 74,000 patrons. About 183,000 barcodes (fewer than half are active!). Several thousand duplicate records: one with SSN, one with WIN (in the SSAN field). The older duplicate record typically had charges, amounts owed, etc.
2008: August – October. Most of my time was spent on the cleanup…
August… Patron duplicate detector: LB4020. Foreign students. Various errors. Sample follows…
Sample output used one day [sample image not reproduced]. (The WINs and SSNs shown in the sample are not real.)
Our first run came up with 3489 duplicate patron records.
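The slides do not show how LB4020 decides that two records are duplicates, so the following is only a minimal Python sketch of the general idea (the original tool is a Perl script). It assumes, purely for illustration, that duplicates are found by grouping on a normalized patron name; the function name `find_duplicates` and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def find_duplicates(patrons):
    """Group patron rows by a normalized name and return groups of 2 or more.

    Each patron is a dict with hypothetical keys 'name', 'ssan' and
    'institution_id'. Matching on name alone is an assumption, not
    necessarily what LB4020 does.
    """
    groups = defaultdict(list)
    for patron in patrons:
        # Upper-case and collapse runs of whitespace so that
        # "Doe, Jane" and "DOE,  JANE" land in the same group.
        key = " ".join(patron["name"].upper().split())
        groups[key].append(patron)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}
```

A real detector would likely need more than the name (for example the expiration date, which the slides say was added to the report later) to keep false matches down.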
We created a program that used the LB4020 report as input to identify patron records that we wanted to alter; call it LB4020fix. These records needed to be extracted from Voyager for modification and re-import. (Picture caption: Modify me with LB4020fix.)
Voyager has a patron extract utility, but it doesn’t extract all relevant data for a patron. We’d started using our own, patronsif.pl, years ago.
Voyager extract (Pptrnextr): up to 3 patron-barcode + group combinations; similarly limited number of addresses.
WMU extract (patronsif.pl): unlimited patron-barcode + group combinations; unlimited number of addresses.
For the patron cleanup we incorporated patronsif.pl into LB4020fix.
Patron notes field problem: CR+LF is stored if the user pressed the RETURN key, which creates unwanted extra lines within a record. The drop_crlf utility replaces “CR+LF” with “space+space”.
The heart of the cleanup process
LB4020fix reads the duplicate report (LB4020) and extracts patron SIF format data for the duplicate records. It produces three files:
SIF-A: new WIN-based records; have the current BroncoNetID in InstitutionID; change expiredate to 1981.01.01.
SIF-B: old SSN-based records; change InstitutionID to the current BroncoNetID.
SIF-C: new WIN-based records; have the current update, expire, and purge dates and BroncoNetID.
The files are then loaded in three steps:
1. SIF-A: update, key on SSN; purge on expiredate 1982.01.01 [remove new records].
2. SIF-B: update, key on SSN [prep old records to be “new”].
3. SIF-C: update, key on InstID [unify old records with new data].
This cleanup process, with variations, was repeated many times. Details omitted here for the sake of brevity (and sanity).
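The three SIF files described on this slide can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual lb4020fix.pl (which is Perl and works on real patron SIF records). The `Patron` class, its three fields, and the function `build_sif_sets` are hypothetical; only the 1981.01.01 interim expiration date and the A/B/C roles come from the slides.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Patron:
    ssan: str            # SSN or WIN, whichever the record carries (hypothetical field)
    institution_id: str  # BroncoNetID on new records (hypothetical field)
    expire_date: str     # YYYY.MM.DD, the date style used on the slides


INTERIM_EXPIRE = "1981.01.01"  # interim expiration date given on the slides


def build_sif_sets(pairs):
    """pairs: list of (old_ssn_record, new_win_record), one pair per person."""
    sif_a, sif_b, sif_c = [], [], []
    for old, new in pairs:
        # SIF-A: the new record, back-dated so that a purge removes it.
        sif_a.append(replace(new, expire_date=INTERIM_EXPIRE))
        # SIF-B: the old record takes over the current BroncoNetID.
        sif_b.append(replace(old, institution_id=new.institution_id))
        # SIF-C: the new record's current data, unchanged, loaded last.
        sif_c.append(new)
    return sif_a, sif_b, sif_c
```

The ordering matters in this sketch: once SIF-B has given the old record the current BroncoNetID, a load of SIF-C keyed on Institution ID can match that old record and overwrite it with the new data, so the old record's charges and history survive under the new identity.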
Several things went awry along the way. Not all records could be matched up with a WIN or SSN (as reported by LB4020), so those had to be handled by assigning temporary SSNs, WINs, and/or Institution IDs. At another point, the interim records used in the process weren’t deleted during a purge. Those had to be detected, reassigned an older expiration date (1971.01.01), and carefully purged before proceeding.
We now had 1081 duplicate patron records.
We added the expiration date to the duplicate detector, LB4020. Now we could see that all the SSN-based records were expired, or about to be. At this time we discovered that new WIN-based records were coming in as duplicates of SSN-based records that were typically set to expire 2008.09.08. This had to change! And the semester was about to start…
Early September… Yes, we did avert disaster. But we had more problems. The duplicate detection report, which had grown to 60 pages, was now down to 1. The next day it had grown to 3 pages. Some records did not have all fields populated on the LB4020 duplicate detector, which caused problems. We also had to fix duplicate records where the SSAN field was null.
Mid September… We removed several hundred obsolete records that had neither WIN nor SSN. Discovered records that had no Institution ID: yet another problem. We were now down to 1 SSN-based record. This person’s assigned WIN was the same as the SSN. Not supposed to happen! We identified 15 more such instances and submitted them to I.T. for correction.
October… Found some more SSN-based records (we don’t know why they still existed) and converted them to being WIN-based. Flipped the “switch” so that we no longer get SSNs for our patron update.
Legacy data: we still had records from our NOTIS era (pre-Summer 1998). Purged them if they: did not have lifetime borrowing privileges; did not have an SSN recorded; did have an Institution ID.
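The legacy purge rule can be written as a small predicate. This Python sketch assumes all three conditions must hold together (the slide does not say whether they are combined with "and" or "or") and that the caller has already limited the input to NOTIS-era records. The function name and dictionary keys are hypothetical.

```python
def should_purge_legacy(record):
    """Return True if a NOTIS-era record meets all three purge conditions.

    'record' is a dict with hypothetical keys:
      lifetime_privileges (bool), ssn (str or None), institution_id (str or None).
    Treating the three conditions as a conjunction is an assumption.
    """
    return (
        not record["lifetime_privileges"]    # no lifetime borrowing privileges
        and not record["ssn"]                # no SSN recorded
        and bool(record["institution_id"])   # does have an Institution ID
    )
```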
Trouble ahead… 3M SelfCheck: multiple active barcodes will NOT work with SelfCheck!
3M SelfCheck requires 1 active barcode per patron. We had 11058 patrons with multiple active barcodes. We wrote a program to whittle that down. We got them reduced to 300, but the next day the count was up to 1777! It is under control now, with patrononeactive.pl running Monday – Friday. This keeps only the most current active barcode for a patron. We had forgotten about those patron records without an Institution ID. We had 882 of them and fixed them.
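The "keep only the most current active barcode" rule can be sketched as below. This is a Python illustration, not patrononeactive.pl itself, which queries Voyager and writes patron SIF data. It assumes "most current" means the latest status date; the function name, the dictionary keys, and the status strings "active" and "other" as stored values are assumptions for illustration.

```python
from collections import defaultdict


def demote_extra_active(barcodes):
    """Leave each patron with one active barcode; set the rest to 'other'.

    'barcodes' is a list of dicts with hypothetical keys:
      patron_id, barcode, status, status_date (ISO 'YYYY-MM-DD' string).
    Rows are modified in place; the demoted barcode values are returned.
    """
    active_by_patron = defaultdict(list)
    for row in barcodes:
        if row["status"] == "active":
            active_by_patron[row["patron_id"]].append(row)

    demoted = []
    for rows in active_by_patron.values():
        # ISO date strings sort chronologically, so the last row is the newest.
        rows.sort(key=lambda row: row["status_date"])
        for row in rows[:-1]:
            row["status"] = "other"
            demoted.append(row["barcode"])
    return demoted
```

Because new data keeps arriving with additional active barcodes, a check like this only holds if it is rerun regularly, which matches the slide's note about running the script every weekday.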
An eye towards the future… We looked at records created before 2008 that had no SSN but did have an Institution ID. We extracted these records and modified them: expiredate = createdate; purgedate = expiredate + 4 years. We then reimported these records. They should disappear with future annual patron purges.
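The date rule on this slide (expiredate = createdate, purgedate = expiredate + 4 years) is simple arithmetic, with one edge case for records created on 29 February. A minimal Python sketch, with hypothetical function names:

```python
from datetime import date


def add_years(d, years):
    """Return the same calendar day 'years' later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def age_out(create_date):
    """Apply the slide's rule: expire on the create date, purge 4 years later."""
    expire_date = create_date
    purge_date = add_years(expire_date, 4)
    return expire_date, purge_date
```

For example, a record created on 2005-03-15 gets an expire date of 2005-03-15 and a purge date of 2009-03-15, so it is already eligible for the next annual purge.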
What we ended with: We still had 11,696 records with no SSN (nor WIN). We expect most of these to be routinely purged in the future, leaving us with 456. When we started, we had about 250,000 patron records. We now have about 68,000. Duplicate records are routinely dealt with. We filter out all but the single most current active barcode for a patron. We will have annual patron purges.
Worthwhile points… Know what you’re starting with. Keep your goal in mind. Figure out a good solution. Be flexible. Be ready for mistakes. Watch out for new/current data undoing your changes. Know when you’re done.
Resources: patronsif.pl, drop_crlf, lb4020.pl, lb4020fix.pl, patrononeactive.pl, patrononeactive.ksh. Contact me if you would like to get any of the above.
Some details on the resources…
patronsif.pl: as listed, gets patron data and puts it in patron SIF format. Institution ID based. Gets all patron+barcode groupings. (Not site-specific.)
drop_crlf: shell script that contains this line: perl -pi -e 's/\r\n/  /g' $1
It replaces each CR+LF combination with two spaces. (This is useful anytime you use patronsif.pl.)
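The slide describes drop_crlf as replacing each CR+LF pair with two spaces. For readers who do not use Perl, the same idea can be sketched in Python as below. The function names are hypothetical, and the sketch assumes the file is small enough to read into memory and that real record boundaries are plain LF, so only the stray CR+LF pairs inside notes fields are touched.

```python
from pathlib import Path


def drop_crlf(data: bytes) -> bytes:
    """Replace every CR+LF pair with two spaces, leaving lone LFs alone."""
    return data.replace(b"\r\n", b"  ")


def drop_crlf_file(path):
    """Rewrite a file in place with CR+LF pairs replaced by two spaces."""
    target = Path(path)
    target.write_bytes(drop_crlf(target.read_bytes()))
```

Substituting two characters for two characters keeps the length of each record unchanged, which matters if the record format relies on fixed positions.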
lb4020.pl: detects duplicate patron records. Shows: name, expired (Y/N), SSAN, expire date, modify date, institution ID. WMU-specific: indicates whether an SSN or a WIN is in SSAN. Modification required for your institution.
lb4020fix.pl: control structure around patronsif.pl code that uses lb4020.pl output as the starting point for the fixing process. Creates one or more patron SIF files for fixing data. Use drop_crlf if necessary.
patrononeactive.pl: queries Voyager, checking patrons’ active barcodes. If more than one is found, changes all but the most recent active barcode to “other”. Check the code carefully, as it may need modification for your use. (Incorporates patronsif.pl code.)
patrononeactive.ksh: combines patrononeactive.pl and drop_crlf in a script suitable for cron use.
Picture © 2008 by Roy Zimmer Thank you for listening. Roy Zimmer [email_address]

HCL Notes and Domino License Cost Reduction in the World of DLAU
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation

  • 1. Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation Roy Zimmer Western Michigan University
  • 4. About 5 or 6 years ago… No more SSN: switch to using WIN (WIN is our Western Identification Number). Banner. New campus ID cards.
  • 6. A few less years ago… Rewrote the patron update process to use Banner. Started thinking about not being SSN-based.
  • 7. 2007-2008: The WIN had become available in the data feeds for our patron update. Needed to change the Institution ID. Interim step: arbitrary 14 digits -> WIN. Final step: WIN -> BroncoNetID (BroncoNetID is our single sign-on ID). Patron update was switched from being SSN-based to WIN-based.
  • 9. Summer 2008 – What we started with: Data for about 74,000 patrons. About 183,000 barcodes (less than half are active!). Several thousand duplicate records, one with SSN, one with WIN (in the SSAN field). The older duplicate record typically had charges, amounts owed, etc.
  • 10. 2008: August – October Most of my time was spent on the cleanup… Dali
  • 11. August: Patron duplicate detector – LB4020. Foreign students; various errors. Sample follows…
  • 12. [Sample LB4020 output, as used one day; image not reproduced.] (WINs & SSNs in the sample are not real.)
  • 13. Our first run came up with 3489 duplicate patron records.
  • 14. We created a program that used the LB4020 report as input to identify patron records that we wanted to alter – call it LB4020fix. These records needed to be extracted from Voyager for modification and re-import. Modify me with LB4020fix
  • 16. Voyager has a patron extract utility, but it doesn’t extract all relevant data for a patron. We’d started using our own – patronsif.pl – years ago. Voyager extract (Pptrnextr): up to 3 patron-barcode + group combinations; similarly limited number of addresses. WMU extract (patronsif.pl): unlimited patron-barcode + group combinations; unlimited number of addresses.
  • 17. For the patron cleanup we incorporated patronsif.pl into LB4020fix. Patron notes field problem: CR+LF is stored if the user pressed the RETURN key, which creates unwanted extra lines within a record. The drop_crlf utility replaces “CR+LF” with “space+space”.
  • 22. The heart of the cleanup process: LB4020fix reads the duplicate report (LB4020) and extracts patron SIF format data for the duplicate records. SIF-A: new WIN-based records, with the current BroncoNetID in InstitutionID; change expiredate to 1981.01.01. SIF-B: old SSN-based records; change InstitutionID to the current BroncoNetID. SIF-C: new WIN-based records, with the current update, expire, and purge dates and BroncoNetID. Step 1 (SIF-A): update, key on SSN; purge on expiredate 1982.01.01 [remove new records]. Step 2 (SIF-B): update, key on SSN [prep old records to be “new”]. Step 3 (SIF-C): update, key on InstID [unify old records with new data]. This clean-up process, with variations, was repeated many times. Details omitted here for the sake of brevity (and sanity).
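The three input files described on this slide could be assembled roughly as follows. This is a Python sketch, not the author's Perl (lb4020fix.pl is not reproduced in the slides). The dictionary keys, the `build_sif_sets` name, the sample values, and the assumption that each old record arrives already paired with its new duplicate are all hypothetical. Only the 1981.01.01 interim expire date comes from the slides.

```python
INTERIM_EXPIRE = "1981.01.01"  # interim expire date quoted on the slide


def build_sif_sets(pairs):
    """Build the SIF-A, SIF-B and SIF-C record lists from duplicate pairs.

    `pairs` is a list of (old_ssn_record, new_win_record) dicts. The field
    names are hypothetical and are not Voyager's own.
    """
    sif_a, sif_b, sif_c = [], [], []
    for old, new in pairs:
        # SIF-A: the new WIN-based record, aged so that a purge removes it.
        sif_a.append({**new, "expire_date": INTERIM_EXPIRE})
        # SIF-B: the old SSN-based record, given the current BroncoNetID.
        sif_b.append({**old, "institution_id": new["institution_id"]})
        # SIF-C: the new record's current data, later matched on InstID.
        sif_c.append(dict(new))
    return sif_a, sif_b, sif_c


# Made-up sample pair (no real SSN, WIN, or NetID).
old = {"ssan": "000-00-0000", "institution_id": "12345678901234",
       "expire_date": "2008.09.08"}
new = {"ssan": "W00000000", "institution_id": "abc1234",
       "expire_date": "2009.09.01"}
sif_a, sif_b, sif_c = build_sif_sets([(old, new)])
```

Loading the files in order (A, then the purge, then B, then C) would be done with the library system's own patron update and purge tools, which this sketch does not model.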
  • 24. Several things went awry along the way. Not all records could be matched up with a WIN or SSN (as reported by LB4020), so those had to be handled by assigning temporary SSNs, WINs, and/or Institution IDs. At another point, the interim records used in the process weren’t deleted during a purge. Those had to be detected, reassigned an older expiration date (1971.01.01), and carefully purged before proceeding.
  • 25. We now had 1081 duplicate patron records.
  • 29. We added the expiration date to the duplicate detector, LB4020. Now we could see that all the SSN-based records were expired, or about to be. At this time we discovered that new WIN-based records were coming in as duplicates to SSN-based records that were typically set to expire 2008.09.08. This had to change! And the semester was about to start…
  • 32. Early September… Yes, we did avert disaster. But we had more problems. The duplicate detection report, which had grown to 60 pages, was now down to 1. The next day it had grown to 3 pages. Some records not having all fields populated on the LB4020 duplicate detector caused problems. Also had to fix duplicate records where the SSAN field was null.
  • 34. Mid September… We removed several hundred obsolete records that had neither WIN nor SSN. Discovered records that had no Institution ID – yet another problem. We are now down to 1 SSN-based record. This person had our assigned WIN being the same as the SSN. Not supposed to happen! Identified 15 more such instances and submitted them to I.T. for correction.
  • 35. October… Found some more SSN-based records – don’t know why they still existed – and converted them to being WIN-based. Flipped the “switch” so that we no longer get SSNs for our patron update.
  • 36. Legacy data: Still had records from our NOTIS era – pre Summer 1998. Purged them if they: did not have life-time borrowing privileges; did not have an SSN recorded; did have an Institution ID.
  • 38. 3M SelfCheck: Trouble ahead… Multiple active barcodes will NOT work with SelfCheck!
  • 42. 3M SelfCheck requires 1 active barcode per patron. We had 11058 patrons with multiple active barcodes. Wrote a program to whittle that down. Got them reduced to 300, but the next day, it was up to 1777! Under control now, with patrononeactive.pl, running Monday – Friday. This keeps only the most current active barcode for a patron. Forgot about those patron records without an Institution ID. Had 882 of them. Fixed them.
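The "keep only the most current active barcode" rule can be sketched in Python as below. This is not patrononeactive.pl, which queries Voyager directly. The `Barcode` record, its field names, the `"Active"` status string, and the use of a status date to decide which barcode is most current are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Barcode(NamedTuple):
    # Hypothetical record layout, not Voyager's schema.
    patron_id: int
    barcode: str
    status: str
    status_date: date


def barcodes_to_deactivate(rows):
    """Return every active barcode except the newest one for each patron."""
    active = defaultdict(list)
    for row in rows:
        if row.status == "Active":
            active[row.patron_id].append(row)

    demote = []
    for patron_rows in active.values():
        if len(patron_rows) > 1:
            # Keep the barcode with the latest status date, demote the rest.
            newest = max(patron_rows, key=lambda r: r.status_date)
            demote.extend(r for r in patron_rows if r is not newest)
    return demote


rows = [
    Barcode(1, "A1", "Active", date(2006, 9, 1)),
    Barcode(1, "A2", "Active", date(2008, 9, 1)),
    Barcode(1, "A0", "Expired", date(2004, 9, 1)),
    Barcode(2, "B1", "Active", date(2007, 1, 15)),
]
stale = barcodes_to_deactivate(rows)
```

Here only barcode `A1` is selected for deactivation: patron 1 keeps `A2`, and patron 2 has a single active barcode, so nothing changes for them. Running such a check daily matters because, as the slide notes, new multiple-barcode cases keep arriving with fresh data.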
  • 43. An eye towards the future… We looked at records created before 2008, those that had no SSN but did have an Institution ID. Extracted these records and modified them: expiredate = createdate; purgedate = expiredate + 4 years. Reimported these records. They should disappear with future annual patron purges.
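The date rule on this slide (expiredate = createdate, purgedate = expiredate + 4 years) works out as in the following Python sketch. The function names are hypothetical, the leap-day handling is an assumption the slides do not address, and the YYYY.MM.DD output simply mirrors how dates are written elsewhere in the deck.

```python
from datetime import date


def retirement_dates(create_date):
    """Return (expire_date, purge_date) under the slide's rule."""
    expire = create_date
    try:
        purge = expire.replace(year=expire.year + 4)
    except ValueError:
        # 29 February with no leap day four years on: fall back to the 28th.
        purge = expire.replace(year=expire.year + 4, day=28)
    return expire, purge


def sif_date(d):
    """Format a date the way the slides write them, e.g. 2008.09.08."""
    return d.strftime("%Y.%m.%d")


expire, purge = retirement_dates(date(2005, 3, 15))
print(sif_date(expire), sif_date(purge))
```

A record created on 2005.03.15 therefore expires on its creation date and becomes eligible for purge on 2009.03.15.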
  • 45. What we ended with: We still had 11,696 records with no SSN (nor WIN). We expect most of these to be routinely purged in the future, leaving us with 456. When we started, we had about 250,000 patron records. We now have about 68,000. Duplicate records are routinely dealt with. We filter out all but the single most current active barcode for a patron. We will have annual patron purges.
  • 46. Worthwhile points… Know what you’re starting with. Keep your goal in mind. Figure out a good solution. Be flexible. Be ready for mistakes. Watch out for new/current data undoing your changes. Know when you’re done.
  • 47. Resources: patronsif.pl, drop_crlf, lb4020.pl, lb4020fix.pl, patrononeactive.pl, patrononeactive.ksh. Contact me if you would like to get any of the above.
  • 48. Some details on the resources… patronsif.pl: as listed, gets patron data and puts it in patron SIF format. Institution ID based. Gets all patron+barcode groupings. (Not site-specific.) drop_crlf: shell script that contains this line: perl -pi -e 's/\r\n/  /g' $1 which replaces each CR+LF combination with two spaces. (This is useful anytime you use patronsif.pl.)
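The behaviour described for drop_crlf (each CR+LF pair becomes two spaces) can be sketched in Python as follows. This is a stand-in for illustration, not the author's script, and the function names are hypothetical. Replacing two bytes with two bytes keeps every record the same length, which is presumably why two spaces are used rather than one.

```python
from pathlib import Path


def drop_crlf(data: bytes) -> bytes:
    """Replace each CR+LF pair with two spaces, leaving bare LFs alone."""
    return data.replace(b"\r\n", b"  ")


def drop_crlf_in_file(path: str) -> None:
    """Rewrite a file in place, as an in-place one-liner would."""
    p = Path(path)
    p.write_bytes(drop_crlf(p.read_bytes()))


record = b"PATRON NOTE first line\r\nsecond line\n"
cleaned = drop_crlf(record)
```

Note that this only suits files whose records end in a bare LF. On a file with CR+LF line endings it would join every record onto one line.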
  • 49. Some details on the resources… lb4020.pl: detects duplicate patron records. Shows: name, expired (Y/N), SSAN, expire date, modify date, institution ID. WMU-specific: indicates whether SSN or WIN is in SSAN; modification required for your institution. lb4020fix.pl: control structure around patronsif.pl code that uses lb4020.pl output as the starting point for the fixing process. Creates one or more patron SIF files for fixing data. Use drop_crlf if necessary.
  • 50. Some details on the resources… patrononeactive.pl: queries Voyager, checking patrons’ active barcodes. If more than one is found, changes all but the most recent active barcode to “other”. Check the code carefully as it may need modification for your use. (Incorporates patronsif.pl code.) patrononeactive.ksh: combines patrononeactive.pl and drop_crlf in a script suitable for cron use.
  • 51. Picture © 2008 by Roy Zimmer Thank you for listening. Roy Zimmer [email_address]