Guest Presentation
HOW CLEAN IS YOUR DATABASE?
DATA SCRUBBING FOR ALL SKILL SETS (2020 EDITION)
CHAD PETROVAY, TMS ADMINISTRATOR
THE MORGAN LIBRARY & MUSEUM
Data Quality
Data quality is a measure of the condition of data
based on factors such as accuracy, completeness,
consistency, reliability and whether it’s up to date.
What fields do you have that contain
data quality issues?
What resources do you have to scrub
your data?
What is your personal skill level?
Power User
Uses the TMS UI;
has expanded rights,
but not full rights.
SQL Expert
Wait, TMS has a UI?
Nah, I’ll just script it in the
database.
Administrator
Full rights in TMS and
access to DB Config.
What is your data worth?
The costs of poor data quality are
between 20% and 35% of the operating
revenue of the average organization
LARRY ENGLISH
PREVENTION
“An ounce of prevention is worth a pound of cure”
-Benjamin Franklin
Institution
Value your Data
• Understand the costs
• Make data-driven
decisions
• Be a champion for
accurate data
Institution
Standards
• Establish the rules for data entry
• Conceptualize terms and
authority values
Prevents
• Data entry errors
• Formatting errors
• Inconsistency
• Creativity
Institution
Training
• Makes the system approachable
• Improves user efficiency
Prevents
• Data entry errors
• Unmanaged data silos (Excel)
Power User
Spell Check
Uses the Spelling and Grammar
engine in Microsoft Office.
Prevents:
• Typographical errors
• Misspellings
• Punctuation errors
• Grammatical errors
Power User
Function Keys
Reduces keystrokes when entering
repeated text.
Prevents:
• Typographical errors
• Misspellings
• Punctuation errors
• Grammatical errors
• Formatting errors
System Admin
Customize Field Labels
• Clarify field usage
• Makes system more intuitive
• Align field labels with your
institutional lingo
Prevents
• Confusion
System Admin
Customize Field Labels
In Database Configuration
1. Manage » Tables/Columns
2. Find the table
3. Find the column (i.e. field)
4. Right-click » Edit
5. Change Local Column Name
System Admin
Security Groups
• If your institution does not use a
field, then restrict access
• Restrict control of authority values
to select power users
• Text Types & Term Types
Prevents
• Populating obsolete fields
• Creativity
DATA PROFILING
“Mistakes are the portals of discovery.”
-James Joyce
System Admin
Usage Report
In TMS Module
1. Maintenance » Authorities » Others
2. Usage Report
3. After report generates:
• Browse
• Print
• Edit as RTF
• Save As RTF
System Admin
Frequency Report
In Database Configuration
1. Manage » Tables/Columns
2. Find the table
3. Find the column (i.e. field)
4. Right-click » Frequency
5. Save TXT file
Power User
Crystal Reports
In TMS Module
1. Report » Reports
2. Find report by name
3. Click Run
When creating the report:
1. Add formula “reporttype”
2. Set the formula to "NOTLINKED"
SQL
Distinct Values: SQL
A SQL query will return all
records, including:
• Departments you cannot see
• Template records
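-- List each distinct value once
SELECT DISTINCT ObjectName
FROM Objects;

-- Count how often each value occurs
SELECT ObjectName, COUNT(*) AS ValueCount
FROM Objects
GROUP BY ObjectName
-- Optionally add one of these lines before the semicolon:
-- HAVING COUNT(*) = 1   (values that occur only once)
-- HAVING COUNT(*) > 1   (values that occur more than once)
;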
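The counting query above can be tried without touching a live TMS database. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the sample values and the value_counts helper are hypothetical. SQLite compares text case-sensitively and keeps trailing spaces by default, while SQL Server's behavior depends on the column collation, so counts on a real server may group values differently.

```python
import sqlite3

# Hypothetical sample data; a real Objects table has many more columns.
SAMPLE_NAMES = ["Drawing", "Drawing", "drawing ", "Print", "Manuscript", "Print", "Drawing"]


def value_counts(names, having=None):
    """Return (ObjectName, count) rows, optionally filtered by a HAVING clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Objects (ObjectName TEXT)")
    conn.executemany("INSERT INTO Objects (ObjectName) VALUES (?)", [(n,) for n in names])
    sql = "SELECT ObjectName, COUNT(*) FROM Objects GROUP BY ObjectName"
    if having == "unique":
        sql += " HAVING COUNT(*) = 1"   # values that occur only once
    elif having == "repeated":
        sql += " HAVING COUNT(*) > 1"   # values that occur more than once
    sql += " ORDER BY COUNT(*) DESC, ObjectName"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, count in value_counts(SAMPLE_NAMES, having="unique"):
        print(repr(name), count)
```

Values that occur only once are often the typos and one-off variants worth reviewing first, which is why the "unique" filter is shown in the usage lines.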
Power User
Distinct Values: Excel Pivot Table
Is the field in a List View?
Can you export your result set?
1. Export into Excel
2. Copy column into a new sheet
3. Create column “Count”
4. Fill “Count” with 1
5. Create a Pivot Table
Tutorial
• bit.ly/3d4M8Ou
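For those who prefer a script to a pivot table, the same value distribution can be sketched in a few lines. This example assumes the result set was saved as CSV rather than as an Excel workbook; the column names and rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical export; in practice this text would be read from a saved CSV file.
EXPORT = """ObjectNumber,Medium
1998.1,Oil on canvas
1998.2,oil on canvas
1998.3,Oil on canvas
1998.4,
1998.5,Graphite
"""


def column_counts(csv_text, column):
    """Count how many rows carry each distinct value in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    for value, count in column_counts(EXPORT, "Medium").most_common():
        print(repr(value), count)
```

Printing with repr makes empty strings and stray spaces visible, which a plain list of values tends to hide.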
Power User
OpenRefine
Install OpenRefine
• Download at www.openrefine.org
• Extract archive
• Execute openrefine.exe
• Opens in your web browser
Requires Java
• www.java.com/en/download/
Power User
OpenRefine: Facets
• A Facet shows a value
distribution
• Filter records
• Batch change
• Facets
• Word Facet
• Text-Length Facet
• Null / Empty String / Blank
Facets
Power User
OpenRefine: Duplicates
• Facets
• Duplicates Facet
• Facet by Star
• Facet by Flag
• Export to Excel
Power User
DataCleaner
Install Community Edition
• Download at www.datacleaner.org
• Extract archive
• Execute DataCleaner.exe
Requires Java
• www.java.com/en/download/
Power User
DataCleaner: DataStore
If your server uses NT Authentication:
• Add SQL user to the database
Create a datastore in DataCleaner:
1. Select Microsoft SQL Server
2. Supply details
• Hostname = Server name
• Database = TMS
• Username & Password
Power User
DataCleaner: New Job
• Navigation pane
• Datastore elements
• Library of actions
• Canvas
Power User
DataCleaner: Building Job
• Drag database elements and
components onto the canvas
• Best to drag columns instead
of full tables/views
• Use filter to exclude NULLs
and empty strings
Power User
DataCleaner: Results
• String Analysis
• Row Count
• Null/Blank Count
• All upper/lower count
• Char/Word count
• Max/Min/Avg char count
• Max/Min/Avg space count
• Max/Min word count
• Click arrow for details
Power User
DataCleaner: Results
• Value Distribution
• Total count
• Distinct count
• List of distinct values
(except uniques)
• Graphical rank-size of distinct
values
• Click arrow for details
Power User
DataCleaner: Results
• Pattern Finder
• A = Uppercase letter
• a = Lowercase letter
• # = Number
• ? = AlphaNumeric
• Graphical rank-size of distinct
patterns
• Click arrow for details
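To make the pattern symbols concrete, here is a simplified sketch of the idea. It is not DataCleaner's algorithm: it maps each character individually and leaves out the ? (alphanumeric) symbol. The function names and sample values are illustrative only.

```python
from collections import Counter


def to_pattern(value):
    """Map each character to A (uppercase), a (lowercase) or # (digit)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("#")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # punctuation and spaces are kept as-is
    return "".join(out)


def pattern_counts(values):
    """Count how many values share each pattern."""
    return Counter(to_pattern(v) for v in values)


if __name__ == "__main__":
    phones = ["(646) 733-2239", "646.733.2239", "(510) 652-8950"]
    for pattern, count in pattern_counts(phones).most_common():
        print(pattern, count)
```

A rank of patterns by frequency shows the dominant format at a glance; the rare patterns are the candidates for cleanup.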
Power User
Data Quality Services (DQS)
• Knowledge Base
• Projects
• Cleansing
• Matching
• Bundled with SQL Server
• Enterprise Edition
• Developer Edition
• Only works with local
databases
PLANNING
“To achieve great things, two things are needed:
a plan, and not quite enough time.” –Leonard Bernstein
Institution
Human Capital
Human capital is essential
for any data scrubbing
project.
• Colleagues
• Interns
• Volunteers
Power User
Project Management
• Record projects
• Plan future projects
• Track progress
• Provide metrics for
administration
Power User
Cheat Sheets
• Training tool
• Project specific
• Simplifies access to
standards
DATA SCRUBBING
“Cleaning and organizing is a practice not a project.”
-Meagan Francis
The Three Modes
Human Middleware
Direct human contact with UI
Usually Record-by-Record
Labor intensive
Automation
Requires additional
tools/services/platforms
Steeper learning curve
Artificial Intelligence
SQL Script
Change one or more records
through the back-end
Requires intimate knowledge of
database structure
Human Middleware
Finding Records by Pattern
• Query using wildcards:
• single character (?)
• multi-character (*)
• Wrap sequences with
double quotes
Format                    TMS Search
(646) 733-2239            "(???) ???-????"
646.733.2239              *???.???.????*
+44 (0)207379 8188        +*
(510) 652-8950 ext 223    "* ext*"
Chad M.                   "* ?."
Cheryl & Edward           *&*
Cheryl and Edward         "* and *"
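The patterns in the table above can be dry-run against sample values before searching in TMS. This sketch assumes that the TMS wildcards behave like the shell-style ? and * wildcards in Python's fnmatch module, which may not hold in every case; the patterns are written without the surrounding double quotes.

```python
from fnmatch import fnmatchcase

# Sample values and patterns copied from the table above (quotes dropped).
EXAMPLES = [
    ("(646) 733-2239", "(???) ???-????"),
    ("646.733.2239", "*???.???.????*"),
    ("+44 (0)207379 8188", "+*"),
    ("(510) 652-8950 ext 223", "* ext*"),
    ("Chad M.", "* ?."),
    ("Cheryl & Edward", "*&*"),
    ("Cheryl and Edward", "* and *"),
]


def matches(value, pattern):
    """True when the whole value fits the wildcard pattern (case-sensitive)."""
    return fnmatchcase(value, pattern)


def find_matches(values, pattern):
    """Return the values that fit the pattern, in their original order."""
    return [v for v in values if matches(v, pattern)]


if __name__ == "__main__":
    values = [value for value, _ in EXAMPLES]
    for _, pattern in EXAMPLES:
        print(pattern, "->", find_matches(values, pattern))
```

Note that fnmatch also treats square brackets as special characters, so values or patterns containing [ or ] would need extra care.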
Human Middleware
Search and Replace (String)
In Objects Module
1. Maintenance » Database »
Search and Replace
2. Select Module/Table/Column
3. Provide search and replace terms
4. Review results
• Replace All
• Replace
• Skip
System Admin
Search and Replace (Thesaurus)
In Database Configuration
1. Edit » Search and Replace »
Linked Thesaurus Terms
2. Click Zoom button (…) to find source term
3. Click Zoom button (…) to find target term
4. Click OK and confirm
Human Middleware
Merge Constituents Utility
In Plugins folder
1. Search for duplicate constituents
2. Select candidates from the
suggestions
3. Click Next
Feature Idea: Constituent Packages!!
Human Middleware
Merge Constituents Utility
4. Set Target record
• Right-click » Merge to this
5. Edit data in the columns of
the grid
6. Go section by section and
select the data to keep
7. Ready to merge?
• File » Merge
8. Save an XML file
SQL
Updating with SQL
• Know the system
• Test the SQL script in a sandbox
environment first
• Back up your database
before running SQL script
• Consider converting frequently used
scripts into Stored Procedures
• Gallery Systems may not be able to
provide support
SQL
Finding Records with Patterns
• Use a LIKE Statement
• Query using wildcards:
• single character (_)
• multi-character (%)
Format                     LIKE Pattern
(646) 733-2239             '(___) ___-____'
646.733.2239               '%___.___.____%'
+44 (0)207379 8188         '+%'
(510) 652-8950 ext. 223    '% ext%'
Chad M.                    '% _.'
Cheryl & Edward            '%&%'
Cheryl and Edward          '% and %'
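The TMS wildcards map directly onto the LIKE wildcards: ? becomes _ and * becomes %. The sketch below converts a pattern and runs it with LIKE. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the Phones table and helper names are hypothetical. SQL Server's LIKE also gives square brackets a special meaning, which SQLite does not, and the converter does not escape literal % or _ characters.

```python
import sqlite3

# Sample values taken from the table above.
SAMPLE = [
    "(646) 733-2239",
    "646.733.2239",
    "+44 (0)207379 8188",
    "(510) 652-8950 ext. 223",
]


def tms_to_like(pattern):
    """Swap TMS-style wildcards for LIKE wildcards (? -> _, * -> %)."""
    return pattern.replace("?", "_").replace("*", "%")


def find_like(values, like_pattern):
    """Return the values that match a LIKE pattern, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Phones (Number TEXT)")
    conn.executemany("INSERT INTO Phones (Number) VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        "SELECT Number FROM Phones WHERE Number LIKE ? ORDER BY rowid",
        (like_pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    like = tms_to_like("(???) ???-????")
    print(like, "->", find_like(SAMPLE, like))
```

Passing the pattern as a query parameter, rather than pasting it into the SQL text, avoids quoting problems when a pattern contains an apostrophe.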
SQL
Excel Trick
If you have data in Excel
1. Create a SQL script using a
CONCATENATE formula
2. Copy the formula down the
column
3. Select and copy the column
4. Paste the content in SSMS
5. Execute
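The same idea works outside Excel: build one SQL statement per data row, then paste the result into SSMS. This sketch uses the MLM_UpdateFieldValue call from the My Stored Procedure slides as the statement template. The helper names and the third sample row are hypothetical. Unlike a plain CONCATENATE formula, it doubles any apostrophes in the text so the generated script stays valid.

```python
def sql_literal(text):
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_statement(column_id, pk, new_value, login_id):
    """Build one EXECUTE statement for a single row of data."""
    return (
        "EXECUTE [dbo].[MLM_UpdateFieldValue] "
        f"@ColumnID = {int(column_id)}, @PK = {int(pk)}, "
        f"@NewValue = {sql_literal(new_value)}, "
        f"@LoginID = {sql_literal(login_id)};"
    )


def build_script(rows, login_id):
    """Join the statements for all rows into one script, one per line."""
    return "\n".join(build_statement(c, pk, v, login_id) for c, pk, v in rows)


if __name__ == "__main__":
    rows = [
        (1243, 273469, "Gift of John Doe"),
        (1228, 273469, "Loaned Object"),
        (1243, 273470, "Gift of the artist's estate"),  # hypothetical row
    ]
    print(build_script(rows, "cpetrovay"))
```

As with any generated script, review the output and run it in a sandbox before executing it against production.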
SQL
My Stored Procedure
Stored Procedure
• @ColumnID = Identifies the field
• Get the ColumnID from the
Data Dictionary
• @PK = Primary key for the record
• @NewValue = the value you want
• @LoginID = your username
EXECUTE [dbo].[MLM_UpdateFieldValue]
@ColumnID = 1243, @PK = 273469,
@NewValue = 'Gift of John Doe',
@LoginID = 'cpetrovay';
EXECUTE [dbo].[MLM_UpdateFieldValue]
@ColumnID = 1228, @PK = 273469,
@NewValue = 'Loaned Object',
@LoginID = 'cpetrovay';
SQL
My Stored Procedure
Process:
• Truncates new value if too long
• Looks up authority key values
• Updates only when value changes
• Tracks change in Audit Trail
Available:
• github.com/cpetrovay/TMS_UpdateField_SP/
MONITORING
“Without a systematic way to start and keep data clean,
bad data will happen.” -Donato Diorio
Human Middleware
Saved Queries
• Save time by saving your
periodic review queries.
• User has to initiate query
Human Middleware
Audit Trail Report
• Review changes to the
database
• Identify where to provide
additional training
• User has to run the report
Automation
TMS Alerts
• Notifies user when
predefined criteria are met
• User has to regularly
access TMS
• Set up by SQL Expert
Automation
Database Mail
• Sends email to user when
predefined criteria are met
• Requires configuration in
SQL Server
• Set up by SQL Expert
Automation
SSRS Subscription
• Sends email to user when
predefined criteria are met
• Requires configuration in
SSRS Server
• Set up by Report Writer
FINAL THOUGHTS
“Data that is loved tends to survive.”
–Kurt Bollacker
“While few things in life are guaranteed,
it is safe to say that not addressing data quality
issues this year means you’ll be facing the
same issues next year,
likely on a larger scale.”
-BO CRADER (sgENGAGE)
Data scrubbing
goes on as long as it has to.
THE 7TH RULE OF THE DATA SCRUB
CHAD PETROVAY
TMS ADMINISTRATOR, THE MORGAN LIBRARY & MUSEUM
cpetrovay@themorgan.org
Q&A

How Clean is your Database? Data Scrubbing for all Skill Sets

  • 1. Guest Presentation HOW CLEAN IS YOUR DATABASE? DATA SCRUBBING FOR ALL SKILL SETS (2020 EDITION) CHAD PETROVAY, TMS ADMINISTRATOR THE MORGAN LIBRARY & MUSEUM
  • 2. Data Quality Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and whether it’s up to date.
  • 3. ? What fields do you have that contain data quality issues?
  • 4. ? What resources do you have to scrub your data?
  • 5. What is your personal skill level? • Power User: Uses the TMS UI; has expanded rights, but not full rights. • SQL Expert: “Wait? TMS has a UI? Nah, I’ll just script it in the database.” • Administrator: Full rights in TMS and access to DB Config.
  • 6. ? What is your data worth?
  • 7. The costs of poor data quality are between 20% and 35% of the operating revenue of the average organization LARRY ENGLISH
  • 8. PREVENTION “An ounce of prevention is worth a pound of cure” -Benjamin Franklin
  • 9. Institution Value your Data • Understand the costs • Make data-driven decisions • Be a champion for accurate data
  • 10. Institution Standards • Establish the rules for data entry • Conceptualize terms and authority values Prevents • Data entry errors • Formatting errors • Inconsistency • Creativity
  • 11. Institution Training • Makes the system approachable • Improves user efficiency Prevents • Data entry errors • Unmanaged data silos (Excel)
  • 12. Power User Spell Check Uses the Spelling and Grammar engine in Microsoft Office. Prevents: • Typographical errors • Misspellings • Punctuation errors • Grammatical errors
  • 13. Power User Function Keys Reduces keystrokes when entering repeated text. Prevents: • Typographical errors • Misspellings • Punctuation errors • Grammatical errors • Formatting errors
  • 14. System Admin Customize Field Labels • Clarify field usage • Makes system more intuitive • Align field labels with your institutional lingo Prevents • Confusion
  • 15. System Admin Customize Field Labels In Database Configuration 1. Manage » Tables/Columns 2. Find the table 3. Find the column (i.e. field) 4. Right-click » Edit 5. Change Local Column Name
  • 16. System Admin Security Groups • If your institution does not use a field, then restrict access • Restrict control of authority values to select power users • Text Types & Term Types Prevents • Populating obsolete fields • Creativity
  • 17. DATA PROFILING “Mistakes are the portals of discovery.” -James Joyce
  • 18. System Admin Usage Report In TMS Module 1. Maintenance » Authorities » Others 2. Usage Report 3. After report generates: • Browse • Print • Edit as RTF • Save As RTF
  • 19. System Admin Frequency Report In Database Configuration 1. Manage » Tables/Columns 2. Find the table 3. Find the column (i.e. field) 4. Right-click » Frequency 5. Save TXT file
  • 20. Power User Crystal Reports In TMS Module 1. Report » Reports 2. Find report by name 3. Click Run When creating the report: 1. Add formula “reporttype” 2. "NOTLINKED"
  • 21. SQL Distinct Values: SQL A SQL query will return all records, including: • Departments you cannot see • Template records
      SELECT DISTINCT ObjectName FROM Objects
      SELECT ObjectName, COUNT(*) FROM Objects GROUP BY ObjectName [HAVING COUNT(*) = 1] [HAVING COUNT(*) > 1]
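The two queries on slide 21 can be tried without touching a production database. The sketch below is a minimal illustration, assuming an in-memory SQLite table in place of the real TMS `Objects` table; the sample rows are made up, and SQLite compares text case-sensitively by default, which may differ from the collation on your SQL Server.

```python
import sqlite3

# In-memory stand-in for the TMS Objects table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Objects (ObjectID INTEGER PRIMARY KEY, ObjectName TEXT)")
conn.executemany(
    "INSERT INTO Objects (ObjectName) VALUES (?)",
    [("Drawing",), ("Drawing",), ("Drawng",), ("Manuscript",), ("Print",), ("Print",)],
)

# Every distinct value in the column, one row each.
distinct_names = [row[0] for row in conn.execute(
    "SELECT DISTINCT ObjectName FROM Objects ORDER BY ObjectName")]

# HAVING COUNT(*) = 1 isolates one-off values, which are often typos.
singletons = [row[0] for row in conn.execute(
    "SELECT ObjectName FROM Objects GROUP BY ObjectName "
    "HAVING COUNT(*) = 1 ORDER BY ObjectName")]

# HAVING COUNT(*) > 1 keeps the values that are used more than once.
repeated = dict(conn.execute(
    "SELECT ObjectName, COUNT(*) FROM Objects GROUP BY ObjectName "
    "HAVING COUNT(*) > 1"))

print(distinct_names)
print(singletons)
print(repeated)
```

Here the misspelled "Drawng" shows up among the single-use values, which is the kind of candidate the `HAVING COUNT(*) = 1` variant is meant to surface for review.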
  • 22. Power User Distinct Values: Excel Pivot Table Is the field in a List View? Can you export your result set? 1. Export into Excel 2. Copy column into a new sheet 3. Create column “Count” 4. Fill “Count” with 1 5. Create a Pivot Table Tutorial • bit.ly/3d4M8Ou
  • 23. Power User OpenRefine Install OpenRefine • Download at www.openrefine.org • Extract archive • Execute openrefine.exe • Opens in your web browser Requires Java • www.java.com/en/download/
  • 24. Power User OpenRefine: Facets • A Facet shows a value distribution • Filter records • Batch change • Facets • Word Facet • Text-Length Facet • Null / Empty String / Blank Facets
  • 25. Power User OpenRefine: Duplicates • Facets • Duplicates Facet • Facet by Star • Facet by Flag • Export to Excel
  • 26. Power User DataCleaner Install Community Edition • Download at www.datacleaner.org • Extract archive • Execute DataCleaner.exe Requires Java • www.java.com/en/download/
  • 27. Power User DataCleaner: DataStore If your server uses NT Authentication: • Add SQL user to the database Create a datastore in DataCleaner: 1. Select Microsoft SQL Server 2. Supply details • Hostname = Server name • Database = TMS • Username & Password
  • 28. Power User DataCleaner: New Job • Navigation pane • Datastore elements • Library of actions • Canvas
  • 29. Power User DataCleaner: Building Job • Drag database elements and components onto the canvas • Best to drag columns instead of full tables/views • Use filter to exclude NULLs and empty strings
  • 30. Power User DataCleaner: Results • String Analysis • Row Count • Null/Blank Count • All upper/lower count • Char/Word count • Max/Min/Avg char count • Max/Min/Avg space count • Max/Min word count • Click arrow for details
  • 31. Power User DataCleaner: Results • Value Distribution • Total count • Distinct count • List of distinct values (except uniques) • Graphical rank-size of distinct values • Click arrow for details
  • 32. Power User DataCleaner: Results • Pattern Finder • A = Uppercase letter • a = Lowercase letter • # = Number • ? = AlphaNumeric • Graphical rank-size of distinct patterns • Click arrow for details
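The idea behind a pattern finder can be sketched in a few lines. This is a simplified, character-by-character illustration, not DataCleaner's actual algorithm: it maps uppercase letters to `A`, lowercase letters to `a`, and digits to `#`, keeps everything else as-is, and counts how often each pattern occurs. The function names and sample values are hypothetical.

```python
from collections import Counter


def to_pattern(value):
    """Reduce a string to its shape: A = uppercase, a = lowercase, # = digit."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("#")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # punctuation and spaces are kept unchanged
    return "".join(out)


def pattern_counts(values):
    """Count how many values share each pattern."""
    return Counter(to_pattern(v) for v in values)


phones = [
    "(646) 733-2239",
    "646.733.2239",
    "(510) 652-8950",
    "(510) 652-8950 ext 223",
]
counts = pattern_counts(phones)
for pattern, n in counts.most_common():
    print(n, pattern)
```

Patterns that occur only once or twice in a large column are usually the formatting outliers worth reviewing first.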
  • 33. Power User Data Quality Services (DQS) • Knowledge Base • Projects • Cleansing • Matching • Bundled with SQL Server • Enterprise Edition • Developer Edition • Only works with local databases
  • 34. PLANNING “To achieve great things, two things are needed: a plan, and not quite enough time.” –Leonard Bernstein
  • 35. Institution Human Capital Human capital is essential for any data scrubbing project. • Colleagues • Interns • Volunteers
  • 36. Power User Project Management • Record projects • Plan future projects • Track progress • Provide metrics for administration
  • 37. Power User Cheat Sheets • Training tool • Project specific • Simplifies access to standards
  • 38. DATA SCRUBBING “Cleaning and organizing is a practice not a project.” -Meagan Francis
  • 39. The Three Modes • Human Middleware: Direct human contact with UI; usually record-by-record; labor intensive • Automation: Requires additional tools/services/platforms; steeper learning curve; Artificial Intelligence • SQL Script: Change one or more records through the back-end; requires intimate knowledge of database structure
  • 40. Human Middleware Finding Records by Pattern • Query using wildcards: • single character (?) • multi-character (*) • Wrap sequences with double quotes
      Format                     TMS Search
      (646) 733-2239             “(???) ???-????”
      646.733.2239               *???.???.????*
      +44 (0)207379 8188         +*
      (510) 652-8950 ext 223     “* ext*”
      Chad M.                    “* ?.”
      Cheryl & Edward            *&*
      Cheryl and Edward          “* and *”
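To check a wildcard pattern against sample values before running it in TMS, the `?` and `*` wildcards from slide 40 can be translated into a regular expression. This is a rough sketch under two assumptions that are not confirmed by the slides: the pattern must match the whole field value, and the double quotes that wrap a sequence in TMS are left out. The helper names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (any run of characters) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, value):
    """True when the whole value fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None


print(matches("(???) ???-????", "(646) 733-2239"))
print(matches("* ext*", "(510) 652-8950 ext 223"))
print(matches("* and *", "Cheryl & Edward"))
```

Escaping every non-wildcard character keeps parentheses, periods, and plus signs in phone numbers from being read as regex syntax.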
  • 41. Human Middleware Search and Replace (String) In Objects Module 1. Maintenance » Database » Search and Replace 2. Select Module/Table/Column 3. Provide search and replace terms 4. Review results • Replace All • Replace • Skip
  • 42. System Admin Search and Replace (Thesaurus) In Database Configuration 1. Edit » Search and Replace » Linked Thesaurus Terms 2. Click Zoom button (…) to find source term 3. Click Zoom button (…) to find target term 4. Click OK and confirm
  • 43. Human Middleware Merge Constituents Utility In Plugins folder 1. Search for duplicate constituents 2. Select candidates from the suggestions 3. Click Next Feature Idea: Constituent Packages!!
  • 44. Human Middleware Merge Constituents Utility 4. Set Target record • Right-click » Merge to this 5. Edit data in the columns of the grid 6. Go section by section and select the data to keep 7. Ready to merge? • File » Merge 8. Save an XML file
  • 45. SQL Updating with SQL • Know the system • Test the SQL script in a sandbox environment first • Back up your database before running SQL script • Consider converting frequently used scripts into Stored Procedures • Gallery Systems may not be able to provide support
  • 46. SQL Finding Records with Patterns • Use a LIKE Statement • Query using wildcards: • single character (_) • multi-character (%)
      Format                     LIKE Pattern
      (646) 733-2239             (___) ___-____
      646.733.2239               %___.___.____%
      +44 (0)207379 8188         +%
      (510) 652-8950 ext. 223    % ext%
      Chad M.                    % _.
      Cheryl & Edward            %&%
      Cheryl and Edward          % and %
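The LIKE wildcards from slide 46 can be tested against sample values before they are used on live data. The sketch below assumes an in-memory SQLite table named `Phones`, which is hypothetical and stands in for whichever TMS column you are profiling. `_` and `%` behave the same way in SQLite and SQL Server, though SQL Server's LIKE additionally treats square brackets as special.

```python
import sqlite3

# Hypothetical sample table; the values are the example formats from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Phones (PhoneNumber TEXT)")
conn.executemany(
    "INSERT INTO Phones (PhoneNumber) VALUES (?)",
    [
        ("(646) 733-2239",),
        ("646.733.2239",),
        ("+44 (0)207379 8188",),
        ("(510) 652-8950 ext. 223",),
    ],
)


def find_like(pattern):
    """Return the phone numbers that match a LIKE pattern."""
    rows = conn.execute(
        "SELECT PhoneNumber FROM Phones WHERE PhoneNumber LIKE ? "
        "ORDER BY PhoneNumber",
        (pattern,),
    )
    return [row[0] for row in rows]


print(find_like("(___) ___-____"))   # _ matches exactly one character
print(find_like("%___.___.____%"))   # % matches any run of characters
print(find_like("+%"))
print(find_like("% ext%"))
```

Note that the underscores are written without spaces between them: each `_` stands for one character, so a space inside the pattern would have to match a literal space in the data.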
  • 47. SQL Excel Trick If you have data in Excel 1. Create a SQL script using a CONCATENATE formula 2. Copy the formula down the column 3. Select and copy the column 4. Paste the content in SSMS 5. Execute
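The same script-generation idea works outside Excel. The sketch below builds one statement per data row, as the CONCATENATE formula would, using the `MLM_UpdateFieldValue` call shape shown on the stored-procedure slides. The helper names and the third sample row are illustrative assumptions; the one thing it adds over a plain concatenation is doubling single quotes so that values containing an apostrophe do not break the script.

```python
def sql_literal(value):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_execute(column_id, pk, new_value, login_id):
    """Build one EXECUTE statement for a single field update."""
    return (
        "EXECUTE [dbo].[MLM_UpdateFieldValue] "
        f"@ColumnID = {int(column_id)}, @PK = {int(pk)}, "
        f"@NewValue = {sql_literal(new_value)}, "
        f"@LoginID = {sql_literal(login_id)};"
    )


# (ColumnID, primary key, new value); the last row is made-up sample data.
rows = [
    (1243, 273469, "Gift of John Doe"),
    (1228, 273469, "Loaned Object"),
    (1243, 273470, "Gift of Jane O'Hara"),
]

script = "\n".join(build_execute(c, pk, v, "cpetrovay") for c, pk, v in rows)
print(script)
```

The printed script can then be pasted into SSMS and run, ideally against a sandbox copy first.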
  • 48. SQL My Stored Procedure Stored Procedure • @ColumnID = Identifies the field • Get the ColumnID from the Data Dictionary • @PK = Primary key for the record • @NewValue = the value you want • @LoginID = your username
      EXECUTE [dbo].[MLM_UpdateFieldValue] @ColumnID = 1243, @PK = 273469, @NewValue = 'Gift of John Doe', @LoginID = 'cpetrovay';
      EXECUTE [dbo].[MLM_UpdateFieldValue] @ColumnID = 1228, @PK = 273469, @NewValue = 'Loaned Object', @LoginID = 'cpetrovay';
  • 49. SQL My Stored Procedure Process: • Truncates new value if too long • Looks up authority key values • Updates only when value changes • Tracks change in Audit Trail Available: • github.com/cpetrovay/TMS_UpdateField_SP/
      EXECUTE [dbo].[MLM_UpdateFieldValue] @ColumnID = 1243, @PK = 273469, @NewValue = 'Gift of John Doe', @LoginID = 'cpetrovay';
      EXECUTE [dbo].[MLM_UpdateFieldValue] @ColumnID = 1228, @PK = 273469, @NewValue = 'Loaned Object', @LoginID = 'cpetrovay';
  • 50. MONITORING “Without a systematic way to start and keep data clean, bad data will happen.” -Donato Diorio
  • 51. Human Middleware Saved Queries • Save time by saving your periodic review queries. • User has to initiate query
  • 52. Human Middleware Audit Trail Report • Review changes to the database • Identify where to provide additional training • User has to run the report
  • 53. Automation TMS Alerts • Notifies user when predefined criteria are met • User has to regularly access TMS • Set up by SQL Expert
  • 54. Automation Database Mail • Sends email to user when predefined criteria are met • Requires configuration in SQL Server • Set up by SQL Expert
  • 55. Automation SSRS Subscription • Sends email to user when predefined criteria are met • Requires configuration in SSRS Server • Set up by Report Writer
  • 56. FINAL THOUGHTS “Data that is loved tends to survive.” –Kurt Bollacker
  • 57. “While few things in life are guaranteed, it is safe to say that not addressing data quality issues this year means you’ll be facing the same issues next year, likely on a larger scale.” -BO CRADER (sgENGAGE)
  • 58. Data scrubbing goes on as long as it has to. THE 7TH RULE OF THE DATA SCRUB
  • 59. CHAD PETROVAY TMS ADMINISTRATOR, THE MORGAN LIBRARY & MUSEUM cpetrovay@themorgan.org Q&A