Data, how to get it clean
and keep it clean?
The best way to make money is to stop wasting it!
Agenda:
Who are DQ
Setting the scene
Acceptable Quality
Data Defects
Get it Clean
Keep it Clean
Q&A via web chat
Close
Setting the scene…
Who are we?
What do we do?
How do we do it?
What's in it for our clients?
UK B2C Data – annual rates of change…
UK Population is 63.23 M
UK Households 26.4 M
• Over 3.25 M (5.1%) people move house
• 0.584 M (0.9%) people pass away
• 0.813 M (1.3%) Births
• 0.290 M (0.5%) Marry
• 0.130 M (0.2%) Divorce
• 0.500 M (1.9%) Changes by Royal Mail
• 0.250 M (1.4%) people sign up to MPS
½ life of B2C data 1 to 1.2 years
UK B2B Data – annual rates of change…
4.934 M trading businesses in the UK
• 3.10 M (62.8%) sole proprietorships
• 0.43 M (8.8%) partnerships
• 1.40 M (28.4%) limited companies
• 0.60 M (12.2%) dormant businesses
5.7 M changes to company or individual details:
• 1 moves every 6 minutes
• 1 fails every 4 minutes
On average a person changes jobs 11 times during their career
Over 1.1 M (22.3%) businesses are registered with the CTPS
2.43 M employees of UK businesses:
• 99.9% of businesses employ fewer than 250 staff
• 99.2% of businesses employ fewer than 50 people and account for 59% of total staff
@ 24% p.a. ½ life attrition = 3 years
@ 35% p.a. ½ life attrition = 2 years
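The half-life figures above can be sanity-checked with a short calculation. This is a minimal sketch, assuming compound decay (the same fraction of the remaining good records goes stale each year); the function name `half_life_years` is illustrative and not from the slides. Under that assumption 24% p.a. gives roughly 2.5 years and 35% p.a. roughly 1.6 years, which the slide rounds up to 3 and 2.

```python
import math


def half_life_years(annual_decay_rate: float) -> float:
    """Years until half the records are stale, assuming the same
    fraction of the remaining good records decays every year."""
    if not 0 < annual_decay_rate < 1:
        raise ValueError("annual_decay_rate must be between 0 and 1")
    return math.log(0.5) / math.log(1 - annual_decay_rate)


if __name__ == "__main__":
    for rate in (0.24, 0.35):
        years = half_life_years(rate)
        print(f"{rate:.0%} p.a. -> half-life of about {years:.1f} years")
```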
Data decay – the impacts…
Financial:
• £220 M per annum wasted on inaccurate mailings
• £95 M per annum wasted by companies mailing people who have moved address
• It costs more to mail a moved or deceased individual than to suppress them
• Increase response rates – the same return with less mail
Brand:
• Duplicates and incorrect details cause a negative perception
• Mailing deceased individuals or bereaved families causes significant distress
• Mailing someone who no longer lives at an address does not impress
Compliance:
• Best practice – comply with Direct Marketing Association guidelines
• Calling a consumer who has registered their objection to receiving direct marketing phone calls is illegal
• Mailing a consumer who has registered their objection to receiving direct mail is bad management, contravenes the DMA Code of Practice and could be illegal
Environment
• Protect the environment – help cut down on wasteful mailing
The human factors
Acknowledging there is a problem
The Data Quality Delusion
Everyone understands the importance of data quality
Everyone agrees data quality is important
Everyone cares about data quality
Everyone knows what actions to take to improve data quality
Opening the Johari Window
Seeing what you don't currently see!
Open Area: known to others and known to self
Blind Area: known to others, not known to self
Hidden Area: not known to others, known to self
Unknown Area: unknown to others and unknown to self
Johari Window - You don't know what you don't know...
Self / Others
Expand the Open Area
Reduce the Blind Area
Reduce the Hidden Area?
Acceptable levels of data quality?
All data has some level of quality; the question is, at what level is it unacceptable?
How does anyone know?
Who's responsible?
How much is low quality data actually costing?
Unacceptable
Acceptable
All data has some level of quality, the
question is at what level is it unacceptable.
Temp < 37°C: Hypothermia
Temp = 37°C: Normal
Temp > 37°C: Abnormal
Temp > 37.8°C: Get help
How can we end up with bad data?
A boy's name beginning with the letter J: "Gerald.."
A word beginning with Z: "Xylophone.."
A part of the body beginning with N: "Knee.."
A mode of transport that you can walk in: "Your shoes.."
Getting your data clean and keeping it clean
Identify, correct, prevent
Get it Clean: the basics
About “CURING” data defects
Batch process automation
Mass defect identification
• Mastering & Merging
• Manual review
Time consuming
More costly than prevention
Keep it Clean: the basics
Prevention better than cure
Ongoing process
• People
• Process
• Technology
Costs of prevention are many times lower than cure!
Waging war on error…
Finding defects
Defining standards
Correcting data
Preventing error
Monitoring defects
Reference data
Internal data
Boolean Logic & Dates
DD/MM/YY v MM/DD/YY
• 10/10/09 = 10/10/09
• 99/99/99 was accepted as a valid date structure, yet it's clearly wrong
• Is it European format DD/MM/YYYY or US format MM/DD/YYYY?
Precision
• DD/MM/YY or DD/MM/YYYY
OK to Mail = Y
Not OK to Mail = Y
OK to Mail = N
Not OK to Mail = N
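The date and flag defects on this slide can be caught with a few lines of validation. The sketch below is illustrative only: the function names are hypothetical, and it assumes dates arrive as text and the two mailing flags are stored as "Y"/"N" strings. It rejects structurally valid but impossible dates such as 99/99/99, reports dates that read differently in day-first and month-first order, and flags records where "OK to Mail" and "Not OK to Mail" carry the same value.

```python
from datetime import datetime


def parse_date(text: str, fmt: str = "%d/%m/%Y"):
    """Return a date if text is a real calendar date in fmt, else None."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def is_ambiguous(text: str) -> bool:
    """True when the text is a valid date in both DD/MM/YYYY and
    MM/DD/YYYY order and the two readings give different dates."""
    day_first = parse_date(text, "%d/%m/%Y")
    month_first = parse_date(text, "%m/%d/%Y")
    return (
        day_first is not None
        and month_first is not None
        and day_first != month_first
    )


def flags_conflict(ok_to_mail: str, not_ok_to_mail: str) -> bool:
    """The two flags should be opposites, so equal values contradict."""
    return ok_to_mail == not_ok_to_mail
```

A date such as 10/10/2009 is safe in either convention, while 10/02/2009 needs the source system's format to be known before it can be trusted.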
Numbers in Text and Shared Numbers
Systems contain:
• 0's and/or O's
• 1's and/or I's
• Tel numbers with 9 zeros: 000 000 000
Same product – different numbers in 2 systems
• Same Part number 99 000 1111
• 99 000 1111 = 1 day's cold ration pack
• 99 000 1111 = Radio valves
• Leasing Agreement numbers
• ID Counters shared across systems
• SKUs
• Tank & Aircraft Parts
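Letter-for-digit substitutions and placeholder numbers of the kind listed above are easy to detect mechanically. This is a minimal sketch with hypothetical helper names; it assumes the field is meant to hold digits only, so any O, I or lowercase l is treated as a keying error rather than legitimate text.

```python
import re

# Letters commonly keyed in place of the digits they resemble.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})


def has_lookalikes(value: str) -> bool:
    """True if a supposedly numeric field contains look-alike letters."""
    return bool(re.search(r"[OoIl]", value))


def repair_numeric(value: str) -> str:
    """Swap look-alike letters for the digits they stand in for."""
    return value.translate(LOOKALIKES)


def is_placeholder_number(value: str) -> bool:
    """True for numbers made of one repeated digit, e.g. 000 000 000."""
    digits = re.sub(r"\D", "", value)
    return len(digits) > 0 and len(set(digits)) == 1
```

Repairing automatically is only safe for numeric-only fields; in mixed fields such as postcodes the same characters can be legitimate.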
Misinterpretation & Standards
M = Male in one system and Married in another
S = Single in one system and Separated in another
Gender
• 9 variants in the gender field of a hotel project
Padhraic, Pádraig or Páraic
Lane, LN, Ln, Road, Rd, Rd. etc.
MI or Michigan
US or USA or United States
GB or UK or United Kingdom
Mr. or Mister
Hants or Hampshire
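Variants like these are usually resolved with a lookup table per field. The sketch below is an assumption about how such a table might look, not a description of any product; the `SYNONYMS` contents and the `standardise` function are hypothetical. Keeping a separate table per field matters because the same code can mean different things in different fields, as with M for Male or Married.

```python
# Hypothetical per-field synonym tables; the preferred form is the value.
SYNONYMS = {
    "county": {"hants": "Hampshire", "hampshire": "Hampshire"},
    "country": {
        "gb": "United Kingdom",
        "uk": "United Kingdom",
        "united kingdom": "United Kingdom",
        "us": "United States",
        "usa": "United States",
        "united states": "United States",
    },
    "title": {"mr": "Mr", "mister": "Mr"},
    "street_type": {"ln": "Lane", "lane": "Lane", "rd": "Road", "road": "Road"},
}


def standardise(field: str, value: str) -> str:
    """Map a value to its preferred form for the given field.
    Unknown values are returned trimmed but otherwise unchanged."""
    cleaned = value.strip()
    key = cleaned.rstrip(".").lower()
    return SYNONYMS.get(field, {}).get(key, cleaned)
```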
Dislocation, misfielding
Address A            Address B
123 Arcasia Avenue   123 Arcasia Ave
Fareham              (blank)
Hampshire            Fareham
PO16 8XT             Hants
(blank)              PO16 8XT

Person A             Person B
Martin               (blank)
P                    Martin P
Doyle                Doyle
02392 988303         +1 312-253-7873
+1 312-253-7873      02392 988303
Anomalies & Congruence
Email does not tally with name parts
Currency does not tally with location
Goods shipped before order
Values not in application pick lists (metadata)
Default values used
Notes (memo) fields used without validation rules
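Several of these anomalies can be expressed as simple cross-field rules. This is a minimal sketch under stated assumptions: the record layout (`ordered`, `shipped`, `status`, `email`, `last_name`) is hypothetical, and the email rule is a crude heuristic that should send records for review rather than reject them outright.

```python
from datetime import date


def find_anomalies(record: dict, allowed_status: set) -> list:
    """Return a list of congruence problems found in one record."""
    problems = []
    # Goods cannot leave before the order exists.
    if record["shipped"] < record["ordered"]:
        problems.append("shipped before ordered")
    # Values must come from the application pick list.
    if record["status"] not in allowed_status:
        problems.append("status not in pick list")
    # Heuristic: the surname usually appears in a personal mailbox name.
    local_part = record["email"].split("@")[0].lower()
    if record["last_name"].lower() not in local_part:
        problems.append("email does not tally with name")
    return problems
```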
DQ Studio – identifying and fixing
• Product demonstration by:
• Martin Kerr
• How to connect, identify and
correct defects…
DQ Studio
Classify
• Is the data in your database what you think it is?
Compare
• How similar is value A to value B, as a % similarity?
Format
•Email
•I.P.
•Postcode
•Telephone
•URL
Generate:
•phonetic tokens
•pattern tokens
Transform data
•13 Categories
•5 Spoken Languages
Validate
•Email
•I.P. Address
•Postal code
•Telephone
•URL
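To make the Compare and Generate ideas concrete, here is a generic sketch using only the Python standard library. It is not how DQ Studio works internally, which the slides do not describe: `similarity` uses `difflib.SequenceMatcher` as a stand-in comparison measure, and `pattern_token` is a hypothetical helper that reduces a value to its shape (A for a letter, 9 for a digit).

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Percentage similarity of two values, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pattern_token(value: str) -> str:
    """Reduce a value to its shape: letters become A, digits become 9,
    everything else (spaces, punctuation) is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)
```

Pattern tokens make format defects visible at a glance: "PO16 8XT" has the shape "AA99 9AA", while the mistyped "P0I6 8XT" has the shape "A9A9 9AA".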
DQ Studio
Derive:
• Job Title
• Role
• Level
• Gender
• Male, female, unknown
• Telephone
• Country
• Location
• Number Type
Parse:
• Email
• I.P. Address
• Telephone
Verify
• Locations (240 Countries)
• Phones
• Businesses
• Contacts
Record matching
Identifying matches
Linking
Mastering
Merging
Updating
Matching – What is it?
• Identification and management of records which:
• Are the same
• Might be the same
• Are not the same
• PAF Batch
• PAF Lookup
• No Way
• Gone Away
• Passed Away
• Append
• Table v Table
• Table v Itself
Dedupe
X-Match
X-Ref API
X-Ref Data
How is it done?
Manually
• Internally
• External Bureau service
Automatically
• Software
Using black and white magic...
• Black = Matches
• White = Non Matches
• Grey = Ambiguous
Carefully, to avoid:
• Too many matches
• Too few matches
• Errors in matches
The grey areas - When is a match a match?
Bob = Bobby = Rob = Robert = Robby = Roberto?
Thomson = Thompson = Tomson = Thomson?
Xerox = Zerocks?
PO16 8XT = P0I6 8XT?
+44 (0) 2392 988303 = O2392 9883O3?
10th Feb 2009 = 10/02/09 = 02/10/2009?
Hants = Hampshire = Hamps?
martin.doyle@dqglobal.com = doyle.martin@dqglobal.com
Grey to Black or Grey to White
• Transformations (Synonyms)
• Phonetics
• String comparisons
• Intelligence
• Rules
• Spelling
• Typos
• Logic
• Experience
• Lookups
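Two of the techniques listed, transformations (synonyms) and string comparison, are enough to show how a pair moves from grey to black or white. This is a sketch under stated assumptions: the nickname table, the function names and the 0.95 and 0.60 thresholds are all illustrative choices that would need tuning on real data.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table mapping nicknames to a canonical form.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert", "robby": "robert"}


def normalise(name: str) -> str:
    """Lower-case, trim and replace known nicknames."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def classify(a: str, b: str, black: float = 0.95, white: float = 0.60) -> str:
    """Return 'black' (match), 'white' (non-match) or 'grey' (ambiguous)."""
    score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    if score >= black:
        return "black"
    if score < white:
        return "white"
    return "grey"
```

With these settings Bob and Robert resolve to black through the synonym table, while Thomson and Thompson stay grey and would go to rules, lookups or manual review.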
Mastering Perfection & merging?
Problems:
• Which data survives?
• Which data gets re-assigned?
• Which data gets stored?
• Which data gets thrown away?
Solutions:
• Define the record master
• Define the field merge rules
• Use technology to automate processes
• Humanise exceptions
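The first two solutions, defining the record master and the field merge rules, can be sketched as follows. Everything here is an assumption made for illustration: the master is taken to be the most recently updated record, and the two rule names (`master_first`, `longest`) are hypothetical examples of survivorship rules.

```python
from datetime import date


def pick_master(records: list) -> dict:
    """Assumed master rule: the most recently updated record wins."""
    return max(records, key=lambda r: r["updated"])


def merge(records: list, rules: dict) -> dict:
    """Build one surviving record from a group of duplicates."""
    master = pick_master(records)
    merged = dict(master)
    for field, rule in rules.items():
        # Only non-empty values are candidates for survival.
        candidates = [r[field] for r in records if r.get(field)]
        if not candidates:
            continue
        if rule == "master_first":
            merged[field] = master.get(field) or candidates[0]
        elif rule == "longest":
            merged[field] = max(candidates, key=len)
        else:
            raise ValueError(f"unknown merge rule: {rule}")
    return merged
```

Records that no rule can settle are the exceptions to pass to a person, in line with the last bullet.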
Perfect & Merge for
Identify Perfect Merge
Process flow
CRM
Database
PrimaryID SecondaryID Score
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EFF76F28-E8EE-E211-9968-0015F298503A} 100
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EE1F80ED-53F0-E211-BBCE-0015F298503A} 86
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {07F86F28-E8EE-E211-9968-0015F298503A} 100
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {062080ED-53F0-E211-BBCE-0015F298503A} 94
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1DF86F28-E8EE-E211-9968-0015F298503A} 100
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {81F86F28-E8EE-E211-9968-0015F298503A} 92
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1C2080ED-53F0-E211-BBCE-0015F298503A} 99
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {802080ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EBF76F28-E8EE-E211-9968-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4FF86F28-E8EE-E211-9968-0015F298503A} 82
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EA1F80ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4E2080ED-53F0-E211-BBCE-0015F298503A} 82
{71F86F28-E8EE-E211-9968-0015F298503A} {702080ED-53F0-E211-BBCE-0015F298503A} 100
{6BF86F28-E8EE-E211-9968-0015F298503A} {6A2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1FF86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {83F86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1E2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {822080ED-53F0-E211-BBCE-0015F298503A} 100
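A match table in this shape (PrimaryID, SecondaryID, Score) lends itself to grouping by primary record. The slides do not say how the scores are acted on, so the sketch below assumes one plausible policy: pairs scoring 100 are queued for automatic merging and anything lower goes to manual review. The function names and the threshold are hypothetical.

```python
from collections import defaultdict


def parse_row(line: str) -> tuple:
    """Split one 'PrimaryID SecondaryID Score' line into its parts."""
    primary, secondary, score = line.split()
    return primary, secondary, int(score)


def group_matches(rows: list, auto_merge_at: int = 100) -> tuple:
    """Group secondary IDs under their primary ID, split by score."""
    auto, review = defaultdict(list), defaultdict(list)
    for primary, secondary, score in rows:
        target = auto if score >= auto_merge_at else review
        target[primary].append(secondary)
    return dict(auto), dict(review)


if __name__ == "__main__":
    lines = [
        "{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EFF76F28-E8EE-E211-9968-0015F298503A} 100",
        "{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EE1F80ED-53F0-E211-BBCE-0015F298503A} 86",
    ]
    auto, review = group_matches([parse_row(line) for line in lines])
    print(auto)
    print(review)
```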
Match demonstration
Connecting Defining Identifying Reviewing Processing
Cleaning up your business systems:
Back up your data
Define pick lists
Ensure legacy data conforms to pick lists
Delete any temporary fields set up for testing that are still in the production system
Delete or archive old data
Identify contacts with no email and/or no telephone number
Identify and correct contacts with bogus phone numbers
Identify records whose email bounces
Identify businesses without contacts
Archive linked documents which are 'n' years old; however, take care with legal documents, including invoices and contracts
User admin – delete any users who no longer access systems
Review any prospects, suspects or opportunities not properly closed, i.e. > 'n' weeks from opening
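Two of the checks above translate directly into small queries. The sketch assumes contacts and opportunities are available as lists of dictionaries with the hypothetical keys shown (`email`, `phone`, `opened`, `closed`); in a real CRM the same logic would normally run as a database query or saved view.

```python
from datetime import date, timedelta


def missing_contact_details(contacts: list) -> list:
    """Contacts with no email and/or no telephone number."""
    return [c for c in contacts if not c.get("email") or not c.get("phone")]


def stale_opportunities(opps: list, today: date, max_weeks: int) -> list:
    """Opportunities still open more than max_weeks after opening."""
    cutoff = today - timedelta(weeks=max_weeks)
    return [o for o in opps if o["closed"] is None and o["opened"] < cutoff]
```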
Actions to consider…
Change attitudes to “ABC” thinking
Think prevention not cure
Apply DQ processes
Verify, Format & Validate
Suppress records
Merge duplicates
Append missing data for segmentation
Govern and Comply
Measure & Manage
Get a CXO sponsor
Prune & Consolidate & Remove competition
Common dictionary of terms
Define customer value, and lifetime?
In conclusion…
Identify
• recognise there is a problem?
Qualify
• gather evidence: what, when, where and how large is the problem?
Quantify
• what's specifically doing the damage?
Accept
• acknowledge the scale of the task?
Define
• the goals and what will be measured?
Perform
• carry out the tasks agreed in the order of significance
Questions…
• Build a better business based on trusted
data…
• Contact DQ Global
• www.DQGlobal.com
• Talk to a consultant
• sales@DQGlobal.com
• +44 2392 988303 (Europe)
• +1 314-253-7873 (North America)

  • 7. The human factors Acknowledging there is a problem
  • 8. The Data Quality Delusion Everyone understands the importance of data quality Everyone agrees data quality is important Everyone cares about data quality Everyone knows what actions to take to improve data quality
  • 9. Opening the Johari Window Seeing what you don’t currently see!
  • 10. Open Area Known to others and known to self Blind Area Known to others not known to self Hidden Area Not known to others and known to self Unknown Area Unknown to others and unknown to self Johari Window - You don’t know what you don’t know... Self Others Expand the Open Area Reduce Blind Area Reduce the Hidden Area ? Johari Window
  • 11. Acceptable levels of data quality? All data has some level of quality, the question is at what level is it unacceptable? How does anyone know? Who’s responsible? How much is low quality data actually costing? Unacceptable Acceptable
  • 12. All data has some level of quality, the question is at what level is it unacceptable. Temp < 37°C Hypothermia Temp = 37°C Normal Temp > 37°C Abnormal Temp > 37.8°C Get help
  • 13. How can we end up with bad data? A Boy's name beginning with the letter J: "Gerald.." A word beginning with Z: "Xylophone.." A part of the body beginning with N: "Knee.." A mode of transport that you can walk in: "Your shoes.."
  • 14. Getting your data clean and keeping it clean Identify, correct, prevent
  • 15. Get it Clean the basics About “CURING” data defects Batch process automation Mass defect identification • Mastering & Merging • Manual review Time consuming More costly than prevention
  • 16. Keep it Clean the basics Prevention better than cure Ongoing process • People • Process • Technology Costs of prevention many times lower than cure!
  • 17. Waging war on error… Finding defects Defining standards Correcting data Preventing error Monitoring defects Reference data Internal data
  • 18. Boolean Logic & Dates DD/MM/YY v MM/DD/YY • 10/10/09 = 10/10/09 • 99/99/99 was accepted as a valid date structure yet it’s clearly wrong Is it European format DD/MM/YYYY or US format MM/DD/YYYY? Precision • DD/MM/YY or DD/MM/YYYY OK to Mail = Y Not OK to Mail = Y OK to Mail = N Not OK to Mail = N
  • 19. Numbers in Text and Shared Numbers Systems Contain: • 0’s and/or O’s • 1’s and/or I’s • Tel numbers with 9 x 000 000 000 Same product – different numbers in 2 systems • Same Part number 99 000 1111 • 99 000 1111 = 1 day’s cold ration pack • 99 000 1111 = Radio valves • Leasing Agreement numbers • ID Counters shared across systems • SKUs • Tank & Aircraft Parts
  • 20. Misinterpretation & Standards M = Male in one system and Married in another S = Single in one system and Separated in another Gender •9 variants in the gender field of a hotel project Padhraic, Pádraig or Páraic Lane, LN, Ln, Road, Rd, Rd. etc. MI or Michigan US or USA or United States GB or UK or United Kingdom Mr. or Mister Hants or Hampshire
  • 21. Dislocation, misfielding Address A Address B 123 Arcasia Avenue 123 Arcasia Ave Fareham Hampshire Fareham PO16 8XT Hants PO16 8XT Person A Person B Martin P Martin P Doyle Doyle 02392 988303 +1 312-253-7873 +1 312-253-7873 02392 988303
  • 22. Anomalies & Congruence eMail does not tally with name parts Currency does not tally with location Goods shipped before order Values not in application pick lists (metadata) Default values used Notes (memo) fields used without validation rules
  • 23. DQ Studio – identifying and fixing • Product demonstration by: • Martin Kerr • How to connect, identify and correct defects…
  • 24. DQ Studio Classify •Is the data in your database what you think it is? Compare •How similar is value A to value B in % similarity Format •Email •I.P. •Postcode •Telephone •URL Generate: •phonetic tokens •pattern tokens Transform data •13 Categories •5 Spoken Languages Validate •Email •I.P. Address •Postal code •Telephone •URL
  • 25. DQ Studio Derive: • Job Title • Role • Level • Gender • Male, female, unknown • Telephone • Country • Location • Number Type Parse: • Email • I.P. Address • Telephone Verify • Locations (240 Countries) • Phones • Businesses • Contacts
  • 27. Matching – What is it? • Identification and management of records which: • Are the same • Might be the same • Are not the same • PAF Batch • PAF Lookup • No Way • Gone Away • Passed Away • Append • Table v Table • Table v Itself Dedupe X-Match X-Ref API X-Ref Data
  • 28. How is it done? Black White Manually •Internally •External Bureau service Automatically •Software Using black and white magic... •Black = Matches •White = Non Matches •Grey = Ambiguous Carefully to avoid: •Too many matches •Too few matches •Errors in matches
  • 29. The grey areas - When is a match a match? Bob = Bobby = Rob = Robert = Robby = Roberto? Thomson = Thompson = Tomson = Thomson? Xerox = Zerocks? PO16 8XT = P0I6 8XT? +44 (0) 2392 988303 = O2392 9883O3? 10th Feb 2009 = 10/02/09 = 02/10/2009? Hants = Hampshire = Hamps? martin.doyle@dqglobal.com = doyle.martin@dqglobal.com
  • 30. Grey to Black or Grey to White • Transformations (Synonyms) • Phonetics • String comparisons • Intelligence • Rules • Spelling • Typos • Logic • Experience • Lookups
  • 31. Mastering Perfection & merging? Problems: • Which data survives? • Which data gets re-assigned? • Which data gets stored? • Which data gets thrown away Solutions: • Define the record master • Define the field merge rules • Use technology to automate processes • Humanise exceptions
  • 32. Perfect & Merge for Identify Perfect Merge
  • 33. Process flow CRM Database PrimaryID SecondaryID Score {D1C12E3A-B7F2-E211-95FC-0015F298503A} {EFF76F28-E8EE-E211-9968-0015F298503A} 100 {D1C12E3A-B7F2-E211-95FC-0015F298503A} {EE1F80ED-53F0-E211-BBCE-0015F298503A} 86 {E9C12E3A-B7F2-E211-95FC-0015F298503A} {07F86F28-E8EE-E211-9968-0015F298503A} 100 {E9C12E3A-B7F2-E211-95FC-0015F298503A} {062080ED-53F0-E211-BBCE-0015F298503A} 94 {FFC12E3A-B7F2-E211-95FC-0015F298503A} {1DF86F28-E8EE-E211-9968-0015F298503A} 100 {FFC12E3A-B7F2-E211-95FC-0015F298503A} {81F86F28-E8EE-E211-9968-0015F298503A} 92 {FFC12E3A-B7F2-E211-95FC-0015F298503A} {1C2080ED-53F0-E211-BBCE-0015F298503A} 99 {FFC12E3A-B7F2-E211-95FC-0015F298503A} {802080ED-53F0-E211-BBCE-0015F298503A} 100 {CDC12E3A-B7F2-E211-95FC-0015F298503A} {EBF76F28-E8EE-E211-9968-0015F298503A} 100 {CDC12E3A-B7F2-E211-95FC-0015F298503A} {4FF86F28-E8EE-E211-9968-0015F298503A} 82 {CDC12E3A-B7F2-E211-95FC-0015F298503A} {EA1F80ED-53F0-E211-BBCE-0015F298503A} 100 {CDC12E3A-B7F2-E211-95FC-0015F298503A} {4E2080ED-53F0-E211-BBCE-0015F298503A} 82 {71F86F28-E8EE-E211-9968-0015F298503A} {702080ED-53F0-E211-BBCE-0015F298503A} 100 {6BF86F28-E8EE-E211-9968-0015F298503A} {6A2080ED-53F0-E211-BBCE-0015F298503A} 100 {01C22E3A-B7F2-E211-95FC-0015F298503A} {1FF86F28-E8EE-E211-9968-0015F298503A} 100 {01C22E3A-B7F2-E211-95FC-0015F298503A} {83F86F28-E8EE-E211-9968-0015F298503A} 100 {01C22E3A-B7F2-E211-95FC-0015F298503A} {1E2080ED-53F0-E211-BBCE-0015F298503A} 100 {01C22E3A-B7F2-E211-95FC-0015F298503A} {822080ED-53F0-E211-BBCE-0015F298503A} 100
  • 34. Match demonstration Connecting Defining Identifying Reviewing Processing
  • 35. Cleaning up your business systems: Back-up your data Define pick lists Ensure legacy data conforms to picklists Delete any temporary fields set-up for test and still in the production system Delete or archive old data Identify contacts with no email and/or no telephone # Identify and correct contacts with bogus phone numbers Identify records whose email bounces Identify businesses without contacts Archive linked documents which are ‘n’ years old, however, take care with legal including: invoices and contracts User admin – delete any users who no longer access systems Review any prospects, suspects or opportunities not properly closed i.e. > ‘n’ weeks from opening
  • 36. Actions to consider… Change attitudes to “ABC” thinking Think prevention not cure Apply DQ processes Verify, Format & Validate Suppress records Merge duplicates Append missing data for segmentation Govern and Comply Measure & Manage Get a CXO sponsor Prune & Consolidate & Remove competition Common dictionary of terms Define customer value, and lifetime?
  • 37. In conclusion… Identify • recognise there is a problem? Qualify • gather evidence, what, when, where and how large is the problem? Quantify • what’s specifically doing the damage? Accept • acknowledge the scale of the task? Define • the goals and what will be measured? Perform • carry out the tasks agreed in the order of significance
  • 38. Questions… • Build a better business based on trusted data… • Contact DQ Global • www.DQGlobal.com • Talk to a consultant • sales@DQGlobal.com • +44 2392 988303 (Europe) • +1 314-253-7873 (North America)
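
Slide 18 notes that 99/99/99 was accepted as a valid date structure, and that a value such as 10/02/09 reads differently in DD/MM/YY and MM/DD/YY. The sketch below shows one way such checks could look, assuming Python's standard `datetime` module; the function names `parse_date` and `is_ambiguous` are illustrative and not part of DQ Studio or any tool named in the deck.

```python
from datetime import datetime


def parse_date(text, fmt="%d/%m/%y"):
    """Return a date if text is a real calendar date in fmt, else None.

    A pattern check such as "two digits, slash, two digits, slash, two digits"
    would accept 99/99/99; parsing against the calendar rejects it.
    """
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def is_ambiguous(text):
    """True if text is valid as both DD/MM/YY and MM/DD/YY but means different days."""
    european = parse_date(text, "%d/%m/%y")
    american = parse_date(text, "%m/%d/%y")
    return european is not None and american is not None and european != american


if __name__ == "__main__":
    for value in ("10/10/09", "99/99/99", "10/02/09", "25/12/09"):
        print(value, parse_date(value), is_ambiguous(value))
```

Values flagged as ambiguous cannot be resolved from the string alone; the format has to come from knowledge of the source system, which is one reason the deck stresses defining standards before correcting data.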

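Slides 28 to 30 describe matching as sorting record pairs into black (matches), white (non-matches) and grey (ambiguous), using string comparisons among other techniques. A minimal sketch of that idea follows, assuming Python's standard `difflib` for the comparison. The thresholds `BLACK_AT` and `WHITE_BELOW` are hypothetical values chosen for illustration; the deck does not state which comparison method or cut-offs its own tooling uses.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs, chosen only for illustration.
BLACK_AT = 95      # scores at or above this are treated as matches
WHITE_BELOW = 80   # scores below this are treated as non-matches


def similarity(a, b):
    """Percentage similarity of two strings, ignoring case and outer spaces."""
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return round(ratio * 100)


def classify(score):
    """Map a 0-100 score to black (match), white (non-match) or grey (review)."""
    if score >= BLACK_AT:
        return "black"
    if score < WHITE_BELOW:
        return "white"
    return "grey"


if __name__ == "__main__":
    pairs = [("Thomson", "Thomson"), ("Thomson", "Thompson"), ("Xerox", "Zerocks")]
    for left, right in pairs:
        score = similarity(left, right)
        print(left, right, score, classify(score))
```

With these cut-offs, a pure string comparison places Xerox and Zerocks in the white band even though they sound alike, which is why slide 30 lists phonetics, synonyms and lookups alongside string comparisons. Grey pairs are the ones that go to manual review.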
Editor's Notes

  1. It’s inherently true that at some level everyone understands the importance of data quality. Generally, everyone agrees data quality is important. Not true that everyone cares about data quality. Certainly not true that everyone knows what actions to take to improve data quality.
  2. Idea is to maximise the Open Area so that we all know as much as possible... This is why data profiling is critical to success in DQ projects: if you don’t know where you are, you can’t plot a journey to where you’re going.
  3. Without some means of measurement, how does anyone know? Without governance, how does anyone know who’s responsible? Without measurement and governance and an understanding of the downstream impacts of data quality, how does any business know how much low quality data is actually costing?
  4. MD: It doesn’t matter what the room temperature is, it’s always room temperature – Steven Wright. In our scenario, if this scale related to body temperature, then too cold and hypothermia could be an issue; too hot and feverish, then all sorts of complications are possible.
  5. Answers from a game show called Family Fortunes, where the hard of thinking gave these answers, which I thought were applicable to our context. A boy's name beginning with the letter J: "Gerald.." A word beginning with Z: "Xylophone.." A part of the body beginning with N: "Knee.." And now you know why Soundex does not work well as a matching algorithm... A mode of transport that you can walk in: "Your shoes.." – That’s what happens with free text fields in databases – no validation!
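
The remark about Soundex can be made concrete. The sketch below is a compact implementation of the standard American Soundex rules, written for illustration and not taken from the deck. Soundex keeps the first letter of a name unchanged, so names that sound alike but start with different letters, such as Gerald and Jerald, never receive the same code.

```python
# Letter groups of standard American Soundex; vowels, H, W and Y carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code of name, or "" if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:   # skip repeats of the same code
            code += digit
        if ch not in "HW":            # H and W do not break a run; vowels do
            prev = digit
    return (code + "000")[:4]         # pad or truncate to four characters


if __name__ == "__main__":
    for word in ("Gerald", "Jerald", "Xerox", "Zerocks", "Knee"):
        print(word, soundex(word))
```

In each sound-alike pair the digits agree and only the leading letter differs, so a Soundex-only match would treat them as different names. Knee is filed under K for the same reason.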