Data Quality:
Principles, Approaches, and Best Practices
Carl Anderson
carl.anderson@weightwatchers.com
WW – the new Weight Watchers
One in three business leaders frequently make decisions with data they don’t trust.
Bad data costs the economy hundreds of billions of dollars per year.
[IBM] [TDWI]
About Me
Data Science, Business Intelligence, Engineering, Data Strategy
Big data:
● Food
● Activity
● Exercises
● Challenges
● Social network
● Workshops
● Personal Coaches
● CRM
● Fulfillment
● Meal kits
● Supermarket foods
● E-commerce
● Cruises
...for 56 years
2017: fill lake with data; provide analysts access
2019: upstream control and governance
In General, What Can Go Wrong?
Data Entry → Transformation 1 → Transformation 2
Problems can creep in at every stage:
● Inaccurate (GIGO)
● Missing
● Defaults
● Dropped records
● Truncation
● Encoding changes
● Data type change
● Shape change
● Stale
● Dupes
● 3rd party data
● Sources disagree
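Several of these failure modes can be caught by comparing the output of one stage with the output of the next. The block below is a minimal Python sketch, assuming each stage's output is available as a list of dicts that share a key field; the function name `compare_stages` and the record shapes are hypothetical, and it only covers dropped records, dupes, truncation, and data type changes.

```python
from collections import Counter


def compare_stages(upstream, downstream, key="id"):
    """Compare two pipeline stages (lists of dicts) and flag a few failure modes."""
    up_counts = Counter(rec[key] for rec in upstream)
    down_counts = Counter(rec[key] for rec in downstream)
    report = {
        # Keys seen upstream that never arrive downstream.
        "dropped": sorted(k for k in up_counts if k not in down_counts),
        # Keys that occur more than once downstream and more often than upstream.
        "dupes": sorted(
            k for k, n in down_counts.items() if n > 1 and n > up_counts.get(k, 0)
        ),
        "truncated": [],
        "type_changed": [],
    }
    down_by_key = {rec[key]: rec for rec in downstream}
    for rec in upstream:
        other = down_by_key.get(rec[key])
        if other is None:
            continue
        for field, old in rec.items():
            if field not in other:
                continue
            new = other[field]
            if type(new) is not type(old):
                # e.g. an int that became a string during a transformation
                report["type_changed"].append((rec[key], field))
            elif isinstance(old, str) and len(new) < len(old) and old.startswith(new):
                # the downstream string is a strict prefix of the upstream one
                report["truncated"].append((rec[key], field))
    return report
```

Comparing adjacent stages, rather than only the final output, makes it easier to see which transformation introduced a problem.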
Facets of Data Quality
Accurate: data entry issues, stale data, default dates...
Coherent: referential integrity, connect the dots
Complete: missing data, duplicates
Consistent: same values across systems, e.g. same address
Defined: data dictionaries, business glossary, provenance, schema
Timely: latency
Trust: analysts willing to use data (NPS)
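The Consistent facet ("same values across systems, e.g. same address") can be measured by joining two systems on a shared entity ID and counting agreements. A minimal sketch, assuming each system is exported as a dict of entity ID to record; the function names and the light whitespace/case normalization are hypothetical choices, not the author's method.

```python
def normalize(value):
    """Lowercase and collapse whitespace so formatting noise is not flagged as disagreement."""
    return " ".join(str(value).lower().split())


def pct_consistent(system_a, system_b, field):
    """Percent of entity IDs present in both systems whose `field` values agree.

    Returns None when the systems share no entity IDs.
    """
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return None
    agree = sum(
        normalize(system_a[k].get(field)) == normalize(system_b[k].get(field))
        for k in shared
    )
    return 100.0 * agree / len(shared)
```

How aggressively to normalize is a judgment call: too little and trivial formatting differences look like disagreement, too much and real discrepancies are hidden.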
Data Quality Scorecard
Accurate: % records quarantined, % records in range, % records matching
Coherent: % records missing entity ID, % records missing foreign key
Complete: % records dupes, % records missing, % records complete, % fields complete
Consistent: % records consistent
Defined: % tables defined, % fields defined, % dimensions defined, % measures defined
Timely: mean time to arrival, 95th percentile time to arrival
Volume: number of records
Trust: NPS
“If you can't measure it, you can't improve it” (commonly attributed to Peter Drucker)
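Several of the Complete and Coherent scorecard metrics reduce to simple counts over a table. The sketch below assumes records arrive as a list of dicts and that the set of valid parent keys is known; treating a record as a dupe only when all listed fields match, and counting null and dangling foreign keys together, are assumptions made for illustration. "% records missing" is omitted because it needs an expected record count from the source.

```python
def completeness_metrics(records, entity_key, foreign_key, valid_foreign_keys, fields):
    """Compute a few Complete/Coherent scorecard metrics for a list of dict records."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to score")

    def pct(count, total=n):
        return round(100.0 * count / total, 1)

    seen, dupes = set(), 0
    for rec in records:
        signature = tuple(rec.get(f) for f in fields)  # exact duplicate across all fields
        if signature in seen:
            dupes += 1
        else:
            seen.add(signature)

    missing_entity = sum(1 for r in records if r.get(entity_key) is None)
    # Null or dangling foreign keys both break referential integrity.
    missing_fk = sum(1 for r in records if r.get(foreign_key) not in valid_foreign_keys)
    complete = sum(1 for r in records if all(r.get(f) is not None for f in fields))
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)

    return {
        "% records dupes": pct(dupes),
        "% records missing entity ID": pct(missing_entity),
        "% records missing foreign key": pct(missing_fk),
        "% records complete": pct(complete),
        "% fields complete": pct(filled, n * len(fields)),
    }
```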
Facet: Accuracy
Source teams then:
● Publish schema
● Data did not always match schema: hard to trust, hard to automate, no accountability
Source teams now (WIP):
● Publish schema
● Adhere to schema
● Field ranges
Data team superpowers:
1. Auto consumption
2. Auto checks
3. Quarantine
4. Reporting
Scorecard metrics (Accurate): % records quarantined, % records in range, % records matching
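Once source teams adhere to a published schema with field ranges, the data team can check each record automatically and quarantine failures instead of loading them. A minimal sketch of that idea; the `SCHEMA` contents, field names, and ranges are hypothetical, and the type check is deliberately strict (an int is not accepted where a float is declared).

```python
# Hypothetical published schema: field -> (type, minimum, maximum); None means unbounded.
SCHEMA = {
    "member_id": (int, 1, None),
    "weight_kg": (float, 20.0, 400.0),
    "steps": (int, 0, 100_000),
}


def validate(record, schema=SCHEMA):
    """Return a list of human-readable errors; an empty list means the record passes."""
    errors = []
    for field, (ftype, lo, hi) in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
            continue
        if (lo is not None and value < lo) or (hi is not None and value > hi):
            errors.append(f"{field}: {value} out of range")
    return errors


def consume(records, schema=SCHEMA):
    """Split records into accepted and quarantined, and report % records quarantined."""
    accepted, quarantined = [], []
    for rec in records:
        errs = validate(rec, schema)
        if errs:
            quarantined.append((rec, errs))  # keep the reasons for reporting
        else:
            accepted.append(rec)
    pct_quarantined = 100.0 * len(quarantined) / len(records) if records else 0.0
    return accepted, quarantined, pct_quarantined
```

Keeping the error messages alongside each quarantined record gives the reporting step something concrete to send back to the source team.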
Facet: Defined
● Table-level data dictionaries
● Business-level data dictionary (business glossary)
https://medium.com/@leapingllamas
Facet: Defined. Flow from master
1. The data catalog is the master for table-level definitions and the business glossary.
2. A mapping table links the master to the BI tool: here, Looker dimensions and measures.
3. A tool compares the master to the BI tool, updates/injects definitions, and creates a pull request.
4. The pull request is manually reviewed and merged.
5. Master definitions appear to users.
Open sourcing: https://github.com/ww-tech/lookml-tools
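The "compare master to BI tool" step can be illustrated as a dictionary diff. This is a hypothetical sketch of the idea only; it does not use or reproduce the lookml-tools API, and it stops at computing which BI fields need new descriptions rather than editing LookML files or opening a pull request.

```python
def plan_updates(master, mapping, bi_definitions):
    """Work out which BI fields need their description replaced by the master definition.

    master:         master term -> definition (data catalog / business glossary)
    mapping:        BI field name, e.g. "view.field" -> master term
    bi_definitions: BI field name -> current description (absent means undefined)
    """
    updates = {}
    for bi_field, term in mapping.items():
        wanted = master.get(term)
        if wanted is None:
            continue  # the master has no definition yet, so there is nothing to inject
        if bi_definitions.get(bi_field, "") != wanted:
            updates[bi_field] = wanted  # missing or stale: inject the master text
    return updates
```

In the flow described above, a result like this would be written back into the BI tool's definitions and submitted for manual review, which keeps a human in the loop before anything reaches users.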
Facet: Defined. Style Guide
LookML linter
Scorecard metrics (Defined): % tables defined, % fields defined
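A style-guide linter walks every field and reports rule violations. The sketch below assumes the LookML has already been parsed into Python dicts with "dimensions" and "measures" lists; the two rules (snake_case names, non-empty descriptions) are hypothetical examples, not the rules the lookml-tools linter actually enforces.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def lint_view(view):
    """Return (field name, message) pairs for fields that break the example rules."""
    problems = []
    for kind in ("dimensions", "measures"):
        for field in view.get(kind, []):
            name = field.get("name", "")
            if not SNAKE_CASE.match(name):
                problems.append((name, "name is not snake_case"))
            if not field.get("description", "").strip():
                problems.append((name, "missing description"))
    return problems
```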
Facet: Defined
LookML updater + LookML linter
Scorecard metrics (Defined): % dimensions defined, % measures defined
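The "% dimensions defined" and "% measures defined" metrics can be computed from the same parsed field lists. A minimal sketch, assuming the same hypothetical dict shape as the linter example and counting a field as defined when it has a non-blank description.

```python
def pct_defined(views):
    """Percent of dimensions and of measures, across all views, that have a description."""
    totals = {"dimensions": [0, 0], "measures": [0, 0]}  # kind -> [defined, total]
    for view in views:
        for kind in totals:
            for field in view.get(kind, []):
                totals[kind][1] += 1
                if field.get("description", "").strip():
                    totals[kind][0] += 1
    # None rather than 0% when there are no fields of that kind at all.
    return {
        f"% {kind} defined": (100.0 * defined / total if total else None)
        for kind, (defined, total) in totals.items()
    }
```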
Facet: Trust
Easy to lose trust. Hard to regain!
In April 2019, we ran a data-related NPS survey of analysts, data scientists, and some decision makers and execs. We asked:
● NPS data: would you recommend our data to a friend?
● NPS infrastructure: would you recommend our infrastructure (Looker, BigQuery, etc.) to a friend?
● NPS support: would you recommend CIE’s support to a friend?
We will resurvey at the end of 2019.
Scorecard metric (Trust): NPS
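Each survey question yields a score under the standard Net Promoter Score formula: answers run from 0 to 10, respondents scoring 9 or 10 are promoters, 0 to 6 are detractors, and NPS is the percentage of promoters minus the percentage of detractors. A minimal sketch of that calculation; the function name is hypothetical.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)
```

Passives (7 or 8) count toward the total but toward neither group, which is why a survey full of 8s scores 0.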
Data Quality Scorecard: data sources
Facets: 1 Accurate, 2 Coherent, 3 Complete, 4 Consistent, 5 Defined, 6 Timely, 7 Volume, 8 Trust
Sources that feed the scorecard: reference data, server logs, metadata, schema, data catalog + lookml-tools, survey
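The Timely metrics (mean and 95th percentile time to arrival) need, for each record, when the event happened and when it landed. A minimal sketch, assuming those two timestamps can be extracted (for example from server logs) as datetime pairs; the function names are hypothetical, and the nearest-rank percentile method is one of several common definitions.

```python
import math
from datetime import datetime


def arrival_latencies(pairs):
    """Seconds between when each event happened and when it landed in the warehouse."""
    return [(loaded - happened).total_seconds() for happened, loaded in pairs]


def timeliness(latencies, percentile=95):
    """Mean and nearest-rank percentile of arrival latencies (in seconds)."""
    if not latencies:
        raise ValueError("no latencies to summarize")
    ordered = sorted(latencies)
    rank = math.ceil(percentile / 100 * len(ordered))  # nearest-rank method
    return {
        "mean_s": sum(ordered) / len(ordered),
        f"p{percentile}_s": ordered[max(rank, 1) - 1],
    }


if __name__ == "__main__":
    pairs = [(datetime(2019, 4, 1, 12, 0), datetime(2019, 4, 1, 12, 5))]
    print(timeliness(arrival_latencies(pairs)))
```

Tracking the 95th percentile alongside the mean matters because a few very late batches can hurt users while barely moving the average.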
Integrate into normal workflows
Our engineers work in Slack, so let them do data quality work there too
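One way to bring data quality into Slack is to post failed checks to a channel. Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field; the sketch below builds such a message and sends it with the standard library. The webhook URL, message wording, and function names are hypothetical, and this is not necessarily how the WW team integrated with Slack.

```python
import json
import urllib.request


def build_dq_alert(table, metric, value, threshold):
    """Build a Slack message payload describing a failed data quality check."""
    return {
        "text": f":warning: Data quality check failed for `{table}`: "
                f"{metric} = {value:.1f}% (threshold {threshold:.1f}%)"
    }


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming webhook; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Separating message construction from sending keeps the formatting testable without network access.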
Integrate into team culture
Agile BI engineering team
● BI engineering teams set aside 10% of time for explicit data quality work
● Expect DQ dashboards for all new sources
● Weekly data quality meetings
● Now proactive, rather than reactive or retrospective
Data Quality is a Shared Responsibility
[Diagram: responsibilities split across teams. Recoverable labels: adhere to schema; value ranges; data dictionaries; automated consumption; automated checks; DQ dashboards; subscribe / report; data dictionaries + glossary; single source of truth; data catalog; docs; schema; monitor / investigate; investigate.]
What Questions Do You Have For Me?
Carl Anderson
carl.anderson@weightwatchers.com
@leapingllamas
https://medium.com/ww-tech-blog
We are hiring:
BI engineers, engineers, and data scientists for our Toronto office (a few blocks away).
Find our booth in the recruiting hall.

More Related Content

What's hot

DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DATAVERSITY
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
DATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
DATAVERSITY
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
Tuba Yaman Him
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
DATAVERSITY
 
Approaching Data Quality
Approaching Data QualityApproaching Data Quality
Approaching Data Quality
DATAVERSITY
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and Governance
DATAVERSITY
 
Strategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management SystemsStrategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management Systems
Boris Otto
 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)
DATAVERSITY
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
Precisely
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
DATAVERSITY
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
DATAVERSITY
 
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
DATAVERSITY
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
DATAVERSITY
 
Why data governance is the new buzz?
Why data governance is the new buzz?Why data governance is the new buzz?
Why data governance is the new buzz?
Aachen Data & AI Meetup
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected Approach
DATAVERSITY
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
Data Governance Program Powerpoint Presentation Slides
Data Governance Program Powerpoint Presentation SlidesData Governance Program Powerpoint Presentation Slides
Data Governance Program Powerpoint Presentation Slides
SlideTeam
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working Together
DATAVERSITY
 
Gartner: Master Data Management Functionality
Gartner: Master Data Management FunctionalityGartner: Master Data Management Functionality
Gartner: Master Data Management Functionality
Gartner
 

What's hot (20)

DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
 
Approaching Data Quality
Approaching Data QualityApproaching Data Quality
Approaching Data Quality
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and Governance
 
Strategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management SystemsStrategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management Systems
 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
 
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Why data governance is the new buzz?
Why data governance is the new buzz?Why data governance is the new buzz?
Why data governance is the new buzz?
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected Approach
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Data Governance Program Powerpoint Presentation Slides
Data Governance Program Powerpoint Presentation SlidesData Governance Program Powerpoint Presentation Slides
Data Governance Program Powerpoint Presentation Slides
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working Together
 
Gartner: Master Data Management Functionality
Gartner: Master Data Management FunctionalityGartner: Master Data Management Functionality
Gartner: Master Data Management Functionality
 

Similar to Data Quality: principles, approaches, and best practices

Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
Precisely
 
Data quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data qualityData quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data quality
JaveriaGauhar
 
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Health Catalyst
 
Analytics & Data Strategy 101 by Deko Dimeski
Analytics & Data Strategy 101 by Deko DimeskiAnalytics & Data Strategy 101 by Deko Dimeski
Analytics & Data Strategy 101 by Deko Dimeski
Deko Dimeski
 
Big Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesBig Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation Slides
SlideTeam
 
Data Quality
Data QualityData Quality
Data Quality
Vijaya K
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bijeffd00
 
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
DATAVERSITY
 
Master Your Data. Master Your Business
Master Your Data. Master Your BusinessMaster Your Data. Master Your Business
Master Your Data. Master Your Business
DLT Solutions
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overview
jkvr
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Precisely
 
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Precisely
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
Mahmoud Alfarra
 
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Knoldus Inc.
 
From Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data GovernanceFrom Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data Governance
Precisely
 
Building a Data Quality Program from Scratch
Building a Data Quality Program from ScratchBuilding a Data Quality Program from Scratch
Building a Data Quality Program from Scratchdmurph4
 
March 2016 PHXTUG Meeting
March 2016 PHXTUG MeetingMarch 2016 PHXTUG Meeting
March 2016 PHXTUG Meeting
Michael Perillo
 
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deckDC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
Beth Fitzpatrick
 
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
Stephen Childs
 

Similar to Data Quality: principles, approaches, and best practices (20)

Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Data quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data qualityData quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data quality
 
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
 
Analytics & Data Strategy 101 by Deko Dimeski
Analytics & Data Strategy 101 by Deko DimeskiAnalytics & Data Strategy 101 by Deko Dimeski
Analytics & Data Strategy 101 by Deko Dimeski
 
Big Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesBig Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation Slides
 
Data Quality
Data QualityData Quality
Data Quality
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bi
 
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
 
Master Your Data. Master Your Business
Master Your Data. Master Your BusinessMaster Your Data. Master Your Business
Master Your Data. Master Your Business
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overview
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
 
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
 
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
 
From Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data GovernanceFrom Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data Governance
 
Building a Data Quality Program from Scratch
Building a Data Quality Program from ScratchBuilding a Data Quality Program from Scratch
Building a Data Quality Program from Scratch
 
Tom Kunz
Tom KunzTom Kunz
Tom Kunz
 
March 2016 PHXTUG Meeting
March 2016 PHXTUG MeetingMarch 2016 PHXTUG Meeting
March 2016 PHXTUG Meeting
 
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deckDC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
 
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
 

More from Carl Anderson

Inspiring healthy habits: data science at WW
Inspiring healthy habits: data science at WWInspiring healthy habits: data science at WW
Inspiring healthy habits: data science at WW
Carl Anderson
 
Leveraging an in-house modeling framework for fun and profit
Leveraging an in-house modeling framework for fun and profitLeveraging an in-house modeling framework for fun and profit
Leveraging an in-house modeling framework for fun and profit
Carl Anderson
 
Setting up Data Science for Success: The Data Layer
Setting up Data Science for Success: The Data LayerSetting up Data Science for Success: The Data Layer
Setting up Data Science for Success: The Data Layer
Carl Anderson
 
Geo@Work, keynote from Carto Spatial Data Science conference
Geo@Work, keynote from Carto Spatial Data Science conferenceGeo@Work, keynote from Carto Spatial Data Science conference
Geo@Work, keynote from Carto Spatial Data Science conference
Carl Anderson
 
Creating a Data-Driven Organization -- thisismetis meetup
Creating a Data-Driven Organization -- thisismetis meetupCreating a Data-Driven Organization -- thisismetis meetup
Creating a Data-Driven Organization -- thisismetis meetup
Carl Anderson
 
Creating a Data-Driven Organization, Data Day Texas, January 2016
Creating a Data-Driven Organization, Data Day Texas, January 2016Creating a Data-Driven Organization, Data Day Texas, January 2016
Creating a Data-Driven Organization, Data Day Texas, January 2016
Carl Anderson
 
Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015
Carl Anderson
 
Creating a Data-Driven Organization (Data Day Seattle 2015)
Creating a Data-Driven Organization (Data Day Seattle 2015)Creating a Data-Driven Organization (Data Day Seattle 2015)
Creating a Data-Driven Organization (Data Day Seattle 2015)
Carl Anderson
 
Creating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summaryCreating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summary
Carl Anderson
 

More from Carl Anderson (9)

Inspiring healthy habits: data science at WW
Inspiring healthy habits: data science at WWInspiring healthy habits: data science at WW
Inspiring healthy habits: data science at WW
 
Leveraging an in-house modeling framework for fun and profit
Leveraging an in-house modeling framework for fun and profitLeveraging an in-house modeling framework for fun and profit
Leveraging an in-house modeling framework for fun and profit
 
Setting up Data Science for Success: The Data Layer
Setting up Data Science for Success: The Data LayerSetting up Data Science for Success: The Data Layer
Setting up Data Science for Success: The Data Layer
 
Geo@Work, keynote from Carto Spatial Data Science conference
Geo@Work, keynote from Carto Spatial Data Science conferenceGeo@Work, keynote from Carto Spatial Data Science conference
Geo@Work, keynote from Carto Spatial Data Science conference
 
Creating a Data-Driven Organization -- thisismetis meetup
Creating a Data-Driven Organization -- thisismetis meetupCreating a Data-Driven Organization -- thisismetis meetup
Creating a Data-Driven Organization -- thisismetis meetup
 
Creating a Data-Driven Organization, Data Day Texas, January 2016
Creating a Data-Driven Organization, Data Day Texas, January 2016Creating a Data-Driven Organization, Data Day Texas, January 2016
Creating a Data-Driven Organization, Data Day Texas, January 2016
 
Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015
 
Creating a Data-Driven Organization (Data Day Seattle 2015)
Creating a Data-Driven Organization (Data Day Seattle 2015)Creating a Data-Driven Organization (Data Day Seattle 2015)
Creating a Data-Driven Organization (Data Day Seattle 2015)
 
Creating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summaryCreating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summary
 

Recently uploaded

Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 

Data Quality: principles, approaches, and best practices

  • 1. Data Quality: Principles, Approaches, and Best Practices. Carl Anderson (carl.anderson@weightwatchers.com), WW – the new Weight Watchers
  • 2. One in three business leaders frequently make decisions with data they don’t trust. Bad data costs the economy hundreds of billions of dollars per year. [IBM] [TDWI]
  • 5. Big data: food, activity, exercises, challenges, social network, workshops, personal coaches, CRM, fulfillment, meal kits, supermarket foods, e-commerce, cruises... for 56 years.
  • 6. 2017: fill the data lake and give analysts access. 2019: upstream control and governance.
  • 7. In general, what can go wrong? Along the pipeline (data entry → transformation 1 → transformation 2): inaccurate data (GIGO: garbage in, garbage out), missing values, defaults, dropped records, truncation, encoding changes, data type changes, shape changes, duplicates, stale data, and third-party data that disagrees.
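You can catch dropped records and dupes between stages by reconciling primary keys on both sides of each transformation. Below is a minimal Python sketch. It assumes every record carries a primary key and that all keys are of one type (int or str). `compare_stages` is a hypothetical helper, not a WW or library API.

```python
from collections import Counter
from typing import Iterable, Union

Key = Union[int, str]  # assumption: keys within one dataset share a single type


def compare_stages(upstream_keys: Iterable[Key],
                   downstream_keys: Iterable[Key]) -> dict:
    """Reconcile primary keys before and after one transformation step."""
    up = Counter(upstream_keys)
    down = Counter(downstream_keys)
    return {
        # keys that existed upstream but never arrived downstream
        "dropped": sorted(set(up) - set(down)),
        # keys that occur more than once downstream
        "duplicated": sorted(k for k, n in down.items() if n > 1),
        # keys that appear downstream without an upstream source
        "unexpected": sorted(set(down) - set(up)),
        # positive: rows were added; negative: rows were lost
        "row_count_delta": sum(down.values()) - sum(up.values()),
    }


report = compare_stages([1, 2, 3, 4], [1, 2, 2, 4, 5])
print(report)
# {'dropped': [3], 'duplicated': [2], 'unexpected': [5], 'row_count_delta': 1}
```

A net row-count check alone would miss this case, because a dropped row and a duplicated row cancel out. Comparing keys exposes both.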
  • 8. Facets of Data Quality:
    ● Accurate: data entry issues, stale data, default dates...
    ● Coherent: referential integrity, connect the dots
    ● Complete: missing data, duplicates
    ● Consistent: same values across systems, e.g. same address
    ● Defined: data dictionaries, business glossary, provenance, schema
    ● Timely: latency
    ● Trust: analysts willing to use data (NPS)
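The Consistent facet ("same values across systems") can be measured by comparing a shared attribute, such as the address, for entities present in two systems. Below is a rough sketch. The system names, record IDs, and the crude `normalize_address` rule are illustrative assumptions. Real address matching usually needs far more normalization.

```python
import re


def normalize_address(address: str) -> str:
    """Crude normalization: lowercase, drop punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    return " ".join(cleaned.split())


def pct_consistent(system_a: dict[str, str], system_b: dict[str, str]) -> float:
    """% of entity IDs present in both systems whose normalized addresses agree."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 0.0
    same = sum(normalize_address(system_a[k]) == normalize_address(system_b[k])
               for k in shared)
    return 100.0 * same / len(shared)


# Hypothetical extracts from two systems, keyed by member ID.
crm = {"m1": "12 Main St.", "m2": "4 Elm Ave", "m3": "9 Oak Rd"}
billing = {"m1": "12  main st", "m2": "40 Elm Ave", "m4": "1 Pine Ct"}
print(pct_consistent(crm, billing))  # 50.0
```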
  • 9. Data Quality Scorecard:
    ● Accurate: % records quarantined, % records in range, % records matching
    ● Coherent: % records missing entity ID, % records missing foreign key
    ● Complete: % duplicate records, % records missing, % records complete, % fields complete
    ● Consistent: % records consistent
    ● Defined: % tables defined, % fields defined, % dimensions defined, % measures defined
    ● Timely: mean time to arrival, 95th percentile time to arrival
    ● Volume: number of records
    ● Trust: NPS
    “If you can't measure it, you can't improve it” - Peter Drucker
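Several scorecard metrics reduce to simple counts over a batch of records. The sketch below computes a subset of them under these assumptions:

* Records arrive as Python dicts.
* Arrival lags are already measured in seconds.
* "% records missing foreign key" means the key is absent or has no matching parent row.

The field names (`member_id`, `plan_id`, `weight_kg`) and the 30 to 300 range are made up for illustration.

```python
import math
from typing import Any, Optional

Record = dict[str, Any]


def pct(part: int, whole: int) -> float:
    """Percentage, defined as 0.0 when there is nothing to measure."""
    return 100.0 * part / whole if whole else 0.0


def scorecard(records: list[Record], entity_key: str, foreign_key: str,
              parent_keys: set, range_field: str, lo: float, hi: float,
              arrival_lags_s: list[float]) -> dict[str, Optional[float]]:
    """Compute a subset of the scorecard metrics for one batch of records."""
    n = len(records)
    fields = {f for r in records for f in r}
    ids = [r[entity_key] for r in records if r.get(entity_key) is not None]
    in_range = sum(1 for r in records
                   if r.get(range_field) is not None and lo <= r[range_field] <= hi)
    # Coherent: foreign key absent, or pointing at no row in the parent table.
    orphans = sum(1 for r in records if r.get(foreign_key) not in parent_keys)
    # Complete: non-empty cells across every field seen in the batch.
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    lags = sorted(arrival_lags_s)
    return {
        "accurate: % records in range": pct(in_range, n),
        "coherent: % records missing entity ID": pct(n - len(ids), n),
        "coherent: % records missing foreign key": pct(orphans, n),
        "complete: % records dupes": pct(len(ids) - len(set(ids)), n),
        "complete: % fields complete": pct(filled, n * len(fields)),
        "timely: mean time to arrival (s)": sum(lags) / len(lags) if lags else None,
        # Nearest-rank 95th percentile.
        "timely: p95 time to arrival (s)":
            lags[math.ceil(0.95 * len(lags)) - 1] if lags else None,
        "volume: number of records": float(n),
    }


records = [
    {"member_id": 1, "plan_id": "A", "weight_kg": 80.0},
    {"member_id": 2, "plan_id": "B", "weight_kg": 500.0},    # out of range
    {"member_id": 2, "plan_id": "Z", "weight_kg": 70.0},     # dupe ID, orphan FK
    {"member_id": None, "plan_id": "A", "weight_kg": None},  # missing ID
]
card = scorecard(records, "member_id", "plan_id", {"A", "B"},
                 "weight_kg", 30, 300, [60, 120, 180, 600])
for metric, value in card.items():
    print(f"{metric}: {value}")
```

Tracking these per batch, rather than as one-off audits, is what makes the trend visible on a dashboard.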
  • 10. Facet: Accuracy.
    ● Source teams then: published a schema, but data did not always match it. Result: hard to trust, hard to automate, no accountability.
    ● Source teams now (WIP): publish the schema, adhere to it, and specify field ranges.
    ● Data team superpowers: 1. auto consumption, 2. auto checks, 3. quarantine, 4. reporting.
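The data team's auto checks and quarantine can be sketched as a per-record validator. Any record that breaks the published schema or a field range is set aside with its reasons, for reporting, instead of being loaded. The schema format below is a hypothetical one chosen for brevity. A real pipeline would more likely validate against the published schema itself (for example, an Avro or JSON Schema definition).

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical published schema: field -> (allowed types, min, max).
# None for min/max means the field has no range check.
SCHEMA: dict[str, tuple[tuple[type, ...], Optional[float], Optional[float]]] = {
    "member_id": ((int,), None, None),
    "weight_kg": ((int, float), 30.0, 300.0),
    "country": ((str,), None, None),
}


def violations(record: dict[str, Any], schema=SCHEMA) -> list[str]:
    """List every way a record breaks the schema or its field ranges."""
    problems = [f"{name}: not in schema" for name in record if name not in schema]
    for name, (types, lo, hi) in schema.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        # bool is a subclass of int in Python, so reject it unless allowed.
        elif (isinstance(value, bool) and bool not in types) or not isinstance(value, types):
            problems.append(f"{name}: unexpected type {type(value).__name__}")
        elif (lo is not None and value < lo) or (hi is not None and value > hi):
            problems.append(f"{name}: {value} outside [{lo}, {hi}]")
    return problems


@dataclass
class Triage:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)  # (record, problems) pairs

    @property
    def pct_quarantined(self) -> float:
        total = len(self.accepted) + len(self.quarantined)
        return 100.0 * len(self.quarantined) / total if total else 0.0


def triage(records, schema=SCHEMA) -> Triage:
    """Accept clean records; quarantine the rest with reasons for reporting."""
    result = Triage()
    for record in records:
        problems = violations(record, schema)
        if problems:
            result.quarantined.append((record, problems))
        else:
            result.accepted.append(record)
    return result


batch = triage([
    {"member_id": 1, "weight_kg": 80.5, "country": "CA"},
    {"member_id": 2, "weight_kg": 500, "country": "US"},   # out of range
    {"member_id": "3", "weight_kg": 70, "country": "US"},  # wrong type
    {"member_id": 4, "weight_kg": 65.0},                   # missing field
])
for record, problems in batch.quarantined:
    print(record["member_id"], problems)
print(f"% records quarantined: {batch.pct_quarantined:.0f}")  # 75
```

Keeping the reasons next to each quarantined record is what makes the reporting step actionable for the source team.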
  • 11. Facet: Accuracy, measured on the scorecard by % records quarantined, % records in range, and % records matching.
  • 12. Facet: Defined. Table-level data dictionaries and a business-level data dictionary (business glossary). https://medium.com/@leapingllamas
  • 13. Facet: Defined. Flow from master:
    1. The data catalog is master for table-level definitions and the business glossary.
    2. A mapping table links master terms to the BI tool: here, Looker dimensions and measures.
    3. A tool compares master to BI tool, updates or injects definitions, and creates a pull request.
    4. The pull request is manually reviewed and merged.
    5. Master definitions appear to users.
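Step 3 of this flow, comparing master definitions with the BI tool, amounts to a diff. Below is a minimal sketch that assumes LookML has already been parsed into a dict. `plan_updates`, `pull_request_body`, and the `view.field` naming are illustrative assumptions, not the interface of lookml-tools. Reading or writing LookML files and opening the pull request are omitted.

```python
from typing import NamedTuple, Optional


class Update(NamedTuple):
    looker_field: str    # "view.field" of a Looker dimension or measure
    old: Optional[str]   # description currently in LookML (None if absent)
    new: str             # definition from the master data catalog


def plan_updates(master: dict[str, str], mapping: dict[str, str],
                 lookml: dict[str, Optional[str]]) -> list[Update]:
    """Diff master definitions against LookML descriptions.

    master:  business term -> definition (catalog / glossary is the master)
    mapping: business term -> Looker field ("view.field")
    lookml:  Looker field -> its current description
    """
    updates = []
    for term, looker_field in sorted(mapping.items()):
        if term not in master or looker_field not in lookml:
            continue  # skipped here; a real tool would likely report these
        if lookml[looker_field] != master[term]:
            updates.append(Update(looker_field, lookml[looker_field], master[term]))
    return updates


def pull_request_body(updates: list[Update]) -> str:
    """Human-readable summary for the manual review step."""
    lines = ["Sync LookML descriptions from the data catalog:", ""]
    for u in updates:
        lines.append(f"- {u.looker_field}: {u.old!r} -> {u.new!r}")
    return "\n".join(lines)


master = {"Revenue": "Gross revenue in USD, before refunds.",
          "Active member": "Member with a paid subscription today."}
mapping = {"Revenue": "orders.total_revenue", "Active member": "members.is_active"}
lookml = {"orders.total_revenue": "revenue",
          "members.is_active": "Member with a paid subscription today."}
updates = plan_updates(master, mapping, lookml)
print(pull_request_body(updates))
```

Emitting a reviewable diff rather than writing definitions directly matches the manual review-and-merge step on the slide.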
  • 14. Facet: Defined. Same flow from master as slide 13. Open sourcing: https://github.com/ww-tech/lookml-tools
  • 15. Facet: Defined. Style guide, checked by a LookML linter. Open sourcing: https://github.com/ww-tech/lookml-tools
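A style-guide linter is essentially a list of rules applied to every dimension and measure. The three rules below are hypothetical examples, not the rule set shipped in lookml-tools. The fields are assumed to be already parsed into dicts. The same description check also yields the % dimensions defined and % measures defined metrics.

```python
import re
from typing import Callable, Optional

Field = dict  # e.g. {"kind": "dimension", "name": "is_active", "type": "yesno", ...}
Rule = Callable[[Field], Optional[str]]


def has_description(f: Field) -> Optional[str]:
    if not (f.get("description") or "").strip():
        return f"{f['name']}: missing description"
    return None


def snake_case_name(f: Field) -> Optional[str]:
    if not re.fullmatch(r"[a-z][a-z0-9_]*", f["name"]):
        return f"{f['name']}: name is not snake_case"
    return None


def yesno_prefix(f: Field) -> Optional[str]:
    if f.get("type") == "yesno" and not f["name"].startswith(("is_", "has_")):
        return f"{f['name']}: yes/no field should start with is_ or has_"
    return None


RULES: list[Rule] = [has_description, snake_case_name, yesno_prefix]


def lint(fields: list[Field], rules: list[Rule] = RULES) -> list[str]:
    """Run every rule on every field and collect the violations."""
    return [msg for f in fields for rule in rules if (msg := rule(f)) is not None]


def pct_defined(fields: list[Field], kind: str) -> float:
    """% of dimensions (or measures) that carry a description."""
    of_kind = [f for f in fields if f["kind"] == kind]
    if not of_kind:
        return 0.0
    return 100.0 * sum(has_description(f) is None for f in of_kind) / len(of_kind)


fields = [
    {"kind": "dimension", "name": "is_active", "type": "yesno",
     "description": "Member has a paid subscription."},
    {"kind": "dimension", "name": "Signup Date", "type": "date"},
    {"kind": "measure", "name": "total_revenue", "type": "sum",
     "description": "Gross revenue in USD."},
    {"kind": "dimension", "name": "active", "type": "yesno",
     "description": "Legacy activity flag."},
]
for message in lint(fields):
    print(message)
print(pct_defined(fields, "dimension"), pct_defined(fields, "measure"))
```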
  • 16. Facet: Defined, measured by % tables defined and % fields defined, plus % dimensions defined and % measures defined (via the LookML updater and LookML linter).
  • 17. Facet: Trust. Easy to lose trust, hard to regain! In April 2019, we surveyed analysts, data scientists, and some decision-makers and execs on data-related NPS. We asked:
    ● NPS data: would you recommend our data to a friend?
    ● NPS infrastructure: would you recommend our infrastructure (Looker, BigQuery, etc.) to a friend?
    ● NPS support: would you recommend CIE’s support to a friend?
    We will resurvey at the end of 2019.
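The trust facet uses the standard Net Promoter Score. Respondents answer "how likely are you to recommend" on a 0 to 10 scale. Promoters answer 9 or 10 and detractors answer 0 to 6. NPS is the percentage of promoters minus the percentage of detractors. The slides do not show how WW scored its survey, so this sketch assumes the standard scale. The scores in the example are illustrative, not WW's results.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score from 0-10 'how likely are you to recommend' answers.

    Promoters score 9-10, detractors 0-6; passives (7-8) count only in the total.
    The result ranges from -100 (all detractors) to +100 (all promoters).
    """
    if not scores:
        raise ValueError("no responses")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Illustrative answers only, not WW survey results.
print(nps([10, 9, 8, 7, 6, 3, 10, 9]))  # 25.0
```

Computing the data, infrastructure, and support questions separately and repeating them at the resurvey gives a before/after comparison per question.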
  • 18. Data Quality Scorecard, with facets numbered 1 to 8 (Accurate, Coherent, Complete, Consistent, Defined, Timely, Volume, Trust), the same metrics as slide 9, and the data sources behind them: reference data, server logs, metadata, schema, data catalog + lookml-tools, and survey. “If you can't measure it, you can't improve it” - Peter Drucker
  • 19. Integrate into normal workflows. Our engineers work in Slack, so let them do data quality work there too.
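One way to put data quality work into Slack is to post failed checks to a channel through a Slack incoming webhook, which accepts an HTTP POST with a JSON payload containing a `text` field. Below is a minimal standard-library sketch. The message wording and `WEBHOOK_URL` are placeholders, and the slides do not say how WW's integration works.

```python
import json
import urllib.request


def format_dq_alert(table: str, metric: str, value: float, threshold: float) -> dict:
    """Build a Slack message payload for a failed data quality check."""
    return {
        "text": (f":warning: Data quality check failed on `{table}`: "
                 f"{metric} = {value:.1f}% (threshold {threshold:.1f}%)")
    }


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


payload = format_dq_alert("members", "% records quarantined", 7.3, 5.0)
print(payload["text"])
# post_to_slack(WEBHOOK_URL, payload)  # WEBHOOK_URL: the channel's incoming-webhook URL
```

Keeping message formatting separate from the HTTP call lets the formatting be tested without network access.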
  • 20. Integrate into team culture (agile BI engineering team):
    ● BI engineering teams set aside 10% of their time for explicit data quality work
    ● Expect DQ dashboards for all new sources
    ● Weekly data quality meetings
    ● Now proactive, rather than reactive or retrospective
  • 21. Data Quality is a Shared Responsibility. Responsibilities spread across teams: adhere to schema; value ranges; automated consumption; automated checks; DQ dashboards; subscribe/report; data dictionaries (+ glossary); single source of truth; data catalog; docs; schema; monitor and investigate.
  • 22. What questions do you have for me? Carl Anderson, carl.anderson@weightwatchers.com, @leapingllamas, https://medium.com/ww-tech-blog. We are hiring BI engineers, engineers, and data scientists for our Toronto office (a few blocks away). Find our booth in the recruiting hall.