DATA QUALITY
THE HOLY GRAIL OF A DATA FLUENT ORGANIZATION
Balvinder Khurana
A little bit about me
Balvinder Khurana, Data Architect
Balvinder has 15 years of experience building large-scale custom software and
big data platform solutions for complex client problems. She has extensive
experience in the analysis, design, architecture, and development of web-based
enterprise and analytical systems using Agile practices such as Scrum and XP.
Balvinder currently works as a Data Architect and Global Data Community Lead at
Thoughtworks.
We often hear organisations complaining about the same things
“We are not able to do the RCA of failures with the available data”
“We do not know if we can monetize our data”
“Our assortment team doesn’t trust the data our platform is providing and they are still using their old
Excel-based mechanism to do assortment planning”
“Often our POS systems go down and we lose an entire chunk of data”
“We can’t use the data we have to build a credit scoring algorithm since our existing data has many
income groups missing”
Garbage in, garbage out!
Your analysis is only as good as the underlying data.
Systemic Data Quality Issues
Addressing data quality issues late in the process
A lot of time is spent fixing data quality issues in downstream systems or
data platforms rather than in the source systems.
Missing context
As data moves downstream, context gets lost, and fixing some data quality issues
introduces further data quality issues.
Non-uniform definitions
The remedy for a data quality issue is often not agreed upon across teams and the
wider organisation, which leads to a lack of trust in the underlying data.
Point solutions
Data quality is looked at through the lens of the viewer, which produces myopic
solutions that are tactical in nature and don't address the root cause.
Lack of strategy
Data quality is addressed tactically, not as an integrated process or framework
across the organisation's entire ecosystem of products and platforms.
Under-estimating the impact
Data quality issues not only affect downstream systems such as BI and predictive
dashboards; they are also a major reason teams lose trust in the data platform,
and hence become an impediment to change management.
DEFINING DATA QUALITY
Data quality refers to the ability of a given set of
data to fulfill an intended purpose.
It is the degree to which a set of inherent characteristics fulfills the
requirements of a system, determines the data's fitness for use, and
ensures its conformance to those requirements.
Multidimensionality in Data Quality
● Uniqueness
● Integrity
● Consistency
● Trustworthiness
● Standardisation
● Usability
● Availability
● Reliability
● Relevance
● Class Balance
● Accuracy
● Integrity
● Consistency
● Completeness
● Auditability
● Accessibility
● Timeliness
● Authorization
● Fitness
● Value
● Freshness
● Documentation
● Credibility
● Metadata
● Statistical Bias
● Readability
● Definability
● Referenceability
● Reproducibility
● Interoperable
Multidimensionality and its tenets in Data Quality
Tenets of Data Quality
● Availability: Accessibility, Timeliness, Authorization
● Usability: Documentation, Credibility, Metadata, Statistical Bias, Readability
● Reliability: Accuracy, Integrity, Consistency, Completeness, Auditability
● Relevance: Fitness, Value, Freshness
● Standardisation: Definability, Referenceability, Reproducibility, Interoperable
Big Data ecosystems bring in additional complexities
● Volume: How do we have comprehensive data quality control for petabytes of data?
● Variety: How do we cater to multiple types of data: structured, semi-structured and unstructured?
● Velocity: How do we measure data quality in time to keep up with high-velocity data?
● Veracity: How do we handle inherent imprecision and uncertainty?
Modern Data Platforms - A Conceptual view
Example - Pricing for a Retailer
How do we validate the success of our solution?
How do we validate and measure the correctness of the prices we recommend?
How do we validate the accuracy of our analytics?
How do we provide more transparency into data quality at every transformation stage in the data
pipeline for the development teams?
How do we establish trust in the data and insights that we provision to our business teams?
How do we enable teams to discover and use the data that is being collected in various systems?
How do we ensure legal and regulatory compliance?
Who is responsible for ensuring data quality within various systems?
Data Quality Framework
(Diagram labels)
● Data quality levels: baseline data quality / sensible defaults, intermediate data quality, fit-for-purpose data quality
● Fit-for-purpose data quality for: reports/alerts, ML algorithms, ad-hoc analysis, preventive and corrective action, downstream systems
● Building blocks: rules authoring, rules execution engine, metrics definition, critical path, KPIs and dashboards
● Data Discovery: an interface to enable quick discovery and navigation of the right dataset
● Metadata Service: metadata ingestion and update through APIs, covering the business glossary, technical metadata, lineage, etc.
● Repository & Indexing Service: metadata repositories, e.g. schemas, relations, lineage and indexing services
● Ownership of DQ: owners and SLAs/SLOs/SLIs to ensure data quality for each layer, including the business process
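One way to read the rules authoring and rules execution engine boxes is that rules are authored as data and a separate engine runs them, with the resulting pass rates feeding KPIs and dashboards. The following is a minimal sketch under that assumption. The `Rule` class, its `owner` field, the `execute_rules` function and the sample records are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """An authored rule: a name, an accountable owner, and a row-level predicate."""
    name: str
    owner: str  # hypothetical field: the team accountable for this rule
    check: Callable[[Record], bool]


def execute_rules(rules: list[Rule], records: list[Record]) -> dict[str, float]:
    """Run every rule over every record and return the pass rate per rule."""
    if not records:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(1 for r in records if rule.check(r)) / len(records)
        for rule in rules
    }


# Hypothetical baseline rules (sensible defaults) that many datasets could share.
baseline_rules = [
    Rule("article_id_present", "platform-team", lambda r: bool(r.get("article_id"))),
    Rule("price_is_number", "platform-team",
         lambda r: isinstance(r.get("price"), (int, float))),
]

records = [
    {"article_id": "A1", "price": 9.99},
    {"article_id": "", "price": 4.50},      # fails article_id_present
    {"article_id": "A3", "price": "n/a"},   # fails price_is_number
    {"article_id": "A4", "price": 12.00},
]

pass_rates = execute_rules(baseline_rules, records)
```

Keeping rules as plain data separates authoring from execution: fit-for-purpose rules for a particular consumer could be appended to the same list without changing the engine.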
Domain Data Products and Data Quality
Source systems: legacy data warehouse, modern pricing system, POS / online sales, surveys / web crawlers
Domain data products and their quality rules:
● Article: category takes a fixed set of values
● Article Price: price cannot be negative; unique price point per article per channel
● Sales: total amount can be negative (returns)
● Competitor Prices: multiple price points per article per channel
● Article Price & Sales: there should be no outliers in price
Consumers: dynamic pricing algorithm, reports
Data product characteristics: Discoverable, Addressable, Self-describing, Trustworthy, Interoperable, Secure
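The Article Price rules on this slide (no negative prices, a unique price point per article per channel, no outliers) can be sketched as three small checks. This is only an illustration: the row layout, the function names and the sample data are hypothetical, and the slide does not say how an outlier is defined, so the sketch swaps in the common 1.5 × IQR fence as an assumption.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical price rows: (article_id, channel, price).
prices = [
    ("A1", "store", 10.0),
    ("A1", "online", 11.0),
    ("A2", "store", 12.0),
    ("A2", "store", 12.5),   # second price point for the same article and channel
    ("A3", "online", -3.0),  # negative price
    ("A4", "store", 9.5),
    ("A5", "store", 10.5),
    ("A6", "online", 95.0),  # suspiciously high
]


def negative_prices(rows):
    """Rows that violate 'article price cannot be negative'."""
    return [row for row in rows if row[2] < 0]


def duplicate_price_points(rows):
    """(article, channel) pairs that have more than one price point."""
    counts = Counter((article, channel) for article, channel, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def price_outliers(rows, k=1.5):
    """Rows outside the IQR fence (assumed outlier definition, not from the slides)."""
    values = [price for _, _, price in rows]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row for row in rows if not low <= row[2] <= high]


negatives = negative_prices(prices)
duplicates = duplicate_price_points(prices)
outlier_articles = {article for article, _, _ in price_outliers(prices)}
```

The same checks would not transfer unchanged to the other data products: a negative total is valid for Sales (returns), and multiple price points per article per channel are expected for Competitor Prices.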
Data Quality across the pipeline
● Pre-ETL validations: format, consistency, completeness, domain, timeliness
● Post-ETL and pre-simulation validations: metadata, data transformation, data completeness, business-specific, scope, joins, data copy
● Simulation validations: model validation, implementation, computation
● Aggregation validation: hierarchy, data scope, summarized values
● UI validations: representation, format, intuitive
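As an illustration of the pre-ETL stage, the sketch below checks a single incoming record for completeness, format, domain and timeliness (consistency across records is left out). The field names, the allowed channels, the 24-hour freshness window and the requirement that timestamps carry a UTC offset are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("article_id", "channel", "price", "event_time")  # hypothetical schema
ALLOWED_CHANNELS = {"store", "online"}  # hypothetical domain values


def validate_record(record, now, max_age=timedelta(hours=24)):
    """Return the names of the pre-ETL checks that this record fails."""
    # Completeness: every required field is present and non-empty.
    if any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return ["completeness"]  # later checks need the fields to exist
    # Format: price parses as a number, event_time as an ISO 8601 timestamp.
    try:
        float(record["price"])
        event_time = datetime.fromisoformat(record["event_time"])
    except (TypeError, ValueError):
        return ["format"]
    if event_time.tzinfo is None:
        return ["format"]  # assumption: timestamps must carry a UTC offset
    failures = []
    # Domain: the channel is one of the allowed values.
    if record["channel"] not in ALLOWED_CHANNELS:
        failures.append("domain")
    # Timeliness: the record is no older than max_age.
    if now - event_time > max_age:
        failures.append("timeliness")
    return failures


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
batch = [
    {"article_id": "A1", "channel": "store", "price": "9.99",
     "event_time": "2024-05-02T08:00:00+00:00"},
    {"article_id": "A2", "channel": "store", "price": "",
     "event_time": "2024-05-02T08:00:00+00:00"},
    {"article_id": "A3", "channel": "online", "price": "abc",
     "event_time": "2024-05-02T09:00:00+00:00"},
    {"article_id": "A4", "channel": "kiosk", "price": "5",
     "event_time": "2024-04-30T09:00:00+00:00"},
]

results = [validate_record(record, now) for record in batch]
```

Running checks like these before transformation keeps the failure close to the source system, where the context needed to fix it is still available.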
(Flowchart labels)
● Goals of data collecting
● Determining quality dimensions
● Determining indicators/KPIs
● Formulating evaluation baseline
● Data collection
● Data cleaning
● Data quality assessment
● Generating data quality report
● Satisfy goals? (Yes)
● Data analysis and data mining
● Output results
● Output data
● New goals
● Quick pilot* (*improve data quality)
Operationalising Data Quality
● Identify
● Quantify
● Prioritize
● Mitigate
Thank You!
Reach out to us:
@Balvinder

More Related Content

Similar to Data Quality_ the holy grail for a Data Fluent Organization.pptx

Data Integrity: From speed dating to lifelong partnership
Data Integrity: From speed dating to lifelong partnershipData Integrity: From speed dating to lifelong partnership
Data Integrity: From speed dating to lifelong partnershipPrecisely
 
Intro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent SoftwareIntro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent Softwarerafeq
 
Data Quality Assessment Manager (DQAM)
Data Quality Assessment Manager (DQAM)Data Quality Assessment Manager (DQAM)
Data Quality Assessment Manager (DQAM)AnalytiX DS
 
AI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfAI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfarifulislam946965
 
Intro of Key Features of SoftCAAT BI SQL Software
Intro of Key Features of SoftCAAT BI SQL SoftwareIntro of Key Features of SoftCAAT BI SQL Software
Intro of Key Features of SoftCAAT BI SQL Softwarerafeq
 
Is Your Data Ready to Drive Your Company's Future?
Is Your Data Ready to Drive Your Company's Future?Is Your Data Ready to Drive Your Company's Future?
Is Your Data Ready to Drive Your Company's Future?Edgewater
 
Intro of Key Features of SoftCAAT Ent SQL Software
Intro of Key Features of SoftCAAT Ent SQL SoftwareIntro of Key Features of SoftCAAT Ent SQL Software
Intro of Key Features of SoftCAAT Ent SQL Softwarerafeq
 
Intro of Key Features of S-CAAT
Intro of Key Features of S-CAATIntro of Key Features of S-CAAT
Intro of Key Features of S-CAATrafeq
 
Data Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better ReportingData Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better Reportingaccenture
 
Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!
Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!
Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!Polestar Solutions
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
 
The Persona-Based Value of Modern Data Governance
The Persona-Based Value of Modern Data Governance The Persona-Based Value of Modern Data Governance
The Persona-Based Value of Modern Data Governance Precisely
 
Px Solutions Business Intelligence Overview
Px Solutions Business Intelligence OverviewPx Solutions Business Intelligence Overview
Px Solutions Business Intelligence OverviewPX Solutions LLC
 
Kickstart a Data Quality Strategy to Build Trust in Your Data
Kickstart a Data Quality Strategy to Build Trust in Your DataKickstart a Data Quality Strategy to Build Trust in Your Data
Kickstart a Data Quality Strategy to Build Trust in Your DataPrecisely
 
Analytics in manufacturing
Analytics in manufacturingAnalytics in manufacturing
Analytics in manufacturingSaurav Kumar
 
Data Governance a Business Value Driven Approach
Data Governance a Business Value Driven ApproachData Governance a Business Value Driven Approach
Data Governance a Business Value Driven ApproachTridant
 
Driving Business Performance with effective Enterprise Information Management
Driving Business Performance with effective Enterprise Information ManagementDriving Business Performance with effective Enterprise Information Management
Driving Business Performance with effective Enterprise Information ManagementRay Bachert
 
Data Quality: The Cornerstone Of High-Yield Technology Investments
Data Quality: The Cornerstone Of High-Yield Technology InvestmentsData Quality: The Cornerstone Of High-Yield Technology Investments
Data Quality: The Cornerstone Of High-Yield Technology InvestmentsshaileshShetty34
 
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...Knoldus Inc.
 

Similar to Data Quality_ the holy grail for a Data Fluent Organization.pptx (20)

Data Integrity: From speed dating to lifelong partnership
Data Integrity: From speed dating to lifelong partnershipData Integrity: From speed dating to lifelong partnership
Data Integrity: From speed dating to lifelong partnership
 
Strategy For Data Quality
Strategy For Data QualityStrategy For Data Quality
Strategy For Data Quality
 
Intro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent SoftwareIntro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent Software
 
Data Quality Assessment Manager (DQAM)
Data Quality Assessment Manager (DQAM)Data Quality Assessment Manager (DQAM)
Data Quality Assessment Manager (DQAM)
 
AI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfAI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdf
 
Intro of Key Features of SoftCAAT BI SQL Software
Intro of Key Features of SoftCAAT BI SQL SoftwareIntro of Key Features of SoftCAAT BI SQL Software
Intro of Key Features of SoftCAAT BI SQL Software
 
Is Your Data Ready to Drive Your Company's Future?
Is Your Data Ready to Drive Your Company's Future?Is Your Data Ready to Drive Your Company's Future?
Is Your Data Ready to Drive Your Company's Future?
 
Intro of Key Features of SoftCAAT Ent SQL Software
Intro of Key Features of SoftCAAT Ent SQL SoftwareIntro of Key Features of SoftCAAT Ent SQL Software
Intro of Key Features of SoftCAAT Ent SQL Software
 
Intro of Key Features of S-CAAT
Intro of Key Features of S-CAATIntro of Key Features of S-CAAT
Intro of Key Features of S-CAAT
 
Data Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better ReportingData Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better Reporting
 
Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!
Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!
Guided Analytics vs. Self-Service BI: Choose Your Path to Data-driven Success!
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 
The Persona-Based Value of Modern Data Governance
The Persona-Based Value of Modern Data Governance The Persona-Based Value of Modern Data Governance
The Persona-Based Value of Modern Data Governance
 
Px Solutions Business Intelligence Overview
Px Solutions Business Intelligence OverviewPx Solutions Business Intelligence Overview
Px Solutions Business Intelligence Overview
 
Kickstart a Data Quality Strategy to Build Trust in Your Data
Kickstart a Data Quality Strategy to Build Trust in Your DataKickstart a Data Quality Strategy to Build Trust in Your Data
Kickstart a Data Quality Strategy to Build Trust in Your Data
 
Analytics in manufacturing
Analytics in manufacturingAnalytics in manufacturing
Analytics in manufacturing
 
Data Governance a Business Value Driven Approach
Data Governance a Business Value Driven ApproachData Governance a Business Value Driven Approach
Data Governance a Business Value Driven Approach
 
Driving Business Performance with effective Enterprise Information Management
Driving Business Performance with effective Enterprise Information ManagementDriving Business Performance with effective Enterprise Information Management
Driving Business Performance with effective Enterprise Information Management
 
Data Quality: The Cornerstone Of High-Yield Technology Investments
Data Quality: The Cornerstone Of High-Yield Technology InvestmentsData Quality: The Cornerstone Of High-Yield Technology Investments
Data Quality: The Cornerstone Of High-Yield Technology Investments
 
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
Ensuring Data Quality in Databricks Unleashing the Power of Great Expectation...
 

More from Balvinder Hira

Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...Balvinder Hira
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scaleBalvinder Hira
 
Time series analysis 101
Time series analysis 101Time series analysis 101
Time series analysis 101Balvinder Hira
 
Agile, qa and data projects geek night 2020
Agile, qa and data projects   geek night 2020Agile, qa and data projects   geek night 2020
Agile, qa and data projects geek night 2020Balvinder Hira
 
Pricing Deep learning model
Pricing Deep learning modelPricing Deep learning model
Pricing Deep learning modelBalvinder Hira
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scaleBalvinder Hira
 

More from Balvinder Hira (7)

Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...Real time insights for better products, customer experience and resilient pla...
Real time insights for better products, customer experience and resilient pla...
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
 
Time series analysis 101
Time series analysis 101Time series analysis 101
Time series analysis 101
 
Agile, qa and data projects geek night 2020
Agile, qa and data projects   geek night 2020Agile, qa and data projects   geek night 2020
Agile, qa and data projects geek night 2020
 
Pricing Deep learning model
Pricing Deep learning modelPricing Deep learning model
Pricing Deep learning model
 
Google Cloud Platform
Google Cloud PlatformGoogle Cloud Platform
Google Cloud Platform
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
 

Recently uploaded

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Data Quality_ the holy grail for a Data Fluent Organization.pptx

  • 1. DATA QUALITY THE HOLY GRAIL OF A DATA FLUENT ORGANIZATION Balvinder Khurana
  • 2. 2 Balvinder has 15 years of experience in building large-scale custom software and big data platform solutions for complicated client problems. She has extensive experience in Analysis, Design, Architecture, and Development of Web based Enterprise systems and Analytical systems using Agile practices like Scrum and XP. Balvinder currently works as a Data Architect and Global Data Community Lead for Thoughtworks Data Architect Balvinder Khurana A little bit about me..
  • 3. We often hear organisations complaining about… … the same things “We are not able to do the RCA of failures with the available data” “We do not know if we can monetize our data” “Our assortment team doesn’t trust the data our platform is providing and they are still using their old Excel-based mechanism to do assortment planning” “Often our POS systems go down and we lose an entire chunk of data” “We can’t use the data we have to build a credit scoring algorithm since our existing data has many income groups missing”
  • 4. Garbage in, Garbage Out! Your analysis is as good as the underlying data.
  • 5. Systemic Data Quality Issues Addressing data quality issues late in the process Quite a lot of time gets spent in addressing data quality issues in downstream systems or data platforms as opposed to in source systems. Missing context As we move downstream, context gets lost and addressing some data quality issues leads to further more data quality issues. Non-uniform definitions The redressal for data quality issues isn’t often agreed upon with various teams and across organisation which leads to trust issues in the underlying data. Point solutions Data quality gets looked at from the lens of the viewer, thereby causing myopic solutions that are tactical in nature but don’t address the root cause. Lack of strategy Data quality is addressed tactically, and not as an integrated process or framework in the entire ecosystem of products and platform of an organisation. Under-estimating the impact Data quality issues not only affect the downstream systems such as BI/Predictive dashboard, but are a big reason for teams losing trust on the data platform and hence, become an impediment to change management.
  • 6. DEFINING DATA QUALITY Data quality refers to the ability of a given set of data to fulfill an intended purpose. It is the degree to which a set of inherent characteristics fulfill the requirements of a system, determine the fitness for use of the data and ensure its conformance to the requirements.
  • 8. ● Accuracy ● Integrity ● Consistency ● Completeness ● Auditability 8 Uniqueness Reliability Consistency Standardisation Trustworthiness Usability Availability Integrity Relevance Class Balance DATA QUALITY ● Accessibility ● Timeliness ● Authorization ● Fitness ● Value ● Freshness ● Documentation ● Credibility ● Metadata ● Statistical Bias ● Readability ● Definability ● Referenceability ● Reproducibility ● Interoperable Multidimensionality and its tenets in Data Quality
  • 9. Availability Usability Reliability Relevance Standardisation Accessibility Documentation Accuracy Fitness Definability Timeliness Credibility Integrity Value Referenceability Authorization Metadata Consistency Freshness Reproducibility Statistical Bias Completeness Interoperable Readability Auditability Tenets of Data Quality
  • 10. Big Data ecosystems bring in additional complexities Volume Variety Velocity Veracity How do we have a comprehensive data quality control for PBs of data How do we cater to multiple types of data - structured, semi-structured and unstructured How do we have a data quality measure in time to cater for high velocity How to handle inherent impreciseness and uncertainty
  • 11. Modern Data Platforms - A Conceptual view
  • 12. How do we validate the success of our solution? How do we validate and measure the correctness of the prices you recommend? How do we validate our analytics accuracy? How do we provide more transparency into data quality at every transformation stage in the data pipeline for the development teams? How do we establish trust with data and insights that I am provisioning to my business teams? How do we enable teams to discover and use the data that is being collected in various systems? How do I ensure legal and regulatory compliance? Who is responsible for ensuring data quality within various systems? 12 Example - Pricing for a Retailer
  • 13. Data Quality Framework. Baseline data quality / sensible defaults. KPIs and dashboards. Rules execution engine. Rules authoring. Fit for purpose data quality: Reports/alerts; ML algorithms; Ad-hoc analysis; Preventive and corrective action; Downstream systems. Intermediate data quality. Metrics definition. Critical path. Data Discovery: Interface to enable quick discovery and navigation of the right dataset. Metadata Service: Metadata ingestion/updates by APIs such as Business Glossary, Technical Metadata, Lineage etc. Repository & Indexing Service: Metadata repositories, e.g. schemas/relations/lineage/indexing services. Ownership of DQ: Owners and SLA/SLO/SLI to ensure Data Quality, for each layer, including the business process.
  • 14. Domain Data Products and Data Quality. Article: Fixed values of Category. Article Price: Article Price cannot be negative; Unique Price point per article per channel. Sales: Total amount can be negative (Returns). Competitor Prices: Multiple Price points per article per channel. Article Price & Sales: There should be no outliers in price. Legacy Data Warehouse; Modern Pricing System; POS / Online Sales; Surveys / Web Crawlers; Dynamic Pricing algorithm; Reports. Discoverable, Addressable, Self-describing, Trustworthy, Interoperable, Secure.
  • 15. Data Quality across the pipeline. PRE-ETL VALIDATIONS: Format, Consistency, Completeness, Domain, Timeliness. POST-ETL & PRE-SIMULATION VALIDATIONS: Metadata, Data Transformation, Data Completeness, Business specific, Scope, Joins, Data copy. SIMULATION VALIDATIONS: Model Validation, Implementation, Computation. AGGREGATION VALIDATION: Hierarchy, Data Scope, Summarized values. UI VALIDATIONS: Representation, Format, Intuitive.
  • 16. Goals of data collecting; Determining quality dimensions; Determining indicators/KPIs; Formulating evaluation baseline; Data analysis and data mining; Data cleaning; Output results; Output data; Data quality assessment; Generating data quality report; New goals; Quick pilot*; Satisfy goals?; Data collection; Yes. *Improve data quality
  • 18. Thank You! Reach out to us: @Balvinder
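
The article-price rules listed on slide 14 (fixed values of Category, price cannot be negative, a unique price point per article per channel) can be sketched as simple record-level checks. This is only an illustration under stated assumptions: the `PriceRecord` fields, the `ALLOWED_CATEGORIES` values and the violation labels are hypothetical and do not come from the slides.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRecord:
    # Hypothetical record layout; the slides do not define a schema.
    article_id: str
    channel: str
    category: str
    price: float


# Hypothetical fixed list of categories ("Fixed values of Category").
ALLOWED_CATEGORIES = frozenset({"grocery", "apparel", "electronics"})


def validate_prices(records, allowed_categories=ALLOWED_CATEGORIES):
    """Return (article_id, channel, rule) tuples for every violated rule."""
    violations = []

    # Per-record rules: category must be a fixed value, price must not be negative.
    for r in records:
        if r.category not in allowed_categories:
            violations.append((r.article_id, r.channel, "category_not_in_fixed_values"))
        if r.price < 0:
            violations.append((r.article_id, r.channel, "negative_price"))

    # Set-level rule: exactly one price point per article per channel.
    key_counts = Counter((r.article_id, r.channel) for r in records)
    for (article_id, channel), count in key_counts.items():
        if count > 1:
            violations.append((article_id, channel, "duplicate_price_point"))

    return violations


if __name__ == "__main__":
    sample = [
        PriceRecord("A1", "online", "grocery", 9.99),
        PriceRecord("A1", "online", "grocery", 10.49),
        PriceRecord("A2", "store", "toys", -1.0),
    ]
    for violation in validate_prices(sample):
        print(violation)
```

Note that the same checks would not apply unchanged to the other data products on the slide: Sales totals may legitimately be negative (returns), and Competitor Prices may have multiple price points per article per channel, so each product would need its own rule set.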

Editor's Notes

  1. 1 min
  2. 1 min
  3. Volume: Comprehensive data quality assessment is not possible. Data quality measures are approximate and are defined in terms of probability and confidence intervals. Have a clear metric and metric definition for data quality. Variety: Data is also being collected from external sources: 1) data sets from the internet and mobile internet; 2) data from the Internet of Things; 3) data collected by various industries; 4) scientific experimental and observational data. Velocity: Need data quality measures that are relevant as well as feasible: sampling, data quality on the fly, structural validations instead of semantic ones. Veracity: How do we make sure the source of the data is trustworthy? Otherwise, such data might skew your data quality report.
  4. The client is a huge retailer and has reached out to you to help them price their entire assortment of articles based on a number of data points that they collect: what is the demand for a product, what is the competitor price for the same product, does the product have any seasonal value… SLO/SLA/Governance teams. Business teams are losing trust in data. How do I ascertain my data quality? How much should we invest in data quality assurance? Untrustworthy results or inaccurate insights from analytics were due to a lack of quality in the data fed into systems such as AI and machine learning.
  5. Data Quality framework: a hierarchical data quality framework from the perspective of data users. This framework consists of big data quality dimensions, quality characteristics, and quality indexes. ROI of data quality. Define, Measure, Analyze, Design/Improve, and Verify/Control.
  6. Data users and data providers are often different organizations with very different goals and operational procedures. Thus, it is no surprise that their notions of data quality are very different. In many cases, the data providers have no clue about the business use cases of data users (data providers might not even care about it, unless they are getting paid for the data). This disconnect between data source and data use is one of the prime reasons behind the data quality issues.
  7. Plan: The planning (or designing) phase consists of defining scope and business need, identifying stakeholders, clarifying business rules for data, and identifying business processes. The outcome of the planning phase should clearly communicate the objectives of the DQ work to relevant senior management as well as other stakeholders. Assess: This phase measures the existing data against business policies, data standards, and business practices. Profiling is a key component of this phase, and a lot has been written about profiling and assessment. Analyze: Typically, we use both quantitative and qualitative analytical techniques to do a gap analysis between where the data quality should be, based on what was defined in the planning phase, and where it actually is. Pilot: There may be variations in how different organizations deal with the Pilot and Deploy phases, but we recommend a Pilot phase that focuses on the specific actions needed to improve data quality. The Pilot phase might also identify business processes that need to be adjusted to improve data quality on a sustained basis. Deploy: Based on the outcomes of the Pilot phase, the Deploy phase should focus on both business and technical solutions to improve data quality. Many organizations tend to focus on technical solutions only and ignore business solutions; in our opinion, that is a major mistake. Maintain: It is very important to put processes and control mechanisms in place to maintain the data quality efforts on an ongoing basis. Data Governance will play an important role in making sure that data quality is maintained as a sustained program.
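
Note 3 observes that at high volume a comprehensive assessment is not possible, so data quality measures become approximate and are expressed with probabilities and confidence intervals. A minimal sketch of that idea follows, assuming a completeness metric estimated from a simple random sample and a Wilson score interval. The choice of interval, the function names and the `income_group` field are assumptions for illustration, not something the talk prescribes.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n == 0:
        raise ValueError("sample must not be empty")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def estimate_completeness(rows, field, sample_size, seed=0):
    """Estimate the share of rows where `field` is populated, from a random sample.

    Returns (observed_rate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(rows, min(sample_size, len(rows)))
    complete = sum(1 for row in sample if row.get(field) is not None)
    low, high = wilson_interval(complete, len(sample))
    return complete / len(sample), low, high


if __name__ == "__main__":
    # Hypothetical data: roughly one row in ten is missing its income group.
    rows = [{"income_group": None if i % 10 == 0 else "B"} for i in range(100_000)]
    rate, low, high = estimate_completeness(rows, "income_group", sample_size=2_000)
    print(f"completeness ~ {rate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The sample size controls the trade-off the Velocity point describes: a smaller sample is cheaper and faster to evaluate but produces a wider interval.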