Keeping the Pulse of Your Data: Why You Need Data Observability
The document discusses the importance of data observability in ensuring data integrity and quality in organizations. It highlights the increasing reliance on data and the necessity for proactive monitoring to identify anomalies that can lead to significant issues. Key benefits of data observability include the ability to detect, analyze, and remediate data discrepancies before they adversely affect business outcomes.
47% of newly created data records have at least one critical error
68% of organizations say disparate data negatively impacts their organization
84% of CEOs say they are concerned about the integrity of the data they are making decisions on
Data integrity is a business imperative
Building at Scale
• How do semiconductor companies manufacture a microchip with over 2 trillion transistors on less than 2 inches, and double the capacity every 2 years?
• How do auto companies build a car on a production line with over 30,000 parts spanning different raw materials and manufacturing processes?
• How do software and data engineers develop, merge and deploy millions of lines of code in near-real-time continuous delivery pipelines?
How? Observability
• W. Edwards Deming, “the father of quality management,” started the observability concept about 100 years ago
• Observability is a key foundational concept of SPC, Lean, Six Sigma and any process that depends on building quality into repetitive tasks
• Statistical methods are used to control complex processes and ensure quality data products over time
Observability:
1. Continually improves by tightening control limits and flagging data issues
2. Identifies special (infrequent) and common (bad data) root causes
3. Provides context into data with lineage, sourcing and parentage
4. Triggers automatic action(s), such as data quality remediation, model retraining, issue escalation and data pipeline activities
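The SPC idea above can be sketched in a few lines. The sketch computes control limits from a baseline of in-control history and flags new observations that fall outside them. It is a minimal Shewhart-style individuals chart using mean ± 3 sample standard deviations, a common textbook choice. The monitored metric (daily row counts), the numbers, and the function names are all hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float]:
    """Shewhart-style limits: baseline mean +/- k sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center + k * sigma


def flag_out_of_control(values: list[float], lower: float, upper: float) -> list[tuple[int, float]]:
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily row counts for one table while the process was in control.
baseline = [1000, 1012, 995, 1003, 990, 1008, 1001, 997, 1005, 999]
lower, upper = control_limits(baseline)

# New observations: index 2 shows a sudden drop in volume.
new_days = [1002, 996, 640, 1010]
print(flag_out_of_control(new_days, lower, upper))  # [(2, 640)]
```

Recomputing the limits as more in-control points accumulate is one simple way to tighten them over time. Textbook individuals charts often estimate sigma from the average moving range rather than the sample standard deviation.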
Why Now?
• Businesses are more data-driven than ever
• Problematic events are infrequent but can be catastrophic
• Users’ data expertise has evolved, along with expectations to do more with data
• Data proliferation and technology diversification
• AI has evolved to support the complexity of the problem
Examples:
• Cloud, on-premises, hybrid cloud
• Snowflake, Delta Lake, Oracle, MS SQL Server, BigQuery, Redshift
• Streaming data, databases and files
• SAP, Salesforce, and ERP & CRM systems
Typical Data Products and Pipelines
Traditionally, the quality of a data product or pipeline is ensured during the development process and not throughout the operational lifecycle.
• QA is done at the time of development
• Issues surface at random
• Users find and report defects
[Figure: Data Sources #1 to #4, each of unknown quality (“?”), flow through the Process stages Create and/or Source the Data → Transform Data → Enrich / Blend / Merge Data → Publish and Expose Data. The resulting Data Product(s) are marked with an X.]
Data Pipelines with Observability
Data observability tools monitor the performance of data products and processes in order to detect significant variations before they produce erroneous work product in reports, analytics, insights and outcomes.
[Figure: The same pipeline (Create and/or Source the Data → Transform Data → Enrich / Blend / Merge Data → Publish and Expose Data) with an Observe layer running alongside the Process. An issue is flagged (“!”) on a data source, and issues are identified and resolved prior to the final Data Product(s).]
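The observability pipeline described above can be sketched as code. Checks run on the output of each stage, and the run stops before publishing if any check fails. This is a minimal illustration in plain Python, not how any particular observability product works. The stage names loosely mirror the pipeline stages; the check functions, record fields, and lookup table are hypothetical.

```python
from typing import Callable

Record = dict
Stage = Callable[[list[Record]], list[Record]]
Check = Callable[[list[Record]], list[str]]  # returns human-readable problems


class DataIssue(Exception):
    """Raised when a check fails between pipeline stages."""


def run_observed_pipeline(records: list[Record],
                          stages: list[tuple[str, Stage]],
                          checks: dict[str, list[Check]]) -> list[Record]:
    """Run each stage in order and observe its output before moving on."""
    for name, stage in stages:
        records = stage(records)
        problems = [p for check in checks.get(name, []) for p in check(records)]
        if problems:
            # Stop here so later stages (e.g. publish) never see bad data.
            raise DataIssue(f"after stage '{name}': {problems}")
    return records


def not_empty(records: list[Record]) -> list[str]:
    return [] if records else ["no records"]


def no_missing(field: str) -> Check:
    def check(records: list[Record]) -> list[str]:
        return [f"row {i}: missing {field}" for i, r in enumerate(records)
                if r.get(field) is None]
    return check


REGIONS = {"C1": "EMEA", "C2": "AMER"}  # hypothetical enrichment lookup
published: list[Record] = []


def publish(records: list[Record]) -> list[Record]:
    published.extend(records)  # stand-in for writing to a report or table
    return records


stages = [
    ("source", lambda rs: list(rs)),
    ("transform", lambda rs: [{**r, "amount": float(r["amount"])} for r in rs]),
    ("merge", lambda rs: [{**r, "region": REGIONS.get(r["customer"])} for r in rs]),
    ("publish", publish),
]
checks = {"source": [not_empty], "merge": [no_missing("region")]}

raw = [{"customer": "C1", "amount": "10.5"}, {"customer": "C3", "amount": "4"}]
try:
    run_observed_pipeline(raw, stages, checks)
except DataIssue as err:
    print(err)  # after stage 'merge': ['row 1: missing region']
print(published)  # [] (nothing reached the publish stage)
```

Raising an exception is only one possible reaction. The same hook could instead quarantine the failing rows or send an alert and continue.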
Data Observability benefits
1. … of your data with continuous measuring and monitoring
2. … into your data landscape and dependencies with intuitive self-discovery capabilities
3. … when outliers and anomalies are identified using artificial intelligence
4. … when identified by intelligent analysis
5. … when issues occur by understanding the cause of the issue
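Benefits 2 and 5 above both depend on knowing how datasets depend on one another. One simple way to use lineage for root-cause analysis is sketched here under stated assumptions. Lineage is represented as a dictionary mapping each dataset to its direct upstream inputs. The graph is walked breadth-first, and the sketch reports the unhealthy upstream datasets that have no unhealthy ancestors of their own. The dataset names and the `unhealthy` set are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is built from.
LINEAGE = {
    "revenue_report": ["orders_enriched"],
    "orders_enriched": ["orders", "fx_rates"],
    "orders": ["erp_orders"],
    "fx_rates": [],
    "erp_orders": [],
}


def upstream(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """All datasets the given one depends on, directly or indirectly (BFS)."""
    seen: set[str] = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node not in seen:  # also guards against cycles
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen


def root_causes(dataset: str, lineage: dict[str, list[str]], unhealthy: set[str]) -> set[str]:
    """Unhealthy upstream datasets whose own ancestors are all healthy."""
    suspects = upstream(dataset, lineage) & unhealthy
    return {s for s in suspects if not (upstream(s, lineage) & unhealthy)}


print(root_causes("revenue_report", LINEAGE, {"orders_enriched", "fx_rates"}))  # {'fx_rates'}
```

The intermediate dataset `orders_enriched` is excluded because its own problem is explained by an unhealthy input further upstream.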
Data Observability and Quality
Rules
Metadata
• Alerts and dashboards for overall data health trending and threshold analysis
• Anomaly detection based on volume, freshness, distribution and schema metadata
• Predictive analysis simulating human intelligence to identify potential adverse data integrity events
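The anomaly-detection bullet names four kinds of metadata. A minimal sketch compares two snapshots of one table's metadata against simple thresholds. It assumes those snapshots are already being collected. The `TableSnapshot` shape, the threshold values, and the sample numbers are hypothetical, and a production tool might learn such thresholds from history instead of hard-coding them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableSnapshot:
    """Hypothetical metadata captured for one table at one point in time."""
    row_count: int
    last_updated: datetime
    columns: dict[str, str]      # column name -> type name
    null_rate: dict[str, float]  # column name -> fraction of null values


def metadata_anomalies(prev: TableSnapshot, curr: TableSnapshot, now: datetime,
                       max_staleness: timedelta = timedelta(hours=24),
                       max_volume_change: float = 0.5,
                       max_null_shift: float = 0.1) -> list[str]:
    alerts = []
    # Volume: relative change in row count since the previous snapshot.
    if prev.row_count and abs(curr.row_count - prev.row_count) / prev.row_count > max_volume_change:
        alerts.append(f"volume: {prev.row_count} -> {curr.row_count} rows")
    # Freshness: how long since the table was last updated.
    if now - curr.last_updated > max_staleness:
        alerts.append(f"freshness: last updated {curr.last_updated:%Y-%m-%d %H:%M}")
    # Schema: added, removed, or retyped columns.
    added = sorted(curr.columns.keys() - prev.columns.keys())
    removed = sorted(prev.columns.keys() - curr.columns.keys())
    retyped = sorted(c for c in curr.columns.keys() & prev.columns.keys()
                     if curr.columns[c] != prev.columns[c])
    if added or removed or retyped:
        alerts.append(f"schema: added={added} removed={removed} retyped={retyped}")
    # Distribution: large shifts in the null rate of shared columns.
    for col in sorted(curr.null_rate.keys() & prev.null_rate.keys()):
        shift = curr.null_rate[col] - prev.null_rate[col]
        if abs(shift) > max_null_shift:
            alerts.append(f"distribution: null rate of {col} moved by {shift:+.2f}")
    return alerts


prev = TableSnapshot(10_000, datetime(2024, 1, 1, 6),
                     {"id": "int", "amount": "decimal", "currency": "str"},
                     {"amount": 0.0, "currency": 0.01})
curr = TableSnapshot(4_000, datetime(2024, 1, 2, 6),
                     {"id": "int", "amount": "str", "currency": "str"},
                     {"amount": 0.0, "currency": 0.30})
for alert in metadata_anomalies(prev, curr, now=datetime(2024, 1, 4, 6)):
    print(alert)
```

In this example all four checks fire: the row count drops by 60%, the table is 48 hours stale, `amount` changes type, and the null rate of `currency` rises by 0.29.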
“Observability is the missing piece today to give our data stewards access
to data discovery insights without having to go to IT for queries or reports”
- Jean Paul Otte, CDO, Degroof Petercam
Data Observability: Impact of Unexpected Values
An incorrect currency type on an order inflated its revenue amount, which would have produced an incorrect total revenue figure.
The error occurred because the currency conversion table had not been updated.
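In this use case, an unexpected currency value nearly distorted a revenue total. A minimal sketch of a corresponding check assumes that each order carries a currency code and that a conversion table is keyed by that code. Orders whose currency is missing from the table are routed for review instead of being summed. The table, order fields, and function names are hypothetical, and the rates are illustrative, not real exchange rates.

```python
from decimal import Decimal

# Hypothetical conversion table (rate to USD). JPY is deliberately missing,
# standing in for a conversion table that was not updated.
FX_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10"), "GBP": Decimal("1.27")}


def split_by_known_currency(orders: list[dict], fx_table: dict[str, Decimal]):
    """Separate orders whose currency can be converted from those that cannot."""
    valid, unexpected = [], []
    for order in orders:
        (valid if order["currency"] in fx_table else unexpected).append(order)
    return valid, unexpected


def total_revenue_usd(orders: list[dict], fx_table: dict[str, Decimal]) -> Decimal:
    """Convert each order amount to USD and sum; assumes every currency is known."""
    return sum((Decimal(o["amount"]) * fx_table[o["currency"]] for o in orders), Decimal("0"))


orders = [
    {"id": 1, "amount": "100.00", "currency": "USD"},
    {"id": 2, "amount": "200.00", "currency": "EUR"},
    {"id": 3, "amount": "15000", "currency": "JPY"},
]
valid, unexpected = split_by_known_currency(orders, FX_TO_USD)
print(total_revenue_usd(valid, FX_TO_USD))  # 320.0000
print([o["id"] for o in unexpected])        # [3]
```

Using `Decimal` rather than `float` keeps the currency arithmetic exact. The split makes the unconvertible order visible instead of letting it silently change the total.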
Use Case Recap
1. A data anomaly impacted downstream processes
2. Impact of unexpected values caused by an invalid currency type
3. Unexpected data values caused by a lack of internal communication
4. Data exploration to uncover data inconsistencies
The modular, interoperable Precisely Data Integrity Suite contains everything you need to deliver accurate, consistent, contextual data to your business, wherever and whenever it’s needed.