With numerous data products relying on hundreds or even thousands of external
and internal data sources, modern organizations now have a growing number of
data use cases. To meet these needs, they have adopted advanced technologies
and big data infrastructures.
The increasing complexity of the data stack, together with the volume, variety,
and velocity of the data generated and collected, opens the door to issues such
as schema changes, data drift, poor data quality, downtime, and duplicate data.
Data management is further complicated by the many data storage options, data
pipelines, and enterprise applications involved.
Data engineers and business executives responsible for building and
maintaining data infrastructures and systems are often overwhelmed. They do
their best to keep data systems operational as much as possible, but no system
is perfect, and data volumes can be unpredictable. No matter how much money
data teams have invested in the cloud, or how sophisticated and well-designed
an analytics dashboard is, everything fails if unreliable data is ingested,
transformed, and pushed downstream.
Modern data pipelines are interconnected and not intuitive. Because of this,
data from both internal and external sources can become inconsistent,
inaccurate, missing, or change suddenly, which could eventually impact the
correctness and accuracy of dependent data assets. Data and analytics teams
must be able to dig deep to find the root cause of any data issues and then
resolve them.
This is difficult to achieve without a comprehensive view of the entire data
stack and its lifecycle. Data observability helps data teams and organizations
ensure data quality and a reliable data flow throughout their day-to-day
business operations.
Organizations and teams should therefore pay attention to data observability
in order to achieve their data-driven visions.
What is Data Observability?
While observability is most commonly associated with software engineering, it
is just as essential for data. Software engineers monitor the health and
performance of their applications using tools like DataDog, AppDynamics and
NewRelic, and data teams need the same visibility into their data systems.
Data observability is the ability of an organization to keep a constant pulse on
its data systems by tracking, monitoring, and troubleshooting issues in order
to reduce downtime, improve data quality, and eventually prevent issues from
happening.
It is also a collection of technologies and activities that allow data and
analytics teams to track data-related failures and walk upstream to determine
what is wrong at each level (quality, infrastructure, and computation). This
helps data teams measure how effectively data is being used and understand
what’s happening across every stage of the enterprise data lifecycle.
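The idea of walking upstream can be sketched as a graph traversal over lineage
metadata. The example below is a minimal illustration under stated assumptions,
not any vendor's implementation: the asset names, the UPSTREAM mapping, and the
FAILING set are all hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it reads from.
UPSTREAM = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
    "orders_raw": [],
    "customers_raw": [],
}


def walk_upstream(asset, upstream):
    """Return every ancestor of `asset`, nearest first (breadth-first)."""
    seen, order = {asset}, []
    queue = deque(upstream.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(upstream.get(node, []))
    return order


def root_causes(asset, upstream, failing):
    """Failing ancestors that have no failing ancestor of their own."""
    candidates = [a for a in walk_upstream(asset, upstream) if a in failing]
    return [
        a for a in candidates
        if not any(p in failing for p in walk_upstream(a, upstream))
    ]


# Hypothetical set of assets whose quality checks failed on the last run.
FAILING = {"sales_mart", "orders_clean", "orders_raw"}

print(walk_upstream("sales_dashboard", UPSTREAM))
print(root_causes("sales_dashboard", UPSTREAM, FAILING))
```

Here the broken dashboard is traced back to the single raw table where the
failure first appears, which is where the investigation should start.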
Just as software observability rests on three pillars (metrics, logs, and
traces), data observability has five pillars. Each pillar answers a series of
questions; when the pillars are combined and continuously monitored, they give
data teams a holistic view of data health and pipelines. Let’s have a look at
these questions:
• Freshness: Was all data received and is it current? What
upstream data was omitted/included? When was the last time
data was extracted/generated? Was the data received on time?
• Volume: Has all the data been received? Are all the data tables
complete?
• Distribution: To whom was the data sent? How useful and
complete is the data? Is the data reliable? How was the data
transformed? Are the data values within an acceptable range?
• Lineage: Who are the downstream consumers of a data asset?
Who generates the data? Who will use the data to make business
decisions? At which stages will downstream consumers use the
data?
• Schema: Does the data format conform to the schema? What has
changed in the data schema? Who made the changes?
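Several of these questions translate directly into automated checks. The sketch
below is one possible way to express freshness, volume, schema, and
distribution checks in plain Python; the column names, thresholds, and function
names are assumptions made for illustration, and lineage is left out because it
depends on metadata rather than on the rows themselves.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expected schema for one table: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "loaded_at": datetime}


def check_freshness(rows, now, max_age):
    """Freshness: the newest record must be no older than max_age."""
    if not rows:
        return False
    newest = max(r["loaded_at"] for r in rows)
    return now - newest <= max_age


def check_volume(rows, expected, tolerance=0.2):
    """Volume: row count must be within +/- tolerance of the expected count."""
    return abs(len(rows) - expected) <= tolerance * expected


def check_schema(rows, schema):
    """Schema: every row has exactly the expected columns and types."""
    return all(
        r.keys() == schema.keys()
        and all(isinstance(r[col], typ) for col, typ in schema.items())
        for r in rows
    )


def check_distribution(rows, column, low, high):
    """Distribution: every value of `column` falls in [low, high]."""
    return all(low <= r[column] <= high for r in rows)


now = datetime(2022, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "amount": 19.99, "loaded_at": now - timedelta(hours=1)},
    {"order_id": 2, "amount": 250.0, "loaded_at": now - timedelta(hours=2)},
]
report = {
    "freshness": check_freshness(rows, now, timedelta(hours=6)),
    "volume": check_volume(rows, expected=2),
    "schema": check_schema(rows, EXPECTED_SCHEMA),
    "distribution": check_distribution(rows, "amount", 0.0, 10_000.0),
}
print(report)
```

In practice each check would run on a schedule and feed an alerting system;
the point here is only that each pillar maps to a small, testable question.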
What Is the Importance of Data Observability?
Data observability goes beyond monitoring and alerting. It allows
organizations to understand their data systems fully and allows them to fix or
even prevent data problems in increasingly complex data situations.
1) Data observability increases trust in data so that businesses can
make data-driven business decisions confidently.
While data insights and machine-learning algorithms can be invaluable,
inaccurate or mismanaged data can have devastating consequences.
In 2020, Public Health England (PHE), which tracked daily Covid-19 infection
rates, found an error in its data collection. This error caused 15,841 cases
between September 25 and October 2 to be left out of the daily figures.
According to PHE, the Excel spreadsheet used to collect the data had exceeded
its size limit. As a result, the actual number of new daily cases was much
higher than initially reported, and the close contacts of the people behind
those cases were not traced promptly by the government's "test & trace"
program. Data observability helps organizations detect situations like this
quickly, which allows them to make better-informed decisions.
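One simple guard against this kind of silent truncation is to reconcile record
counts between source and destination after every load. The sketch below uses
hypothetical names (reconcile_counts, LoadMismatch, load_with_limit); the
65,536-row figure is the worksheet row limit of the legacy .xls format and is
used here only as an illustrative destination limit.

```python
class LoadMismatch(Exception):
    """Raised when source and destination row counts differ after a load."""


def reconcile_counts(source_rows, loaded_rows):
    """Compare row counts after a load and fail loudly on any difference."""
    if source_rows != loaded_rows:
        raise LoadMismatch(
            f"source has {source_rows} rows but destination has {loaded_rows} "
            f"({source_rows - loaded_rows} missing)"
        )
    return True


def load_with_limit(records, limit):
    """Simulate a destination that silently drops rows beyond `limit`."""
    return records[:limit]


records = list(range(70_000))
loaded = load_with_limit(records, limit=65_536)

try:
    reconcile_counts(len(records), len(loaded))
except LoadMismatch as err:
    print(f"ALERT: {err}")
```

The load itself raises no error, which is exactly the problem; only the
explicit comparison afterwards reveals that rows were dropped.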
2) Data observability allows for the timely delivery of high-quality data to
support business workloads.
Every organization must ensure that data is easily accessible and in the
correct format. Almost every department in an organization relies on high-
quality data for business operations. Data scientists, data engineers, and data
analysts depend on the data to provide insights and analytics. A lack of quality
data can lead to costly business process breakdowns.
For example, suppose your company has an ecommerce site with multiple data
sources (stock quantities, sales transactions, user analytics) that are
consolidated into a data warehouse. The sales department needs sales
transaction data to generate annual reports, the marketing department relies
on user analytics data to run effective marketing campaigns, and data
scientists rely on the data to build and deploy machine learning models that
recommend products. If one of the data sources is incorrect or out of sync, it
could harm several parts of the business.
Data observability is a way to ensure the quality, reliability, and consistency of
data within the data pipeline. It gives organizations a 360-degree overview of
their data ecosystem. This allows them to drill down and fix any issues that
could disrupt their data pipeline.
3) Data observability allows you to identify and fix data issues before
they affect your business.
Pure monitoring systems have a significant flaw: they can only detect
unusual conditions or situations you already know about or anticipate. But
what about the cases you can't see coming?
A mistake made by Amsterdam's City Council in 2014 led to an erroneous payout
of EUR188 million. The error occurred because the software used by the council
to distribute housing benefits to low-income families was programmed to work
in cents rather than euros. Families received far more than they expected:
people who were due EUR155 received EUR15,500. Even more alarming, the
software did not alert administrators to the error.
Data observability can detect situations you don't know about or wouldn't
consider looking for. It can also prevent problems from becoming severe
business issues. Data observability allows you to track the relationships
between specific issues and provides context and pertinent information for
root cause analysis.
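A check does not need to know about a specific bug, such as a cents-versus-euros
mix-up, to catch its effects. One possible approach, sketched below, compares
each outgoing value against the median of historical values and flags anything
that differs by a large factor. The function name, the sample figures, and the
10x threshold are assumptions chosen for illustration, and the sketch assumes
all amounts are positive.

```python
from statistics import median


def flag_outliers(history, batch, max_ratio=10.0):
    """Return values in `batch` that are more than max_ratio times larger or
    smaller than the median of `history` (assumes positive amounts)."""
    baseline = median(history)
    flagged = []
    for amount in batch:
        ratio = amount / baseline
        if ratio > max_ratio or ratio < 1 / max_ratio:
            flagged.append(amount)
    return flagged


# Hypothetical past payments in euros, and a new batch with one 100x error.
history = [150.0, 155.0, 160.0, 148.0, 152.0]
batch = [155.0, 15_500.0, 149.0]

print(flag_outliers(history, batch))
```

A check like this would hold the suspicious payment for review before it is
sent, rather than relying on someone to notice it afterwards.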
Top Data Observability Platforms for Monitoring Data Quality at Scale
We understand how difficult it can be to find the right observability tool for your
company. Here is a list of the top platforms for data observability in 2022.
1) Monte Carlo
Monte Carlo's observability service offers a complete solution for preventing
broken data pipelines. This tool is an excellent choice for data engineers, as
it allows them to verify reliability and avoid expensive data downtime.
Monte Carlo's features include data catalogs, alerts, and out-of-the-box
observability across multiple criteria.
2) Databand
Databand's goal is to make data engineering more efficient in a complex
infrastructure. Databand's AI-powered platform provides data engineers with
tools to optimize their operations and get a single view of all their data flows.
It aims to identify the core elements of data pipelines and where they have
failed before bad data can get through. It also works with cloud-native
technologies in the modern data stack, such as Apache Airflow and Snowflake.
3) Honeycomb
Honeycomb provides developers with the visibility needed to identify and fix
problems in distributed systems. The firm claims that Honeycomb helps
developers understand and fix complex interactions between distributed
services. Its full-stack cloud observability technology provides logs, traces,
events, and automatic code instrumentation using Honeycomb Beelines as its
agent. Honeycomb also supports OpenTelemetry for generating instrumentation
data.
4) Acceldata
Acceldata is a data observability platform that provides data monitoring, data
reliability, and data observability solutions. These tools were created to
help data engineers gain broad, cross-sectional views of complex data
pipelines. Acceldata's products combine signals from many layers and workloads
into a single pane of glass, allowing multiple teams to collaborate on data
problems.
Acceldata Pulse also provides performance monitoring and observability,
which helps to ensure data reliability at scale. This tool is designed for the
financial and payment industries.
5) Datafold
Datafold is a data observability tool that helps data teams assess data quality
and implement anomaly detection and profiling. Datafold's capabilities allow
teams to perform data quality assurance using data profiling. Users can also
compare tables within a database or across multiple databases and generate
smart alerts with just one click. Data teams can also track ETL code changes
during data transfers and connect them to their CI/CD pipeline to quickly
review the code.
6) SigNoz
SigNoz is an open-source, full-stack APM/observability system that tracks
metrics and traces. Because it is open source, users can host the program on
their own infrastructure without sharing their data with third parties.
Full-stack means that it includes telemetry generation, backend storage, and a
visualization layer for consuming and acting on the data. SigNoz uses
OpenTelemetry (a vendor-agnostic instrumentation library) to create telemetry
data.
7) DataDog
DataDog's observability software covers infrastructure monitoring, log
management, and application performance monitoring. DataDog gives you a
complete view of distributed applications by tracing requests end to end
across distributed systems. It also displays latency percentiles and provides
open-source instrumentation libraries. This is the "necessary monitoring and
security platform for cloud applications," according to its creators.
8) Dynatrace
Dynatrace is a SaaS platform aimed at large enterprises that addresses many
monitoring needs. Its AI engine, Davis, can automate root cause investigation
and anomaly detection. The company's technology also covers infrastructure
monitoring, application security, and cloud automation.
9) Grafana Labs
Grafana's open-source analytics and interactive visualization web application
is well known for supporting multiple storage backends for time-series data.
Grafana connects to Graphite, ElasticSearch, InfluxDB and Prometheus. It also
supports traces from Jaeger, X-Ray, Tempo, and Zipkin, and offers plugins,
dashboards, alerts, and user-level access controls for governance. Grafana
Cloud offers solutions like Grafana Cloud Logs, Grafana Cloud Traces and
Grafana Cloud Metrics.
10) Soda
Soda's AI-powered platform for data observability is an environment that
allows data owners, engineers, and data analysts to work together to solve
problems. Soda.ai describes the technology as "a platform that enables teams
to define what good data looks like and handle errors quickly before they have
a downstream impact." This tool allows users to examine their data and create
rules to validate it quickly.
Implementation of a Data Observability Framework
Data observability is an "outcome" of the DataOps movement. Even if you have
the most advanced automation and algorithms to monitor your metadata, they
will only pay off with organizational adoption. Conversely, any organization
can adopt DataOps, but without the technology to support it, DataOps remains a
well-documented philosophy that doesn't affect output.
So, how do you implement a data observability framework that improves your
data quality at all levels? What metrics should be tracked at each stage of the
data observability framework?
These are the key ingredients for a highly functional data observability
framework:
i) DataOps Culture
ii) Standardized Data platform
iii) Unified Data Observability Platform
Before you can even consider producing high-value data products, you must
have widespread adoption of the DataOps Culture. This requires everyone to
be involved, especially leadership. They will be the ones who create the
systems and processes that support development, maintenance, feedback,
and other activities. A bottom-up movement is powerful, but you still need
budget approvals to make the necessary technological changes to support
DataOps.
Leadership can help the organization move towards a standardized data
platform if everyone buys into the idea. What does this mean? To ensure that
all teams have end-to-end accountability and ownership, infrastructure must
be in place to allow them to communicate openly and speak the same
language. Standard libraries are needed for API and data management (e.g.,
querying the data warehouse, reading from and writing to the data lake, and
pulling information from APIs). A standardized library is also required to
ensure data quality, along with source code tracking, data versioning, and
CI/CD processes. With all this in place, your infrastructure is ready for
success.
You now need an open, unified platform for monitoring your system's health
that your entire organization can access. The observability platform will
act as a central metadata repository. It should include all of the features
mentioned earlier (like monitoring and alerting, tracking, comparison and
analysis), so data teams can see how other parts of the platform affect them.
To effectively monitor the functioning of the Data Observability Framework,
you should monitor the following metrics:
1) Operational Health:
• Execution Metadata
• Pipeline State
• Delays
2) Dataset Monitoring:
• Availability
• Freshness
• Volume
• Schema Change
3) Column-level Profiling:
• Summary statistics
• Anomaly detection
4) Row-level Validation:
• Business rule enforcement
• Stop "bad data"
To ensure operational health, it's best to collect execution metadata. This
metadata includes information about pipeline states, run durations, delays,
retries, and the times between runs. At the dataset level, you should monitor
the availability and freshness of your data along with its volume and any
changes to the schema. At the column level, you should collect summary
statistics, such as the mean, max, and min, and use anomaly detection to alert
you when their trends change. Row-level validation means checking that each
record adheres to your business rules, so that bad data is stopped before it
moves downstream. This is very contextual, so you will need to exercise your
discretion.
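The column-level and row-level checks described above can be sketched in a few
lines of Python. This is a minimal illustration rather than a reference
implementation: the business rules, column names, and the z-score threshold of
3 are all assumptions, and a z-score is only one of many possible anomaly
detection methods.

```python
from statistics import mean, pstdev


def profile(values):
    """Column-level summary statistics."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def is_anomalous(past_means, current_mean, z_threshold=3.0):
    """Flag the current mean if it lies more than z_threshold standard
    deviations from the mean of previous runs."""
    mu, sigma = mean(past_means), pstdev(past_means)
    if sigma == 0:
        return current_mean != mu
    return abs(current_mean - mu) / sigma > z_threshold


# Hypothetical business rules: each returns True when the row is acceptable.
RULES = {
    "quantity is positive": lambda r: r["quantity"] > 0,
    "price is non-negative": lambda r: r["price"] >= 0,
}


def validate_rows(rows, rules):
    """Split rows into (good, bad); bad rows carry the names of failed rules."""
    good, bad = [], []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            bad.append((row, failed))
        else:
            good.append(row)
    return good, bad


rows = [
    {"quantity": 2, "price": 9.5},
    {"quantity": 0, "price": 4.0},
    {"quantity": 1, "price": -1.0},
]
good, bad = validate_rows(rows, RULES)
stats = profile([r["price"] for r in good])

print(stats, is_anomalous([9.0, 10.0, 11.0, 10.0], stats["mean"]))
print(bad)
```

Rows that fail a rule are set aside together with the reason, so they can be
quarantined and reviewed instead of flowing downstream.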
Conclusion
Data observability is essential for any data team that wants to be agile and
iterate quickly on its products. Without data observability, it's difficult for
teams to rely on their infrastructure or tools because errors can't be tracked
down quickly. This results in less flexibility in developing new features or
improvements for customers. You're effectively wasting money if you are not
investing in this critical piece of the DataOps framework in 2022.

More Related Content

Similar to Data Observability- The Next Frontier of Data Engineering Pdf.pdf

NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docxNFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docx
curwenmichaela
 
IRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth EnhancementIRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth Enhancement
IRJET Journal
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business IntelligenceSukirti Garg
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
Shahbaz Anjam
 
Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...
Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...
Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...
Stuart Blair
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
data collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptxdata collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptx
Sourabhkumar729579
 
Harness the power of data
Harness the power of dataHarness the power of data
Harness the power of data
Harsha MV
 
intelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefintelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefLindy-Anne Botha
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data science
Vipul Kalamkar
 
new.pptx
new.pptxnew.pptx
Go from data to decision in one unified platform.pdf
Go from data to decision in one unified platform.pdfGo from data to decision in one unified platform.pdf
Go from data to decision in one unified platform.pdf
webmaster553228
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonIBM Danmark
 
Modern trends in information systems
Modern trends in information systemsModern trends in information systems
Modern trends in information systems
Preeti Sontakke
 
Sgcp14dunlea
Sgcp14dunleaSgcp14dunlea
Sgcp14dunlea
Justin Hayward
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Srikanth Sharma Boddupalli
 
Big Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptxBig Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptx
PrabhaJoshi4
 
The Architecture for Rapid Decisions
The Architecture for Rapid DecisionsThe Architecture for Rapid Decisions
The Architecture for Rapid Decisions
Kishore Jethanandani, MBA, MA, MPhil,
 
DATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATA
DATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATADATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATA
DATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATA
ijseajournal
 

Similar to Data Observability- The Next Frontier of Data Engineering Pdf.pdf (20)

NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docxNFRASTRUCTURE MODERNIZATION REVIEW                      Analyz.docx
NFRASTRUCTURE MODERNIZATION REVIEW Analyz.docx
 
IRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth EnhancementIRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth Enhancement
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...
Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...
Fast Data and Architecting the Digital Enterprise Fast Data drivers, componen...
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
data collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptxdata collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptx
 
Harness the power of data
Harness the power of dataHarness the power of data
Harness the power of data
 
intelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefintelligent-data-lake_executive-brief
intelligent-data-lake_executive-brief
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data science
 
new.pptx
new.pptxnew.pptx
new.pptx
 
Taming the data beast
Taming the data beastTaming the data beast
Taming the data beast
 
Go from data to decision in one unified platform.pdf
Go from data to decision in one unified platform.pdfGo from data to decision in one unified platform.pdf
Go from data to decision in one unified platform.pdf
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter Jönsson
 
Modern trends in information systems
Modern trends in information systemsModern trends in information systems
Modern trends in information systems
 
Sgcp14dunlea
Sgcp14dunleaSgcp14dunlea
Sgcp14dunlea
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
 
Big Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptxBig Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptx
 
The Architecture for Rapid Decisions
The Architecture for Rapid DecisionsThe Architecture for Rapid Decisions
The Architecture for Rapid Decisions
 
DATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATA
DATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATADATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATA
DATA VIRTUALIZATION FOR DECISION MAKING IN BIG DATA
 

More from Data Science Council of America

The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
Data Science Council of America
 
Why Data Scientists Should Learn Machine Learning.pdf
Why Data Scientists Should Learn Machine Learning.pdfWhy Data Scientists Should Learn Machine Learning.pdf
Why Data Scientists Should Learn Machine Learning.pdf
Data Science Council of America
 
The Value of Data Visualization for Data Science Professionals.pdf
The Value of Data Visualization for Data Science Professionals.pdfThe Value of Data Visualization for Data Science Professionals.pdf
The Value of Data Visualization for Data Science Professionals.pdf
Data Science Council of America
 
Top 3 Interesting Careers in Big Data.pdf
Top 3 Interesting Careers in Big Data.pdfTop 3 Interesting Careers in Big Data.pdf
Top 3 Interesting Careers in Big Data.pdf
Data Science Council of America
 
Achieving Business Success with Data.pdf
Achieving Business Success with Data.pdfAchieving Business Success with Data.pdf
Achieving Business Success with Data.pdf
Data Science Council of America
 
Data Science - The New Skill for Today’s Entrepreneurs.pdf
Data Science - The New Skill for Today’s Entrepreneurs.pdfData Science - The New Skill for Today’s Entrepreneurs.pdf
Data Science - The New Skill for Today’s Entrepreneurs.pdf
Data Science Council of America
 
Know How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfKnow How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdf
Data Science Council of America
 
Pandas vs. SQL – Tools that Data Scientists use most often.pdf
Pandas vs. SQL – Tools that Data Scientists use most often.pdfPandas vs. SQL – Tools that Data Scientists use most often.pdf
Pandas vs. SQL – Tools that Data Scientists use most often.pdf
Data Science Council of America
 
Augmented Analytics The Future Of Data & Analytics.pdf
Augmented Analytics The Future Of Data & Analytics.pdfAugmented Analytics The Future Of Data & Analytics.pdf
Augmented Analytics The Future Of Data & Analytics.pdf
Data Science Council of America
 
Is Data Visualization Literacy Part of Your Company Culture.pdf
Is Data Visualization Literacy Part of Your Company Culture.pdfIs Data Visualization Literacy Part of Your Company Culture.pdf
Is Data Visualization Literacy Part of Your Company Culture.pdf
Data Science Council of America
 
Maximize Your D&A Strategy The Role Of A Citizen Data Scientist.pdf
Maximize Your D&A Strategy The Role Of A Citizen Data Scientist.pdfMaximize Your D&A Strategy The Role Of A Citizen Data Scientist.pdf
Maximize Your D&A Strategy The Role Of A Citizen Data Scientist.pdf
Data Science Council of America
 
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
Data Science Council of America
 
Importance of Data-Driven Storytelling Data Analysis &amp Visual Narratives.pdf
Importance of Data-Driven Storytelling Data Analysis &amp Visual Narratives.pdfImportance of Data-Driven Storytelling Data Analysis &amp Visual Narratives.pdf
Importance of Data-Driven Storytelling Data Analysis &amp Visual Narratives.pdf
Data Science Council of America
 
Top Trends & Predictions That Will Drive Data Science in 2022.pdf
Top Trends & Predictions That Will Drive Data Science in 2022.pdfTop Trends & Predictions That Will Drive Data Science in 2022.pdf
Top Trends & Predictions That Will Drive Data Science in 2022.pdf
Data Science Council of America
 
Essential capabilities of data scientist to have in 2022
Essential capabilities of data scientist to have in 2022Essential capabilities of data scientist to have in 2022
Essential capabilities of data scientist to have in 2022
Data Science Council of America
 
Senior Data Scientist
Senior Data ScientistSenior Data Scientist
Senior Data Scientist
Data Science Council of America
 
Senior Big Data Analyst
Senior Big Data AnalystSenior Big Data Analyst
Senior Big Data Analyst
Data Science Council of America
 
Associate Big Data Analyst | ABDA
Associate Big Data Analyst | ABDAAssociate Big Data Analyst | ABDA
Associate Big Data Analyst | ABDA
Data Science Council of America
 
Senior Big Data Engineer Certification
Senior Big Data Engineer CertificationSenior Big Data Engineer Certification
Senior Big Data Engineer Certification
Data Science Council of America
 

More from Data Science Council of America (19)

The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
 
Why Data Scientists Should Learn Machine Learning.pdf
Why Data Scientists Should Learn Machine Learning.pdfWhy Data Scientists Should Learn Machine Learning.pdf
Why Data Scientists Should Learn Machine Learning.pdf
 
The Value of Data Visualization for Data Science Professionals.pdf
The Value of Data Visualization for Data Science Professionals.pdfThe Value of Data Visualization for Data Science Professionals.pdf
The Value of Data Visualization for Data Science Professionals.pdf
 
Top 3 Interesting Careers in Big Data.pdf
Top 3 Interesting Careers in Big Data.pdfTop 3 Interesting Careers in Big Data.pdf
Top 3 Interesting Careers in Big Data.pdf
 
Achieving Business Success with Data.pdf
Achieving Business Success with Data.pdfAchieving Business Success with Data.pdf
Achieving Business Success with Data.pdf
 
Data Science - The New Skill for Today’s Entrepreneurs.pdf
Data Science - The New Skill for Today’s Entrepreneurs.pdfData Science - The New Skill for Today’s Entrepreneurs.pdf
Data Science - The New Skill for Today’s Entrepreneurs.pdf
 
Know How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfKnow How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdf
 
Pandas vs. SQL – Tools that Data Scientists use most often.pdf
Pandas vs. SQL – Tools that Data Scientists use most often.pdfPandas vs. SQL – Tools that Data Scientists use most often.pdf
Pandas vs. SQL – Tools that Data Scientists use most often.pdf
 
Augmented Analytics The Future Of Data & Analytics.pdf
Augmented Analytics The Future Of Data & Analytics.pdfAugmented Analytics The Future Of Data & Analytics.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdf

With numerous data products relying on hundreds or even thousands of internal and external data sources, modern organizations now have far more data use cases than before. To meet their growing data needs, they have adopted advanced technologies and big data infrastructures.

The increasing complexity of the data stack, together with the sheer volume, variety, and velocity of data generated and collected, opens the door to issues such as schema changes, data drift, poor data quality, downtime, and duplicate data. Data management is further complicated by the many data storage options, data pipelines, and enterprise applications in use.

Data engineers and business executives responsible for building and maintaining data infrastructure are often overwhelmed. They do their best to keep data systems operational, but no system is perfect and data volumes can be unpredictable. No matter how much money a data team has invested in the cloud, or how sophisticated and well designed its analytics dashboards are, everything fails if unreliable data is ingested, transformed, and pushed downstream.

Modern data pipelines are interconnected and not intuitive. As a result, data from both internal and external sources can become inconsistent, inaccurate, or missing, or can change suddenly, which may eventually affect the correctness of dependent data assets. Data and analytics teams must be able to dig deep to find the root cause of any data issue and then resolve it.

This is hard to achieve without a comprehensive view of the entire data stack and its lifecycle. Data observability helps data teams and organizations ensure data quality and a reliable data flow throughout their day-to-day business operations. It is essential, and organizations and teams should pay attention to it in order to achieve their data-driven visions.

What is Data Observability?

While observability is most commonly associated with software engineering, it is just as essential for data. Software engineers monitor the health and performance of their applications using tools like DataDog, AppDynamics, and New Relic; data teams must do the same for their data.
Data observability is the ability of an organization to keep a constant pulse on its data systems by tracking, monitoring, and troubleshooting issues in order to reduce downtime, improve data quality, and eventually prevent issues from happening. It is also a collection of technologies and activities that allow data and analytics teams to track data-related failures and walk upstream to determine what is wrong at each level (quality, infrastructure, and computation). This helps data teams measure how effectively data is used and understand what is happening at every stage of the enterprise data lifecycle.

Similar to the three pillars of observability (metrics, logs, and traces), data observability has five pillars. Each pillar answers a series of questions; combined and continuously monitored, the answers give data teams a holistic view of data health and pipelines. Let's have a look at these questions:

• Freshness: Was all data received, and is it current? What upstream data was omitted or included? When was the data last extracted or generated? Was the data received on time?
• Volume: Has all the data been received? Are all the data tables complete?
• Distribution: To whom was the data sent? How useful and complete is the data? Is the data reliable? How was the data transformed? Are the data values within an acceptable range?
• Lineage: Who are the downstream ingesters of a data asset? Who generates the data? Who will use the data to make business decisions? At which stages will downstream ingesters use the data?
• Schema: Does the data format conform to the schema? What has changed in the data schema? Who made the changes?

What Is the Importance of Data Observability?

Data observability goes beyond monitoring and alerting. It allows organizations to understand their data systems fully and to fix, or even prevent, data problems in increasingly complex data environments.
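To make the pillar questions above more concrete, here is a minimal Python sketch of how the freshness, volume, and schema questions might be turned into automated checks. It is an illustration only: `TableSnapshot`, the function names, and the thresholds are hypothetical and do not come from the article or from any particular observability platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableSnapshot:
    """Hypothetical metadata captured for one table at one point in time."""
    name: str
    last_loaded_at: datetime
    row_count: int
    columns: dict  # column name -> declared type


def check_freshness(snap, now, max_age):
    """Freshness: was the table loaded recently enough?"""
    return now - snap.last_loaded_at <= max_age


def check_volume(snap, expected_rows, tolerance=0.2):
    """Volume: is the row count within +/- tolerance of the expected count?"""
    low = expected_rows * (1 - tolerance)
    high = expected_rows * (1 + tolerance)
    return low <= snap.row_count <= high


def diff_schema(previous, current):
    """Schema: report columns that were added, removed, or changed type."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "retyped": sorted(c for c in shared if previous[c] != current[c]),
    }


# Example usage with made-up values.
now = datetime(2022, 1, 2, 12, 0, tzinfo=timezone.utc)
orders = TableSnapshot(
    name="orders",
    last_loaded_at=now - timedelta(hours=6),
    row_count=900,
    columns={"id": "int", "amount": "float", "currency": "str"},
)
is_fresh = check_freshness(orders, now, max_age=timedelta(hours=12))
volume_ok = check_volume(orders, expected_rows=1000)
changes = diff_schema({"id": "int", "amount": "int"}, orders.columns)
```

In practice the expected row count and maximum age would be learned from history or agreed with data owners rather than hard-coded; distribution and lineage checks need richer metadata and are left out of this sketch.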
1) Data observability increases trust in data so that businesses can make data-driven decisions confidently.

While data insights and machine-learning algorithms can be invaluable, inaccurate or mismanaged data can have devastating consequences. Public Health England (PHE), which tracked daily Covid-19 infection rates, found an error in its data collection that caused 15,841 cases between September 25 and October 2, 2020, to be overlooked. According to PHE, the Excel spreadsheet used to collect the data exceeded its size limit. As a result, the daily number of new cases was much higher than initially reported, and thousands of people who had tested positive for Covid-19 were not passed to the government's "test & trace" program on time.

Data observability allows organizations to track and monitor such situations quickly and efficiently, so that they can make better-informed decisions.

2) Data observability allows for the timely delivery of high-quality data to support business workloads.

Every organization must ensure that data is easily accessible and in the correct format. Almost every department relies on high-quality data for business operations, and data scientists, data engineers, and data analysts depend on it to provide insights and analytics. A lack of quality data can lead to costly breakdowns in business processes.

For example, suppose your company has an e-commerce site with multiple data sources (stock quantities, sales transactions, user analytics) that are consolidated into a data warehouse. The sales department needs sales transaction data to generate annual reports, the marketing department relies on user analytics data to run effective campaigns, and data scientists rely on the data to build and deploy machine-learning models that recommend products. If one of the data sources is incorrect or out of sync, several parts of the business could be harmed.

Data observability is a way to ensure the quality, reliability, and consistency of data within the data pipeline. It gives organizations a 360-degree overview of their data ecosystem, which allows them to drill down and fix any issue that could disrupt the pipeline.

3) Data observability allows you to identify and fix data issues before they affect your business.
Pure monitoring systems have a significant flaw: they can only detect unusual conditions that you already know about or anticipate. But what about the cases you can't see coming?

A mistake made by Amsterdam's City Council in 2014 led to the loss of EUR188 million. The error occurred because the software used by the council to distribute housing benefits to low-income families was programmed in cents rather than euros. Families received far more than they expected: people who were due EUR155 received EUR15,500. Even more alarming, the software did not notify administrators of the error.

Data observability can detect situations you don't know about or wouldn't think to look for, and it can prevent problems from becoming severe business issues. It also lets you track the relationships between specific issues and provides the context and information needed for root cause analysis.

Top Data Observability Platforms for Monitoring Data Quality at Scale

We understand how difficult it can be to find the right observability tool for your company. Here is a list of the top platforms for data observability in 2022.

1) Monte Carlo

Monte Carlo's observability service offers a complete solution for preventing broken data pipelines. It is an excellent choice for data engineers because it allows them to check reliability and avoid expensive data downtime. Monte Carlo's features include data catalogs, alerts, and out-of-the-box observability across multiple criteria.

2) Databand

Databand's goal is to make data engineering more efficient in a complex infrastructure. Its AI-powered platform provides data engineers with tools to optimize their operations and get a single view of all their data flows. It aims to identify the core elements of data pipelines and where they have failed before bad data can get through. Databand also integrates with cloud-native tools in the contemporary data stack, such as Apache Airflow and Snowflake.

3) Honeycomb

Honeycomb provides developers with the visibility needed to identify and fix problems in distributed systems. The firm claims that Honeycomb helps developers understand and fix complex interactions in distributed services. Its full-stack cloud observability technology provides logs, traces, events, and automatic code instrumentation using Honeycomb Beelines as its agent. Honeycomb also supports OpenTelemetry for generating instrumentation data.

4) Acceldata

Acceldata is a data observability platform that provides data monitoring, data reliability, and data observability solutions. These tools were created to help data engineers gain broad, cross-sectional views of complex data pipelines. Acceldata's products combine signals from many layers and workloads into a single pane of glass, allowing multiple teams to collaborate on data problems. Acceldata Pulse also provides performance monitoring and observability, which helps ensure data reliability at scale. This tool is designed for the financial and payment industries.

5) Datafold

Datafold is a data observability tool that helps data teams assess data quality and implement anomaly detection and profiling. Its capabilities allow teams to perform data quality assurance using data profiling. Users can also compare tables within a database or across multiple databases and generate smart alerts with just one click. Data teams can track ETL code changes during data transfers and connect them to their CI/CD process to examine the code quickly.

6) SigNoz

SigNoz is an open-source, full-stack APM and observability system that tracks metrics and traces. Because it is open source, users can host it on their own infrastructure without sharing their data with third parties. The full stack includes telemetry generation, backend storage, and a visualization layer for consuming and acting on the data. SigNoz uses OpenTelemetry (a vendor-agnostic instrumentation library) to generate telemetry data.

7) DataDog

DataDog's observability software includes infrastructure monitoring, log management, and application performance monitoring. DataDog gives you a complete view of distributed applications by tracing requests end to end across distributed systems. It also displays latency percentiles and offers open-source instrumentation libraries. According to its creators, it is the "essential monitoring and security platform for cloud applications."

8) Dynatrace

Dynatrace is a SaaS product that targets large enterprises and addresses many monitoring needs. Its AI engine, Davis, can automate root cause analysis and anomaly detection. The company's technology can also serve as a single solution for infrastructure monitoring, application security, and cloud automation.

9) Grafana Labs

Grafana's open-source analytics and interactive visualization web layer is well known for accommodating multiple storage backends for time-series data. Grafana supports connections to Graphite, Elasticsearch, InfluxDB, and Prometheus, as well as traces from Jaeger, X-Ray, Tempo, and Zipkin. It also offers plugins, dashboards, alerts, and user-level access controls for governance. Grafana Cloud offers solutions such as Grafana Cloud Logs, Grafana Cloud Traces, and Grafana Cloud Metrics.

10) Soda

Soda's AI-powered data observability platform is an environment in which data owners, data engineers, and data analysts can work together to solve problems. Soda.ai describes the technology as "a platform that enables teams to define what good data looks like and handle errors quickly before they have a downstream impact." This tool allows users to examine their data and create rules to validate it quickly.

Implementation of a Data Observability Framework

Data observability is an "outcome" of the DataOps movement. Even with the most advanced automation and algorithms for monitoring your metadata, it will only deliver value with organizational adoption. Conversely, any organization can adopt DataOps, but without the technology to support it, DataOps remains a well-documented philosophy that does not affect output.

So, how do you implement a data observability framework that improves your data quality at all levels?
What metrics should be tracked at each stage of the data observability framework? These are the key ingredients for a highly functional data observability framework:

i) DataOps Culture
ii) Standardized Data Platform
iii) Unified Data Observability Platform

Before you can even consider producing high-value data products, you must have widespread adoption of the DataOps culture. This requires everyone to be involved, especially leadership, because leaders create the systems and processes that support development, maintenance, feedback, and other activities. A bottom-up movement is powerful, but you still need budget approvals to make the technological changes that DataOps requires.

If everyone buys into the idea, leadership can help the organization move towards a standardized data platform. What does this mean? To ensure that all teams have end-to-end accountability and ownership, infrastructure must be in place that allows them to communicate openly and speak the same language. Standard libraries are needed for API and data management (e.g., querying the data warehouse, reading from and writing to the data lake, pulling information from APIs). A standardized library is also required to ensure data quality, along with source code tracking, data versioning, and CI/CD processes.

With all this in place, your infrastructure is ready for success. You now need an open, unified platform for monitoring your system's health that your entire organization can access. The observability platform acts as a central metadata repository. It should include all of the features mentioned earlier (such as monitoring and alerting, tracking, comparison, and analysis), so that data teams can see how other parts of the platform affect them.

To effectively monitor the functioning of the data observability framework, you should track the following metrics:

1) Operational Health:
• Execution metadata
• Pipeline state
• Delays

2) Dataset Monitoring:
• Availability
• Freshness
• Volume
• Schema change

3) Column-level Profiling:
• Summary statistics
• Anomaly detection

4) Row-level Validation:
• Business rule enforcement
• Stop "bad data"

To ensure operational health, it is best to collect execution metadata. This metadata includes information about pipeline states, run durations, delays, retries, and the times between runs. For dataset monitoring, you should monitor the completeness and availability of your data, along with its volume and any changes to the schema. For column-level profiling, you should collect summary statistics for columns, such as the mean, max, and min, and use anomaly detection to alert you to any changes in their trends. Row-level validation requires you to ensure that the previous checks passed and that each row adheres to your business rules. This is very contextual, so you will need to exercise your discretion.

Conclusion

Data observability is essential for any data team that wants to be agile and iterate quickly on its products. Without it, teams struggle to rely on their infrastructure and tools because errors cannot be tracked down quickly. This results in less flexibility when developing new features or improvements for customers. You are effectively wasting money if you are not investing in this critical piece of the DataOps framework in 2022.
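As a closing illustration of the column-level profiling and row-level validation metrics listed in the framework section, here is a minimal Python sketch. It is a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the anomaly test is a simple z-score (one of many possible techniques), and the EUR1,000 payment ceiling is an invented business rule chosen only to echo the Amsterdam example.

```python
from statistics import mean, pstdev


def summarize(values):
    """Column-level profile: the mean, max, and min of one column."""
    return {"mean": mean(values), "max": max(values), "min": min(values)}


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it lies more than z_threshold population standard
    deviations away from the mean of `history` (a simple z-score test)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


def validate_rows(rows, rules):
    """Row-level validation: split rows into good rows and bad rows,
    recording which rules each bad row failed."""
    good, bad = [], []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            bad.append((row, failed))
        else:
            good.append(row)
    return good, bad


# Hypothetical business rules for a benefits payment table (amounts in euros).
rules = {
    "amount_positive": lambda row: row["amount"] > 0,
    "amount_plausible": lambda row: row["amount"] <= 1000,
}
payments = [{"amount": 155}, {"amount": 15500}, {"amount": -5}]
good, bad = validate_rows(payments, rules)

# Daily mean payment over recent runs, then a suspicious new value.
daily_means = [100, 102, 98, 101, 99]
alert = is_anomalous(daily_means, 150)
```

Only the rows in `good` would be pushed downstream, which is one way to read "stop bad data"; the rows in `bad` keep the names of the failed rules so that someone can investigate the root cause.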