Automating Data Reconciliation, Data Observability, and Data Quality Checks After Each Data Load
Over the last several years, with the rise of cloud data warehouses and lakes such as
Snowflake, Redshift, and Databricks, data load processes have become increasingly
distributed and complex. Organizations are investing more capital in ingesting data from
multiple internal and external sources. As companies' dependency on data grows and
business users rely on it every day for critical business decisions, ensuring high data
quality is a top requirement for any data analytics platform.
As data is processed every day through various pipelines, it can break for hundreds of
reasons, from code changes to business process changes. With limited team sizes and
multiple competing priorities, data engineers are often unable to reconcile all data (or any
data) every day. As a result, business users frequently discover data issues before the
data engineering team does, and by that point the damage to trust is already done.
How can we proactively learn about data issues before users tell us? What if we
automatically reconciled data after each load and alerted data engineers whenever there
was an issue? Is there an architecture or solution that can help?
Yes. Let's review a solution called 4DAlert that automates data reconciliation, data quality,
and data observability, and see how it can identify issues automatically before bad data
reaches the downstream reports and dashboards used by many users.
Scenario 1 - Reconcile data between source and target
Almost all data platforms load data from multiple source systems. For one reason or
another, data between source and target often doesn't match, and data teams spend
manual effort every day reconciling numerous data sources.
The 4DAlert solution connects to diverse data sources and automatically reconciles data
between source and target. It leverages its own AI engine to detect reconciliation issues
and alerts the appropriate stakeholders through multiple channels, including email, text,
and Slack.
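The core of a source-to-target check can be sketched in a few lines. This is an illustrative sketch, not 4DAlert's actual implementation: pull the same aggregate (say, daily sales) from both systems and compare the two totals within a percentage tolerance.

```python
# Hypothetical source-vs-target reconciliation for one metric.
# The 0.5% default tolerance is an assumption for illustration.

def reconcile(source_total: float, target_total: float, tolerance_pct: float = 0.5):
    """Return (matches, pct_diff) for one aggregate, e.g. daily revenue."""
    if source_total == 0:
        return target_total == 0, 0.0
    pct_diff = abs(source_total - target_total) / abs(source_total) * 100
    return pct_diff <= tolerance_pct, pct_diff

# Example: the source system reports 100,000.00 in sales,
# the warehouse shows 99,750.00 after the load.
ok, diff = reconcile(100_000.00, 99_750.00, tolerance_pct=0.5)
print(ok, round(diff, 2))  # the 0.25% gap is within the 0.5% tolerance
```

In practice each metric would carry its own tolerance, and a `False` result would trigger the alerting channels described above.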
Scenario 2 - Data reconciliation within the analytics platform
Sometimes connecting to source systems is not possible: the systems may be owned by
different groups that don't allow external connections, or they may be too rigid to support
one. In that scenario, 4DAlert's AI engine reconciles incoming data against historical
trends to detect anomalies and reconciliation issues.
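One simple way to judge a new load against history (a sketch; 4DAlert's internal statistics are not public) is a z-score test on a tracked value such as the daily row count:

```python
# Flag a new load whose row count deviates too far from recent history.
# The 3-sigma threshold is a common convention, assumed here for illustration.
from statistics import mean, stdev

def is_anomalous(history: list[float], new_value: float, z_threshold: float = 3.0) -> bool:
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > z_threshold

daily_row_counts = [10_120, 9_980, 10_050, 10_200, 9_900]
print(is_anomalous(daily_row_counts, 10_100))  # in line with the trend
print(is_anomalous(daily_row_counts, 2_000))   # far below history -> flagged
```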
Scenario 3 - Data comparison across systems
In most organizations, multiple systems consume the same data, so keeping data in sync
across them is a continuous challenge. 4DAlert's flexible architecture allows it to connect
to diverse source systems and check key data points across them.
Scenario 4 - Checking numbers across layers in an analytics platform
Often the same data is stored in different layers and different objects. As multiple
pipelines and loads run daily, it becomes difficult to verify that the numbers are the same
everywhere. 4DAlert checks the numbers across layers and alerts when data doesn't
match.
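A cross-layer check boils down to running the same aggregate query against each layer and comparing results. The sketch below uses SQLite and made-up table names (`stg_sales` for a staging layer, `rpt_sales` for a reporting layer) purely to illustrate the idea:

```python
# Illustrative cross-layer check: compare the same total across a staging
# layer and a reporting layer. Table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stg_sales (amount REAL);
    CREATE TABLE rpt_sales (amount REAL);
    INSERT INTO stg_sales VALUES (100.0), (250.0), (75.5);
    INSERT INTO rpt_sales VALUES (100.0), (250.0), (75.5);
""")
(stg_total,) = con.execute("SELECT SUM(amount) FROM stg_sales").fetchone()
(rpt_total,) = con.execute("SELECT SUM(amount) FROM rpt_sales").fetchone()
print("layers match:", stg_total == rpt_total)
```

Against a real warehouse, the two queries would run on different connections (or different databases entirely), with a mismatch raising an alert.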
A solution that connects to diverse data sources
4DAlert is a web API-based AI solution that connects to most databases (Snowflake,
Redshift, Synapse, HANA, SQL Server, Oracle, Postgres, and many more) and reconciles
data between source and target on a periodic schedule.
The solution can connect source and target databases even when the two are built on
different database technologies. For example, the source could be an SAP HANA system
and the target Snowflake or Redshift, or the source could be a data lake in Azure or AWS
S3 and the target a Snowflake or Redshift database; 4DAlert reconciles the data either
way.
Write your own SQL to detect anomalies and check data quality
Users can write custom SQL queries to pinpoint particular anomalies and override the
default tolerance limit. For example, sales varying by 10% may be acceptable while
varying by 60% is not. When users don't define a tolerance, 4DAlert uses statistical
variance and anomaly detection methods to detect outliers and alert as appropriate.
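The tolerance-override idea from the sales example can be sketched as follows (illustrative code, not 4DAlert's API):

```python
# A user-defined rule with a custom tolerance: how much may today's value
# drift from yesterday's before we alert? The function name is hypothetical.

def within_tolerance(yesterday: float, today: float, tolerance_pct: float) -> bool:
    change_pct = abs(today - yesterday) / yesterday * 100
    return change_pct <= tolerance_pct

print(within_tolerance(1_000.0, 1_080.0, 10))  # an 8% swing -> acceptable
print(within_tolerance(1_000.0, 1_600.0, 10))  # a 60% swing -> alert
```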
Data Observability
In a data platform there can be hundreds or thousands of tables, with multiple pipelines
running and loading objects every day. Some objects are loaded daily (sometimes
multiple times a day), others weekly, monthly, or yearly, and still others on demand. It is
very hard to keep track of how fresh the data is, and users continually ask about the last
load date.
4DAlert checks vital statistics of each object on a regular basis and labels each object by
its freshness. This information can be broadcast to users so they know how fresh each
dataset is.
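A minimal freshness label could look like this; the bucket names and thresholds are assumptions for illustration, not 4DAlert's published labels:

```python
# Bucket each object's freshness from its last load timestamp.
# Thresholds (24 hours, 7 days) are illustrative assumptions.
from datetime import datetime, timedelta

def freshness_label(last_loaded: datetime, now: datetime) -> str:
    age = now - last_loaded
    if age <= timedelta(hours=24):
        return "fresh"
    if age <= timedelta(days=7):
        return "stale"
    return "very stale"

now = datetime(2024, 6, 15, 12, 0)
print(freshness_label(datetime(2024, 6, 15, 3, 0), now))  # loaded overnight
print(freshness_label(datetime(2024, 6, 1, 0, 0), now))   # two weeks old
```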
Auto Quality Score
In an analytics platform, objects need to be loaded on a regular basis (sometimes with a
predefined SLA). Whenever data is loaded, users expect it to arrive without quality or load
issues. However, some objects have frequent problems with load timing or data quality. A
data observability platform such as 4DAlert tracks these failure points and provides a
detailed performance scorecard for each object. Scores are published as a dashboard to
data engineers, the enterprise data team, data scientists, and sometimes end users for
greater transparency.
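One plausible scorecard formula (an assumption, since the article does not define the score) is the share of recent runs that landed on time with no quality failures:

```python
# Hypothetical per-object quality score: percentage of recent runs that were
# both on time and free of quality failures.

def quality_score(runs: list[dict]) -> float:
    passed = sum(1 for r in runs if r["on_time"] and not r["quality_failures"])
    return round(100 * passed / len(runs), 1)

runs = [
    {"on_time": True,  "quality_failures": 0},
    {"on_time": True,  "quality_failures": 2},   # bad data in the load
    {"on_time": False, "quality_failures": 0},   # missed the SLA
    {"on_time": True,  "quality_failures": 0},
]
print(quality_score(runs))  # 2 of 4 runs were clean -> 50.0
```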
Multiple keys and multiple metrics for any dataset
Often a dataset contains more than one key metric. For example, a dataset could have
revenue, sold quantity, discount, and cost of goods sold, and any of these could go
wrong. A solution should therefore be able to scan more than one metric simultaneously
for abnormalities.
Key quality metrics (e.g., row count, null count, distinct count, max value, min value)
4DAlert ships with many predefined metrics that are applied automatically to detect
anomalies in the data. For example: the material number in inventory data should never
be null; the distinct list of countries in a dataset can't run into the millions; the maximum
PO amount should not exceed 10,000. These rules come out of the box, and datasets are
checked against them.
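All of these stock metrics can be computed with one profiling query per table. The sketch below uses SQLite and an invented `inventory` table to show the shape of such a query:

```python
# Profile a table in one pass: row count, null count, distinct count, max, min.
# Table and column names (inventory, material_number, po_amount) are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE inventory (material_number TEXT, po_amount REAL);
    INSERT INTO inventory VALUES ('M-100', 500.0), ('M-101', 9500.0), (NULL, 120.0);
""")
row_count, null_count, distinct_count, max_po, min_po = con.execute("""
    SELECT COUNT(*),
           SUM(material_number IS NULL),
           COUNT(DISTINCT material_number),
           MAX(po_amount),
           MIN(po_amount)
    FROM inventory
""").fetchone()
print("rows:", row_count, "null keys:", null_count)  # one NULL key -> alert
print("PO cap respected:", max_po <= 10_000)
```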
Enumerated value check
Often the data team wants to restrict certain fields to predefined value sets. For example,
currency should come from a predefined currency list, and the same goes for plant,
country, region, and so on. 4DAlert can check these automatically.
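As a sketch (with made-up table and column names), an enumerated-value check is just a query for rows whose value falls outside the approved list:

```python
# Find rows whose currency is not in the approved set.
# The fx table and its columns are hypothetical examples.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fx (doc_id INTEGER, currency TEXT);
    INSERT INTO fx VALUES (1, 'USD'), (2, 'EUR'), (3, 'XYZ');
""")
allowed = ("USD", "EUR", "GBP", "JPY")
bad = con.execute(
    "SELECT doc_id, currency FROM fx WHERE currency NOT IN (?, ?, ?, ?)", allowed
).fetchall()
print(bad)  # document 3 carries an unknown currency code
```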
Seasonality: month-end, quarter-end, or year-end spikes
Data often spikes at month-end, quarter-end, year-end, or at other particular periods of
the year. An AI-enabled solution such as 4DAlert accounts for this seasonality when
identifying anomalies.
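The essence of a seasonality-aware check (a sketch, not 4DAlert's model) is to judge a month-end load against prior month-ends rather than against ordinary days, so the expected spike is not flagged:

```python
# Compare a new value to the historical average for the SAME period,
# so a legitimate month-end spike is not mistaken for an anomaly.
from statistics import mean

def seasonal_ratio(new_value: float, same_period_history: list[float]) -> float:
    """Ratio of the new value to the historical average for its period."""
    return new_value / mean(same_period_history)

month_end_history = [50_000, 52_000, 48_000]  # prior month-end totals
ordinary_days = [10_000, 11_000, 9_500]

# A 51,000 month-end looks like a ~5x spike against ordinary days...
print(round(51_000 / mean(ordinary_days), 2))
# ...but is only ~1.02x its own seasonal baseline, so no alert is needed.
print(round(seasonal_ratio(51_000, month_end_history), 2))
```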
Custom metrics
If the predefined metrics are not all you need, you can add your own. 4DAlert allows you
to write your own SQL query, check the values, and detect anomalies.
This post was written by Nihar Rout, Managing Partner and Lead Architect at 4DAlert.
Want to try schema compare features that help you continuously deploy changes with
zero errors? Request a demo with one of our experts at https://4dalert.com/
Resource: https://medium.com/@nihar.rout_analytics/automatic-data-
reconciliation-data-quality-and-data-observability-3eeca4650cd