SlideShare a Scribd company logo
1 of 7
Download to read offline
Whitepaper
Authors : Seth Rao, Angsuman Dutta, Himansu Sekhar Tripathy, Deep Sharma
AI-Led Cognitive Data Quality
3
3,4
4
4,5
5
Contents
5
6,7
1. Background
2. Quality Assurances – Validity & Reasonableness
02 / 09
3. Traditional Approach Can be Expensive and Error Prone
5. Who Should Go for Alternative Approach
6. Conclusion
7. About the Author
4. Alternative Approach based on AI/ML
AI-Led Cognitive Data Quality
03 / 09
1. Background
2. Quality Assurances – Validity & Reasonableness
Data Quality Management (DQM) impacts a number of key business drivers, ranging from regulatory
compliances, to customer satisfaction, to building new business models. Quality is one of the key functions
under Data Governance, as unverified/unqualified data has little value to the organization. One of the leading
global research and advisory firm estimates that an average Fortune 500 enterprise loses about $9.7mn
annually over data quality issues. Although the true intangible cost of poor data is much higher, the sad truth
is that data quality has not been paid the attention it deserves.
One of the reasons for this discrepancy is the way data quality issues are identified in the current systems and
tools. A techno-functional team reviews data assets of an organization, and writes a set of rules to identify
anomalies that are flagged for the review of data stewards. As these rules are static in nature, they become
obsolete in 12- 24 months and a new assessment is required. Another significant reason is that many of the
issues are contextual and are not easily codified. Consider the example of a bank that approved a corporate
loan for a frequent client of theirs, at terms the client had never borrowed before, and a product that client
had historically shunned. That loan should not have been approved without verifying the client’s intent. The
loan data file had data quality errors, the duration of the loan was captured as 3 months and not 3 years.
These subtle contextual errors cannot be caught with the traditional validation checks, like checking for
completeness, uniqueness, consistency, accuracy, etc. All the checks presently done are independent of
historical business context.
In such a dynamic business environment, the need is to augment the modernization of data management
with AI-based data quality, thus achieving data semantics for delivering trusted business-critical data at
organizations’ fingertips.
At its simplest, data quality can be broken into two
categories: completeness and accuracy.
Completeness refers to ensuring that all expected
data is received. Accuracy evaluates the validity of
the data. Completeness and accuracy can be
subjective, and should be guided by the line of
business and the type of business. For example,
car insurance premium increase of greater than
25% in one cycle may not be accurate.
Quality assurance edits used to check for both
completeness and accuracy, can be broken into
two broad categories: validity and reasonableness.
Validity edits identify definite errors, and often
result in submissions being rejected. They are
frequently used to validate formats, ensure
completeness, and highlight obvious errors.
Format edits are used to reject data, which does
not conform to the specified format, such as text in
a date or numerical field or an email without an
“@” symbol.
04 / 09
3. Traditional Approach Can
be Expensive and Error Prone
4. Alternative Approach
based on AI/ML
Reasonableness edits look for information that is
highly unlikely or is an extreme outlier, but these
are extremely complex to do correctly. Reason-
ableness edits don’t generally cause a data
submission to be rejected, but may require an
explanation. Reasonableness can be based on the
statistical probability of the value, business rules, or
acceptable tolerances. These edits can be lenient
or strict, depending on purpose. Stricter edits gen-
erally result in more edit failures, which typically
lead to higher operational costs.
Curating quality data requires time and money, for
both, setup and operations. The time spent devel-
oping clear guidance and edit checks can save
time and money by avoiding excessive data
clean-up, or in a worst case scenario, unusable
data. Data governance, data standards, and quality
assurance edits all help minimize the data quality
problems. Using industry-defined terms and
formats can reduce errors because they minimize
the need to define and transform data.
Cost of Incorrect Data
Gartner reports that 40% of data initiatives fail due
to poor quality of data and affects overall labour
productivity by ~20%*. That is a huge loss on which
it’s hard to even put a cost figure on. Forbes and
PwC have reported that poor DQ was a critical
factor that led to regulatory non-compliance. Poor
quality of Big Data is costing companies not only in
fines, manual rework to fix errors, inaccurate data
for insights, failed initiatives and longer turnaround
times, but also in lost opportunity. Operationally
most organizations fail to unlock the value of their
marketing campaigns due to Data Quality issues.
Our research estimates that an average of 25-30%
of time in any big-data project is spent on identify-
ing and fixing data quality issues. In extreme
scenarios where data quality issues are significant,
projects get abandoned. That is very expensive
loss of capability!
Manually setting rules for 100’s of Tables with
1000’s of columns is unrealistic. We’ve frequently
seen companies with 10’s of thousands of tables
and 100’s of columns in each. There are no SME’s
who know every column of every table to be able
to capture every rule needed to validate a data set.
The data is just too vast and diverse. The gamut of
data quality rules specific to the dataset must be
autonomously learned using cognitive algorithms.
These rules will be dynamic and evolve as the data
evolves, to reflect the new reality. The AI/ML pow-
ered data quality system will behave like an
individual who is not constrained by the initial set
of rules they have learnt, but continue to learn and
evolve as their surroundings change.
Interacting with our customers we saw that look-
ing for errors in vast amounts of data was like look-
ing for a needle in a haystack. It’s a very complex
problem for large data sets, flowing at high
speeds, from many different sources, via many
different platforms. It’s a nightmare for the SME’s,
coders and for the people who want to make deci-
sions based on that data. Consider the example of
a bank which was onboarding 400 new applica-
tions in one year to their new IT platform. With an
average of four data sources per app, and a mere
100 checks per source, their team was tasked with
creating 160,000 checks.
AI-Led Cognitive Data Quality
05 / 09
6. Conclusion
5. Who Should Go for
Alternative Approach
Data Quality issues are hidden in all organizations, yet prevalent. Although a plethora of Data Quality tools is
available, the Data Quality identification process in many enterprises is generally static, obsolete,
time-consuming, and low on controls. Most of the processes have a lot of manual & static touchpoints, are
low on auditability, and are time-consuming. Robust Data quality processes have to identify newer errors
even before they occur. Using cognitive algorithms in identification of poor data will reduce effort & cost, and
will improve quality scores dramatically. Even after engaging many programmers to solve the data quality
problems, they never seem to go away. The only scalable path to good, reliable data is to leverage the power
of AI to validate data autonomously.
AI-Led Cognitive Data Quality
Any rule-based system implementation will not
scale in the new reality of big and/or complex data.
Only machine learning systems can scale to the
levels required by complex and/or large enterpris-
es.
Verticals: Organizations where data is used to
make critical decisions, will need to have a high
degree of certainty on the trustworthiness of their
data. Every organization in every vertical we’ve
worked with, has significant portions of poor qual-
ity data. The only difference we’ve seen is in the
organizational maturity to realize how vulnerable
they truly are. Those organizations who’ve realized
they are vulnerable are highly regulated industries
like Banking, Financial Services, Healthcare, and
others. And most other industries are slower to fix
their poor quality data situation.
Data Characteristics: When organizations deal
with data that have any of these characteristics,
they are highly likely to have more data errors:
- Big data
- Complex, inter-connected data
- Data aggregated from many sources or many IT
systems/platforms (HDFS, Cloud, RDBMS, noSQL,
mainframe, etc.)
- Constantly evolving data
- Non-monolithic, heterogeneous data, where
rules have to be created for small micro-segments
of data to validate their trustworthiness
- Operational and transactional data of reasonable
volume
Unless your business is extremely simple, every
organization will earn a few check marks in the
above list.
06 / 09
AI-Led Cognitive Data Quality
About the Authors
Seth Rao
Ph.D., is the CEO of FirstEigen
Seth Rao, Ph.D., is the CEO of FirstEigen, a Greater Chicago-based Cognitive Data
Validation company. Their flagship product, DataBuck, is recognized by Gartner and
IDC as the most innovative data validation software. By leveraging AI/ML, it is >10x
effective in catching unexpected data errors. It increases the reliability of data by
self-discovering 1,000s of data quality relationships and patterns autonomously,
updates the rules as the data evolves, and monitors the new data continuously.
(http://www.firsteigen.com/databuck/).
Seth holds a Ph.D. in Engineering from Illinois Institute of Technology (IIT), Chicago, and
has an MBA from Northwestern University’s Kellogg School of Management, USA.
Angsuman Dutta
Entrepreneur, Investor and Corporate strategist
A. Dutta is an entrepreneur, investor and corporate strategist, with experience in
building software businesses that scale and drive value. In his past roles, he has
provided information governance and data quality advisory services to several Fortune
500 companies. He is a recognized thought leader, and has published numerous
articles on information governance.
He earned a Bachelor of Technology degree in engineering from the Indian Institute of
Technology, Kharagpur, an MS in Computer Science from the Illinois Institute of
Technology, and an MBA in Analytical Finance and Strategy from the University of
Chicago, USA.
AI-Led Cognitive Data Quality
Himansu Sekhar Tripathy
Data Management consultant
Himansu Sekhar Tripathy is a Data Management consultant, with over 18 years of
experience in consulting and delivery of data solutions. His interest areas include
enterprise data strategy, cloud data engineering, big data engineering, data
integration, quality, metadata management, MDM, and data governance. As a
technology evangelist, he believes in leveraging emerging technologies in pushing
the boundaries on real-time next-gen analytics. Himanshu has a Master’s Degree in
Business Administration and a Bachelor’s Degree in Computer Science Engineering.
Deep Sharma
Associate Consultant
Deep Sharma is an Associate Consultant in Cognitive & Analytics Practice unit at LTI,
with around three years of experience in technology consulting, analytics market
research and offerings creation on emerging hybrid technology trends across the Data
& Analytics technology stack. He has a keen interest in various building blocks of Data
& Analytics like Data Integration, Data Quality, Data Governance and Data Visualization.
Deep has a Master’s Degree in Business Analytics.
info@Lntinfotech.com
LTI (NSE: LTI, BSE: 540005) is a global technology consulting and digital solutions Company helping more than 300
clients succeed in a converging world. With operations in 30 countries, we go the extra mile for our clients and
accelerate their digital transformation with LTI’s Mosaic platform enabling their mobile, social, analytics, IoT and cloud
journeys. Founded in 1997 as a subsidiary of Larsen & Toubro Limited, our unique heritage gives us unrivaled real-world
expertise to solve the most complex challenges of enterprises across all industries. Each day, our team of more than
27,000 LTItes enable our clients to improve the effectiveness of their business and technology operations, and deliver
value to their customers, employees and shareholders. Find more at www.Lntinfotech.com or follow us at
@LTI_Global

More Related Content

Similar to AI-Led-Cognitive-Data-Quality.pdf

Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineSrikanth Sharma Boddupalli
 
D&B Whitepaper The Big Payback On Data Quality
D&B Whitepaper The Big Payback On Data QualityD&B Whitepaper The Big Payback On Data Quality
D&B Whitepaper The Big Payback On Data QualityRebecca Croucher
 
Developing A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataDeveloping A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataFindWhitePapers
 
What Is Data Quality.pdf
What Is Data Quality.pdfWhat Is Data Quality.pdf
What Is Data Quality.pdfscottsamith
 
Justifying Your Data Quality Projects
Justifying Your Data Quality ProjectsJustifying Your Data Quality Projects
Justifying Your Data Quality ProjectsInnovative_Systems
 
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsInfographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsSapience Analytics
 
Beyond Firefighting: A Leaders Guide to Proactive Data Quality Management
Beyond Firefighting: A Leaders Guide to Proactive Data Quality ManagementBeyond Firefighting: A Leaders Guide to Proactive Data Quality Management
Beyond Firefighting: A Leaders Guide to Proactive Data Quality ManagementHarley Capewell
 
SDM Presentation V1.0
SDM Presentation V1.0SDM Presentation V1.0
SDM Presentation V1.0KirSinc
 
Big Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperBig Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperExperian
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackPrecisely
 
Information Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer SatisfactionInformation Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer SatisfactionCapgemini
 
Fuel your Data-Driven Ambitions with Data Governance
Fuel your Data-Driven Ambitions with Data GovernanceFuel your Data-Driven Ambitions with Data Governance
Fuel your Data-Driven Ambitions with Data GovernancePedro Martins
 
What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf4dalert
 
5 Pillars Of Effective Data Management In Modern Data Systems.pdf
5 Pillars Of Effective Data Management In Modern Data Systems.pdf5 Pillars Of Effective Data Management In Modern Data Systems.pdf
5 Pillars Of Effective Data Management In Modern Data Systems.pdfaNumak & Company
 
Whitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in EnterpriseWhitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in EnterpriseBRIDGEi2i Analytics Solutions
 
oracle-data-governance-wp.pdf
oracle-data-governance-wp.pdforacle-data-governance-wp.pdf
oracle-data-governance-wp.pdfaliramezani30
 
Data Governance a Business Value Driven Approach
Data Governance a Business Value Driven ApproachData Governance a Business Value Driven Approach
Data Governance a Business Value Driven ApproachTridant
 

Similar to AI-Led-Cognitive-Data-Quality.pdf (20)

Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
 
D&B Whitepaper The Big Payback On Data Quality
D&B Whitepaper The Big Payback On Data QualityD&B Whitepaper The Big Payback On Data Quality
D&B Whitepaper The Big Payback On Data Quality
 
Developing A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataDeveloping A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product Data
 
What Is Data Quality.pdf
What Is Data Quality.pdfWhat Is Data Quality.pdf
What Is Data Quality.pdf
 
Justifying Your Data Quality Projects
Justifying Your Data Quality ProjectsJustifying Your Data Quality Projects
Justifying Your Data Quality Projects
 
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience AnalyticsInfographic | Quality of Data & Cost of Bad Data | Sapience Analytics
Infographic | Quality of Data & Cost of Bad Data | Sapience Analytics
 
Beyond Firefighting: A Leaders Guide to Proactive Data Quality Management
Beyond Firefighting: A Leaders Guide to Proactive Data Quality ManagementBeyond Firefighting: A Leaders Guide to Proactive Data Quality Management
Beyond Firefighting: A Leaders Guide to Proactive Data Quality Management
 
SDM Presentation V1.0
SDM Presentation V1.0SDM Presentation V1.0
SDM Presentation V1.0
 
Big Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperBig Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White Paper
 
Tom Kunz
Tom KunzTom Kunz
Tom Kunz
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
 
Information Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer SatisfactionInformation Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer Satisfaction
 
Data Management
Data ManagementData Management
Data Management
 
Fuel your Data-Driven Ambitions with Data Governance
Fuel your Data-Driven Ambitions with Data GovernanceFuel your Data-Driven Ambitions with Data Governance
Fuel your Data-Driven Ambitions with Data Governance
 
What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf
 
5 Pillars Of Effective Data Management In Modern Data Systems.pdf
5 Pillars Of Effective Data Management In Modern Data Systems.pdf5 Pillars Of Effective Data Management In Modern Data Systems.pdf
5 Pillars Of Effective Data Management In Modern Data Systems.pdf
 
Whitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in EnterpriseWhitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in Enterprise
 
Data Quality
Data QualityData Quality
Data Quality
 
oracle-data-governance-wp.pdf
oracle-data-governance-wp.pdforacle-data-governance-wp.pdf
oracle-data-governance-wp.pdf
 
Data Governance a Business Value Driven Approach
Data Governance a Business Value Driven ApproachData Governance a Business Value Driven Approach
Data Governance a Business Value Driven Approach
 

More from arifulislam946965

abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...arifulislam946965
 
ERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdfERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdfarifulislam946965
 
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdfBusiness Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdfarifulislam946965
 
FirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdfFirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdfarifulislam946965
 
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdfSnowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdfarifulislam946965
 
13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdf13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdfarifulislam946965
 
What are the signs of pica eating disorder
What are the signs of pica eating disorderWhat are the signs of pica eating disorder
What are the signs of pica eating disorderarifulislam946965
 

More from arifulislam946965 (16)

abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
 
Flowers to the world.pdf
Flowers to the world.pdfFlowers to the world.pdf
Flowers to the world.pdf
 
ERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdfERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdf
 
do_ingestion.pdf
do_ingestion.pdfdo_ingestion.pdf
do_ingestion.pdf
 
do_pipelines.pdf
do_pipelines.pdfdo_pipelines.pdf
do_pipelines.pdf
 
do_dq.pdf
do_dq.pdfdo_dq.pdf
do_dq.pdf
 
dq_fail.pdf
dq_fail.pdfdq_fail.pdf
dq_fail.pdf
 
strategies.pdf
strategies.pdfstrategies.pdf
strategies.pdf
 
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdfBusiness Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
 
FirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdfFirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdf
 
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdfSnowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
 
13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdf13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdf
 
What are the signs of pica eating disorder
What are the signs of pica eating disorderWhat are the signs of pica eating disorder
What are the signs of pica eating disorder
 
바카라사이트
바카라사이트바카라사이트
바카라사이트
 
카지노사이트
카지노사이트카지노사이트
카지노사이트
 
우리카지노
우리카지노우리카지노
우리카지노
 

Recently uploaded

7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 
rishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdfrishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdfmuskan1121w
 
Pitch Deck Teardown: NOQX's $200k Pre-seed deck
Pitch Deck Teardown: NOQX's $200k Pre-seed deckPitch Deck Teardown: NOQX's $200k Pre-seed deck
Pitch Deck Teardown: NOQX's $200k Pre-seed deckHajeJanKamps
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024christinemoorman
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdfOrient Homes
 
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc.../:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...lizamodels9
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedKaiNexus
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...lizamodels9
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdfRenandantas16
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni
 
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptxBanana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptxgeorgebrinton95
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneCall girls in Ahmedabad High profile
 

Recently uploaded (20)

7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
rishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdfrishikeshgirls.in- Rishikesh call girl.pdf
rishikeshgirls.in- Rishikesh call girl.pdf
 
Pitch Deck Teardown: NOQX's $200k Pre-seed deck
Pitch Deck Teardown: NOQX's $200k Pre-seed deckPitch Deck Teardown: NOQX's $200k Pre-seed deck
Pitch Deck Teardown: NOQX's $200k Pre-seed deck
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdf
 
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc.../:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
/:Call Girls In Jaypee Siddharth - 5 Star Hotel New Delhi ➥9990211544 Top Esc...
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.
 
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptxBanana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptx
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
 

AI-Led-Cognitive-Data-Quality.pdf

  • 1. Whitepaper Authors : Seth Rao, Angsuman Dutta, Himansu Sekhar Tripathy, Deep Sharma AI-Led Cognitive Data Quality
  • 2. 3 3,4 4 4,5 5 Contents 5 6,7 1. Background 2. Quality Assurances – Validity & Reasonableness 02 / 09 3. Traditional Approach Can be Expensive and Error Prone 5. Who Should Go for Alternative Approach 6. Conclusion 7. About the Author 4. Alternative Approach based on AI/ML
  • 3. AI-Led Cognitive Data Quality 03 / 09 1. Background 2. Quality Assurances – Validity & Reasonableness Data Quality Management (DQM) impacts a number of key business drivers, ranging from regulatory compliances, to customer satisfaction, to building new business models. Quality is one of the key functions under Data Governance, as unverified/unqualified data has little value to the organization. One of the leading global research and advisory firm estimates that an average Fortune 500 enterprise loses about $9.7mn annually over data quality issues. Although the true intangible cost of poor data is much higher, the sad truth is that data quality has not been paid the attention it deserves. One of the reasons for this discrepancy is the way data quality issues are identified in the current systems and tools. A techno-functional team reviews data assets of an organization, and writes a set of rules to identify anomalies that are flagged for the review of data stewards. As these rules are static in nature, they become obsolete in 12- 24 months and a new assessment is required. Another significant reason is that many of the issues are contextual and are not easily codified. Consider the example of a bank that approved a corporate loan for a frequent client of theirs, at terms the client had never borrowed before, and a product that client had historically shunned. That loan should not have been approved without verifying the client’s intent. The loan data file had data quality errors, the duration of the loan was captured as 3 months and not 3 years. These subtle contextual errors cannot be caught with the traditional validation checks, like checking for completeness, uniqueness, consistency, accuracy, etc. All the checks presently done are independent of historical business context. In such a dynamic business environment, the need is to augment the modernization of data management with AI-based data quality, thus achieving data semantics for delivering trusted business-critical data at organizations’ fingertips. At its simplest, data quality can be broken into two categories: completeness and accuracy. Completeness refers to ensuring that all expected data is received. Accuracy evaluates the validity of the data. Completeness and accuracy can be subjective, and should be guided by the line of business and the type of business. For example, car insurance premium increase of greater than 25% in one cycle may not be accurate. Quality assurance edits used to check for both completeness and accuracy, can be broken into two broad categories: validity and reasonableness. Validity edits identify definite errors, and often result in submissions being rejected. They are frequently used to validate formats, ensure completeness, and highlight obvious errors. Format edits are used to reject data, which does not conform to the specified format, such as text in a date or numerical field or an email without an “@” symbol.
  • 4. 04 / 09 3. Traditional Approach Can be Expensive and Error Prone 4. Alternative Approach based on AI/ML Reasonableness edits look for information that is highly unlikely or is an extreme outlier, but these are extremely complex to do correctly. Reason- ableness edits don’t generally cause a data submission to be rejected, but may require an explanation. Reasonableness can be based on the statistical probability of the value, business rules, or acceptable tolerances. These edits can be lenient or strict, depending on purpose. Stricter edits gen- erally result in more edit failures, which typically lead to higher operational costs. Curating quality data requires time and money, for both, setup and operations. The time spent devel- oping clear guidance and edit checks can save time and money by avoiding excessive data clean-up, or in a worst case scenario, unusable data. Data governance, data standards, and quality assurance edits all help minimize the data quality problems. Using industry-defined terms and formats can reduce errors because they minimize the need to define and transform data. Cost of Incorrect Data Gartner reports that 40% of data initiatives fail due to poor quality of data and affects overall labour productivity by ~20%*. That is a huge loss on which it’s hard to even put a cost figure on. Forbes and PwC have reported that poor DQ was a critical factor that led to regulatory non-compliance. Poor quality of Big Data is costing companies not only in fines, manual rework to fix errors, inaccurate data for insights, failed initiatives and longer turnaround times, but also in lost opportunity. Operationally most organizations fail to unlock the value of their marketing campaigns due to Data Quality issues. Our research estimates that an average of 25-30% of time in any big-data project is spent on identify- ing and fixing data quality issues. In extreme scenarios where data quality issues are significant, projects get abandoned. That is very expensive loss of capability! Manually setting rules for 100’s of Tables with 1000’s of columns is unrealistic. We’ve frequently seen companies with 10’s of thousands of tables and 100’s of columns in each. There are no SME’s who know every column of every table to be able to capture every rule needed to validate a data set. The data is just too vast and diverse. The gamut of data quality rules specific to the dataset must be autonomously learned using cognitive algorithms. These rules will be dynamic and evolve as the data evolves, to reflect the new reality. The AI/ML pow- ered data quality system will behave like an individual who is not constrained by the initial set of rules they have learnt, but continue to learn and evolve as their surroundings change. Interacting with our customers we saw that look- ing for errors in vast amounts of data was like look- ing for a needle in a haystack. It’s a very complex problem for large data sets, flowing at high speeds, from many different sources, via many different platforms. It’s a nightmare for the SME’s, coders and for the people who want to make deci- sions based on that data. Consider the example of a bank which was onboarding 400 new applica- tions in one year to their new IT platform. With an average of four data sources per app, and a mere 100 checks per source, their team was tasked with creating 160,000 checks. AI-Led Cognitive Data Quality
  • 5. 05 / 09 6. Conclusion 5. Who Should Go for Alternative Approach Data Quality issues are hidden in all organizations, yet prevalent. Although a plethora of Data Quality tools is available, the Data Quality identification process in many enterprises is generally static, obsolete, time-consuming, and low on controls. Most of the processes have a lot of manual & static touchpoints, are low on auditability, and are time-consuming. Robust Data quality processes have to identify newer errors even before they occur. Using cognitive algorithms in identification of poor data will reduce effort & cost, and will improve quality scores dramatically. Even after engaging many programmers to solve the data quality problems, they never seem to go away. The only scalable path to good, reliable data is to leverage the power of AI to validate data autonomously. AI-Led Cognitive Data Quality Any rule-based system implementation will not scale in the new reality of big and/or complex data. Only machine learning systems can scale to the levels required by complex and/or large enterpris- es. Verticals: Organizations where data is used to make critical decisions, will need to have a high degree of certainty on the trustworthiness of their data. Every organization in every vertical we’ve worked with, has significant portions of poor qual- ity data. The only difference we’ve seen is in the organizational maturity to realize how vulnerable they truly are. Those organizations who’ve realized they are vulnerable are highly regulated industries like Banking, Financial Services, Healthcare, and others. And most other industries are slower to fix their poor quality data situation. Data Characteristics: When organizations deal with data that have any of these characteristics, they are highly likely to have more data errors: - Big data - Complex, inter-connected data - Data aggregated from many sources or many IT systems/platforms (HDFS, Cloud, RDBMS, noSQL, mainframe, etc.) - Constantly evolving data - Non-monolithic, heterogeneous data, where rules have to be created for small micro-segments of data to validate their trustworthiness - Operational and transactional data of reasonable volume Unless your business is extremely simple, every organization will earn a few check marks in the above list.
  • 6. 06 / 09 AI-Led Cognitive Data Quality About the Authors Seth Rao Ph.D., is the CEO of FirstEigen Seth Rao, Ph.D., is the CEO of FirstEigen, a Greater Chicago-based Cognitive Data Validation company. Their flagship product, DataBuck, is recognized by Gartner and IDC as the most innovative data validation software. By leveraging AI/ML, it is >10x effective in catching unexpected data errors. It increases the reliability of data by self-discovering 1,000s of data quality relationships and patterns autonomously, updates the rules as the data evolves, and monitors the new data continuously. (http://www.firsteigen.com/databuck/). Seth holds a Ph.D. in Engineering from Illinois Institute of Technology (IIT), Chicago, and has an MBA from Northwestern University’s Kellogg School of Management, USA. Angsuman Dutta Entrepreneur, Investor and Corporate strategist A. Dutta is an entrepreneur, investor and corporate strategist, with experience in building software businesses that scale and drive value. In his past roles, he has provided information governance and data quality advisory services to several Fortune 500 companies. He is a recognized thought leader, and has published numerous articles on information governance. He earned a Bachelor of Technology degree in engineering from the Indian Institute of Technology, Kharagpur, an MS in Computer Science from the Illinois Institute of Technology, and an MBA in Analytical Finance and Strategy from the University of Chicago, USA.
  • 7. AI-Led Cognitive Data Quality Himansu Sekhar Tripathy Data Management consultant Himansu Sekhar Tripathy is a Data Management consultant, with over 18 years of experience in consulting and delivery of data solutions. His interest areas include enterprise data strategy, cloud data engineering, big data engineering, data integration, quality, metadata management, MDM, and data governance. As a technology evangelist, he believes in leveraging emerging technologies in pushing the boundaries on real-time next-gen analytics. Himanshu has a Master’s Degree in Business Administration and a Bachelor’s Degree in Computer Science Engineering. Deep Sharma Associate Consultant Deep Sharma is an Associate Consultant in Cognitive & Analytics Practice unit at LTI, with around three years of experience in technology consulting, analytics market research and offerings creation on emerging hybrid technology trends across the Data & Analytics technology stack. He has a keen interest in various building blocks of Data & Analytics like Data Integration, Data Quality, Data Governance and Data Visualization. Deep has a Master’s Degree in Business Analytics. info@Lntinfotech.com LTI (NSE: LTI, BSE: 540005) is a global technology consulting and digital solutions Company helping more than 300 clients succeed in a converging world. With operations in 30 countries, we go the extra mile for our clients and accelerate their digital transformation with LTI’s Mosaic platform enabling their mobile, social, analytics, IoT and cloud journeys. Founded in 1997 as a subsidiary of Larsen & Toubro Limited, our unique heritage gives us unrivaled real-world expertise to solve the most complex challenges of enterprises across all industries. Each day, our team of more than 27,000 LTItes enable our clients to improve the effectiveness of their business and technology operations, and deliver value to their customers, employees and shareholders. Find more at www.Lntinfotech.com or follow us at @LTI_Global