Understanding IDP: Data Validation and Feedback Loop
According to Gartner, "The market for document capture, extraction, and processing is highly fragmented. Data and analytics leaders should use this research to understand the process flow and differentiated capabilities offered by intelligent document processing solutions". Gartner's recently released "Infographic: Understand Intelligent Document Processing" covers these six critical flows in IDP:
1. Capture or Ingestion
2. Document Pre-processing
3. Document Classification
4. Data Extraction
5. Validation and Feedback Loop
6. Integration
This is the fourth post in the series, and it explores the Data Validation and Feedback Loop stage.
When it comes to IDP systems, one of the key evaluation parameters is the accuracy they offer. Beyond the quality of the extraction process itself, IDP systems tap into external signals to improve accuracy, and data validation against an external source is one of many such signals.
When you think of these signals, draw a parallel with how modern GPS receivers work. A GPS receiver measures its distance from several satellites (at least four for a three-dimensional fix, because the receiver must also solve for its own clock error) and applies a technique called trilateration to find the point where those distances intersect. It is impossible to pinpoint a location accurately with a signal from just one satellite.
To get a feel for this problem, stick out your arm, raise a finger, and close one eye. With one eye closed, you lose your sense of distance; you cannot really tell how far away your finger is. Visual signals from both eyes give you true depth perception. In the same way, GPS combines signals from multiple satellites to place a location accurately. Comparing IDP with satellites is a stretch, but the point is that more signals lead to higher accuracy. Data validation and feedback loops are techniques that modern IDP systems use to add such signals, improve accuracy, and help their models mature much faster. An efficient data validation system can lift your IDP accuracy by 15 to 20%. Let's see how.
Data Validation
If IDP is the best option for automating data processing, what does data validation add to it? Data validation, as the name suggests, is the process of checking extracted data on multiple points of accuracy: whether the right data is being extracted and whether the extracted values themselves are correct. A typical use case for data validation is exception handling, such as weeding out documents that are out of scope. For example, you may have a list of vendors and want to extract only documents from those vendors, or a receipt may be mixed in among the invoices you are processing and need to be disregarded. If you run into these or similar cases, you need data validation.
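The exception-handling cases above can be sketched in Python as a simple scope check that runs before extracted data is accepted. This is a minimal illustration, not Infrrd's implementation: the `ExtractedDoc` fields, the `APPROVED_VENDORS` set, and the reason strings are hypothetical, and a real system would take the document type from its classification step and the vendor list from its own master data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist; in practice this would come from vendor master data.
APPROVED_VENDORS = {"acme supplies", "globex corp"}


@dataclass
class ExtractedDoc:
    doc_type: str  # label assigned by the classification step, e.g. "invoice"
    vendor: str    # vendor name as extracted from the document


def out_of_scope_reason(doc: ExtractedDoc, expected_type: str = "invoice") -> Optional[str]:
    """Return why a document should be set aside, or None if it is in scope."""
    if doc.doc_type != expected_type:
        return f"unexpected document type: {doc.doc_type}"
    if doc.vendor.strip().lower() not in APPROVED_VENDORS:
        return f"vendor not on approved list: {doc.vendor}"
    return None


batch = [
    ExtractedDoc("invoice", "Acme Supplies"),
    ExtractedDoc("receipt", "Acme Supplies"),  # a receipt mixed in with invoices
    ExtractedDoc("invoice", "Initech"),        # vendor not on the list
]
for doc in batch:
    print(doc.vendor, "->", out_of_scope_reason(doc) or "in scope")
```

Documents that return a reason can be moved to an exceptions queue instead of flowing on to extraction results.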
Let us look at a scenario. Imagine you are extracting information from loan documents. Borrowers have taken loans from different banks, and you want to check each lender against the list of approved lenders in your system to differentiate between approved and unapproved lenders. To do this, the IDP system usually connects to the third-party database through APIs, or to a copy of that data in the IDP vendor's cloud that is synced daily or periodically from the third-party database. To put it simply: you are extracting a loan document where the borrower has taken a loan from Bank of America, and Bank of America is one of your approved lenders. With data validation, you can flag this, for example by listing the lender as a lien-holder in the extraction results.
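A minimal sketch of the approved-lender check, assuming the lender list has already been synced from the third-party source into a local set (the periodic-sync option described above) rather than queried through a live API. The `APPROVED_LENDERS` data, the suffix-stripping normalization, and the `lender_approved` and `lien_holder` output keys are illustrative assumptions.

```python
# Hypothetical snapshot of approved lenders, refreshed by a periodic sync job.
APPROVED_LENDERS = {"bank of america", "wells fargo"}

# Legal-form suffixes to ignore when comparing names (illustrative, not exhaustive).
SUFFIXES = (" n.a.", " na", " inc.", " inc")


def normalize_lender(name: str) -> str:
    """Lowercase, drop commas, collapse whitespace, and strip one known suffix."""
    normalized = " ".join(name.lower().replace(",", " ").split())
    for suffix in SUFFIXES:
        if normalized.endswith(suffix):
            return normalized[: -len(suffix)]
    return normalized


def validate_lender(extraction: dict) -> dict:
    """Annotate an extraction result with whether its lender is approved."""
    lender = extraction.get("lender", "")
    approved = normalize_lender(lender) in APPROVED_LENDERS
    result = dict(extraction, lender_approved=approved)
    if approved:
        # Mirrors the article's example: list an approved lender as lien-holder.
        result["lien_holder"] = lender
    return result


print(validate_lender({"lender": "Bank of America, N.A."}))
```

Normalizing names before the lookup matters because extracted lender names rarely match the reference list character for character; a production system would likely need a richer matching strategy than the few suffixes handled here.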
Data validation is one of the key factors behind a significant increase in extraction accuracy, which means your IDP models mature much faster. Here is a ballpark figure: after analyzing our customers' extraction results over the past few months, we have observed that Infrrd's data validation algorithms immediately raise accuracy by around 10 percentage points. In other words, if the IDP system provided 80% accuracy without data validation, it may provide 90% accuracy or more with data validation.
There are different types of validation. The most common ones are:
Pattern-based validation: Here, the data is validated against known patterns. For example, the vehicle identification number (VIN), a unique identifier for a vehicle, consists of 17 characters made up of digits and capital letters; the letters I, O, and Q are never used, to avoid confusion with 1 and 0. The characters follow a pattern: the first three identify the manufacturer, characters 4 to 8 describe the vehicle, and so on. Pattern-based validation can therefore detect and correct extraction errors in a VIN, including tricky ones such as the digit 1 and the capital letter I getting interchanged.
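The sketch below illustrates pattern-based validation for VINs: it checks the 17-character format, maps letters that never occur in a VIN (I, O, Q) to the digits OCR most often confuses them with, and then verifies the check digit in position 9. The check-digit step follows the North American VIN scheme and may not apply to VINs issued elsewhere; the confusion map is an illustrative assumption rather than any product's actual correction table.

```python
import re
from typing import Tuple

# VINs use digits and capital letters except I, O and Q.
VIN_PATTERN = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")

# Letters that cannot appear in a VIN, mapped to the digits OCR typically confuses them with.
OCR_FIXES = str.maketrans({"I": "1", "O": "0", "Q": "0"})

# Transliteration values for letters in the North American check-digit scheme.
LETTER_VALUES = dict(zip("ABCDEFGH", range(1, 9)))
LETTER_VALUES.update(zip("JKLMN", range(1, 6)))
LETTER_VALUES.update({"P": 7, "R": 9})
LETTER_VALUES.update(zip("STUVWXYZ", range(2, 10)))

# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def check_digit(vin: str) -> str:
    """Compute the expected check digit ('0'-'9' or 'X') for a 17-character VIN."""
    total = sum(
        (int(ch) if ch.isdigit() else LETTER_VALUES[ch]) * weight
        for ch, weight in zip(vin, WEIGHTS)
    )
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def validate_vin(raw: str) -> Tuple[str, bool]:
    """Return the (possibly corrected) VIN and whether it passes validation."""
    vin = raw.strip().upper().replace(" ", "").translate(OCR_FIXES)
    if not VIN_PATTERN.match(vin):
        return vin, False
    return vin, check_digit(vin) == vin[8]


print(validate_vin("IM8GDM9AXKP042788"))  # leading 'I' is corrected to '1'
```

A VIN that passes the format check but fails the check digit is probably better routed to a human than auto-corrected, since the checksum cannot say which character is wrong.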
Dictionary-based validation: This is done against a set of reference data in the system. For example, you can verify that the extracted invoice approver name matches the name of an approver in the IDP system. In this case, dictionary-based validation detects and corrects errors in the approver name.
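Dictionary-based validation can be sketched with fuzzy matching against the reference list. This example uses Python's standard-library `difflib.get_close_matches` as a stand-in for whatever matching an IDP product actually uses; the approver names and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib
from typing import List, Optional

# Hypothetical approver list, as stored in the IDP system.
KNOWN_APPROVERS: List[str] = ["Jane Smith", "Rahul Mehta", "Carlos Diaz"]


def match_approver(extracted: str, known: List[str] = KNOWN_APPROVERS,
                   cutoff: float = 0.8) -> Optional[str]:
    """Snap an extracted approver name to the closest known name, or return None."""
    if extracted in known:
        return extracted
    matches = difflib.get_close_matches(extracted, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_approver("Jane Srnith"))  # → Jane Smith
print(match_approver("Bob"))          # → None
```

Here an OCR misreading such as "Jane Srnith" (the letters rn read in place of m) is snapped back to "Jane Smith", while a name with no close match returns None and can be routed for review.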
Context-based validation: This applies when the same value appears in two different contexts. For example, you are extracting an insurance document in which the collision deductible and the comprehensive deductible both have the value 500. In such cases, ML models may confuse the two contexts because the values are the same, learn incorrectly, and eventually lose accuracy. To distinguish contexts that share similar or identical values, context-based validation is the way forward.
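Because identical values cannot tell the two deductibles apart, one way to sketch context-based validation is to check where each value was read from: the OCR line that supplied a field's value should contain that field's label. The `line_no` attribute assumes the extractor reports the source line for each value, and the label table is hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping from field name to the label that should sit next to its value.
FIELD_LABELS = {
    "collision_deductible": "collision deductible",
    "comprehensive_deductible": "comprehensive deductible",
}


@dataclass
class FieldValue:
    field: str
    value: str
    line_no: int  # index of the OCR line the value was read from (assumed available)


def context_mismatches(values: List[FieldValue], lines: List[str]) -> List[str]:
    """Return fields whose value was taken from a line lacking that field's label."""
    mismatched = []
    for fv in values:
        label = FIELD_LABELS.get(fv.field)
        if label and label not in lines[fv.line_no].lower():
            mismatched.append(fv.field)
    return mismatched


ocr_lines = ["Collision deductible: $500", "Comprehensive deductible: $500"]
extracted = [
    FieldValue("collision_deductible", "500", line_no=1),  # right value, wrong line
    FieldValue("comprehensive_deductible", "500", line_no=1),
]
print(context_mismatches(extracted, ocr_lines))  # → ['collision_deductible']
```

A field flagged this way can be sent for review and kept out of the training data until confirmed, so the model does not learn to read the collision deductible from the comprehensive deductible's position.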
So, how do you implement data validation in IDP solutions? One of the key strategies is configuring business rules.
Business Rules
Modern IDP solutions mostly validate extracted data using business rules. Let us say you have an expense management system that processes invoices, and you extract the relevant information from these invoices with an OCR system. In the initial stages, extraction accuracy is not expected to be high. However, you have an agreement with your IDP vendor that a certain level of accuracy will be achieved within a specific timeframe. How do you measure the improvement in accuracy on a regular basis? You can do this by configuring business rules.
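One way to read "measure accuracy with business rules" is to run every extracted invoice through a set of checks and track the share that passes over time. The sketch below assumes invoices arrive as dictionaries with `Decimal` amounts; the rule names and fields are hypothetical. A pass rate is only a proxy: a value can pass every rule and still be wrong.

```python
from datetime import date
from decimal import Decimal

# Each hypothetical rule takes an extracted invoice (a dict) and returns True if it passes.


def totals_add_up(inv: dict) -> bool:
    return sum(inv["line_items"], Decimal("0")) + inv["tax"] == inv["total"]


def date_not_in_future(inv: dict) -> bool:
    return inv["invoice_date"] <= date.today()


def total_is_positive(inv: dict) -> bool:
    return inv["total"] > 0


RULES = [totals_add_up, date_not_in_future, total_is_positive]


def failed_rules(inv: dict) -> list:
    """Names of the rules this invoice violates."""
    return [rule.__name__ for rule in RULES if not rule(inv)]


def pass_rate(invoices: list) -> float:
    """Share of invoices that pass every rule; tracked per batch as an accuracy proxy."""
    if not invoices:
        return 0.0
    return sum(1 for inv in invoices if not failed_rules(inv)) / len(invoices)
```

Computing `pass_rate` for each weekly batch gives a trend line you can compare against the accuracy level agreed with the vendor.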
Business rules can be configured in an IDP solution in two ways: through customization in the backend or through the user interface. In modern IDP solutions, configurable business rules are a high-value feature of the user interface, where you can set them up to suit your requirements.
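The difference between backend customization and UI configuration can be illustrated by storing rules as data rather than code. In the sketch below, each rule is a JSON object of the kind a configuration screen might produce, and a small evaluator applies it with Python's `operator` module. The rule format and operator names are assumptions for illustration, not any vendor's schema.

```python
import json
import operator

# Rules expressed as data, as a configuration screen might save them (format is hypothetical).
RULES_JSON = """
[
  {"field": "total", "op": ">", "value": 0},
  {"field": "currency", "op": "in", "value": ["USD", "EUR", "GBP"]},
  {"field": "vendor", "op": "!=", "value": ""}
]
"""

OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    "in": lambda value, allowed: value in allowed,
}


def evaluate(record: dict, rules: list) -> list:
    """Return the rules that the extracted record violates."""
    violations = []
    for rule in rules:
        if rule["field"] not in record:
            violations.append(rule)  # a missing field counts as a violation
        elif not OPS[rule["op"]](record[rule["field"]], rule["value"]):
            violations.append(rule)
    return violations


rules = json.loads(RULES_JSON)
record = {"total": 120.5, "currency": "INR", "vendor": "Acme"}
print([r["field"] for r in evaluate(record, rules)])  # → ['currency']
```

Because the rules live in data, a business user could add or change a check without a code deployment, which is the practical appeal of configuring rules in the user interface.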
Automated Accuracy Improvement
Any corrections made by your data entry or correction users act as input to the system so that the accuracy of future extractions improves. Modern ML-based IDP systems learn from these corrections automatically. The feedback loop brings the best results when corrections are integrated with extraction.
During extraction, a human in the loop (HITL) corrects data that was extracted with low confidence. IDP solutions assign confidence scores at a granular level, usually per field, so each automatically extracted field carries its own confidence score. You can decide which fields need correction based on that score.
Let us take an example. You are extracting the invoice number, merchant name, merchant address, and total amount from an invoice. Here, you set a high confidence threshold for critical fields such as the invoice number. If the invoice number is not extracted with at least that confidence, it is served to a human for correction.
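The invoice example above can be sketched as per-field confidence thresholds. The threshold values, the default for unlisted fields, and the assumption that confidence scores fall between 0 and 1 are all illustrative; in practice thresholds are tuned per document type.

```python
# Hypothetical per-field thresholds: critical fields demand higher confidence.
THRESHOLDS = {
    "invoice_number": 0.95,
    "total_amount": 0.90,
    "merchant_name": 0.75,
    "merchant_address": 0.60,
}
DEFAULT_THRESHOLD = 0.80  # used for fields not listed above


def fields_needing_review(confidences: dict) -> list:
    """Return, sorted, the fields whose confidence falls below their threshold."""
    return sorted(
        field for field, score in confidences.items()
        if score < THRESHOLDS.get(field, DEFAULT_THRESHOLD)
    )


scores = {"invoice_number": 0.91, "merchant_name": 0.88,
          "merchant_address": 0.65, "total_amount": 0.97}
print(fields_needing_review(scores))  # → ['invoice_number']
```

Even though 0.91 would be a comfortable score for the merchant name, it falls short of the stricter bar set for the invoice number, so only that field is sent to a human.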
Some companies outsource corrections to manage costs, but they are likely to incur higher costs in the long run. Say you use an OCR system to extract data, but corrections are outsourced to a BPO team because that is cheaper or more convenient than employing your own data entry or correction users. If those corrections happen outside the IDP system, they cannot feed back into its models, and what you miss is an IDP system that matures over time and drastically reduces correction effort in the future.
Infrrd's IDP solution has an integrated dashboard for performing corrections, and its feedback loop is automated. Infrrd also offers patent-pending capabilities that analyze data efficiently and intelligently before a feedback loop is triggered.
After Infrrd's IDP extracts the data automatically, one of two things can happen, depending on the maturity of the models: the document either goes through straight-through processing (STP) or is served for correction. If some fields are extracted with low confidence, the corresponding documents are sent to queues for correction by a data entry user.
The queues are configured based on the confidence scores the system assigns during extraction.
The corrections made by the data entry user act as feedback for the system to learn from, which improves accuracy in future extractions.
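The routing and feedback steps above can be sketched as follows. A document goes straight through only if every field clears a confidence threshold; otherwise it lands in one of two correction queues chosen by its lowest field confidence, and any value a user changes is stored as a labeled example for later retraining. The thresholds, queue names, and `FeedbackStore` class are hypothetical, and the retraining itself is out of scope.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

STP_THRESHOLD = 0.90           # every field at or above this: straight-through processing
QUICK_REVIEW_THRESHOLD = 0.70  # lowest field in [0.70, 0.90): quick-review queue


@dataclass
class Document:
    doc_id: str
    values: Dict[str, str]
    confidences: Dict[str, float]


def route(doc: Document) -> str:
    """Pick STP or a correction queue based on the least confident field."""
    if not doc.confidences:
        return "full_review"
    lowest = min(doc.confidences.values())
    if lowest >= STP_THRESHOLD:
        return "stp"
    if lowest >= QUICK_REVIEW_THRESHOLD:
        return "quick_review"
    return "full_review"


@dataclass
class FeedbackStore:
    # (doc_id, field, predicted value, corrected value) tuples for the next training run
    examples: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def record_correction(self, doc: Document, field_name: str, corrected: str) -> None:
        """Apply a user's correction and keep it as feedback if it changed the value."""
        predicted = doc.values.get(field_name, "")
        if predicted != corrected:
            self.examples.append((doc.doc_id, field_name, predicted, corrected))
        doc.values[field_name] = corrected
```

Recording only changed values keeps the feedback set focused on the model's mistakes; a real system might also keep confirmed values as positive examples.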
There you go. To stay competitive, choose a future-ready IDP solution: one that offers excellent extraction and classification features as well as strong data validation and feedback loop capabilities to handle variations and inaccuracies efficiently.
Here is a table that depicts the industry-relevant data validation and feedback loop features and Infrrd's capabilities:
Feature                                   Infrrd's IDP
Pattern-based validation                  ✔
Dictionary-based validation               ✔
Context-based validation                  ✔
Business rules through configuration      ✔
Self-service business rules               On the roadmap
Automated accuracy improvements           ✔
In our next post, we explore Gartner's description of Integration and how Infrrd stacks up.

More Related Content

Similar to Understanding IDP: Data Validation and Feedback Loop

Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...
Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...
Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...Decision CAMP
 
Certus Accelerate - Building the business case for why you need to invest in ...
Certus Accelerate - Building the business case for why you need to invest in ...Certus Accelerate - Building the business case for why you need to invest in ...
Certus Accelerate - Building the business case for why you need to invest in ...Certus Solutions
 
Week11 Determine Technical Requirements
Week11 Determine Technical RequirementsWeek11 Determine Technical Requirements
Week11 Determine Technical Requirementshapy
 
Finger Gesture Based Rating System
Finger Gesture Based Rating SystemFinger Gesture Based Rating System
Finger Gesture Based Rating SystemIRJET Journal
 
A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationKate Subramanian
 
3+ Keys to Proactive Underwriting (1).pdf
3+ Keys to Proactive Underwriting (1).pdf3+ Keys to Proactive Underwriting (1).pdf
3+ Keys to Proactive Underwriting (1).pdfCogitate.us
 
Unified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge GraphUnified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge GraphVaticle
 
The CFO in the Age of Digital Analytics
The CFO in the Age of Digital AnalyticsThe CFO in the Age of Digital Analytics
The CFO in the Age of Digital AnalyticsAnametrix
 
Accenture Insurance Data Capture
Accenture Insurance Data Capture Accenture Insurance Data Capture
Accenture Insurance Data Capture Accenture Insurance
 
SMS_White Paper_ClearView Assessment-PUB-v01r00
SMS_White Paper_ClearView Assessment-PUB-v01r00SMS_White Paper_ClearView Assessment-PUB-v01r00
SMS_White Paper_ClearView Assessment-PUB-v01r00Brent Anderson
 
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Science Council of America
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersMohitMhapuskar
 
Efficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data FeedEfficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data FeedIRJET Journal
 

Similar to Understanding IDP: Data Validation and Feedback Loop (20)

Data Mining Lec1.pptx
Data Mining Lec1.pptxData Mining Lec1.pptx
Data Mining Lec1.pptx
 
Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...
Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...
Decision CAMP 2013 - sako hidetoshi - blaze consulting japan - Using Business...
 
Certus Accelerate - Building the business case for why you need to invest in ...
Certus Accelerate - Building the business case for why you need to invest in ...Certus Accelerate - Building the business case for why you need to invest in ...
Certus Accelerate - Building the business case for why you need to invest in ...
 
Week11 Determine Technical Requirements
Week11 Determine Technical RequirementsWeek11 Determine Technical Requirements
Week11 Determine Technical Requirements
 
5-Unit (CAB).pdf
5-Unit (CAB).pdf5-Unit (CAB).pdf
5-Unit (CAB).pdf
 
Finger Gesture Based Rating System
Finger Gesture Based Rating SystemFinger Gesture Based Rating System
Finger Gesture Based Rating System
 
A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence Application
 
Gc3310851089
Gc3310851089Gc3310851089
Gc3310851089
 
Gc3310851089
Gc3310851089Gc3310851089
Gc3310851089
 
3+ Keys to Proactive Underwriting (1).pdf
3+ Keys to Proactive Underwriting (1).pdf3+ Keys to Proactive Underwriting (1).pdf
3+ Keys to Proactive Underwriting (1).pdf
 
IOT & Procuement
IOT & ProcuementIOT & Procuement
IOT & Procuement
 
Unified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge GraphUnified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge Graph
 
The CFO in the Age of Digital Analytics
The CFO in the Age of Digital AnalyticsThe CFO in the Age of Digital Analytics
The CFO in the Age of Digital Analytics
 
Accenture Insurance Data Capture
Accenture Insurance Data Capture Accenture Insurance Data Capture
Accenture Insurance Data Capture
 
SMS_White Paper_ClearView Assessment-PUB-v01r00
SMS_White Paper_ClearView Assessment-PUB-v01r00SMS_White Paper_ClearView Assessment-PUB-v01r00
SMS_White Paper_ClearView Assessment-PUB-v01r00
 
Improving Data Extraction Performance
Improving Data Extraction PerformanceImproving Data Extraction Performance
Improving Data Extraction Performance
 
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdfData Observability- The Next Frontier of Data Engineering Pdf.pdf
Data Observability- The Next Frontier of Data Engineering Pdf.pdf
 
Achieving Business Success with Data.pdf
Achieving Business Success with Data.pdfAchieving Business Success with Data.pdf
Achieving Business Success with Data.pdf
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco Churners
 
Efficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data FeedEfficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
 

More from Infrrd

Intelligent Document Processing
Intelligent Document ProcessingIntelligent Document Processing
Intelligent Document ProcessingInfrrd
 
IDP: A Booster Shot for your RPA, Chatbot and Low Code Implementations
IDP: A Booster Shot for your RPA, Chatbot and Low Code ImplementationsIDP: A Booster Shot for your RPA, Chatbot and Low Code Implementations
IDP: A Booster Shot for your RPA, Chatbot and Low Code ImplementationsInfrrd
 
Using Alerts To Gain Efficiency For Document Processing.pdf
Using Alerts To Gain Efficiency For Document Processing.pdfUsing Alerts To Gain Efficiency For Document Processing.pdf
Using Alerts To Gain Efficiency For Document Processing.pdfInfrrd
 
Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Infrrd
 
Launching Infrrd IDP's Latest Features
Launching Infrrd IDP's Latest FeaturesLaunching Infrrd IDP's Latest Features
Launching Infrrd IDP's Latest FeaturesInfrrd
 
Transformer-Based OCR.pdf
Transformer-Based OCR.pdfTransformer-Based OCR.pdf
Transformer-Based OCR.pdfInfrrd
 
Invoice processing
Invoice processingInvoice processing
Invoice processingInfrrd
 
Where have all the data entry candidates gone?
Where have all the data entry candidates gone?Where have all the data entry candidates gone?
Where have all the data entry candidates gone?Infrrd
 
IDP with Intelligent Table Extraction
IDP with Intelligent Table ExtractionIDP with Intelligent Table Extraction
IDP with Intelligent Table ExtractionInfrrd
 
Document Types Explained: Structured, Semi-Structured and Unstructured
Document Types Explained: Structured, Semi-Structured and UnstructuredDocument Types Explained: Structured, Semi-Structured and Unstructured
Document Types Explained: Structured, Semi-Structured and UnstructuredInfrrd
 
Understanding IDP: Document Classification
Understanding IDP: Document ClassificationUnderstanding IDP: Document Classification
Understanding IDP: Document ClassificationInfrrd
 
Who are the top intelligent document processing (idp) vendors
Who are the top intelligent document processing (idp) vendors Who are the top intelligent document processing (idp) vendors
Who are the top intelligent document processing (idp) vendors Infrrd
 
Infrrd's AI-enabled Audit Automation
Infrrd's AI-enabled Audit AutomationInfrrd's AI-enabled Audit Automation
Infrrd's AI-enabled Audit AutomationInfrrd
 
How To Start Your Journey To Become An AI Enabled Enterprise?
How To Start Your Journey To Become An AI Enabled Enterprise?How To Start Your Journey To Become An AI Enabled Enterprise?
How To Start Your Journey To Become An AI Enabled Enterprise?Infrrd
 
Intelligent Data Capture Process
Intelligent Data Capture Process Intelligent Data Capture Process
Intelligent Data Capture Process Infrrd
 

More from Infrrd (15)

Intelligent Document Processing
Intelligent Document ProcessingIntelligent Document Processing
Intelligent Document Processing
 
IDP: A Booster Shot for your RPA, Chatbot and Low Code Implementations
IDP: A Booster Shot for your RPA, Chatbot and Low Code ImplementationsIDP: A Booster Shot for your RPA, Chatbot and Low Code Implementations
IDP: A Booster Shot for your RPA, Chatbot and Low Code Implementations
 
Using Alerts To Gain Efficiency For Document Processing.pdf
Using Alerts To Gain Efficiency For Document Processing.pdfUsing Alerts To Gain Efficiency For Document Processing.pdf
Using Alerts To Gain Efficiency For Document Processing.pdf
 
Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...
 
Launching Infrrd IDP's Latest Features
Launching Infrrd IDP's Latest FeaturesLaunching Infrrd IDP's Latest Features
Launching Infrrd IDP's Latest Features
 
Transformer-Based OCR.pdf
Transformer-Based OCR.pdfTransformer-Based OCR.pdf
Transformer-Based OCR.pdf
 
Invoice processing
Invoice processingInvoice processing
Invoice processing
 
Where have all the data entry candidates gone?
Where have all the data entry candidates gone?Where have all the data entry candidates gone?
Where have all the data entry candidates gone?
 
IDP with Intelligent Table Extraction
IDP with Intelligent Table ExtractionIDP with Intelligent Table Extraction
IDP with Intelligent Table Extraction
 
Document Types Explained: Structured, Semi-Structured and Unstructured
Document Types Explained: Structured, Semi-Structured and UnstructuredDocument Types Explained: Structured, Semi-Structured and Unstructured
Document Types Explained: Structured, Semi-Structured and Unstructured
 
Understanding IDP: Document Classification
Understanding IDP: Document ClassificationUnderstanding IDP: Document Classification
Understanding IDP: Document Classification
 
Who are the top intelligent document processing (idp) vendors
Who are the top intelligent document processing (idp) vendors Who are the top intelligent document processing (idp) vendors
Who are the top intelligent document processing (idp) vendors
 
Infrrd's AI-enabled Audit Automation
Infrrd's AI-enabled Audit AutomationInfrrd's AI-enabled Audit Automation
Infrrd's AI-enabled Audit Automation
 
How To Start Your Journey To Become An AI Enabled Enterprise?
How To Start Your Journey To Become An AI Enabled Enterprise?How To Start Your Journey To Become An AI Enabled Enterprise?
How To Start Your Journey To Become An AI Enabled Enterprise?
 
Intelligent Data Capture Process
Intelligent Data Capture Process Intelligent Data Capture Process
Intelligent Data Capture Process
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Understanding IDP: Data Validation and Feedback Loop

  • 1. Understanding IDP: Data Validation and Feedback Loop According to Gartner, "The market for document capture, extraction, and processing is highly fragmented. Data and analytics leaders should use this research to understand the process flow and differentiated capabilities offered by intelligent document processing solutions". Gartner's recently released "Infographic: Understand Intelligent Document Processing" covers these 6 critical flows in IDP. 1. Capture or Ingestion 2. Document Pre-processing 3. Document Classification 4. Data Extraction 5. Validation and Feedback Loop 6. Integration
This is the fourth post in the series, and it explores Data Validation and Feedback Loop.

When it comes to IDP systems, one of the key evaluation parameters is the accuracy they offer. Accuracy does not depend only on the quality of the extraction process; IDP systems also tap into external signals to improve it. Data validation against an external source is one of many such signals.

When you think of these signals, try to draw a parallel to how modern GPS location systems work. You may know that GPS systems measure the distance of the subject from three or more satellites and apply a technique called trilateration (often loosely called triangulation) to find the point where those distances intersect. It is impossible to pinpoint the location of the subject accurately with a signal from just one satellite.

To relate to this problem, stick out your arm, raise a finger, and close one eye. With one eye closed, you lose your sense of distance and cannot really tell how far away your finger is. Visual signals from both eyes give you true depth perception. Similarly, GPS systems combine signals from multiple satellites to place the subject accurately. Opening an IDP conversation with satellites is quite a stretch, but the point to note here is that more signals lead to higher accuracy. In the same way, data validation and feedback loops are techniques that modern IDP systems use to improve accuracy and thereby mature faster. An efficient data validation system can lift your IDP accuracy by 15 to 20%. Let's see how.

Data Validation

If IDP is the best option to automate data processing, what does data validation add to it? Data validation, as the name suggests, is the process of checking the extracted data on several counts: whether the right data is being extracted, and whether the extracted data itself is accurate.

A typical use case for data validation is exception handling, such as weeding out documents that are out of scope. For example, you may have a list of vendors whose documents are the only ones that should be extracted, or a receipt may be mixed in among the invoices you are processing and needs to be disregarded. If you run into these or similar cases, you need data validation.

Let us look at a scenario. Imagine you are extracting information from loan documents. Borrowers have taken loans from different banks, and you want to validate each lender against the list of approved lenders in your system so that you can differentiate between approved and unapproved lenders. In this case, you implement data validation, where the IDP system usually connects to the third-party database through APIs, or to a copy of that data in the IDP vendor's cloud system that is synced daily or periodically from the third-party database.

Let me simplify this. You are extracting a loan document where the borrower has taken a loan from Bank of America, and Bank of America is one of your approved lenders. With data validation, you can attach an identifier to it, for example by listing the lender as a lien-holder in the extraction results.

Data validation is one of the key factors behind a sharp increase in extraction accuracy, which means your IDP models mature quickly. Let me give you a ballpark figure. After analyzing our customers' extraction results over the past few months, we have observed that Infrrd's data validation algorithms immediately raise accuracy by around 10 percentage points. That is, if the IDP system provided 80% accuracy without data validation, it may provide 90% accuracy or more with data validation.

There are different types of validation. The most common ones are:

Pattern-based validation: Here, the data is validated based on patterns.
For example, the vehicle identification number (VIN), a unique identifier for a car, is a combination of digits and capital letters and is 17 characters long on modern vehicles. This number follows a pattern: the first 3 characters identify the manufacturer, characters 4 to 8 are alphanumeric and describe the vehicle, and so on. Because the letters I, O, and Q are never used in a VIN, pattern-based validation can detect and correct extraction errors in the VIN, including tricky ones such as the digit 1 and the capital letter I getting interchanged.

Dictionary-based validation: This is done against a set of reference data in the system. For example, you can verify that the extracted invoice approver name matches the name of an approver in the IDP system. In this case, dictionary-based validation detects and corrects a misread approver name.

Context-based validation: This is needed where the same value is relevant in two contexts. For example, you are extracting an insurance document in which the collision deductible and the comprehensive deductible both have the value 500. In such cases, the ML models may misinterpret the context because the values are the same and may learn incorrectly, which can eventually cause a dip in accuracy. To handle fields that have similar or identical values in different contexts, context-based validation is the way forward.

So, how do you implement data validation in IDP solutions? One of the key strategies is configuring business rules.

Business Rules

Modern IDP solutions mostly validate extracted data using business rules. Let us say you have an expense management system to process invoices, and you are extracting relevant information from these invoices using an OCR system. In the initial stages, extraction accuracy is not expected to be high. However, you have an agreement with your IDP vendor that an expected level of
accuracy can be achieved within a specific timeframe. Now, how do you measure the improvements in accuracy on an ongoing basis? You can do this by configuring business rules.

Business rules can be configured in an IDP solution in two ways: through customization on the backend or through the user interface. In modern IDP solutions, business rules that you configure through the user interface are a high-value offering, because you can set them up based on your own requirements.

Automated Accuracy Improvement

Any corrections made by your data entry or correction users act as input to the system, so that accuracy improves in future extractions. Modern ML-based IDP systems learn from these corrections automatically. The feedback loop brings the best results when corrections are integrated with extraction.

When you extract data, the human-in-the-loop (HITL) corrects the data that was extracted with low confidence. IDP solutions assign a confidence score at a granular level, usually per field, so each automatically extracted field carries its own confidence score. You can decide which fields need correction based on that score.

Let us take an example. You are extracting the invoice number, merchant name, merchant address, and total amount from an invoice. In this case, you set a high confidence threshold for critical fields, such as the invoice number. If the invoice number is not extracted with high confidence, it is served to a human for correction.

Some companies outsource corrections to manage costs. However, chances are that they incur higher costs in the long run. Let us say you have an OCR system to extract data, but corrections are outsourced to a BPO team because that is cheaper or more convenient than employing data entry or correction users. What you miss here is a mature IDP system that, over time, can drastically reduce the correction effort.

Infrrd's IDP solution has an integrated dashboard for performing corrections in which the feedback loop is automated. Infrrd also offers patent-pending capabilities for efficient and intelligent analysis of the data before a feedback loop is triggered. After Infrrd's IDP automatically extracts the data, one of two things happens, depending on the maturity of the models: either the document goes through Straight Through Processing, or it is served for correction. If some fields are extracted with low confidence, the corresponding documents are sent to queues for correction by a data entry user.
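The validation types and the confidence-based routing described above can be combined in a short sketch. The Python below is a minimal illustration under stated assumptions, not Infrrd's implementation: the field names, the approved-lender list, the per-field thresholds, and the helpers `Field`, `validate_vin`, `validate_lender`, `route`, and `record_correction` are all hypothetical. It applies pattern-based validation to a VIN (with an assumed repair heuristic that maps I to 1 and O and Q to 0, since those letters never appear in a VIN), dictionary-based validation to a lender name, and then sends the document either to Straight Through Processing or to a correction queue based on field-level confidence. Corrections that differ from the prediction are logged as feedback.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Modern VINs are 17 characters; the letters I, O and Q are never used.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")
# Assumed OCR-repair heuristic: map forbidden letters to look-alike digits.
VIN_REPAIRS = str.maketrans({"I": "1", "O": "0", "Q": "0"})

# Hypothetical reference data and thresholds, for illustration only.
APPROVED_LENDERS = {"bank of america"}
THRESHOLDS = {"vin": 0.95, "lender": 0.80}  # stricter bar for the critical field
DEFAULT_THRESHOLD = 0.80


@dataclass
class Field:
    value: str
    confidence: float  # field-level confidence from the extractor, 0.0 to 1.0


def validate_vin(raw: str) -> tuple[str, bool]:
    """Pattern-based validation: normalize the VIN, then check its shape."""
    candidate = raw.strip().upper().translate(VIN_REPAIRS)
    return candidate, VIN_PATTERN.fullmatch(candidate) is not None


def validate_lender(raw: str) -> tuple[str, bool]:
    """Dictionary-based validation: look the lender up in the approved list."""
    name = " ".join(raw.split())  # collapse stray whitespace from OCR
    return name, name.casefold() in APPROVED_LENDERS


VALIDATORS = {"vin": validate_vin, "lender": validate_lender}


def route(fields: dict[str, Field]) -> tuple[str, dict[str, str], list[str]]:
    """Return (queue, cleaned values, fields needing human review)."""
    cleaned: dict[str, str] = {}
    needs_review: list[str] = []
    for name, field in fields.items():
        validator = VALIDATORS.get(name)
        value, valid = validator(field.value) if validator else (field.value, True)
        cleaned[name] = value
        # A field goes to HITL review if it fails validation or has low confidence.
        if not valid or field.confidence < THRESHOLDS.get(name, DEFAULT_THRESHOLD):
            needs_review.append(name)
    queue = "correction" if needs_review else "straight_through"
    return queue, cleaned, needs_review


def record_correction(log: list[dict], field: str, predicted: str, corrected: str) -> None:
    """Feedback loop: keep only real corrections as a future learning signal."""
    if predicted != corrected:
        log.append({"field": field, "predicted": predicted, "corrected": corrected})


if __name__ == "__main__":
    doc = {
        "vin": Field("1HGCM82633A0O4352", 0.97),  # letter O misread for digit 0
        "lender": Field("Bank of  America", 0.91),
    }
    queue, cleaned, review = route(doc)
    print(queue, cleaned["vin"], review)  # straight_through 1HGCM82633A004352 []

    doc2 = {"vin": Field("1HGCM82633A004352", 0.90), "lender": Field("Acme Credit", 0.99)}
    queue2, _, review2 = route(doc2)
    print(queue2, review2)  # correction ['vin', 'lender']
```

Sending an unapproved lender to review is only one design choice; a real system might instead tag the document (for example, as a lien-holder record) or reject it as out of scope. A production VIN validator would likely also verify the check digit, which this sketch omits.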
The queues are configured based on the confidence score that the system assigns during extraction. The corrections made by the data entry user act as feedback for the system to learn from, which improves accuracy in future extractions.

There you go. To stay competitive, choose a future-ready IDP solution: one that offers excellent extraction and classification features and also has strong data validation and feedback loop capabilities to handle variations and inaccuracies efficiently.
Here is a table that depicts the industry-relevant data validation and feedback loop features and Infrrd's capabilities:

Feature                                  Infrrd's IDP
Pattern-based validation                 ✔
Dictionary-based validation              ✔
Context-based validation                 ✔
Business Rules Through Configuration     ✔
Self Service Business Rules              On The Roadmap
Automated Accuracy Improvements          ✔

In our next post, we will explore Gartner's description of Integration and how Infrrd stacks up.