Understanding IDP: Data Validation and Feedback Loop
According to Gartner, "The market for document capture,
extraction, and processing is highly fragmented. Data and analytics
leaders should use this research to understand the process flow
and differentiated capabilities offered by intelligent document
processing solutions". Gartner's recently released "Infographic:
Understand Intelligent Document Processing" covers these 6 critical
flows in IDP.
1. Capture or Ingestion
2. Document Pre-processing
3. Document Classification
4. Data Extraction
5. Validation and Feedback Loop
6. Integration
This is the fourth post in the series exploring Data Validation and
Feedback Loop.
When it comes to IDP systems, one of the key evaluation
parameters is accuracy. Rather than depending only on the
quality of the extraction process, IDP systems tap into external
signals to improve accuracy. Data validation against an
external source is one of many such signals.
When you think of these signals, try to draw a parallel to how
modern-day GPS location systems work. You may know that GPS
receivers measure their distance from three or more
satellites and apply a technique called trilateration (often loosely
called triangulation) to find the intersection point. It is impossible
to accurately pinpoint the location of the subject with a signal from
just one satellite.
To relate to this problem, stick out your arm, raise a finger and
close one eye. You will notice that with one eye closed, you lose the
sense of distance. You cannot really tell how far away your finger is.
Getting visual signals from both eyes gives you a true sense of
depth. Similarly, GPS systems use three or more
signals to accurately place the subject's location. Opening an IDP
conversation with satellites is quite a stretch, but the point to note
here is that more signals lead to higher accuracy. In the same way, data
validation and feedback loops are techniques used by modern IDP
systems to improve accuracy and thereby mature faster. An efficient
data validation system can lift your IDP
accuracy by 15 to 20%. Let's see how.
Data Validation
If IDP is the best option to automate data processing, what does
data validation add to it? Data validation, as the name suggests, is
the process of checking the extracted data on multiple points, such
as whether the right data is being extracted and whether the
extracted data itself is accurate. A typical use case for data
validation is exception handling, such as weeding out documents
that are out of scope. For example, you may have a list of vendors
and want to extract only documents from those vendors, or
a receipt may be mixed in among the invoices you are processing and
need to be disregarded. If you experience these or similar cases,
then you need data validation.
Let us look at a scenario for data validation. Imagine you are
extracting information from loan documents. Borrowers have
taken loans from different banks, and you want to check each lender
against the list of approved lenders or banks in your system and
differentiate between approved and unapproved lenders. In this case, you
implement data validation techniques in which the IDP system usually
connects to the third-party database through APIs, or to a copy of the
data in the IDP vendor's cloud system that is synced daily or periodically
from the third-party database. Let me simplify this. You are
extracting a loan document where the borrower has taken a loan
from Bank of America, and Bank of America is one of your approved
lenders. With data validation, you can attach an identifier to it,
for example by listing the lender as a lien-holder in the extraction results.
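As a rough illustration of the approved-lender check, here is a minimal Python sketch. It assumes the approved list has already been synced into memory; the list contents (other than Bank of America), the function name `validate_lender`, and the 0.85 similarity cutoff are all hypothetical and not taken from any particular IDP product.

```python
from difflib import get_close_matches

# Hypothetical approved-lender list. In practice this would be fetched over an
# API or synced periodically from the third-party database.
APPROVED_LENDERS = ["Bank of America", "Example Savings Bank", "Sample Home Loans"]


def validate_lender(extracted_name, approved=APPROVED_LENDERS, cutoff=0.85):
    """Return (name, is_approved) for a lender name produced by extraction.

    Small OCR slips are tolerated by fuzzy matching against the approved list.
    When a match is found, the canonical spelling from the list is returned.
    """
    cleaned = " ".join(extracted_name.split())  # collapse stray whitespace
    lookup = {name.lower(): name for name in approved}
    matches = get_close_matches(cleaned.lower(), list(lookup), n=1, cutoff=cutoff)
    if matches:
        return lookup[matches[0]], True
    return cleaned, False
```

With this sketch, an extraction such as "Bank of Amerlca" would be mapped back to "Bank of America" and flagged as approved, while a lender that is not on the list comes back unchanged and unapproved. The cutoff is a judgment call: a looser value forgives more OCR noise but raises the risk of approving a lender whose name merely resembles an approved one.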
Data validation is one of the key factors that brings a
sharp increase in extraction accuracy, which means
your IDP models mature faster. Let me give you a ballpark
figure. After analyzing the extraction results of our customers for the
past few months, we have observed that Infrrd's data validation
algorithms immediately raise accuracy by around 10 percentage points. It
means if the IDP system was providing 80% accuracy without data
validation, it may give 90% accuracy or more with data validation.
There are different types of validation. The most common ones are:
Pattern-based validation: Here, the data is validated based on
patterns. For example, the vehicle identification number (VIN),
which is a unique identifier for a vehicle, is a combination of digits and
capital letters and usually consists of 17 characters. This number
has a pattern: the first 3 characters identify the
manufacturer, characters 4 to 8 may be alphanumeric and describe the
vehicle, and so on. In this case, pattern-based data
validation detects and corrects extraction errors in the VIN,
including tricky ones, such as the digit 1 and the capital
letter I getting interchanged (the letters I, O and Q are never used in a VIN).
Dictionary-based validation: This is done against a set of data in
the system. For example, you can verify that the extracted invoice
approver name matches the name of an approver in the IDP
system. In the same way, dictionary-based validation can detect and
correct a currency code by checking it against the list of valid codes.
Context-based validation: This is done where the same value is
relevant in two contexts. For example, you are extracting an
insurance document that has the same value in two contexts, say the
collision deductible and the comprehensive deductible both have the
value 500. In such cases, the ML models may confuse the two
contexts because the values are the same and may learn incorrectly, which
may eventually cause a dip in accuracy. To distinguish these kinds
of different contexts with similar or identical values, context-based
validation is the way forward.
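To make the pattern-based case concrete, here is a small Python sketch of VIN validation. It is an illustration under stated assumptions, not a description of how any vendor implements it: the function names are hypothetical, the mapping of I, O and Q to digits is an assumed OCR correction, and the check-digit step follows the North American convention (position 9), so for VINs from other markets a failed check digit is better treated as a flag for review than as proof of an error.

```python
import re

# 17 characters: digits and capital letters, excluding I, O and Q.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")

# Assumed OCR corrections: I, O and Q never appear in a VIN, so map them to
# the digits they are most often confused with.
OCR_FIXES = str.maketrans({"I": "1", "O": "0", "Q": "0"})

# Letter-to-number transliteration and position weights for the check digit.
TRANSLIT = {str(d): d for d in range(10)}
TRANSLIT.update({
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
})
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def normalize_vin(raw):
    """Uppercase, drop spaces and apply the assumed OCR corrections."""
    return raw.strip().upper().replace(" ", "").translate(OCR_FIXES)


def check_digit_ok(vin):
    """Compare position 9 with the weighted sum of all characters modulo 11."""
    total = sum(TRANSLIT[ch] * weight for ch, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def validate_vin(raw):
    """Return (corrected_vin, is_valid) for an extracted VIN string."""
    vin = normalize_vin(raw)
    if not VIN_PATTERN.fullmatch(vin):
        return vin, False
    return vin, check_digit_ok(vin)
```

Here an extraction that reads the leading 1 as a capital I is corrected during normalization, and the check digit then confirms that the corrected value is internally consistent. A VIN with a single wrong digit elsewhere still matches the character pattern but fails the check digit, which is the kind of error that would be routed to a human.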
So, how do you implement data validation in IDP solutions? One of
the key strategies is configuring business rules.
Business Rules
Modern IDP solutions mostly validate extracted data using business
rules. Let us say you have an expense management system to
process invoices. You are extracting relevant information from
these invoices using an OCR system. In the initial stages, the
extraction accuracy is not expected to be high. However, you have
an agreement with your IDP vendor that an expected level of
accuracy can be achieved in a specific timeframe. Now, how do you
regularly measure the improvements in accuracy? You can do this
by configuring business rules.
Business rules can be configured in an IDP solution in two ways,
either through customization from the backend or through the user
interface. In modern IDP solutions, business rules are a high-value
offering in the user interface, where you can configure them based
on your requirements.
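As a sketch of what configured business rules might look like for the invoice scenario, the Python below expresses each rule as a named check and reports the share of invoices that pass all of them. Everything here is an assumption made for illustration: the rule names, the invoice field names, and the idea of tracking the pass rate over time as a rough proxy for accuracy are not taken from any specific IDP product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the invoice passes


# Hypothetical rules for an extracted invoice.
RULES = [
    Rule("total_matches_parts",
         lambda inv: abs(inv["subtotal"] + inv["tax"] - inv["total"]) < 0.01),
    Rule("invoice_number_present",
         lambda inv: bool(inv.get("invoice_number"))),
    Rule("total_is_positive",
         lambda inv: inv["total"] > 0),
]


def failed_rules(invoice, rules=RULES):
    """Return the names of the rules this invoice violates, in rule order."""
    return [rule.name for rule in rules if not rule.check(invoice)]


def pass_rate(invoices, rules=RULES):
    """Share of invoices that satisfy every rule (0.0 for an empty batch)."""
    if not invoices:
        return 0.0
    passed = sum(1 for inv in invoices if not failed_rules(inv, rules))
    return passed / len(invoices)
```

Keeping the rules as data rather than hard-coded logic is what makes it plausible to expose them through a user interface: adding a rule means adding an entry to the list, not changing the validation code.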
Automated Accuracy Improvement
Any corrections performed by your data entry or correction users
act as input to the system. Modern ML-based IDP systems automatically
learn from these corrections so that the accuracy of future extractions
improves. The feedback loop brings the best results when
corrections are integrated with extraction.
When you extract data, a human-in-the-loop (HITL) plays the role of
correcting the data that is extracted with low confidence. IDP
solutions assign a confidence score while extracting data at a
granular level, usually at the field level. So, each field that is
automatically extracted has a confidence score assigned to it. You
can decide which fields need correction based on the confidence
score.
Let us take an example. You are extracting the invoice number,
merchant name, merchant address, and total amount from an
invoice. In this case, you set a high confidence threshold for critical
fields, such as the invoice number. If the invoice number is not
extracted with high confidence, it will be served to a human to
correct it.
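The field-level decision described above can be sketched in a few lines of Python. The threshold values, the default, and the function name `fields_needing_review` are hypothetical; confidence scores are assumed to be on a 0 to 1 scale.

```python
# Hypothetical per-field confidence thresholds; critical fields get a higher bar.
FIELD_THRESHOLDS = {"invoice_number": 0.95, "total_amount": 0.90}
DEFAULT_THRESHOLD = 0.80  # assumed fallback for fields without their own threshold


def fields_needing_review(extraction):
    """Return the fields whose confidence falls below their threshold.

    `extraction` maps a field name to a (value, confidence) pair.
    """
    return [
        field
        for field, (_value, confidence) in extraction.items()
        if confidence < FIELD_THRESHOLDS.get(field, DEFAULT_THRESHOLD)
    ]
```

Note that a score of 0.91 would be acceptable for the merchant name under these assumed settings but not for the invoice number, which is exactly the effect of setting a stricter threshold on critical fields.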
Some companies outsource corrections to manage costs. However,
the chances are that they incur higher costs in the long run. Let us
say you have an OCR system to extract data, but corrections are
outsourced to a BPO team because it is cheaper or more
convenient than employing data entry or correction users. What you
miss here is an IDP system that matures over the long term: the
corrections never flow back into the system, so it never learns to
reduce the correction effort in the future.
Infrrd's IDP solution has an integrated dashboard to perform
corrections where the feedback loop is automated. Infrrd offers
patent-pending capabilities to ensure efficient and
intelligent analysis of data before triggering a feedback loop.
After Infrrd's IDP automatically extracts the data, two things can
happen based on the maturity of the models: either a document
goes through straight-through processing (STP), or it is served for
correction. If some fields are extracted with low confidence, the
corresponding documents are sent to queues for correction by a
data entry user.
The queues are configured based on the confidence score
assigned by the system during extraction.
The corrections performed by the data entry user act as feedback
for the system to learn, and this ensures improved accuracy in
future extractions.
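The flow described above, straight-through processing or a correction queue followed by corrections captured as feedback, can be sketched generically in Python. This is not Infrrd's implementation: the thresholds, queue names, and the in-memory `feedback_log` are hypothetical, and how a real system retrains on the collected corrections is vendor-specific.

```python
# Hypothetical confidence bands (0 to 1 scale).
STP_THRESHOLD = 0.90     # every field at or above this: no human needed
URGENT_THRESHOLD = 0.60  # any field below this: send to a dedicated queue

feedback_log = []  # corrections collected as labeled examples for later learning


def route(document):
    """Pick a destination based on the lowest field confidence in the document.

    `document["fields"]` maps a field name to a (value, confidence) pair.
    """
    lowest = min(conf for _value, conf in document["fields"].values())
    if lowest >= STP_THRESHOLD:
        return "straight_through"
    if lowest < URGENT_THRESHOLD:
        return "queue_low_confidence"
    return "queue_standard"


def record_correction(doc_id, field, extracted, corrected):
    """Store a human correction so it can be fed back to the model."""
    if extracted == corrected:
        return  # the reviewer confirmed the value; nothing to learn from
    feedback_log.append({
        "doc_id": doc_id,
        "field": field,
        "extracted": extracted,
        "corrected": corrected,
    })
```

The design point is that routing and feedback share the same data: the fields that sent a document to a queue are the ones whose corrections are most valuable to record.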
There you go. Ensure that you choose a future-ready IDP solution to
stay competitive. That means choosing an IDP solution that offers
excellent extraction and classification features and has excellent
data validation and feedback loop capabilities to manage variations
and inaccuracies efficiently.
Here is a table that depicts the industry-relevant data validation and
feedback loop features and Infrrd's capabilities:
Feature | Infrrd's IDP
Pattern-based validation | ✔
Dictionary-based validation | ✔
Context-based validation | ✔
Business Rules Through Configuration | ✔
Self Service Business Rules | On The Roadmap
Automated Accuracy Improvements | ✔
In our next post, we explore Gartner's description of Integration and
how Infrrd stacks up.

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 

Understanding IDP: Data Validation and Feedback Loop

  • 1. Understanding IDP: Data Validation and Feedback Loop

    According to Gartner, "The market for document capture, extraction, and processing is highly fragmented. Data and analytics leaders should use this research to understand the process flow and differentiated capabilities offered by intelligent document processing solutions." Gartner's recently released "Infographic: Understand Intelligent Document Processing" covers these six critical flows in IDP:

    1. Capture or Ingestion
    2. Document Pre-processing
    3. Document Classification
    4. Data Extraction
    5. Validation and Feedback Loop
    6. Integration
  • 2. This is the fourth post in the series, exploring Data Validation and Feedback Loop.

    When it comes to IDP systems, one of the key evaluation parameters is accuracy. Rather than depending only on the quality of the extraction process, IDP systems tap into external signals to improve accuracy. Data validation against an external source is one of many such signals.

    When you think of these signals, try to draw a parallel to how modern-day GPS location systems work. You may know that GPS systems measure the distance of the subject from three or more satellites and apply a technique called trilateration to find the point where those distances intersect. It is impossible to accurately pinpoint the location of the subject with a signal from just one satellite.

    To relate to this problem, stick out your arm, raise a finger, and close one eye. You will notice that with one eye closed, you lose your sense of distance. You cannot really tell how far away your finger is. Getting visual signals from both eyes gives you a true reading of depth. Similarly, GPS systems use three or more signals to accurately place the subject's location. Opening an IDP conversation with satellites is quite a stretch, but the point to note here is that more signals lead to higher accuracy. In the same way, data validation and feedback loops are techniques used by modern IDP systems to improve accuracy and thereby mature faster. An efficient data validation system can lift your IDP accuracy by 15 to 20%. Let's see how.

    Data Validation

    If IDP is the best option to automate data processing, what does data validation add to it? Data validation, as the name suggests, is the process of checking the extracted data on multiple points of accuracy, such as whether the right data is being extracted and whether the
  • 3. extracted data itself is accurate. A typical use case for data validation is exception handling, such as weeding out documents that are out of scope. For example, you may have a list of vendors and want to extract documents only from those vendors, or a receipt may be mixed in among the invoices you are processing and needs to be disregarded. If you run into these or similar cases, you need data validation.

    Let us look at a scenario for data validation. Imagine you are extracting information from a loan document. Borrowers have taken out loans from different banks, and you want to check each lender against the list of approved lenders or banks in your system to differentiate between approved and unapproved lenders. In this case, you implement data validation techniques in which the IDP system connects either to the third-party database through APIs or to a copy of the data in the IDP vendor's cloud system that is synced daily or periodically from the third-party database.

    Let me simplify this. You are extracting a loan document where the borrower has taken out a loan from Bank of America, and Bank of America is one of your approved lenders. With data validation, you can attach an identifier to it, for example by listing the lender as a lien-holder in the extraction results.

    Data validation is one of the key factors behind a significant increase in extraction accuracy, which means your IDP models mature quickly. Let me give you a ballpark figure. After analyzing the extraction results of our customers for the past few months, we have observed that Infrrd's data validation algorithms immediately raise accuracy by around 10 percentage points. It means that if the IDP system was providing 80% accuracy without data validation, it may give 90% accuracy or more with data validation.

    There are different types of validation. The most common ones are:

    Pattern-based validation: Here, the data is validated based on patterns. For example, the vehicle identification number (VIN), which is a unique identifier for a car, is a combination of digits and
  • 4. capital letters and usually consists of 17 characters. This number has a pattern: the first three characters identify the manufacturer, characters 4 to 8 may be alphanumeric and describe the vehicle, and so on. In this case, pattern-based data validation detects and corrects extraction errors in the VIN, including tricky ones, such as the digit 1 and the capital letter I being interchanged. Because standard 17-character VINs do not use the letters I, O, or Q, an extracted I can safely be corrected to 1.

    Dictionary-based validation: This is done against a set of data in the system. For example, you can verify that the extracted invoice approver name matches the name of an approver stored in the IDP system. In this case, dictionary-based validation detects and corrects the approver name.

    Context-based validation: This is used where the same value is relevant in two contexts. For example, you are extracting an insurance document in which the collision deductible and the comprehensive deductible always have the value 500. In such cases, the ML models may confuse the two contexts because the values are the same and may learn incorrectly, which can eventually lower accuracy. Context-based validation is the way to tell apart such different contexts that share similar or identical values.

    So, how do you implement data validation in IDP solutions? One of the key strategies is configuring business rules.

    Business Rules

    Modern IDP solutions mostly validate extracted data using business rules. Let us say you have an expense management system to process invoices. You are extracting relevant information from these invoices using an OCR system. In the initial stages, the extraction accuracy is not expected to be high. However, you have an agreement with your IDP vendor that an expected level of
  • 5. accuracy can be achieved in a specific timeframe. Now, how do you regularly measure the improvements in accuracy? You can do this by configuring business rules.

    Business rules can be configured in an IDP solution in two ways: either through customization from the backend or through the user interface. In modern IDP solutions, business rules are a high-value offering in the user interface, where you can configure them based on your requirements.

    Automated Accuracy Improvement

    Any correction performed by your data entry or correction user acts as an input to the system. Modern ML-based IDP systems automatically learn from these corrections so that the accuracy of future extractions is improved. The feedback loop brings the best results when corrections are integrated with extraction.

    When you extract data, a human in the loop (HITL) corrects the data that is extracted with low confidence. IDP solutions assign a confidence score while extracting data at a granular level, usually at the field level. So, each automatically extracted field has a confidence score assigned to it. You can decide which fields need correction based on the confidence score.

    Let us take an example. You are extracting the invoice number, merchant name, merchant address, and total amount from an invoice. In this case, you set a high confidence threshold for critical fields, such as the invoice number. If the invoice number is not extracted with high confidence, it is served to a human for correction.

    Some companies outsource corrections to manage costs. However,
  • 6. the chances are that they incur higher costs in the long run. Let us say you have an OCR system to extract data, but corrections are outsourced to a BPO team because it is cheaper or more convenient than employing data entry or correction users. What you miss out on here is a mature IDP system that, in the long term, can drastically reduce the correction effort.

    Infrrd's IDP solution has an integrated dashboard for performing corrections, where the feedback loop is automated. Infrrd offers patent-pending capabilities to ensure efficient and intelligent analysis of data before triggering a feedback loop.

    After Infrrd's IDP automatically extracts the data, one of two things happens, depending on the maturity of the models: either the document goes through straight-through processing, or it is served for correction. If some fields are extracted with low confidence, the corresponding documents are sent to queues for correction by a data entry user.
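The confidence-based routing described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Infrrd's implementation: the field names, the threshold values, and the `route_document` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, assumed to come from the extraction model


# Hypothetical thresholds: critical fields get a stricter bar than the default.
THRESHOLDS = {"invoice_number": 0.95, "total_amount": 0.90}
DEFAULT_THRESHOLD = 0.80


def fields_needing_review(fields):
    """Return the names of fields whose confidence is below their threshold."""
    return [
        f.name
        for f in fields
        if f.confidence < THRESHOLDS.get(f.name, DEFAULT_THRESHOLD)
    ]


def route_document(fields):
    """Pick straight-through processing or the correction queue for a document."""
    flagged = fields_needing_review(fields)
    if flagged:
        return "correction_queue", flagged
    return "straight_through", []


invoice = [
    ExtractedField("invoice_number", "INV-1042", 0.91),
    ExtractedField("merchant_name", "Acme Supplies", 0.88),
    ExtractedField("total_amount", "1250.00", 0.97),
]
print(route_document(invoice))  # ('correction_queue', ['invoice_number'])
```

Here the invoice number falls below its stricter threshold, so the whole document goes to the correction queue with that field flagged. In a real system, the corrected value would then be fed back to the model as training signal.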
  • 7. The queues are configured based on the confidence score assigned by the system during extraction. The corrections performed by the data entry user act as feedback for the system to learn from, and this improves accuracy in future extractions.

    There you go. Ensure that you choose a future-ready IDP solution to stay competitive. That means choosing an IDP solution that offers excellent extraction and classification features along with strong data validation and feedback loop capabilities to manage variations and inaccuracies efficiently.
  • 8. Here is a table that depicts the industry-relevant data validation and feedback loop features and Infrrd's capabilities:

    Feature                                 Infrrd's IDP
    Pattern-based validation                ✔
    Dictionary-based validation             ✔
    Context-based validation                ✔
    Business Rules Through Configuration    ✔
    Self Service Business Rules             On The Roadmap
    Automated Accuracy Improvements         ✔

    In our next post, we explore Gartner's description of Integration and how Infrrd stacks up.
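To make the pattern-based and dictionary-based validation types from this post more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how any IDP product works: the `validate_vin` and `validate_lender` helpers, the approved-lender list, and the similarity cutoff are all hypothetical. The VIN check relies only on the 17-character length and the fact that standard VINs do not use the letters I, O, or Q. It does not verify a check digit. Context-based validation is not covered here.

```python
import difflib
import re

# Standard 17-character VINs use digits and capital letters except I, O, and Q.
VIN_PATTERN = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")

# Assumed OCR confusions: letters that cannot appear in a VIN map to look-alike digits.
OCR_FIXES = str.maketrans({"I": "1", "O": "0", "Q": "0"})


def validate_vin(raw):
    """Pattern-based check: return a corrected VIN, or None if it still fails."""
    candidate = raw.strip().upper().translate(OCR_FIXES)
    return candidate if VIN_PATTERN.match(candidate) else None


# Hypothetical dictionary; in practice this might be synced from an external database.
APPROVED_LENDERS = ["Bank of America", "Wells Fargo", "Chase"]


def validate_lender(raw, cutoff=0.8):
    """Dictionary-based check: return the closest approved lender, or None."""
    matches = difflib.get_close_matches(
        raw.strip(), APPROVED_LENDERS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


print(validate_vin("IHGCM82633A004352"))        # 1HGCM82633A004352
print(validate_vin("1HGCM8263"))                # None (too short)
print(validate_lender("Bank of Arnerica"))      # Bank of America
print(validate_lender("Unknown Credit Union"))  # None
```

The fuzzy match uses the standard library's `difflib.get_close_matches`, so a lender name with a small extraction error still resolves to the approved entry, while an unrelated name is rejected and can be flagged as an unapproved lender.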