Understanding IDP: Document Classification

According to Gartner, "The market for document capture, extraction
and processing is highly fragmented. Data and analytics leaders
should use this research to understand the process flow and
differentiated capabilities offered by intelligent document processing
solutions." Gartner's recently released "Infographic: Understand
Intelligent Document Processing" covers these six critical flows in
IDP:
1. Capture or Ingestion
2. Document Preprocessing
3. Document Classification
4. Data Extraction
5. Validation and Feedback Loop
6. Integration
In this second post in our "Understanding IDP" series we
explore Document Classification. (Check out our first
post featuring Capture or Ingestion and Document Preprocessing.)

IDP is becoming essential for businesses that want to automate and
scale competitively. Its value depends on how efficiently and
accurately data is extracted from your legacy, semi-structured,
unstructured, or multi-variation documents. Before any data is
extracted, a key but complex step is document classification:
indexing, detecting, and classifying the different document types.
Why classify?
In today’s digital world, businesses are transforming rapidly with
technology to stay competitive. This means that a large volume of
data and documents must be processed and classified, and
unstructured document data amplifies the challenge.
Before touching upon Infrrd’s deep learning-based and industry-
leading classification features, let us look at the business use cases
or challenges in document classification and how an IDP solution
can be a game-changer in this space.
Let’s consider a prospective customer applying for a loan with your
mortgage company. Here, a lot of information is exchanged
between the borrower and the company, such as W-2 forms, bank
statements, and ID cards. There are several ways to collect this
information: the borrower may be asked to send an email or to
upload the documents to a Web portal. Because your mortgage
company receives different types of documents, most of them not
routed or labeled in any defined way, the first step is to identify
the type of each document received. If you are in the mortgage
industry, you know how hard it is for a loan officer to organize
these documents accurately and efficiently, not to mention the
errors that can creep into the process. This is where an IDP
solution offers excellent ROI: intelligent automation classifies
document types automatically, saving time and improving accuracy.

Another challenge for the mortgage industry is the loan closing
package. When the loan is approved, the company sends a loan
closing package, a set of documents, such as the completed loan
application, home title, and other mortgage documents that
borrowers sign to finalize the loan processing. The volume of
documents to process in loan closing packages can run up to
hundreds and even thousands of pages. So, you can imagine the
complexities and time spent by loan servicers involved in this
process.
Similar to the mortgage industry, any sector where a large volume
of documents is processed is a perfect domain for IDP solutions.
Intelligent Classification
Given the complexity of these challenges, let us see what Infrrd’s
IDP systems offer. Infrrd’s classification features are based on a
combination of
AI technologies, such as deep learning and NLP, and proprietary
machine learning algorithms. We call it Intelligent Classification.
Using Infrrd’s IDP, you can create your own classification models
and map each document type to specific extraction models.
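To illustrate the idea of mapping each document type to its own extraction model, here is a minimal Python sketch. It is not Infrrd's API: the extractor functions, the EXTRACTORS table, and the route function are hypothetical names, and the "models" are trivial stand-ins.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for trained extraction models (assumption,
# not Infrrd's API). Each takes document text and returns fields.
def extract_invoice(text: str) -> dict:
    return {"model": "invoice", "chars": len(text)}

def extract_w2(text: str) -> dict:
    return {"model": "w2", "chars": len(text)}

# One extraction model per classified document type.
EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    "invoice": extract_invoice,
    "w2": extract_w2,
}

def route(doc_type: str, text: str) -> dict:
    """Send a classified document to the extractor mapped to its type."""
    extractor = EXTRACTORS.get(doc_type)
    if extractor is None:
        # Unknown types are flagged for a person instead of guessed at.
        return {"model": None, "needs_review": True}
    return extractor(text)
```

In this sketch, supporting a new document type means adding one entry to the table; the routing logic itself does not change.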
In today’s IDP space, classification does not just detect or identify
the type of content in a document and categorize it but does more
to achieve intelligent classification. What does that mean? Let’s say
a borrower who applied for a loan submits W-2 forms for the
previous two years. What you need is the W-2 forms not for any two
years but for the two immediately preceding years. This is where
Intelligent Classification plays a major role. It goes deeper and
enables you to classify the documents based on the dates, or any
other data, in the document.
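The W-2 example can be sketched as a simple check in Python. This is an illustration under stated assumptions, not Infrrd's logic: it assumes the tax year of each W-2 has already been read from the document, and that "immediately preceding" means the two calendar years before the application date. The function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Set

def required_w2_years(application_date: date, count: int = 2) -> Set[int]:
    """Tax years immediately preceding the application year (assumed rule)."""
    return {application_date.year - i for i in range(1, count + 1)}

def missing_w2_years(submitted_years: Iterable[int],
                     application_date: date) -> Set[int]:
    """Required tax years for which no W-2 was submitted."""
    return required_w2_years(application_date) - set(submitted_years)
```

For an application dated in 2022, W-2 forms for 2021 and 2019 would leave 2020 missing, even though two W-2 forms were submitted.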

Classification types
Our classification models support multi-language processing and
address diverse business scenarios, including document
classification and page classification.
1. Document Classification
Infrrd has a built-in, out-of-the-box, computer vision-based
Document Classification model to classify various types of
documents. Consider that you have 100 documents, 60 of which
are invoices and 40 are receipts. All you have to do is zip those
documents and upload them to our Document Classification model.
The Infrrd system will recognize the various document types and
categorize them for you sooner than you think.

2. Page Classification
Page Classification is an Infrrd offering that addresses a
challenge many businesses face. In practice, different documents
often arrive in a single file. In these cases, the file has to be
split page by page according to document type, which calls for a
different approach to classification. For example, suppose you have
a 100-page unstructured document in which legacy invoices and
receipts are scattered throughout, making it a daunting task to
make sense of it. You just have to upload the document to our
Intelligent Page Classification model, and the rest is taken care
of for you.
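As a rough illustration of splitting a single file by page type, the Python sketch below groups consecutive pages that share a label. It assumes each page has already been assigned a document type by some classifier; the function name and the labels are hypothetical, and this is not a description of Infrrd's model.

```python
from itertools import groupby
from typing import List, Tuple

def split_by_page_type(page_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Turn per-page labels into (type, first_page, last_page) segments.

    Pages are numbered from 1. Consecutive pages with the same label
    are merged into one segment.
    """
    segments = []
    page = 1
    for label, run in groupby(page_labels):
        length = len(list(run))
        segments.append((label, page, page + length - 1))
        page += length
    return segments
```

A limitation of this simple grouping is that two different invoices on adjacent pages would be merged into one segment, because only the page type is compared.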

Infrrd’s Patent-Pending Page Continuation
Before we conclude, let me touch upon the Page Continuation
feature. Page Continuation, a patent-pending Infrrd feature, is a
unique capability of the Page Classification model in which Infrrd’s
proprietary machine learning algorithms tell apart similar documents
that are stacked together. For example, in your 100-page document,
pages 12 to 15 contain three monthly bank statements from the same
bank, say Bank of America. You may need to verify whether the bank
statements are recent, or you may want to distinguish them based on
other parameters. Our Page Continuation feature uses proprietary
logic to separate the bank statement for each month even though the
document type is the same.
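The bank statement example can be sketched in Python as follows. Infrrd's Page Continuation logic is proprietary and is not described here, so this is only a toy stand-in: it assumes each page comes with a document type and, where one could be read, a statement period such as "2022-01". A page with no period is treated as a continuation of the previous page. All names are hypothetical.

```python
from typing import List, Optional, Tuple

# (document type, statement period such as "2022-01", or None if the
# period could not be read from that page)
Page = Tuple[str, Optional[str]]

def split_with_continuation(pages: List[Page]) -> List[dict]:
    """Split same-type pages into separate documents by statement period."""
    segments: List[dict] = []
    for number, (doc_type, period) in enumerate(pages, start=1):
        last = segments[-1] if segments else None
        continues = (
            last is not None
            and last["type"] == doc_type
            and (period is None or last["period"] in (None, period))
        )
        if continues:
            last["end"] = number
            if last["period"] is None:
                last["period"] = period
        else:
            segments.append({"type": doc_type, "period": period,
                             "start": number, "end": number})
    return segments
```

Given four bank statement pages where the second page carries no period, the sketch yields three statements: one spanning two pages and two single-page statements.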
The Page Continuation feature can drastically reduce manual
effort, cutting the hundreds or even thousands of hours that you
may otherwise have had to invest in detailed analysis of classified
documents. That makes this IDP feature a high-value proposition for
your business.
Now, let’s take a look at a common pitfall when choosing an IDP
solution. We have heard from our customers that they initially
chose vendors that provided 50% to 60% classification accuracy
because it brought some level of automation. However, they quickly
realized that this partial solution limited their productivity. It
makes sense to choose an IDP solution that provides Intelligent
Classification with an accuracy of 90% or more to remain
competitive.
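To put these accuracy figures in perspective, here is a small worked calculation in Python. It assumes, for illustration only, that every misclassified document has to be corrected by hand; the batch size of 1,000 is an arbitrary example, not customer data.

```python
def misclassified(total_documents: int, accuracy_percent: int) -> int:
    """Documents left for manual correction at a given accuracy."""
    return round(total_documents * (100 - accuracy_percent) / 100)

batch = 1000  # arbitrary example batch size
at_60 = misclassified(batch, 60)  # 400 documents to fix by hand
at_90 = misclassified(batch, 90)  # 100 documents to fix by hand
```

Under that assumption, moving from 60% to 90% accuracy cuts the manual correction work for this batch from 400 documents to 100.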
Use case
Your business may have to evolve constantly to stay competitive,
which means frequent changes to your document processing workflow.
Infrrd’s classification approach is beneficial because our
classification models recognize and easily integrate with trained
extraction models, i.e., trained document types. You need to train
or supervise only the new data sets. Let’s say you want to classify
two document types: invoices and loan documents. If you have
already trained an extraction model for invoices, additional
training or supervision may not be required during classification;
you need to train only the data set for the new document type, the
loan document.
Moreover, Infrrd’s ML-first, API-driven IDP solution enables you to
group multiple classification and extraction models to create a new
model. In a nutshell, Infrrd’s classification models are tightly
integrated with existing extraction models to offer you flexibility,
accuracy, and versatility in managing rapid, constant changes in
your business or document-processing workflows.
Choosing the right IDP partner keeps you competitive and
eliminates a myriad of pitfalls. During your IDP selection process,
we recommend you add Intelligent Classification to your evaluation
checkpoints.
Be sure to check out our next post, where we explore Gartner’s
description of the fourth critical flow, Data Extraction, and see how
Infrrd stacks up.
