Inawisdom IDP

© 2023, Inawisdom Ltd.
INAWISDOM INTRODUCTION
INTELLIGENT DOCUMENT
PROCESSING
FEBRUARY 2023
Intelligent Document
Processing
© 2023 Cognizant | Private
November 2023

Cognizant’s UK&I specialist AWS Data & AI Team
INAWISDOM
Founded in 2016, an AWS Partner since 2017 and Premier
Partner since 2019. Inawisdom was acquired by Cognizant in
2020 and is part of Cognizant’s UK&I Consulting.
Inawisdom lives and breathes AWS, holding over 180
AWS certifications and accreditations. Inawisdom maintains a
close relationship with the AWS team, supporting and staying
up-to-date with all the latest developments.
Inawisdom has been awarded in the following
areas:
► ML Partner of the Year 2020
► Differentiation Partner of the Year 2019
► Global Launch Partner – CCI
► Launch Partner – AWS UAE Region
Inawisdom holds 9 competencies and service designations, reflecting
business-wide expertise in key areas:
Our Qualifications
All of our consultants hold at least 1
AWS certification, and some
consultants hold all certifications.
Our CTO was ranked #1 AWS
Ambassador in EMEA in 2021 and
2022.

Your Data, AI and Machine Learning partner
WHY COGNIZANT & INAWISDOM
We offer a rapid, proven path to Machine Learning excellence
We help customers in a broad range of industries achieve their ML goals
We are recognised, all-in AWS experts, focused on customer success
We offer full-stack services, including AI / ML, Data & Analytics, BI and MLOps

Inawisdom’s full-stack capability
OUR SERVICES
Business
Differentiation / Value
Data Driven
Business Decisions
Cloud Transformation
Adoption and Scale
Digital
Enablement
AI and Machine Learning
Data & Analytics
Data Foundations
Cloud Infrastructure
Landing Zone, Control Tower, migration

Discover. Deliver. Productionise. Scale.
ACCELERATE AI/ML ADOPTION ON AWS
[Diagram labels]
Stages: Discovery, Deploy, Productionise / Operationalise, Scaled AI
Steps: Differentiate, Innovate, Prove Value, Business Case, Deploy, Embed, Automate, Enable
Data Science
AI/ML
Data
Engineering
& Platform
DevOps &
MLOps
Cloud/Data
Architecture

Think big, start small, prove value
THE INAWISDOM APPROACH

Proprietary platform that enables secure and rapid data capture and analysis
RAMP – INAWISDOM’S CUTTING-EDGE TECHNOLOGY
Data Ingestion
Service Layer
Anomaly Detection
Entity Extraction
Natural Language Processing
Name matching
Exploitation Layer
Prediction Classification
Enterprise
Security
Streaming Batch
Monitoring
High
Availability
Automated
Deployment
Operational
System
Management
Dashboard
Standard BI
Discovery
Dashboard
Human
Structured, e.g. SQL
Unstructured, e.g. documents
Devices, e.g. IoT
Public open datasets
Public APIs, e.g. Twitter

IMPROVING DATA EXTRACTION WITH
INTELLIGENT DOCUMENT PROCESSING

“Intelligent Document Processing provides
decision support tools that make data extraction
faster, less error-prone and less subjective, by
reducing the amount of human intervention
required.
With the ability to automatically test results, the
entire end-to-end process can be scaled up
faster and in a more cost-effective way.”
WHY SHOULD MY BUSINESS USE IDP?

IDP – INAWISDOM APPROACH
The Human in the Loop
Human reviewers may sometimes want to override the extracted or interpreted data for perfectly valid business reasons. For example, the source document might be incorrect or out-of-date. Or they may simply need to correct an extraction error.
Our process makes it easy for humans to review the extracted data as needed.
This also helps improve the accuracy of the models over time, a key part of MLOps.
Once the accuracy is sufficiently high, the review process will evolve to the point of “sampling”, where only a subset of the extracted data is checked to maintain ongoing business confidence.
Having a robust MLOps pipeline in place allows this process to be highly automated.
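The move from full review to “sampling” can be sketched as a simple routing rule. This is only an illustration: the function name, the confidence threshold and the sample rate are assumptions made for the example, not values from the Inawisdom platform.

```python
import random


def needs_review(confidence, accuracy_target_met,
                 sample_rate=0.1, threshold=0.9, rng=random):
    """Decide whether an extracted data point goes to a human reviewer.

    All thresholds here are hypothetical and would be tuned per use case.
    """
    if not accuracy_target_met:
        # Early phase: every extraction is reviewed.
        return True
    if confidence < threshold:
        # Low-confidence extractions are always reviewed.
        return True
    # Sampling phase: only a random subset is checked.
    return rng.random() < sample_rate
```

Corrections captured during review can then be fed back as labelled training data, which is where the MLOps pipeline comes in.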

Inawisdom’s approach to IDP requires a range of capabilities
IDP PLATFORM

KEY BENEFITS
Drive Efficiency & Access Insights Faster
Reviewing multiple lengthy documents, extracting relevant data and storing it in the right systems is a time-consuming process and prone to mistakes. With Intelligent Document Processing, this can be done much more efficiently.
Not only does this result in cost savings, it also speeds up the entire value chain, making data available for further processing or business analytics much earlier on.
Increase Accuracy & Improve Processes
A second pair of eyes should ideally review the human decisions made when encoding the data, as mistakes can be expensive. IDP provides the second pair of eyes but flips the process around so that the encoder becomes the reviewer.
Over time, the results of the IDP can be used to further increase efficiency. For example, standardising input across documents will increase the model accuracy, reducing human effort.
The IDP process shines a light on data variability across sources and highlights outliers: insights that can improve the data collection process.

INAWISDOM EXPERIENCE IN BANKING,
FINANCIAL SERVICES & INSURANCE
IDP Client Case Focus

Acceleration in AI/ML adoption in the Insurance and Financial Services sector
AI/ML IN INSURANCE AND FINANCIAL SERVICES
Document Processing
Personalised User Experience
Chat Bots & Assistants
Next Best Action
Call Centre Optimisation
Fraud Detection
Risk Management
Credit Scoring
Customer Lifetime Value
Smart Claim Management
Churn Prediction
Price Optimisation
Regulation & Compliance
Sentiment Analysis
Debt Management & Prevention

CASE STUDY 1
IDP - From Document-led to a Data-driven Market-place
The Customer: International insurance and reinsurance group
The Sector: Financial Services
The Requirement: Revolutionise the approach to underwriting risk in specialty insurance, leveraging AI & Gen AI. The process from submission to quotation was largely manual: the complexity of domain-specific nomenclature and a huge volume of unstructured data made automated data extraction difficult and slow.
The Solution:
► Established an automated, scalable underwriting process to improve underwriters’ day-to-day operations and drive business growth
► Implemented an E2E automated workflow solution that accepts documents via email or upload, embedding a series of AI and Gen AI ML models to extract key data points (pricing/policies) from broker documents held in multiple formats (pdf, email, xls)
► Trained and deployed a series of fine-tuned ML models, including Gen AI LLMs targeted at domain-specific documents, to improve extraction accuracy
The Result:
► Use case feasibility proven in 4 weeks (PoV), production-grade solution in 5 months
► Average time to process c. 3 minutes, 540 times faster than the previous manual process
► Scale: 200,000+ documents processed per year
► Improved quality for risk writing (90%+ extraction accuracy on key data points)
► Productivity gains leading to a multi-million operational cost reduction on a yearly basis

CASE STUDY 2
Gen AI to support the Intake Process and CX
The Customer: BPO Service provider to Investment Banking and Legal
The Sector: Business Services
The Requirement: UC1: The client receives c. 20,000 job requests per year, mostly related to updating PPT documents. The intake process (receiving the request, assessing the complexity/effort of the task and allocating it to the right resource) is manual. Can Generative AI be leveraged to automate the intake process and eventually improve the submission experience?
The Solution:
► Performed EDA on the data received (25,000 job requests and a set of PPT documents)
► Applied Generative AI models to request text and synthetic attachments, in particular to accurately extract key instructions embedded in the PPT and summarize the list of instructions
► Trained an ML regression model to predict the effort estimate for a job request
The Result:
► Discovery work completed, Data Readiness completed, presented to CEO
► Target State solution & path to Production defined
► Next steps: Path to Production for this Use Case and define the Gen AI roadmap for 2024-2025 (beyond this Use Case)

CASE STUDY 3
Audio Transcription and text summarization powered by AI/ML
The Customer:
The Sector: Business Services
The Requirement: Automate the process of transcribing audio files at scale and with high accuracy to reduce process time and improve SLA. Include text summarization features as a new service to end customers.
The Solution:
► Implemented and deployed a secure, E2E, cloud-based, production-grade AWS solution to support this transcription use case
► The solution includes Transcription, Diarization and Dictation
► Bespoke (fine-tuned) ML models to transcribe audio files, including speaker detection
► Gen AI model to deliver text summarization
The Result:
► 92.5% accuracy (Word Error Rate, Diarization Error Rate)
► 6 mins average processing time (audio recordings from 30 min up to 4 hrs)
► $50 to $200 cost savings per hour of audio processed
► Customer experience improved with improved SLA. Revenue growth and margin uplift due to productivity gains

CASE STUDY 4
Emails triage – Accelerating lead processing in FSI
The Customer: Specialty insurer underwriting personal & commercial risk
The Sector: Financial Services
The Requirement: Can IDP / AI-ML be used to extract email content from leads sent by brokers and drive automation in the triage process? The current process is heavily manual and does not give management a comprehensive view of incoming leads and any missed opportunities.
The Solution:
► Created a scalable document processing solution to extract key data from emails sent by brokers
► Fine-tuned Large Language Models (LLMs / Gen AI) with Transfer Learning on AWS to extract and interpret industry-specific terminology
► Developed a user interface to allow the underwriting team to review and correct the extracted data points as needed
The Result:
► Accuracy rates of 80-90%; average processing time of 17 seconds
► Auditable history of submissions including content, edits, agents, contract values, priorities and processing times
► Full visibility for management of incoming leads, helping to adjust workforce allocation (Supply vs Demand) and target missed opportunities

CASE STUDY 5
Policies admin automation powered by AI/ML in FSI
The Customer: Leading provider of asset management & life insurance
The Sector: Financial Services
The Requirement: Improve and expand the existing IDP (Intelligent Document Processing) solution to enable key use cases, including accelerated processing of insurance documents.
The Solution:
► Conduct remediation activities to improve the existing IDP solution, implementing best practices for monitoring, scalability and integration
► Develop new classification and data extraction models to handle a variety of structured and unstructured Retail Annuities documents, including free-form customer letters and application forms
► Produce synthetic data using Generative AI to support training and testing of models, in place of sensitive customer data
► Provide ongoing support and management of the solution
The Result:
► Solution scalability and reliability improved; faster data extraction and improved accuracy, leading to a reduction in processing costs
► Several new use cases being explored to expand the current solution

CASE STUDY 6
Automating and improving invoicing process with AI/ML
The Customer:
The Sector: Business Services
The Requirement: The client needed a way to quickly analyse customer documents, namely Outside Counsel Guidelines (OCG) and invoices, to extract key information, highlight line items for review and reduce the likelihood that invoice submissions were rejected (each invoice contains thousands of line items, each of which must be quality checked).
The Solution:
► ML solution that (1) analyses documents and automates the extraction of business rules from OCG documents and (2) uses a trained classification model to detect potential errors in invoice line items and categorize them by the primary reason for rejection
► Leveraged Generative AI (GPT-3) to generate synthetic data for improved training and testing
► Built a robust QA process and audit trail to ensure consistency and transparency
The Result:
► Document extraction accuracy rates of 75-97% across both use cases
► 20% reduction in processing times
► Yearly labour cost savings of approximately $1.5m. Significant revenue growth expected

CASE STUDY 7
Audio processing - Intelligent Customer Call Analysis
The Customer: A leading Utilities water company
The Sector: Utilities
The Requirement: Intelligent Customer Call Analysis to get more from call recording information, helping to diagnose customers' requirements more efficiently
The Solution:
► Used the Discovery-as-a-Service method to rapidly progress from problem to solution
► Deployed AWS Contact Center Intelligence (CCI), creating a post-call analytics solution that uses Amazon Transcribe to convert thousands of customer call recordings from voice to text
► Applied advanced ML models, including deep learning/LLM models, to identify root causes of caller enquiries
The Result:
► Improved Accuracy: better ability to identify specific issues and their root causes through full capture of the customer call
► Enhanced Customer Service and Improved Operations: enabled more accurate scheduling of on-site engineers to solve problems first time
► Innovative Call Analysis: further business insight opportunities identified through intelligent ML modelling

IDP combines the capabilities provided by AWS Services into a business outcome
IDP SOLUTION ARCHITECTURE

INGESTION AND
PREPROCESSING
AWS Lambda
AWS Step Functions
Amazon S3
Amazon EventBridge

Capturing Meta Information and performing file conversion
INGESTION
AWS Lambda
PACK  FILENAME  NAME                VALUE
1     Slip 1    S3 Location         S3://XXXXX
1     Slip 1    Date Time Captured  XXXX/XX/XX XX:XX
1     Slip 1    Business Unit       XXXXXX
1     Slip 1    Status              PENDING
1     Slip 1    Email Address       XXX@xx.com
1     Slip 1    Email Received      XXXX/XX/XX XX:XX
1     Slip 1    Type Of Doc         RENEWAL_SLIP
Amazon DynamoDB
Amazon S3
AWS Step Functions
Amazon
EventBridge
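The meta-information table above can be read as one NAME/VALUE row per attribute of a file in a pack. A minimal sketch of how an ingestion Lambda might assemble those rows follows; the function name and argument names are hypothetical, and persisting the rows to DynamoDB is deliberately left out.

```python
from datetime import datetime, timezone


def build_metadata_items(pack, filename, s3_location, business_unit,
                         email_address, email_received, doc_type,
                         captured_at=None):
    """Return one NAME/VALUE row per meta attribute of an ingested file.

    Hypothetical helper: the row layout mirrors the table on the slide,
    but the real schema is not shown in the source.
    """
    captured_at = captured_at or datetime.now(timezone.utc)
    values = {
        "S3 Location": s3_location,
        "Date Time Captured": captured_at.strftime("%Y/%m/%d %H:%M"),
        "Business Unit": business_unit,
        "Status": "PENDING",  # every new file starts as pending
        "Email Address": email_address,
        "Email Received": email_received,
        "Type Of Doc": doc_type,
    }
    return [
        {"PACK": pack, "FILENAME": filename, "NAME": name, "VALUE": value}
        for name, value in values.items()
    ]
```

Storing each attribute as its own row keeps the schema open: new attributes can be added per document type without changing the table definition.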

CLASSIFICATION
& EXTRACTION
Amazon
Textract
Amazon
Comprehend
(Classification)
Hugging Face
Amazon
SageMaker

Using custom classification to determine how to process a
pack and the documents within it
EXTRACTION AND CLASSIFICATION
AWS Step
Functions
Amazon S3
Amazon
Textract
Amazon S3
Amazon
Comprehend
(Classification – Multi Label)
Amazon S3 Amazon S3
rawText.txt
Original PDF
AWS Lambda AWS Lambda
combinedText.txt
Amazon DynamoDB
output.tar.gz containing
output.json
Line of business + doc type, e.g. Slip
Line-per-doc format allows the whole pack to be processed in one Comprehend job
10 examples needed per label/class
Attention to metrics is required
1 2 3 4
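The line-per-doc idea can be sketched as follows: each document's raw text is flattened to a single line, so that a whole pack becomes one combined text file with one document per line. The function name is hypothetical, and how the combined file is submitted to Amazon Comprehend is not shown.

```python
def to_line_per_doc(doc_texts):
    """Flatten each document to one line and join the pack into one string.

    Whitespace (including newlines and tabs) inside a document is collapsed
    to single spaces, so line N of the result is always document N.
    """
    lines = []
    for text in doc_texts:
        lines.append(" ".join(text.split()))
    return "\n".join(lines)
```

An empty document still produces an (empty) line, which keeps the line index aligned with the position of the document in the pack.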

Advanced OCR capabilities
AMAZON TEXTRACT
Analyze Document:
• Pretrained on millions of
documents
Analyze Expense:
• Focused on invoices and
receipts.
Analyze ID:
• Focused on identity documents
such as U.S. passports and
driver’s licenses
Analyze Lending:
• Focused on mortgages

Multiple Extraction Types
AMAZON TEXTRACT
Queries
Forms
Table

Analyze Document
AMAZON TEXTRACT
Output:
• Tables as CSV
• Raw as text
• Key Values in Tables
• JSON Output that is very verbose
JSON Output needs post processing:
• Document is split per page even
when a table spans more than one page
• Headers and footers get in the way
when tables are split over multiple
pages
• Output contains Bounding Boxes +
X/Y locations
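One post-processing step hinted at above is using the bounding boxes to drop headers and footers before stitching pages back together. The sketch below works on the `Blocks` list of a Textract response (using the `BlockType`, `Page`, `Text` and `Geometry.BoundingBox` fields); the cut-off values and the function name are assumptions for illustration and would need tuning per document type.

```python
def page_lines(blocks, header_cutoff=0.08, footer_cutoff=0.92):
    """Group LINE blocks by page, dropping likely headers and footers.

    Bounding box coordinates are ratios of the page size (0.0 to 1.0).
    The cut-offs are hypothetical defaults, not recommended values.
    """
    pages = {}
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        box = block["Geometry"]["BoundingBox"]
        top = box["Top"]
        bottom = top + box["Height"]
        # Skip text that sits entirely in the header or footer band.
        if bottom <= header_cutoff or top >= footer_cutoff:
            continue
        page = block.get("Page", 1)
        pages.setdefault(page, []).append((top, box["Left"], block["Text"]))
    # Restore reading order: top to bottom, then left to right.
    return {
        page: [text for _, _, text in sorted(items)]
        for page, items in sorted(pages.items())
    }
```

With headers and footers removed, rows from a table that continues on the next page can be concatenated without the repeated page furniture in between.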

Problem 1: Formatting of Data Points
AMAZON TEXTRACT
Query
Form
Queries
Forms

Solution 1: Formatting of Data Points
AMAZON TEXTRACT
PACK  FILENAME   NAME                VALUE
1     Invoice 1  S3 Location         S3://XXXXX
1     Invoice 1  Date Time Captured  XXXX/XX/XX XX:XX
1     Invoice 1  Business Unit       XXXXXX
1     Invoice 1  Status              PENDING
1     Invoice 1  Email Address       billing@amazon.com
1     Invoice 1  Email Received      XXXX/XX/XX XX:XX
1     Invoice 1  Type of Doc         AWS INVOICE
The classification is used to know which queries to ask and how to understand the results,
e.g. we could infer that Amazon uses MM-DD-YYYY for dates
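A minimal sketch of using the classification to interpret a data point: the document type selects the date format used to parse the extracted value. The `AWS INVOICE` format follows the MM-DD-YYYY example above; the `RENEWAL_SLIP` format and the names in the code are assumptions for illustration only.

```python
from datetime import datetime

# Date format per classified document type.
# "RENEWAL_SLIP" is a hypothetical entry added for the example.
DATE_FORMATS = {
    "AWS INVOICE": "%m-%d-%Y",
    "RENEWAL_SLIP": "%d/%m/%Y",
}


def normalise_date(raw, doc_type):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if it cannot be parsed."""
    fmt = DATE_FORMATS.get(doc_type)
    if fmt is None:
        return None  # unknown document type: leave for human review
    try:
        return datetime.strptime(raw.strip(), fmt).date().isoformat()
    except ValueError:
        return None  # value does not match the expected format
```

The same string, such as 03-04-2023, means a different day depending on the source, which is why the format is keyed on the classification rather than guessed from the value.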

Problem 2: Context loss in Forms
AMAZON TEXTRACT
Solution 2: LayoutLM via Hugging Face
Textract Form
• Hugging Face and Amazon are
partners
• Multiple Amazon SageMaker
Ground Truth jobs are used for
labelling, and a Lambda is used to
collate/merge the manifests
• Model training on labelled data is
done in Amazon SageMaker and the
model is hosted via an endpoint
• Hugging Face have approved
Docker images that do all the
boilerplate work for you

UNDERSTANDING
& ENRICHMENT
Amazon
SageMaker
Ground Truth
Amazon
Comprehend
(NER)
Amazon
Comprehend
(Custom NER)

Textract Raw
Named Entity Recognition
AMAZON COMPREHEND
Seattle WA 98109-5210
US
Billing Period: Aug 1 . Aug 31, 2010
Amazon Simple Notification Service
$ 0.00
Amazon Elastic Compute Cloud
$ 85.13
Amazon Simple Queue Service

Problem 3: Too generic!
AMAZON COMPREHEND

Solution 3: Customize Pre-Trained Large Language Models with Transfer Learning
CUSTOM MACHINE LEARNING MODELS
[Diagram, built up over several slides]
Pre-training sources: Wikipedia, Books, News, Other Sources, …
AWS Comprehend named entities: DATE, PERSON, ORGANIZATION, OTHER, …
Customized AWS Comprehend: trained on Source Docs, new connections are learned
Customized named entities: DATE, PERSON, ORGANIZATION, LAYER, OTHER, …

POSTPROCESSING
AWS Lambda
Amazon
DynamoDB
Amazon RDS

Problem 4:
Data Points that are the same thing but called different things
POST PROCESSING

Problem 5:
More than one Data Point to consider
POST PROCESSING

Problem 6:
Missing Data Points to consider
POST PROCESSING

Solution 4 + 5 + 6:
Precedence stored in DynamoDB and processed in Lambda
POST PROCESSING
AWS Lambda
Amazon DynamoDB
Rule Set Rule
Slip Renewal Date {}
Slip Insured Name {}
Invoice Invoice Id {}
Invoice Total Amount {}
Invoice Supplier {}
Amazon DynamoDB
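The three post-processing problems (same thing under different names, more than one candidate, missing values) can be handled by one precedence lookup. The sketch below keeps the rules in plain dictionaries to stay self-contained; in the architecture above they would be read from DynamoDB and applied in Lambda. The alias names, source names and rule contents are all hypothetical, since the slide shows the rules only as `{}`.

```python
# Problem 4: different labels that mean the same data point.
ALIASES = {
    "insured": "Insured Name",
    "insured name": "Insured Name",
    "policyholder": "Insured Name",
    "renewal date": "Renewal Date",
}

# Problem 5: which extraction source wins when several give a value.
PRECEDENCE = {
    "Insured Name": ["textract_query", "custom_ner", "textract_form"],
    "Renewal Date": ["textract_query", "textract_form"],
}


def resolve(candidates, aliases=ALIASES, precedence=PRECEDENCE):
    """Pick one value per data point from (source, label, value) candidates."""
    by_point = {}
    for source, label, value in candidates:
        point = aliases.get(label.strip().lower())
        if point is None or not value:
            continue  # unknown label or empty value
        # Keep the first value seen for each source.
        by_point.setdefault(point, {}).setdefault(source, value)
    resolved = {}
    for point, sources in precedence.items():
        found = by_point.get(point, {})
        # Problem 6: a missing data point is reported explicitly as None.
        resolved[point] = next((found[s] for s in sources if s in found), None)
    return resolved
```

Keeping the rules as data rather than code means precedence can be changed per rule set (Slip, Invoice) without redeploying the function.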

Problem 7: Different values in different documents
for the same thing
ENRICHMENT
Mrs Elizabeth Bloggs
Ms E Bloggs
Lizz Bloggs
Mr & Mrs J Bloggs
Joe Bloggs
Joseph Bloggs
Mr J Bloggs
Policy Number :
3355443333-wdd-7
Miss E Jones
Solution 7: Using Existing Data
Amazon RDS
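A rough sketch of matching extracted names against existing customer records follows. It compares surname and first initial after stripping titles; the in-memory list stands in for a lookup against existing data in Amazon RDS, and the matching rule itself is an assumption for illustration, not the method used in the platform.

```python
import re

TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def name_key(name):
    """Reduce a name to (surname, first initial); the initial may be empty."""
    tokens = [t for t in re.findall(r"[a-z]+", name.lower()) if t not in TITLES]
    if not tokens:
        return None
    surname = tokens[-1]
    initial = tokens[0][0] if len(tokens) > 1 else ""
    return surname, initial


def match_customer(extracted, known_customers):
    """Return the known customers whose surname and initial are compatible."""
    key = name_key(extracted)
    if key is None:
        return []
    surname, initial = key
    matches = []
    for customer in known_customers:
        known = name_key(customer)
        if known is None or known[0] != surname:
            continue
        # An absent initial on either side does not rule a match out.
        if not initial or not known[1] or known[1] == initial:
            matches.append(customer)
    return matches
```

Nicknames such as "Lizz" do not match "Elizabeth" under this rule, which is one reason to bring in further existing data (for example the policy number) rather than rely on the name alone.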

Columba House,
Adastral Park, Martlesham Heath
Ipswich, Suffolk, IP5 3RE
www.inawisdom.com
@inawisdom Inawisdom
info@Inawisdom.com
+44 20 8133 8349
Thank you
Phil Basford
Senior Director – Consulting / Inawisdom CTO
Philip.basford@cognizant.com
