AWS - Cloud Native
MuleSoft Meetup
11-May-2024
Unveiling the Heart of MuleSoft Intelligent Document Processing - AWS Textract
Organizers
Shubham Chaurasia
Billennium India
Integration Developer
/in/shubhamchaurasia1
Robin Sinha
Specialist II
/in/robin-sinha
Speakers
• Sr. Integration Specialist
• Working in LTIMindTree
• 5+ Years of Experience
• MuleSoft Certified
• IBM Certified
PRIYA SHAW
Speakers
• One of the Youngest MuleSoft Ambassadors
• Integration Developer in Billennium India
• 4+ years of Experience in Integrations
• MCIA, MCD L1, MCPA, MCHS Certified
• Salesforce, AWS, GCP, Azure, Workato Certified
• MuleSoft Meetup Leader | Mentor | Blogger | Speaker
• AWS Community Builder
• https://www.linkedin.com/in/shubhamchaurasia1/
• https://medium.com/@myid535
Shubham Chaurasia
AGENDA
● Introduction to AWS
● AWS Textract
● Document Processing using AWS Textract
● AWS Textract Use Cases
● Demo - Textract Integration with MuleSoft
● QnA
● Networking time
AWS Top Product Categories
Need for processing documents
Search & Discovery
Compliance and control
Business process automation
How documents are processed today
Manual Processing
Optical character recognition (OCR)
Rules and template-based extraction
Conventional Technologies Fall Short
Structured Data Dependencies
Flexibility Constraints
Lack of Data Validation
Integration with Downstream Systems
AI Innovation Challenge
Accuracy Concerns
Optical Character Recognition (OCR)
Custom Trained AI Models
Challenges for processing documents
Expensive, error-prone, time-consuming
Why You Need A Scalable Document Processing Solution
• Processing a substantial volume and variety of documents demands significant human hours.
• Current technology struggles to work with evolving tech stacks and AI solutions, hindering the delivery of complete end-to-end document automation.
• The makeshift, point-to-point solution is inefficient and yields a non-scalable ROI.
Intelligent Document Processing
Extract and structure data faster in any format with AI-powered accuracy
• Maximize efficiency, minimize costs: Automate the entire document lifecycle and incorporate manual review when necessary.
• Automate complete workflows beyond documents: Seamlessly integrate extracted data with downstream systems for a fully streamlined workflow.
• Accelerate time to market: Jumpstart with pre-trained models designed for common document types.
Apr 2024 GA
AWS Textract
AWS Textract
• Detect typed and handwritten text across a variety of documents.
• Extract structured data such as text, forms, and tables using the Amazon Textract AnalyzeDocument API.
• Use Queries to specify and extract information from documents with the AnalyzeDocument API.
• Analyze expenses from invoices and receipts with the AnalyzeExpense API.
• Process U.S. government-issued ID documents such as driver's licenses and passports using the AnalyzeID API.
• Automate routing and analysis of mortgage loan packages with the Analyze Lending workflow.
• Customize Queries with your own data for tailored processing needs.
Amazon Textract features
Text extraction, table extraction, form extraction
Features
Choose one of the following features based on your use case.
• DetectDocumentText - OCR
Extracts raw text.
• AnalyzeDocument - Tables
Extracts all tables and table cells in the document.
• AnalyzeDocument - Queries
Extracts document data based on custom queries.
• AnalyzeDocument - Forms
Extracts all key-value pairs in the document.
• AnalyzeDocument - Signatures
Extracts signatures from documents.
• AnalyzeDocument - Layout
Extracts titles, paragraphs, headers, section headers, lists, page numbers, footers, table areas, key-value areas, and figure areas.
• AnalyzeID
Extracts information from identity documents.
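As a rough illustration of how the features above map onto API parameters, the sketch below builds the keyword arguments for a boto3 `analyze_document` call. The helper name `build_analyze_request`, the bucket, and the object key are hypothetical and not part of the original slides; the `FeatureTypes` values and the `QueriesConfig` shape follow the public AnalyzeDocument API as we understand it. No AWS call is made here.

```python
# Feature names from the slide mapped to AnalyzeDocument FeatureTypes values.
FEATURE_TYPES = {
    "tables": "TABLES",
    "forms": "FORMS",
    "queries": "QUERIES",
    "signatures": "SIGNATURES",
    "layout": "LAYOUT",
}


def build_analyze_request(bucket, key, features, queries=None):
    """Return kwargs for an analyze_document call (hypothetical helper)."""
    feature_types = [FEATURE_TYPES[name.lower()] for name in features]
    request = {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": feature_types,
    }
    if "QUERIES" in feature_types:
        # The Queries feature needs the questions to ask of the document.
        if not queries:
            raise ValueError("the Queries feature needs at least one query")
        request["QueriesConfig"] = {"Queries": [{"Text": q} for q in queries]}
    return request
```

Assuming boto3 is installed and credentials are configured, the result could be passed on as `boto3.client("textract").analyze_document(**request)`.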
AWS Textract Use Cases
• Intelligent search indexes: Detect text in image and PDF files.
• Natural Language Processing: Control text grouping and extraction, including word, line, and table cell extraction.
• Accelerate data capture and normalization: Capture data from diverse sources such as financial documents and medical notes.
• Automate data extraction: Integrate APIs into existing workflows for structured data extraction.
AWS Textract Customers
Black Knight drives efficiency and delivers cost savings
Elevance Health automated classification of attachments for claims by 90%
Paytm achieved cost savings of up to 75% with Amazon Textract
Amazon Textract: Sync and async
Synchronous: Supports single-page documents such as images (e.g., mobile capture).
Flow: Document -> Amazon Textract -> Get results
Asynchronous: For multi-page documents, up to 3,000 pages.
Flow: Document -> Amazon Textract -> Notification -> Get results
With synchronous processing, Amazon Textract can analyze single-page documents for applications where latency is critical. Amazon Textract also provides asynchronous operations to extend support to multi-page documents.
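The two modes can be sketched roughly as follows. This is a minimal sketch, not the demo code from the session: `detect_text` and its parameters are hypothetical, `client` is assumed to behave like `boto3.client("textract")`, and simple polling stands in for the notification step shown on the slide (which would normally be an SNS topic).

```python
import time


def detect_text(client, bucket, key, multi_page=False, poll_seconds=2.0, max_polls=30):
    """Return all Blocks for a document stored in S3 (hypothetical helper)."""
    location = {"S3Object": {"Bucket": bucket, "Name": key}}

    if not multi_page:
        # Synchronous: one request, results come back in the response.
        return client.detect_document_text(Document=location)["Blocks"]

    # Asynchronous: start a job, then fetch results once it has finished.
    job_id = client.start_document_text_detection(DocumentLocation=location)["JobId"]
    for _ in range(max_polls):
        page = client.get_document_text_detection(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} is still in progress")

    # Anything other than SUCCEEDED is treated as a failure in this sketch.
    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"job {job_id} ended with status {page['JobStatus']}")

    blocks = list(page.get("Blocks", []))
    # Large results are paginated; follow NextToken until it is exhausted.
    while page.get("NextToken"):
        page = client.get_document_text_detection(JobId=job_id, NextToken=page["NextToken"])
        blocks.extend(page.get("Blocks", []))
    return blocks
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, and in production the polling loop could be replaced by a handler triggered from the completion notification.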
Architecture - Form capture
Input: A customer uses a mobile app to capture a photo of an employment application form.
Customer Application: Customers experience real-time capture of their information by taking a photo instead of manual data entry.
Amazon Textract: The API is integrated into the end-user application to automatically extract text from the form and auto-populate the form fields.
Database: User-submitted data is loaded into a database.
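The auto-populate step in this architecture could look roughly like the sketch below, which turns a Forms-style response into a plain dictionary of field names and values. The function names are hypothetical; the block layout (`KEY_VALUE_SET` blocks linked through `VALUE` and `CHILD` relationships) reflects the AnalyzeDocument response format as we understand it, and the sketch works on an already-fetched response rather than calling AWS.

```python
def _child_text(block, blocks_by_id):
    """Join the text of a block's CHILD blocks (words or checkboxes)."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def form_fields(response):
    """Map each detected form key to its value text (hypothetical helper)."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key = _child_text(block, blocks_by_id)
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # Each VALUE id points at a block whose children hold the text.
                for value_id in rel["Ids"]:
                    values.append(_child_text(blocks_by_id[value_id], blocks_by_id))
        fields[key] = " ".join(values)
    return fields
```

The resulting dictionary is what an application would use to fill its form fields before the user confirms and the data is written to the database.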
Amazon Textract: Text extraction
Blocks: PAGE, LINE, WORD
Figure: a sample Document and its Output, with labels Word and Line1
Text extraction API: DetectDocumentText
Request
Name           Description
Document       Blob or Amazon S3 object

Response
Name           Description
Blocks         List of blocks identified from the document
ID             Unique ID of the unit
Relationships  CHILD
Block type     PAGE, LINE, WORD
Pages          Contains number of pages in the document
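To make the request and response shapes concrete, here is a small sketch that builds the `Document` parameter and then walks the returned blocks. The helper names are hypothetical; the field names (`Bytes`, `S3Object`, `Blocks`, `BlockType`, `Relationships`) follow the DetectDocumentText API as we understand it, where the page count is reported under `DocumentMetadata`. The code only handles dictionaries and makes no AWS call.

```python
def build_detect_request(image_bytes=None, bucket=None, key=None):
    """Build the Document parameter from raw bytes or an S3 location."""
    if image_bytes is not None:
        return {"Document": {"Bytes": image_bytes}}
    if bucket and key:
        return {"Document": {"S3Object": {"Bucket": bucket, "Name": key}}}
    raise ValueError("provide image_bytes, or both bucket and key")


def lines_with_words(response):
    """Return (line text, [word texts]) pairs by following CHILD relationships."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    result = []
    for block in response["Blocks"]:
        if block["BlockType"] != "LINE":
            continue
        # A LINE block lists the ids of its WORD blocks as CHILD relationships.
        word_ids = [
            child_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for child_id in rel["Ids"]
        ]
        words = [blocks_by_id[i]["Text"] for i in word_ids]
        result.append((block["Text"], words))
    return result
```

With boto3 available, the request could be sent as `boto3.client("textract").detect_document_text(**build_detect_request(...))` and the response handed to `lines_with_words`.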
Demo Textract API
Networking time
Thank You

MuleSoft Integration with AWS Textract | Calling AWS Textract API | AWS - Cloud Native Meetup #4
