7. Need for processing documents
• Search and discovery
• Compliance and control
• Business process automation
8. How documents are processed today
• Manual processing
• Optical character recognition (OCR)
• Rules- and template-based extraction
9. Conventional Technologies Fall Short
Conventional technologies: optical character recognition (OCR) and custom-trained AI models
• Structured data dependencies
• Flexibility constraints
• Lack of data validation
• Integration with downstream systems
• AI innovation challenge
• Accuracy concerns
11. Why You Need a Scalable Document Processing Solution
• Processing a substantial volume and variety of documents demands significant human hours.
• Current technology struggles to work with evolving tech stacks and AI solutions, hindering the delivery of complete end-to-end document automation.
• Makeshift, point-to-point solutions are inefficient and deliver a non-scalable ROI.
12. Extract and structure data faster in any format with AI-powered accuracy
Intelligent Document Processing
• Maximize efficiency, minimize costs: Automate the entire document lifecycle and incorporate manual review when necessary.
• Automate complete workflows beyond documents: Seamlessly integrate extracted data with downstream systems for a fully streamlined workflow.
• Accelerate time to market: Jump-start with pre-trained models designed for common document types.
General availability (GA): April 2024
14. AWS Textract
• Detect text in typed and handwritten forms across various documents.
• Extract structured data such as text, forms, and tables using the AnalyzeDocument API.
• Use Queries with the AnalyzeDocument API to specify and extract information from documents.
• Analyze expenses from invoices and receipts with the AnalyzeExpense API.
• Process U.S. government-issued ID documents such as driver's licenses and passports with the AnalyzeID API.
• Automate routing and analysis of mortgage loan packages with the Analyze Lending workflow.
• Customize Queries with your own data (Custom Queries) for tailored processing needs.
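The Queries capability can be sketched with the AWS SDK for Python (boto3). This is a minimal sketch, assuming a boto3 Textract client; the helper names build_queries_request and extract_query_answers are hypothetical. The first helper builds the analyze_document arguments for the QUERIES feature; the second follows each QUERY block's ANSWER relationship to the QUERY_RESULT block that holds the answer text.

```python
from typing import Any, Dict


def build_queries_request(document_bytes: bytes, questions: Dict[str, str]) -> Dict[str, Any]:
    """Build keyword arguments for a Textract analyze_document call using QUERIES.

    `questions` maps an alias (used later to look up the answer) to the query text.
    """
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [{"Text": text, "Alias": alias} for alias, text in questions.items()]
        },
    }


def extract_query_answers(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each query alias (or the query text if no alias) to its first answer."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    answers: Dict[str, str] = {}
    for block in blocks.values():
        if block["BlockType"] != "QUERY":
            continue
        query = block["Query"]
        key = query.get("Alias", query["Text"])
        # A QUERY block points to its QUERY_RESULT block(s) through ANSWER relationships.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                for answer_id in rel["Ids"]:
                    answers.setdefault(key, blocks[answer_id].get("Text", ""))
    return answers
```

With boto3 installed and credentials configured, a call might look like client.analyze_document(**build_queries_request(image_bytes, {"total": "What is the total amount?"})), with the response passed to extract_query_answers. Using aliases keeps the lookup stable even if the wording of a query changes.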
16. Features
Choose one of the following features based on your use case.
• DetectDocumentText (OCR): Extracts raw text.
• AnalyzeDocument (Tables): Extracts all tables and table cells in the document.
• AnalyzeDocument (Queries): Extracts document data based on custom queries.
• AnalyzeDocument (Forms): Extracts all key-value pairs in the document.
• AnalyzeDocument (Signatures): Detects signatures in the document.
• AnalyzeDocument (Layout): Extracts titles, paragraphs, headers, section headers, lists, page numbers, footers, table areas, key-value areas, and figure areas.
• AnalyzeID: Extracts information from identity documents such as driver's licenses and passports.
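The feature list maps onto a small number of SDK calls. A minimal sketch, assuming a boto3 Textract client is passed in as client: detect_document_text for OCR, analyze_document with a FeatureTypes value for tables, forms, signatures, and layout, and analyze_id for ID documents. The FEATURES table and the analyze function are hypothetical helpers; Queries are left out here because they also need a QueriesConfig.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical feature name -> (boto3 Textract client method, extra request arguments)
FEATURES: Dict[str, Tuple[str, Dict[str, Any]]] = {
    "OCR": ("detect_document_text", {}),
    "TABLES": ("analyze_document", {"FeatureTypes": ["TABLES"]}),
    "FORMS": ("analyze_document", {"FeatureTypes": ["FORMS"]}),
    "SIGNATURES": ("analyze_document", {"FeatureTypes": ["SIGNATURES"]}),
    "LAYOUT": ("analyze_document", {"FeatureTypes": ["LAYOUT"]}),
}


def analyze(client: Any, document_bytes: bytes, feature: str) -> Dict[str, Any]:
    """Dispatch one in-memory document to the Textract call that matches `feature`."""
    if feature == "ID":
        # AnalyzeID takes a list of pages (e.g. front and back of a license), not Document.
        return client.analyze_id(DocumentPages=[{"Bytes": document_bytes}])
    try:
        method_name, extra = FEATURES[feature]
    except KeyError:
        raise ValueError(f"unsupported feature: {feature}") from None
    method: Callable[..., Dict[str, Any]] = getattr(client, method_name)
    return method(Document={"Bytes": document_bytes}, **extra)
```

Passing the client in, rather than creating it inside the function, keeps the dispatch logic testable with a stub and lets callers configure region and credentials once.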
17. AWS Textract Use Cases
• Intelligent search indexes: Detect text in image and PDF files.
• Natural language processing: Control text grouping and extraction, including word, line, and table cell extraction.
• Accelerate data capture and normalization: Capture data from diverse sources such as financial documents and medical notes.
• Automate data extraction: Integrate the APIs into existing workflows for structured data extraction.
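For structured data extraction, a common first step is turning a Tables response into rows that a downstream system can load. A minimal sketch, assuming an AnalyzeDocument response produced with FeatureTypes=["TABLES"]; tables_to_rows is a hypothetical helper. Each TABLE block lists its CELL blocks as CHILD relationships, each CELL carries 1-based RowIndex and ColumnIndex, and the cell text is assembled from its WORD children.

```python
from typing import Any, Dict, List


def tables_to_rows(response: Dict[str, Any]) -> List[List[List[str]]]:
    """Return every table as a grid (list of rows) of cell strings."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}

    def child_ids(block: Dict[str, Any]) -> List[str]:
        return [i for rel in block.get("Relationships", []) if rel["Type"] == "CHILD" for i in rel["Ids"]]

    def cell_text(cell: Dict[str, Any]) -> str:
        # Only WORD children carry text; selection elements are skipped.
        return " ".join(blocks[i]["Text"] for i in child_ids(cell) if blocks[i]["BlockType"] == "WORD")

    tables = []
    for block in response.get("Blocks", []):
        if block["BlockType"] != "TABLE":
            continue
        cells = [blocks[i] for i in child_ids(block) if blocks[i]["BlockType"] == "CELL"]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for c in cells:
            grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c)
        tables.append(grid)
    return tables
```

Merged cells are not handled specially in this sketch; each underlying CELL keeps its own text, and empty cells stay as empty strings.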
18. AWS Textract Customers
• Black Knight drives efficiency and delivers cost savings.
• Elevance Health automated classification of attachments for claims by 90%.
• Paytm achieved cost savings of up to 75% with Amazon Textract.
19. Amazon Textract: Sync and async
• Synchronous: Supports single-page documents such as images (e.g., mobile capture). Flow: document → Amazon Textract → get results.
• Asynchronous: Supports multipage documents of up to 3,000 pages. Flow: document → Amazon Textract → completion notification → get results.
With synchronous processing, Amazon Textract can analyze single-page documents for applications where latency is critical. Amazon Textract also provides asynchronous operations to extend support to multipage documents.
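The asynchronous flow can be sketched for text detection as: start a job on a document in Amazon S3, wait for it to finish, then page through the results. This is a minimal sketch, assuming a boto3 Textract client; it polls get_document_text_detection instead of using the notification step from the diagram (start_document_text_detection also accepts a NotificationChannel with an SNS topic ARN and role ARN for that). The function name and polling interval are hypothetical choices.

```python
import time
from typing import Any, Callable, Dict, List


def detect_text_async(
    client: Any,
    bucket: str,
    key: str,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Run an asynchronous text-detection job on s3://bucket/key and return all blocks."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves IN_PROGRESS.
    while True:
        page = client.get_document_text_detection(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_seconds)
    # SUCCEEDED and PARTIAL_SUCCESS both return blocks; only FAILED raises here.
    if status == "FAILED":
        raise RuntimeError(page.get("StatusMessage", "Textract job failed"))

    # Large results are paginated; follow NextToken to collect every block.
    blocks = list(page.get("Blocks", []))
    while "NextToken" in page:
        page = client.get_document_text_detection(JobId=job_id, NextToken=page["NextToken"])
        blocks.extend(page.get("Blocks", []))
    return blocks
```

Polling is simpler to demonstrate; in production, the SNS notification avoids repeated status calls while a long multipage job runs.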
20. Architecture - Form capture
• Input: A customer uses a mobile app to capture a photo of an employment application form.
• Customer application: Customers capture their information in real time by taking a photo instead of entering data manually.
• Amazon Textract: The API is integrated into the end-user application to automatically extract text from the form and auto-populate the form fields.
• Database: User-submitted data is loaded into a database.
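The extract-and-store steps of this architecture can be sketched in a few lines. A minimal sketch, assuming an AnalyzeDocument response produced with FeatureTypes=["FORMS"]; sqlite3 stands in for the database, and FIELD_MAP, the applications table, and the helper names are hypothetical. Each KEY_VALUE_SET block tagged KEY links to its VALUE block through a VALUE relationship, and both hold their text in WORD children.

```python
import sqlite3
from typing import Any, Dict, List


def _words(blocks: Dict[str, Any], block: Dict[str, Any]) -> str:
    """Join the WORD children of a KEY or VALUE block."""
    ids: List[str] = [i for r in block.get("Relationships", []) if r["Type"] == "CHILD" for i in r["Ids"]]
    return " ".join(blocks[i]["Text"] for i in ids if blocks[i]["BlockType"] == "WORD")


def extract_form_fields(response: Dict[str, Any]) -> Dict[str, str]:
    """Return {label: value} for every key-value pair in a FORMS response."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    fields: Dict[str, str] = {}
    for b in blocks.values():
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            value_ids = [i for r in b.get("Relationships", []) if r["Type"] == "VALUE" for i in r["Ids"]]
            value = " ".join(_words(blocks, blocks[i]) for i in value_ids)
            fields[_words(blocks, b).rstrip(":")] = value  # "Full Name:" -> "Full Name"
    return fields


# Hypothetical mapping from printed form labels to database columns.
FIELD_MAP = {"Full Name": "full_name", "Phone": "phone"}


def save_application(conn: sqlite3.Connection, fields: Dict[str, str]) -> int:
    """Insert the mapped fields as one row and return its id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applications (id INTEGER PRIMARY KEY, full_name TEXT, phone TEXT)"
    )
    row = {column: fields.get(label, "") for label, column in FIELD_MAP.items()}
    cur = conn.execute("INSERT INTO applications (full_name, phone) VALUES (:full_name, :phone)", row)
    conn.commit()
    return cur.lastrowid
```

In the mobile-app flow, the extracted fields would first pre-fill the form for the customer to review, and only the confirmed values would be passed to save_application.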
21. Amazon Textract: Text extraction
Blocks: PAGE, LINE, WORD
Figure: a document and its text-extraction output, split into LINE and WORD blocks.
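The PAGE, LINE, WORD hierarchy is expressed through CHILD relationships: a PAGE block lists its LINE blocks, and each LINE lists its WORD blocks. A minimal sketch, assuming a DetectDocumentText response dictionary; outline is a hypothetical helper that walks the hierarchy and indents each level.

```python
from typing import Any, Dict, List


def outline(response: Dict[str, Any]) -> List[str]:
    """Walk PAGE -> LINE -> WORD via CHILD relationships, indenting each level."""
    blocks = {b["Id"]: b for b in response["Blocks"]}
    out: List[str] = []

    def walk(block_id: str, depth: int) -> None:
        block = blocks[block_id]
        label = block["BlockType"] + (f": {block['Text']}" if "Text" in block else "")
        out.append("  " * depth + label)
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for child_id in rel["Ids"]:
                    walk(child_id, depth + 1)

    # Start from each PAGE block; LINE and WORD blocks are reached through it.
    for b in response["Blocks"]:
        if b["BlockType"] == "PAGE":
            walk(b["Id"], 0)
    return out
```

For a page containing the single line "Hello world", the outline is PAGE, then the LINE, then its two WORD blocks.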
22. Text extraction API: DetectDocumentText
Request
• Document: Blob (raw bytes) or Amazon S3 object
Response
• Blocks: List of blocks identified in the document
• Id: Unique ID of the block
• Relationships: CHILD (links a block to the blocks it contains)
• BlockType: PAGE, LINE, WORD
• Pages: Number of pages in the document (in DocumentMetadata)
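The request and response fields above can be exercised with a short call. A minimal sketch, assuming a boto3 Textract client passed in as client; document_param and read_lines are hypothetical helpers. The request's Document field holds either raw bytes or an S3 object reference, and the response's DocumentMetadata.Pages gives the page count while LINE blocks carry the recognized text.

```python
from typing import Any, Dict, List, Optional, Tuple


def document_param(
    data: Optional[bytes] = None, bucket: Optional[str] = None, key: Optional[str] = None
) -> Dict[str, Any]:
    """Build the Document request field from raw bytes (blob) or an S3 object."""
    if data is not None:
        return {"Bytes": data}
    if bucket and key:
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    raise ValueError("provide either data or both bucket and key")


def read_lines(client: Any, document: Dict[str, Any]) -> Tuple[int, List[str]]:
    """Call DetectDocumentText and return (page count, LINE texts in response order)."""
    response = client.detect_document_text(Document=document)
    pages = response["DocumentMetadata"]["Pages"]
    lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
    return pages, lines
```

With a real client created by boto3.client("textract"), read_lines(client, document_param(bucket="my-bucket", key="form.png")) would return the page count and the detected lines; the bucket and key here are placeholders.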