Learn what document indexing is and how index data can be captured with barcode recognition, OCR and more for unattended or automated indexing. Learn about full-text and metadata indexing and capture from scanned documents, print streams or existing files. This is a tutorial to define document indexing and discuss the technologies and methods used to identify and capture the data.
5. The index information is stored or integrated into a database or document/records management system, which provides a framework for users to locate the documents.
10. “Documents are the currency of business. They are at the heart of critical workflows and drive just about every area of business.” -- IDC, “The Role of Documents: How They Drive Business, Today and Tomorrow”, January 2013
12. Avoid Disaster
If the process is not designed correctly at the outset, trying to rectify it later can be both difficult and costly. And in some environments, such as legal, the cost of not locating a key document can be monumental.
14. So how can indexing information be extracted with little to no user intervention?
• Barcodes
• Content Data Mining
• Optical Character Recognition (OCR)
• Zonal OCR
• Drag and Drop OCR
16. Intelligent data capture software can extract barcode data for indexing. Barcodes can also be used for many other purposes such as file naming, splitting, bookmarking and routing.
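As a rough illustration of barcode-driven splitting and naming, the sketch below assumes the capture software has already decoded any barcode on each page. The `Page` class, the `split_on_barcodes` function and the sample barcode values are hypothetical and are not part of any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical stand-in for one scanned page."""
    number: int                    # position in the scanned batch
    barcode: Optional[str] = None  # value decoded on this page, if any


def split_on_barcodes(pages):
    """Start a new document at every page that carries a barcode."""
    documents = []
    for page in pages:
        if page.barcode is not None:
            # The barcode value doubles as the file name and the index value.
            documents.append({
                "name": f"{page.barcode}.pdf",
                "index": page.barcode,
                "pages": [],
            })
        if documents:
            # Pages scanned before the first barcode have no owner and are skipped.
            documents[-1]["pages"].append(page.number)
    return documents


batch = [Page(1, "INV-1001"), Page(2), Page(3, "INV-1002"), Page(4), Page(5)]
docs = split_on_barcodes(batch)
for doc in docs:
    print(doc["name"], doc["pages"])
```

Here the barcode acts as a separator page: each barcode opens a new document, and the pages that follow are attached to it until the next barcode appears.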
17. Files that contain text can be mined using various data mining techniques.
18. OCR tools, combined with techniques such as regular expressions, aid in text mining.
20. Regular expression (regex) scripts are powerful tools to help identify keywords or actual strings of text for indexing from many source types. The scripting process can look for words with specific characters, lengths, character types, or preceding keywords.
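A minimal sketch of the "preceding keyword" idea, using Python's standard `re` module. The sample text, the field names and the patterns are invented for illustration. Real documents would need patterns tuned to their own layout and to typical OCR errors.

```python
import re

# Hypothetical text, as it might come out of OCR or a text-based PDF.
text = """ACME Supply Co.
Invoice Number: 48213
Invoice Date: 2013-11-04
Customer ID: CU-7781
"""

# Each pattern anchors on a preceding keyword, then captures the value
# by character type and length.
patterns = {
    "invoice_number": r"Invoice Number:\s*(\d+)",
    "invoice_date": r"Invoice Date:\s*(\d{4}-\d{2}-\d{2})",
    "customer_id": r"Customer ID:\s*([A-Z]{2}-\d{4})",
}

index = {}
for field, pattern in patterns.items():
    match = re.search(pattern, text)
    # A missing field is recorded as None so it can be reviewed later.
    index[field] = match.group(1) if match else None

print(index)
```

Each captured group becomes an index field value. A field that cannot be found stays empty rather than being guessed.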
21. Advanced indexing solutions offer Field Validation based on Regular Expressions.
If an inventory item should contain three alpha characters followed by five numbers, advanced indexing solutions can use regex to recognize this pattern and reject all documents with items not meeting this rule. The document can be tagged for manual inspection before further processing is done.
Sample item numbers:
PEN21096
CAP36581
INV98453
PA568793
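The rule described on this slide (three letters followed by five digits) can be sketched with Python's `re` module as follows. The names `ITEM_RULE` and `validate_item` are hypothetical, and a capture product may express the same rule in its own regex dialect.

```python
import re

# Three alphabetic characters followed by exactly five digits.
ITEM_RULE = re.compile(r"[A-Za-z]{3}[0-9]{5}")


def validate_item(value):
    """Return True only when the whole value matches the rule."""
    return ITEM_RULE.fullmatch(value) is not None


items = ["PEN21096", "CAP36581", "INV98453", "PA568793"]

# Items that break the rule are flagged for manual inspection.
flagged = [item for item in items if not validate_item(item)]
print(flagged)  # ['PA568793']
```

`fullmatch` is used so that a valid-looking pattern buried inside a longer string does not pass. Of the four sample values, only `PA568793` is flagged, because it has two letters followed by six digits.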
22. Advanced indexing solutions can accommodate special needs such as Scan Once, Index Many.
This is used to process EOBs (explanations of benefits) or other records where the same document needs to be filed in multiple patient records or places. Advanced data capture solutions such as ImageRamp allow the operator to scan the EOB once, index the different patients' information via an onscreen keyboard, drag-and-drop OCR, or barcode reading, and route the document to the appropriate patients' records with little to no intervention.
[Figure: ImageRamp: Multiple Indexing, Naming and Routing of the Same Document. Labels: Patient A, Patient B, Patient C, Policy, EOB]
23. Index Sources can be:
• Print streams
• Scanned documents
• Existing files such as word processing documents and spreadsheets
24. PDF print streams can be used to produce the source data for invoice runs or other AP/AR functions that can then be mined for index data and document splits.
25. With OCR technology, make your scanned or image-based file fully text-searchable or extract data from a zone for indexing.
26. With most data capture solutions, users often select “searchable PDF” as the output file format to make a full-text index. This uses OCR technology to create a PDF file with two layers, an image layer and a text layer, where the text layer is used for full-text searching.
27. With zonal OCR, document areas are identified for OCR capture. Drag-and-drop OCR lets an operator highlight document text which is automatically OCR'd and dropped into index fields.
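One way to picture zonal capture is as a filter over OCR results. The sketch below assumes an OCR engine has already returned words with bounding boxes in page coordinates. The `Word` and `Zone` classes, the coordinates and the field names are all hypothetical. Some products instead run OCR only on the cropped zone image, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR result: recognized text plus its bounding box."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Zone:
    """Hypothetical rectangular page area mapped to one index field."""
    field: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, word):
        # A word belongs to the zone when its center point falls inside it.
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return (self.x <= cx <= self.x + self.w
                and self.y <= cy <= self.y + self.h)


def extract_zones(words, zones):
    """Collect the text found in each zone, in reading order."""
    result = {}
    for zone in zones:
        inside = [word for word in words if zone.contains(word)]
        inside.sort(key=lambda word: (word.y, word.x))
        result[zone.field] = " ".join(word.text for word in inside)
    return result


words = [
    Word("Invoice", 50, 40, 60, 12),
    Word("48213", 420, 40, 45, 12),
    Word("Jane", 60, 120, 35, 12),
    Word("Doe", 100, 120, 30, 12),
]
zones = [
    Zone("invoice_number", 400, 30, 150, 30),
    Zone("customer_name", 40, 110, 200, 30),
]
fields = extract_zones(words, zones)
print(fields)
```

Drag-and-drop OCR can be thought of as the same operation with the zone drawn by the operator at indexing time instead of being defined in advance.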
30. Now that I’ve captured my index data, what can I do?
1. Use a simple search and retrieval system
• Lets you search on the index fields or run a free-form search on full-text, searchable PDF files.
• Can be a stepping stone to a full-fledged document management system later without loss of investment.
32. Now that I’ve captured my index data, what can I do?
2. Send it to an existing document management or EMR/EHR system.
Systems shown: Henry Schein, Dentrix, Dentrix Enterprise, Dentrix Ascend, Easy Dental, Viive, DentalVision, axiUm, FileNet, Laserfiche, Documentum, MyMedicalRecords, Eaglesoft, Allscripts, Epic, SharePoint, and ANYONE via CSV, XML standard formats.
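As a hedged sketch of the CSV and XML hand-off, the example below writes the same index records in both formats using only the Python standard library. The field names, the file names and the `documents`/`document` element names are assumptions. A real target system defines its own import layout.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical index records, one per captured document.
records = [
    {"file": "INV-1001.pdf", "invoice_number": "1001", "customer": "Jane Doe"},
    {"file": "INV-1002.pdf", "invoice_number": "1002", "customer": "Sam Lee"},
]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize the records as XML, one <document> element per record."""
    root = ET.Element("documents")
    for row in rows:
        doc = ET.SubElement(root, "document")
        for field, value in row.items():
            ET.SubElement(doc, field).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(records)
xml_text = to_xml(records)
print(csv_text)
print(xml_text)
```

Both outputs carry the file name alongside the index fields, so the receiving system can link each document to its index values.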
33. Learn more about ImageRamp,
intelligent data capture software and…
34. Click for information on document capture and processing:
• Understanding your scanning requirements
• Using Regular Expressions for Automated Data Capture and Indexing
• Make your Paperless Dreams Come True, using Fujitsu ScanSnap scanners for document capture
• What can barcodes do for me? (in document Management/EMR Data capture)
• 8 Must Haves for any Document Capture System
• What is document Indexing
36. Image Credits
All images are owned or licensed by DocuFi with acknowledgement given to:
• Dave Gray dgray_xplane, http://bit.ly/17xKYXp
• Marcin Wichary, Alphabetical, http://bit.ly/1aILOku
• Jim Morgan, database http://bit.ly/1ai0Nm3
• Liza liza31337, Book crease, http://bit.ly/1lWj8tL
• UCL Faculty of Mathematical and Physical Sciences, Index, http://bit.ly/19q6GiI
• Stuart Caie kyz, Indexed, http://bit.ly/Kfwbau
• Spiffie, “Fujitsu ScanSnap S300M” http://bit.ly/1ksdhhv
• Doctorwonder, “Stack O'Money!” http://bit.ly/1fgxpko
• Boston Public Library, The card index department, http://bit.ly/1kygZq2
• Robyn Jay, robynejay Train wreck at Montparnasse 1895, http://bit.ly/19q8CYq
• Theilr, spray, http://bit.ly/1hjGKp3
• Phil Whitehouse, Phillie Casablanca, Blue Zone, http://bit.ly/1hjGVAT
• Seiichi Kusunoki Visual Maintenance, Bunch of Papers, http://bit.ly/1eJ8EZu
• Patrick Hoesly, “Thank you” http://bit.ly/17xKErE