What is Document Indexing? A tutorial for intelligent data capture.

Learn what document indexing is and how index data can be captured with barcode recognition, OCR and more for unattended or automated indexing. Learn about full-text and metadata indexing and capture from scanned documents, print streams or existing files. This is a tutorial to define document indexing and discuss the technologies and methods used to identify and capture the data.

Published in: Software, Business

  1. What is Document Indexing?
     1 in·dex /in-deks/ n., plural in·dex·es, in·di·ces /in-duh-seez/: a list (as of bibliographical information or citations to a body of literature) arranged usually in alphabetical order of some specified datum (as author, subject, or keyword): as a : a list of items (as topics or names)
     2 in·dex /in-deks/ v.: to provide an index for (something, such as a book)
     Copyright ©2014
  2. Indexing: the process of tagging or associating information with a file so it can be used for search and retrieval purposes later.
  3. Indexing creates the “searchable” information that users will later use to find documents.
  4. Invoice Number? Customer/Employee Number? Customer/Employee Name? Date? Site ID? Patient Name? Doctor? Work Order Number? Waybill Number? Prescription Number?
  5. The index information is stored or integrated into a database or document/records management system, which provides a framework for users to locate the documents.
  6. There are two types of indexes.
  7. Full-text indexing is just what the name implies: all the text of the document is indexed.
  8. When specific words or descriptions are indexed to create the searchable index fields, the information is referred to as “metadata.”
  9. So Why is Indexing Important?
  10. “Documents are the currency of business. They are at the heart of critical workflows and drive just about every area of business.” -- IDC, “The Role of Documents: How They Drive Business, Today and Tomorrow”, January 2013
  11. Great care should be taken to design an efficient indexing scheme.
  12. If the process is not designed correctly at the outset, trying to rectify it later can be both difficult and costly. And in some environments, such as legal, the cost of not locating a key document can be monumental. Avoid disaster.
  14. So how can indexing information be extracted with little to no user intervention?
     • Barcodes
     • Content Data Mining
     • Optical Character Recognition (OCR)
     • Zonal OCR
     • Drag and Drop OCR
  16. Intelligent data capture software can extract barcode data for indexing. Barcodes can also be used for many other purposes such as file naming, splitting, bookmarking and routing.
  17. Files that contain text can be mined using various data mining techniques.
  20. OCR tools and technologies such as regular expressions aid in text mining. Regular expression (regex) scripts are powerful tools to help identify keywords or actual strings of text for indexing from many source types. The scripting process can look for words with specific characters, lengths, character types, or preceding keywords.
  21. Advanced indexing solutions offer Field Validation based on Regular Expressions. If an inventory item should contain three alpha characters followed by five numbers, advanced indexing solutions can use regex to recognize this pattern and reject all documents with items not meeting this rule. The document can be tagged for manual inspection before further processing is done. Sample item numbers: PEN21096, CAP36581, INV98453, PA568793 (the last has two letters and six digits, so it does not fit the rule).
  22. Advanced indexing solutions can accommodate special needs such as Scan Once, Index Many. This is used to process EOBs or other records where the same document needs to be in multiple patient records or places. Advanced data capture solutions such as ImageRamp allow the operator to easily scan the EOB once, index the different patients' information via an onscreen keyboard, drag-and-drop OCR, or barcode reading methods, and route to the appropriate patients' records with little to no intervention. ImageRamp: Multiple Indexing, Naming and Routing of the Same Document. (Diagram labels: Policy, EOB, Patient A, Patient B, Patient C.)
  23. Index Sources can be:
     • Print streams
     • Scanned documents
     • Existing files such as word processing and spreadsheets
  24. PDF print streams can be used to produce the source data for invoice runs or other AP/AR functions that can then be mined for index data and document splits.
  25. With OCR technology, make your scanned or image-based file fully text-searchable or extract data from a zone for indexing.
  26. With most data capture solutions, users often select the output file format as a “searchable PDF” to make a full-text index. This uses OCR technology to create a PDF file with two layers, an image layer and a text layer that can be used for full-text searching.
  27. With zonal OCR, document areas are identified for OCR capture. Drag-and-drop OCR lets an operator highlight document text which is automatically OCR'd and dropped into index fields.
  30. Now that I’ve captured my index data, what can I do? 1. Use a simple search and retrieval system
     • Lets you search on the index fields or free form search on full-text, searchable PDF files.
     • Can be a stepping stone to a full-fledged document management system later without loss of investment.
  32. Now that I’ve captured my index data, what can I do? 2. Send it to an existing document management or EMR/EHR system. Systems shown: Henry Schein (Dentrix, Dentrix Enterprise, Dentrix Ascend, Easy Dental, Viive, DentalVision, axiUm), Filenet, Laserfiche, Documentum, MyMedicalRecords, Eaglesoft, Allscripts, Epic, Sharepoint, and anyone else via the standard CSV and XML formats.
  33. Learn more about ImageRamp, intelligent data capture software and…
  34. Click for information on:
     • Understanding your scanning requirements
     • Using Regular Expressions for Automated Data Capture and Indexing
     • Make your Paperless Dreams Come True, using Fujitsu ScanSnap scanners for document capture
     • What can barcodes do for me? (in document Management/EMR Data capture)
     • 8 Must Haves for any Document Capture System
     • What is Document Indexing?
     document capture and processing:
  35. Contact us for more information on:
     • How to capture index data from print streams
     • Using Regex to capture index information
     • More tutorial information on document management
     • Scanning documents for document management
     • How to intelligently capture index data from your scans
     • Requirements for document management scanning
     • How to select a document capture or document scanning solution
     • Using touchscreen scanners such as the Fujitsu ScanSnap as an intelligent capture solution
     • Batch document scanning solutions
     • Document Management cost savings
     • EMR data capture
     • Batch Indexing solutions
     • Batch document indexing
     • Index documents
     • Create a document index
     • Document management index
     • Index from print stream
     • ECM index
     • Index ECM
     By DocuFi, makers of ImageRamp, Document Management Capture Solution. 30 years’ experience in the Document Imaging market. Find out more at ImageRamp and www.docufi.com. Copyright ©2014
  36. Image Credits. All images are owned or licensed by DocuFi with acknowledgement given to:
     • Dave Gray dgray_xplane, http://bit.ly/17xKYXp
     • Marcin Wichary, Alphabetical, http://bit.ly/1aILOku
     • Jim Morgan, database, http://bit.ly/1ai0Nm3
     • Liza liza31337, Book crease, http://bit.ly/1lWj8tL
     • UCL Faculty of Mathematical and Physical Sciences, Index, http://bit.ly/19q6GiI
     • Stuart Caie kyz, Indexed, http://bit.ly/Kfwbau
     • Spiffie, “Fujitsu ScanSnap S300M”, http://bit.ly/1ksdhhv
     • Doctorwonder, “Stack O'Money!”, http://bit.ly/1fgxpko
     • Boston Public Library, The card index department, http://bit.ly/1kygZq2
     • Robyn Jay, robynejay, Train wreck at Montparnasse 1895, http://bit.ly/19q8CYq
     • Theilr, spray, http://bit.ly/1hjGKp3
     • Phil Whitehouse, Phillie Casablanca, Blue Zone, http://bit.ly/1hjGVAT
     • Seiichi Kusunoki, Visual Maintenance, Bunch of Papers, http://bit.ly/1eJ8EZu
     • Patrick Hoesly, “Thank you”, http://bit.ly/17xKErE
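
The regex slides describe two uses of regular expressions: finding index values that follow a keyword, and validating a field such as an inventory item made of three alpha characters followed by five numbers. The Python sketch below illustrates both ideas under stated assumptions. It is not how ImageRamp or any other product implements them. The function names, the uppercase-only rule, and the "Invoice Number" keyword are hypothetical choices made for this example.

```python
import re

# Rule from the slides: three alpha characters followed by five numbers.
# Restricting the letters to uppercase A-Z is an assumption of this sketch.
ITEM_PATTERN = re.compile(r"[A-Z]{3}[0-9]{5}")

# Hypothetical keyword-anchored pattern: digits that follow "Invoice Number"
# or "Invoice No." in mined or OCR'd text.
INVOICE_PATTERN = re.compile(
    r"Invoice\s+(?:Number|No\.?)\s*:?\s*(\d+)", re.IGNORECASE
)


def validate_item(item):
    """Return True when the whole string fits the three-letter, five-digit rule."""
    return ITEM_PATTERN.fullmatch(item) is not None


def triage(items):
    """Split item numbers into accepted values and values flagged for manual inspection."""
    accepted, flagged = [], []
    for item in items:
        if validate_item(item):
            accepted.append(item)
        else:
            flagged.append(item)
    return accepted, flagged


def extract_invoice_number(text):
    """Return the digits that follow the invoice keyword, or None if absent."""
    match = INVOICE_PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    accepted, flagged = triage(["PEN21096", "CAP36581", "INV98453", "PA568793"])
    print("accepted:", accepted)
    print("flagged for inspection:", flagged)
    print(extract_invoice_number("Acme Corp  Invoice No.: 4471  Date 01/15/2014"))
```

Using `fullmatch` rather than `search` means a longer string that merely contains a valid item number is still flagged, which suits validation of a single index field.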

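The slides on storing index information in a database, searching on index fields, and handing data to other systems via CSV can be sketched in a few lines of Python. This is a minimal illustration using an in-memory SQLite table, not the schema of any real document management system. The table name, field names, file paths, and sample records are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical index fields for this sketch; real schemes vary by document type.
FIELDS = ("file_path", "invoice_number", "customer_name", "doc_date")


def build_index(records):
    """Create an in-memory metadata index and load (path, invoice, customer, date) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_index ("
        "file_path TEXT PRIMARY KEY, invoice_number TEXT, "
        "customer_name TEXT, doc_date TEXT)"
    )
    conn.executemany("INSERT INTO doc_index VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def find_by_customer(conn, name):
    """Return the paths of all documents indexed under the given customer name."""
    rows = conn.execute(
        "SELECT file_path FROM doc_index WHERE customer_name = ? ORDER BY file_path",
        (name,),
    )
    return [row[0] for row in rows]


def export_csv(conn):
    """Serialize the index as CSV text with a header row, for import elsewhere."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(
        conn.execute(
            "SELECT file_path, invoice_number, customer_name, doc_date "
            "FROM doc_index ORDER BY file_path"
        )
    )
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        ("scans/a.pdf", "1001", "Acme", "2014-01-15"),
        ("scans/b.pdf", "1002", "Globex", "2014-01-16"),
        ("scans/c.pdf", "1003", "Acme", "2014-02-01"),
    ]
    index = build_index(sample)
    print(find_by_customer(index, "Acme"))
    print(export_csv(index))
```

Keeping the index in a plain table with a CSV export is one way to read the "stepping stone" point in the slides: the captured fields stay usable if the documents later move to a full document management system.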