Understanding the Difference Between Structured and Unstructured Documents
By: Randy Van Ittersum and Erin Spalding CDIA+
www.disusa.com
Copyright 2005 www.disusa.com Page 1
White Paper
Differences Between the Two Document Types
What is the difference between structured and unstructured documents? In a structured document, certain information always appears in the same location on the page. For example, on an employment application the applicant’s name always appears in the same box in the same place on the form. An unstructured document has the opposite characteristic: information can appear anywhere on the page. Examples include a handwritten note or a white paper.
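One way to make this definition concrete is to check whether a given piece of information lands in the same spot across several samples of a document. The Python sketch below assumes that OCR has already reported the (x, y) position of each field on each page; the field names, coordinates, and 5-point tolerance are hypothetical values chosen for illustration, not part of any particular product.

```python
from statistics import pstdev

# Hypothetical input format: each sample maps a field name to the (x, y)
# position, in points, where that information was found on the page.
Position = tuple[float, float]


def is_structured(samples: list[dict[str, Position]], field: str,
                  tolerance: float = 5.0) -> bool:
    """Return True if `field` appears in every sample at roughly the same place."""
    positions = [s[field] for s in samples if field in s]
    # Too few samples to judge, or the field is missing from some of them
    if len(positions) < 2 or len(positions) < len(samples):
        return False
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    # A small spread in both coordinates means the field has a fixed location
    return pstdev(xs) <= tolerance and pstdev(ys) <= tolerance


# Three employment applications: the applicant's name is always in one box
applications = [{"name": (72.0, 700.0)}, {"name": (72.5, 699.0)}, {"name": (71.8, 700.2)}]
# Three letters: the recipient's name wanders around the page
letters = [{"name": (72.0, 650.0)}, {"name": (300.0, 420.0)}, {"name": (90.0, 120.0)}]

print(is_structured(applications, "name"))  # True
print(is_structured(letters, "name"))       # False
```

The tolerance absorbs small shifts introduced by scanning; a real system would also have to cope with skewed or rotated pages, which this sketch ignores.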
Some documents, such as invoices, share characteristics of both types. A single supplier’s invoices feel like structured documents because they have a consistent appearance from one billing period to the next. However, when viewed in aggregate by an accounts payable department that receives thousands of invoices daily in a myriad of different formats, they behave more like unstructured documents.
What About Template-Based OCR Systems?
Some document imaging vendors advocate template-based OCR (optical character recognition) to capture the information needed to identify each document for later retrieval. They call this “pixie dust”: you don’t need to do anything with the documents other than load the automatic document feeder. Unfortunately, this approach works well only with structured documents, and even under the best conditions it is not 100% accurate. (For more information on OCR accuracy, read our white paper on that subject.)
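Template-based (zonal) OCR can be sketched as follows: a template fixes a rectangle for each field, and the system reads whatever words the OCR engine placed inside it. This is a minimal illustration under assumed inputs; the `Word` and `Zone` types, the template, and the top-left coordinate origin are all hypothetical, and real OCR engines report word positions in their own formats.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """One word reported by OCR, positioned by its top-left corner (assumed format)."""
    text: str
    x: float
    y: float


@dataclass
class Zone:
    """A rectangle on the page where a template expects one field to appear."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def extract_fields(words: list[Word], template: dict[str, Zone]) -> dict[str, str | None]:
    """Read each field from its fixed zone; None means the zone was empty."""
    result: dict[str, str | None] = {}
    for field, zone in template.items():
        # Keep the words inside the zone, in reading order (top to bottom, left to right)
        hits = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        result[field] = " ".join(w.text for w in hits) or None
    return result


# Hypothetical template for an employment application
application_template = {"applicant_name": Zone(left=50, top=100, right=300, bottom=120)}

form = [Word("Jane", 60, 110), Word("Doe", 100, 110), Word("Position:", 60, 150)]
letter = [Word("Dear", 60, 300), Word("Sir,", 100, 300)]

print(extract_fields(form, application_template))    # {'applicant_name': 'Jane Doe'}
print(extract_fields(letter, application_template))  # {'applicant_name': None}
```

The empty result for the letter shows why templates break down on unstructured documents. Even when the zone is placed correctly, the text read from it is only as accurate as the underlying OCR.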
Needless to say, you will need a different method to capture the key information needed to retrieve unstructured documents. In many organizations, unstructured documents make up the majority of the documents that will be imaged with a document imaging system.
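For unstructured documents, one common alternative to fixed zones is to search the full OCR text for patterns, wherever they happen to appear. The sketch below is an illustration rather than a recommended rule set: the field names and regular expressions are hypothetical, and the date pattern assumes U.S.-style MM/DD/YYYY dates.

```python
import re

# Hypothetical patterns for two index fields. The invoice number must contain
# at least one digit, which keeps ordinary words from being captured.
PATTERNS = {
    "invoice_number": re.compile(
        r"\binvoice\s*(?:no\.?|number|#)\s*[:#]?\s*([A-Z0-9-]*\d[A-Z0-9-]*)", re.IGNORECASE
    ),
    "date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"),
}


def find_key_info(full_text: str) -> dict[str, str]:
    """Search the whole page for each field, regardless of where it appears."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(full_text)
        if match:
            found[field] = match.group(1)
    return found


letter_text = (
    "Thank you for your order.\n"
    "Please remit payment for Invoice No. INV-2041, dated 3/14/2005, within 30 days."
)
print(find_key_info(letter_text))  # {'invoice_number': 'INV-2041', 'date': '3/14/2005'}
```

Because the search ignores position, the same rules work whether the invoice number sits in a header box or in the middle of a sentence; the trade-off is that patterns must be tuned to the wording an organization actually receives.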
Characteristics of Structured and Unstructured Documents

Characteristics:
• Structured: Familiar data appears in the same place every time.
• Unstructured: Data appears in unexpected places in the document.
Examples:
• Structured: Insurance claim form; employment application.
• Unstructured: A letter; a handwritten note.
Used by Organizations:
• Structured: Low-volume operations; internally created invoices.
• Unstructured: High-volume operations; invoices received from outside the organization.
Conclusion
Every organization will have both structured and unstructured documents with which to contend. It is generally a good idea to purchase a document imaging system that can capture key information from both types of documents, rather than a system that caters to only one document type.
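This advice can be pictured as a simple router: use a template when a document’s layout is known, and fall back to searching the text when it is not. Everything in this sketch is hypothetical, including the supplier name, the assumption that the supplier’s name is printed on the first line, and the one-line “template”; it shows the shape of a hybrid approach, not how any particular product works.

```python
import re
from typing import Callable

Extractor = Callable[[list[str]], dict[str, str]]


def acme_template(lines: list[str]) -> dict[str, str]:
    """Hypothetical layout: this supplier always prints the invoice number on line 3."""
    return {"invoice_number": lines[2].strip()} if len(lines) >= 3 else {}


# Known layouts, keyed by the supplier name printed on the first line (an assumption)
TEMPLATES: dict[str, Extractor] = {"ACME SUPPLY": acme_template}

# Fallback for unknown layouts: search anywhere in the text
INVOICE_NO = re.compile(
    r"invoice\s*(?:no\.?|number|#)\s*[:#]?\s*([A-Z0-9-]*\d[A-Z0-9-]*)", re.IGNORECASE
)


def index_document(lines: list[str]) -> tuple[str, dict[str, str]]:
    """Return the method used ("template" or "search") and the fields it found."""
    supplier = lines[0].strip().upper() if lines else ""
    if supplier in TEMPLATES:
        return "template", TEMPLATES[supplier](lines)
    match = INVOICE_NO.search("\n".join(lines))
    return "search", ({"invoice_number": match.group(1)} if match else {})


print(index_document(["Acme Supply", "123 Main St", "A-7781", "Total: $120.00"]))
# ('template', {'invoice_number': 'A-7781'})
print(index_document(["Globex Corporation", "Our Invoice # 55012 is now due"]))
# ('search', {'invoice_number': '55012'})
```

Keeping both paths in one system is the point of the design: the template path handles familiar layouts, and the search path catches everything else instead of failing outright.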