Automatic indexing
Pondicherry University
Dhatchayani M
Department: LIS
Course: MLIS, 2ND Year
Automatic indexing is indexing performed by algorithmic procedures. The
algorithm works on a database of document representations, which may be
full-text representations, bibliographic records, partial-text
representations or, in principle, value-added databases. Automatic
indexing may also be performed on non-text databases,
e.g. images or music.
The statistical technique involves:
(1) determining certain probability relationships between individual
content-bearing words and subject categories, and
(2) using these relationships to predict the category to which a
document containing the words belongs.
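
The two steps above can be sketched in a few lines of Python. This is a minimal naive Bayes style illustration, not necessarily the exact procedure behind the slides. The training documents, the category names and the function names `train` and `predict` are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Step 1: estimate word/category statistics from (words, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for words, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(words)
        vocab.update(words)
    return word_counts, doc_counts, vocab

def predict(words, model):
    """Step 2: return the category with the highest log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                # add-one smoothing avoids log(0) for unseen word/category pairs
                score += math.log(
                    (word_counts[category][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training data, for illustration only.
training = [
    (["electron", "transistor", "circuit"], "electronics"),
    (["transistor", "amplifier"], "electronics"),
    (["catalog", "library", "index"], "library science"),
    (["index", "retrieval", "library"], "library science"),
]
model = train(training)
print(predict(["transistor", "circuit"], model))
```

With real data the word statistics would come from a much larger, manually indexed corpus.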
The most basic concept of automatic indexing, developed in the 1950s,
was the KWIC (Keyword in Context) index, based on machine-generated
permutations of significant words in titles, abstracts or full text.
The first major report on the application of this
indexing concept was presented at the International Conference on Scientific
Information (ICSI), held in Washington, D.C., in November 1958. The
paper itself was not the sensation of the conference; the live
demonstration of the method was.
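
A KWIC index can be approximated with a short Python sketch. The stop-word list, the column width and the function name `kwic` are assumptions chosen for illustration. Historical KWIC listings used their own layouts and word lists.

```python
# Assumed list of non-significant words; real systems used longer lists.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def kwic(titles, width=30):
    """Return one line per significant word, sorted by keyword.

    Each line shows the keyword with its left context right-aligned
    in a column of `width` characters and its right context after it.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left, word, right))
    entries.sort(key=lambda e: e[0])
    return [
        f"{left[-width:]:>{width}}  {word} {right[:width]}"
        for _, left, word, right in entries
    ]

for line in kwic(["Automatic Indexing of Documents", "Indexing of Images"]):
    print(line)
```

Every significant word of every title becomes an entry point, so a reader scanning the keyword column finds each title under each of its content words.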
 At the risk of getting ahead of ourselves and in view of the obvious
information explosion that our scientific and intelligence communities surely
face, let us point out what successful automatic indexing could mean.
 First, we seem to be rapidly approaching the time when along with the
printed page there will be an associated tape of corresponding information
ready for direct input to a computing machine.
 This means that as each organization receives its daily incoming documents,
a machine could read them and route them directly to the proper users. The
users could describe their information needs in terms of "standing" requests,
and on the basis of these a machine could determine how the incoming "take"
should be disseminated. Since automatic dissemination is only a special
aspect of a mechanized library system, it follows that automatic indexing
would also allow incoming documents to be indexed and thus identified for
subsequent retrieval.
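
The routing idea can be illustrated with a small Python sketch. It assumes that a standing request is simply a set of terms and that a document is routed to every user whose request shares at least one word with it. The user names and the function name `route` are hypothetical.

```python
def route(document_text, standing_requests):
    """Return the users whose standing request shares a word with the document."""
    words = set(document_text.lower().split())
    return sorted(
        user for user, terms in standing_requests.items() if terms & words
    )

# Hypothetical users and request terms, for illustration only.
standing_requests = {
    "alice": {"transistor", "electron"},
    "bob": {"indexing", "retrieval"},
}
print(route("New results on automatic indexing", standing_requests))
```

A real dissemination system would match on index terms assigned to the document rather than on raw words, which is where automatic indexing comes in.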
 Basic Notions: This approach to the problem of automatic indexing is a
statistical one. It is based on the rather straightforward notion that the
individual words in a document function as clues to its subject category.
The fundamental thesis says, in effect, that statistics on the kind,
frequency, location, order, etc., of selected words are adequate to predict
the subject matter of a document reasonably well.
 Words and Predictions: Concerning the selection of clue words, how
shall we decide which words convey the most information, how many
different words should be used, etc.? Clearly, certain content-bearing words
such as "electron" and "transistor" are better clues than logical-type words
such as "if" and "then".
 The Empirical Test: First, a corpus of documents was selected and
indexed using a set of subject categories created for the purposes of the
experiment. The design, execution, results and evaluation of this test are
examined in the following sections.
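
One simple way to pick clue words, in the spirit of "Words and Predictions" above, is to discard logical-type words and rank the rest by frequency. The sketch below assumes a tiny hand-made word list and a hypothetical function name `clue_words`. It is an illustration, not the selection procedure used in the experiment.

```python
from collections import Counter

# Assumed list of logical-type (function) words; real lists are much longer.
FUNCTION_WORDS = {"if", "then", "the", "a", "an", "and", "of", "is", "in", "to"}

def clue_words(text, top_n=3):
    """Return the most frequent words that are not function words."""
    tokens = [t.strip('.,;:"()?!').lower() for t in text.split()]
    counts = Counter(t for t in tokens if t and t not in FUNCTION_WORDS)
    return [word for word, _ in counts.most_common(top_n)]

sample = "If the transistor is biased, then the electron flow in the transistor rises."
print(clue_words(sample, top_n=1))
```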
Automatic indexing is the process of analyzing an item to extract the
information to be permanently kept in an index. This text categorizes
indexing techniques into four types: statistical, natural language, concept,
and hypertext linkages.
 Statistical strategies: Statistical strategies cover the broadest range of
indexing techniques and are the most prevalent in commercial systems. The
words/phrases are the domain of searchable values.
 Natural Language: Natural language approaches perform token
identification similar to that of statistical techniques, but then
additionally perform varying levels of natural language parsing of the item
(e.g., to identify present, past or future actions).
 Concept index: Concept indexing uses the words within an item to
correlate to concepts discussed in the item. This is a generalization of the
specific words to values used to index the item.
 Hypertext linkages: Finally, a special class of indexing can be defined
by the creation of hypertext linkages. These linkages provide virtual threads
of concepts between items rather than directly defining the concept within an
item.
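
For the statistical strategy, where the words themselves are the searchable values, a minimal inverted index with term frequencies might look like the sketch below. The sample documents and the function name `build_index` are assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index

# Hypothetical documents, for illustration only.
docs = {
    1: "automatic indexing of text",
    2: "manual indexing and automatic routing of text text",
}
index = build_index(docs)
print(index["text"])
```

The stored frequencies are the kind of statistic a system could later use to weight or rank items. The natural language and concept approaches would add parsing or term-to-concept mapping on top of this token step.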
Conclusion:
 Automatic indexing is the preprocessing stage allowing search of items
in an Information Retrieval System. Its role is critical to the success of
searches in finding relevant items. If the concepts within an item are not
located and represented in the index during this stage, the item is not
found during search. Some techniques allow data to be combined at search
time to represent particular concepts (i.e., post-coordination).
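
Post-coordination can be sketched as combining the posting lists of single terms at search time. The example below assumes an index that maps each term to a set of item numbers and uses a Boolean AND. The index contents and the function name `search_all` are hypothetical.

```python
def search_all(index, terms):
    """Return the items that contain every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical index: term -> set of item numbers.
index = {
    "automatic": {1, 2, 4},
    "indexing": {1, 3, 4},
    "image": {4, 5},
}
print(search_all(index, ["automatic", "indexing"]))
```

The concept "automatic indexing" is never stored as such. It is assembled from the separate entries for "automatic" and "indexing" when the query is run.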
Thank you
