Automatic indexing
Presentation Transcript

  • Pondicherry University. Dhatchayani M. Department: LIS. Course: MLIS, 2nd Year.
  • Automatic indexing is indexing performed by algorithmic procedures. The algorithm operates on a database of document representations, which may be full-text representations, bibliographic records, partial-text representations or, in principle, value-added databases. Automatic indexing can also be applied to non-text databases, e.g. images or music. The statistical approach involves (1) determining certain probability relationships between individual content-bearing words and subject categories, and (2) using these relationships to predict the category to which a document containing those words belongs.
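As a minimal sketch of the two-step statistical technique, the Python below estimates word/category probability relationships from a tiny, hypothetical training corpus (step 1) and then predicts the most probable category for a new document (step 2). It assumes a naive Bayes model with Laplace smoothing; the slide does not name an estimator, so treat this as one illustrative choice rather than the method itself.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training corpus: (subject category, content-bearing words).
TRAINING = [
    ("electronics", ["electron", "transistor", "circuit"]),
    ("electronics", ["transistor", "amplifier", "circuit"]),
    ("computing", ["computer", "program", "memory"]),
    ("computing", ["program", "algorithm", "computer"]),
]

def train(docs):
    """Step (1): count how words co-occur with subject categories."""
    cat_counts = Counter()
    word_counts = defaultdict(Counter)  # category -> word frequencies
    vocab = set()
    for cat, words in docs:
        cat_counts[cat] += 1
        word_counts[cat].update(words)
        vocab.update(words)
    return cat_counts, word_counts, vocab

def predict(words, cat_counts, word_counts, vocab):
    """Step (2): return the category with the highest log-probability score."""
    total_docs = sum(cat_counts.values())
    best, best_score = None, -math.inf
    for cat in cat_counts:
        total_words = sum(word_counts[cat].values())
        # log P(category) + sum of log P(word | category), Laplace-smoothed
        score = math.log(cat_counts[cat] / total_docs)
        for w in words:
            p = (word_counts[cat][w] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best, best_score = cat, score
    return best

model = train(TRAINING)
print(predict(["transistor", "circuit"], *model))  # electronics
```

Smoothing keeps an unseen word from driving a category's probability to zero, which matters when the training corpus is small.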
  • The basic and simplest concept of automatic indexing, developed in the 1950s, was the KWIC (Keyword in Context) index, based on machine-manipulated permutations of significant words in titles, abstracts or full text. The first major report on the application of this indexing concept was presented at the International Conference on Scientific Information (ICSI), held in Washington, D.C., in November 1958. The paper itself was not the sensation; the live demonstration of the method was the sensation of the conference.
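The permutation idea can be sketched in Python as a rotated-title (permuted) index, one common way of rendering a KWIC index: every significant word of a title becomes an entry point, and the rest of the title is kept as context. The stop list `STOP_WORDS`, the helper `kwic`, and the "/" wrap marker are illustrative assumptions, not details of the 1958 demonstration.

```python
# Hypothetical stop list of non-significant words.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def kwic(titles):
    """Return (keyword, rotated title) entries sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue  # only significant words become index entries
            before, after = words[:i], words[i:]
            # Rotate the title so the keyword comes first; "/" marks the wrap point.
            rotated = " ".join(after)
            if before:
                rotated += " / " + " ".join(before)
            entries.append((word.lower(), rotated))
    return sorted(entries)

for keyword, line in kwic(["Automatic Indexing of Documents", "Retrieval of Documents"]):
    print(f"{keyword:<12} {line}")
```

Sorting on the keyword is what lets a reader scan the printed index alphabetically and still see each keyword in the context of its title.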
  • At the risk of getting ahead of ourselves, and in view of the obvious information explosion that our scientific and intelligence communities surely face, let us point out what successful automatic indexing could mean. First, we seem to be rapidly approaching the time when, along with the printed page, there will be an associated tape of the corresponding information ready for direct input to a computing machine. This means that as each organization receives its daily incoming documents, a machine could read them and route them directly to the proper users. Users could describe their information needs as "standing" requests, and on the basis of these a machine could determine how the incoming "take" should be disseminated. Since automatic dissemination is only a special aspect of a mechanized library system, it follows that automatic indexing would also allow incoming documents to be indexed, and thus identified, for subsequent retrieval.
  • Basic Notions: This approach to the problem of automatic indexing is a statistical one. It is based on the rather straightforward notion that the individual words in a document function as clues from which the subject category of the document can be predicted. The fundamental thesis says, in effect, that statistics on the kind, frequency, location, order, etc., of selected words are adequate to make reasonable predictions about the subject matter of the document.
    Words and Predictions: In selecting clue words, how should we decide which words convey the most information, how many different words should be used, and so on? Clearly, content-bearing words such as "electron" and "transistor" are better clues than logical words such as "if" and "then".
    The Empirical Test: First, a corpus of documents was selected and indexed using a set of subject categories created for the purposes of the experiment. The design, execution, results and evaluation of this test are examined in the following sections.
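One hypothetical way to operationalize clue-word selection is sketched below: discard logical words such as "if" and "then", then keep words that occur often enough and whose occurrences are concentrated in a single category. The function `select_clue_words`, the `FUNCTION_WORDS` list, and the `min_freq` / `min_purity` thresholds are assumptions for illustration; they are not the selection procedure used in the empirical test.

```python
from collections import Counter, defaultdict

# Hypothetical list of "logical type" words that carry no subject information.
FUNCTION_WORDS = {"if", "then", "and", "or", "the", "of", "is", "a"}

def select_clue_words(docs, min_freq=2, min_purity=0.75):
    """Keep words that are frequent enough and concentrated in one category.

    docs: list of (category, list_of_words). Returns {word: (category, purity)}.
    min_freq and min_purity are illustrative thresholds only.
    """
    by_word = defaultdict(Counter)  # word -> frequency per category
    for cat, words in docs:
        for w in words:
            if w not in FUNCTION_WORDS:
                by_word[w][cat] += 1
    clues = {}
    for w, cats in by_word.items():
        total = sum(cats.values())
        cat, top = cats.most_common(1)[0]
        purity = top / total  # estimate of the largest P(category | word)
        if total >= min_freq and purity >= min_purity:
            clues[w] = (cat, purity)
    return clues

docs = [
    ("electronics", "if the electron is then the transistor".split()),
    ("electronics", "the transistor and the electron".split()),
    ("computing", "if the program is then the memory".split()),
    ("computing", "the program and the transistor".split()),
]
print(select_clue_words(docs))
```

In this toy corpus "transistor" is dropped because it also appears under computing, and "memory" is dropped because it occurs only once, which illustrates the trade-off between how informative a word is and how often it can be used.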
  • Automatic indexing is the process of analyzing an item to extract the information to be permanently kept in an index. Indexing techniques can be categorized as statistical, natural language, concept, and hypertext linkages.
    Statistical strategies: Statistical strategies cover the broadest range of indexing techniques and are the most prevalent in commercial systems. The words and phrases of an item form the domain of searchable values.
    Natural language: Natural language approaches perform the same processing-token identification as statistical techniques, but additionally apply varying levels of natural language parsing to the item (e.g., to distinguish present, past and future actions).
    Concept index: Concept indexing uses the words within an item to correlate it with the concepts discussed in the item. It generalizes from the specific words to the values used to index the item.
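For statistical strategies, one widely used way to turn an item's words into weighted searchable values is tf-idf weighting. The sketch below swaps in that standard technique for illustration; the slide does not name a weighting scheme, and `build_weighted_index` is a hypothetical helper.

```python
import math
from collections import Counter

def build_weighted_index(items):
    """Map each term to {item_id: tf-idf weight}.

    items: dict of item_id -> list of processing tokens (words/phrases).
    Weight = term frequency in the item * log(N / document frequency).
    """
    n = len(items)
    df = Counter()  # number of items that contain each term
    for tokens in items.values():
        df.update(set(tokens))
    index = {}
    for item_id, tokens in items.items():
        for term, tf in Counter(tokens).items():
            idf = math.log(n / df[term])
            index.setdefault(term, {})[item_id] = tf * idf
    return index

items = {
    "d1": ["electron", "transistor", "transistor"],
    "d2": ["electron", "memory"],
}
index = build_weighted_index(items)
print(index["transistor"])
```

A term that occurs in every item receives an idf of zero, which is why "electron" carries no weight in this example: it does not help distinguish one item from another.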
  • Hypertext linkages: Finally, a special class of indexing can be defined by the creation of hypertext linkages. These linkages provide virtual threads of concepts between items rather than directly defining the concept within an item.
    Conclusion: Automatic indexing is the preprocessing stage that allows items in an Information Retrieval System to be searched. Its role is critical to the success of searches in finding relevant items: if the concepts within an item are not located and represented in the index during this stage, the item will not be found during search. Some techniques allow data to be combined at search time to equate to particular concepts (i.e., post-coordination).
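Post-coordination can be illustrated with a minimal inverted index: at indexing time each single term is recorded separately, and at search time the terms are combined with a Boolean AND. The helpers `build_inverted_index` and `post_coordinate` and the sample items are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

def build_inverted_index(items):
    """Indexing time: record, for each single term, the items containing it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for term in text.lower().split():
            index[term].add(item_id)
    return index

def post_coordinate(index, *terms):
    """Search time: combine single-term entries with Boolean AND."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

items = {
    "d1": "automatic indexing of technical reports",
    "d2": "manual indexing of books",
    "d3": "automatic abstracting of technical reports",
}
index = build_inverted_index(items)
print(sorted(post_coordinate(index, "automatic", "indexing")))  # ['d1']
```

Because the concept "automatic indexing" is never stored as a unit, it is assembled only when the searcher combines the two terms, which is the essence of post-coordination.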
  • Thank you