The document provides guidelines for indexes and related information retrieval devices. It begins with an introduction and pre-test to assess the participants' familiarity with indexes and information retrieval techniques. It then defines what an index is and discusses the purposes and uses of indexes. The document outlines different types of indexes based on criteria such as the type of terms or objects referred to, arrangement of entries, analysis method, and more. It also discusses choosing headings, automatic indexing, and basic information retrieval techniques.
Guidelines for indexes and related information retrieval devices
1. Guidelines for indexes and
related information retrieval
devices.
Presented by:
Ms. Liziel T. Isidoro
and
Mr. Rahadel DEstresa
MLIS
2. PRE-TEST:
Do you know what an index is?
Are you familiar with the basic rules in indexing?
Have you ever indexed an article?
Do you know some information retrieval techniques?
Are you familiar with any software that can be used in indexing?
3. OBJECTIVE:
The participants should be able to explain what an index is, including its purpose
and how to use it. Moreover, they should learn how to carry out information retrieval
through both manual and systematic methods.
4. DEFINITION
• Index - a guide to the contents of a document or collection of documents with the same
format, arranged in a searchable order such as alphabetical, classified, chronological or
numerical.
• Indexing - the process of analysing the contents of a document and assigning index terms to
represent the names of persons, places, titles and subject matter of documents; these terms
serve as access points in locating and retrieving information from the document. It is a vital
process for the storage, searching and retrieval of information.
• Index entry - a single record in an index that may consist of four parts: main heading,
subheading, locator and/or cross-reference(s).
• Descriptor - a term designated by a thesaurus to represent the aboutness of a topic in a
document.
• Document - any item that contains information, either in print or non-print format, including
digital forms.
• Identifier - the proper name of a person, object, institution/organization, process, etc.
5. • indexing language - any vocabulary, controlled or uncontrolled, used for
indexing along with the rules of usage.
• indexing system – a set of prescribed procedures (manual or machine-
operated) intended for organizing the contents of a document or knowledge
records for purposes of retrieval and dissemination.
• keyword - a raw word taken from the document that is regarded as an
indexable term.
• translation – the process of converting concepts derived from the document
into a particular set of index terms usually derived from a controlled
vocabulary.
• vocabulary control - the process of organizing a list of terms for use in
indexing, along with the rules of usage.
DEFINITION
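The four parts of an index entry named in the definitions above can be pictured as a small record. The following is only an illustrative sketch: the class name, field names, rendering style and sample values are assumptions, not part of any indexing standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexEntry:
    """One index entry: main heading, subheading, locators, cross-references."""
    main_heading: str
    subheading: Optional[str] = None
    locators: List[str] = field(default_factory=list)          # e.g. page numbers
    cross_references: List[str] = field(default_factory=list)  # "see" targets

    def render(self) -> str:
        # Join the parts that are present in a simple printed-index style.
        text = self.main_heading
        if self.subheading:
            text += ", " + self.subheading
        if self.locators:
            text += ", " + ", ".join(self.locators)
        if self.cross_references:
            text += " see " + "; ".join(self.cross_references)
        return text


# Hypothetical entries, for illustration only.
print(IndexEntry("Indexing", "automatic", ["23"]).render())
print(IndexEntry("Keyword indexes", cross_references=["KWIC indexes"]).render())
```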
6. Purposes and Uses of Indexes
• Save time and effort in finding information.
• Identify potentially relevant information in the
document or collection being indexed.
• Analyze concepts treated in a document to
produce appropriate index headings based on
the indexing language assigned.
• Indicate relationships among terms.
7. • Group together related topics.
• Direct the users seeking information under terms
not chosen as index headings to headings that
have been chosen.
• Suggest related topics.
• Serve as a tool for current awareness services.
Purposes and Uses of Indexes
8. Indexes by type of object referred to
a. authors: all types of document creators such as writers, composers,
illustrators, translators, editors, choreographers, artists, sculptors,
inventors
b. subjects (topics or features): topics treated in documents and/ or features
of documentary units (for example, genre, format, methodological
approach). Separate indexes are often devoted to special types of topics
such as persons, places, or corporate bodies; features, such as genres (for
example, poetry, drama); or notations, such as International Standard Book
Numbers (ISBN).
Types of Indexes
(NISO-TR02-1997)
9. Indexes by type of term used for headings
a. names: proper nouns, such as names of persons, places, corporate bodies.
b. numbers or notations: numerical or coded designations, such as classification notation,
patent number, ISBN, date.
c. words and phrases: common words and phrases (as opposed to names or proper nouns).
(NISO-TR02-1997)
10. Indexes by type or extent of indexable matter on which an index is based
a. full text of document
b. abstracts
c. titles only
d. first lines only (for example, first lines of poems)
e. citations (reference citations to other documents)
(NISO-TR02-1997)
11. Indexes by arrangement of entries
a. alphabetical or alphanumeric
b. classified: headings arranged on the basis of relations among concepts represented
by headings, for example, hierarchy, inclusion, chronology, or other association.
Classified indexes are often based on existing classification schemes, such as the
Dewey Decimal Classification.
c. alphabetico-classed: broad headings arranged alphabetically. Narrower headings are
grouped under broad headings and arranged alphanumerically or relationally on the
basis of hierarchy, inclusion, chronology, or other association.
(NISO-TR02-1997)
12. Indexes by method of document analysis
a. human intellectual analysis and identification of topics and concepts expressed
and/or features manifested
b. computer algorithms designed to identify useful terms, phrases, or features
c. combination of computer-based and human analysis.
(NISO-TR02-1997)
13. Indexes by method of term selection
a. assignment of terms to represent topics and features (whether or
not the term is in the documentary unit being indexed)
b. extraction of terms from the documentary unit
c. a combination of assignment and extraction methods
(NISO-TR02-1997)
14. Indexes by method of term coordination
a. pre-coordinate combination: such as subject heading indexes, string
indexes, chain indexes, keyword indexes (including KWIC, KWOC,
KWAC indexes), rotated, and permuted indexes
b. post-coordinate combination: includes the use of Boolean
operators, proximity measures, and the combination of weighted
terms.
(NISO-TR02-1997)
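A keyword-in-context (KWIC) index, one of the keyword indexes listed above, can be sketched in a few lines. This is a minimal illustration only: the stop list, the column width and the sample titles are assumptions, and real KWIC programs differ in layout and in how they wrap context.

```python
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed stop list


def kwic(titles, width=30):
    """Return KWIC lines: one per significant word, sorted by that keyword."""
    rows = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue  # stop words are not used as keywords
            left = " ".join(words[:i])[-width:]
            right = " ".join(words[i + 1:])[:width]
            rows.append((word.lower(), left, word, right))
    rows.sort(key=lambda row: row[0])
    # Right-align the left context so every keyword starts in the same column.
    return [f"{left.rjust(width)}  {word} {right}".rstrip()
            for _, left, word, right in rows]


for line in kwic(["Guidelines for indexes", "Automatic indexing"]):
    print(line)
```

Each title appears once for every significant word it contains, which is why a KWIC index can be produced entirely by extraction.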
15. Indexes by type, periodicity, format, genre, or medium of document(s) being
indexed
Examples are: books, monographs, periodicals, serials, poetry, fiction, short stories,
films, videos, illustrations, pictures, paintings, artifacts, software, computer readable
texts, maps, and sound recordings
Indexes by medium of index
a. printed or written
b. microform
c. electronic media, including online, CD-ROM
d. braille
(NISO-TR02-1997)
16. Indexes by periodicity of the index
a. one-time, closed-end indexes
b. continuing, open-end indexes
Indexes by authorship
a. authored: an authored index; a separately authored document distinct from the
document(s) that is (are) being indexed. It is created independently by one or more
persons through intellectual analysis of text, as distinguished from indexes that are
created solely through algorithmic analysis of text carried out electronically
b. automatically generated
(NISO-TR02-1997)
17. Choice and Forms of Headings (ISO 999)
1. Personal Names
• Should be in as full a form as possible
• Should take the form used in the document, but if the text is not consistent, the
indexer should adopt one form
• Choose the most recent, or the most commonly used form of personal name as the
heading and add “see” cross-references from other forms,
e.g. Clemens, Samuel Langhorne see Twain, Mark
• Where surnames are in common use, the entry should be the surname followed
by any given name or initials
• Where surnames are not used, the name that customarily comes first should
be used as the entry word
e.g. Imran Khan
18. • Persons identified only by a given name or forename should be indexed under
that name, qualified if necessary, by a title of office or other distinguishing
epithet
e.g. Leonardo da Vinci
Boudicca, Queen of the Iceni
• Persons normally identified by a title of honor or nobility should be indexed
under that title, expanded if necessary by their family name
e.g. Dalai Lama
First Duke of Marlborough, John Churchill
• Compound and multiple surnames, whether hyphenated or not, should be
indexed under the first part
e.g. Layzell Ward, Patricia
Perez de Cuellar, Javier
Choice and Forms of Headings (ISO 999)
19. 2. Corporate Bodies
• Names of the corporate bodies should normally be indexed without transposition
e.g. British Museum
• Transposition may, however, be used if it is considered that this would help the users
of the index.
e.g. Department of Agriculture see Agriculture, Department of
J. Whitaker & Sons see Whitaker (J) & Sons
• Choose the most recent or the most commonly used form of corporate name as the
main heading and add “see” cross references from other forms
e.g. John Moores University see Liverpool John Moores University
Liverpool John Moores University
Choice and Forms of Headings (ISO 999)
20. 3. Geographic Names
• should be as full as necessary for clarity, with additions to avoid confusion with otherwise identical
names
e.g. Alaminos (Laguna)
Alaminos (Pangasinan)
• An article or preposition should be retained in a geographic name of which it forms an integral part
e.g. La Paz
Las Vegas
• Where the article or preposition does not form an integral part of a name, it should be omitted
e.g. New Forest rather than The New Forest
Rheinfall rather than Der Rheinfall
Choice and Forms of Headings (ISO 999)
21. 4. Titles of documents
• should normally be italicized, underlined or otherwise distinguished. If necessary for identification, names of creators,
places of publication, dates or other qualifiers may be added within parentheses.
e.g. Ave Maria (Gounod)
Ave Maria (Schubert)
Ave Maria (Verdi)
• In an English index, articles in titles are conventionally transposed to the end of the heading so that filing order is
explicit.
e.g. Hunting of the Snark, The
Kapital, Das
• A preposition at the beginning of the title should be retained
e.g. To the Lighthouse
Choice and Forms of Headings (ISO 999)
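The rule for articles and prepositions in titles can be expressed as a small function. This is a sketch under stated assumptions: the article list is a short, hypothetical sample (a real index would need a per-language list), and the function name is invented for illustration.

```python
# Assumed, incomplete list of leading articles; not taken from ISO 999.
ARTICLES = {"a", "an", "the", "das", "der", "die"}


def title_heading(title: str) -> str:
    """Move a leading article to the end of the heading; keep anything else."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    return title  # prepositions such as "To" are retained in place


print(title_heading("The Hunting of the Snark"))
print(title_heading("Das Kapital"))
print(title_heading("To the Lighthouse"))
```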
22. 5. First lines of poems
Conventionally, in an index of first lines of poems, the
article is retained without transposition and is recognized
for the purpose of alphabetical arrangement
e.g. A little thing in the snow
The modest Rose puts forth a thorn
Choice and Forms of Headings (ISO 999)
23. Automatic Indexing
Refers to indexing by machine, or the analysis of text by means of computer
algorithms.
Most automatic indexing systems are not really “automatic” in the sense of
substituting computers for humans, but are intended to assist the human indexer.
A better term for these systems is “machine-aided”.
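A machine-aided approach of the kind described above might simply propose candidate terms and leave the choice to the indexer. The sketch below ranks words by frequency after dropping stop words; the stop list, the length threshold and the function name are assumptions, and production systems use considerably richer analysis.

```python
import re
from collections import Counter

# Assumed stop list for illustration only.
STOPWORDS = {"the", "and", "for", "that", "with", "are", "not", "but"}


def suggest_terms(text, top_n=5):
    """Return the most frequent non-stop words as candidate index terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


sample = ("Indexing by machine. Machine indexing assists the human indexer; "
          "the indexer decides.")
print(suggest_terms(sample, top_n=3))
```

The output is a list of suggestions with counts; a human indexer would still accept, reject or rephrase each one.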
25. Information retrieval is the activity of obtaining information resources relevant to an
information need from a collection of information resources.
An information retrieval process begins when a user enters a query into the system.
Queries are formal statements of information needs.
User queries are matched against the database information. Depending on the
application the data objects may be, for example, text documents, images, audio,
mind maps or videos.
Introduction
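The matching of a query against stored documents can be illustrated with a tiny inverted index, a common structure for this task. It is offered only as a sketch: the sample documents, the whitespace tokenization and the function names are assumptions, and the slides do not say which data structure any particular system uses.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def match(index, query):
    """Return ids of documents containing every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "indexes guide readers",
    2: "search engines retrieve documents",
    3: "indexes help search",
}
index = build_inverted_index(docs)
print(match(index, "indexes search"))
```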
26. Every online database, every search engine, everything that is searched
online is based in some way or another on principles developed in IR.
IR is at the heart of searching used in systems such as DIALOG, LexisNexis & others
Understanding the basics of IR is a prerequisite for understanding how
searching of online systems works.
Why IR?
27. Information retrieval can be divided into several major components, which include:
1. Database
2. Search mechanism
3. Language
4. Interface
Major Components of IR
28. A database is a system built on a particular way of handling data; its
objective is to record and maintain information.
Database
29. Information that has been organized systematically can be searched and retrieved when a
corresponding search mechanism is provided.
Search procedures can be categorized as basic or advanced.
The capacity of the search mechanism determines what retrieval techniques will be
available to users and how information stored in databases can be retrieved.
Search mechanism
30. Information relies on language when being processed, transferred or
communicated.
Language can be classified as natural language or controlled vocabulary.
Language
31. The interface is regularly considered when judging whether or not an
information retrieval system is user friendly.
The quality of an interface is checked by its interaction
mode.
The interface determines the ultimate success of a system
for information retrieval.
Interface
33. Logical operations are also known as Boolean logic. When Boolean logic is applied to
information retrieval, three operators, called Boolean operators, are used:
The AND operator, for narrowing down a search
The OR operator, for broadening a search
The NOT operator, for excluding unwanted results
Boolean Searching
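The effect of the three Boolean operators can be shown with sets of document numbers. This is a minimal sketch: the terms and document ids are made up for illustration, and real systems apply the same logic to much larger posting lists.

```python
# Hypothetical postings: term -> ids of documents containing the term.
postings = {
    "index": {1, 2, 3},
    "retrieval": {2, 3, 4},
    "manual": {3},
}

# AND narrows the search: documents containing both terms.
both = postings["index"] & postings["retrieval"]

# OR broadens the search: documents containing either term.
either = postings["index"] | postings["retrieval"]

# NOT excludes unwanted results: drop documents that mention "manual".
without_manual = both - postings["manual"]

print(sorted(both), sorted(either), sorted(without_manual))
```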
35. Text sometimes exhibits case sensitivity; that is, words can differ in meaning based on
differing use of uppercase and lowercase letters. Words with capital letters do not always
have the same meaning when written with lowercase letters.
For example, Bill is the first name of former U.S. president William Clinton, who could
sign a bill.
The opposite of "case-sensitive" is "case-insensitive".
For example, Google searches are generally case-insensitive and Gmail is
case-sensitive by default.
Case sensitivity searching
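The difference between the two modes can be seen in a short sketch using the Bill/bill example above. The function name and sample sentences are assumptions made for illustration.

```python
def find(term, texts, case_sensitive=False):
    """Return the texts that contain the term, with or without case matching."""
    if case_sensitive:
        return [t for t in texts if term in t]
    needle = term.casefold()  # casefold() is a more thorough lower()
    return [t for t in texts if needle in t.casefold()]


texts = ["Bill Clinton signed it", "He could sign a bill"]
print(find("Bill", texts, case_sensitive=True))
print(find("Bill", texts))
```

The case-sensitive search returns only the sentence about the person; the case-insensitive search returns both.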
36. Truncation allows a search to be conducted for all the different forms of a word
that share a common root.
Symbols such as the question mark (?), asterisk (*) and pound sign (#) are used for
truncation.
Several options are available for truncation: left truncation,
right truncation and middle truncation.
Truncation
37. Left truncation retrieves all the words having the same characters at the
right-hand part; for example, *hyl will retrieve words such as “methyl” and “ethyl”.
Right truncation retrieves all the words having the same characters at the left-hand
part; for example, the query Network* retrieves documents on networks and networking.
Similarly, middle truncation retrieves the words having the same characters
at both the left-hand and right-hand parts; for example, “Colo*r” will retrieve both the
terms “colour” and “color”.
Cont.…
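Left, right and middle truncation can all be handled by translating the asterisk into a regular-expression wildcard. The following is a sketch only: it assumes `*` stands for any run of characters (including none), and the function name and word list are invented for illustration. Individual retrieval systems define their truncation symbols differently.

```python
import re


def truncation_search(pattern, vocabulary):
    """Return the words matching a pattern in which * stands for any characters."""
    parts = [re.escape(part) for part in pattern.split("*")]
    regex = re.compile(".*".join(parts), re.IGNORECASE)
    return [word for word in vocabulary if regex.fullmatch(word)]


vocabulary = ["methyl", "ethyl", "network", "networks", "networking",
              "color", "colour", "colder"]
print(truncation_search("*hyl", vocabulary))      # left truncation
print(truncation_search("Network*", vocabulary))  # right truncation
print(truncation_search("Colo*r", vocabulary))    # middle truncation
```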
41. A proximity search allows you to specify how close two (or more) words must be to
each other in order to register a match.
There are three types of proximity searches:
Word proximity
Sentence proximity
Paragraph proximity
Proximity searching
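Word proximity, the first of the three types, can be sketched by comparing word positions. The function name, the punctuation handling and the sample sentence are assumptions; sentence and paragraph proximity would apply the same idea to larger units.

```python
def within(text, first, second, max_distance):
    """True if the two words occur at most max_distance word positions apart."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in pos_first for j in pos_second)


sentence = "Information retrieval begins when a user enters a query."
print(within(sentence, "information", "retrieval", 1))
print(within(sentence, "information", "query", 3))
```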
42. Range searching is most useful with numerical information. The following options are usually
available for range searching:
greater than (>)
less than (<)
equal to (=)
not equal to (/= or <>)
greater than or equal to (>=)
less than or equal to (<=)
Range searching
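The comparison options listed above map directly onto the comparison functions in Python's standard `operator` module. This is a minimal sketch; the record layout, the field name and the function name are assumptions.

```python
import operator

# Symbols from the list above mapped to standard comparison functions.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "/=": operator.ne,
    "<>": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
}


def range_search(records, field, op, value):
    """Return the records whose field satisfies the comparison."""
    compare = OPS[op]
    return [r for r in records if compare(r[field], value)]


# Hypothetical records, for illustration only.
records = [
    {"title": "A", "year": 1997},
    {"title": "B", "year": 2005},
    {"title": "C", "year": 2012},
]
print([r["title"] for r in range_search(records, "year", ">=", 2005)])
```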
43. To search for documents or items that contain numbers within a range, type your search
term and the range of numbers separated by two periods (“..”). For example, to search for
pencils that cost between $1.50 and $2.50, type the following:
pencils $1.50..$2.50
Example of Range Searching
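A query that uses the two-period range syntax can be split into its search term and numeric bounds as follows. This is a sketch under stated assumptions: the query string `pencils $1.50..$2.50` is an assumed form of the pencil example, the item list is invented, and the parsing handles only a single trailing range.

```python
def parse_range_query(query):
    """Split 'term low..high' into the term and two numeric bounds."""
    term, _, span = query.rpartition(" ")
    low, _, high = span.partition("..")
    return term, float(low.lstrip("$")), float(high.lstrip("$"))


def search(items, query):
    """Return names of items matching the term whose price lies in the range."""
    term, low, high = parse_range_query(query)
    return [name for name, price in items
            if term.lower() in name.lower() and low <= price <= high]


# Hypothetical catalogue of (name, price) pairs.
items = [("HB pencils", 1.25), ("Colour pencils", 2.00), ("Pens", 2.00)]
print(search(items, "pencils $1.50..$2.50"))
```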
45. Online information retrieval systems allow the user to search databases located remotely
with the help of the computer and telecommunication technology.
Basic searching techniques
Advanced retrieval techniques
Examples:
Library of Congress, University of Punjab Library
Online systems
46. CD-ROM systems are usually searched locally if the systems are not
networked.
Basic retrieval techniques are supported in CD-ROM systems, while advanced search
facilities are available only to a limited extent.
The data stored on a compact disc (CD) can be read by any computer operating
system and any CD-ROM drive.
Example:
LISA
CD-ROM systems
47. Online public access catalogs (OPACs) are traditional catalogs executed in a different medium.
Distinguishing features of OPACs are:
First, OPACs contain bibliographic information about library resources.
Second, OPACs can be considered an extension of MARC records.
Third, OPACs support at least field searching, keyword searching and Boolean searching.
Examples
Library of Congress catalogue
University of Punjab online catalogue
OPAC
49. A web information retrieval system deals with text as well as multimedia information
resources that are linked with other documents, and there is no target user community
as such.
Basically, the web is a platform where anyone from anywhere can
publish virtually any information, in any language or in any format.
Examples:
Google, AltaVista
Web information Retrieval Systems
50. MAJOR STANDALONE INDEXING
SOFTWARE:
MACREX- by Macrex Indexing Services
(www.macrex.com).
Developed by Hilary and Drusilla Calvert in
the United Kingdom.
It is a tool similar to a word-processor for
professional indexers, who create the
entries themselves.
MACREX produces consistency and helps
the indexer to save time.
MACREX is not a machine-indexing
program, and will NOT create an index
automatically from a given text.
51. CINDEX
Produced by Indexing Research
(www.indexres.com), which was founded by Frances
Lennie in 1985.
Cindex is a uniquely capable program for
preparing indexes to books, newspapers and
other periodical publications.
Cindex is available for Windows and for Mac.