Published in: Technology, Business


  1. 1. INDEXING
  2. 2. <ul><li>Definition of Terms </li></ul><ul><li>indexing - the process of providing in-depth access to information contained within a document or knowledge record. </li></ul><ul><li>index - a guide to the contents of a document or collection of documents with the same format arranged in a searchable order such as alphabetical, classified, chronological or numerical. </li></ul><ul><li>index entry – single record in an index that may consist of four parts: main heading, subheading, locator and/or cross reference/s. </li></ul><ul><li>descriptor – a term designed for use by the thesaurus to represent the aboutness of a topic in a document. </li></ul><ul><li>document – any item that contains information, either in print or non-print format, including digital forms. </li></ul><ul><li>identifier - proper name of person, object, institution/organization, process, etc. </li></ul>
  3. 3. <ul><li>indexing language - any vocabulary, controlled or uncontrolled, used for indexing along with the rules of usage. </li></ul><ul><li>indexing system – a set of prescribed procedures (manual or machine-operated) intended for organizing the contents of a document or knowledge records for purposes of retrieval and dissemination. </li></ul><ul><li>keyword - a raw word taken from a document that is regarded as an indexable term. </li></ul><ul><li>qualifier - a term or phrase added to a heading to distinguish among homographs or clarify meaning. </li></ul><ul><li>translation – the process of converting concepts derived from the document into a particular set of index terms usually derived from a controlled vocabulary. </li></ul><ul><li>vocabulary control - the process of organizing a list of terms for use in indexing, along with the rules of usage. </li></ul>
  4. 4. Development of Indexes and Indexing <ul><li>The first systematic organization of written records occurred in Sumer around 3,000 B.C. </li></ul><ul><li>Around 2,000 B.C. in China and India, record keeping became part of society. </li></ul><ul><li>Early civilizations proposed schemes of knowledge classification and document arrangement (e.g. Greeks used some sort of alphabetic order). </li></ul><ul><li>In 900 A.D., an encyclopedia was arranged in alphabetical order. </li></ul><ul><li>During the 15th century, books were published with blank pages and quite wide margins. </li></ul>
  5. 5. <ul><li>The 17th century brought a new type of information tool, the periodical. </li></ul><ul><li>In the mid-19th century, W.F. Poole published an index that covered numerous issues of many periodicals. </li></ul><ul><li>Late in the 19th century, Paul Otlet and Henri La Fontaine founded the International Institute of Bibliography to improve indexing approaches to scholarly literature. This led to modern keyword and free-text indexing. </li></ul><ul><li>In 1900, H.W. Wilson first published Reader’s Guide to Periodical Literature. </li></ul><ul><li>By the 1950s, computers had penetrated the indexing arena and efforts to evaluate indexing had begun. </li></ul>
  6. 6. Role of Indexing in Information Retrieval <ul><li>Relationship of Indexing, Abstracting and Searching </li></ul><ul><li>(Cleveland and Cleveland, 2001, p. 31) </li></ul>[Figure: diagram relating DOCUMENT, INDEX, ABSTRACT, PATRON and INDEXING TOOL]
  7. 7. Information Retrieval System <ul><li>An information retrieval system is a mechanism for carrying out the functions of the information retrieval process. </li></ul><ul><li>Organization of information may take different forms (manual, computer-based, or a combination of both). </li></ul><ul><li>Most challenging problem: providing the closest possible match (coincidence) between the request and the response. </li></ul><ul><li>Modern information retrieval systems: data retrieval, reference retrieval and text retrieval. </li></ul>
  8. 8. Information Retrieval System <ul><li>Functions involved: </li></ul><ul><li>1. The information is created and acquired for the system. </li></ul><ul><li>2. Knowledge records are analyzed and tagged with a set of index terms. </li></ul><ul><li>3. The knowledge records are stored physically and the index terms are stored in a structured file. </li></ul><ul><li>4. The user’s query is tagged with sets of index terms and is then matched against the tagged records. </li></ul><ul><li>5. Matched documents are retrieved for review. </li></ul><ul><li>6. Feedback may lead to several reiterations of the search. </li></ul>
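The functions listed on this slide can be sketched as a toy retrieval loop. This is only a minimal illustration, not a description of any particular system: the sample records, the names `index_records` and `search`, and the choice to require every query term are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical knowledge records: record id -> index terms assigned in step 2.
RECORDS = {
    "doc1": {"indexing", "thesaurus"},
    "doc2": {"indexing", "abstracting"},
    "doc3": {"classification"},
}


def index_records(records):
    """Step 3: store the index terms in a structured (inverted) file."""
    inverted = defaultdict(set)
    for doc_id, terms in records.items():
        for term in terms:
            inverted[term].add(doc_id)
    return inverted


def search(inverted, query_terms):
    """Steps 4-5: match the tagged query against the tagged records.

    A record is retrieved only if it carries every query term.
    """
    postings = [inverted.get(term, set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


inverted_file = index_records(RECORDS)
hits = search(inverted_file, ["indexing", "thesaurus"])
print(sorted(hits))
```

Step 6 (feedback) would correspond to calling `search` again with a revised list of query terms.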
  9. 9. Feedback may lead to several reiterations of the search. [Flowchart steps: User expresses an information need; Request is conceptually analyzed; Request is translated into the system's index language; A searching strategy is composed; Search is carried out; Search is completed. Decision points: Is user satisfied?; Are all searching options depleted? Outcomes: Stop; Reformulation of the request.]
  10. 10. Purposes and Uses of Indexes <ul><li>Saves time and effort in finding information. </li></ul><ul><li>Identify potentially relevant information in the document or collection being indexed. </li></ul><ul><li>Analyze concepts treated in a document to produce appropriate index headings based on the indexing language assigned. </li></ul><ul><li>Indicate relationships among terms. </li></ul>
  11. 11. <ul><li>Group together related topics. </li></ul><ul><li>Direct the users seeking information under terms not chosen as index headings to headings that have been chosen. </li></ul><ul><li>Suggest related topics. </li></ul><ul><li>Serve as a tool for current awareness services. </li></ul>
  12. 12. Types of Indexes <ul><li>by arrangement </li></ul><ul><li>a. Alphabetical index </li></ul><ul><li>Advantage: </li></ul><ul><li>More convenient to use and follows an order that is familiar to users. </li></ul><ul><li>Drawbacks: </li></ul><ul><li>synonymy </li></ul><ul><li>scattering of entries </li></ul>
  13. 13. <ul><li>b. Classified index </li></ul><ul><li>Advantages: </li></ul><ul><li>useful for generic searches. </li></ul><ul><li>Brings similar things together. </li></ul><ul><li>Drawbacks: </li></ul><ul><li>Most users find them difficult to use. </li></ul><ul><li>Needs a secondary file. </li></ul><ul><li>One cannot enter it directly as one can with alphabetical sequences of names. </li></ul>
  14. 14. <ul><li>c. Concordance </li></ul><ul><li>Uses: </li></ul><ul><li>Locate a partly or completely remembered passage </li></ul><ul><li>Drawback: </li></ul><ul><li>Searching is difficult since this type of index spreads similar entries over many synonymous terms, ignores misspellings, and confuses any general-specific term relationships. </li></ul><ul><li>d. Numerical or serial order* </li></ul><ul><li>e.g. Numerical Patent Index of Chemical Abstracts; American Statistics Office </li></ul>
  15. 15. Nelson’s Complete Concordance of the Revised Standard Version Bible <ul><li>AARON </li></ul><ul><li>“ Is there not A., your brother, the Ex. 4.14 </li></ul><ul><li>The Lord said to A., “Go into 4.27 </li></ul><ul><li>And Moses told A. all the words 4.28 </li></ul><ul><li>And A. spoke all the words which 4.30 </li></ul><ul><li>Afterward Moses and A. went to 5.30 </li></ul>
  16. 16. <ul><li>by type or form of material indexed </li></ul><ul><li>1. Book index * </li></ul><ul><li>Reasons for Preparing a Book Index </li></ul><ul><li>collects the different ways of wording the same concept. </li></ul><ul><li>filters information for the reader. </li></ul><ul><li>pinpoints information </li></ul>
  17. 17. <ul><li>Components of a book index entry: </li></ul><ul><ul><li>main heading </li></ul></ul><ul><ul><li>subheading </li></ul></ul><ul><ul><li>locator </li></ul></ul><ul><ul><li>cross references </li></ul></ul><ul><li>World Wide Web (WWW)‏ </li></ul><ul><li>browsers, 78 </li></ul><ul><li>components, 89 </li></ul><ul><li>development, 100-156 </li></ul><ul><li>see also Internet </li></ul>
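The four components of a book index entry can be modelled as a small record. The sketch below is only illustrative: the `IndexEntry` class, its field names, and the rendering layout are assumptions, and the data is the World Wide Web example from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One book index entry: main heading, subheadings with locators, cross references."""
    main_heading: str
    subheadings: list = field(default_factory=list)  # (subheading, locator) pairs
    see_also: list = field(default_factory=list)     # cross references

    def render(self):
        """Lay the entry out with indented subheadings, one per line."""
        lines = [self.main_heading]
        for subheading, locator in self.subheadings:
            lines.append(f"  {subheading}, {locator}")
        if self.see_also:
            lines.append("  see also " + "; ".join(self.see_also))
        return "\n".join(lines)


entry = IndexEntry(
    main_heading="World Wide Web (WWW)",
    subheadings=[("browsers", "78"), ("components", "89"), ("development", "100-156")],
    see_also=["Internet"],
)
print(entry.render())
```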
  18. 18. <ul><li>2. Periodical index * </li></ul><ul><li>consistency becomes the most challenging part </li></ul><ul><li>open-ended projects </li></ul><ul><li>scope is broader </li></ul><ul><li>3. Newspaper index </li></ul><ul><li>vocabulary control becomes a paramount challenge </li></ul><ul><li>4. Audiovisual materials index </li></ul><ul><li>textual labeling is needed along with image matching </li></ul>
  19. 19. Difference between Book & Periodical Indexes <ul><li>Book index: </li></ul><ul><li>Compiled only once, within a relatively short time, and usually by a single person. </li></ul><ul><li>Deals with a more or less well-defined central topic. </li></ul><ul><li>Periodical index: </li></ul><ul><li>A continuous process, more often performed by a team of indexers and lasting for an extended period. </li></ul><ul><li>Deals with a great variety of topics. </li></ul>
  20. 20. <ul><li>Book index: </li></ul><ul><li>Indexing terms are almost always derived from the text. </li></ul><ul><li>Specificity is largely governed by the text itself. </li></ul><ul><li>Periodical index: </li></ul><ul><li>Terminology must be consistent and derived from a controlled vocabulary. </li></ul><ul><li>Terms are prescribed by a controlled vocabulary and their level of specificity may be lower than in a book index. </li></ul>
  21. 21. <ul><li>Book index: </li></ul><ul><li>Every single page of a book must be read. </li></ul><ul><li>Virtually the entire text is subject to indexing. </li></ul><ul><li>Always bound with the indexed text. </li></ul><ul><li>Periodical index: </li></ul><ul><li>Articles are scanned for indexable items, and the indexer may rely on an abstract or summary. </li></ul><ul><li>A periodical index will depend on a number of policy decisions. </li></ul><ul><li>Compiled separately. </li></ul>
  22. 22. <ul><li>by physical form </li></ul><ul><li>card index </li></ul><ul><li>printed index </li></ul><ul><li>microform index </li></ul><ul><li>computerized index </li></ul><ul><ul><li>automatic indexing </li></ul></ul><ul><ul><li>computer-assisted indexing </li></ul></ul>
  23. 23. Principles and Concepts of Indexing <ul><li>1. Exhaustivity – refers to the extent to which concepts are made retrievable by means of index terms. </li></ul><ul><li>1.1 Summarization </li></ul><ul><li>1.2 Depth indexing </li></ul><ul><li>2. Specificity – refers to the extent to which a concept or topic in a document is identified by a precise term in the hierarchy of its genus-species relationship. </li></ul><ul><li>Example: </li></ul><ul><li>An information resource about musicians should be entered under ‘Musicians’ and not under ‘Performing Artists’. </li></ul><ul><li>3. Consistency – refers to the extent to which agreement exists on the terms to be used to index the same documents. </li></ul><ul><li>Types of consistency level: </li></ul><ul><li>inter-indexer consistency </li></ul><ul><li>intra-indexer consistency </li></ul>
  24. 24. Indexing Languages <ul><li>Purposes and Uses </li></ul><ul><li>a system for naming or identifying subjects contained in a document. </li></ul><ul><li>as a tool for communication </li></ul><ul><li>Features/Characteristics </li></ul><ul><li>Vocabulary – refers to the terms selected for the indexing of concepts. </li></ul><ul><li>Syntactics – refers to the combination and modification of terms to form headings and multilevel headings or to form search statements. </li></ul><ul><li>Example: Employees, Training of; Training of employees </li></ul><ul><li>Semantics – the study of meaning as expressed in communication such as words. </li></ul>
  25. 25. <ul><li>Semantic relationships are categorized into: </li></ul><ul><li>Equivalence relationship – implies that there will be more than one term denoting the same concept. </li></ul><ul><ul><li>Synonyms </li></ul></ul><ul><ul><li>Quasi-synonyms </li></ul></ul><ul><ul><li>Preferred spelling </li></ul></ul><ul><ul><li>Acronyms and abbreviations </li></ul></ul><ul><ul><li>Current and established terms </li></ul></ul><ul><ul><li>Translation </li></ul></ul>
  26. 26. <ul><li>Hierarchical relationship </li></ul><ul><ul><li>Genus – species relationship (represents class inclusion) </li></ul></ul><ul><li>Example: </li></ul><ul><li>Agro industry → Food Industry → Meat Industry </li></ul><ul><ul><li>Whole - part relationship </li></ul></ul><ul><li>Example: </li></ul><ul><li>Foot → Toes </li></ul><ul><li>Affinitive relationship – displayed with the use of related terms </li></ul><ul><li>Example: </li></ul><ul><li>Men – Women </li></ul><ul><li>Education – Teaching </li></ul>
  27. 27. Types of Indexing Languages <ul><li>1. Natural language (derived-term system) </li></ul><ul><li>Characteristics are: </li></ul><ul><li>Improves recall because it provides more access points but reduces precision </li></ul><ul><li>Redundancy is greater </li></ul><ul><li>Uses more current terms </li></ul><ul><li>Tends to be favored by subject-specialists or the end-users </li></ul><ul><li>May also be called indexing by extraction (or extractive indexing method). </li></ul>
  28. 28. <ul><li>2. Controlled vocabulary (assigned-term system) </li></ul><ul><li>Functions: </li></ul><ul><li>To control synonyms by choosing one form as the standard term </li></ul><ul><li>To make distinctions among homographs </li></ul><ul><li>To bring or link together terms that are closely related </li></ul><ul><li>Establishes the size or scope of a term </li></ul><ul><li>Usually records hierarchical and affinitive/associative relations </li></ul><ul><li>Controls variant spellings </li></ul>
  29. 29. <ul><li>Syndetic devices used by a controlled vocabulary: </li></ul><ul><li>USE and UF (use for) for synonyms </li></ul><ul><li>BT (broader term), NT (narrower term) and RT (related term) for differing levels of specificity and certain near synonyms and antonyms </li></ul>
  30. 30. <ul><li>Advantages of Controlled Vocabulary Language </li></ul><ul><li>Increases the probability that both indexer and searcher will express a particular concept in the same way. </li></ul><ul><li>Increases the probability that the same term will be used by different indexers or by the same indexer at different times. </li></ul><ul><li>Helps searchers to focus their thoughts when they approach the information system without a full and precise realization of what information they need. </li></ul>
  31. 31. <ul><li>Disadvantages of Controlled Vocabulary Language : </li></ul><ul><li>Incompatibility of different indexing languages. </li></ul><ul><li>High input cost. </li></ul><ul><li>The possibility of inadequate vocabulary. </li></ul>
  32. 32. Types of Controlled Vocabulary <ul><li>1. Authority List / Subject Authority List </li></ul><ul><li>Examples: </li></ul><ul><ul><ul><li>Library of Congress Subject Headings </li></ul></ul></ul><ul><ul><ul><li>Sears List of Subject Headings </li></ul></ul></ul><ul><ul><ul><li>Dewey Decimal Classification </li></ul></ul></ul><ul><li>2. Thesaurus </li></ul><ul><li>From the Latin word for ‘treasure’ </li></ul><ul><li>Poly-hierarchical </li></ul><ul><li>Examples: </li></ul><ul><ul><ul><li>The Art & Architecture Thesaurus* </li></ul></ul></ul><ul><ul><ul><li>ERIC (Education Resources Information Center) Thesaurus* </li></ul></ul></ul>
  33. 33. <ul><li>Similarities between Authority Lists and Thesauri </li></ul><ul><li>Both attempt to provide subject access to information resources by providing terminology that is consistent rather than uncontrolled and unpredictable. </li></ul><ul><li>Both choose preferred terms and make references from non-used terms. </li></ul><ul><li>Both provide hierarchies so that terms are presented in relation to their broader, narrower, and related terms. </li></ul>
  34. 34. <ul><li>Difference between Authority Lists and Thesauri </li></ul><ul><li>Thesauri are made up of single terms and bound terms representing single concepts. Subject heading lists have phrases and other pre-coordinated terms in addition to single terms. </li></ul><ul><li>Thesauri are more strictly hierarchical. </li></ul><ul><li>Thesauri are narrow in scope. </li></ul><ul><li>Thesauri are more likely multilingual. </li></ul>
  35. 35. <ul><li>Relationships of Terms </li></ul><ul><li>INTELLIGENCE </li></ul><ul><li>BT: Ability </li></ul><ul><li>NT: Comprehension </li></ul><ul><li>RT: Talent </li></ul><ul><li> Aptitude </li></ul><ul><li>A broader term (BT) reference shows a hierarchical relationship upward in the classification tree. </li></ul><ul><li>A narrower term (NT) reference is similar to the broader term reference, except that it goes down in the classification tree. </li></ul><ul><li>A related term (RT) reference refers to a descriptor that can be used in addition to the basic term but is not in a hierarchical relationship with it. </li></ul><ul><li>A Use reference points from a non-usable term to the preferred descriptor. </li></ul><ul><li>A Use for (UF) reference deals primarily with synonymous or variant forms of the preferred descriptor. It is also used to lead the indexer to more general terms. </li></ul><ul><li>A Scope Note (SN) is used to give users information about the descriptor’s usage restrictions or to clarify ambiguity. </li></ul>
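The reference types on this slide can be held in a simple lookup structure. The sketch below uses the INTELLIGENCE example from the slide; the `UF` value "IQ" is a hypothetical non-preferred term added only so the Use/UF pair has something to show, and the function names `preferred` and `expand` are assumptions.

```python
# Toy thesaurus: preferred descriptors with their BT/NT/RT/UF links.
THESAURUS = {
    "Intelligence": {
        "BT": ["Ability"],
        "NT": ["Comprehension"],
        "RT": ["Talent", "Aptitude"],
        "UF": ["IQ"],  # hypothetical non-preferred term, not from the slide
    },
}

# Use references are the inverse of UF: non-preferred term -> preferred descriptor.
USE = {
    variant: descriptor
    for descriptor, links in THESAURUS.items()
    for variant in links.get("UF", [])
}


def preferred(term):
    """Follow a Use reference if the term is non-preferred; otherwise keep the term."""
    return USE.get(term, term)


def expand(term):
    """Return the preferred descriptor plus its narrower and related terms."""
    descriptor = preferred(term)
    links = THESAURUS.get(descriptor, {})
    return [descriptor] + links.get("NT", []) + links.get("RT", [])


print(preferred("IQ"))
print(expand("IQ"))
```

A searcher could use something like `expand` to broaden a query, while an indexer would mainly rely on `preferred` to stay within the controlled vocabulary.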
  36. 36. Construction of a Thesaurus <ul><ul><li>Identify the subject field. </li></ul></ul><ul><ul><li>Identify the nature of literature to be indexed. </li></ul></ul><ul><ul><li>Identify the users. </li></ul></ul><ul><ul><li>Identify the file structure. Will this be a pre-coordinate or post-coordinate system? </li></ul></ul><ul><ul><li>Consult published indexes, glossaries, dictionaries, and other tools in the subject areas for the raw vocabulary. </li></ul></ul><ul><ul><li>Cluster the terms. </li></ul></ul><ul><ul><li>Establish term relationships. </li></ul></ul>
  37. 37. Indexing Systems <ul><li>1. Coordinate indexes – an indexing scheme that combines single index terms to create composite subject concepts </li></ul><ul><li>Types: </li></ul><ul><li>post-coordinate indexing </li></ul><ul><li>pre-coordinate indexing </li></ul>
  38. 38. <ul><li>2. Classified indexes – contents are arranged systematically by classes or subject headings. </li></ul><ul><li>2.1 Enumerative indexes </li></ul><ul><ul><li>DDC, LCC, and UDC are all examples of enumerative classifications. </li></ul></ul><ul><ul><li>Enumerative classifications are top-down methods of analysis. </li></ul></ul>
  39. 39. <ul><li>2.2 Faceted indexes </li></ul><ul><li>Often called an analytico-synthetic system. Facet analysis is a tightly controlled process by which simple concepts are organized into carefully defined categories by connecting class numbers of the basic concepts. </li></ul><ul><li>Bottom-up systems. </li></ul><ul><li>Pre-coordinated at the time of indexing and arranged in classification order rather than in straight alphabetical order. </li></ul><ul><li>Introduced by Shiyali Ramamrita Ranganathan in the 1930s </li></ul><ul><li>Example: When indexing a cookbook, some important facets might be: </li></ul><ul><li>Holidays </li></ul><ul><li>Ingredients </li></ul><ul><li>Recipe Titles </li></ul><ul><li>Techniques </li></ul>
  40. 40. <ul><li>3. Chain indexes </li></ul><ul><li>Provide that every concept becomes linked, or chained. </li></ul><ul><li>Introduced by S.R. Ranganathan as part of his Colon Classification, the system uses “synthesis” or “number building” . The number that represents some complex subject is arrived at by joining the notational elements that represent more elemental subjects. </li></ul>
  41. 41. <ul><li>Example of a Chain Index </li></ul><ul><li>Topic: Victorian period English Poetry (821.8)‏ </li></ul><ul><li>Hierarchy: </li></ul><ul><li>8 Literature </li></ul><ul><li> 2 English </li></ul><ul><li>1 Poetry </li></ul><ul><li> .8 Victorian period </li></ul><ul><li>Chain index entries that will be generated are the following: </li></ul><ul><li>Victorian period: Poetry: English: Literature 821.8 </li></ul><ul><li>Poetry: English: Literature 821 </li></ul><ul><li>English: Literature 820 </li></ul><ul><li>Literature 800 </li></ul>
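The way the chain index entries on this slide are derived from the hierarchy can be sketched in a few lines. This is a minimal illustration under the assumption that each link of the chain is given as a caption with its full class number; the function name `chain_index` is hypothetical.

```python
def chain_index(chain):
    """Build chain index entries from a hierarchy.

    `chain` lists (caption, class_number) pairs from the most general
    link to the most specific one. Each entry starts at one link and
    cites every broader link above it, ending with that link's number.
    """
    entries = []
    for i in range(len(chain), 0, -1):
        captions = [caption for caption, _ in reversed(chain[:i])]
        number = chain[i - 1][1]
        entries.append(": ".join(captions) + " " + number)
    return entries


CHAIN = [
    ("Literature", "800"),
    ("English", "820"),
    ("Poetry", "821"),
    ("Victorian period", "821.8"),
]

for entry in chain_index(CHAIN):
    print(entry)
```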
  42. 42. <ul><li>4. Permuted title indexes </li></ul><ul><li>Advantages: </li></ul><ul><li>minimum cost </li></ul><ul><li>does not need the expertise of a professional indexer because it is entirely done by a computer </li></ul><ul><li>Disadvantages: </li></ul><ul><li>titles may not accurately reflect the content of the item </li></ul><ul><li>the limited number of terms restricts complete subject indication </li></ul><ul><li>most title indexes are unappealing to the eye </li></ul><ul><li>can increase the retrieval of irrelevant documents </li></ul><ul><ul><li>usually employ stop-lists </li></ul></ul><ul><li>Scattering of synonyms and generic terms usually causes user frustration and missed entries. </li></ul>
  43. 43. <ul><li>4.1 KWIC (keyword in context) – was introduced by Hans Peter Luhn in 1959. It is a rotated index most commonly derived from the titles of documents. Each keyword appearing in a title becomes an entry point and is highlighted in some way by setting it off at the center of the page. </li></ul><ul><li>Principles of KWIC Indexing </li></ul><ul><li>Titles are generally informative </li></ul><ul><li>Words extracted from the title can be used as an effective guide </li></ul><ul><li>Although the meaning of an individual word viewed in isolation may be ambiguous or too general, the context surrounding the word helps to define and explain its meaning. </li></ul><ul><li>Example: </li></ul><ul><li>for Croatians.   Cataloging and classification </li></ul><ul><li>Cataloging and   classification for Croatians </li></ul><ul><li>for   Croatians . Cataloging and classification </li></ul>
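A KWIC listing can be produced mechanically from titles, which is why it needs no professional indexer. The sketch below is a simplified variant: it aligns each keyword at a fixed column but does not wrap the end of the title around to the left as the slide's example does. The stop-list, the `kwic` name, and the column width are assumptions.

```python
# Hypothetical stop-list of words that never become entry points.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def kwic(title, width=40):
    """Return KWIC lines for one title, sorted by keyword.

    Each line shows the words before the keyword right-aligned in
    `width` columns, then the keyword and the rest of the title.
    """
    words = title.split()
    lines = []
    for i, word in enumerate(words):
        key = word.lower().strip(".,")
        if key in STOP_WORDS:
            continue
        left = " ".join(words[:i])
        right = " ".join(words[i:])
        lines.append((key, f"{left:>{width}}  {right}"))
    return [line for _, line in sorted(lines)]


for line in kwic("Cataloging and classification for Croatians"):
    print(line)
```

Printed in a monospaced font, the keywords line up in a single column, which is the visual cue the slide describes.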
  44. 44. <ul><li>4.2 KWOC (keyword out of context) - A variation on the Keyword in Context Index (KWIC), in which keywords, removed from the context of the titles that contain them, appear as headings in a separate line index flush with the left margin. </li></ul><ul><li>Example: </li></ul><ul><li>Cataloging: Cataloging and classification for Croatians. </li></ul><ul><li>classification: Cataloging and classification for Croatians. </li></ul><ul><li>Croatians: Cataloging and classification for Croatians. </li></ul><ul><li>*A keyword used as an entry point in a KWOC index is sometimes not repeated in the title but is replaced by an asterisk (*) or some other symbol. </li></ul><ul><li>Example: </li></ul><ul><li>Blue-eyed: * Cats in Texas ……………………. 25 </li></ul><ul><li>Cat: The * and the Economy ………….. 12 </li></ul><ul><li>Cats: Blue-eyed * in Texas …………..…. 13 </li></ul><ul><li>Economy: The Cat and the * ……………….… 56 </li></ul><ul><li>Texas: Blue-eyed Cats in * ……………..… 76 </li></ul>
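The asterisk convention in the second example can be reproduced with a short routine. This is a sketch only: the stop-list and the `kwoc` name are assumptions, and the page locators shown on the slide are left out because they cannot be derived from the titles.

```python
# Hypothetical stop-list of words that never become headings.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def kwoc(titles, mask="*"):
    """Return (heading, masked title) pairs sorted by heading.

    Every non-stop word becomes a heading, and its occurrence in the
    title is replaced by `mask`.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            masked = words[:i] + [mask] + words[i + 1:]
            entries.append((word, " ".join(masked)))
    return sorted(entries, key=lambda entry: entry[0].lower())


TITLES = ["Blue-eyed Cats in Texas", "The Cat and the Economy"]

for heading, masked_title in kwoc(TITLES):
    print(f"{heading:<10} {masked_title}")
```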
  45. 45. <ul><li>4.3 KWAC (keyword alongside context) - also produced by computer algorithm; these indexes are designed to preserve word pairs and phrases in the alphabetical sequence of keywords while at the same time imitating the traditional format with the lead term on the left. </li></ul><ul><li>Example: </li></ul><ul><li>Cataloging and classification for Croatians. </li></ul><ul><li>classification for Croatians. Cataloging and </li></ul><ul><li>Croatians. Cataloging and classification for </li></ul>
  46. 46. <ul><li>5. Citation indexes – lead users to papers by citations, rather than by index terms. </li></ul><ul><li>6. String indexes – a word-based system in which the indexer analyzes the various aspects of the subject treated in a document and records the aspects as words, along with “role operators” . The computer program combines these words into string of terms that represents a brief summary of the document’s content. </li></ul>
  47. 47. <ul><li>6.1 PRECIS (Preserved Context Index System) </li></ul><ul><li>Developed by Derek Austin for the British National Bibliography (1971-1973) in order to produce printed alphabetical subject entries. </li></ul><ul><li>Based on the principle of “context-dependency”. </li></ul><ul><li>It involves: </li></ul><ul><ul><li>Determining the subject content of the document </li></ul></ul><ul><ul><li>Analyzing the subject statement to determine the role of each significant term (action term, location item, an agent or object of the action) </li></ul></ul><ul><ul><li>Determining the relationship of each term to other terms in the database and how all these terms should be linked. </li></ul></ul>
  48. 48. <ul><li>Below is an illustration on how a string of terms are organized according to the principle of context-dependency. </li></ul><ul><li>Topic: “Selection of personnel in paper industries in the Philippines”, the input string is: </li></ul><ul><li>A > B > C > D </li></ul><ul><li>or </li></ul><ul><li>Philippines > Paper industries > Personnel > Selection </li></ul>
  49. 49. <ul><li>The input string is: </li></ul><ul><li>(0) Philippines </li></ul><ul><li>(1) paper industries </li></ul><ul><li>(P) personnel </li></ul><ul><li>(2) selection </li></ul>Where (2) represents the “transitive action”, (P) the “object of action”, (0) the “location”, and (1) the “key system” (object of transitive action). These operators show the role that a term plays in relation to other terms and thus can be regarded as “role indicators” or “role operators”.
  50. 50. <ul><li>Entries provided are: </li></ul><ul><li>Philippines </li></ul><ul><li> Paper industries. Personnel. Selection. </li></ul><ul><li>Paper industries. Philippines. </li></ul><ul><li> Personnel. Selection. </li></ul><ul><li>Personnel. Paper industries. Philippines. </li></ul><ul><li>Selection. </li></ul><ul><li>Selection. Personnel. Paper industries. Philippines </li></ul>
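The entries on this slide follow a regular pattern: each term takes the lead position in turn, the wider context follows it on the same line in reverse order, and the narrower context is displayed on the line below. The sketch below reproduces only that rotation; it ignores the role operators, which in a real PRECIS implementation govern how entries are generated. The names `precis_entries` and `render` are hypothetical.

```python
def precis_entries(terms):
    """Rotate a context-dependent string, widest context first.

    For each lead term, the qualifier holds the wider context (nearest
    term first) and the display holds the narrower context.
    """
    entries = []
    for i, lead in enumerate(terms):
        entries.append({
            "lead": lead,
            "qualifier": list(reversed(terms[:i])),
            "display": terms[i + 1:],
        })
    return entries


def render(entry):
    """Format one entry as a heading line plus an indented display line."""
    heading = ". ".join([entry["lead"]] + entry["qualifier"]) + "."
    if entry["display"]:
        return heading + "\n   " + ". ".join(entry["display"]) + "."
    return heading


STRING = ["Philippines", "Paper industries", "Personnel", "Selection"]

for entry in precis_entries(STRING):
    print(render(entry))
```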
  51. 51. <ul><li>6.2 POPSI (Postulate-based Permuted Subject Indexing) </li></ul><ul><li>Developed at the Documentation Research and Training Center (India) </li></ul><ul><li>Based on the classification ideas of S.R. Ranganathan </li></ul><ul><li>The coding used for the index string generator is based on the indicator system of Colon Classification. A comma “,” precedes an “entity” segment; a semicolon “;” precedes a “property” segment; a colon “:” precedes a “process” segment; a hyphen “-” precedes a qualifying subsegment; and a greater-than sign “>” precedes a narrower term. </li></ul>
  52. 52. <ul><li>Example: The topic “study, using rabbits, of heart stimulation by antibiotics” will be placed under the discipline of pharmacology and will generate the following input string: </li></ul><ul><li>PHARMACOLOGY, CHEMICAL>DRUG>ANTIBIOTICS; STIMULATION-CIRCULATORY SYSTEM>HEART: STUDY-ANIMAL>RABBIT </li></ul>
  53. 53. <ul><li>Index strings that may be generated from the index string cited above are: </li></ul><ul><li>ANIMAL,STUDY,STIMULATION </li></ul><ul><li>PHARMACOLOGY,ANTIBIOTICS;STIMULATION-HEART:STUDY-RABBIT </li></ul><ul><li>ANTIBIOTICS,PHARMACOLOGY </li></ul><ul><li>PHARMACOLOGY,ANTIBIOTICS;STIMULATION-HEART:STUDY-RABBIT </li></ul>
  54. 54. <ul><li>6.3 NEPHIS (Nested Phrase Indexing System) – developed by Timothy C. Craven. The input string was designed to be a phrase in ordinary language. </li></ul><ul><li>Four different coding symbols are used: </li></ul><ul><li>the left and the right angular brackets (“<” and “>”) - mark the beginning and end of a phrase embedded or nested within a larger phrase </li></ul><ul><li>the question mark “?” - indicates that what follows is a connective to be included only in those index strings in which the connective has something to connect </li></ul><ul><li>the at sign “@” - indicates that what follows is not an access term ; this coding symbol is used at the beginning of the input string or at the beginning of a nested phrase. </li></ul>
  55. 55. <ul><li>Example: </li></ul><ul><li>Topic is “measures from information theory of the information content of document surrogates” </li></ul><ul><li>@MEASURES? OF<INFORMATION CONTENT?OF <DOCUMENT SURROGATES>>?FROM<INFORMATION THEORY> </li></ul><ul><li>Sample index strings that may be generated from the above input string are: </li></ul><ul><li>DOCUMENT SURROGATES. INFORMATION CONTENT. MEASURES FROM INFORMATION THEORY </li></ul><ul><li>INFORMATION CONTENT OF DOCUMENT SURROGATES. MEASURES FROM INFORMATION THEORY </li></ul><ul><li>INFORMATION THEORY. MEASURES OF INFORMATION CONTENT OF DOCUMENT SURROGATES </li></ul>
  56. 56. <ul><li>6.4 CIFT (Contextual Indexing and Faceted Taxonomic Access System) </li></ul><ul><li>– developed for the Modern Language Association (MLA); alphabetical subject entries are created from strings provided by indexers, who assign facets derived from literature, linguistics and folklore. </li></ul><ul><li>Example: </li></ul><ul><li>HENDIADYS </li></ul><ul><li>English literature. Tragedy. 1500-1599 </li></ul><ul><li> Shakespeare, William. Hamlet. Use of HENDIADYS. Sources in Virgil. Linguistic approach </li></ul><ul><li>LINGUISTIC APPROACH </li></ul><ul><li>English literature. Tragedy. 1500-1599 </li></ul><ul><li>Shakespeare, William. Hamlet. Use of Hendiadys. Sources in Virgil. </li></ul><ul><li>LINGUISTIC APPROACH </li></ul>
  57. 57. Measures of Effectiveness of the Indexing System <ul><li>1. Recall measure – is a simple quantitative ratio of relevant documents retrieved to the total number of relevant documents potentially available. Recall depends on the level of exhaustivity allowed by the indexing policy. </li></ul><ul><li>Example: </li></ul><ul><li>If there are 100 documents in the library that are relevant to the user’s needs and the indexing system retrieves 75 of them, then the recall ratio is 75 out of 100 (75/100), or 75 percent. </li></ul>
  58. 58. <ul><li>2. Precision measure – is the ratio of relevant documents retrieved to the total number of documents retrieved. Relevance or precision depends on the terminology of the text being indexed and the specificity of the indexing language used. </li></ul><ul><li>Example: </li></ul><ul><li>If 100 documents are retrieved and 50 of those items are relevant to the request, the precision ratio is 50 out of 100 (50/100), or 50 percent. </li></ul>
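The two ratios can be checked with a couple of one-line functions. The function and parameter names below are assumptions; the figures are the ones used in the two slide examples, which describe two separate searches.

```python
def recall(relevant_retrieved, total_relevant):
    """Share of all relevant documents that the search actually retrieved."""
    return relevant_retrieved / total_relevant


def precision(relevant_retrieved, total_retrieved):
    """Share of the retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved


# Recall example: 100 relevant documents exist, 75 of them are retrieved.
print(f"recall = {recall(75, 100):.0%}")

# Precision example: 100 documents are retrieved, 50 of them are relevant.
print(f"precision = {precision(50, 100):.0%}")
```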
  59. 59. Subject Indexing <ul><li>Steps in subject indexing: </li></ul><ul><li>1. Recording bibliographic data </li></ul><ul><li>2. Subject determination </li></ul><ul><li>3. Conceptual analysis </li></ul><ul><li>4. Translation into standard terms using controlled vocabulary </li></ul>
  60. 60. <ul><li>1. Recording bibliographic data (author, title, publication data, etc.) </li></ul><ul><li>a. When indexing printed books, pamphlets, periodicals and other printed documents, use locators that refer to the page numbers, separating locators with a comma. Example: Livingstone, Ken 1/3, 1/97, 3/56 </li></ul><ul><li>b. When indexing several issues or volumes of one title of a periodical, the indexer should take the locators from the numbering of the issues at the time of publication. </li></ul><ul><li>Example: 54/3: 38 (volume/part: page) </li></ul><ul><li>53, April 1998: 38 (volume, date: page) </li></ul><ul><li>53: 38 (volume: page) </li></ul><ul><li>April 1998: 38 (date: page) </li></ul>
  61. 61. <ul><li>c. When indexing contents of a collection of documents, locators should give complete information about each document (title of the article, the author(s), the title of the periodical, volume number and date, and the inclusive pagination for the article). </li></ul><ul><li>Example: </li></ul><ul><li>Automated Teller Machines </li></ul><ul><li>Competition spurs development of innovative bank technologies. Bus Journ. 45, Jan-Mar 2004: 13. </li></ul><ul><li>The new networks. Info Tech. Apr 2005: 76-89. </li></ul><ul><li>d. If a document treats a subject continuously in a consecutively numbered sequence, reference should be made to the first and last numbered elements only. </li></ul><ul><li>e. Exceptionally, where space constraints apply or where the locators are extremely long, e.g. 10002-10012, digits may be omitted so that only the changed digits of the second locator are given, e.g. 10002-12. </li></ul><ul><li>f. Conventionally, the digits 10-19 in each hundred are given in full, e.g. 412-18 </li></ul>
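Rules e and f on this slide can be combined into one small routine for shortening a page range. This is a sketch of one possible reading of those rules, not a full implementation of any published standard; the function name `elide_range` and its fallback for ranges of unequal length are assumptions.

```python
def elide_range(start, end):
    """Shorten the second locator of a page range.

    Rule e: give only the digits of `end` that differ from `start`.
    Rule f: when `end` finishes in 10-19, keep both of those digits.
    Ranges whose ends differ in length are returned in full.
    """
    first, last = str(start), str(end)
    if len(first) != len(last) or start >= end:
        return f"{first}-{last}"

    # Find the first position where the two locators differ.
    i = 0
    while first[i] == last[i]:
        i += 1
    tail = last[i:]

    # Keep the teens (10-19) in full, e.g. 412-18 rather than 412-8.
    if len(tail) < 2 and 10 <= int(last[-2:]) <= 19:
        tail = last[-2:]
    return f"{first}-{tail}"


print(elide_range(10002, 10012))
print(elide_range(412, 418))
```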
  62. 62. <ul><li>2. Subject determination </li></ul><ul><li>“aboutness” of the material </li></ul><ul><li>formulation of a concept list </li></ul><ul><ul><li>most appropriate to the given community of users </li></ul></ul><ul><ul><li>If necessary, modify both indexing tools and procedures as a result of feedback from inquiries </li></ul></ul><ul><ul><li>no arbitrary limit should be set to the number of terms or descriptors </li></ul></ul><ul><ul><li>concepts should be identified as specifically as possible. </li></ul></ul>
  63. 63. <ul><li>3. Content analysis </li></ul><ul><li>a. Factors that may affect content analysis: </li></ul><ul><li>Environmental situation </li></ul><ul><li>Policy decisions </li></ul><ul><li>Decisions of the indexer </li></ul><ul><li>b. Parts of the documents that have to be analyzed </li></ul><ul><li>Title </li></ul><ul><li>Abstract </li></ul><ul><li>List of contents </li></ul><ul><li>Text itself </li></ul><ul><li>Illustrations, diagrams, tables and captions. </li></ul><ul><li>Reference section </li></ul>
  64. 64. <ul><li>4. Translation into standard terms using controlled vocabulary </li></ul><ul><li>The following practices must be observed in the translation process. </li></ul><ul><ul><li>Concepts which are already translated into indexing terms should be translated into their preferred terms. </li></ul></ul><ul><ul><li>Terms which represent new concepts should be checked for accuracy and acceptability in reference tools. </li></ul></ul><ul><ul><li>If the concepts are not present in an existing thesaurus or classification scheme, these may be </li></ul></ul><ul><ul><ul><ul><li>Expressed by terms or descriptors which are admitted into the indexing language </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Represented temporarily by more general terms, the new concepts being proposed as candidates for later addition </li></ul></ul></ul></ul>
  65. 65. Indexing Policies and Guidelines & Production of Indexes
  66. 66. <ul><li>Indexing Procedures for Books </li></ul><ul><li>1. Examine the text carefully. </li></ul><ul><li>2. Read the text several times, page by page, to be able to analyze the contents and determine the indexable topics. </li></ul><ul><li>3. Select the topics to be indexed taking into consideration their significance to the central theme of the book. </li></ul><ul><li>4. Name the topics that were chosen to be indexed and mark up page proofs. </li></ul>
  67. 67. <ul><li>5. Alphabetize the entries. </li></ul><ul><li>6. Edit the entries </li></ul><ul><li>Decide which entries should be the main headings and which should be the subheadings </li></ul><ul><li>Decide whether certain entries will be treated as main entries or subentries </li></ul><ul><li>Example: </li></ul><ul><li>As subentries under one main heading: </li></ul><ul><li>handicrafts </li></ul><ul><ul><li>pottery making </li></ul></ul><ul><ul><li>weaving </li></ul></ul><ul><ul><li>wood carving </li></ul></ul><ul><li>or as separate main entries: </li></ul><ul><li>pottery making </li></ul><ul><li>weaving </li></ul><ul><li>wood carving </li></ul>
  68. 68. <ul><li>Main entries unmodified by subentries should not be followed by long rows of page numbers. </li></ul><ul><li>Subentries must be concise and informative </li></ul><ul><li>Make a final choice among synonymous terms </li></ul><ul><li>Provide adequate but not excessive cross-referencing </li></ul><ul><li>Examples: </li></ul><ul><li>Cars </li></ul><ul><ul><li>Chevrolet, 224 </li></ul></ul><ul><ul><li>Mazda, 146 </li></ul></ul><ul><ul><li>Volkswagen </li></ul></ul><ul><ul><li>See also trucks </li></ul></ul><ul><li>Trucks </li></ul><ul><ul><li>Dodge Ram, 219 </li></ul></ul><ul><ul><li>GMC (Jimmy), 143 </li></ul></ul><ul><ul><li>Mercedes-Benz, 144 </li></ul></ul><ul><ul><li>See also cars </li></ul></ul>
  69. 69. <ul><li>Punctuation </li></ul><ul><li>a. The inversion of a phrase used as the heading in a main entry is punctuated by a comma. </li></ul><ul><li>b. If the heading is followed immediately by page references, a comma is used between the heading and the first numeral and between subsequent numerals. </li></ul><ul><li>c. If the heading is followed immediately by run-in subentries, a colon precedes the first subheading. All subsequent subentries are preceded by semicolons. For example: </li></ul><ul><li>payments, balance of: definition of, 16; </li></ul><ul><li>importance of, 19 </li></ul>
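The punctuation rules b and c above lend themselves to a small formatter. The following Python sketch is illustrative only; the function name `format_entry` and its parameters are assumptions made for this example, not part of any indexing standard.

```python
def format_entry(heading, locators=(), subentries=()):
    """Punctuate an index entry in run-in style.

    heading    -- the main heading, already inverted if needed (rule a)
    locators   -- page references that follow the heading directly
    subentries -- (subheading, [page references]) pairs
    """
    if subentries:
        # Rule c: a colon before the first subentry, semicolons between
        # the rest; each subheading is followed by its own page numbers.
        parts = [", ".join([sub, *map(str, pages)]) for sub, pages in subentries]
        return f"{heading}: " + "; ".join(parts)
    # Rule b: commas between the heading and each page reference.
    return ", ".join([heading, *map(str, locators)])


print(format_entry("payments, balance of",
                   subentries=[("definition of", [16]), ("importance of", [19])]))
# payments, balance of: definition of, 16; importance of, 19
print(format_entry("Mazda", locators=[146]))
# Mazda, 146
```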
  70. 70. <ul><li>7. Determine the design of the index after the compilation of the entries </li></ul><ul><li>Decide whether subentries will follow an indented or run-in style. </li></ul><ul><li>Typography should be used to differentiate between types of headings and to distinguish them from numerals indicating volumes, parts and pages. </li></ul><ul><li>8. Typing, proofreading, and the final review. </li></ul>
  71. 71. Indexing Techniques for Periodical Articles <ul><li>1. Always index names of persons honored by awards or prizes and those eulogized in obituaries. </li></ul><ul><li>2. Every article that has permanent value should be indexed under all topics and issues dealt with. </li></ul><ul><li>3. Editorials should be indexed under their topics like any other article but differentiated from the others by the addition of (Ed.) or (E). The titles of editorials may be indexed under a collective heading “Editorials”. </li></ul>
  72. 72. <ul><li>4. Letters to the editor if considered indexable should be indexed by topic, not under a caption that may have been assigned by the editor. It is advisable to index at least the name of the person who criticized an article as well as the author’s response. For example: </li></ul><ul><li>Doe, John. “Effect of magnetic fields” 37-43 </li></ul><ul><li>Errors (H. Smith) 75; correction 185 </li></ul><ul><li>[author’s entry] </li></ul><ul><li>Smith, Henry. “Effect of magnetic fields” </li></ul><ul><li>(John Doe pp. 37-43): errors 75 </li></ul><ul><li>[letter writer’s index entry] </li></ul>
  73. 73. <ul><li>5. Book reviews are indexed by the title of the book, followed by the name of the author, the locator, and the designation (R) unless all book reviews are listed under the class heading “Book Reviews” or in a separate index, </li></ul><ul><li>e.g. </li></ul><ul><li>Guide to reference books, 10th ed. (Sheehy) 68 (R) </li></ul><ul><li>*The name of the reviewer should be included in the author name index, </li></ul><ul><li>e.g. </li></ul><ul><li>Dixon, Geoffrey 68 (R), 92-96, 123 </li></ul>
  74. 74. Choice and Forms of Headings (ISO 999) <ul><li>1. Personal Names </li></ul><ul><li>should be given in as full a form as possible </li></ul><ul><li>should take the form used in the document, but if the text is not consistent, the indexer should adopt one form </li></ul><ul><li>choose the most recent, or the most commonly used form of personal name as the heading and add “see” cross-references from other forms, </li></ul><ul><li>e.g. Clemens, Samuel Langhorne see Twain, Mark </li></ul><ul><li>Where surnames are in common use, the entry should be the surname followed by any given name or initials </li></ul><ul><li>Where surnames are not used, the name that customarily comes first should properly be used as the entry word </li></ul><ul><li>e.g. Imran Khan </li></ul>
  75. 75. <ul><li>Persons identified only by a given name or forename should be indexed under that name, qualified if necessary, by a title of office or other distinguishing epithet </li></ul><ul><li>e.g. Leonardo da Vinci </li></ul><ul><li>Boudicca, Queen of the Iceni </li></ul><ul><li>Persons normally identified by a title of honor or nobility should be indexed under that title, expanded if necessary by their family name </li></ul><ul><li>e.g. Dalai Lama </li></ul><ul><li>First Duke of Marlborough, John Churchill </li></ul><ul><li>Compound and multiple surnames, whether hyphenated or not, should be indexed under the first part </li></ul><ul><li>e.g. Layzell Ward, Patricia </li></ul><ul><li>Pérez de Cuéllar, Javier </li></ul>
  76. 76. <ul><li>2. Corporate Bodies </li></ul><ul><li>Names of the corporate bodies should normally be indexed without transposition </li></ul><ul><li>e.g. British Museum </li></ul><ul><li>Transposition may, however, be used if it is considered that this would help the users of the index. </li></ul><ul><li>e.g. Department of Agriculture see Agriculture, Department of </li></ul><ul><li>J. Whitaker & Sons see Whitaker (J) & Sons </li></ul><ul><li>Choose the most recent or the most commonly used form of corporate name as the main heading and add “see” cross references from other forms </li></ul><ul><li>e.g. John Moores University see Liverpool John Moores University </li></ul><ul><li>Liverpool John Moores University </li></ul>
  77. 77. <ul><li>3. Geographic Names </li></ul><ul><li>should be as full as necessary for clarity, with additions to avoid confusion with otherwise identical names </li></ul><ul><li>e.g. Alaminos (Laguna) </li></ul><ul><li>Alaminos (Pangasinan) </li></ul><ul><li>An article or preposition should be retained in a geographic name of which it forms an integral part </li></ul><ul><li>e.g. La Paz </li></ul><ul><li>Las Vegas </li></ul><ul><li>Where the article or preposition does not form an integral part of a name it should be omitted, </li></ul><ul><li>e.g. New Forest rather than The New Forest </li></ul><ul><li>Rheinfall rather than Der Rheinfall </li></ul>
  78. 78. <ul><li>4. Titles of documents </li></ul><ul><li>should normally be italicized, underlined or otherwise distinguished. If necessary for identification, names of creators, places of publication, dates or other qualifiers may be added within parentheses. </li></ul><ul><li>e.g. Ave Maria (Gounod) </li></ul><ul><li>Ave Maria (Schubert) </li></ul><ul><li>Ave Maria (Verdi) </li></ul><ul><ul><li>In an English index, articles in titles are conventionally transposed to the end of the heading so that filing order is explicit. </li></ul></ul><ul><li>e.g. Hunting of the Snark, The </li></ul><ul><li>Kapital, Das </li></ul><ul><li>A preposition at the beginning of the title should be retained </li></ul><ul><li>e.g. To the Lighthouse </li></ul>
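The transposition of leading articles in titles can be illustrated with a short Python sketch. The `ARTICLES` set below is a deliberately small, assumed list covering only the English and German articles needed for the slide's examples; a real index would need a fuller, language-aware list, and the function name `transpose_article` is hypothetical.

```python
# Assumed, incomplete list of leading articles (English and German only).
ARTICLES = {"the", "a", "an", "das", "der", "die"}


def transpose_article(title: str) -> str:
    """Move a leading article to the end of the heading."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    # Prepositions and all other first words stay where they are.
    return title


print(transpose_article("The Hunting of the Snark"))  # Hunting of the Snark, The
print(transpose_article("Das Kapital"))               # Kapital, Das
print(transpose_article("To the Lighthouse"))         # To the Lighthouse
```

Note that this applies to titles of documents only; the convention for first lines of poems is to keep the article in place.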
  79. 79. <ul><li>5. First lines of poems </li></ul><ul><ul><li>Conventionally, in an index of first lines of poems, the article is retained without transposition and is recognized for purposes of alphabetical arrangement </li></ul></ul><ul><li>e.g. A little thing in the snow </li></ul><ul><li>The modest Rose puts forth a thorn </li></ul>
  80. 80. Evaluation of Indexes <ul><li>Guidelines/Criteria </li></ul><ul><li>1. Subject error </li></ul><ul><ul><ul><li>Errors in choosing subject descriptors </li></ul></ul></ul><ul><ul><ul><li>Omission errors </li></ul></ul></ul><ul><ul><ul><li>Use of a too broad or too narrow term </li></ul></ul></ul><ul><li>2. Generic searching – Alphabetical indexes have always presented difficulties in promoting generic searching. </li></ul>
  81. 81. <ul><li>3. Terminology </li></ul><ul><li>4. Internal guidance </li></ul><ul><ul><ul><li>Cross-references </li></ul></ul></ul><ul><ul><ul><li>Printed instruction on how to use the index </li></ul></ul></ul><ul><li>5. Accuracy in referring </li></ul><ul><ul><ul><li>Bibliographic citation </li></ul></ul></ul><ul><ul><ul><li>Cross-references </li></ul></ul></ul><ul><li>6. Entry scattering </li></ul><ul><li>Example: </li></ul><ul><li>College libraries School libraries </li></ul><ul><li>National libraries Special libraries </li></ul><ul><li>Public libraries </li></ul>
  82. 82. <ul><li>7. Entry differentiation </li></ul><ul><li>Example: </li></ul><ul><li>Libraries, 1-2, 28-31, 42, 53-60, 82, 109-11, 131-40, 310, 342-50 </li></ul><ul><li>8. Spelling and punctuation </li></ul><ul><li>9. Filing </li></ul><ul><ul><ul><li>Letter by letter (Air base, Airborne, Air brake) </li></ul></ul></ul><ul><ul><ul><li>Word by word (Air base, Air brake, Airborne) </li></ul></ul></ul><ul><li>10. Layout </li></ul><ul><ul><ul><li>Main headings are in heavy print </li></ul></ul></ul><ul><ul><ul><li>Subheadings are in lighter print and small letters and indented </li></ul></ul></ul><ul><ul><ul><li>See references are italicized </li></ul></ul></ul><ul><li>11. Length and type </li></ul><ul><ul><ul><li>Index length should be 3-5% of the pages of a typical nonfiction book, about 5-8% for a history or biography and about 15-20% for reference books </li></ul></ul></ul><ul><li>12. Cost </li></ul><ul><li>13. Standards </li></ul>
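The difference between the two filing methods in item 9 is easiest to see when both are applied to the same headings. The Python sketch below is a simplified illustration: the key functions `letter_by_letter` and `word_by_word` are hypothetical names, and real filing rules (such as those in BS 1749) also cover numbers, symbols and punctuation, which are ignored here.

```python
def letter_by_letter(heading: str) -> str:
    """Sort key that ignores spaces and punctuation between words."""
    return "".join(ch for ch in heading.lower() if ch.isalnum())


def word_by_word(heading: str) -> tuple:
    """Sort key that compares one whole word at a time."""
    return tuple(heading.lower().split())


headings = ["Airborne", "Air brake", "Air base"]

print(sorted(headings, key=letter_by_letter))
# ['Air base', 'Airborne', 'Air brake']
print(sorted(headings, key=word_by_word))
# ['Air base', 'Air brake', 'Airborne']
```

Letter by letter treats "Air brake" as "airbrake", so it files after "Airborne"; word by word files every heading beginning with the word "Air" before "Airborne".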
  83. 83. Indexing Standards <ul><li>International Organization for Standardization </li></ul><ul><li>ISO 2788: 1986 – Documentation – Guidelines for the establishment and development of monolingual thesauri </li></ul><ul><li>ISO 5964: 1985 – Documentation – Guidelines for the establishment and development of multilingual thesauri </li></ul><ul><li>ISO 5963: 1985 – Documentation – Methods for examining documents, determining their subjects, and selecting indexing terms </li></ul>
  84. 84. Indexing Standards <ul><li>International Organization for Standardization </li></ul><ul><li>ISO 999: 1996 – Information and documentation – Guidelines for the content, organization and presentation of indexes </li></ul><ul><li>ISO 4: 1997 - Information and documentation – Rules for the abbreviation of title words and titles of publications. It publishes a List of Serial Title Word Abbreviations which includes title word abbreviations in over 50 languages. </li></ul>
  85. 85. <ul><li>British Standards Institution (BSI) </li></ul><ul><li>BS 1749: 1985 – Recommendations for alphabetical arrangement and the filing order of numbers and symbols </li></ul><ul><li>BS 6478: 1984 – Guide to filing bibliographic information in libraries and documentation </li></ul><ul><li>BS 6529: 1984 – Recommendations for examining documents, determining their subjects and selecting indexing terms </li></ul><ul><li>BS 6723: 1985 – Guide to establishment and development of multilingual thesauri </li></ul><ul><li>BS 5723: 1987 – Guide to establishment and development of monolingual thesauri </li></ul><ul><li>BS ISO 999: 1996 – Information and Documentation – Guidelines for the content, organization and presentation of indexes </li></ul>
  86. 86. Automatic Indexing <ul><li>refers to indexing by machine, or the analysis of text by means of computer algorithms. The focus is on automatic methods used behind the scenes with little or no input from individual searchers, with the exception of relevance feedback. </li></ul>
  87. 87. <ul><li>Four Types of Approaches </li></ul><ul><li>(Cleveland & Cleveland, 2001, p. 211) </li></ul><ul><li>Statistical – based on counts of words, statistical associations, and collation techniques that assign weights and cluster similar words </li></ul><ul><li>Syntactical – stresses grammar and parts of speech, identifying concepts found in designated grammatical combinations, such as noun phrases. </li></ul>
  88. 88. <ul><li>Semantic systems – concerned with the context sensitivity of words in the text. What does cat mean in terms of its context? House cats? Heavy earthmoving equipment? </li></ul><ul><li>Knowledge-based systems – go beyond thesaurus or equivalence relationships to knowing the relationship between words, e.g. ‘tibia’ is part of a leg, thus the document is indexed under ‘leg injuries’. </li></ul>
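Of the four approaches above, the statistical one is the simplest to sketch. The Python example below counts word frequencies and proposes the most frequent non-trivial words as candidate index terms. It is a minimal illustration, not the method described by Cleveland & Cleveland: the `STOPWORDS` set is a tiny assumed list, the function name `candidate_terms` is hypothetical, and real systems add weighting and clustering on top of raw counts.

```python
import re
from collections import Counter

# Assumed, very small stop list; production systems use far longer ones.
STOPWORDS = {"a", "an", "and", "by", "in", "is", "of", "or", "the", "to"}


def candidate_terms(text: str, k: int = 3):
    """Return the k most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)


sample = ("Indexing is the analysis of text. Automatic indexing is indexing "
          "by machine; text analysis is done by algorithms.")
print(candidate_terms(sample, 1))  # [('indexing', 3)]
```

A purely statistical count like this cannot tell a house cat from earthmoving equipment, which is the gap the semantic and knowledge-based approaches try to close.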
  89. 89. Human/Manual Indexing vs. Automatic Indexing <ul><li>Needs more people </li></ul><ul><li>Costly </li></ul><ul><li>Human error </li></ul><ul><li>Low in production </li></ul><ul><li>Quality can range from excellent to appalling </li></ul><ul><li>Needs less human effort </li></ul><ul><li>Cheaper </li></ul><ul><li>Follows instructions automatically </li></ul><ul><li>Accurate </li></ul><ul><li>Fast in production </li></ul><ul><li>Promotes meticulous problem analysis </li></ul><ul><li>Dependent on human intelligence </li></ul><ul><li>Power lies in how the computer is programmed </li></ul>
  90. 90. Human /Manual Indexing vs. Automatic Indexing <ul><li>Automatic methods have trouble handling synonyms, homonyms, and semantic relations. Conceptualizing is very poor. </li></ul><ul><li>Human indexers go through cognitive processes that may be influenced by their background experience, education, training, intelligence, and common sense. </li></ul><ul><li>Computers can, and humans cannot, organize all words in a text and in a given database and make statistical operations on them </li></ul>
  91. 91. Indexing and the Internet <ul><li>Search Tools </li></ul><ul><li>Search engines - Engines are computer software that scan the Web and select pages to be indexed for the searching system. They are often referred to as Web indexes since they examine the content of the web pages. Examples: HotBot, InfoSeek, and Google. </li></ul><ul><li>Directory-based systems – usually indexed by humans and thus tend to have a higher level of quality in the indexing. Indexing may be based on full text or on most frequently used words since the way the material is organized gives a sense of browsing that is similar to traditional library browsing. Examples: Yahoo! Directory and Google Directory </li></ul><ul><li>Metasearchers - allow the user to search across multiple search tools at once. They take the user’s query and submit it to a number of other search tools. Examples: Metacrawler and Surfmax </li></ul>
  92. 92. GOOD LUCK!