The Significance of Vocabulary (Michael Buckland)

  • 1. The Significance of Vocabulary
    Michael Buckland, School of Information Management and Systems, University of California, Berkeley
  • 2. The Significance of Vocabulary
    • An economic claim: Vocabulary problems reduce the benefits and return on investment in information services.
    • Vocabulary is used for indexicality; therefore, issues of identity are central to library and information science (LIS).
    • Vocabulary is central to digital libraries.
    • Vocabulary is central to explaining the history of conceptions of LIS!
  • 3. A correctly formed Library of Congress Subject Heading, but who would think of such search terms?
    • God -- Knowableness -- History of doctrines -- Early church, ca. 30-600 -- Congresses
  • 4. Economic Rationale:
    • Massive investment in repositories
    • Large investment in categorization schemes: classifications, thesauri, concept codes, headings, …
    • Categorization schemes usually specialized and stylized
    • Increasingly unfamiliar to searchers, hence ineffective, inefficient use
  • 5. Remedy: support for searching unfamiliar metadata vocabularies
    • Interface to translate the searcher’s vocabulary into the system’s vocabulary
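In its simplest form, such an interface is a lookup table from searchers' phrases to the system's headings. The sketch below assumes exactly that design; `ENTRY_VOCABULARY` and `translate_query` are hypothetical names, the only real mapping in the table is the MELVYL example from slide 11, and the spelling variant is invented for illustration.

```python
# Hypothetical lead-in vocabulary: searcher phrasing -> system heading.
# The first entry reflects the MELVYL example in slide 11; the second is an
# invented spelling variant added only for illustration.
ENTRY_VOCABULARY = {
    "vietnam war": "VIETNAMESE CONFLICT",
    "viet nam war": "VIETNAMESE CONFLICT",
}


def normalize(query: str) -> str:
    """Lower-case the query and collapse runs of whitespace so lookups are forgiving."""
    return " ".join(query.lower().split())


def translate_query(query: str) -> str:
    """Return the system's heading for a searcher's phrase, or the phrase unchanged."""
    return ENTRY_VOCABULARY.get(normalize(query), query)


print(translate_query("Vietnam  War"))
print(translate_query("coastal pollution"))
```

A hand-built table covers only the phrasings someone anticipated; slide 13 describes deriving such mappings statistically instead.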
  • 6. Example: Automobile import and export data (Census Bureau)
    • Automobiles? No data.
    • Cars? “Railway or tramway stock” (Passenger motor vehicles, spark ignition engine.)
  • 7. “Automobiles”, also known as . . .
    • TL 205 in Library of Congress Classification
    • 180/280 in U.S. Patent Classification
    • 3711 in Standard Industrial Classification
  • 8. Example: Coastal pollution
    • F SU COASTAL POLLUTION: 0 records
    • F TW COASTAL POLLUTION, then SUMMARIZE SUBJECTS:
      • LCSH: Marine pollution; Coastal zone management; Water -- Pollution; Petroleum industry and trade; Beach erosion; Coasts; Barrier islands
      • MeSH: Seawater; Water pollution; Bacteria; Water microbiology; Air pollution; Environmental monitoring; Bathing beaches
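The SUMMARIZE SUBJECTS step can be approximated as retrieval set analysis: run a title-word search, then tally the subject headings assigned to the records it retrieves. This is a minimal sketch assuming a hypothetical in-memory list of records with `title` and `subjects` fields; the sample records are invented for illustration and are not drawn from any catalog.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    subjects: list  # headings assigned by catalogers


def title_word_search(records, query):
    """Return records whose titles contain every query word (case-insensitive)."""
    words = query.lower().split()
    return [r for r in records if all(w in r.title.lower().split() for w in words)]


def summarize_subjects(records, top_n=5):
    """Count how often each heading is assigned within the retrieved set."""
    counts = Counter(h for r in records for h in r.subjects)
    return counts.most_common(top_n)


# Invented sample records; headings are borrowed from the lists above.
CATALOG = [
    Record("Coastal pollution and beach closures", ["Marine pollution", "Bathing beaches"]),
    Record("Managing coastal pollution", ["Marine pollution", "Coastal zone management"]),
    Record("Oil spills in coastal pollution control", ["Petroleum industry and trade", "Marine pollution"]),
    Record("River pollution", ["Water -- Pollution"]),
]

hits = title_word_search(CATALOG, "coastal pollution")
print(summarize_subjects(hits))
```

Splitting titles on whitespace ignores punctuation; a real catalog would tokenize titles more carefully before matching.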
  • 9. International Harmonized Commodity Classification System: “Computer”
    • HS 84 : “Nuclear reactors, boilers, machines and mechanical appliances”
    • HS 8471 : “Automatic data processing machines and units thereof, magnetic or optical readers, machines for transcribing data”
    • HS 847120 : “Digital auto data proc mach contng in the same housing a CPU and input & output device”
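HS codes nest by digit prefix: a two-digit chapter contains four-digit headings, which in turn contain six-digit subheadings. The sketch below walks up that hierarchy using only the three labels quoted on this slide; `HS_LABELS`, `ancestors`, and `breadcrumb` are illustrative names, not an official HS data source.

```python
# Labels quoted on the slide; a real application would load the full HS nomenclature.
HS_LABELS = {
    "84": "Nuclear reactors, boilers, machines and mechanical appliances",
    "8471": "Automatic data processing machines and units thereof, "
            "magnetic or optical readers, machines for transcribing data",
    "847120": "Digital auto data proc mach contng in the same housing "
              "a CPU and input & output device",
}


def ancestors(code: str) -> list:
    """Return the chapter, heading, and subheading prefixes of an HS code, broadest first."""
    return [code[:n] for n in (2, 4, 6) if n <= len(code)]


def breadcrumb(code: str) -> list:
    """Render the path a searcher must follow to reach a code, skipping unknown levels."""
    return [f"HS {p}: {HS_LABELS[p]}" for p in ancestors(code) if p in HS_LABELS]


for line in breadcrumb("847120"):
    print(line)
```

The path shows why a searcher thinking "computer" would have to start from a chapter about nuclear reactors and boilers.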
  • 10. INSPEC Thesaurus subdomain-based indexes:
    • “Water” subdomain: Fission reactor safety; Fission reactor fuel; Polymers; Organic insulating materials; Water supply; Cable insulation; Insulation testing; and Insulating oils.
    • “Biology” subdomain: Water; Biomechanics; Physiological models; Neurophysiology; Cellular effects of radiation.
    • “Information Studies” subdomain: Agriculture; Natural resources; Forecasting theory; Operations research; Erosion.
  • 11. Example: Vietnam War (U.C. MELVYL Online Catalog)
    • FIND XSU VIETNAM WAR: 0 records
    • FIND XSU VIETNAMESE CONFLICT: 4,190 records
  • 12. Dictionaries don’t always help
    • Emanuel Goldberg: aerial photography using a “Drachen”
    • Actual meaning: aerodynamic tethered balloon
    • Standard contemporary English was: aerostat
    • German: Drachen (= “kite” in the dictionary)
  • 13. “ Entry vocabulary” search interfaces:
    • Software and algorithms map natural language vocabulary to specialized metadata terms.
    • Allows users to enter ordinary language queries while taking advantage of existing subject headings and other categorization schemes
    • Uses co-occurrence statistics to link users’ ordinary language terms to system vocabularies
    • Statistical association is measured between the words in a record’s title and abstract and the metadata terms assigned to that record
    • Suggests most likely system vocabulary
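The co-occurrence idea can be sketched with a toy training set in which each record pairs its title words with its assigned metadata terms. This sketch scores each word/term pair with pointwise mutual information (PMI), chosen here only for brevity; it is an assumed stand-in, not the association measure the Berkeley Entry Vocabulary Modules actually used. The training records and the names `build_pmi` and `suggest_terms` are hypothetical; the metadata terms echo the Census Bureau example in slide 6.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical training records: (words from a title, metadata terms assigned to it).
TRAINING = [
    ({"cars", "exports", "japan"}, {"Passenger motor vehicles, spark ignition engine"}),
    ({"cars", "safety"}, {"Passenger motor vehicles, spark ignition engine"}),
    ({"rail", "cars", "freight"}, {"Railway or tramway stock"}),
    ({"oil", "spill", "coast"}, {"Marine pollution"}),
]


def build_pmi(records):
    """Map each co-occurring (word, term) pair to PMI = log(P(w, t) / (P(w) * P(t))).

    Probabilities are the fraction of records containing the word, the term,
    or both. Pairs that never co-occur are simply absent from the result.
    """
    n = len(records)
    word_df, term_df, pair_df = Counter(), Counter(), Counter()
    for words, terms in records:
        word_df.update(words)
        term_df.update(terms)
        pair_df.update(product(words, terms))
    return {
        (w, t): math.log((c / n) / ((word_df[w] / n) * (term_df[t] / n)))
        for (w, t), c in pair_df.items()
    }


def suggest_terms(query, pmi, top_n=3):
    """Sum PMI over the query's words and return the highest-scoring metadata terms."""
    query_words = set(query.lower().split())
    scores = Counter()
    for (w, t), score in pmi.items():
        if w in query_words:
            scores[t] += score
    return [t for t, _ in scores.most_common(top_n)]


pmi = build_pmi(TRAINING)
print(suggest_terms("rail cars", pmi))
print(suggest_terms("cars safety", pmi))
```

With this data, "cars" alone is ambiguous, but one extra word ("rail" or "safety") tips the suggestion toward the appropriate heading. PMI is known to overrate rare pairs, so a working system would likely prefer a more robust measure.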
  • 14. Thesaurus navigation
    • Facilitates browsing where structure is present: broader, narrower, and related terms
    • Guides searcher to other parts of the structure
    Retrieval set analysis
    • Navigation within a micro-domain
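Broader, narrower, and related terms form a small graph that a browsing interface can walk. This is a minimal sketch assuming a hypothetical in-memory `Thesaurus` class that stores broader-term (BT) and related-term (RT) links and derives narrower terms (NT) as the inverse of BT. The sample headings come from slide 8, but the links between them are illustrative, not copied from LCSH.

```python
from collections import defaultdict


class Thesaurus:
    """Holds BT and RT links; NT links are derived as the inverse of BT."""

    def __init__(self) -> None:
        self.broader = defaultdict(set)
        self.narrower = defaultdict(set)
        self.related = defaultdict(set)

    def add_broader(self, term: str, broader_term: str) -> None:
        """Record broader_term as a BT of term, and term as an NT of broader_term."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a: str, b: str) -> None:
        """Record a symmetric RT link between two terms."""
        self.related[a].add(b)
        self.related[b].add(a)

    def neighbors(self, term: str) -> dict:
        """Return the BT, NT, and RT lists an interface could display for term."""
        return {
            "BT": sorted(self.broader.get(term, ())),
            "NT": sorted(self.narrower.get(term, ())),
            "RT": sorted(self.related.get(term, ())),
        }


# Headings from slide 8; the links between them are illustrative only.
thesaurus = Thesaurus()
thesaurus.add_broader("Marine pollution", "Water -- Pollution")
thesaurus.add_related("Marine pollution", "Coastal zone management")
print(thesaurus.neighbors("Marine pollution"))
print(thesaurus.neighbors("Water -- Pollution"))
```

Storing only BT and deriving NT keeps the two directions consistent, so a searcher can step up or down from any heading without the links disagreeing.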
  • 15. Web access:
    • WWW forms-based application supported by Perl
    • Supports searches on remote repositories
    • Four subdomain dictionaries in three databases:
      • BIOSIS (Biological Abstracts): subdomain “water”
      • INSPEC: subdomains “information science” and “water”
      • U.S. Patent Office classification
  • 16. Statement of work:
    • Varied prototype Entry Vocabulary Modules.
    • Unintrusive development of EVMs by agents
    • Sensitivity to subdomains.
    • Natural language processing to augment statistical term frequency.
    • Recommendations for metadata “codebooks” for numeric databases.
    • www.sims.berkeley.edu/metadata/