The Significance of Vocabulary
Michael Buckland
School of Information Management and Systems
University of California, Berkeley

  1. The Significance of Vocabulary
     Michael Buckland
     School of Information Management and Systems
     University of California, Berkeley
  2. The Significance of Vocabulary
     • An economic claim: Vocabulary problems reduce the benefits and return on investment in information services.
     • Vocabulary is used for indexicality, so issues of identity are central to LIS.
     • Vocabulary is central to digital libraries.
     • Vocabulary is central to explaining the history of conceptions of LIS!
  3. A correctly formed Library of Congress Subject Heading, but who would think of such search terms?
     God --- Knowableness --- History of doctrines --- Early church, ca. 30-600 --- Congresses.
  4. Economic Rationale:
     • Massive investment in repositories
     • Large investment in categorization schemes: classifications, thesauri, concept codes, headings, …
     • Categorization schemes usually specialized and stylized
     • Increasingly unfamiliar to searchers, hence ineffective, inefficient use
  5. Remedy
     Support for searching unfamiliar metadata vocabularies: Interface to translate searcher’s vocabulary into system’s vocabulary.
  6. Examples
     Automobile import, export data (Census Bureau)
     • Automobiles? No data.
     • Cars? “Railway or tramway stock”
     • (Passenger motor vehicles, spark ignition engine.)
  7. “Automobiles”, also known as . . .
     • TL 205 in Library of Congress Classification
     • 180/280 in U.S. Patent Classification
     • 3711 in Standard Industrial Classification
  8. Example: Coastal pollution
     F SU COASTAL POLLUTION 0
     F TW COASTAL POLLUTION
     SUMMARIZE SUBJECTS
     • LCSH: Marine pollution; Coastal zone management; Water --- Pollution; Petroleum industry and trade; Beach erosion; Coasts; Barrier islands
     • MeSH: Seawater; Water pollution; Bacteria; Water microbiology; Air pollution; Environmental monitoring; Bathing beaches
  9. International Harmonized Commodity Classification System: “Computer”
     • HS 84: “Nuclear reactors, boilers, machines and mechanical appliances”
     • HS 8471: “Automatic data processing machines and units thereof, magnetic or optical readers, machines for transcribing data”
     • HS 847120: “Digital auto data proc mach contng in the same housing a CPU and input & output device”
  10. INSPEC Thesaurus subdomain-based indexes:
     • “Water” subdomain: Fission reactor safety; Fission reactor fuel; Polymers; Organic insulating materials; Water supply; Cable insulation; Insulation testing; and Insulating oils.
     • “Biology” subdomain: Water; Biomechanics; Physiological models; Neurophysiology; Cellular effects of radiation.
     • “Information Studies” subdomain: Agriculture; Natural resources; Forecasting theory; Operations research; Erosion.
  11. Example: Vietnam War.
     U.C. MELVYL Online Catalog
     FIND XSU VIETNAM WAR
     Search Results: 0 records
     FIND XSU VIETNAMESE CONFLICT
     Search Results: 4,190 records
  12. Dictionaries don’t always help
     Emanuel Goldberg: Aerial photography using a “Drachen”
     • Actual meaning: Aerodynamic tethered balloon.
     • Standard contemporary English was: Aerostat.
     • German: Drachen (= Kite in dictionary)
  13. “Entry vocabulary” search interfaces:
     • Software and algorithms map natural language vocabulary to specialized metadata terms.
     • Allows users to enter ordinary language queries while taking advantage of existing subject headings, categorization.
     • Uses co-occurrence statistics to link users’ ordinary language terms to system vocabularies.
     • Statistical association between lexical items in titles and abstracts and the system’s metadata vocabulary.
     • Suggests most likely system vocabulary.
  14. Thesaurus navigation
     • Facilitates browsing where structure is present: Broader, narrower, related terms
     • Guides searcher to other parts of the structure
     Retrieval set analysis
     • Navigation within micro-domain
  15. Web access: WWW forms-based application supported by Perl
     Supports searches on remote repositories
     Four subdomain dictionaries in three databases
     --- BIOSIS (Biological abstracts): subdomain “water”
     --- INSPEC: subdomains: “information science”, “water”
     --- U.S. Patent Office classification
  16. Statement of work:
     • Varied prototype Entry Vocabulary Modules.
     • Unintrusive development of EVMs by agents
     • Sensitivity to subdomains.
     • Natural language processing to augment statistical term frequency.
     • Recommendations for metadata “codebooks” for numeric databases.
     • www.sims.berkeley.edu/metadata/
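
The co-occurrence idea behind the “entry vocabulary” interfaces (slide 13) can be illustrated with a minimal sketch. It is not the project's actual implementation, which the slides describe only as a Perl-supported web application. The training records, the stopword list, the scoring rule (summing, for each query word, the share of its titles that carry a heading), and the names `build_index` and `suggest` are all assumptions made for illustration; the headings themselves are taken from the Vietnam War and coastal pollution examples.

```python
from collections import Counter, defaultdict

# Hypothetical toy training records: (title text, assigned subject headings).
# A real entry vocabulary module would be trained on a large catalog sample.
RECORDS = [
    ("The Vietnam War and American culture", ["Vietnamese Conflict"]),
    ("Memoirs of the war in Vietnam", ["Vietnamese Conflict"]),
    ("Oil spills and coastal pollution", ["Marine pollution"]),
    ("Pollution of coastal waters", ["Marine pollution", "Coastal zone management"]),
    ("Managing the coastal zone", ["Coastal zone management"]),
]

STOPWORDS = {"the", "and", "of", "in", "a"}  # assumed, minimal stopword list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def build_index(records):
    """Count how often each title word co-occurs with each heading."""
    word_counts = Counter()          # number of titles containing the word
    cooc = defaultdict(Counter)      # word -> heading -> co-occurrence count
    for title, headings in records:
        for word in set(tokenize(title)):
            word_counts[word] += 1
            for heading in headings:
                cooc[word][heading] += 1
    return word_counts, cooc


def suggest(query, word_counts, cooc, top_n=3):
    """Rank headings by the summed share of each query word's titles."""
    scores = Counter()
    for word in tokenize(query):
        for heading, n in cooc.get(word, {}).items():
            scores[heading] += n / word_counts[word]
    return [heading for heading, _ in scores.most_common(top_n)]


word_counts, cooc = build_index(RECORDS)
print(suggest("Vietnam War", word_counts, cooc))
print(suggest("coastal pollution", word_counts, cooc))
```

With this toy data, the searcher's phrase "Vietnam War" is mapped to the catalog's heading "Vietnamese Conflict", and "coastal pollution" ranks "Marine pollution" first. A production module would likely use a stronger association measure than this raw conditional frequency.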
