Whither subject access?


Published in: Education

  1. 1. Whither subject access? Karen Markey Professor, University of Michigan [email_address]
  2. 2. Outline <ul><li>6 reasons why subject access is so difficult </li></ul><ul><li>4 end-user searcher types </li></ul><ul><ul><li>Helping the predominant type (~80% of queries) to overcome difficulties </li></ul></ul><ul><li>4 system improvements </li></ul><ul><li>Our improvement approach: </li></ul><ul><ul><li>A web-based board game that teaches players how to build their knowledge about a topic </li></ul></ul>
  3. 3. Why do subject access? (The library context) <ul><li>I don’t know something, and I want to find out </li></ul>
  4. 4. Why is subject access so difficult?-1 <ul><li>If you don’t know something, how can you formulate a question, query, keywords, search statement, etc., to answer it? </li></ul>“Precisely because of the inquirer's lack of knowledge about a problem area, it is impossible to specify what would resolve it.” (Belkin 1980, 137)
  5. 5. Outcome of subject searches (The library context) <ul><li>Vetted scholarship </li></ul><ul><li>Read, analyze, and synthesize </li></ul><ul><li>Act: Satisfy the information need that set the subject-access episode into action </li></ul>
  6. 6. Why is subject access so difficult?-2 <ul><li>Where to find the answer? </li></ul><ul><ul><li>OPAC </li></ul></ul><ul><ul><li>Library-licensed databases (at U-M = 1,023) </li></ul></ul><ul><ul><li>The web: Google and other search engines </li></ul></ul><ul><ul><li>Institutional repositories </li></ul></ul><ul><ul><li>Subject archives (e.g., arXiv, Cogprints) </li></ul></ul><ul><ul><li>Invisible web </li></ul></ul>
  7. 7. Why is subject access so difficult?-3 <ul><li>In the course of satisfying knowing, you encounter “doing” </li></ul><ul><ul><li>Buying and selling </li></ul></ul><ul><ul><li>Playing </li></ul></ul><ul><ul><li>Managing assets </li></ul></ul><ul><ul><li>Talking to other people </li></ul></ul><ul><ul><li>Computing </li></ul></ul><ul><ul><li>Developing … so that people can buy and sell, play, manage assets, talk, and compute </li></ul></ul>
  8. 8. Outcome of subject searches (The e-context) <ul><li>Vetted scholarship </li></ul><ul><li>And a whole lot more </li></ul>
  9. 9. New technologies for scientists & scholars (OCLC Environmental Scan: Pattern Recognition, Executive Summary, p. 3, 2003)
  10. 10. Expanded role for librarians <ul><li>No longer just about the finished products of research </li></ul><ul><li>Selecting, organizing, preserving, etc., the products and by-products of “doing” science & scholarship </li></ul><ul><li>Orienting information seekers about the “doing” science & scholarship artifacts they encounter </li></ul><ul><li>Maybe using some of these same new technologies to facilitate what we do… </li></ul>
  11. 11. Outcome of subject searches (The e-library context) <ul><li>Vetted scholarship </li></ul><ul><li>And a whole lot more: </li></ul><ul><ul><li>Limiting this “whole lot more” to the products and by-products of the research enterprise … to the doing of science and scholarship </li></ul></ul>
  12. 12. Why is subject access so difficult?-4 <ul><li>The seeker’s present level of expertise vis-à-vis their retrievals </li></ul><ul><ul><li>Grade school </li></ul></ul><ul><ul><li>High school </li></ul></ul><ul><ul><li>College </li></ul></ul><ul><ul><li>Graduate school </li></ul></ul><ul><ul><li>Terminal degree, e.g., MD, JD, PhD, MFA, licenses, certifications, ordinations, initiations, etc. </li></ul></ul><ul><li>Topics: </li></ul><ul><ul><li>Kukulcan </li></ul></ul><ul><ul><li>Making aerogel affordable </li></ul></ul><ul><ul><li>How do birds migrate? </li></ul></ul><ul><ul><li>Tibetan Buddhism </li></ul></ul><ul><ul><li>Pop rocks </li></ul></ul><ul><ul><li>The Black Death </li></ul></ul><ul><ul><li>Using extremophiles to clean up radioactive wastes </li></ul></ul>Knowledge is like the dust. You can't see it building up because it builds up so slowly but after a while when you check, you can see it has built up quite a bit.
  13. 13. Why is subject access so difficult?-5 <ul><li>Different document representations </li></ul><ul><ul><li>Titles </li></ul></ul><ul><ul><li>Uncontrolled keywords </li></ul></ul><ul><ul><li>Controlled vocabularies </li></ul></ul><ul><ul><li>Abstracts </li></ul></ul><ul><ul><li>Web pages </li></ul></ul><ul><ul><li>E-journal articles </li></ul></ul><ul><ul><li>Citation data </li></ul></ul><ul><ul><li>E-reviews </li></ul></ul><ul><ul><li>E-encyclopedia articles </li></ul></ul><ul><ul><li>E-newspaper articles </li></ul></ul><ul><ul><li>E-books </li></ul></ul>
  14. 14. Why is subject access so difficult?-6 <ul><li>Different search engines and search functionality </li></ul><ul><ul><li>Boolean </li></ul></ul><ul><ul><li>Probabilistic </li></ul></ul><ul><ul><li>Manual or automatic truncation </li></ul></ul><ul><ul><li>Word proximity </li></ul></ul><ul><ul><li>Spelling correction </li></ul></ul><ul><ul><li>Phrase searching </li></ul></ul><ul><ul><li>Relevance ranking </li></ul></ul><ul><ul><li>Popularity ranking </li></ul></ul>
  15. 15. Plus… <ul><li>Our knowledge of people’s feelings during search exacerbates the problem </li></ul><ul><ul><li>(Kuhlthau’s ISP Model) </li></ul></ul><ul><ul><ul><li>1. Task initiation: apprehension, uncertainty </li></ul></ul></ul><ul><ul><ul><li>2. Topic selection: confusion, anxiety, brief elation after selection </li></ul></ul></ul><ul><ul><ul><li>3. Prefocus exploration: confusion, doubt, threat, uncertainty </li></ul></ul></ul><ul><ul><ul><li>4. Focus formulation: optimism, confidence in ability to complete the task </li></ul></ul></ul><ul><ul><ul><li>5. Information collection: so much work to do but confidence in ability… </li></ul></ul></ul><ul><ul><ul><li>6. Closure: relief, satisfaction or disappointment </li></ul></ul></ul>
  16. 16. Summing up: Subject access is difficult <ul><li>Knowing so little about what I want to know </li></ul><ul><li>Expressing my query in words </li></ul><ul><li>Formulating my query into a search statement that yields useful retrievals </li></ul><ul><li>Continuing the search beyond the web </li></ul><ul><li>Eliminating the noise </li></ul><ul><li>Retrieving something I can understand given my present knowledge of the subject </li></ul><ul><li>Roller coastering up and down emotionally </li></ul>
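  17. 17. What really matters = system & domain knowledge <ul><ul><li>Most people are looking for information on topics they know nothing about </li></ul></ul><ul><ul><li>They have low system knowledge and low domain knowledge </li></ul></ul><ul><li>Breakdown by domain expertise and system knowledge: </li></ul><ul><ul><li>High domain expertise, high system knowledge: less than 0.5% </li></ul></ul><ul><ul><li>High domain expertise, low system knowledge: ~14% </li></ul></ul><ul><ul><li>Low domain expertise, high system knowledge: ~7% </li></ul></ul><ul><ul><li>Low domain expertise, low system knowledge: ~79% </li></ul></ul>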
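
The two-by-two grid on this slide can be read as a small lookup table. The sketch below is only an illustration: the percentages come from the slide, while the function name and the labels for the two mixed types are assumptions made here for readability.

```python
# Shares taken from the slide's grid; keys are (high_domain, high_system).
# "double novice" and "double expert" are the deck's terms; the two mixed
# labels are descriptive names assumed for this sketch.
SEARCHER_TYPES = {
    (False, False): ("double novice", "~79%"),
    (False, True): ("domain novice, system expert", "~7%"),
    (True, False): ("domain expert, system novice", "~14%"),
    (True, True): ("double expert", "less than 0.5%"),
}


def searcher_type(high_domain: bool, high_system: bool) -> str:
    """Return the searcher type and its approximate share."""
    label, share = SEARCHER_TYPES[(high_domain, high_system)]
    return f"{label} ({share})"


print(searcher_type(high_domain=False, high_system=False))
```

Keeping the grid as data makes it easy to see that improvements aimed at the low/low cell reach roughly four out of five cases.
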
  18. 18. When double novices search… <ul><li>Low domain knowledge </li></ul><ul><ul><li>Not knowing the right jargon, names of movers & shakers, an expert other than their instructor </li></ul></ul><ul><li>Low system knowledge </li></ul><ul><ul><li>Searches that are frenetic, aimless, random, meandering … </li></ul></ul><ul><li>Low procedural knowledge </li></ul><ul><ul><li>Not knowing what sources to search or the order of searching sources </li></ul></ul><ul><ul><li>Success starting with Google, but then what? </li></ul></ul><ul><li>Low metacognitive knowledge </li></ul><ul><ul><li>Not thinking about searching, search strategies, search tactics, making progress, knowing when to stop… </li></ul></ul><ul><li>(The vast majority of users and uses) </li></ul>
  19. 19. Low system knowledge & high domain knowledge-1 <ul><li>High procedural knowledge </li></ul><ul><ul><li>Familiar with in-domain sources </li></ul></ul><ul><ul><li>The order for searching these sources </li></ul></ul><ul><li>High domain knowledge—they know: </li></ul><ul><ul><li>Experts contributing to their field </li></ul></ul><ul><ul><li>Jargon and language of their field </li></ul></ul><ul><ul><li>Other domain experts for recommendations </li></ul></ul><ul><ul><li>Channel this knowledge into these successful search strategies: </li></ul></ul><ul><ul><ul><li>Author searching </li></ul></ul></ul><ul><ul><ul><li>Backward chaining </li></ul></ul></ul><ul><ul><ul><li>Forward chaining </li></ul></ul></ul><ul><ul><ul><li>Journal runs </li></ul></ul></ul>
  20. 20. Low system knowledge & high domain knowledge-2 <ul><li>Rely on their domain knowledge to quickly spot relevant retrievals </li></ul><ul><li>Don’t need Google for basic information in their domain </li></ul><ul><li>Not as frenetic … </li></ul><ul><li>Do they generalize the search strategies of their in-domain searches to their out-of-domain searches? </li></ul>
  21. 21. The rise of the professional-amateur class-1 <ul><li>Becoming a birdwatcher (1960s) </li></ul><ul><ul><li>Library: field guides, picture books, how-to books </li></ul></ul><ul><ul><li>Parents (?) </li></ul></ul><ul><ul><li>Scout leaders (?) </li></ul></ul><ul><li>Becoming a birdwatcher (today) </li></ul><ul><ul><li>All of the above + </li></ul></ul><ul><ul><li>e-birding: rare bird alerts; chat and experts on mailing lists; photo archive; hot spots: directions, lists, and maps; meeting and field trip notes; commercial tours; travel preparation </li></ul></ul><ul><ul><li>Doing: Go on field trips and benefit from volunteer expertise </li></ul></ul>
  22. 22. Professional-amateur class-2 <ul><li>Greying of America → increase in professional-amateurs </li></ul><ul><ul><li>Boomers retire in good health with leisure time and money … </li></ul></ul><ul><li>Professional classes harness professional-amateur enthusiasm and expertise </li></ul><ul><ul><li>Cornell Laboratory of Ornithology’s Citizen Science (http://www.birds.cornell.edu/) </li></ul></ul><ul><ul><ul><li>Status and population trends ~ as simple as counting birds at your feeders </li></ul></ul></ul><ul><ul><ul><li>Threatened species: Tanagers, Cerulean warblers, Golden-winged warblers </li></ul></ul></ul><ul><ul><li>U.S. Forest Service </li></ul></ul><ul><ul><ul><li>Endangered Kirtland’s Warbler </li></ul></ul></ul><ul><li>Swelling the ranks of searchers with low system knowledge and high domain knowledge </li></ul>
  23. 23. Kirtland’s Warbler (Ron Austing photography)
  24. 24. Double experts <ul><li>High domain knowledge </li></ul><ul><ul><li>Use in-domain search strategies </li></ul></ul><ul><ul><li>Know jargon, active researchers, other domain experts… </li></ul></ul><ul><li>High system knowledge </li></ul><ul><ul><li>Use the wide range of search-system functionality </li></ul></ul><ul><li>High procedural knowledge </li></ul><ul><ul><li>Know the relevant sources and their order </li></ul></ul><ul><li>High metacognitive knowledge </li></ul><ul><ul><li>Thinking about searching, search strategies, search tactics, assessing their progress, knowing when to stop… </li></ul></ul><ul><li>(Minuscule percentage of users and uses) </li></ul>
  25. 25. Low domain knowledge & high system knowledge <ul><li>High system knowledge </li></ul><ul><ul><li>Rarely frenetic … </li></ul></ul><ul><ul><li>Use the wide range of system search functionality </li></ul></ul><ul><ul><li>Use in-domain search strategies for out-of-domain searches </li></ul></ul><ul><ul><ul><li>Author searching </li></ul></ul></ul><ul><ul><ul><li>Backward and forward chaining </li></ul></ul></ul><ul><ul><ul><li>Journal runs </li></ul></ul></ul><ul><li>Cognizant of procedural knowledge </li></ul><ul><ul><li>What are the in-domain sources? </li></ul></ul><ul><ul><li>How should these sources be ordered? </li></ul></ul><ul><li>Cognizant of metacognitive knowledge </li></ul><ul><ul><li>Think about searching </li></ul></ul>
  26. 26. Improve searching for double novices <ul><li>Reduce the impact of the end user’s </li></ul><ul><ul><li>Low system knowledge </li></ul></ul><ul><ul><li>Low domain expertise </li></ul></ul><ul><ul><li>Low procedural knowledge </li></ul></ul><ul><li>Reduce these conditions → </li></ul><ul><ul><li>End users can focus on thinking about searching (metacognitive knowledge) </li></ul></ul>
  27. 27. Reduce the impact of low system knowledge: Post-Boolean-1 <ul><li>Build future systems with post-Boolean searching </li></ul><ul><ul><li>Quoting Susan Feldman: </li></ul></ul><ul><ul><ul><li>“These systems are doing what expert searchers have learned to do yourselves. They look for terms that can distinguish one document from another, they ask for the terms to appear close together in the document, they stem words, they count words that appear in the title more heavily than those appearing in the rest of the text …” </li></ul></ul></ul>
  28. 28. Reduce the impact of low system knowledge: Post-Boolean-2 <ul><li>Post-Boolean systems don’t require people to: </li></ul><ul><ul><li>Understand Boolean retrieval </li></ul></ul><ul><ul><li>Enter complicated search syntax </li></ul></ul><ul><ul><li>Scan unranked retrievals </li></ul></ul><ul><li>Post-Boolean systems rank potentially relevant retrievals at the top </li></ul><ul><ul><li>Let people use their energy spotting relevant retrievals </li></ul></ul><ul><ul><li>(That’s what people with high domain knowledge and low system knowledge are doing) </li></ul></ul>
  29. 29. Reduce the impact of low domain expertise: Ranking retrievals <ul><li>Profile ranking algorithms and relevance feedback routines to: </li></ul><ul><ul><li>Give higher weights to titles, subject headings, and table of contents entries than to words buried deep in the text </li></ul></ul><ul><ul><li>Produce retrievals that give a comprehensive rather than a cursory treatment of the desired topic </li></ul></ul><ul><ul><li>Ensure relevant retrievals are ranked at the top </li></ul></ul>
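
A minimal sketch of the field-weighting idea on this slide, assuming records are plain dictionaries. The field names, the weight values, and the simple term-count scoring are all assumptions chosen for illustration; the slide only says that titles, subject headings, and table of contents entries should count for more than words deep in the text.

```python
import re
from collections import Counter

# Hypothetical weights: the exact numbers are assumptions; only their
# ordering (title/headings/contents above full text) reflects the slide.
FIELD_WEIGHTS = {
    "title": 3.0,
    "subject_headings": 2.5,
    "table_of_contents": 2.0,
    "full_text": 1.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record, query):
    """Sum weighted counts of query terms across the record's fields."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(record.get(field, "")))
        total += weight * sum(counts[term] for term in terms)
    return total


def rank(records, query):
    """Return records ordered from highest to lowest score."""
    return sorted(records, key=lambda rec: score(rec, query), reverse=True)


records = [
    {"title": "Medieval Europe", "full_text": "the black death arrived"},
    {"title": "The Black Death", "full_text": ""},
]
for rec in rank(records, "black death"):
    print(rec["title"], score(rec, "black death"))
```

With these assumed weights, a work whose title names the topic outranks a work that only mentions it in passing, which is one rough proxy for comprehensive rather than cursory treatment.
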
  30. 30. Reduce the impact of low domain expertise: Feedback <ul><li>Enhance relevance feedback routines with the search strategies of domain experts </li></ul><ul><ul><li>Backward chaining </li></ul></ul><ul><ul><li>Forward chaining </li></ul></ul><ul><ul><li>Author searching </li></ul></ul><ul><ul><li>Journal runs </li></ul></ul><ul><li>These strategies require input that is straightforward and objective </li></ul><ul><ul><li>Author names </li></ul></ul><ul><ul><li>Citation data </li></ul></ul><ul><ul><li>Journal titles </li></ul></ul>
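
The four expert strategies listed on this slide need only author names, citation data, and journal titles, so they can be sketched as simple lookups. Everything below is hypothetical: the document identifiers, authors, and journals are made-up sample data, and the function names are assumptions for this illustration.

```python
# Hypothetical sample data: each key cites the works in its list.
CITES = {
    "survey-2005": ["study-1998", "study-2001"],
    "study-2001": ["study-1998"],
    "review-2007": ["survey-2005", "study-2001"],
}
AUTHORS = {
    "study-1998": ["Author A"],
    "study-2001": ["Author A", "Author B"],
    "survey-2005": ["Author C"],
    "review-2007": ["Author B"],
}
JOURNALS = {
    "study-1998": "Journal X",
    "study-2001": "Journal X",
    "survey-2005": "Journal Y",
    "review-2007": "Journal Y",
}


def backward_chain(doc_id, cites=CITES):
    """Follow a work's reference list to the earlier works it cites."""
    return set(cites.get(doc_id, []))


def forward_chain(doc_id, cites=CITES):
    """Find later works that cite the given work."""
    return {citing for citing, refs in cites.items() if doc_id in refs}


def author_search(name, authors=AUTHORS):
    """Find other works by the same author."""
    return {doc for doc, names in authors.items() if name in names}


def journal_run(journal, journals=JOURNALS):
    """Find other works published in the same journal."""
    return {doc for doc, title in journals.items() if title == journal}


print(sorted(backward_chain("survey-2005")))
print(sorted(forward_chain("study-2001")))
```

A relevance feedback routine could offer these as follow-up actions on any retrieval the user marks as useful, since none of them requires the user to know the domain's vocabulary.
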
  31. 31. Reduce the impact of low procedural knowledge: Process models-1 <ul><li>(Some background first) </li></ul><ul><li>Google searching is easy </li></ul><ul><ul><li>Google searches “everything” in one fell swoop </li></ul></ul><ul><ul><li>No deliberating or second guessing about database selection </li></ul></ul><ul><ul><li>Google’s popularity ranking algorithm ranks the simple, low-granularity stuff at the top </li></ul></ul><ul><ul><li>Google is a great starting point, then what? </li></ul></ul>
  32. 32. Reduce the impact of low procedural knowledge: Process models-2 <ul><li>Library gateways feature metasearching to mirror Google searching </li></ul><ul><ul><li>Gateways categorize databases by discipline and let people search across these databases </li></ul></ul><ul><ul><li>Metasearching in gateways is not effective because it ignores procedural knowledge </li></ul></ul><ul><ul><ul><li>Given one’s knowledge about a topic, knowing what sources to search and in what order </li></ul></ul></ul>
  33. 33. Reduce the impact of low procedural knowledge: Process models-3 <ul><li>Building systems with procedural knowledge should be the next leap forward in online system design </li></ul><ul><ul><li>Process models to simulate the procedural knowledge of system experts selecting databases </li></ul></ul><ul><ul><ul><li>General-to-specific model (Tom Kirk) </li></ul></ul></ul><ul><ul><ul><li>Gateway at Ohio State (Virginia Tiefel) </li></ul></ul></ul><ul><ul><ul><li>Learning-the-library models (Beaubien, Hogan, & George) </li></ul></ul></ul>
  34. 34. Reduce the impact of low procedural knowledge: Needed metadata-1 <ul><li>Add more cataloging because data in existing bibliographic records is not able to approximate procedural knowledge: </li></ul><ul><ul><li>In a discipline: in biology, mathematics, physics … </li></ul></ul><ul><ul><li>With knowledge of this subject at a particular academic level: with an elementary education, with a high school education, with a college education … </li></ul></ul><ul><ul><li>To what extent the author is an authority on the topic at hand </li></ul></ul><ul><ul><li>For a particular class of people: for teens, for seniors, for shut-ins, etc. </li></ul></ul>
  35. 35. Reduce the impact of low procedural knowledge: Needed metadata-2 <ul><li>Add more cataloging (contd.) </li></ul><ul><ul><li>Is a particular genre or of a particular literary nature: encyclopedias, newspapers, poetry, history, bibliography, research, diary, statistics … </li></ul></ul><ul><ul><li>What can be done with the artifact: read, calculate, play, chat, sell, gamble… </li></ul></ul><ul><ul><li>How others benefited from using the artifact (reviews and ratings) </li></ul></ul><ul><li>Survey existing databases for controlled vocabularies for these elements </li></ul>
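
One way to picture the proposed cataloging elements is as extra fields on a bibliographic record that a system can filter on. This is a sketch under stated assumptions: the class name, field names, level scale, and sample records are all hypothetical, and real records would draw values from controlled vocabularies culled from existing databases.

```python
from dataclasses import dataclass

# Assumed ordering of academic levels, lowest to highest.
LEVELS = ["elementary", "high school", "college", "graduate"]


@dataclass
class EnrichedRecord:
    """Bibliographic record with the extra elements the slides call for."""
    title: str
    discipline: str
    academic_level: str
    audience: str = "general"
    genre: str = "unspecified"
    uses: tuple = ("read",)


def within_reach(records, seeker_level):
    """Keep records at or below the seeker's present academic level."""
    limit = LEVELS.index(seeker_level)
    return [r for r in records if LEVELS.index(r.academic_level) <= limit]


# Made-up sample records for illustration only.
catalog = [
    EnrichedRecord("Plague for Young Readers", "history", "elementary"),
    EnrichedRecord("Plague Encyclopedia Entry", "history", "high school",
                   genre="encyclopedia"),
    EnrichedRecord("Plague Mortality Statistics", "history", "graduate",
                   genre="statistics", uses=("read", "calculate")),
]
for record in within_reach(catalog, "high school"):
    print(record.title)
```
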
  36. 36. To-do list: <ul><li>1. Post-Boolean retrieval systems </li></ul><ul><ul><li>Profile ranking algorithms to weight titles, subject headings, and table of contents entries higher than words buried deep in the text </li></ul></ul><ul><ul><li>Produce retrievals that give a comprehensive rather than a cursory treatment of the desired topic </li></ul></ul><ul><li>2. Enhance relevance feedback routines with the search strategies of domain experts </li></ul><ul><ul><li>Author searches </li></ul></ul><ul><ul><li>Backward chaining </li></ul></ul><ul><ul><li>Forward chaining </li></ul></ul><ul><ul><li>Journal runs </li></ul></ul>
  37. 37. To-do list: <ul><li>3. Build systems with procedural knowledge for searching scholarly and scientific information </li></ul><ul><ul><li>The next major leap forward in online system design! </li></ul></ul><ul><li>4. Add more subject cataloging </li></ul><ul><ul><ul><li>In a discipline </li></ul></ul></ul><ul><ul><ul><li>For a particular class of people </li></ul></ul></ul><ul><ul><ul><li>Is a particular genre of literature… </li></ul></ul></ul><ul><ul><ul><ul><li>(Don’t build vocabularies from scratch—cull vocabularies from other databases) </li></ul></ul></ul></ul><ul><ul><li>Desired outcome = Relevant ranked retrievals that are in keeping with people’s knowledge of their topics </li></ul></ul>
  38. 38. To-do list: <ul><li>Then let users focus on putting the relevant information they find to work for them </li></ul><ul><ul><li>Making a decision </li></ul></ul><ul><ul><li>Taking an action </li></ul></ul><ul><ul><li>Adding to their knowledge base about a topic </li></ul></ul>Knowledge is like the dust. You can't see it building up because it builds up so slowly but after a while when you check, you can see it has built up quite a bit.
  39. 39. Storygame Project-1 <ul><li>A web-based board game: Gain knowledge and depth in a real research topic (“The Black Death”) </li></ul><ul><li>Navigate what is written about the Black Death in a systematic way and get practice, practice, practice </li></ul><ul><ul><li>Tom Kirk’s General-to-Specific Model for Library Research </li></ul></ul><ul><ul><ul><li>Start with the web </li></ul></ul></ul><ul><ul><ul><li>Consult encyclopedias </li></ul></ul></ul><ul><ul><ul><li>Read books </li></ul></ul></ul><ul><ul><ul><li>Locate edited works </li></ul></ul></ul><ul><ul><ul><li>Find journal articles </li></ul></ul></ul><ul><ul><ul><li>Use a favorite, relevant publication to find more via the Web of Science </li></ul></ul></ul>
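
The general-to-specific steps above can be sketched as an ordered list with a helper that suggests where to look next. The step order follows this slide; the short step labels and the function name are assumptions made for the illustration, and this is not the game's actual implementation.

```python
# Step order taken from the slide; labels are shortened for this sketch.
GENERAL_TO_SPECIFIC = [
    "web",
    "encyclopedias",
    "books",
    "edited works",
    "journal articles",
    "Web of Science",
]


def next_source(consulted):
    """Return the first step not yet consulted, or None when all are done."""
    for source in GENERAL_TO_SPECIFIC:
        if source not in consulted:
            return source
    return None


print(next_source(set()))
print(next_source({"web", "encyclopedias"}))
```

A system with this kind of built-in process model could answer the "Google is a great starting point, then what?" question by proposing the next source type instead of leaving the choice to a novice.
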
  40. 40. Storygame Project-2 <ul><li>Games: Popular pastime for college students </li></ul><ul><li>Games have good learning principles (Gee) </li></ul><ul><ul><li>Lower the consequence of failure </li></ul></ul><ul><ul><li>Repetition and practice </li></ul></ul><ul><ul><li>Reward </li></ul></ul><ul><ul><li>Becoming expert in a domain and being recognized for their expertise </li></ul></ul><ul><ul><li>Discovery … </li></ul></ul>
  41. 41. The Solution: Gaming with a Strong Storytelling Element-2 <ul><li>Our immediate mission: </li></ul><ul><ul><li>Build a game prototype in which players become certified library researchers </li></ul></ul><ul><ul><li>Host game play with incoming freshmen </li></ul></ul><ul><ul><li>Evaluate the prototype </li></ul></ul><ul><ul><li>Improve the prototype: game genre, functionality, interactivity, more instructions, etc. </li></ul></ul><ul><li>Our long-term mission: </li></ul><ul><ul><li>Give students games that they want to play </li></ul></ul><ul><ul><li>Learn, practice, and reinforce information-literacy skills </li></ul></ul><ul><ul><li>Accommodate large numbers of students </li></ul></ul><ul><ul><li>Export beyond U-M (Interested? Let’s get an IMLS grant to do this.) </li></ul></ul>
  42. 43. Game Play Basics <ul><li>Board game </li></ul><ul><ul><ul><li>(Like Risk, Monopoly, Clue, Chutes & Ladders…) </li></ul></ul></ul><ul><ul><li>Monopoly = the game of becoming a real-estate tycoon </li></ul></ul><ul><ul><li>Our game = the game of becoming a certified library researcher </li></ul></ul><ul><li>Players accumulate wealth, territory, and knowledge </li></ul><ul><ul><li>Wealth = gold </li></ul></ul><ul><ul><li>Territory = libraries that gamers acquire by proving their fitness as researchers </li></ul></ul><ul><ul><li>Knowledge = quota of correct answers to questions </li></ul></ul><ul><li>Game winner = Fastest and most accurate researcher </li></ul><ul><li>Modest prizes → Delmas Foundation grant </li></ul>
  43. 44. Game Demonstration <ul><li>Backstory </li></ul><ul><ul><li>The Black Death has reached the Duchy of Hidgeon </li></ul></ul><ul><ul><li>Duke of Hidgeon must develop a plan to handle the impending crisis </li></ul></ul><ul><ul><li>Duchy libraries are stocked with knowledge past, present, and future </li></ul></ul><ul><ul><li>The duke needs certified researchers to find answers </li></ul></ul><ul><ul><li>This is the game of research certification </li></ul></ul><ul><li>< Let’s play! > </li></ul>
  44. 45. Storygame web links <ul><li>The Storygame Project: http://www.si.umich.edu/~ylime/storygame.html </li></ul><ul><li>The game: http://ics.umflint.edu:3904/team/login (use "demo" for name and "secret" for password) </li></ul><ul><li>The video: http://www.youtube.com/watch?v=u76tW-ne-yY </li></ul>
  45. 46. Summing Up <ul><li>6 subject-access difficulties + emotions </li></ul><ul><li>4 searcher types </li></ul><ul><ul><li>Help double novices! (Low domain knowledge and low system knowledge) </li></ul></ul><ul><li>4 improvements: </li></ul><ul><ul><li>Post-Boolean retrieval </li></ul></ul><ul><ul><li>Built-in strategies </li></ul></ul><ul><ul><li>Built-in process models </li></ul></ul><ul><ul><li>Needed metadata </li></ul></ul><ul><li>Our contribution: </li></ul><ul><ul><li>Storygaming to teach process models </li></ul></ul><ul><li>Fellow speakers’ contributions … </li></ul>