Google Tech Talk: Reconsidering Relevance

We've become complacent about relevance. The overwhelming success of web search engines has lulled even information retrieval (IR) researchers into expecting only incremental improvements in relevance in the near future. And beyond web search, there are broad classes of search problems where relevance still feels hopelessly like the pre-Google web.

But even some of the most basic IR questions about relevance are unresolved. We take for granted the very idea that a computer can determine which documents are relevant to a person's needs. And we still rely on two-word queries (on average) to communicate a user's information need. But this approach is a contrivance; in reality, we need to think of information-seeking as a problem of optimizing the communication between people and machines.

We can do better. In fact, there are a variety of ongoing efforts to do so, often under the banners of "interactive information retrieval", "exploratory search", and "human computer information retrieval". In this talk, I'll discuss these initiatives and how they are helping to move "relevance" beyond today's outdated assumptions.


About the Speaker

Daniel Tunkelang is co-founder and Chief Scientist at Endeca, a leading provider of enterprise information access solutions. He leads Endeca's efforts to develop features and capabilities that emphasize user interaction. Daniel has spearheaded the annual Workshops on Human Computer Information Retrieval (HCIR) and is organizing the Industry Track for SIGIR '09. Daniel also publishes The Noisy Channel, a widely read and cited blog that focuses on how people interact with information.

Daniel holds undergraduate degrees in mathematics and computer science from the Massachusetts Institute of Technology, with a minor in psychology. He completed a PhD at Carnegie Mellon University for his work on information visualization. Before joining Endeca, he held positions at the IBM T. J. Watson Research Center and AT&T Labs.

  • Speaker notes (slide 1): Friends, Countrymen, Googlers, I come to bury relevance, not to praise it. Well, that's overstating the case. But I am here today to challenge your approach to information access, and more importantly to tease out and question its philosophical underpinnings. I realize that I'm singling you out as Googlers for holding a belief that is far more widely held, but you are the standard bearers of relevance. And you invited me.
  • Notes: this presentation was delivered at the Google NYC office on 1/7/09. The title is an allusion to Tefko Saracevic's article "Relevance Reconsidered". For more on the history of relevance, see his 2007 Lazerow Memorial Lecture on "Relevance in information science" (http://mediabeast.ites.utk.edu/mediasite4/Viewer/?peid=fb8f84cb-9f82-499f-b12c-9a56ab5cf5ba).
  • Transcript

    • 1. Reconsidering Relevance (Daniel Tunkelang, Chief Scientist, Endeca)
    • 2. howdy!
      - 1988 – 1992
      - 1993 – 1998
      - 1999 –
    • 3. overview
      - what is relevance?
      - what's wrong with relevance?
      - what are the alternatives?
    • 4. but first let’s set the stage
    • 5. iconic businesses of the 20th and 21st centuries ("I'm Feeling Lucky")
    • 6. process and scale orchestration
    • 7. but there’s a dark side
    • 8. users are satisfied
    • 9. an interesting contrast
      - "Search on the internet is solved. I always find what I need. But why not in the enterprise? Seems like a solution waiting to happen."
      - a Fortune 500 CTO
    • 10. the real questions
      - What is "search on the internet" and why is it perceived as a solved problem?
      - What is "search in the enterprise" and why is it perceived as an unsolved problem?
      - And what does this have to do with relevance?
    • 11. easy vs. hard search problems
      - easy: where to buy Ender in Exile?
      - hard: good novel to read on the beach?
      - easy: proof that sorting has an n log n lower bound?
      - hard: algorithm to sort a partially ordered set, given a constant-time comparator?
    • 12.
      - what is relevance?
      - what's wrong with relevance?
      - what are the alternatives?
    • 13. defining relevance
      - "Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance."
      - William Goffman, "On relevance as a measure," 1964
    • 14. we need more definitions
    • 15. let's work top-down
      - information retrieval (IR) = the study of retrieving information (not data) from a collection of written documents; the retrieved documents aim to satisfy a user's information need
    • 16. IR assumes information needs
      - user information need = a natural-language declaration of the user's informational need
      - query = an expression of the user information need in the input language provided by the information system
    • 17. relevance drives IR modeling
      - modeling = the study of algorithms for ranking documents according to the system-assigned likelihood of relevance
      - model = a set of premises and an algorithm for ranking documents with regard to a user query
    • 18. a relevance-centric approach
      - USER: information need → query → select from results
      - SYSTEM: rank using an IR model (tf-idf, PageRank)
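To make the "rank using an IR model" step concrete, here is a minimal tf-idf ranking sketch in Python. The whitespace tokenizer and the raw-count scoring are illustrative assumptions, not the specific formulation discussed in the talk.

```python
import math
from collections import Counter

def tf_idf_rank(query, documents):
    """Rank documents by the summed tf-idf weights of the query terms."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter()                       # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    scored = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)             # term frequency within this document
        score = sum(
            tf[term] * math.log(n_docs / df[term])   # tf * idf
            for term in query.lower().split()
            if term in df
        )
        scored.append((score, i))
    scored.sort(reverse=True)            # best-scoring documents first
    return [documents[i] for _, i in scored]

docs = ["the cat sat on the mat",
        "information retrieval ranks documents",
        "relevance drives information retrieval"]
print(tf_idf_rank("information retrieval", docs))
```

Note that the model sees only the short query, not the information need behind it; that gap is exactly the communication problem the next slides take up.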
    • 19.
      - what is relevance?
      - what's wrong with relevance?
      - what are the alternatives?
    • 20. our first communication problem: information need → query
      - 2 words?
      - natural language?
      - telepathy?
    • 21. and the game of telephone continues: query → rank using IR model
      - cumulative error
      - relevance is subjective
      - what Goffman said
    • 22. and hopefully users feel lucky: rank using IR model → select from results
      - selection bias
      - inefficient channel
      - backup plan?
    • 23. queries are misinterpreted ("Results 1-10 out of about 344,000,000 for ir")
    • 24. ranked lists are inefficient
    • 25. assumptions of the relevance-centric approach
      - self-awareness
      - self-expression
      - model knows best
      - answer is a document
      - one-shot query
    • 26. can we do better?
    • 27.
      - what is relevance?
      - what's wrong with relevance?
      - what are the alternatives?
    • 28. human-computer information retrieval
      - don't just guess the user's intent
        - optimize communication
      - increase user responsibility and control
        - require and reward human intellectual effort
      - "Toward Human-Computer Information Retrieval," Gary Marchionini
    • 29. human computer information retrieval
    • 30. a concrete use case
      - Colleague: Hey Daniel! You should check out what this guy Steve Pollitt's been researching. Sounds right up your alley.
      - Daniel: Sure thing, I'll look into it.
    • 31. google him!
    • 32. google scholar him?
    • 33. rexa him?
    • 34. getting better
    • 35. hcir-inspired interface
    • 36. tags provide summarization and guidance
    • 37. my information need evolves as i learn
    • 38. hcir – implementing the vision
    • 39. scatter/gather: a search for "star"
    • 40. faceted search
    • 41. practical considerations
      - which facets to show
      - which facet values to show
      - when to suggest faceted refinement
      - how to automate faceted classification
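The mechanics behind these considerations can be sketched briefly. Below is a minimal faceted-search example in Python that counts facet values across the current result set and narrows it on a chosen refinement; the toy catalog, field names, and most-common-value heuristic are assumptions for illustration, not Endeca's implementation.

```python
from collections import Counter

# toy catalog: each document carries facet -> value assignments
catalog = [
    {"title": "compact microwave",    "brand": "Acme",   "color": "white"},
    {"title": "convection microwave", "brand": "Acme",   "color": "black"},
    {"title": "ceiling fan",          "brand": "Breeze", "color": "white"},
]

def facet_counts(results, facets):
    """Count, per facet, how many matching documents carry each value."""
    counts = {facet: Counter() for facet in facets}
    for doc in results:
        for facet in facets:
            if facet in doc:
                counts[facet][doc[facet]] += 1
    return counts

def refine(results, facet, value):
    """Narrow the result set to documents with the chosen facet value."""
    return [doc for doc in results if doc.get(facet) == value]

results = [doc for doc in catalog if "microwave" in doc["title"]]
for facet, values in facet_counts(results, ["brand", "color"]).items():
    print(facet, values.most_common(2))   # candidate refinements with counts
print([doc["title"] for doc in refine(results, "color", "white")])
```

Deciding which facets and values to surface then becomes a ranking problem over these counts, which is what the microwave and ceiling-fan examples on the next slides illustrate.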
    • 42. showing the right facets: microwaves
    • 43. showing the right facets: ceiling fans
    • 44. query-driven clarification before refinement
      - Matching Categories include:
        - Appliances > Small Appliances > Irons & Steamers
        - Appliances > Small Appliances > Microwaves & Steamers
        - Bath > Sauna & Spas > Steamers
        - Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers
        - Kitchen > Small Appliances > Steamers
    • 45. results-driven clarification before refinement (Search: storage)
    • 46. crowd-sourcing to tag documents
    • 47. hcir cheats the precision/recall trade-off (plot: precision vs. recall)
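For reference, the two quantities being traded off have their standard textbook definitions (the notation here is not from the slides). With R the set of retrieved documents and A the set of relevant ones:

```latex
\mathrm{precision} = \frac{|R \cap A|}{|R|}, \qquad
\mathrm{recall} = \frac{|R \cap A|}{|A|}
```

Moving a ranked list's cutoff typically buys recall at the cost of precision or vice versa; the slide's claim is that interactive refinement lets the user reshape the result set instead of being stuck at one point on that curve.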
    • 48. set retrieval 2.0
      - set retrieval that responds to queries with:
        - an overview of the user's current context
        - an organized set of options for exploration
      - contextual summaries of document sets
        - optimize the system's communication with the user
      - query refinement options
        - optimize the user's communication with the system
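As a rough sketch of what such a response might carry, here is a hypothetical Python data structure; every field name is an illustrative assumption, not Endeca's or anyone else's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class SetRetrievalResponse:
    """Hypothetical response shape for 'set retrieval 2.0'."""
    query: str
    context_overview: str          # where the user stands in the collection
    set_summary: dict = field(default_factory=dict)         # system -> user: facet/value counts
    refinement_options: list = field(default_factory=list)  # user -> system: suggested next steps
    top_documents: list = field(default_factory=list)       # the ranked list, one element among several

response = SetRetrievalResponse(
    query="steamers",
    context_overview="1,204 matches across 5 departments",
    set_summary={"department": {"Kitchen": 640, "Bath": 310, "Appliances": 254}},
    refinement_options=["Kitchen > Small Appliances > Steamers",
                        "Bath > Sauna & Spas > Steamers"],
)
print(response.context_overview)
```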
    • 49. hcir using set retrieval 2.0
      - emphasize set summaries over ranked lists
      - establish a dialog between the user and the data
      - enable exploration and discovery
    • 50. think outside the (search) box
      - relevance-centric search solves many use cases
      - but not some of the most valuable ones
      - support interaction and exploration
      - human-computer information retrieval
    • 51. one more thing …
    • 52. "Google's mission is to organize the world's information and make it universally accessible and useful."
    • 53. organizer or referee?
    • 54. thank you
      - communication 1.0
        - email: [email_address]
      - communication 2.0
        - blog: http://thenoisychannel.com
        - twitter: http://twitter.com/dtunkelang
