
Design for Interaction

Design for Interaction
by Daniel Tunkelang, Chief Scientist of Endeca
An invited presentation at SIGMOD '09 (http://sigmod09.org/)

Research in information retrieval has focused on presenting the most relevant results to a user in response to a free-text search query. Research in database systems assumes a model where the user enters a formal query, and the results are exactly those the user requested. Neither community has emphasized user interaction—a critical concern for practical information access.

As William Goffman noted in the 1960s and Nick Belkin continually reminds us today, the relationship between a document and query, though necessary, is not sufficient to determine relevance—yet ranked retrieval approaches rely heavily or exclusively on this relationship. Meanwhile, recent work on database usability by Jeff Naughton and H.V. Jagadish surfaces the rigidity of database systems that return nothing unless users know how to formulate precise queries.

This talk presents human-computer information retrieval (HCIR) as a general approach that addresses some of the key challenges facing both research communities. A vision first put forward by Gary Marchionini, HCIR expects people and systems to work together to implement information access. Such an approach requires rethinking information access not as a matching or ranking problem, but rather as a communication problem. Specifically, we need interfaces that optimize the bidirectional communication between the user and the system, thus optimizing the symbiotic division of labor between the two.

This talk reviews the history of HCIR efforts and presents ongoing work to implement the HCIR vision. In particular, it presents an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration.
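As a minimal sketch of what such an interactive set retrieval response might look like, the Python below filters a toy collection by the user's current selections and returns the size of the result set together with value counts for each remaining facet. The collection, the field names, and the functions `refine`, `facet_counts`, and `respond` are hypothetical illustrations under those assumptions, not Endeca's implementation.

```python
from collections import Counter

# Hypothetical toy collection; titles, fields, and values are illustrative only.
BOOKS = [
    {"title": "A", "subject": "databases", "format": "book", "decade": "2000s"},
    {"title": "B", "subject": "databases", "format": "article", "decade": "1990s"},
    {"title": "C", "subject": "databases", "format": "book", "decade": "1990s"},
    {"title": "D", "subject": "information retrieval", "format": "book", "decade": "2000s"},
]


def refine(records, selections):
    """Set retrieval: keep records matching every selected facet value (no ranking)."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in selections.items())
    ]


def facet_counts(records, selections):
    """Count values of each facet the user has not yet constrained."""
    counts = {}
    for record in records:
        for field, value in record.items():
            if field == "title" or field in selections:
                continue
            counts.setdefault(field, Counter())[value] += 1
    return counts


def respond(records, selections):
    """Return the user's current context plus organized options for the next step."""
    results = refine(records, selections)
    return {
        "context": dict(selections),   # what the user has asked for so far
        "size": len(results),          # overview of the current result set
        "options": facet_counts(results, selections),  # refinements on offer
        "results": results,
    }


step1 = respond(BOOKS, {"subject": "databases"})
step2 = respond(BOOKS, {"subject": "databases", "format": "book"})
```

Each call echoes the selections back (transparency), lets the user add or drop a selection (control), and shows counts that indicate where each refinement leads (guidance).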

Published in: Technology, Business
  • Daniel:

    I appreciate your presentation and its point about letting the user tell the search engine more of what he knows. That is the theme of my proposal, HyperPlex. Please take a look at the following and give your assessment. Thanks.

    http://www.slideshare.net/putchavn/knowledge-representation-processing
    http://www.slideshare.net/putchavn/hyper-plex-high-precision-queryresponse-knowledge-repository-pdf
    http://www.slideshare.net/putchavn/concept-maps-knowledge-encoding
    You are welcome to email me at putchavn@yahoo.com

Design for Interaction

  1. design for interaction Daniel Tunkelang Chief Scientist, Endeca © 2009 Endeca Technologies, Inc. All rights reserved.
  2. about me Organizing SIGIR ’09 Industry Track in Boston on July 22nd!
  3. about endeca leading provider of search applications 250M+ end users per month 600+ customers $100M+ annual sales
  4. what i hope you learn from this talk the db and ir perspectives have a common thread convergence may be upon us but we need interaction to make it work
  5. overview don't put all your eggs in one basket design for interaction human-computer information retrieval
  6. don’t put all your eggs in one basket Still Life with Basket and Broken Eggs by Michael Edwards, 2008
  7. the db approach: perfection in, perfection out http://www.storeitfoodsblog.com/category/food-preparation/meat-grinder/
  8. db usability researchers recognize the pain
  9. sql is hard Making Database Systems Usable [Jagadish et al., SIGMOD 2007] __ sql • labor-intensive query construction • lengthy query evaluation • high query reformulation cost
  10. data sucks and users are lazy Extracting Problems for Database and IR Researchers [Naughton, Spring 2008 North East DB/IR Day] • real data is – incomplete – inconsistent – incorrect • users don’t want to learn – data schemas – structured query languages we’re not gonna take it!
  11. the ir way: don’t worry, be happy http://adsoftheworld.com/media/print/mcdonalds_burger_mysteries
  12. ir for db people: what would google do? USER: information need, query, select from results SYSTEM: rank using IR model (tf-idf, PageRank)
  13. assumptions of relevance-centric ir approach • self-awareness • self-expression • model knows best • answer is a document • one-shot query
  14. life is not a batch • db approach expects too much of user • ir approach expects too much of system both approaches act as if it all comes down to a single query is that your final answer question?
  15. design for interaction The Future of Social Interaction by Jim Stoten
  16. changes assumptions about what to optimize precision recall complexity relevance communication
  17. how do we optimize communication? transparency guidance control
  18. ir offers a black box ca c'est la caisse. le mouton que tu veux est dedans.
  19. db / set retrieval offers 2 out of 3 transparency guidance control
  20. but we need it all! • set retrieval is a failure in the ir world – though quite successful in the db world! • but ranked retrieval is inherently crippled – no transparency, control, or guidance! how do we optimize for communication?
  21. human-computer information retrieval “Toward Human-Computer Information Retrieval” Gary Marchionini • don’t just guess the user’s intent • increase user responsibility and control • require and reward human intellectual effort
  22. great idea how?
  23. treat query construction as a process A Case for Interaction [Koenemann and Belkin, 1996] • used term feedback to improve alerting queries • users select from suggested terms • 17 – 34% improvement in precision @ 30 • users liked the feedback interface
  24. expose the facets of semistructured content
  25. success in the lab and the field • favored in user studies by Marti Hearst – http://flamenco.berkeley.edu/ • ubiquitous in ecommerce – amazon.com – eBay – endeca powers 42 of top 100 online retailers • taking over media, libraries, enterprise, etc.
  26. even a few db folks have drunk the kool-aid DataGuides [Goldman and Widom, VLDB 1997] • user-friendly schema summaries Magnet [Sinha and Karger, SIGMOD 2005] • navigation and refinement options common theme: semistructured
  27. what is semistructured data? • one universe • self-describing • blends data / meta-data
  28. data modeling flexibility • no a-priori schema – integrated sources without up-front schema design • richer modeling capabilities tame data complexity – hierarchy, multi-valued fields, sparse fields • schema flexibility eases schema evolution – new entity types, new data source [diagram of data sources: WWW, Internet, SOA, ESB, Web Service, Groupware and Collaboration, Content Management, Databases, ERP, File Systems]
  29. semantically direct queries which on-sale items are available in blue? which attributes characterize on-sale blue items? price, sleeve, color, salePrice, brand, fabric, … <shirt> <sku>1234</sku> <sleeve>Long</sleeve> <desc>Classic end-on-end shirt</desc> <price>39.99</price> <salePrice>29.99</salePrice> <color>Blue</color> <color>Yellow</color> <color>White</color> ... </shirt> <trousers> <sku>1579</sku> <price>59.99</price> <color>Khaki</color> ... </trousers> <buyingGuide> <title>Selecting the right ski coat for you.</title> <file>skiguide.pdf</file> <keyword>ski</keyword> <keyword>coat</keyword> ... </buyingGuide>
  30. but let’s make this concrete Uh oh, I’m presenting at SIGMOD! Better find a good book about databases!
  31. quick, to the goog-mobile! not quite…
  32. i know, i’ll go to the library! #%@$!
  33. let’s try a little hcir…
  34. hcir works for news too
  35. life in a semistructured world • search is a great starting point – users can’t / won’t initiate structured queries • ranked lists are an inadequate ending point – search queries are lossy projections of intent • hcir leads users down a garden path to structure
  36. lots of trade-offs “everything should be made as simple as possible, but no simpler” “speed of thought” vs. “going nowhere quickly” “to err is human, but to really foul things up requires a computer” simple interfaces don’t always yield satisfaction
  37. users want the triumvirate • transparency • control • guidance transparency and control are easy guidance requires cleverness
  38. in closing all of us want to help people access information the best help is to help them help themselves design for interaction through transparency, control, guidance
  39. thank you…and come to SIGIR! communication 1.0 email: dt@endeca.com communication 2.0 blog: http://thenoisychannel.com twitter: http://twitter.com/dtunkelang SIGIR: July 19-23 in Boston Industry Track on July 22nd!
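
The two questions on slide 29 can be asked directly of records that share no fixed schema. The following Python is a rough sketch under assumptions: the records are transcribed from the slide's XML fragments into dictionaries, and the `type` key and the helpers `values`, `on_sale_in`, and `characterizing_attributes` are hypothetical names introduced for illustration, not part of the talk.

```python
from collections import Counter

# Records transcribed from the slide's XML fragments. Each record carries its own
# field names (self-describing); fields may be missing or multi-valued.
ITEMS = [
    {"type": "shirt", "sku": "1234", "sleeve": "Long",
     "desc": "Classic end-on-end shirt", "price": 39.99, "salePrice": 29.99,
     "color": ["Blue", "Yellow", "White"]},
    {"type": "trousers", "sku": "1579", "price": 59.99, "color": ["Khaki"]},
    {"type": "buyingGuide", "title": "Selecting the right ski coat for you.",
     "file": "skiguide.pdf", "keyword": ["ski", "coat"]},
]


def values(record, field):
    """Return a field's values as a list, tolerating absent and single-valued fields."""
    value = record.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def on_sale_in(records, color):
    """Which on-sale items are available in the given color?"""
    return [
        r for r in records
        if "salePrice" in r and color in values(r, "color")
    ]


def characterizing_attributes(records):
    """Which attributes characterize these items? Count how often each field occurs."""
    return Counter(field for r in records for field in r if field != "type")


blue_on_sale = on_sale_in(ITEMS, "Blue")
attributes = characterizing_attributes(blue_on_sale)
```

Because each record describes itself, the second question is answered from the data rather than from a schema declared up front, which is what lets an interface offer the attributes as refinement options.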