Is Search Broken?! Daniel Tunkelang Chief Scientist, Endeca
howdy! 1992: Bachelor’s + Master’s from MIT in CS + Math 1998: PhD from CMU in CS (ACO program) 1999: Co-founded Endeca!  2008: ???
overview Who is Endeca? Is search broken? If it is, what can we do about it?
who / what is endeca? Software to help people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions. 500+ customers $108M revenue in 2007.
some of our customers
Is search broken?
Search has hit a wall.
search hits a wall in ecommerce
search hits a wall in knowledge management Current Search: it outsourcing
search even hits a wall on the web Results 1-10 out of about 344,000,000 for ir
But is search broken?
the accountants don’t think so
most users don’t think so 75
or do they? 78% wish search engines could read their minds. What frustrates users most? 25%: deluge of results 24%: too many paid listings 19%: inability to understand their keywords 19%: disorganized / random results The State of Search Autobytel & Kelton Research, Oct ’07
web search vs. enterprise search “Search on the internet is solved. I always find what I need. But why not in the enterprise? Seems like a solution waiting to happen.” - a Fortune 500 CTO
Can theory help?
precision = fraction of retrieved documents that are relevant; recall = fraction of relevant documents that are retrieved [diagram: retrieved documents, relevant documents]
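A minimal sketch of the two definitions on this slide, assuming documents are identified by hypothetical string IDs and the relevant set is known in advance. The function name `precision_recall` and the sample IDs are illustrative and not from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # retrieved documents that are also relevant
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents overall.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d7", "d8", "d9"}
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```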
why improve precision? the truth, nothing but the truth
why improve recall? the whole truth,
what we want… the truth, the whole truth,   nothing but the truth
but there is a trade-off… [plot axes: recall, precision]
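One common way to see the trade-off is to cut a ranked result list at different depths. The sketch below assumes a hypothetical ranking and relevance set (both made up for illustration) and a cutoff `k` between 1 and the length of the list. In this example, returning more results raises recall while precision falls.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when only the top k ranked documents are returned."""
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / len(top_k), hits / len(relevant)


# Hypothetical ranking: relevant documents are scattered through the list.
ranked = ["d1", "d5", "d2", "d6", "d7", "d3", "d8", "d9"]
relevant = {"d1", "d2", "d3"}

for k in (1, 3, 6, 8):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```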
which should we favor? Precision …to avoid annoying users with irrelevant results? Recall …to make sure we don’t throw away results the user wants / needs?
Enough stalling…what’s the answer?!
depends on what you want vs.
you get what you pay for There are easy use cases… 30% of queries are navigational. 30% of queries lead to Wikipedia pages. Users won’t pay, but advertisers will! …and hard use cases. Queries where recall matters. Exploratory search. Enterprises will pay for insight.
Great, bring on the insight!
technology alone can’t provide insight The system can’t read your mind. Your spouse / best friend can’t read your mind. Sometimes you can’t read your own mind.
So should we just give up?
technology is a catalyst Computers are good at analysis. People are good at using what they know. How do we get the best of both worlds?
with apologies to luis von ahn
human-computer information retrieval Instead of guessing the user’s intent, optimize communication. De-emphasize the top ten documents; response is a set of documents. Think beyond single queries; support refinement and exploration.
hcir cheats the trade-off [plot axes: recall, precision]
But how do we implement HCIR?
endeca's approach: guided summarization Set retrieval that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration. Contextual summaries of document sets optimize the system’s communication with the user. Query refinement options optimize the user’s communication with the system.
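The idea of returning a set plus a summary plus refinement options can be sketched in a few lines. This is a toy illustration under stated assumptions, not Endeca's implementation: the records, the field names `category` and `brand`, and the functions `search`, `summarize`, and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical product records; field names and values are illustrative only.
PRODUCTS = [
    {"name": "Garment steamer", "category": "Irons & Steamers", "brand": "A"},
    {"name": "Food steamer", "category": "Steamers", "brand": "B"},
    {"name": "Steamer basket", "category": "Double Boilers & Steamers", "brand": "A"},
    {"name": "Microwave steamer", "category": "Steamers", "brand": "C"},
    {"name": "Ceiling fan", "category": "Fans", "brand": "B"},
]


def search(records, keyword):
    """Set retrieval: every record whose name contains the keyword."""
    return [r for r in records if keyword.lower() in r["name"].lower()]


def summarize(records, facets=("category", "brand")):
    """Overview of the current result set: value counts per facet."""
    return {f: Counter(r[f] for r in records) for f in facets}


def refine(records, facet, value):
    """Narrow the current result set by selecting one facet value."""
    return [r for r in records if r[facet] == value]


results = search(PRODUCTS, "steamer")                # the whole matching set
overview = summarize(results)                        # system -> user: summary
narrowed = refine(results, "category", "Steamers")   # user -> system: refinement
print(overview["category"].most_common())
print([r["name"] for r in narrowed])
```

The counts in `overview` play the role of the contextual summary, and each facet value is a refinement option the user can select to continue exploring.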
guided summarization for ecommerce Matching Categories include: Appliances > Small Appliances > Irons & Steamers Appliances > Small Appliances > Microwaves & Steamers Bath > Sauna & Spas > Steamers Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers Kitchen > Small Appliances > Steamers
guided summarization for KM
Guided summarization starts with faceted search.
facets 101
But faceted search isn’t enough…
showing the right facets: microwaves vs.
showing the right facets: ceiling fans
traditional topic taxonomy
dynamic topic facet Subject Electronic data processing (1002) Distributed processing (937) Parallel processing (619) Computer networks (562) Fault-tolerant computing (365) Show more… Subject Artificial intelligence (227) High performance computing (244) Automatic theorem proving (9) History (11) Client/server computing (185) Information technology (145) Computer algorithms (110) Java (77) Computer architecture (162) Law and legislation (70) Computer networks (552) Logic, Symbolic and mathematical (16) Computer programs (139) Mathematics (70) Computer security (151) Mobile communication systems (54) Computer software (253) Operating systems (87) Computers (124) Parallel processing (619) Database management (277) Research (83) Distributed processing (937) Software engineering (197) Electronic data processing (1002) Supercomputers (139) Electronic digital computers (148) Web databases (54) Fault-tolerant computing (365) Wireless communication systems (97)
facets populated using entity extraction apple production
cutting through facets to show the big picture Search : storage
summarization: more than search and browse
guided summarization – a summary Guided summarization enables a dialog between the user and the data, enabling exploration and discovery.
The Moral
think outside the box Search works for many use cases. But not for some of the most valuable ones. Focus on human-computer information retrieval.
One More Thing
maybe we should treat search as a game
thank you Questions?
