Search as Communication: Lessons from a Personal Journey
 

by Daniel Tunkelang (Head of Query Understanding, LinkedIn)

Presented at Etsy's Code as Craft Series on May 21, 2013

When I tell people I spent a decade studying computer science at MIT and CMU, most assume that I focused my studies in information retrieval — after all, I’ve spent most of my professional life working on search.

But that’s not how it happened. I learned about information extraction as a summer intern at IBM Research, where I worked on visual query reformulation. I learned how search engines work by building one at Endeca. It was only after I’d hacked my way through the problem for a few years that I started to catch up on the rich scholarly literature of the past few decades.

As a result, I developed a point of view about search without the benefit of academic conventional wisdom. Specifically, I came to see search not so much as a ranking problem as a communication problem.

In this talk, I’ll explain my communication-centric view of search, offering examples, general techniques, and open problems.

--

Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has built his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Before LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.



  • Hi Daniel: I too studied what library science offered to information retrieval. HyperPlex incorporates the classification Master Table Of Contents (MTOC) as an option for exploring and navigating to what is needed. Later I learned that an MTOC is an ontology (a set of ontologies). We use it in HyperPlex. Please take a look and see if you can use it.

    http://www.slideshare.net/putchavn/hyper-plex-high-precision-queryresponse-knowledge-repository-pdf

    Hope you do not mind too many comments in a short span.

    Presentation Transcript

    • Search as Communication: Lessons from a Personal Journey. Daniel Tunkelang, Head of Query Understanding, LinkedIn
    • These are great textbooks on information retrieval.
    • Unfortunately, I never read them in school. But I did study graphs and stuff.
    • I found myself developing a search engine.
    • And the next thing I knew, I was a search guy.
    • So what did I learn along the way?
    • Search isn't a ranking problem. It's a communication problem.
    • Outline: 1. Lessons from Library Science. 2. Adventures with Information Extraction. 3. A Moment of Clarity.
    • 1. Lessons from Library Science
    • A bird's-eye view of how search engines work. USER: information need → query → select from results. SYSTEM: rank using an IR model (tf-idf, PageRank).
    • Old school search: ask a librarian.
    • Search lives in an information-seeking context. [Pirolli and Card, 2005]
    • Recognize ambiguity and ask for clarification.
    • Clarify, then refine. (Computers vs. Books)
    • Faceted search. It's not just for e-commerce.
    • Give users transparency, guidance, and control.
    • Take-away for search engine developers: act like a librarian. Communicate with your user.
    • 2. Adventures with Information Extraction
    • String matching is great but has limits.
    • People search for entities. Recognize them! The slide's query segmentation pseudocode (B[i] holds the top-k segmentations of the first i words; Pc is a phrase probability):

          for i in [1..n]
              s ← w1 w2 … wi
              if Pc(s) > 0
                  a ← new Segment()
                  a.segs ← {s}
                  a.prob ← Pc(s)
                  B[i] ← {a}
              for j in [1..i-1]
                  for b in B[j]
                      s ← wj+1 … wi
                      if Pc(s) > 0
                          a ← new Segment()
                          a.segs ← b.segs ∪ {s}
                          a.prob ← b.prob · Pc(s)
                          B[i] ← B[i] ∪ {a}
              sort B[i] by prob
              truncate B[i] to size k
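The pseudocode above can be sketched in runnable form. A minimal Python version, assuming a toy phrase-probability table Pc (the phrases and values here are hypothetical; a real system would estimate them from query logs or n-gram counts):

```python
# Beam-search query segmentation, following the slide's pseudocode.
Pc = {  # hypothetical phrase probabilities
    "new": 0.3, "york": 0.2, "times": 0.4,
    "new york": 0.5, "new york times": 0.6,
}

def segment(words, k=3):
    """Return the top-k segmentations of words as (segments, probability) pairs."""
    n = len(words)
    B = [[] for _ in range(n + 1)]   # B[i]: best segmentations of words[:i]
    B[0] = [([], 1.0)]               # empty prefix segments with probability 1
    for i in range(1, n + 1):
        for j in range(i):           # extend each segmentation of words[:j]
            s = " ".join(words[j:i])
            if Pc.get(s, 0) > 0:
                for segs, prob in B[j]:
                    B[i].append((segs + [s], prob * Pc[s]))
        B[i].sort(key=lambda a: a[1], reverse=True)
        del B[i][k:]                 # beam: keep only the k most probable
    return B[n]

print(segment("new york times".split())[0])  # → (['new york times'], 0.6)
```

With these toy probabilities, the single-segment reading "new york times" (0.6) beats "new york" + "times" (0.5 × 0.4 = 0.2).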
    • Named entity recognition is free, as in free beer.
    • Problem: entity detection systems process each document separately. Why not take advantage of corpus features?
    • Give your documents the right to vote! Use a high-recall method to collect candidates (e.g., all title-case spans of words, other than a single word beginning a sentence). Process each document separately: each candidate is assigned an entity type, or no type at all. If a candidate is mostly assigned a single entity type, extrapolate that type to all its occurrences.
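The "right to vote" recipe can be sketched as follows. This is a minimal illustration, not the talk's actual system: the per-document tagger is stubbed as a caller-supplied function, and the 0.8 majority threshold is an assumption:

```python
# Corpus-level voting for entity detection: collect high-recall candidates,
# let a per-document tagger vote, then extrapolate the majority type.
import re
from collections import Counter, defaultdict

SPAN = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")

def candidates(doc):
    """High-recall pass: title-case spans, skipping a lone word that starts a sentence."""
    out = []
    for m in SPAN.finditer(doc):
        single = " " not in m.group()
        at_sentence_start = m.start() == 0 or doc[max(0, m.start() - 2):m.start()] in (". ", "! ", "? ")
        if not (single and at_sentence_start):
            out.append(m.group())
    return out

def vote(docs, tag, threshold=0.8):
    """tag(candidate, doc) returns an entity type or None; extrapolate the majority type."""
    votes = defaultdict(Counter)
    for doc in docs:
        for c in candidates(doc):
            votes[c][tag(c, doc)] += 1   # per-document, possibly noisy decision
    types = {}
    for c, counter in votes.items():
        t, n = counter.most_common(1)[0]
        if t is not None and n / sum(counter.values()) >= threshold:
            types[c] = t                 # assign this type to ALL occurrences of c
    return types

docs = ["I met Alan Turing in London. Turing was brilliant.",
        "Alan Turing founded the field."]
print(vote(docs, lambda c, d: "PER" if "Turing" in c else None))
# → {'Alan Turing': 'PER'}
```

Note how the sentence-initial "Turing" is excluded as a candidate, yet still benefits: once "Alan Turing" wins its vote, the type can be propagated to every occurrence.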
    • Looking for topics? Use idf, and its cousin ridf. Inverse document frequency (idf): too low? Probably a stop word. Too high? Could be noise. Residual inverse document frequency (ridf): predict idf using a Poisson model; ridf is the difference between observed and predicted idf. "A good keyword is far from Poisson." [Church and Gale, 1995]
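The ridf computation follows directly from those definitions. A small sketch with toy counts: under a Poisson model with rate cf/N, the chance a document contains the term at all is 1 − e^(−cf/N), which yields the predicted idf:

```python
# Residual idf per Church and Gale: observed idf minus the idf a Poisson
# model would predict. The counts below are toy assumptions.
import math

def ridf(df, cf, n_docs):
    """df: docs containing the term; cf: total occurrences; n_docs: corpus size."""
    observed_idf = -math.log2(df / n_docs)
    # Poisson with rate cf/n_docs: P(term appears in a doc) = 1 - e^(-cf/n_docs)
    predicted_idf = -math.log2(1 - math.exp(-cf / n_docs))
    return observed_idf - predicted_idf

print(ridf(df=10, cf=100, n_docs=1000))    # bursty keyword: large positive ridf
print(ridf(df=900, cf=2000, n_docs=1000))  # stop-word-like term: ridf near zero
```

A term whose occurrences clump into few documents (the keyword) sits far above its Poisson prediction; a term spread evenly (the stop word) sits near zero.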
    • Terminology extraction? Try data recycling.
    • Obtain entities by any means necessary.
    • Take-away for search engine developers: entity detection is crucial. And it isn't that hard.
    • 3. A Moment of Clarity
    • Let's go back to our pigeons for a moment. (USER: information need → query → select from results. SYSTEM: rank using an IR model: tf-idf, PageRank.)
    • What does this process look like to the system?
    • And here's what it looks like to the user: GOOD vs. NOT SO GOOD. But can the system tell the difference?
    • User experience should reflect system confidence.
    • Searches reflect a variety of information needs. Derived from [Jansen et al., 2007]. http://searchengineland.com/getting-organized-paid-search-user-intent-the-search-funnel-116312
    • We can segment the information need from the query, using the same segmentation pseudocode shown earlier.
    • We can learn from analyzing user behavior.
    • And we can look at our relevance scores. (Navigational vs. Exploratory)
    • There are many pre- and post-retrieval signals. [Claudia Hauff, Query Difficulty for Digital Libraries, 2009]
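As one concrete example of a pre-retrieval signal of the kind Hauff surveys, the average idf of the query terms can be computed before running the query at all. A sketch with a hypothetical document-frequency table:

```python
# Average query idf as a pre-retrieval difficulty signal: a more specific
# query (higher average idf) is often easier to answer well. The
# document-frequency table and corpus size below are toy assumptions.
import math

DF = {"the": 950, "price": 300, "jaguar": 12, "xk8": 3}  # toy doc frequencies
N = 1000  # toy corpus size

def avg_idf(query):
    """Mean idf over the query's terms; unseen terms get df = 1."""
    idfs = [math.log2(N / DF.get(term, 1)) for term in query.lower().split()]
    return sum(idfs) / len(idfs)

print(avg_idf("jaguar xk8") > avg_idf("the price"))  # → True
```

Signals like this cost nothing at query time, so they are a natural first input when deciding whether to adapt the user experience to query difficulty.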
    • Take-away for search engine developers: queries vary in difficulty. Recognize and adapt.
    • Review. 1. Lessons from Library Science: act like a librarian; communicate with users. 2. Adventures with Information Extraction: entity detection is crucial, and isn't that hard. 3. A Moment of Clarity: queries vary in difficulty; recognize and adapt.
    • Conclusion: read the textbooks. But treat search as a communication problem.
    • WE'RE HIRING! http://data.linkedin.com/search  Contact me: dtunkelang@linkedin.com  http://linkedin.com/in/dtunkelang