Actions speak louder than words: Analyzing large-scale query logs to improve the research experience
Analyzing anonymized query and click-through logs leads to a better understanding of user behaviors and intentions, and provides great opportunities to respond to users with an improved search experience. As a large-scale SaaS provider, Serials Solutions is uniquely positioned to learn from the dataset of queries aggregated from the Summon service, generated by millions of users at hundreds of libraries around the world.

In this session, we will describe our Relevance Metrics Framework, provide examples of insights gained during its development and implementation, and cover recent product changes those insights inspired. Chandra and Susan, from the Summon dev team, will share outcomes from this ongoing process and highlight how analysis of large-scale query logs helps improve the academic research experience.

Transcript

  • 1. Actions Speak Louder than Words: Analyzing large-scale query logs to improve the research experience. Ted Diamond, Susan Price, Raman Chandrasekar. Code4Lib 2013
  • 2. Overview
    • Summon®
    • The Relevance Metrics Framework (RMF): helping us go from user actions to metrics to a better user experience
      • Goals
      • Data flow in RMF
      • Query sessions
      • From logs to statistics
      • Metrics computed
      • Challenges in RMF
    • Insights from RMF
    • Summary
  • 3. The Summon® Discovery Service
    • Hosted / Software as a Service
    • Match & Merge combines rich metadata and full text from multiple sources
    • Single unified index
    • 1 billion+ items
    • > 500 customers
    • Relevancy in > 17 languages
  • 4. The Summon® Relevance Metrics Framework (RMF)
  • 5. RMF Goals
    • Observe and log user actions
      • Queries, types of queries
      • Features used (e.g. filters, advanced search)
      • Click patterns
    • Compute quality of search results
      • Metrics from user behavior, such as clicks
    • Analyze data to improve search results and enhance the research experience
  • 6. Data Acquired in RMF
    • Queries
      • Query terms
      • Filters
      • Advanced syntax
    • Clicks
    • We collect all queries and clicks, not just a sample → large logs
    • Sampling will not cover the long tail of queries
    • Sample queries:
      • anniversary 9/11
      • kidney stone
      • moore vs mack
      • moore vs mack trucks
      • Armee Deutschland Rolle in Einheit
      • body art in the workplace
      • the moplah rebellion of 1921 john j. banning
      • 平凡的世界
      • 9780140027631
      • "boundary dispute" Title:(india)
      • SubjectTerms:"孙少平"
      • TitleCombined:(Analytical Biochemistry) s.fvf(ContentType, Book Review, t)
      • Hawthorne, Mark. "The Tale of Tattoos: The history and culture of body art in India and abroad."
  • 7. Key RMF concept: Session
    • "Finding" often spans multiple queries
    • Users add/remove filters; change search terms
    • We define search sessions: a sequence of events with the same session ID, with:
      • No breaks > 90 minutes
      • Total elapsed time <= 8 hours
      • Possibly spanning a day boundary
    • Data grouped by session, sorted by time
    • A different level of abstraction; more robust (see the sketch below)
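
A minimal sketch of this kind of session segmentation, assuming each log event carries a session ID and a timestamp; the Event type and field names here are illustrative, not Summon's actual schema:

```python
from dataclasses import dataclass
from itertools import groupby

MAX_GAP_MINUTES = 90    # no breaks longer than 90 minutes
MAX_SESSION_HOURS = 8   # total elapsed time <= 8 hours

@dataclass
class Event:
    session_id: str   # hypothetical field names; the real log schema may differ
    timestamp: float  # seconds since epoch

def split_sessions(events):
    """Group events by session ID, then split each group further wherever
    the 90-minute gap rule or the 8-hour total-length rule is violated."""
    sessions = []
    events = sorted(events, key=lambda e: (e.session_id, e.timestamp))
    for _, group in groupby(events, key=lambda e: e.session_id):
        current = []
        for event in group:
            gap_too_long = current and (
                event.timestamp - current[-1].timestamp > MAX_GAP_MINUTES * 60)
            total_too_long = current and (
                event.timestamp - current[0].timestamp > MAX_SESSION_HOURS * 3600)
            if gap_too_long or total_too_long:
                sessions.append(current)  # close the session, start a new one
                current = []
            current.append(event)
        if current:
            sessions.append(current)
    return sessions
```
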
  • 8. RMF Data Flow
    [Diagram: multiple search servers, with logs fetched from each into RMF]
  • 9. From Logs to Metrics & Statistics
    • Remove 'noise' (e.g. test queries)
    • Identify session boundaries
    • Associate clicks with queries (see the sketch below)
    • Compute search-goodness metrics for queries and sessions
    • Compute statistics on aggregated data: abandonment, MRR, DCG
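
The click-to-query association step could look something like the following sketch, which attaches each click to the most recent preceding query in the session; the event dictionaries and their fields are hypothetical, not RMF's real log format:

```python
def associate_clicks(session_events):
    """Attach each click to the most recent preceding query in a session.
    Assumes events are dicts like {"type": "query", ...} or
    {"type": "click", "rank": 3, ...}, already sorted by time."""
    queries = []
    current = None
    for event in session_events:
        if event["type"] == "query":
            current = {"query": event, "clicks": []}
            queries.append(current)
        elif event["type"] == "click" and current is not None:
            # Clicks logged before any query in the session are dropped.
            current["clicks"].append(event)
    return queries
```
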
  • 10. Data Flow: Metrics, Statistics Generation
    [Diagram: session data and query data feed a session metrics calculator and a query metrics calculator; the resulting session and query metrics feed session and query statistics calculators, which produce session stats and query stats]
  • 11. Metrics Computed: Abandonment
    • Search abandonment
    • Intuition: good results lead to clicks, so compute:
      • % of queries with no clicks on results
      • % of sessions with no clicks on results
    • Usually, lower abandonment is better (see the sketch below)
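
Given query/click associations like those produced by the sketch above, the two abandonment percentages reduce to simple ratios. A minimal illustration, not the production code:

```python
def abandonment_rates(sessions):
    """sessions: a list of sessions, each a list of
    {"query": ..., "clicks": [...]} pairs as built by associate_clicks."""
    total_queries = sum(len(s) for s in sessions)
    abandoned_queries = sum(1 for s in sessions for q in s if not q["clicks"])
    abandoned_sessions = sum(
        1 for s in sessions if all(not q["clicks"] for q in s))
    return {
        "query_abandonment": abandoned_queries / total_queries,
        "session_abandonment": abandoned_sessions / len(sessions),
    }
```
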
  • 12. Metrics Computed: MRR
    • Mean Reciprocal Rank (MRR)
    • Intuition: relevant results should rank high
    • Compute: 1 / (rank of the top-ranked clicked result)
    • Higher MRR is better! Best MRR = 1.0
    • Examples (see the sketch below):
      • Click on result #3 → MRR = 0.33 (= 1/3)
      • MRR = 0.15 → first good result around rank 6 or 7 (≈ 1/0.15) :(
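
A small sketch of the MRR computation as the slide defines it. The slide does not say how RMF scores a query with no clicks; treating it as contributing 0 is one common convention and is an assumption here:

```python
def reciprocal_rank(clicked_ranks):
    """Reciprocal rank for one query: 1 / rank of the top-ranked clicked
    result (ranks are 1-based). 0.0 if nothing was clicked (assumed)."""
    return 1.0 / min(clicked_ranks) if clicked_ranks else 0.0

def mean_reciprocal_rank(queries):
    """queries: a list of lists of clicked ranks, one list per query."""
    return sum(reciprocal_rank(r) for r in queries) / len(queries)

# e.g. a single query with a click on result #3 gives 1/3 ~= 0.33:
assert abs(mean_reciprocal_rank([[3]]) - 1 / 3) < 1e-9
```
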
  • 13. Metrics Computed: DCG
    • Discounted Cumulative Gain (DCG)
    • Intuition: best to have relevant results in the 'right' order, so:
      • More points for top-ranking results clicked, discounted as you go down the result set
      • Cumulated across all clicks for a query
    • Typical formula for DCG at rank p, if r_i is the relevance of the result at rank i:
      DCG_p = r_1 + Σ_{j=2..p} (r_j / log2(j))
    • We assume clicks imply relevance (see the sketch below)
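
Translating the slide's formula directly into code; rendering "clicks imply relevance" as binary relevance (1 for clicked ranks, 0 otherwise) is our reading, not a documented detail of RMF:

```python
import math

def dcg(relevances):
    """DCG_p = r_1 + sum_{j=2..p} r_j / log2(j), for relevance scores
    r_1..r_p given as a list. The j == 1 branch avoids log2(1) = 0."""
    return sum(r if j == 1 else r / math.log2(j)
               for j, r in enumerate(relevances, start=1))

def dcg_from_clicks(clicked_ranks, p):
    """Binary relevance from clicks (an assumption): 1 if rank was clicked."""
    clicked = set(clicked_ranks)
    relevances = [1 if rank in clicked else 0 for rank in range(1, p + 1)]
    return dcg(relevances)
```
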
  • 14. Challenges with Log Data
    • Dealing with query or click spam/noise
      • Remove expired sessions
      • Mark spam as "suspect", exclude it
      • Note: if relatively little spam, minimal effect on metrics
    • Assigning queries to sessions
      • Ideally, a session = one user + one information need
    • Measuring relevance
      • Clicks: an imperfect proxy for relevance/user satisfaction
    • Distinguishing real changes in relevance from other causes (e.g. the academic calendar)
    • Be pragmatic!
  • 15. How is RMF helpful? Some examples…
  • 16. Impact: Valuable Data Source
    • Aggregated, cleaned data is useful for autocomplete and query suggestions
  • 17. Impact: Great for Analyses
    • How many results per page (RPP) is optimal? More is not always better:
      • Too many → the user has to wait
      • Too few → the user has to keep going to the "Next" link
    • RPP was 25; is 10 OK?
      • Used RMF to model click-rate changes; verified in production
    • Yes, and users can still change RPP :)
  • 18. Session Abandonment by Number of Terms in the First Query
    [Chart: abandonment rate vs. number of search terms in the query (0-12); smaller abandonment is better]
  • 19. Impact: Improving User Experience
    • Abandonment is very different on short vs. long queries
    • Similar behavior is seen in web search
    • Questions:
      • Why?
      • How can we improve the (re)search experience?
    • Web search: need to infer intent, or segment the search
    • Ongoing work…
  • 20. Session Abandonment vs. Time
    [Chart: session abandonment over time, with annotations at Thanksgiving, Christmas, and a period of data loss]
  • 21. Impact: Data Use Plan
    • Huge variance in the data across time
      • e.g. behavior during, at the end of, and after the semester
      • [any guesses why?]
    • Cannot use small segments of data for decision-making
    • Need to use stratified samples across time
  • 22. Takeaways
    • The Relevance Metrics Framework
      • Using what people do, not what they say they do: actions vs. words
      • Sessions as a concept
      • Going from logs to metrics to statistics
      • Challenges in using this data
    • Some insights we gained: valuable in many ways → continual improvements in Summon
  • 23. Thanks!
    Thanks to Ted, and to the Summon team!
    Questions??
    Contact us: Susan.Price@SerialsSolutions.com, Raman.Chandrasekar@SerialsSolutions.com, @synthesiser
