To be or not be engaged: What are the questions (to ask)?

  1. To be or not be engaged: What are the questions (to ask)?
  Mounia Lalmas
  Yahoo! Labs Barcelona
  mounia@acm.org

  2. About me
  •  Since January 2011: Visiting Principal Scientist at Yahoo! Labs Barcelona
     •  User engagement, social media, search
  •  1999-2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London
     •  XML retrieval and evaluation (INEX)
  •  2008-2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
     •  Quantum theory to model information retrieval
  Blog: labtomarket.wordpress.com

  3. Why is it important to engage users?
  •  In today's wired world, users have enhanced expectations about their interactions with technology ... resulting in increased competition amongst the purveyors and designers of interactive systems.
  •  In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
  •  In order to make engaging systems, we need to understand what user engagement is and how to measure it.

  4. Why is it important to measure and interpret user engagement well?
  [Figure: click-through rate (CTR)]

  5. Outline
  •  What is user engagement?
  •  What are the characteristics of user engagement?
  •  How to measure user engagement?
  •  What are the questions to ask?
  saliency, interesting, serendipity, relevance, sentiment, reading, news, social media, user generated content, automatic linking, aesthetics.

  6. WHAT IS USER ENGAGEMENT?

  7. Engagement is on everyone's mind
  http://thenextweb.com/asia/2013/05/03/kakao-talk-rolls-out-plus-friend-home-a-revamped-platform-to-connect-users-with-their-favorite-brands/
  http://socialbarrel.com/70-percent-of-brand-engagement-on-pinterest-come-from-users/51032/
  http://iactionable.com/user-engagement/
  http://www.cio.com.au/article/459294/heart_foundation_uses_gamification_drive_user_engagement/
  http://www.localgov.co.uk/index.cfm?method=news.detail&id=109512
  http://www.trefis.com/stock/lnkd/articles/179410/linkedin-makes-a-90-million-bet-on-pulse-to-help-drive-user-engagement/2013-04-15

  8. What is user engagement?
  User engagement is a quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of being captivated by the technology (Attfield et al, 2011).
  It is the emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource:
  •  user feelings: happy, sad, excited, ...
  •  user mental states: involved, lost, concentrated, ...
  •  user interactions: click, read, comment, buy, ...

  9. Considerations in the measurement of user engagement
  •  Short term (within session) and long term (across multiple sessions)
  •  Laboratory vs. field studies
  •  Subjective vs. objective measurement
  •  Large scale (e.g., dwell time of 100,000 people) vs. small scale (gaze patterns of 10 people)
  •  User engagement as process vs. as product
  One is not better than the other; it depends on what the aim is.

  10. CHARACTERISTICS OF USER ENGAGEMENT

  11. Characteristics of user engagement (I)
  •  Focused attention (Webster & Ho, 1997; O'Brien, 2008): Users must be focused to be engaged; distortions in the subjective perception of time are used to measure it.
  •  Positive affect (O'Brien & Toms, 2008): Emotions experienced by the user are intrinsically motivating; an initial affective "hook" can induce a desire for exploration, active discovery or participation.
  •  Aesthetics (Jacques et al, 1995; O'Brien, 2008): The sensory, visual appeal of an interface stimulates the user and promotes focused attention; linked to design principles (e.g. symmetry, balance, saliency).
  •  Endurability (Read, MacFarlane, & Casey, 2002; O'Brien, 2008): People remember enjoyable, useful, engaging experiences and want to repeat them; reflected in e.g. the propensity of users to recommend an experience/a site/a product.

  12. Characteristics of user engagement (II)
  •  Novelty (Webster & Ho, 1997; O'Brien, 2008): Novelty, surprise, unfamiliarity and the unexpected appeal to users' curiosity; this encourages inquisitive behavior and promotes repeated engagement.
  •  Richness and control (Jacques et al, 1995; Webster & Ho, 1997): Richness captures the growth potential of an activity; control captures the extent to which a person is able to achieve this growth potential.
  •  Reputation, trust and expectation (Attfield et al, 2011): Trust is a necessary condition for user engagement; an implicit contract among people and entities which is more than technological.
  •  Motivation, interests, incentives, and benefits (Jacques et al., 1995; O'Brien & Toms, 2008): Difficulties in setting up "laboratory" style experiments; why should users engage?

  13. MEASURING USER ENGAGEMENT

  14. Measuring user engagement
  •  Self-reported engagement
     Measures: questionnaire, interview, report, product reaction cards, think-aloud
     Characteristics: subjective; short- and long-term; lab and field; small-scale; product outcome
  •  Cognitive engagement
     Measures: task-based methods (time spent, follow-on task); physiological measures (e.g. EEG, SCL, fMRI, eye tracking, mouse-tracking)
     Characteristics: objective; short-term; lab and field; small-scale and large-scale; process outcome
  •  Interaction engagement
     Measures: web analytics, metrics + models
     Characteristics: objective; short- and long-term; field; large-scale; process outcome

  15. Large-scale measurements of user engagement – Web analytics
  Intra-session measures:
  •  Dwell time / session duration
  •  Play time (video)
  •  Click-through rate (CTR)
  •  Mouse movement
  •  Number of pages viewed (click depth)
  •  Conversion rate (mostly for e-commerce)
  •  Number of UGC items (comments)
  Inter-session measures:
  •  Fraction of return visits
  •  Time between visits (inter-session time, absence time)
  •  Total view time per month (video)
  •  Lifetime value (number of actions)
  •  Number of sessions per unit of time
  •  Total usage time per unit of time
  •  Number of friends on site (social networks)
  •  Number of UGC items (comments)
  •  Intra-session engagement measures our success in attracting the user to remain on our site for as long as possible.
  •  Inter-session engagement can be measured directly or, for commercial sites, by observing lifetime customer value.

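  The intra- and inter-session measures listed above can be derived from ordinary page-view logs. Below is a minimal sketch, assuming a hypothetical log of (user, timestamp, url) records and an assumed 30-minute inactivity gap to split sessions; neither the log format nor the threshold comes from the talk.

```python
from collections import defaultdict
from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity threshold for a session break

def sessionize(pageviews):
    """Split one user's time-sorted (timestamp, url) page views into sessions."""
    sessions, current = [], []
    for ts, url in sorted(pageviews):
        if current and ts - current[-1][0] > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions

def engagement_metrics(log):
    """log: list of (user, timestamp, url). Returns per-user intra- and inter-session metrics."""
    by_user = defaultdict(list)
    for user, ts, url in log:
        by_user[user].append((ts, url))

    metrics = {}
    for user, views in by_user.items():
        sessions = sessionize(views)
        # intra-session: dwell time (session duration) and click depth (pages per session)
        durations = [(s[-1][0] - s[0][0]).total_seconds() for s in sessions]
        depths = [len(s) for s in sessions]
        # inter-session: absence time between consecutive sessions
        absences = [(b[0][0] - a[-1][0]).total_seconds()
                    for a, b in zip(sessions, sessions[1:])]
        metrics[user] = {
            "sessions": len(sessions),
            "avg_dwell_time_s": sum(durations) / len(durations),
            "avg_click_depth": sum(depths) / len(depths),
            "avg_absence_time_s": sum(absences) / len(absences) if absences else None,
        }
    return metrics
```
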
  16. Cognitive engagement
  •  Eye tracking
  •  Mouse movement
  •  Facial expression
  •  Psychophysiological measures: respiration, pulse rate, temperature, brain waves, skin conductance, ...

  17. WHAT ARE THE QUESTIONS TO ASK?
  Signals – Signals – Signals: Five studies, ranging from self-reported engagement to interaction engagement

  18. STUDY I
  •  Domain: entertainment news
  •  Study: saliency
  •  Measurement: focused attention and affect
  + Lori McCay-Peet + Vidhya Navalpakkam

  19. Self-reported engagement
  •  How the visual catchiness (saliency) of "relevant" information impacts user engagement metrics such as focused attention and emotion (affect)
     •  focused attention refers to the exclusion of other things
     •  affect relates to the emotions experienced during the interaction
  •  Saliency model of visual attention developed by (Itti & Koch, 2000)

  20. Manipulating saliency
  [Figure: web page screenshot and the corresponding saliency maps for the salient and non-salient conditions]
  (McCay-Peet et al, 2012)

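  The study computed its saliency maps with the Itti & Koch model, which is not reproduced here. As a rough illustration of what a saliency map is, the sketch below applies the much simpler spectral-residual method (Hou & Zhang, 2007) to a greyscale screenshot; this is a stand-in for illustration only, not the model used in the study.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spectral_residual_saliency(gray):
    """Saliency map via the spectral-residual method (Hou & Zhang, 2007).
    Note: a simpler stand-in for the Itti & Koch model named on the slide.
    gray: 2-D float array (greyscale page screenshot)."""
    spectrum = np.fft.fft2(gray)
    log_amplitude = np.log(np.abs(spectrum) + 1e-12)
    phase = np.angle(spectrum)
    # spectral residual = log amplitude minus its local average
    residual = log_amplitude - uniform_filter(log_amplitude, size=3)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = uniform_filter(saliency, size=9)   # smooth the map
    return saliency / saliency.max()              # normalise to [0, 1]
```
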
  21. Study design
  •  8 tasks = finding the latest news or headline on a celebrity or entertainment topic
  •  Affect measured pre- and post-task using the Positive (e.g. "determined", "attentive") and Negative (e.g. "hostile", "afraid") Affect Schedule (PANAS)
  •  Focused attention measured with the 7-item focused attention subscale (e.g. "I was so involved in my news tasks that I lost track of time", "I blocked things out around me when I was completing the news tasks") and perceived time
  •  Interest level in topics (pre-task) and questionnaire (post-task), e.g. "I was interested in the content of the web pages", "I wanted to find out more about the topics that I encountered on the web pages"
  •  189 (90+99) participants from Amazon Mechanical Turk

  22. PANAS (10 positive items and 10 negative items)
  •  You feel this way right now, that is, at the present moment
     [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely]
     [randomize items]
  •  Negative items: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
  •  Positive items: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active
  (Watson, Clark & Tellegen, 1988)

  23. 7-item focused attention subscale
  (part of the 31-item user engagement scale)
  5-point scale (strongly disagree to strongly agree)
  1. I lost myself in this news tasks experience
  2. I was so involved in my news tasks that I lost track of time
  3. I blocked things out around me when I was completing the news tasks
  4. When I was performing these news tasks, I lost track of the world around me
  5. The time I spent performing these news tasks just slipped away
  6. I was absorbed in my news tasks
  7. During the news tasks experience I let myself go
  (O'Brien & Toms, 2010)

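  Both instruments above are scored by simple aggregation over Likert responses: PANAS positive and negative affect are each the sum of their 10 items (range 10-50), and one common choice for the focused attention subscale is the mean item score. A minimal scoring sketch; the variable names are illustrative, not from the study.

```python
PANAS_POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
                  "alert", "inspired", "determined", "attentive", "active"]
PANAS_NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
                  "irritable", "ashamed", "nervous", "jittery", "afraid"]

def panas_scores(responses):
    """responses: dict item -> rating on the 1-5 scale.
    Returns (positive affect, negative affect), each a sum of 10 items (range 10-50)."""
    pa = sum(responses[item] for item in PANAS_POSITIVE)
    na = sum(responses[item] for item in PANAS_NEGATIVE)
    return pa, na

def focused_attention_score(ratings):
    """ratings: the 7 subscale responses (1 = strongly disagree ... 5 = strongly agree).
    Reported here as the mean item score (an assumed, common convention)."""
    assert len(ratings) == 7
    return sum(ratings) / len(ratings)

def affect_change(pre, post):
    """Pre/post-task change in (positive, negative) affect."""
    return post[0] - pre[0], post[1] - pre[1]
```
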
  24. Saliency and positive affect
  •  When headlines are visually non-salient, users are slow at finding them, report more distraction due to web page features, and show a drop in affect
  •  When headlines are visually catchy or salient, users find them faster, report that it is easy to focus, and maintain positive affect
  •  Saliency is helpful in task performance, in focusing/avoiding distraction and in maintaining positive affect

  25. Saliency and focused attention
  •  Adapted the focused attention subscale from the online shopping domain to the entertainment news domain
  •  Users reported it was "easier to focus in the salient condition", BUT there was no significant improvement in the focused attention subscale or difference in perceived time spent on tasks
  •  User interest in web page content is a good predictor of focused attention, which in turn is a good predictor of positive affect

  26. Self-reporting, crowdsourcing, saliency and user engagement
  •  The interaction of saliency, focused attention, and affect, together with user interest, is complex.
  •  Using crowdsourcing worked!
  •  What next?
     •  include web page content as a quality of user engagement in the focused attention scale
     •  a more "realistic" (interactive) user reading experience
     •  other measurements: mouse-tracking, eye-tracking, facial expression analysis, etc.
  (McCay-Peet, Lalmas & Navalpakkam, 2012)

  27. STUDY II
  •  Domain: news and user generated content (comments)
  •  Study: interestingness and sentiment
  •  Measurement: focused attention, affect and gaze
  + Ioannis Arapakis + Barla Cambazoglu + Mari-Carmen Marcos + Joemon Jose

  28. Gaze and self-reporting
  •  News + comments
  •  Sentiment, interest
  •  57 users (lab-based)
  •  Reading tasks (114)
  •  Questionnaire (qualitative data)
  •  Eye tracking recorded (quantitative data)
  Three metrics: gaze, focused attention and positive affect
  (Lin et al, 2007)

  29. Interesting content promotes user engagement metrics
  •  All three metrics: focused attention, positive affect & gaze
  •  What is the right trade-off? (news is news :)
  •  Can we predict? provider, editor, writer, category, genre, visual aids, ..., sentimentality, ...
  •  Role of user-generated content (comments)
     •  as a measure of engagement?
     •  to promote engagement?

  30. Lots of sentiments, but with negative connotations!
  •  Positive affect (and interest, enjoyment and wanting to know more) correlates
     •  positively (↑) with sentimentality (lots of emotions)
     •  negatively (↓) with positive polarity (happy news)
  SentiStrength (from -5 to 5 per word):
  •  sentimentality: sum of absolute values (amount of sentiment)
  •  polarity: sum of values (direction of the sentiment: positive vs negative)
  (Thelwall, Buckley & Paltoglou, 2012)

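  SentiStrength itself is not reimplemented here; given per-word scores in the -5..5 range, the two quantities used in the study reduce to simple sums. A minimal sketch with a tiny, made-up word-score lexicon standing in for the real SentiStrength output:

```python
# Hypothetical per-word scores in [-5, 5]; SentiStrength provides the real lexicon.
WORD_SCORES = {"love": 3, "great": 2, "hate": -4, "terrible": -3, "disaster": -4}

def sentiment_features(text):
    """Sentimentality = sum of absolute word scores (amount of sentiment).
    Polarity = sum of signed word scores (direction of the sentiment)."""
    scores = [WORD_SCORES.get(w, 0) for w in text.lower().split()]
    sentimentality = sum(abs(s) for s in scores)
    polarity = sum(scores)
    return sentimentality, polarity

# Example: an emotional but negatively polarised headline
print(sentiment_features("terrible disaster but great rescue"))  # (9, -5)
```
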
  31. Effect of comments on user engagement
  •  6 rankings of comments:
     •  most replied, most popular, newest
     •  sentimentality high, sentimentality low
     •  polarity plus, polarity minus
  •  Longer gaze on:
     •  newest and most popular for interesting news
     •  most replied and high sentimentality for non-interesting news
  •  Can we leverage this to prolong user attention?

  32. Gaze, sentimentality, interest
  •  Interesting and "attractive" content!
  •  Sentiment as a proxy for focused attention, positive affect and gaze?
  •  Next:
     •  larger-scale study
     •  other domains (beyond news!)
     •  role of social signals (e.g. Facebook, Twitter)
     •  lots more data: mouse tracking, EEG, facial expression
  (Arapakis et al., 2013)

  33. STUDY III
  •  Domain: news and social media (Wikipedia)
  •  Study: interestingness, aesthetics, task
  •  Measurement: focused attention, affect and mouse movement
  + David Warnock

  34. Mouse tracking and self-reporting
  •  324 users from Amazon Mechanical Turk (between-subject design)
  •  Two domains (BBC News and Wikipedia)
  •  Two tasks (reading and search)
  •  "Normal" vs "ugly" interface
  •  Questionnaires (qualitative data): focused attention, positive affect, novelty, interest, usability, aesthetics, plus demographics, handedness & hardware
  •  Mouse tracking (quantitative data): movement speed, movement rate, click rate, pause length, percentage of time still

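  A minimal sketch of how the quantitative mouse-tracking features listed above might be computed from an event stream of (timestamp, x, y, event type) tuples; the event format and the pause threshold are assumptions, not details from the study.

```python
import math

PAUSE_THRESHOLD = 0.5   # assumed: gaps longer than this (seconds) count as pauses / time still

def mouse_features(events):
    """events: time-sorted list of (t_seconds, x, y, kind) with kind in {'move', 'click'}."""
    total_time = events[-1][0] - events[0][0]
    moves = [e for e in events if e[3] == "move"]
    clicks = [e for e in events if e[3] == "click"]

    distance, still_time, pauses = 0.0, 0.0, []
    for (t0, x0, y0, _), (t1, x1, y1, _) in zip(events, events[1:]):
        gap = t1 - t0
        distance += math.hypot(x1 - x0, y1 - y0)
        if gap > PAUSE_THRESHOLD:
            pauses.append(gap)
            still_time += gap

    return {
        "movement_speed": distance / total_time,      # pixels per second
        "movement_rate": len(moves) / total_time,     # move events per second
        "click_rate": len(clicks) / total_time,       # clicks per second
        "avg_pause_length": sum(pauses) / len(pauses) if pauses else 0.0,
        "pct_time_still": 100.0 * still_time / total_time,
    }
```
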
  35. "Ugly" vs "Normal" interface (BBC News)
  [Figure: screenshots of the two interface variants]

  36. Mouse tracking can tell about
  •  Age
  •  Hardware: mouse vs. trackpad
  •  Task
     •  Searching: "There are many different types of phobia. What is Gephyrophobia a fear of?"
     •  Reading: (Wikipedia) Archimedes, Section 1: Biography

  37. Mouse tracking could not tell much about
  •  focused attention and positive affect
  •  user interest in the task/topic
  •  BUT the "ugly" variant did not result in lower aesthetics scores (although BBC > Wikipedia)
  •  BUT the comments left behind ...
     •  Wikipedia: "The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background."; "The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience."
     •  BBC News: "The website's layout and color scheme were a bitch to navigate and read."; "Comic sans is a horrible font."

  38. Mouse tracking and user engagement
  •  Task and hardware
  •  Do we have a Hawthorne effect???
  •  "Usability" vs engagement
  •  "Even uglier" interface?
  •  Within- vs between-subject design?
  •  What next?
     •  sequence of movements
     •  automatic clustering
  (Warnock & Lalmas, 2013)

  39. STUDY IV
  •  Domain: news
  •  Study: automatic linking
  •  Measurement: interestingness
  + Ioannis Arapakis + Hakan Ceylan + Pinar Donmez

  40. Automatic linking & reading experience
  Keeping users reading more articles

  41. LEPA: Linker for Events to Past Articles
  LEPA is a fully automated approach to constructing hyperlinks in news articles using "simple" text processing and understanding techniques.
  •  Indexer: processes articles over a time period by extracting features from each article and storing them to facilitate faster retrieval
  •  Linker: identifies sentences that contain newsworthy events; for each such event it retrieves from the index all the matching articles and links the top-ranked one with the event

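  The exact features and ranking used by LEPA are not given here; as a rough illustration of the indexer/linker split described above, the sketch below indexes past articles with TF-IDF and links a sentence to its most similar article by cosine similarity. This is an assumed stand-in, not the actual LEPA pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class SimpleLinker:
    """Toy indexer + linker with the same two-stage shape as LEPA (not its algorithm)."""

    def __init__(self, past_articles):
        # Indexer: process past articles once and store their features for fast retrieval.
        self.articles = past_articles
        self.vectorizer = TfidfVectorizer(stop_words="english")
        self.index = self.vectorizer.fit_transform(past_articles)

    def link(self, sentence, threshold=0.2):
        # Linker: given a (newsworthy) sentence, retrieve the best matching past article.
        query = self.vectorizer.transform([sentence])
        scores = cosine_similarity(query, self.index)[0]
        best = scores.argmax()
        return (self.articles[best], scores[best]) if scores[best] >= threshold else None

linker = SimpleLinker(["Apple unveils the first iPad tablet ...",
                       "Oil spill in the Gulf of Mexico ..."])
print(linker.link("A new tablet was announced by Apple today"))
```
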
  42. Three-stage evaluation
  •  Pilot study
  •  Assessing the reading experience
  •  Assessing the links

  43. Pilot study
  •  Professional editors rated a collection of system-embedded links (164 article-link combinations) on a 5-point Likert scale: (i) bad, (ii) fair, (iii) good, (iv) excellent, and (v) not judged
  •  Rating results: Bad 35.15%, Fair 33.93%, Good 20%, Excellent 9.09%, Not judged 1.81%
  •  With 63.03% of the links rated fair or better: initial evidence that LEPA is not too far from the optimum achieved by human editors

  44. Assessing the links: are they related?
  •  664 participants recruited through Amazon Mechanical Turk; between-group design (two groups)
  •  Precision = fraction of links (total = 164) that received, in terms of relatedness, a score equal to, or greater than, 3 on a 5-point Likert scale

                              System-embedded links                    Manually-curated links
                              Participant A   Participant B   All      Participant A   Participant B   All
  Related to the main theme   49%             42%             45%      54%             51%             53%
  Related to subtopic         21%             24%             22%      31%             34%             33%
  Tangentially related        13%             15%             14%       9%             12%             10%
  Unrelated                   15%             16%             16%       5%              1%              3%
  Other                        2%              2%              2%       1%              2%              1%

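  The precision definition above is straightforward to compute once each link has a relatedness judgement; a minimal sketch:

```python
def link_precision(relatedness_scores, threshold=3):
    """Fraction of links whose relatedness score, on the 5-point Likert scale,
    is equal to or greater than the threshold (3 in the study)."""
    good = sum(1 for score in relatedness_scores if score >= threshold)
    return good / len(relatedness_scores)

# Toy example with 5 of the 164 links:
print(link_precision([5, 4, 3, 2, 1]))  # 0.6
```
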
  45. Assessing the reading experience
  •  120 participants recruited through Amazon Mechanical Turk; between-groups design (three groups)
  •  Editors + two opposite "extremes" of LEPA:
     •  High recall: best at embedding newsworthy links & articles that provide interesting insights
     •  High precision: best in terms of embedding the right number of links
  •  Inductive, thematic coding of open-ended questions; themes included: good topical coverage, informativeness, broader perspective, interesting insights, link presentation, content volume, positive news reading experience

  46. Automatic linking and the news reading experience
  •  Even under realistic and uncontrolled conditions, the performance of LEPA is comparable to that of editors, and in some cases better
  •  High precision vs. high recall
  •  A high precision threshold leads to a better news reading experience: less is more
  "They were too many, being mostly quite long, in some cases more than half the length of the main article, and sometimes they repeated the same identical information"

  47. STUDY V
  •  Domain: social media (Yahoo! Answers and Wikipedia)
  •  Study: serendipity
  •  Measurement: relevance, unexpectedness, interestingness
  + Ilaria Bordino + Yelena Mejova

  48. Entity-driven exploratory search
  •  Linguistically Motivated Semantic Aggregation Engines: "transition to a truly semantic aggregation paradigm where machines understand a user's intent, discover and organize facts, identify opinions, experiences and trends"
  •  Entity search: we build an entity-driven serendipitous search system based on entity networks extracted from Wikipedia and Yahoo! Answers
  •  Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results

  49. Yahoo! Answers vs Wikipedia
  •  Yahoo! Answers: community-driven question & answer portal
     •  67,336,144 questions & 261,770,047 answers (January 1, 2010 – December 31, 2011)
     •  minimally curated; opinions, gossip, personal info; variety of points of view
  •  Wikipedia: community-driven encyclopedia (English Wikipedia)
     •  3,795,865 articles (as of end of December 2011)
     •  curated; high-quality knowledge; variety of niche topics

  50. Entity & relationship extraction
  •  entity – any well-defined concept that has a Wikipedia page
  •  relationship – a topical relationship/similarity between a pair of entities based on document co-occurrence; related to the number of documents in which the two entities occur

  Dataset          #Nodes      #Edges        Density    #Isolated
  Yahoo! Answers   896,799     112,595,138   0.00028    69,856
  Wikipedia        1,754,069   237,058,218   0.00015    82,381

  Dataset          Avg Degree   Max Degree   Size of Largest CC
  Yahoo! Answers   251          231,921      826,402 (92.15%)
  Wikipedia        270          346,070      1,671,241 (95.28%)

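  A minimal sketch of how such a co-occurrence graph might be built, assuming the documents have already been annotated with the Wikipedia entities they mention (the entity-annotation step itself is not shown, and the pruning threshold is an assumption):

```python
from collections import Counter
from itertools import combinations

def build_entity_graph(doc_entities, min_cooccurrences=2):
    """doc_entities: list of sets, one per document, each holding the entities it mentions.
    Returns edge weights: the number of documents in which each pair of entities co-occurs."""
    edge_weights = Counter()
    for entities in doc_entities:
        for a, b in combinations(sorted(entities), 2):
            edge_weights[(a, b)] += 1
    # keep only pairs that co-occur in enough documents (assumed pruning step)
    return {pair: w for pair, w in edge_weights.items() if w >= min_cooccurrences}

docs = [{"Steve Jobs", "IPad", "Apple Inc."},
        {"Steve Jobs", "Apple Inc.", "Steve Wozniak"},
        {"IPad", "Kindle"}]
print(build_entity_graph(docs, min_cooccurrences=1))
```
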
  51. [Figure: visualisations of the Wikipedia and Yahoo! Answers entity networks]

  52. Retrieval
  •  Retrieve the entities most related to a query entity using a random walk
  •  Query entities: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth
  •  Example (query: Steve Jobs)
     •  Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
     •  Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
  •  Evaluation: 3 labels per query-result pair; gold standard quality control
     •  Annotator agreement (overlap): 0.85
     •  Average overlap in top 5 results: <1

                  Wikipedia   Yahoo! Answers   Combined
  Precision @ 5   0.668       0.724            0.744
  MAP             0.716       0.762            0.782

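  The talk does not spell out the random-walk formulation; the sketch below uses a standard random walk with restart (personalized PageRank) over the weighted entity graph as an assumed, common choice for "entities most related to a query entity", plus a precision-at-k helper of the kind reported in the table above.

```python
import numpy as np

def random_walk_with_restart(adjacency, query_idx, restart=0.15, iters=100):
    """adjacency: square numpy array of non-negative edge weights
    (e.g. entity co-occurrence counts). Returns a relatedness score
    for every entity with respect to the query entity."""
    adjacency = adjacency.astype(float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0                 # avoid division by zero for isolated nodes
    transition = adjacency / col_sums             # column-stochastic transition matrix
    restart_vec = np.zeros(adjacency.shape[0])
    restart_vec[query_idx] = 1.0
    scores = restart_vec.copy()
    for _ in range(iters):
        scores = (1 - restart) * (transition @ scores) + restart * restart_vec
    return scores

def top_k_related(scores, entities, query_idx, k=5):
    """The k entities most related to the query, excluding the query itself."""
    order = np.argsort(-scores)
    return [entities[i] for i in order if i != query_idx][:k]

def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k retrieved entities judged relevant."""
    return sum(1 for e in retrieved[:k] if e in relevant) / k
```
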
  53. Serendipity = unexpectedness + relevance
  Serendipity: "making fortunate discoveries by accident"
  Two metrics:
  •  | relevant & unexpected | / | unexpected | = number of serendipitous results out of all of the unexpected results retrieved
  •  | relevant & unexpected | / | retrieved | = serendipitous out of all retrieved
  "Expected" result baselines from web search:

  Baseline                                                      Data   Serendipity
  Top: 5 entities that occur most frequently in the top 5       WP     0.63 (0.58)
  search results from Bing and Google                           YA     0.69 (0.63)
  Top –WP: same as above, but excluding the Wikipedia page      WP     0.63 (0.58)
  from the results                                              YA     0.70 (0.64)
  Rel: top 5 entities in the related query suggestions          WP     0.64 (0.61)
  provided by Bing and Google                                   YA     0.70 (0.65)
  Rel + Top: union of Top and Rel                               WP     0.61 (0.54)
                                                                YA     0.68 (0.57)

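  Given relevance judgements and the set of "expected" results produced by one of the web-search baselines above, the two serendipity ratios reduce to set arithmetic; a minimal sketch:

```python
def serendipity(retrieved, relevant, expected):
    """retrieved: ranked result list; relevant: set of results judged relevant;
    expected: set of 'expected' results from a web-search baseline (Top, Rel, ...)."""
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    per_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    per_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return per_unexpected, per_retrieved

# Toy example
print(serendipity(retrieved=["a", "b", "c", "d"],
                  relevant={"a", "c", "d"},
                  expected={"a"}))  # (0.666..., 0.5)
```
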
  54. Interestingness ≠ Relevance
  Interesting > Relevant:
  •  Oil Spill → Penguins in Sweaters (WP)
  •  Robert Pattinson → Water for Elephants (WP)
  •  Lady Gaga → Britney Spears (WP)
  Relevant > Interesting:
  •  Egypt → Cairo Conference (WP)
  •  Netflix → Blu-ray Disc (YA)
  •  Egypt → Ptolemaic Kingdom (WP & YA)
  (Bordino, Mejova & Lalmas, 2013)

  55. Assessing "interestingness"
  Following (Arguello et al, 2011):
  1. Labelers provide pairwise comparisons between results
  2. Combine into a reference ranking
  3. Compare the result ranking to the optimal ranking using Kendall's tau

  Similarity (Kendall's tau-b) between result sets and the reference ranking:

  Question                                                          Data   tau-b
  Which result is more relevant to the query?                       WP     0.162
                                                                    YA     0.336
  If someone is interested in the query, would they also be         WP     0.162
  interested in the result?                                         YA     0.312
  Even if you are not interested in the query, is the result        WP     0.139
  interesting to you personally?                                    YA     0.324
  Would you learn anything new about the query from the results?    WP     0.167
                                                                    YA     0.307

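  A minimal sketch of the final comparison step summarised above: once the pairwise labels have been combined into a reference ranking, a system's result ranking is compared to it with Kendall's tau-b (scipy's default variant). The helper is illustrative, not the (Arguello et al, 2011) implementation.

```python
from scipy.stats import kendalltau

def ranking_agreement(result_ranking, reference_ranking):
    """Both arguments are lists of the same items, best first.
    Returns Kendall's tau-b between the two orderings."""
    result_pos = {item: i for i, item in enumerate(result_ranking)}
    reference_pos = {item: i for i, item in enumerate(reference_ranking)}
    items = list(reference_pos)
    tau, _p = kendalltau([result_pos[i] for i in items],
                         [reference_pos[i] for i in items])
    return tau

# Toy example: a result ranking compared with a labeler-derived reference ranking
print(ranking_agreement(["e1", "e3", "e2", "e4"], ["e1", "e2", "e3", "e4"]))  # ~0.67
```
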
  56. Serendipity in multimedia search?
  Multimedia search activities are often driven by entertainment needs, not by information needs (Slaney, 2011)

  57. What are the questions to ask?
  •  No one measurement is perfect or complete.
  •  All studies (process or product) have different constraints.
  •  Need to ensure methods are applied consistently, with attention to reliability: what is a good signal?
  •  More emphasis should be placed on using mixed methods to improve the validity of the measures.
  •  Be careful of the WEIRD syndrome (Western, Educated, Industrialized, Rich, and Democratic).

  58. Acknowledgements
  •  Collaborators: Ioannis Arapakis, Ilaria Bordino, Barla Cambazoglu, Hakan Ceylan, Pinar Donmez, Lori McCay-Peet, Yelena Mejova, Vidhya Navalpakkam, David Warnock, and others at Yahoo! Labs.
  •  This talk uses some material from the tutorial "Measuring User Engagement" given at WWW 2013, Rio de Janeiro (with Heather O'Brien and Elad Yom-Tov).
  Blog: labtomarket.wordpress.com