To be or not be engaged: What are the questions (to ask)?



In the online world, user engagement refers to the quality of the user experience that emphasizes the phenomena associated with wanting to use a web application longer and more frequently. User engagement is a multifaceted, complex phenomenon, giving rise to a number of approaches for its measurement: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, desktop actions); and web analytics using online behavior metrics. These methods represent various trade-offs between the scale of the data analyzed and the depth of understanding. For instance, surveys are hardly scalable but offer rich, qualitative insights, whereas click data can be collected on a large scale but are more difficult to analyze. Still, the core research questions each type of measurement is able to answer are unclear. This talk will present various efforts aiming at combining approaches to measure engagement and seeking to provide insights into what questions to ask when measuring engagement.

Keynote at 18th International Conference on Application of Natural Language to Information Systems (NLDB2013), University of Salford, MediaCityUK


Published in: Technology, Business


  1. 1. To be or not be engaged: What are the questions (to ask)? Mounia Lalmas, Yahoo! Labs Barcelona
  2. 2. About me
     • Since January 2011: Visiting Principal Scientist at Yahoo! Labs Barcelona
     • User engagement, social media, search
     • 1999-2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London
     • XML retrieval and evaluation (INEX)
     • 2008-2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
     • Quantum theory to model information retrieval
     Blog:
  3. 3. Why is it important to engage users?
     • In today’s wired world, users have enhanced expectations about their interactions with technology … resulting in increased competition amongst the purveyors and designers of interactive systems.
     • In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
     • In order to make engaging systems, we need to understand what user engagement is and how to measure it.
  4. 4. Why is it important to measure and interpret user engagement well? CTR
  5. 5. Outline
     • What is user engagement?
     • What are the characteristics of user engagement?
     • How to measure user engagement?
     • What are the questions to ask?
     saliency, interesting, serendipity, relevance, sentiment, reading, news, social media, user generated content, automatic linking, aesthetics.
  6. 6. WHAT IS USER ENGAGEMENT?
  7. 7. Engagement is on everyone’s mind
     • http://-talk-rolls-out-plus-friend-home-a-revamped-platform-to-connect-users-with-their-favorite-brands/
     • http://-percent-of-brand-engagement-on-pinterest-come-from-users/51032/
     • http://iacti-engagement/
     • http://ticle/459294/heart_foundation_uses_gamification_drive_user_engagement/
     • http://
     • http://ticles/179410/linkedin-makes-a-90-million-bet-on-pulse-to-help-drive-user-engagement/2013-04-15
  8. 8. What is user engagement?
     User engagement is a quality of the user experience that emphasizes the positive aspects of interaction, in particular the fact of being captivated by the technology (Attfield et al, 2011).
     Emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource:
     • user feelings: happy, sad, excited, …
     • user interactions: click, read, comment, buy…
     • user mental states: involved, lost, concentrated…
  9. 9. Considerations in the measurement of user engagement
     • Short term (within session) and long term (across multiple sessions)
     • Laboratory vs. field studies
     • Subjective vs. objective measurement
     • Large scale (e.g., dwell time of 100,000 people) vs. small scale (gaze patterns of 10 people)
     • User engagement as process vs. as product
     One is not better than the other; it depends on the aim.
  11. 11. Characteristics of user engagement (I)
     • Focused attention (Webster & Ho, 1997; O’Brien, 2008)
        • Users must be focused to be engaged
        • Distortions in the subjective perception of time used to measure it
     • Positive affect (O’Brien & Toms, 2008)
        • Emotions experienced by user are intrinsically motivating
        • Initial affective “hook” can induce a desire for exploration, active discovery or participation
     • Aesthetics (Jacques et al, 1995; O’Brien, 2008)
        • Sensory, visual appeal of interface stimulates user & promotes focused attention
        • Linked to design principles (e.g. symmetry, balance, saliency)
     • Endurability (Read, MacFarlane, & Casey, 2002; O’Brien, 2008)
        • People remember enjoyable, useful, engaging experiences and want to repeat them
        • Reflected in e.g. the propensity of users to recommend an experience/a site/a product
  12. 12. Characteristics of user engagement (II)
     • Novelty (Webster & Ho, 1997; O’Brien, 2008)
        • Novelty, surprise, unfamiliarity and the unexpected
        • Appeal to users’ curiosity; encourages inquisitive behavior and promotes repeated engagement
     • Richness and control (Jacques et al, 1995; Webster & Ho, 1997)
        • Richness captures the growth potential of an activity
        • Control captures the extent to which a person is able to achieve this growth potential
     • Reputation, trust and expectation (Attfield et al, 2011)
        • Trust is a necessary condition for user engagement
        • Implicit contract among people and entities which is more than technological
     • Motivation, interests, incentives, and benefits (Jacques et al., 1995; O’Brien & Toms, 2008)
        • Difficulties in setting up “laboratory” style experiments
        • Why should users engage?
  14. 14. Measuring user engagement

     | Type | Measures | Characteristics |
     |---|---|---|
     | Self-reported engagement | Questionnaire, interview, report, product reaction cards, think-aloud | Subjective; short- and long-term; lab and field; small-scale; product outcome |
     | Cognitive engagement | Task-based methods (time spent, follow-on task); physiological measures (e.g. EEG, SCL, fMRI, eye tracking, mouse-tracking) | Objective; short-term; lab and field; small-scale and large-scale; process outcome |
     | Interaction engagement | Web analytics metrics + models | Objective; short- and long-term; field; large-scale; process outcome |
  15. 15. Large-scale measurements of user engagement – Web analytics
     Intra-session measures
     • Dwell time / session duration
     • Play time (video)
     • (Mouse movement)
     • Click through rate (CTR)
     • Mouse movement
     • Number of pages viewed (click depth)
     • Conversion rate (mostly for e-commerce)
     • Number of UGC (comments)
     Inter-session measures
     • Fraction of return visits
     • Time between visits (inter-session time, absence time)
     • Total view time per month (video)
     • Lifetime value (number of actions)
     • Number of sessions per unit of time
     • Total usage time per unit of time
     • Number of friends on site (social networks)
     • Number of UGC (comments)
     Notes
     • Intra-session engagement measures our success in attracting the user to remain on our site for as long as possible.
     • Inter-session engagement can be measured directly or, for commercial sites, by observing lifetime customer value.
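As a rough illustration of the intra-session and inter-session measures listed on this slide, the following sketch derives dwell time, click depth and absence time from a toy page-view log. The log layout (`user_id`, `session_id`, timestamp in seconds) and all function names are assumptions made for this example; they do not come from the talk.

```python
from collections import defaultdict

# Hypothetical page-view log: (user_id, session_id, timestamp_in_seconds).
# The field layout is an assumption made for illustration only.
PAGE_VIEWS = [
    ("u1", "s1", 0), ("u1", "s1", 40), ("u1", "s1", 100),
    ("u1", "s2", 86_500), ("u1", "s2", 86_560),
]

def session_stats(page_views):
    """Group page views by (user, session); return start, end and click depth."""
    sessions = defaultdict(list)
    for user, session, ts in page_views:
        sessions[(user, session)].append(ts)
    return {
        key: {"start": min(ts), "end": max(ts), "click_depth": len(ts)}
        for key, ts in sessions.items()
    }

def dwell_times(stats):
    """Intra-session measure: session duration (last view minus first view)."""
    return {key: s["end"] - s["start"] for key, s in stats.items()}

def absence_times(stats):
    """Inter-session measure: gap between consecutive sessions of each user."""
    by_user = defaultdict(list)
    for (user, _session), s in stats.items():
        by_user[user].append((s["start"], s["end"]))
    gaps = {}
    for user, spans in by_user.items():
        spans.sort()
        gaps[user] = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
    return gaps

stats = session_stats(PAGE_VIEWS)
```

Measuring session duration as last view minus first view ignores the time spent on the final page, which is a common limitation of log-based dwell time.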
  16. 16. Cognitive engagement
     • Eye tracking
     • Mouse movement
     • Face expression
     • Psychophysiological measures: respiration, pulse rate, temperature, brain wave, skin conductance, …
  17. 17. Signals – Signals – Signals: Five studies
     • Self-reported engagement
     • WHAT ARE THE QUESTIONS TO ASK?
     • Interaction engagement
  18. 18. STUDY I
     • Domain: entertainment news
     • Study: saliency
     • Measurement: focused attention and affect
     + Lori McCay-Peet + Vidhya Navalpakkam
  19. 19. Self-report engagement
     • How the visual catchiness (saliency) of “relevant” information impacts user engagement metrics such as focused attention and emotion (affect)
     • Focused attention refers to the exclusion of other things
     • Affect relates to the emotions experienced during the interaction
     • Saliency model of visual attention developed by Itti & Koch (2000)
  20. 20. Manipulating saliency
     [Figure: web page screenshot and saliency maps for the salient and non-salient conditions (McCay-Peet et al, 2012)]
  21. 21. Study design
     • 8 tasks = finding latest news or headline on celebrity or entertainment topic
     • Affect measured pre- and post-task using the Positive (e.g. “determined”, “attentive”) and Negative (e.g. “hostile”, “afraid”) Affect Schedule (PANAS)
     • Focused attention measured with 7-item focused attention subscale (e.g. “I was so involved in my news tasks that I lost track of time”, “I blocked things out around me when I was completing the news tasks”) and perceived time
     • Interest level in topics (pre-task) and questionnaire (post-task), e.g. “I was interested in the content of the web pages”, “I wanted to find out more about the topics that I encountered on the web pages”
     • 189 (90+99) participants from Amazon Mechanical Turk
  22. 22. PANAS (10 positive items and 10 negative items)
     • You feel this way right now, that is, at the present moment
     • Scale: 1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely [randomize items]
     • Negative items: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
     • Positive items: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active
     (Watson, Clark & Tellegen, 1988)
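PANAS is conventionally scored by summing the ten positive ratings and the ten negative ratings separately, giving two scores between 10 and 50. The sketch below applies that convention to the items listed on this slide and computes the pre/post-task change in affect. Whether the study aggregated ratings in exactly this way is an assumption, and the function names are illustrative.

```python
POSITIVE_ITEMS = [
    "interested", "excited", "strong", "enthusiastic", "proud",
    "alert", "inspired", "determined", "attentive", "active",
]
NEGATIVE_ITEMS = [
    "distressed", "upset", "guilty", "scared", "hostile",
    "irritable", "ashamed", "nervous", "jittery", "afraid",
]

def panas_scores(ratings):
    """Return (positive_affect, negative_affect) from a dict of 1-5 ratings."""
    for item in POSITIVE_ITEMS + NEGATIVE_ITEMS:
        if not 1 <= ratings[item] <= 5:
            raise ValueError(f"rating for {item!r} must be between 1 and 5")
    positive = sum(ratings[item] for item in POSITIVE_ITEMS)
    negative = sum(ratings[item] for item in NEGATIVE_ITEMS)
    return positive, negative

def affect_change(pre, post):
    """Post-task minus pre-task scores, as (positive_delta, negative_delta)."""
    pre_pos, pre_neg = panas_scores(pre)
    post_pos, post_neg = panas_scores(post)
    return post_pos - pre_pos, post_neg - pre_neg

# Toy example: every item rated 3 before the task; two items change afterwards.
pre_task = {item: 3 for item in POSITIVE_ITEMS + NEGATIVE_ITEMS}
post_task = dict(pre_task, attentive=5, irritable=2)
```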
  23. 23. 7-item focused attention subscale (part of the 31-item user engagement scale)
     5-point scale (strongly disagree to strongly agree)
     1. I lost myself in this news tasks experience
     2. I was so involved in my news tasks that I lost track of time
     3. I blocked things out around me when I was completing the news tasks
     4. When I was performing these news tasks, I lost track of the world around me
     5. The time I spent performing these news tasks just slipped away
     6. I was absorbed in my news tasks
     7. During the news tasks experience I let myself go
     (O’Brien & Toms, 2010)
  24. 24. Saliency and positive affect
     • When headlines are visually non-salient
        • users are slow at finding them, report more distraction due to web page features, and show a drop in affect
     • When headlines are visually catchy or salient
        • users find them faster, report that it is easy to focus, and maintain positive affect
     • Saliency is helpful in task performance, focusing/avoiding distraction and in maintaining positive affect
  25. 25. Saliency and focused attention
     • Adapted focused attention subscale from the online shopping domain to entertainment news domain
     • Users reported “easier to focus in the salient condition” BUT no significant improvement in the focused attention subscale or differences in perceived time spent on tasks
     • User interest in web page content is a good predictor of focused attention, which in turn is a good predictor of positive affect
  26. 26. Self-reporting, crowdsourcing, saliency and user engagement
     • Interaction of saliency, focused attention, and affect, together with user interest, is complex.
     • Using crowdsourcing worked!
     • What next?
        • include web page content as a quality of user engagement in focused attention scale
        • more “realistic” user (interactive) reading experience
        • other measurements: mouse-tracking, eye-tracking, facial expression analysis, etc.
     (McCay-Peet, Lalmas & Navalpakkam, 2012)
  27. 27. STUDY II
     • Domain: news and user generated content (comments)
     • Study: interestingness and sentiment
     • Measurement: focused attention, affect and gaze
     + Ioannis Arapakis + Barla Cambazoglu + Mari-Carmen Marcos + Joemon Jose
  28. 28. Gaze and self-reporting
     • News + comments
     • Sentiment, interest
     • 57 users (lab-based)
     • Reading task (114)
     • Questionnaire (qualitative data)
     • Record eye tracking (quantitative data)
     Three metrics: gaze, focused attention and positive affect (Lin et al, 2007)
  29. 29. Interesting content promotes user engagement metrics
     • All three metrics:
        • focused attention, positive affect & gaze
     • What is the right trade-off?
        • news is news :)
     • Can we predict?
        • provider, editor, writer, category, genre, visual aids, …, sentimentality, …
     • Role of user-generated content (comments)
        • As measure of engagement?
        • To promote engagement?
  30. 30. Lots of sentiments but with negative connotations!
     • Positive affect (and interest, enjoyment and wanted to know more) correlates
        • Positively (↑) with sentimentality (lots of emotions)
        • Negatively (↓) with positive polarity (happy news)
     SentiStrength (from -5 to 5 per word)
     • sentimentality: sum of absolute values (amount of sentiments)
     • polarity: sum of values (direction of the sentiments: positive vs negative)
     (Thelwall, Buckley & Paltoglou, 2012)
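The two aggregates defined on this slide can be written down directly. The sketch below takes a list of per-word sentiment scores in the range -5 to 5 and returns sentimentality (sum of absolute values) and polarity (signed sum). The example scores are made up for illustration and were not produced by SentiStrength; the function names are also assumptions.

```python
def sentimentality(word_scores):
    """Amount of sentiment: sum of the absolute per-word scores."""
    return sum(abs(score) for score in word_scores)

def polarity(word_scores):
    """Direction of sentiment: signed sum of the per-word scores."""
    return sum(word_scores)

# Hypothetical per-word scores for one article (range -5..5 per word).
example_scores = [3, -4, 0, 2]
```

A text with strong but mixed emotions scores high on sentimentality while its polarity stays close to zero, which is why the two are reported separately.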
  31. 31. Effect of comments on user engagement
     • 6 rankings of comments:
        • most replied, most popular, newest
        • sentimentality high, sentimentality low
        • polarity plus, polarity minus
     • Longer gaze on
        • newest and most popular for interesting news
        • most replied and high sentimentality for non-interesting news
     • Can we leverage this to prolong user attention?
  32. 32. Gaze, sentimentality, interest
     • Interesting and “attractive” content!
     • Sentiment as a proxy of focused attention, positive affect and gaze?
     • Next
        • Larger-scale study
        • Other domains (beyond news!)
        • Role of social signals (e.g. Facebook, Twitter)
        • Lots more data: mouse tracking, EEG, facial expression
     (Arapakis et al., 2013)
  33. 33. STUDY III
     • Domain: news and social media (Wikipedia)
     • Study: interestingness, aesthetics, task
     • Measurement: focused attention, affect and mouse movement
     + David Warnock
  34. 34. Mouse tracking and self-reporting
     • 324 users from Amazon Mechanical Turk (between-subject design)
     • Two domains (BBC News and Wikipedia)
     • Two tasks (reading and search)
     • “Normal” vs “Ugly” interface
     • Questionnaires (qualitative data)
        • focused attention, positive affect, novelty,
        • interest, usability, aesthetics
        • + demographics, handedness & hardware
     • Mouse tracking (quantitative data)
        • movement speed, movement rate, click rate, pause length, percentage of time still
  35. 35. “Ugly” vs “Normal” Interface (BBC News)
  36. 36. Mouse tracking can tell about
     • Age
     • Hardware
        • Mouse
        • Trackpad
     • Task
        • Searching: There are many different types of phobia. What is Gephyrophobia a fear of?
        • Reading: (Wikipedia) Archimedes, Section 1: Biography
  37. 37. Mouse tracking could not tell much about
     • focused attention and positive affect
     • user interests in the task/topic
     • BUT BUT BUT BUT
     • “ugly” variant did not result in lower aesthetics scores
        • although BBC > Wikipedia
     • BUT – the comments left …
        • Wikipedia: “The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background.”; “The webpage was entirely blue. I dont know if it was supposed to be like that, but it definitely detracted from the browsing experience.”
        • BBC News: “The websites layout and color scheme were a bitch to navigate and read.”; “Comic sans is a horrible font.”
  38. 38. Mouse tracking and user engagement
     • Task and hardware
     • Do we have a Hawthorne Effect???
     • “Usability” vs engagement
     • “Even uglier” interface?
     • Within- vs between-subject design?
     • What next?
        • Sequence of movements
        • Automatic clustering
     (Warnock & Lalmas, 2013)
  39. 39. STUDY IV
     • Domain: news
     • Study: automatic linking
     • Measurement: interestingness
     + Ioannis Arapakis + Hakan Ceylan + Pinar Domnez
  40. 40. Automatic linking & reading experience
     Keeping users reading more articles
  41. 41. LEPA: Linker for Events to Past Articles
     LEPA is a fully automated approach to constructing hyperlinks in news articles using “simple” text processing and understanding techniques.
     • Indexer
        • Processes articles over a time period by extracting features from each article and storing them to facilitate faster retrieval
     • Linker
        • Identifies sentences that contain newsworthy events
        • For each such event it retrieves from the index all the matching articles and links the top-ranked with the event
  42. 42. Three-stage evaluation
     • Pilot study
     • Assessing reading experience
     • Assessing links
  43. 43. Pilot study
     • Professional editors
     • A collection of system-embedded links (164 article-link combinations)
     • 5-point Likert scale: (i) bad, (ii) fair, (iii) good, (iv) excellent, and (v) not judged
     • Rating results:
        • Bad: 35.15%
        • Fair: 33.93%
        • Good: 20%
        • Excellent: 9.09%
        • Not Judged: 1.81%
     • With 63.03% of the links being good:
        • initial evidence that LEPA is not too far from the optimum achieved by human editors
  44. 44. Assessing the links: are they related?
     • 664 participants recruited through Amazon Mechanical Turk; between-group design (two groups)
     • Precision = fraction of links (total=164) that received, in terms of relatedness, a score equal to, or greater than, 3 on a 5-point Likert scale

     | Relatedness | System-embedded: Participant A | System-embedded: Participant B | System-embedded: All | Manually curated: Participant A | Manually curated: Participant B | Manually curated: All |
     |---|---|---|---|---|---|---|
     | Related to the main theme | 49% | 42% | 45% | 54% | 51% | 53% |
     | Related to subtopic | 21% | 24% | 22% | 31% | 34% | 33% |
     | Tangentially related | 13% | 15% | 14% | 9% | 12% | 10% |
     | Unrelated | 15% | 16% | 16% | 5% | 1% | 3% |
     | Other | 2% | 2% | 2% | 1% | 2% | 1% |
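The precision definition on this slide (fraction of links scoring 3 or more on a 5-point relatedness scale) can be sketched as follows. The ratings in the example are invented for illustration, and the function name and `threshold` parameter are assumptions.

```python
def link_precision(relatedness_scores, threshold=3):
    """Fraction of links whose relatedness score is at least `threshold`."""
    if not relatedness_scores:
        raise ValueError("at least one rated link is required")
    related = sum(1 for score in relatedness_scores if score >= threshold)
    return related / len(relatedness_scores)

# Hypothetical 5-point Likert ratings for four links.
example_ratings = [5, 3, 2, 1]
```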
  45. 45. Assessing the Reading Experience
     • 120 participants recruited through Amazon Mechanical Turk; between-groups design (three groups)
     • Editors + two opposite “extremes” of LEPA:
        • High recall: best at embedding newsworthy links & articles that provide interesting insights
        • High precision: best in terms of embedding the right number of links
     • Inductive, thematic coding of open-ended questions
     [Figure labels: good topical coverage, informativeness, broader perspective, interesting insights, good topical coverage, link presentation, content volume, positive news reading experience]
  46. 46. Automatic linking and news reading experience
     • Even under realistic and uncontrolled conditions, the performance of LEPA is comparable to that of editors, and in some cases better
     • High precision vs. high recall
     • High precision threshold leads to a better news reading experience: less is more
     “They were too many, being mostly quite long, in some cases more than half the length of the main article, and sometimes they repeated the same identical information”
  47. 47. STUDY V
     • Domain: social media (Yahoo! Answers and Wikipedia)
     • Study: serendipity
     • Measurement: relevance, unexpectedness, interestingness
     + Ilaria Bordino + Yelena Mejova
  48. 48. Entity-driven Exploratory Search
     • Linguistically Motivated Semantic Aggregation Engines: “transition to a truly semantic aggregation paradigm where machines understand a user’s intent, discover and organize facts, identify opinions, experiences and trends”
     • Entity Search: we build an entity-driven serendipitous search system based on entity networks extracted from Wikipedia and Yahoo! Answers
     • Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results
  49. 49. Yahoo! Answers vs Wikipedia
     • Yahoo! Answers: community-driven question & answer portal
        • 67 336 144 questions & 261 770 047 answers
        • January 1, 2010 – December 31, 2011
        • English-language
        • minimally curated; opinions, gossip, personal info; variety of points of view
     • Wikipedia: community-driven encyclopedia
        • 3 795 865 articles
        • as of end of December 2011
        • English Wikipedia
        • curated; high-quality knowledge; variety of niche topics
  50. 50. Entity & Relationship Extraction
     • entity – any well-defined concept that has a Wikipedia page
     • relationship – a topical relationship/similarity between a pair of entities based on document co-occurrence
        • related to the number of documents in which the two entities occur

     | Dataset | # Nodes | # Edges | Density | # Isolated | Avg Degree | Max Degree | Size of Largest CC |
     |---|---|---|---|---|---|---|---|
     | Yahoo! Answers | 896,799 | 112,595,138 | 0.00028 | 69,856 | 251 | 231,921 | 826,402 (92.15%) |
     | Wikipedia | 1,754,069 | 237,058,218 | 0.00015 | 82,381 | 270 | 346,070 | 1,671,241 (95.28%) |
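The density and average-degree figures on this slide appear consistent with the standard formulas for an undirected simple graph: density = E / (N(N-1)/2) and average degree = 2E / N. The sketch below recomputes both from the node and edge counts given in the table. Treating the entity networks as undirected simple graphs is an assumption inferred from the numbers, and the function names are illustrative.

```python
def graph_density(num_nodes, num_edges):
    """Density of an undirected simple graph: edges over possible node pairs."""
    possible_pairs = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_pairs

def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge adds two endpoints."""
    return 2 * num_edges / num_nodes

# Node and edge counts as reported on the slide.
NETWORKS = {
    "Yahoo! Answers": (896_799, 112_595_138),
    "Wikipedia": (1_754_069, 237_058_218),
}

summary = {
    name: (round(graph_density(n, e), 5), round(average_degree(n, e)))
    for name, (n, e) in NETWORKS.items()
}
```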
  51. 51. [Figure: Wikipedia and Yahoo! Answers]
  52. 52. Retrieval
     • Retrieve entities most related to a query entity using random walk

     | | Wikipedia | Yahoo! Answers | Combined |
     |---|---|---|---|
     | Precision @ 5 | 0.668 | 0.724 | 0.744 |
     | MAP | 0.716 | 0.762 | 0.782 |

     • Queries: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald’s, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth
     • 3 labels per query-result pair; gold standard quality control
     • Example query: Steve Jobs
        • Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
        • Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
     • Annotator agreement (overlap): 0.85
     • Average overlap in top 5 results: <1
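For readers unfamiliar with the two retrieval metrics reported on this slide, here is a minimal sketch of Precision@5 and mean average precision (MAP) over binary relevance labels. It assumes average precision is normalized by the number of relevant results actually retrieved, which is one common variant; the talk does not say which variant was used, and the function names are illustrative.

```python
def precision_at_k(relevance, k=5):
    """Fraction of the top-k results that are relevant (labels are 0 or 1)."""
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """Mean of the precision values at each rank where a relevant result occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(run) for run in runs) / len(runs)

# Hypothetical relevance labels for the ranked results of two queries.
query_a = [1, 0, 1, 0, 0]
query_b = [1, 1, 1, 1, 1]
```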
  53. 53. Serendipity: “making fortunate discoveries by accident”
     • Serendipity = unexpectedness + relevance
     • “Expected” result baselines from web search
     • | relevant & unexpected | / | unexpected |: number of serendipitous results out of all of the unexpected results retrieved
     • | relevant & unexpected | / | retrieved |: serendipitous out of all retrieved

     | Baseline | WP | YA |
     |---|---|---|
     | Top: 5 entities that occur most frequently in top 5 search from Bing and Google | 0.63 (0.58) | 0.69 (0.63) |
     | Top –WP: same as above, but excluding Wikipedia page from results | 0.63 (0.58) | 0.70 (0.64) |
     | Rel: top 5 entities in the related query suggestions provided by Bing and Google | 0.64 (0.61) | 0.70 (0.65) |
     | Rel + Top: union of Top and Rel | 0.61 (0.54) | 0.68 (0.57) |
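The two serendipity ratios defined on this slide can be sketched with plain sets. A result counts as unexpected when it does not appear in the “expected” baseline, and as serendipitous when it is both unexpected and relevant. The entity names in the example are placeholders, and the function and parameter names are assumptions.

```python
def serendipity_scores(retrieved, relevant, expected):
    """Return (serendipitous / unexpected, serendipitous / retrieved)."""
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    over_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    over_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return over_unexpected, over_retrieved

# Placeholder entities: "a" is in the web-search baseline, "a", "b", "c" are relevant.
example = serendipity_scores(
    retrieved=["a", "b", "c", "d", "e"],
    relevant={"a", "b", "c"},
    expected={"a"},
)
```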
  54. 54. Interestingness ≠ Relevance
     Interesting > Relevant; Relevant > Interesting
     • Oil Spill → Penguins in Sweaters (WP)
     • Robert Pattinson → Water for Elephants (WP)
     • Lady Gaga → Britney Spears (WP)
     • Egypt → Cairo Conference (WP)
     • Netflix → Blu-ray Disc (YA)
     • Egypt → Ptolemaic Kingdom (WP & YA)
     (Bordino, Mejova & Lalmas, 2013)
  55. 55. Assessing “interestingness”
     Similarity (Kendall’s tau-b) between result sets and reference ranking

     | Question | Data | tau-b |
     |---|---|---|
     | Which result is more relevant to the query? | WP | 0.162 |
     | | YA | 0.336 |
     | If someone is interested in the query, would they also be interested in the result? | WP | 0.162 |
     | | YA | 0.312 |
     | Even if you are not interested in the query, is the result interesting to you personally? | WP | 0.139 |
     | | YA | 0.324 |
     | Would you learn anything new about the query from the results? | WP | 0.167 |
     | | YA | 0.307 |

     Following (Arguello et al, 2011):
     1. Labelers provide pairwise comparisons between results
     2. Combine into a reference ranking
     3. Compare result ranking to optimal ranking using Kendall’s tau
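Kendall’s tau-b compares two rankings by counting concordant and discordant pairs while correcting for ties. The pure-Python sketch below is a straightforward quadratic-time version meant only to illustrate the statistic; it is not the code used in the study, and the function name is an assumption.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences of ranks or scores."""
    if len(x) != len(y):
        raise ValueError("both sequences must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: excluded from both factors
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denominator if denominator else 0.0
```

Identical rankings give 1.0, fully reversed rankings give -1.0, and values near 0 (such as the 0.14 to 0.34 range in the table) indicate weak agreement with the reference ranking.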
  56. 56. Serendipity in multimedia search?
     Multimedia search activities often driven by entertainment needs, not by information needs (Slaney, 2011)
  57. 57. What are the questions to ask?
     • No one measurement is perfect or complete.
     • All studies (process or product) have different constraints.
     • Need to ensure methods are applied consistently with attention to reliability: what is a good signal?
     • More emphasis should be placed on using mixed methods to improve the validity of the measures.
     • Be careful of the WEIRD syndrome (Western, Educated, Industrialized, Rich, and Democratic)
  58. 58. Acknowledgements
     • Collaborators: Ioannis Arapakis, Ilaria Bordino, Barla Cambazoglu, Hakan Ceylan, Pinar Domnez, Lori McCay-Peet, Yelena Mejova, Vidhya Navalpakkam, David Warnock, and others at Yahoo! Labs.
     • This talk uses some material from a tutorial “Measuring User Engagement” given at WWW 2013, Rio de Janeiro (with Heather O’Brien and Elad Yom-Tov)
     Blog: