
Setting Up a Living Lab for Information Access Research



Living Labs are real-life test and experimentation environments in which users and producers are brought together to develop, validate, and test novel techniques and approaches. Primarily set up to observe users interacting with technology in a real-life context, they are a promising methodology for pushing forward user-centric research. In this talk, I make a case for the introduction of a living lab to promote research on interactive information retrieval. Moreover, I introduce CLEF NEWSREEL, a living lab in which participants are invited to evaluate news recommendation techniques in real-time by providing news recommendations to actual users who visit commercial news portals to satisfy their information needs.

Presented on June 6, 2014 at the Living Labs Challenge workshop in Amsterdam, The Netherlands.

Published in: Data & Analytics


  1. Setting Up a Living Lab for Information Access Research. Frank Hopfgartner, DAI-Labor, Technische Universität Berlin
  2. Why am I here? Because I co-organise CLEF NEWSREEL. In CLEF NEWSREEL, participants can develop news recommendation algorithms and have them tested by millions of users over a period of a few months in a living lab.
  3. Overview. Part 1 (Academic Overview): Living Labs (Introduction); Living Labs for IR Research. Part 2 (Hands-on Experience): CLEF NEWSREEL
  4. So what are living labs? Real-life test and experimentation environments that fill the pre-commercial gap between fundamental research and innovation. They rely on feedback from real users to develop convincing demonstrators that showcase the potential of an idea or a product.
  5. Example: Efficiency House Plus [BMVBS, 2011]
     § National research initiative on energy efficiency in the housing and traffic domains
     § Efficiency House Plus is a small power plant that can export energy surpluses into the local power grid
     § Equipped with 1000 data sources such as movement sensors, weather data, etc.
     Source: Werner Sobek
  6. What can be studied? Efficiency House Plus with electro mobility, Berlin; research initiative of the BMVBS.
     Data Analysis (1000 data points): § 205 smart meters § 39 heat pumps § 74 illumination sensors § 38 photovoltaic sensors § …
     Innovation:
     § Detection of resident presence in the home environment: energy consumption is an indicator for presence, but some devices continually consume energy
     § Recognition of resident activities: draw conclusions about user activity based on the usage of home appliances
     § Recommendation of optimized heating schedules: gradually learn characteristic behavior to create personalized schedules for heating control
  7. Questions to be addressed: How will people really use the technology? Who is interested in my product? What is the willingness to pay? Is there a need for my product? What parameters do I need?
  8. Overview. Part 1 (Academic Overview): Living Labs (Introduction); Living Labs for IR Research. Part 2 (Hands-on Experience): CLEF NEWSREEL
  9. Do we need Living Labs for IR Research?
  10. Why? Let's have a look at the history of IR evaluation: Cranfield (1962-1966), Medlars (1966-1967), SMART (1961-1995), TREC (1992-today), NTCIR / CLEF (1999/2000-today)
  11. Cranfield Evaluation Paradigm: develop system/algorithm -> prepare appropriate dataset -> perform user study -> measure performance
     § Use a standard test collection (e.g., from TREC) with documents, relevance assessments, and search tasks, or create your own (domain-specific) test collection
     § Ask users to perform search tasks in a controlled environment; simulated work task situation
     § Standard IR evaluation metrics; qualitative methods
     § Baseline vs. fancy improvement that will change the world
  12. Laboratory Setting: "Find as many documents as possible for a given search task." "Act naturally while I watch everything you are doing." "I tell you what is relevant!" NOT SUITABLE FOR RESEARCH ON USER-CENTRED IR
  13. Evaluation of User-Centred IR (Personalised Search)
     Context: § Country § Social connection § Locality § Personal history § Mobile search
     Evaluation issues: § Observer-expectancy effect § Atypical search task § Missing context/background § Missing incentive to satisfy one's own information need
  14. An alternative setting: "Use our system to find the information you are looking for." "Use the system whenever you want, for whatever reason." "You decide what you consider to be relevant." How to evaluate?
  15. User Simulation [ECIR'08, ACM TOIS 2011]: allows fine tuning (White et al., 2005), but does not replace a user study.
  16. A/B testing: evaluate, then submit to SIGIR.
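The "evaluate" step of an A/B test boils down to comparing how two system variants perform on the same metric across the user population. As a hedged illustration (this is not from the talk), the sketch below compares two variants' click-through rates with a two-proportion z-test; the function name and all numbers are made up.

```python
from math import sqrt

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z-statistic for H0: variants A and B have the same click-through rate."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error
    return (p_a - p_b) / se

# Illustrative numbers: variant A gets 300 clicks on 10,000 impressions,
# variant B gets 370 clicks on 10,000 impressions.
z = two_proportion_z(300, 10_000, 370, 10_000)
print(round(z, 2))  # -2.75; |z| > 1.96 means significant at the 5% level
```

The point of "many users" on the later slides is exactly this: with small samples the standard error dominates and no difference between variants can be established.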
  17. OK, cool. Go for it! Sure… But who has the users?
  18. These guys have…
  19. OK, then let's pay for the users…
  20. Crowdsourcing works.
     Evaluation campaigns: § ImageCLEF (Nowak and Rüger, 2010) § INEX (Kazai et al., 2011) § TREC Blog (McCreadie et al., 2011) § MediaEval (Loni et al., 2013)
     Micro-tasks: § Data annotation § Document annotation § Document categorisation § Iterative system evaluation
     Activating the crowd: § Users may have an interest in annotating items that they know well § Users may be attracted by incentives to annotate items
  21. But… Personalised search needs users who follow their own information needs. Users need to be driven by their own intrinsic motivation. EXTRINSIC motivation comes from the outside; INTRINSIC motivation exists within the individual.
  22. Therefore… "A living laboratory on the Web that brings researchers and searchers together is needed to facilitate ISSS (Information-Seeking Support System) evaluation." (Kelly et al., 2009)
  23. Living Labs for IR evaluation: local domain search, product search, NEWSREEL.
     § Real users interacting with a system, following their own information need
     § Realistic setting where users are not restricted by closed laboratory conditions
     § Ideally: many users, to perform A/B testing
     Source (guinea pig): http://living-labs.net/wp-content/uploads/2014/05/livinglab.logo_.textunder.square200.png
  24. Challenges
     Privacy and security: § Hosting data on a secure server § Gaining subjects' trust § Coping with the need for privacy § Alternatives when individuals will not share their data
     Legal and ethical issues: § User consent § Ethics approval § Trust between parties § Copyright issues § Commercial sensitivity of data
     Practical challenges: § Forming living labs for IR partners within the research community § Obtaining commercial partners § Defining tasks and scenarios for evaluation purposes
     Technical challenges: § Designing and implementing a living labs architecture § Cost of implementation § Maintenance and adoption § Managing the living labs infrastructure
     Source: http://living-labs.net/call-for-papers/
  25. Overview. Part 1 (Academic Overview): Living Labs (Introduction); Living Labs for IR Research. Part 2 (Hands-on Experience): CLEF NEWSREEL
  26. Again… In CLEF NEWSREEL, participants can develop news recommendation algorithms and have them tested by millions of users over a period of a few months in a living lab.
  27. What are recommender systems? Recommender systems help users to find items that they were not searching for.
  28. Items?
  29. Example: News Articles
     § First living lab for the evaluation of news recommendation algorithms in real-time
     § Organised as the plista Contest, as a challenge at ACM RecSys'13, and as a campaign-style evaluation lab at CLEF'14
     Source (image): T. Brodt of
  30. Organisation (CLEF NEWSREEL)
     § plista: leading provider of a recommendation and advertisement network in Central Europe; thousands of content providers rely on plista to generate recommendations for their customers (i.e., web users)
     § Application-oriented research on smart information systems
     § Steering committee of experts from the fields of IR and RecSys
     § Central Innovation Programme SME
  31. CLEF NEWSREEL Tasks (started in November 2013)
     § Task 1, Offline Evaluation: given a dataset, predict news articles a user will click on
     § Task 2, Online Evaluation: recommend articles in real-time over several months
     @clefnewsreel, http://www.clef-
  32. Task 1: Offline Evaluation. Predict interactions based on an OFFLINE dataset.
     Dataset: § Traffic and content updates of 9 German-language news content provider websites § Traffic: reading articles, clicking on recommendations § Updates: adding and updating news articles § Recorded in June 2013 § 65 GB, 84 million records § [Kille et al., 2013]
     Evaluation: § Dataset split into different time segments § Participants have to predict the interactions of these segments § Quality measured by the ratio of successful predictions to the total number of predictions
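The Task 1 quality measure (successful predictions divided by total predictions) can be sketched as follows. This is a minimal illustration with made-up data, not the lab's actual scoring code.

```python
def prediction_quality(predicted, observed):
    """Ratio of successful predictions to the total number of predictions."""
    if not predicted:
        return 0.0
    hits = sum(1 for interaction in predicted if interaction in observed)
    return hits / len(predicted)

# Illustrative example: a team predicts four (user, article) interactions
# for a held-out time segment; three of them actually occurred.
observed = {("u1", "a9"), ("u2", "a3"), ("u3", "a7")}
predicted = [("u1", "a9"), ("u2", "a3"), ("u3", "a7"), ("u4", "a1")]
print(prediction_quality(predicted, observed))  # 0.75
```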
  33. Task 2: Online Evaluation. Recommend news articles in REAL-TIME.
     Living lab: § Provide recommendations for visitors of the news portals of plista's customers § Ten portals (local news, sports, business, technology) § Communication via the Open Recommender Platform (ORP) § Provide recommendations within <100 ms (a VM is provided if necessary)
     Evaluation: § Three pre-defined evaluation periods: 5-23 February 2014, 1-14 April 2014, 5-19 May 2014 § Evaluation criteria: number of clicks, number of requests, click-through rate
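Of the three Task 2 criteria, the click-through rate ties the other two together: CTR = clicks / requests. A minimal sketch, with illustrative numbers:

```python
def click_through_rate(clicks, requests):
    """Click-through rate over an evaluation period; 0.0 if nothing was served."""
    return clicks / requests if requests else 0.0

# Illustrative numbers: 120 clicks on 8,000 recommendation requests.
print(click_through_rate(120, 8000))  # 0.015
```

CTR normalises for traffic volume, which matters here because teams may serve different numbers of requests over the evaluation periods.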
  34. Living Lab Scenario: millions of visitors -> publishers (Publisher A … Publisher n) -> plista -> ORP -> teams (Researcher 1 … Researcher n)
  35. Open Recommender Platform (more about it later)
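The slides do not show the ORP message format, so the following is only a guess at the shape of a participating recommender: it consumes streamed item updates and answers recommendation requests from an in-memory pool, the kind of simple strategy the <100 ms response budget encourages. All class, method, and field names here are assumptions for illustration, not the real ORP API.

```python
class RecencyRecommender:
    """Hypothetical sketch of a team's service behind ORP: rank by recency."""

    def __init__(self):
        self._clock = 0   # logical timestamp, incremented per item update
        self._pool = {}   # publisher_id -> {article_id: last_update}

    def on_item_update(self, publisher_id, article_id):
        """Record an article creation/update streamed by the platform."""
        self._clock += 1
        self._pool.setdefault(publisher_id, {})[article_id] = self._clock

    def on_recommendation_request(self, publisher_id, current_article, limit=6):
        """Answer a request with the most recently updated articles,
        excluding the article the visitor is currently reading."""
        candidates = self._pool.get(publisher_id, {})
        ranked = sorted(candidates, key=candidates.get, reverse=True)
        return [a for a in ranked if a != current_article][:limit]

rec = RecencyRecommender()
for article in ("a1", "a2", "a3"):
    rec.on_item_update("portal-1", article)
print(rec.on_recommendation_request("portal-1", current_article="a3", limit=2))
# ['a2', 'a1']
```

Keeping everything in memory like this is one way to stay under a tight latency budget; anything that touches disk or a remote database per request would need caching in front of it.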
  36. Number of clicks
  37. Number of requests
  38. Click-Through Rate
  39. Challenges
     Privacy and security: § Hosting data on a secure server § Gaining subjects' trust § Coping with the need for privacy § Alternatives when individuals will not share their data
     Legal and ethical issues: § User consent § Ethics approval § Trust between parties § Copyright issues § Commercial sensitivity of data
     Practical challenges: § Forming living labs for partners within the research community § Obtaining commercial partners § Defining tasks and scenarios for evaluation purposes
     Technical challenges: § Designing and implementing a living labs architecture § Cost of implementation § Maintenance and adoption § Managing the living labs infrastructure
     Source: http://living-labs.net/call-for-papers/
  40. Privacy and security
     Challenges: § Hosting data on a secure server § Gaining subjects' trust § Coping with the need for privacy § Alternatives when individuals will not share their data
     In NEWSREEL: § No search queries are provided § The data stream is pseudonymized, i.e., users cannot be identified based on their IP or search queries
  41. Legal and ethical issues
     Challenges: § User consent § Ethics approval § Trust between parties § Copyright issues § Commercial sensitivity of data
     In NEWSREEL: § Researchers do not interact with users § The business relation is between plista and their customers § Participants have to agree to terms before participating
  42. Technical challenges
     Challenges: § Designing and implementing a living labs architecture § Cost of implementation § Maintenance and adoption § Managing the living labs infrastructure
     In NEWSREEL: § The infrastructure was developed in the context of the research project EPEN § Constantly monitor the system
  43. Practical challenges
     Challenges: § Forming living labs for partners within the research community § Obtaining commercial partners § Defining tasks and scenarios for evaluation purposes
     Lessons: § Always keep in contact with your participants § Advertise § Make sure no one can cheat! § It's a win-win-win-win situation (-> Torben)
  44. Acknowledgements
     Co-organisers: § Andreas Lommatzsch § Benjamin Kille § Till Plumbaum § Torben Brodt § Tobias Heintz
     Steering Committee: § Pablo Castells § Paolo Cremonesi § Hideo Joho § Udo Kruschwitz § Joemon M. Jose § Mounia Lalmas § Martha Larson § Jimmy Lin § Vivien Petras § Domonkos Tikk
  45. Thank you
     Frank Hopfgartner, PhD, @OkapiBM25
     Director of the Competence Center Information Retrieval and Machine Learning
     DAI-Labor (Distributed Artificial Intelligence Laboratory), Technische Universität Berlin, Fakultät IV - Elektrotechnik & Informatik
     Ernst-Reuter-Platz 7, 10587 Berlin, Germany
     Phone: +49 (0) 30 / 314 - 74 202, Fax: +49 (0) 30 / 314 - 74 003
     frank.hopfgartner@tu-berlin.de, www.dai-labor.de