Using games to improve computer vision solutions
Transcript

  • 1. Using games to improve computer vision solutions. Dr. Oge Marques, May 21, 2013
  • 2. Take-home message: Solutions to many problems in computer vision can be improved using human computation, particularly through properly designed games (with a purpose). Oge Marques
  • 3. Related work • Luis von Ahn (http://gwap.com)
  • 4. Related work • ESP: labeling images with a computer game (von Ahn & Dabbish)
  • 5. Related work • Peekaboom: a game for locating objects in images (von Ahn, Liu & Blum)
  • 6. Our work • Ask'nSeek (with Vincent Charvillat and Axel Carlier, ECCV 2012): a two-player guessing game that helps solve the problems of object detection and labeling and semantic scene segmentation.
  • 7. Ask'nSeek • Contributions: It solves two computer vision problems, object detection and labeling, in a single game. It learns spatial relationships within the image from game logs.
  • 8. Ask'nSeek • The basics: Ask'nSeek is a two-player, web-based game that can be played in a contemporary browser without any need for plug-ins. One player, the master, hides a rectangular region somewhere within a randomly chosen image. The second player (the seeker) tries to guess the location of the hidden region through a series of successive guesses, expressed by clicking at some point in the image.
  • 9. Ask'nSeek • Master and seeker
  • 10. Ask'nSeek • Master and seeker
  • 11. Ask'nSeek - terminology • According to the classification in [von Ahn, CACM 2008], Ask'nSeek is an inversion-problem game, because "given an input, Player 1 (in our case, called master) produces an output, and Player 2 (the seeker) guesses the input." More specifically, the input in question is the location of the hidden region within an image, and the outputs produced by Player 1 are what we call indications.
  • 12. Ask'nSeek - terminology • Initial setup: Two players are randomly chosen by the game itself. • Rules: The master produces an input (by hiding a rectangular region within an image). Based on this input, the master produces outputs (spatial clues, i.e., indications) that are sent to the seeker. The outputs from the master should help the seeker produce the original input, i.e., locate the hidden box. • Winning condition: The seeker produces the input that was originally produced by the master, i.e., guesses the correct location by clicking on any pixel within the hidden bounding box.
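The rules above amount to a simple feedback loop. A minimal sketch of how one indication could be computed from a seeker's click (illustrative only, not the authors' implementation; the coordinate convention and the indication vocabulary are assumptions):

```python
# Sketch of one Ask'nSeek round step (an illustration, not the authors' code):
# classify a seeker's click against the master's hidden rectangle and return
# the indication to send back.

def indication(click, box):
    """Return 'on' if the click falls inside the hidden box; otherwise a
    directional clue saying where the box lies relative to the click
    (image coordinates: x grows rightward, y grows downward)."""
    x, y = click
    x1, y1, x2, y2 = box  # (left, top, right, bottom)
    if x1 <= x <= x2 and y1 <= y <= y2:
        return "on"       # winning condition: the seeker found the box
    if x < x1:
        return "right"    # the box is to the right of the click
    if x > x2:
        return "left"
    # x is horizontally aligned with the box; give a vertical clue
    # (a real master could combine horizontal and vertical clues).
    return "below" if y < y1 else "above"

box = (100, 100, 150, 140)            # hypothetical hidden region
print(indication((50, 120), box))     # box lies to the right of this click
print(indication((120, 120), box))    # winning click
```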
  • 13. Ask'nSeek • Game logs: The game logs contain labels (often in multiple languages) as well as 'on', 'partially on' and 'left-right-above-below' relations. Examples of labels include foreground objects (e.g., dog, bus) as well as other semantically meaningful regions within the image (e.g., sky, road).
  • 14. Ask'nSeek • Game logs: Very few (~7) games per image are needed! [Figure: analysis of simulation logs with different numbers of simulated games. Game logs using the spatial relations "above", "on the left of", and "on the right of" define a bounding box, respecting all the constraints from the logs, for the region in which the object ('dog' in the example) must reside.]
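The bounding-box construction described in the figure can be read as an intersection of half-plane constraints. A short sketch of that reading (coordinates and clue values are made up for illustration):

```python
# Sketch (an interpretation of the described approach, not the authors' code):
# intersect the half-plane constraints implied by directional indications to
# bound the region where a labeled object can reside.

def bound_region(width, height, clues):
    """clues: list of (x, y, relation), where relation says where the object
    lies relative to the click: 'left', 'right', 'above', or 'below'.
    Returns (x_min, y_min, x_max, y_max) of the surviving region."""
    x_min, y_min, x_max, y_max = 0, 0, width, height
    for x, y, rel in clues:
        if rel == "right":        # object is to the right of the click
            x_min = max(x_min, x)
        elif rel == "left":
            x_max = min(x_max, x)
        elif rel == "below":      # object is below the click (larger y)
            y_min = max(y_min, y)
        elif rel == "above":
            y_max = min(y_max, y)
    return x_min, y_min, x_max, y_max

# Four hypothetical clues shrink a 400x300 image to a small box.
clues = [(40, 0, "right"), (300, 0, "left"), (0, 60, "below"), (0, 220, "above")]
print(bound_region(400, 300, clues))  # (40, 60, 300, 220)
```

Each additional game log can only tighten this box, which is consistent with the slide's point that a handful of games per image suffices.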
  • 15. Ask'nSeek "under the hood" • Super-pixel segmentation using SLIC
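SLIC is available in common libraries. A tiny sketch using scikit-image's `slic` on a synthetic image (the parameter values here are illustrative, not those used in the paper):

```python
# Minimal SLIC super-pixel example using scikit-image (illustrative
# parameters; a synthetic image stands in for the game's photographs).
import numpy as np
from skimage.segmentation import slic

# Synthetic 100x100 RGB image: a bright square on a dark background.
img = np.zeros((100, 100, 3), dtype=float)
img[30:70, 30:70] = 1.0

# Partition into roughly 50 super-pixels; `labels` is a 2-D map of segment ids.
labels = slic(img, n_segments=50, compactness=10, start_label=0)
print(labels.shape)      # (100, 100)
print(labels.max() + 1)  # number of super-pixels actually produced
```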
  • 16. Ask'nSeek "under the hood" • Learning strategy: GMM + EM • Crosses: candidate super-pixels • Red: ON points • Blue: PARTIALLY ON points • Green: LEFT-RIGHT-ABOVE-BELOW bounding box
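The GMM + EM step can be imitated with scikit-learn's `GaussianMixture`, which is fitted by EM. The sketch below uses synthetic 'on' clicks only; the paper's actual semi-supervised model also exploits 'partially on' points and the relation constraints:

```python
# Simplified stand-in for the GMM + EM learning step: fit a two-component
# Gaussian mixture to simulated 'on' clicks so each component's mean
# suggests an object location. Data and component count are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated 'on' clicks clustered around two objects at (30, 30) and (80, 70).
clicks = np.vstack([
    rng.normal((30, 30), 3, size=(40, 2)),
    rng.normal((80, 70), 3, size=(40, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(clicks)
centers = sorted(map(tuple, gmm.means_.round().astype(int)))
print(centers)  # component means land near (30, 30) and (80, 70)
```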
  • 17. Ask'nSeek • Examples of results
  • 18. Ask'nSeek • Examples of results
  • 19. Ask'nSeek • Examples of results
  • 20. Ask'nSeek • Comparison against a baseline detector [Figure: representative results: (a) baseline dog detector (Felzenszwalb et al.'s discriminatively trained deformable part models, release 4); (b) result from our approach for label 'dog'; (c) baseline cat detector from the same work; (d) result from our approach for label 'cat'.]
  • 21. Ask'nSeek • Comparison against saliency map results • (e): Harel's implementation of Itti et al. • (f): results from our game
  • 22. Ask'nSeek • Final remarks: It does in one game what ESP and Peekaboom do in two games (namely, collecting labels and locating the objects associated with those labels). It avoids explicitly asking the user to map labels and regions, thanks to our novel semi-supervised learning algorithm. It requires very few game logs per image to feed a machine learning algorithm, which in turn produces the outline of the most relevant regions within the image and their labels.
  • 23. Ask'nSeek • Ongoing work: Different segmentation algorithms and figures of merit • Joint work with UPC (Barcelona) • Extensive evaluation against the PASCAL VOC dataset
  • 24. Ask'nSeek • Many possibilities: Different segmentation algorithms • (Multi-language) text-based processing • "Objects in context" • Biggest challenge: Making the game fun! Increasing the player base
  • 25. Ask'nSeek • Let's play! • http://tinyurl.com/asknseek
  • 26. Our work • Guess That Face (with Mathias Lux and Justyn Snyder, CHI 2013): a face recognition game that reverse engineers the human biological threshold for accurately recognizing blurred faces of celebrities under time-varying conditions.
  • 27. • Motivation • Human vision: we are remarkably good at recognizing (severely blurred) famous faces [Figure from Sinha et al., "Face Recognition by Humans: Nineteen Results Researchers Should Know About": unlike current machine-based systems, human observers are able to handle significant degradations in face images. For instance, subjects are able to recognize more than half of all familiar faces shown to them at the resolution depicted. Individuals shown in order: Michael Jordan, Woody Allen, Goldie Hawn, Bill Clinton, Tom Hanks, Saddam Hussein, Elvis Presley, Jay Leno, ...]
  • 28. • Motivation • Computer vision: face recognition is still not mature • Machine learning: what if we train algorithms with blurry faces instead? • Game play: SongPop and Facebook
  • 29. • The game: The player must analyze a series of randomly generated images of celebrities while the images transition from an initially severely blurred state to their original state over a constant interval of time. While the image is being progressively de-blurred on the screen, players are prompted to select the name of the celebrity who they believe is correct, which they do once they have confidence in their answer.
  • 30. • Dataset (original): 48 popular celebrities + 2 politicians • Each image in the dataset also has two variations: de-saturated and horizontally flipped
  • 31. • Game logs: the image identifier in the dataset that contained the correct answer; the variation of the image (normal, de-saturated, or horizontally flipped); the degree of blurring (ranging from 0 to 65); the answer the user selected (or the time-out); the orientation of both the correct answer and the selected answer; the time stamp; the round number.
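One log entry per round could be modeled as a small record. A sketch mirroring the fields listed above (the field names and types are assumptions, not the authors' schema, and the orientation fields are condensed into one for brevity):

```python
# Hypothetical data structure for one Guess That Face round log entry,
# mirroring the fields listed on the slide (names/types are assumptions).
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoundLog:
    image_id: str             # identifier of the image with the correct answer
    variation: str            # 'normal', 'desaturated', or 'flipped'
    blur_radius: int          # degree of blurring, 0..65, when answered
    selected: Optional[str]   # the chosen name; None encodes a time-out
    correct: str              # the correct celebrity name
    timestamp: float          # time stamp of the answer
    round_number: int

    @property
    def is_correct(self) -> bool:
        return self.selected == self.correct

log = RoundLog("celeb_017", "flipped", 42, "Tom Hanks", "Tom Hanks", 1369000000.0, 3)
print(log.is_correct)  # True
```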
  • 32. • Terminology: Guess That Face is an inversion-problem game, where one player provides output while the second player guesses the input provided by the first player. Specifically, for Guess That Face, the "player" who generates the random output is a computer, while the player who guesses the correct input is a human. The output in question is both the image of the celebrity and the set of options, and the input is the correct answer selected by the player.
  • 33. • User studies: 28 undergraduate students at FAU participated. The mean age is 21. 82% are male and 18% are female. Many strongly agreed with "I found the game to be very easy to understand," with a mean of 4.93 on a 5-point Likert scale [1..5].
  • 34. • User studies: 33.3% of the students confessed to random guessing. Still, 80% of the students played at least one game without using an exploit or randomly guessing. 80% of the participants saw potential for broader deployment, e.g., as a social media application. 80% of the players would recommend the game to their friends.
  • 35. • De-blurring method: StackBlur Gaussian blurring algorithm • Blurring radius ranging from 0 to 65 • Total de-blurring time = 8.125 seconds
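The numbers above imply a convenient schedule: if the radius steps down linearly from 65 to 0 over 8.125 s, each unit of radius lasts exactly 0.125 s. A small arithmetic sketch (the linear step-down is an assumption; the slide only gives the radius range and total time):

```python
# De-blur schedule implied by the slide's figures, assuming a linear
# step-down of the blur radius over the round (an assumption).
TOTAL_TIME = 8.125   # seconds of total de-blurring
MAX_RADIUS = 65      # StackBlur radius at the start of a round

step = TOTAL_TIME / MAX_RADIUS
print(step)  # 0.125 s per radius unit

def radius_at(t):
    """Blur radius remaining t seconds into a round (linear schedule)."""
    return max(0, MAX_RADIUS - int(t / step))

print(radius_at(0.0))    # 65
print(radius_at(4.0))    # 33
print(radius_at(8.125))  # 0
```

A log's recorded blur radius (slide 31) thus doubles as a proxy for how early in the round the player answered.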
  • 36. • Results
  • 37. • Conclusions: Promising preliminary results • Work in progress • Potential to extend to other areas and age groups
  • 38. Guess That Face • Let's play! • http://tinyurl.com/guessthatface
  • 39. Ongoing and future work • Build a framework for the development of games for vision problems • Extend the work on Ask'nSeek and GTF to other vision problems, e.g., scene classification and object recognition • Develop mobile games for children to help human vision scientists study developmental aspects of vision
  • 40. Concluding remarks • "Recipe for success": Games that are well designed (look like games, have a certain addictive component, make you want to share them with friends, have a long shelf life) • Games that are fun to play • Game logs that convey useful information that would not be easily obtained otherwise • Meaningful (open) computer vision problems • Sound machine learning strategies to leverage the knowledge acquired through game logs
  • 41. Discussion • Which (types of) games? • Which (groups of) vision problems? • How to avoid the trap of "games that aren't"? • How to make a game "go viral"? • What else can go wrong?
  • 42. Let's get to work! • Which computer vision problem would you like to solve using games? • Contact me with ideas: omarques@fau.edu
