Oge Marques (FAU) - invited talk at WISMA 2010 (Barcelona, May 2010)




  1. 1. Oge Marques Florida Atlantic University Boca Raton, FL - USA
  2. 2.   “Image search and retrieval” is not a problem, but rather a collection of related problems that look like one.   10 years after “the end of the early years”, research in image search and retrieval still has many open problems, challenges, and opportunities.
  3. 3.   This is a highly interdisciplinary field, but … [Diagram: Visual Information Retrieval at the center, surrounded by Image and Video Processing, (Multimedia) Database Systems, Information Retrieval, Machine Learning, Computer Vision, Visual Data Modeling and Representation, Human Visual Perception, and Data Mining]
  4. 4.   There are many things that I believe…   … but cannot prove
  5. 5. The “big mismatch”
  6. 6.   It’s been 10 years since the “end of the early years” [Smeulders et al., 2000] ◦  Are the challenges from 2000 still relevant? ◦  Are the directions and guidelines from 2000 still appropriate?
  7. 7.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  Driving forces   “[…] content-based image retrieval (CBIR) will continue to grow in every direction: new audiences, new purposes, new styles of use, new modes of interaction, larger data sets, and new methods to solve the problems.”
  8. 8.   Yes, we have seen many new audiences, new purposes, new styles of use, and new modes of interaction emerge.   Each of these usually requires new methods to solve the problems that they bring.   However, not too many researchers see them as a driving force (as they should).
  9. 9.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  Heritage of computer vision   “An important obstacle to overcome […] is to realize that image retrieval does not entail solving the general image understanding problem.”
  10. 10.   I’m afraid I have bad news… ◦  Computer vision hasn’t made so much progress during the past 10 years. ◦  Some classical problems (including image […]) remain unresolved. ◦  Similarly, CBIR from a pure computer vision perspective didn’t work too well either.
  11. 11.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  Influence on computer vision   “[…] CBIR offers a different look at traditional computer vision problems: large data sets, no reliance on strong segmentation, and revitalized interest in color image processing and invariance.”
  12. 12.   The adoption of large data sets became standard practice in computer vision (see Torralba’s work).   No reliance on strong segmentation (still unresolved) → new areas of research, e.g., automatic ROI extraction and RBIR.   Color image processing and color descriptors became incredibly popular, useful, and (to some degree) effective.   Invariance still a huge problem ◦  But it’s cheaper than ever to have multiple views.
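To make the "color descriptors" point concrete, here is a minimal sketch of one classical descriptor: a quantized RGB histogram compared with histogram intersection. It is an illustration only, not something shown in the talk; the function names, the bin count, and the toy pixel lists are all assumptions made for this example.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize 8-bit RGB pixels into a normalized joint color histogram."""
    if not pixels:
        raise ValueError("need at least one pixel")
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


# Toy "images" as flat lists of (R, G, B) tuples (hypothetical data).
mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
all_red = [(240, 20, 20)] * 4
all_blue = [(5, 5, 200)] * 4

query = color_histogram(mostly_red)
sim_red = histogram_intersection(query, color_histogram(all_red))
sim_blue = histogram_intersection(query, color_histogram(all_blue))
print(sim_red > sim_blue)  # the mostly-red query is closer to the all-red image
```

A global histogram like this ignores where colors appear in the image, which is one reason such descriptors are only effective "to some degree".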
  13. 13.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  Similarity and learning   “We make a pledge for the importance of human-based similarity rather than general similarity. Also, the connection between image semantics, image data, and query context will have to be made clearer in the future.”   “[…] in order to bring semantics to the user, learning is inevitable.”
  14. 14.   Similarity is a tough problem to crack and model.   See it for yourself…
  15. 15.   Are these two images similar?
  16. 16.   Are these two images similar?
  17. 17.   Is the second or the third image more similar to the first?
  18. 18.   Which image fits better to the first two: the third or the fourth?
  19. 19.   Is learning really inevitable?   Maybe, maybe not, but it sure comes in handy in some specific cases… ◦  SVM anyone?
  20. 20.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  Interaction   Better visualization options, more control to the user, ability to provide feedback […]
  21. 21.   Significant progress on visualization interfaces and devices.   Relevance Feedback: still a very tricky tradeoff (effort vs. perceived benefit), but more popular than ever (rating, thumbs up/down, etc.)
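One common way to turn thumbs up/down feedback into a better query is a Rocchio-style update, a classical technique from text retrieval that is used here purely as an illustration (the talk does not name a specific method). The sketch below assumes images are already represented as numeric feature vectors; the weights `alpha`, `beta`, and `gamma` and the toy vectors are hypothetical.

```python
def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward liked images and away from disliked ones."""

    def centroid(vectors):
        # An empty feedback set contributes nothing to the update.
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]


# Toy 2-D feature vectors (hypothetical data).
query = [0.5, 0.5]
thumbs_up = [[1.0, 0.0], [0.8, 0.2]]
thumbs_down = [[0.0, 1.0]]

new_query = rocchio_update(query, thumbs_up, thumbs_down)
print(new_query)
```

The effort vs. benefit tradeoff shows up directly here: every extra rating refines the centroids a little, but the user has to supply it.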
  22. 22.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  Need for databases   “The connection between CBIR and database research is likely to increase in the future. […] problems like the definition of suitable query languages, efficient search in high dimensional feature space, search in the presence of changing similarity measures are largely unsolved […]”
  23. 23.   Very little progress ◦  Image search and retrieval has benefited much more from document information retrieval than from database research.
  24. 24.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  The problem of evaluation   CBIR could use a reference standard against which new algorithms could be evaluated (similar to TREC in the field of text retrieval).   “A comprehensive and publicly available collection of images, sorted by class and retrieval purposes, together with a protocol to standardize experimental practices, will be instrumental in the next phase of CBIR.”
  25. 25.   Significant progress on benchmarks, standardized datasets, etc. ◦  ImageCLEF ◦  PASCAL VOC Challenge ◦  MSRA dataset ◦  SIMPLIcity dataset ◦  UCID dataset and ground truth (GT) ◦  Accio / SIVAL dataset and GT ◦  Caltech 101, Caltech 256 ◦  LabelMe
  26. 26.   Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]: ◦  Semantic gap and other sources   “A critical point in the advancement of CBIR is the semantic gap, where the meaning of an image is rarely self-evident. […] One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query.”
  27. 27.   The semantic gap problem has not been solved (and maybe will never be…)   What are the alternatives? 1.  Treat visual similarity and semantic relatedness differently   Examples: Alipr, Google similarity search, etc. 2.  Improve both (text-based and visual) search methods independently 3.  Trust the user   CFIR, collaborative filtering, crowdsourcing, games.
  28. 28.   I postulate that image search and retrieval is not a problem (but, instead, a collection of related problems that look like one)   There are many potential opportunities for good solutions to specific problems   One promising avenue: think about image retrieval as added value (e.g., like.com, SPE, etc.)
  29. 29.   Google Similarity Search (VisualRank) [Jing & Baluja, 2008]   Google Goggles (mobile visual search)
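The idea behind VisualRank [Jing & Baluja, 2008] is to apply a PageRank-style computation to a graph whose edges are visual similarities between images, so that images similar to many others rise to the top. The following is a minimal sketch of that idea under simplifying assumptions, not the published implementation: the similarity matrix is made up, and the damping factor and iteration count are illustrative defaults.

```python
def visual_rank(similarity, damping=0.85, iterations=50):
    """PageRank-style power iteration over an image similarity matrix."""
    n = len(similarity)
    # Column sums let each image split its score among its neighbors
    # in proportion to visual similarity.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            received = sum(
                similarity[i][j] / col_sums[j] * rank[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            new_rank.append(damping * received + (1.0 - damping) / n)
        rank = new_rank
    return rank


# Hypothetical pairwise similarities: images 0-2 resemble each other,
# image 3 is an outlier that resembles nothing in particular.
sim = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]

ranks = visual_rank(sim)
best = max(range(len(ranks)), key=ranks.__getitem__)
worst = min(range(len(ranks)), key=ranks.__getitem__)
print(best, worst)
```

In a real system the similarity values would come from matching visual features between images; here they are simply typed in.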
  30. 30.   Google Goggles understands narrow-domain search and retrieval   Several other apps for iPhone, iPad, and Android (e.g., kooaba and Fetch!)
  31. 31.   The Web 2.0 has brought about: ◦  New data sources ◦  New usage patterns ◦  New understanding about the users, their needs, habits, preferences ◦  New opportunities ◦  Lots of metadata! ◦  A chance to experience a true paradigm shift   Before: image annotation is tedious, labor-intensive, expensive   After: image annotation is fun!
  32. 32.   Games! ◦  Google Image Labeler ◦  Games with a purpose (GWAP):   The ESP Game   Squigl   Matchin
  33. 33.   New devices and services… ◦  Flickr (b. 2004) ◦  YouTube (b. 2005) ◦  Flip video cameras (b. 2006) ◦  iPhone (b. 2007) ◦  iPad (b. 2010)
  34. 34.   New opportunities for narrowing the semantic gap ◦  From bottom up: (semi-)automatic image annotation ◦  From top down: using (content / context) ontologies ◦  Combining top-down and bottom-up   New fields of research, including: ◦  Tag recommendation systems ◦  User intentions in image search
  35. 35.   Many opportunities await…
  36. 36. –  I believe (but cannot prove…) that successful Image Search & Retrieval solutions will: •  combine content-based image retrieval (CBIR) with metadata (high-level semantic-based image retrieval) •  only be truly successful in narrow domains •  include the user in the loop –  Relevance Feedback (RF) –  Collaborative efforts (tagging, rating, annotating) •  provide friendly, intuitive interfaces •  incorporate results and insights from cognitive science, particularly human visual attention, perception, and memory
  37. 37. Questions? omarques@fau.edu