Oge Marques
Florida Atlantic University
     Boca Raton, FL - USA
    “Image search and retrieval” is not a problem,
     but rather a collection of related problems that
     look like one.

    10 years after “the end of the early years”,
     research in image search and retrieval still has
     many open problems, challenges, and
     opportunities.
    This is a highly interdisciplinary field, but …

                        [Diagram: related fields include Image and Video Processing,
                         (Multimedia) Database Systems, Information Retrieval,
                         Machine Learning, Visual Information Retrieval, Computer
                         Vision, Data Mining, Visual Data Modeling and
                         Representation, and Human Visual Perception.]
    There are many things that I believe…




    … but cannot prove
The “big mismatch”
    It’s been 10 years since the “end of the early
     years” [Smeulders et al., 2000]




     ◦  Are the challenges from 2000 still relevant?
     ◦  Are the directions and guidelines from 2000 still
        appropriate?
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  Driving forces
        “[…] content-based image retrieval (CBIR) will continue
         to grow in every direction: new audiences, new
         purposes, new styles of use, new modes of interaction,
         larger data sets, and new methods to solve the
         problems.”
    Yes, we have seen many new audiences, new
     purposes, new styles of use, and new modes
     of interaction emerge.

    Each of these usually requires new methods
     to solve the problems it brings.

    However, not too many researchers see them
     as a driving force (as they should).
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  Heritage of computer vision
        “An important obstacle to overcome […] is to realize
         that image retrieval does not entail solving the general
         image understanding problem.”
    I’m afraid I have bad news…
     ◦  Computer vision hasn’t made so much progress
        during the past 10 years.

     ◦  Some classical problems (including image
        understanding) remain unresolved.
     ◦  Similarly, CBIR from a pure computer vision
        perspective didn’t work too well either.
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  Influence on computer vision
        “[…] CBIR offers a different look at traditional computer
         vision problems: large data sets, no reliance on strong
         segmentation, and revitalized interest in color image
         processing and invariance.”
    The adoption of large data sets became standard
     practice in computer vision (see Torralba’s work).
    No reliance on strong segmentation (still
     unresolved) → new areas of research, e.g.,
     automatic ROI extraction and RBIR (region-based
     image retrieval).
    Color image processing and color descriptors
     became incredibly popular, useful, and (to some
     degree) effective (see the color-histogram sketch
     after this list).
    Invariance is still a huge problem
     ◦  But it’s cheaper than ever to have multiple views.
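
To make the point about color descriptors concrete, here is a minimal sketch (not from the talk) of a global HSV color histogram used as an image descriptor. It assumes OpenCV and NumPy are available; the bin counts and the histogram-intersection comparison are illustrative choices only.

```python
# Minimal sketch: a global HSV color histogram as an image descriptor.
# Assumes OpenCV (cv2) and NumPy; bin counts are illustrative only.
import cv2
import numpy as np

def color_histogram(image_bgr, bins=(8, 8, 8)):
    """Return a flattened, L1-normalized HSV histogram for one image."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-9)

# Toy usage: compare two random "images" by histogram intersection
# (1.0 means identical normalized histograms).
img1 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
img2 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
h1, h2 = color_histogram(img1), color_histogram(img2)
print("Histogram intersection:", float(np.minimum(h1, h2).sum()))
```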
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  Similarity and learning
        “We make a pledge for the importance of human-
         based similarity rather than general similarity. Also,
         the connection between image semantics, image data,
         and query context will have to be made clearer in the
         future.”
        “[…] in order to bring semantics to the user, learning is
         inevitable.”
    Similarity is a tough problem to crack and
     model.

    See it for yourself…
    Are these two images similar?
    Are these two images similar?
    Is the second or the third image more similar
     to the first?
    Which image fits better to the first two: the
     third or the fourth?
    Is learning really inevitable?

    Maybe, maybe not, but it sure comes in handy
     in some specific cases…
     ◦  SVM anyone?
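
As a toy illustration of the "learning is inevitable" point and of the SVM remark above, the sketch below trains a scikit-learn SVM on synthetic image descriptors and ranks unseen images by the classifier's confidence for a concept. The features, labels, and parameters are placeholders, not data from the talk.

```python
# Minimal sketch: learning a semantic concept from image descriptors
# with an SVM (scikit-learn). All data here is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))            # 200 "images" x 64-dim descriptors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in concept labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X_tr, y_tr)

# Rank unseen images by the classifier's confidence for the concept.
scores = clf.predict_proba(X_te)[:, 1]
print("Test accuracy:", clf.score(X_te, y_te))
print("Top-5 ranked test images:", np.argsort(-scores)[:5])
```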
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  Interaction
        Better visualization options, more control to the user,
         ability to provide feedback […]
    Significant progress on visualization
     interfaces and devices.

    Relevance Feedback: still a very tricky
     tradeoff (effort vs. perceived benefit), but
     more popular than ever (rating, thumbs
     up/down, etc.)
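
One common way to realize relevance feedback over feature vectors is a Rocchio-style query update: move the query toward items the user marked relevant and away from those marked non-relevant. The sketch below is a generic, minimal version with illustrative weights, not the mechanism of any specific system mentioned here.

```python
# Minimal sketch of Rocchio-style relevance feedback on feature vectors.
# The weights (alpha, beta, gamma) are illustrative defaults, not tuned.
import numpy as np

def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """query: (d,) vector; relevant/nonrelevant: (n, d) arrays (may be empty)."""
    q_new = alpha * query
    if len(relevant):
        q_new = q_new + beta * relevant.mean(axis=0)
    if len(nonrelevant):
        q_new = q_new - gamma * nonrelevant.mean(axis=0)
    return q_new

# Toy usage with random descriptors standing in for image features.
rng = np.random.default_rng(1)
query = rng.normal(size=32)
liked = rng.normal(loc=0.5, size=(3, 32))      # marked relevant by the user
disliked = rng.normal(loc=-0.5, size=(2, 32))  # marked non-relevant
print(rocchio_update(query, liked, disliked)[:5])
```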
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  Need for databases
        “The connection between CBIR and database research
         is likely to increase in the future. […] problems like the
         definition of suitable query languages, efficient search
         in high dimensional feature space, search in the
         presence of changing similarity measures are largely
         unsolved […]”
    Very little progress
     ◦  Image search and retrieval has benefited much
        more from document information retrieval than
        from database research.
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  The problem of evaluation
        CBIR could use a reference standard against which new
         algorithms could be evaluated (similar to TREC in the
         field of text retrieval).
        “A comprehensive and publicly available collection of
         images, sorted by class and retrieval purposes,
         together with a protocol to standardize experimental
         practices, will be instrumental in the next phase of
         CBIR.”
    Significant progress on benchmarks,
     standardized datasets, etc.
     ◦  ImageCLEF
     ◦  PASCAL VOC Challenge
     ◦  MSRA dataset
     ◦  SIMPLIcity dataset
     ◦  UCID dataset and ground truth (GT)
     ◦  Accio / SIVAL dataset and GT
     ◦  Caltech 101, Caltech 256
     ◦  LabelMe
    Revisiting the ‘Concluding Remarks’ from
     [Smeulders et al., 2000]:

     ◦  Semantic gap and other sources
        “A critical point in the advancement of CBIR is the
         semantic gap, where the meaning of an image is rarely
         self-evident. […] One way to resolve the semantic gap
         comes from sources outside the image by integrating
         other sources of information about the image in the
         query.”
    The semantic gap problem has not been
     solved (and maybe will never be…)

    What are the alternatives?
     1.  Treat visual similarity and semantic relatedness
         differently
        Examples: Alipr, Google similarity search, etc.
     2.  Improve both (text-based and visual) search
         methods independently
     3.  Trust the user
        CFIR, collaborative filtering, crowdsourcing, games.
    I postulate that image search and retrieval is
     not a problem (but, instead, a collection of
     related problems that look like one)

    There are many potential opportunities for
     good solutions to specific problems

    One promising avenue: think about image
     retrieval as added value (e.g., like.com, SPE,
     etc.)
    Google Similarity Search (VisualRank) [Jing &
     Baluja, 2008]
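
The core idea behind VisualRank [Jing & Baluja, 2008] is a PageRank-style random walk over a graph whose edges are visual similarities between images. The sketch below runs that idea on a synthetic similarity matrix; it is an approximation of the concept, not the published implementation.

```python
# Minimal sketch of the VisualRank idea: PageRank over an image-similarity
# graph. The similarity matrix S here is synthetic.
import numpy as np

def visual_rank(S, damping=0.85, iters=100):
    """S: (n, n) symmetric, nonnegative visual-similarity matrix."""
    S = S.copy()
    np.fill_diagonal(S, 0.0)                        # no self-links
    col_sums = S.sum(axis=0)
    P = S / np.where(col_sums == 0, 1.0, col_sums)  # column-stochastic walk
    r = np.full(S.shape[0], 1.0 / S.shape[0])
    for _ in range(iters):
        r = damping * (P @ r) + (1 - damping) / S.shape[0]
    return r / r.sum()

# Toy usage: 5 images, where images 0-2 are highly similar to each other,
# so they should end up ranked as the most "central" ones.
rng = np.random.default_rng(2)
S = rng.random((5, 5)) * 0.1
S[:3, :3] += 0.9
S = (S + S.T) / 2.0
print(np.argsort(-visual_rank(S)))
```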



    Google Goggles (mobile visual search)
    Google Goggles understands narrow-domain
     search and retrieval




    Several other apps for iPhone, iPad, and
     Android (e.g., kooaba and Fetch!)
    Web 2.0 has brought about:
     ◦  New data sources
     ◦  New usage patterns
     ◦  New understanding about the users, their needs,
        habits, preferences
     ◦  New opportunities
     ◦  Lots of metadata!

     ◦  A chance to experience a true paradigm shift
        Before: image annotation is tedious, labor-intensive,
         expensive
        After: image annotation is fun!
    Games!
     ◦  Google Image Labeler
     ◦  Games with a purpose (GWAP):
        The ESP Game
        Squigl
        Matchin
    New devices and services…

     ◦  Flickr (b. 2004)
     ◦  YouTube (b. 2005)
     ◦  Flip video cameras (b. 2006)
     ◦  iPhone (b. 2007)
     ◦  iPad (b. 2010)
    New opportunities for narrowing the semantic
     gap
     ◦  From bottom up: (semi-)automatic image
        annotation
     ◦  From top down: using (content / context)
        ontologies
     ◦  Combining top-down and bottom-up

    New fields of research, including:
     ◦  Tag recommendation systems (see the toy sketch after this list)
     ◦  User intentions in image search
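
As a toy illustration of what a tag recommendation system might do, the sketch below suggests tags via simple co-occurrence counts over a tiny hand-made photo collection. Everything here is hypothetical and for illustration only.

```python
# Minimal, hypothetical sketch of co-occurrence-based tag recommendation:
# given tags a user already typed, suggest tags that often co-occur with
# them in a (toy) collection of tagged photos.
from collections import Counter
from itertools import combinations

photos = [                      # stand-in tagged photo collection
    {"beach", "sunset", "sea"},
    {"beach", "sea", "surf"},
    {"sunset", "sky", "clouds"},
    {"beach", "sunset", "sky"},
]

cooc = Counter()
for tags in photos:
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def recommend(seed_tags, k=3):
    """Rank candidate tags by how often they co-occur with the seed tags."""
    scores = Counter()
    for (a, b), count in cooc.items():
        if a in seed_tags and b not in seed_tags:
            scores[b] += count
    return [tag for tag, _ in scores.most_common(k)]

print(recommend({"beach"}))     # e.g. ['sea', 'sunset', ...]
```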
    Many opportunities await…
–    I believe (but cannot prove…) that successful
     Image Search & Retrieval solutions will:
     •  combine content-based image retrieval (CBIR) with
        metadata (high-level semantic-based image
        retrieval)
     •  only be truly successful in narrow domains
     •  include the user in the loop
      –  Relevance Feedback (RF)
      –  Collaborative efforts (tagging, rating, annotating)
     •  provide friendly, intuitive interfaces
     •  incorporate results and insights from cognitive
        science, particularly human visual attention,
        perception, and memory
Questions?




             omarques@fau.edu

Oge Marques (FAU) - invited talk at WISMA 2010 (Barcelona, May 2010)
