Recent advances in visual information retrieval (Marques, KLU, June 2010)

Oge Marques
Florida Atlantic University
Boca Raton, FL - USA

  VIR is a highly interdisciplinary field, but …

[Figure: Visual Information Retrieval at the center of related fields: Image and Video Processing, (Multimedia) Database Systems, Information Retrieval, Machine Learning, Computer Vision, Data Mining, Human Visual Perception, and Visual data modeling and representation]

Klagenfurt - June 2010

  There are many things that I believe…

  … but cannot prove


The “big mismatch”


  Part I
◦  10 years after the “end of the early years”
  Where are we now?
  Part II
◦  Medical image retrieval
  Challenges and opportunities
  Part III
◦  Where is VIR headed?
  Advice for young researchers


  It’s been 10 years since the “end of the early
years” [Smeulders et al., 2000]

◦  Are the challenges from 2000 still relevant?
◦  Are the directions and guidelines from 2000 still
appropriate?


  Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:

◦  Driving forces
  “[…] content-based image retrieval (CBIR) will continue
to grow in every direction: new audiences, new
purposes, new styles of use, new modes of interaction,
larger data sets, and new methods to solve the
problems.”


  Yes, we have seen many new audiences, new
purposes, new styles of use, and new modes
of interaction emerge.

  Each of these usually requires new methods
to solve the problems that they bring.

  However, not too many researchers see them
as a driving force (as they should).



◦  Heritage of computer vision
  “An important obstacle to overcome […] is to realize
that image retrieval does not entail solving the general
image understanding problem.”


  I’m afraid I have bad news…
◦  Computer vision hasn’t made that much progress
during the past 10 years.

◦  Some classical problems (including image
understanding) remain unresolved.

◦  Similarly, CBIR from a pure computer vision
perspective didn’t work too well either.



◦  Influence on computer vision
  “[…] CBIR offers a different look at traditional computer
vision problems: large data sets, no reliance on strong
segmentation, and revitalized interest in color image
processing and invariance.”


  The adoption of large data sets became standard
practice in computer vision (see Torralba’s work).
  No reliance on strong segmentation (still
unresolved) → new areas of research, e.g.,
automatic ROI extraction and region-based image
retrieval (RBIR).
  Color image processing and color descriptors
became incredibly popular, useful, and (to some
degree) effective.
  Invariance is still a huge problem
◦  But it’s cheaper than ever to have multiple views.



◦  Similarity and learning
  “We make a pledge for the importance of human-
based similarity rather than general similarity. Also,
the connection between image semantics, image data,
and query context will have to be made clearer in the
future.”
  “[…] in order to bring semantics to the user, learning is
inevitable.”


  The authors were pointing in the right
direction (human in the loop, role of context,
benefits from learning,…)

  However:
◦  Similarity is a tough problem to crack and model.
  Even the understanding of how humans judge image
similarity is very limited.
◦  Machine learning is almost inevitable…
  … but sometimes it can be abused.



◦  Interaction
  Better visualization options, more control to the user,
ability to provide feedback […]


  Significant progress on visualization
interfaces and devices.

  Relevance Feedback: still a very tricky
tradeoff (effort vs. perceived benefit), but
more popular than ever (rating, thumbs up/down,
etc.)
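
To make the relevance feedback idea concrete, here is a minimal sketch of one classic way to use "thumbs up/down" input: a Rocchio-style update that moves the query's feature vector toward the images the user liked and away from the ones the user rejected. This technique is brought in for illustration and is not taken from the talk; the function name and the weights alpha, beta, and gamma are assumptions.

```python
def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined query vector (Rocchio-style relevance feedback).

    query        -- feature vector of the current query (list of floats)
    relevant     -- feature vectors the user marked "thumbs up"
    non_relevant -- feature vectors the user marked "thumbs down"
    The weights are illustrative defaults, not tuned values.
    """
    dim = len(query)

    def centroid(vectors):
        # Mean vector; all zeros when the user gave no feedback of this kind.
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    positive = centroid(relevant)
    negative = centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, positive, negative)]


# Hypothetical 2-D feature vectors, one liked image and one rejected image.
refined = rocchio_update([1.0, 0.0], [[0.0, 1.0]], [[1.0, 0.0]])
```

The effort vs. benefit tradeoff shows up directly here: with very few ratings the centroids are noisy, so the refined query may not improve the results much.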



◦  Need for databases
  “The connection between CBIR and database research
is likely to increase in the future. […] problems like the
definition of suitable query languages, efficient search
in high dimensional feature space, search in the
presence of changing similarity measures are largely
unsolved […]”


  Very little progress

◦  Image search and retrieval has benefited much
more from document information retrieval than
from database research.



◦  The problem of evaluation
  CBIR could use a reference standard against which new
algorithms could be evaluated (similar to TREC in the
field of text retrieval).
  “A comprehensive and publicly available collection of
images, sorted by class and retrieval purposes,
together with a protocol to standardize experimental
practices, will be instrumental in the next phase of
CBIR.”


  Significant progress on benchmarks,
standardized datasets, etc.

◦  ImageCLEF
◦  Pascal VOC Challenge
◦  MSRA dataset
◦  SIMPLIcity dataset
◦  UCID dataset and ground truth (GT)
◦  Accio / SIVAL dataset and GT
◦  Caltech 101, Caltech 256
◦  LabelMe
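
Benchmarks of this kind are commonly scored with ranked-retrieval measures. As a minimal sketch (the function names and the toy image ids are assumptions, and no specific benchmark's protocol is reproduced here), precision at k and average precision can be computed from a ranked result list and a ground-truth set:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are in the ground truth."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    return sum(1 for image_id in top_k if image_id in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant image appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


# Hypothetical ranking returned by a system, and its ground truth (GT).
ranking = ["a", "b", "c", "d"]
ground_truth = {"a", "c"}
p_at_2 = precision_at_k(ranking, ground_truth, 2)
ap = average_precision(ranking, ground_truth)
```

Averaging the second measure over many queries gives mean average precision, which is one reason shared datasets with ground truth matter: without the GT set, neither number can be computed.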



◦  Semantic gap and other sources
  “A critical point in the advancement of CBIR is the
semantic gap, where the meaning of an image is rarely
self-evident. […] One way to resolve the semantic gap
comes from sources outside the image by integrating
other sources of information about the image in the
query.”


  The semantic gap problem has not been
solved (and maybe will never be…)

  What are the alternatives?
1.  Treat visual similarity and semantic relatedness
differently
  Examples: Alipr, Google similarity search, etc.
2.  Improve both (text-based and visual) search
methods independently
3.  Trust the user
  CFIR, collaborative filtering, crowdsourcing, games.


  Challenges
◦  We’re entering a new country…
  How much can we bring?
  Do we speak the language?
  Do we know their culture?
  Do they understand us and where we come from?
  Opportunities
◦  They use images (extensively)
◦  They have expert knowledge
◦  Domains are narrow (almost by definition)
◦  Fewer clients, but potentially more $$


  An overview of the challenges

◦  Different terminology
◦  Standards (e.g., DICOM)
◦  Modality dependencies
◦  Equipment dependencies
◦  Privacy issues
◦  Proprietary data
◦  A tough sell?


  Be prepared for:
◦  New acronyms
  CBMIR (Content-Based Medical Image Retrieval)
  PACS (Picture Archiving and Communication System)
  DICOM (Digital Imaging and Communications in
Medicine)
  HIS (Hospital Information System)
  RIS (Radiology Information System)
◦  New phrases
  Imaging informatics
◦  Lots of technical medical terms


  DICOM (http://medical.nema.org/)
◦  Global IT standard, created in 1993, used in
virtually all hospitals worldwide.
◦  Designed to ensure the interoperability of different
systems and manage related workflow.
◦  Will be required by all EHR systems that include
imaging information as an integral part of the
patient record.
◦  750+ technical and medical experts participate in
20+ active DICOM working groups.
◦  Standard is updated 4-5 times per year.
◦  Many available tools! (see http://www.idoimaging.com/)


  The IRMA code [Lehmann et al., 2003]
◦  4 axes with 3 to 4 positions each; every position
takes a value in {0,...,9,a,...,z}, where "0" denotes
"unspecified" and marks the end of a path along an axis.
  Technical code (T) describes the imaging
modality
  Directional code (D) models body orientations
  Anatomical code (A) refers to the body region
examined
  Biological code (B) describes the biological
system examined.


  The IRMA code [Lehmann et al., 2003]
◦  The entire code results in a character string of <14
characters (IRMA: TTTT – DDD – AAA – BBB).

Example: “x-ray, projection radiography,
analog, high energy – sagittal, left lateral
decubitus, inspiration – chest, lung –
respiratory system, lung”

Source: [Lehmann et al., 2003]
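
To illustrate how such a code can be read, here is a minimal sketch that splits a TTTT-DDD-AAA-BBB string into its four axes and expands each axis into its hierarchical path, stopping at the first "0". The function name is an assumption, and the code string in the usage line is made up for illustration; it is not the code for the example above, and the meaning of each position would still have to be looked up in the IRMA tables.

```python
import re

# Axis letter, meaning as described on the slide, and number of positions.
IRMA_AXES = (
    ("T", "technical (imaging modality)", 4),
    ("D", "directional (body orientation)", 3),
    ("A", "anatomical (body region)", 3),
    ("B", "biological (system examined)", 3),
)


def parse_irma(code):
    """Split an IRMA-style code into per-axis hierarchical paths."""
    # Accept both a plain hyphen and an en dash as the axis separator.
    parts = re.split(r"\s*[-\u2013]\s*", code.strip().lower())
    if len(parts) != len(IRMA_AXES):
        raise ValueError("expected 4 axes: TTTT-DDD-AAA-BBB")
    parsed = {}
    for (letter, _meaning, length), part in zip(IRMA_AXES, parts):
        if len(part) != length or not re.fullmatch(r"[0-9a-z]+", part):
            raise ValueError(f"axis {letter} must have {length} positions in 0-9, a-z")
        # "0" means "unspecified" and ends the path along this axis.
        specified = part.split("0", 1)[0]
        parsed[letter] = [specified[:i] for i in range(1, len(specified) + 1)]
    return parsed


# Made-up code string, for illustration only.
paths = parse_irma("1120-210-5a0-300")
```

Each list goes from the most general category to the most specific one, which is what makes partial matches along an axis possible.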

  The IRMA code [Lehmann et al., 2003]

◦  The companion tool…

Source: [Lehmann et al., 2004]


  Most current retrieval systems in clinical use rely
on text keywords such as DICOM header
information to perform retrieval.
  CBIR has been widely researched in a variety of
domains and provides an intuitive and expressive
method for querying visual data using features,
e.g. color, shape, and texture.
  Current CBIR systems:
◦  are not easily integrated into the healthcare
environment;
◦  have not been widely evaluated using a large dataset;
and
◦  lack the ability to perform relevance feedback to refine
retrieval results.

Source: [Hsu et al., 2009]
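
As a minimal sketch of what "querying visual data using features" can mean, the following compares two images by a coarse color histogram and histogram intersection. This is one simple, generic feature chosen for illustration, not the method of any system cited here; the function names, the bin count, and the representation of an image as a list of (r, g, b) tuples are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of an image given as (r, g, b) tuples (0-255)."""
    step = 256 // bins_per_channel
    last_bin = bins_per_channel - 1
    histogram = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp so that bin counts that do not divide 256 evenly stay in range.
        r_bin = min(r // step, last_bin)
        g_bin = min(g // step, last_bin)
        b_bin = min(b // step, last_bin)
        histogram[(r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin] += 1.0
    total = len(pixels)
    return [count / total for count in histogram] if total else histogram


def histogram_intersection(hist_a, hist_b):
    """Similarity in [0, 1] for two normalized histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


# Tiny synthetic "images": all red, and half red / half blue.
red = [(255, 0, 0)] * 4
mixed = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
similarity = histogram_intersection(color_histogram(red), color_histogram(mixed))
```

A query-by-example system would compute the histogram once per stored image and rank the collection by similarity to the query image's histogram.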

  CBMIR is still a relatively small dot on the map of
the medical imaging community.

Source: Program of SPIE Medical Imaging 2010 Multiconference

  New gaps!

◦  Just when you thought the semantic gap was your
only problem…

Source: [Deserno, Antani, and Long, 2009]


  USA
◦  NIH (National Institutes of Health)
  NIBIB - National Institute of Biomedical Imaging and
Bioengineering
  NCI - National Cancer Institute
  NLM - National Library of Medicine
◦  Several universities and hospitals
  Europe
◦  Aachen University (Germany)
◦  Geneva University (Switzerland)
  Big companies (Siemens, GE, etc.)


  IRMA (Image Retrieval in Medical Applications)

◦  Aachen University (Germany)
  http://ganymed.imib.rwth-aachen.de/irma/

◦  3 online demos:
  IRMA Query demo: allows the evaluation of CBIR on several
databases.
  IRMA Extended Query Refinement demo: CBIR from the IRMA
database (a subset of 10,000 images).
  Spine Pathology and Image Retrieval System (SPIRS)
designed by the NLM/NIH (USA): holds information on
~17,000 spine x-rays.


  MedGIFT (GNU Image Finding Tool)

◦  Geneva University (Switzerland)
  http://www.sim.hcuge.ch/medgift/

◦  Large effort, including projects such as:
  Talisman (lung image retrieval)
  Case-based fracture image retrieval system
  Onco-Media: medical image retrieval + grid computing
  ImageCLEF: evaluation and validation
  medSearch


  WebMIRS

◦  NIH / NLM (USA)
  http://archive.nlm.nih.gov/proj/webmirs/index.php

◦  Query by text + navigation by categories

◦  Uses datasets and related x-ray images from the
National Health and Nutrition Examination Survey
(NHANES)


  SPIRS (Spine Pathology & Image Retrieval
System): Web-based image retrieval system
for large biomedical databases
◦  NIH / UCLA (USA)
◦  Great case study on highly specialized CBMIR

Source: [Hsu et al., 2009]

  National Biomedical Imaging Archive (NBIA)

◦  NCI / NIH (USA)
  https://imaging.nci.nih.gov/

◦  Search based on metadata (DICOM fields)
◦  3 search options:
  Simple
  Advanced
  Dynamic
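
Metadata-based search of this kind boils down to matching header fields. The sketch below filters a list of records by exact field values; it is a generic illustration and makes no claim about how NBIA is implemented. Modality, BodyPartExamined, and PatientSex are standard DICOM attribute keywords, while the records, the "id" field, and the function name are hypothetical.

```python
def search_by_metadata(records, **criteria):
    """Return the records whose header fields match every criterion (case-insensitive)."""
    matches = []
    for record in records:
        if all(str(record.get(field, "")).lower() == str(wanted).lower()
               for field, wanted in criteria.items()):
            matches.append(record)
    return matches


# Hypothetical header records, as if already extracted from DICOM files.
archive = [
    {"id": "img-001", "Modality": "CT", "BodyPartExamined": "CHEST", "PatientSex": "F"},
    {"id": "img-002", "Modality": "MR", "BodyPartExamined": "HEAD", "PatientSex": "M"},
    {"id": "img-003", "Modality": "CT", "BodyPartExamined": "HEAD", "PatientSex": "M"},
]
head_ct = search_by_metadata(archive, Modality="ct", BodyPartExamined="head")
```

Note what this cannot do: nothing in the pixel data is ever examined, which is exactly the gap that content-based retrieval tries to fill.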


  ARRS GoldMiner

◦  American Roentgen Ray Society (USA)
  http://goldminer.arrs.org/

◦  Query by text
◦  Results can be filtered by:
  Modality
  Age
  Sex


  Yottalook Images

◦  iVirtuoso (USA)
  http://www.yottalook.com/

◦  Developed and maintained by four radiologists
◦  Query by text
◦  Claims to use 4 “core technologies”:
  “natural query analysis”
  “semantic ontology”
  “relevance algorithm”
  a specialized content delivery system that provides
high yield content based on the search term.


  ImageCLEF Medical Image Retrieval 2010
  http://www.imageclef.org/2010/medical
◦  Data set: 77,000 images from articles published in
Radiology and RadioGraphics, including the text of the
captions and links to the HTML of the full-text articles.
◦  3 types of tasks:
  Modality Classification: given an image, return its
modality (MR, CT, XR, etc.)
  Ad-hoc retrieval: classic medical retrieval task, with 3
“flavors”: textual, mixed and semantic queries
  Case-based retrieval: retrieve cases including images
that might best suit the provided case description.


  Better user interfaces, which are responsive,
highly interactive, and capable of supporting
relevance feedback.
◦  In other words, address the “Performance Gap
Category” and the “Usability Gap Category”.


  New applications of CBMIR, including:
◦  Teaching
◦  Research
◦  Diagnosis
◦  PACS and Electronic Patient Records

  CBMIR evaluation using medical experts

  Integration of local and global features


  New descriptors
◦  Example: the Fuzzy Rule Based Compact Composite
Descriptor (CCD), which includes global image
features capturing both brightness and texture
characteristics in a 1D Histogram [Chatzichristofis &
Boutalis, 2009]


  Partial match schemes (see [Hsu et al., 2009])

Source: [Hsu et al., 2009]

  New devices (e.g., iPad)


  Advice for [young] researchers

◦  In this last part, I’ve compiled bits and pieces of
advice that I believe might help researchers who are
entering the field.

◦  They focus on research avenues that I personally
consider to be the most promising.


  LOOK…

◦  at yourself (how do you search for images and
videos?)

◦  around (related areas and how they have grown)

◦  at Google (and other major players)


  Which sites do you use?
◦  Why?
  Which search options do you use?
◦  What do you do when the returned results aren’t
good?
  What is the single most useful feature that
you wish those sites had?

  What are your intentions and how do you
express them?


  Semi-automatic image annotation
  Tag recommendation systems

  Story annotation engines
  Content-based image filtering

  Copyright detection
  Watermark detection
◦  and many more


  Google Similarity Search (VisualRank) [Jing &
Baluja, 2008]

  Google Goggles (mobile visual search)
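
VisualRank [Jing & Baluja, 2008] applies PageRank-style ranking to a graph whose edges are visual similarities between images. The following is a minimal power-iteration sketch of that idea, not Google's implementation; the function name, the damping value, the iteration count, and the toy similarity matrix are assumptions.

```python
def visual_rank(similarity, damping=0.85, iterations=50):
    """Rank images by PageRank-style power iteration over a similarity matrix.

    similarity[i][j] is a non-negative visual similarity between images i and j.
    Each column is normalized so an image distributes its score to its neighbors
    in proportion to similarity. Columns that sum to zero are skipped, so
    isolated images only receive the uniform (1 - damping) share.
    """
    n = len(similarity)
    column_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(similarity[i][j] / column_sums[j] * rank[j]
                           for j in range(n) if column_sums[j] > 0)
            updated.append(damping * incoming + (1.0 - damping) / n)
        rank = updated
    return rank


# Toy graph: image 0 looks like both 1 and 2; images 1 and 2 do not look alike.
toy_similarity = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
scores = visual_rank(toy_similarity)
```

The image that is visually similar to the most other results ends up with the highest score, which matches the intuition of picking the most "central" image for a query.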


  THINK…

◦  mobile devices

◦  new devices and services

◦  social networks

◦  games


  Google Goggles understands narrow-domain
search and retrieval

  Several other apps for iPhone, iPad, and
Android (e.g., kooaba and Fetch!)


  Flickr (b. 2004)
  YouTube (b. 2005)

  Flip video cameras (b. 2006)
  iPhone (b. 2007)

  iPad (b. 2010)


  Web 2.0 has brought about:
◦  New data sources
◦  New usage patterns
◦  New understanding about the users, their needs,
habits, preferences
◦  New opportunities
◦  Lots of metadata!

◦  A chance to experience a true paradigm shift
  Before: image annotation is tedious, labor-intensive,
expensive
  After: image annotation is fun!


◦  Google Image Labeler

◦  Games with a purpose (GWAP):
  The ESP Game
  Squigl
  Matchin
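
The core mechanic of a game like the ESP Game is agreement: two players label the same image independently, and a label is kept only when both produce it. Here is a minimal sketch of that matching step, assuming each player's guesses are available as a list; the function name and the sample guesses are hypothetical, and real games add timing, scoring, and other rules not modeled here.

```python
def first_agreed_label(guesses_a, guesses_b, taboo_words=()):
    """Return the first guess of player A that player B also made, or None.

    Taboo words (labels already collected for this image) are never accepted,
    which pushes players toward new labels.
    """
    taboo = {word.strip().lower() for word in taboo_words}
    seen_b = {guess.strip().lower() for guess in guesses_b}
    for guess in guesses_a:
        label = guess.strip().lower()
        if label in seen_b and label not in taboo:
            return label
    return None


# Hypothetical guesses for one image; "dog" was already collected earlier.
label = first_agreed_label(["Dog", "grass", "puppy"],
                           ["animal", "puppy", "grass"],
                           taboo_words=["dog"])
```

Because neither player can see the other's guesses, agreement is a cheap signal that the label really describes the image.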


  UNDERSTAND…

◦  human intentions

◦  human emotions

◦  user’s preferences and needs


  CREATE…

◦  better interfaces

◦  better user experience

◦  new business opportunities (added value)


  Image Genius (sponsored by FAU / will
become startup)

  Fully functional online prototype of a medical image
retrieval system (MEDIX) with DICOM capabilities

  Unsupervised ROI extraction from an image
(by Gustavo B. Borba, UTFPR, Brazil)

–  I believe (but cannot prove…) that successful
VIR solutions will:
•  combine content-based image retrieval (CBIR) with
metadata (high-level semantic-based image
retrieval)
•  only be truly successful in narrow domains
•  include the user in the loop
–  Relevance Feedback (RF)
–  Collaborative efforts (tagging, rating, annotating)
•  provide friendly, intuitive interfaces
•  incorporate results and insights from cognitive
science, particularly human visual attention,
perception, and memory
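
One simple way to combine CBIR with metadata is late fusion: score each image separately by visual similarity and by text or metadata match, then merge the two scores. The sketch below uses a weighted sum; the function name, the weight, and the sample scores are assumptions, and the scores are assumed to be already normalized to [0, 1].

```python
def fuse_scores(visual_scores, text_scores, visual_weight=0.5):
    """Rank image ids by a weighted sum of visual and text/metadata scores.

    Both inputs map image id -> score in [0, 1]; an image missing from one
    of them gets 0 for that modality.
    """
    image_ids = set(visual_scores) | set(text_scores)
    fused = {
        image_id: visual_weight * visual_scores.get(image_id, 0.0)
        + (1.0 - visual_weight) * text_scores.get(image_id, 0.0)
        for image_id in image_ids
    }
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from a CBIR engine and a keyword search.
ranking = fuse_scores({"img1": 0.9, "img2": 0.2},
                      {"img2": 0.9, "img3": 0.4})
```

In a narrow domain the weight can be tuned per task, and user feedback (ratings, tags) is a natural third input to the same sum.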


  “Image search and retrieval” is not a problem,
but rather a collection of related problems that
look like one.

  There is a great need for good solutions to
specific problems.

  10 years after “the end of the early years”,
research in visual information retrieval still has
many open problems, challenges, and
opportunities.


Questions?

omarques@fau.edu

