2. VIR is a highly interdisciplinary field, but …
[Diagram: Visual Information Retrieval and its related fields]
◦ Image and Video Processing
◦ (Multimedia) Database Systems
◦ Information Retrieval
◦ Machine Learning
◦ Computer Vision
◦ Data Mining
◦ Human Visual Perception
◦ Visual data modeling and representation
Klagenfurt - June 2010
3. There are many things that I believe…
… but cannot prove
5. Part I
◦ 10 years after the “end of the early years”
Where are we now?
Part II
◦ Medical image retrieval
Challenges and opportunities
Part III
◦ Where is VIR headed?
Advice for young researchers
6.
7. It’s been 10 years since the “end of the early
years” [Smeulders et al., 2000]
◦ Are the challenges from 2000 still relevant?
◦ Are the directions and guidelines from 2000 still
appropriate?
8. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Driving forces
“[…] content-based image retrieval (CBIR) will continue
to grow in every direction: new audiences, new
purposes, new styles of use, new modes of interaction,
larger data sets, and new methods to solve the
problems.”
9. Yes, we have seen many new audiences, new
purposes, new styles of use, and new modes
of interaction emerge.
Each of these usually requires new methods
to solve the problems that they bring.
However, not too many researchers see them
as a driving force (as they should).
10. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Heritage of computer vision
“An important obstacle to overcome […] is to realize
that image retrieval does not entail solving the general
image understanding problem.”
11. I’m afraid I have bad news…
◦ Computer vision hasn’t made that much progress
during the past 10 years.
◦ Some classical problems (including image
understanding) remain unresolved.
◦ Similarly, CBIR from a pure computer vision
perspective didn’t work too well either.
12. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Influence on computer vision
“[…] CBIR offers a different look at traditional computer
vision problems: large data sets, no reliance on strong
segmentation, and revitalized interest in color image
processing and invariance.”
13. The adoption of large data sets became standard
practice in computer vision (see Torralba’s work).
No reliance on strong segmentation (still
unresolved) has opened new areas of research, e.g.,
automatic ROI extraction and RBIR.
Color image processing and color descriptors
became incredibly popular, useful, and (to some
degree) effective.
Invariance is still a huge problem
◦ But it’s cheaper than ever to have multiple views.
14. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Similarity and learning
“We make a pledge for the importance of human-
based similarity rather than general similarity. Also,
the connection between image semantics, image data,
and query context will have to be made clearer in the
future.”
“[…] in order to bring semantics to the user, learning is
inevitable.”
15. The authors were pointing in the right
direction (human in the loop, role of context,
benefits from learning,…)
However:
◦ Similarity is a tough problem to crack and model.
Even the understanding of how humans judge image
similarity is very limited.
◦ Machine learning is almost inevitable…
… but sometimes it can be abused.
16. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Interaction
Better visualization options, more control to the user,
ability to provide feedback […]
17. Significant progress on visualization
interfaces and devices.
Relevance Feedback: still a very tricky
tradeoff (effort vs. perceived benefit), but
more popular than ever (rating, thumbs up/down,
etc.)
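One way to picture how thumbs up/down feedback can refine a query is the classic Rocchio update from text retrieval. The slide does not name this method; it is swapped in here as an illustration. A minimal sketch, assuming images are already described by feature vectors; the weights `alpha`, `beta`, `gamma` and the toy vectors are assumptions:

```python
from typing import List, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; a zero vector if no feedback was given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(
    query: Sequence[float],
    liked: Sequence[Sequence[float]],
    disliked: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (assumed default)
    beta: float = 0.75,   # pull toward "thumbs up" images (assumed default)
    gamma: float = 0.15,  # push away from "thumbs down" images (assumed default)
) -> Vector:
    """Move the query vector toward liked images and away from disliked ones."""
    dim = len(query)
    pos = _mean(liked, dim)
    neg = _mean(disliked, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


# Hypothetical 2-D feature vectors, for illustration only.
query = [1.0, 0.0]
liked = [[0.0, 1.0], [0.0, 1.0]]
disliked = [[1.0, 0.0]]
refined = rocchio_update(query, liked, disliked)
```

The effort vs. perceived benefit tradeoff shows up directly: the update only helps if the user supplies enough ratings to fill `liked` and `disliked`.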
18. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Need for databases
“The connection between CBIR and database research
is likely to increase in the future. […] problems like the
definition of suitable query languages, efficient search
in high dimensional feature space, search in the
presence of changing similarity measures are largely
unsolved […]”
19. Very little progress
◦ Image search and retrieval has benefited much
more from document information retrieval than
from database research.
20. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ The problem of evaluation
CBIR could use a reference standard against which new
algorithms could be evaluated (similar to TREC in the
field of text retrieval).
“A comprehensive and publicly available collection of
images, sorted by class and retrieval purposes,
together with a protocol to standardize experimental
practices, will be instrumental in the next phase of
CBIR.”
21. Significant progress on benchmarks,
standardized datasets, etc.
◦ ImageCLEF
◦ Pascal VOC Challenge
◦ MSRA dataset
◦ SIMPLIcity dataset
◦ UCID dataset and ground truth (GT)
◦ Accio / SIVAL dataset and GT
◦ Caltech 101, Caltech 256
◦ LabelMe
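With ground truth (GT) available for collections like these, a retrieval run can be scored with standard measures such as average precision. A minimal sketch, assuming made-up image ids and relevance sets; it does not reproduce the exact protocol of any benchmark listed above:

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable]) -> float:
    """Average precision of one ranked result list against its ground truth."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    # Relevant images that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precisions."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy query: ids and ground truth are invented for illustration.
ranked = ["img3", "img7", "img1", "img9"]
relevant = {"img3", "img1", "img5"}
ap = average_precision(ranked, relevant)
```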
22. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Semantic gap and other sources
“A critical point in the advancement of CBIR is the
semantic gap, where the meaning of an image is rarely
self-evident. […] One way to resolve the semantic gap
comes from sources outside the image by integrating
other sources of information about the image in the
query.”
23. The semantic gap problem has not been
solved (and maybe will never be…)
What are the alternatives?
1. Treat visual similarity and semantic relatedness
differently
Examples: Alipr, Google similarity search, etc.
2. Improve both (text-based and visual) search
methods independently
3. Trust the user
CFIR, collaborative filtering, crowdsourcing, games.
24.
25. Challenges
◦ We’re entering a new country…
How much can we bring?
Do we speak the language?
Do we know their culture?
Do they understand us and where we come from?
Opportunities
◦ They use images (extensively)
◦ They have expert knowledge
◦ Domains are narrow (almost by definition)
◦ Fewer clients, but potentially more $$
26. An overview of the challenges
◦ Different terminology
◦ Standards (e.g., DICOM)
◦ Modality dependencies
◦ Equipment dependencies
◦ Privacy issues
◦ Proprietary data
◦ A tough sell?
27. Be prepared for:
◦ New acronyms
CBMIR (Content-Based Medical Image Retrieval)
PACS (Picture Archiving and Communication System)
DICOM (Digital Imaging and Communications in
Medicine)
HIS (Hospital Information System)
RIS (Radiology Information System)
◦ New phrases
Imaging informatics
◦ Lots of technical medical terms
28. DICOM (http://medical.nema.org/)
◦ Global IT standard, created in 1993, used in
virtually all hospitals worldwide.
◦ Designed to ensure the interoperability of different
systems and manage related workflow.
◦ Will be required by all EHR systems that include
imaging information as an integral part of the
patient record.
◦ 750+ technical and medical experts participate in
20+ active DICOM working groups.
◦ Standard is updated 4-5 times per year.
◦ Many available tools! (see http://www.idoimaging.com/)
29. The IRMA code [Lehmann et al., 2003]
◦ 4 axes with 3 to 4 positions, each in
{0,...,9,a,...,z}, where "0" denotes "unspecified" and
marks the end of a path along an axis.
Technical code (T) describes the imaging
modality
Directional code (D) models body orientations
Anatomical code (A) refers to the body region
examined
Biological code (B) describes the biological
system examined.
30. The IRMA code [Lehmann et al., 2003]
◦ The entire code results in a character string of <14
characters (IRMA: TTTT – DDD – AAA – BBB).
Example: “x-ray, projection radiography,
analog, high energy – sagittal, left lateral
decubitus, inspiration – chest, lung –
respiratory system, lung”
Source: [Lehmann et al., 2003]
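The TTTT – DDD – AAA – BBB layout is easy to handle in code. A minimal sketch, assuming codes arrive as dash-separated strings; the sample code value below is hypothetical (syntactically valid only, not taken from the IRMA tables), and the function names are illustrative:

```python
import re
from typing import Dict

AXES = ("T", "D", "A", "B")  # technical, directional, anatomical, biological
_PATTERN = re.compile(r"^([0-9a-z]{4})-([0-9a-z]{3})-([0-9a-z]{3})-([0-9a-z]{3})$")


def parse_irma(code: str) -> Dict[str, str]:
    """Split a TTTT-DDD-AAA-BBB string into its four axes."""
    # Accept both plain hyphens and en dashes, with or without spaces.
    normalized = re.sub(r"\s*[\u2013-]\s*", "-", code.strip().lower())
    match = _PATTERN.match(normalized)
    if match is None:
        raise ValueError(f"not a TTTT-DDD-AAA-BBB string: {code!r}")
    return dict(zip(AXES, match.groups()))


def specified_path(axis_value: str) -> str:
    """Positions before the first '0', i.e. the part of the path that is specified."""
    return axis_value.split("0", 1)[0]


# Hypothetical code string, for illustration only.
axes = parse_irma("1121-4a0-914-700")
paths = {axis: specified_path(value) for axis, value in axes.items()}
```

Because each position refines the previous one, two images can be compared axis by axis on the length of their shared specified prefix.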
31. The IRMA code [Lehmann et al., 2003]
◦ The companion tool…
Source: [Lehmann et al., 2004]
32. Most current retrieval systems in clinical use rely
on text keywords such as DICOM header
information to perform retrieval.
CBIR has been widely researched in a variety of
domains and provides an intuitive and expressive
method for querying visual data using features,
e.g. color, shape, and texture.
Current CBIR systems:
◦ are not easily integrated into the healthcare
environment;
◦ have not been widely evaluated using a large dataset;
and
◦ lack the ability to perform relevance feedback to refine
retrieval results.
Source: [Hsu et al., 2009]
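Querying by a visual feature such as color can be illustrated with a global color histogram compared by histogram intersection. This is a generic textbook sketch, not the method of [Hsu et al., 2009]; images are assumed to be plain lists of (R, G, B) pixels, and the tiny "database" is invented:

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel**3 bins."""
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query: Iterable[Pixel], database: Dict[str, List[Pixel]]) -> List[str]:
    """Return image names ordered from most to least similar to the query."""
    q = color_histogram(query)
    scored = [(histogram_intersection(q, color_histogram(p)), name)
              for name, p in database.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy images, for illustration only.
query_pixels = [(250, 10, 10)] * 4
database = {
    "bluish": [(10, 10, 240)] * 4,
    "reddish": [(240, 20, 20)] * 3 + [(10, 10, 240)],
}
ranking = rank_by_color(query_pixels, database)
```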
33. CBMIR is still a relatively small dot on the map
of the medical imaging community.
Source: Program of SPIE Medical Imaging 2010 Multiconference
34. New gaps!
◦ Just when you thought the semantic gap was
your only problem…
Source: [Deserno, Antani, and Long, 2009]
35. USA
◦ NIH (National Institutes of Health)
NIBIB - National Institute of Biomedical Imaging and
Bioengineering
NCI - National Cancer Institute
NLM - National Library of Medicine
◦ Several universities and hospitals
Europe
◦ Aachen University (Germany)
◦ Geneva University (Switzerland)
Big companies (Siemens, GE, etc.)
36. IRMA (Image Retrieval in Medical Applications)
◦ Aachen University (Germany)
http://ganymed.imib.rwth-aachen.de/irma/
◦ 3 online demos:
IRMA Query demo: allows the evaluation of CBIR on several
databases.
IRMA Extended Query Refinement demo: CBIR from the IRMA
database (a subset of 10,000 images).
Spine Pathology and Image Retrieval Systems (SPIRS)
designed by the NLM/NIH (USA): holds information of
~17,000 spine x-rays.
37. MedGIFT (GNU Image Finding Tool)
◦ Geneva University (Switzerland)
http://www.sim.hcuge.ch/medgift/
◦ Large effort, including projects such as:
Talisman (lung image retrieval)
Case-based fracture image retrieval system
Onco-Media: medical image retrieval + grid computing
ImageCLEF: evaluation and validation
medSearch
38. WebMIRS
◦ NIH / NLM (USA)
http://archive.nlm.nih.gov/proj/webmirs/index.php
◦ Query by text + navigation by categories
◦ Uses datasets and related x-ray images from the
National Health and Nutrition Examination Survey
(NHANES)
39. SPIRS (Spine Pathology & Image Retrieval
System): Web-based image retrieval system
for large biomedical databases
◦ NIH / UCLA (USA)
◦ Great case study on highly specialized CBMIR
Source: [Hsu et al., 2009]
40. National Biomedical Imaging Archive (NBIA)
◦ NCI / NIH (USA)
https://imaging.nci.nih.gov/
◦ Search based on metadata (DICOM fields)
◦ 3 search options:
Simple
Advanced
Dynamic
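Metadata-based search of this kind boils down to filtering on DICOM header fields. A minimal sketch, assuming headers have already been read into plain dictionaries keyed by standard DICOM attribute keywords (Modality, BodyPartExamined, PatientSex); the records and the `search_by_metadata` helper are hypothetical and do not describe how NBIA is implemented:

```python
from typing import Dict, Iterable, List

Header = Dict[str, str]


def search_by_metadata(headers: Iterable[Header], **criteria: str) -> List[Header]:
    """Keep the headers whose fields equal every given criterion (case-insensitive)."""
    def matches(header: Header) -> bool:
        return all(
            header.get(field, "").upper() == wanted.upper()
            for field, wanted in criteria.items()
        )
    return [h for h in headers if matches(h)]


# Hypothetical header records, for illustration only.
headers = [
    {"file": "a.dcm", "Modality": "CT", "BodyPartExamined": "CHEST", "PatientSex": "F"},
    {"file": "b.dcm", "Modality": "MR", "BodyPartExamined": "HEAD", "PatientSex": "M"},
    {"file": "c.dcm", "Modality": "CT", "BodyPartExamined": "ABDOMEN", "PatientSex": "M"},
]
chest_ct = search_by_metadata(headers, Modality="ct", BodyPartExamined="chest")
all_ct = search_by_metadata(headers, Modality="CT")
```

Note what such a search cannot do: nothing here looks at the pixels, which is exactly the gap CBMIR aims to fill.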
41. ARRS GoldMiner
◦ American Roentgen Ray Society (USA)
http://goldminer.arrs.org/
◦ Query by text
◦ Results can be filtered by:
Modality
Age
Sex
42. Yottalook Images
◦ iVirtuoso (USA)
http://www.yottalook.com/
◦ Developed and maintained by four radiologists
◦ Query by text
◦ Claims to use 4 “core technologies”:
“natural query analysis”
“semantic ontology”
“relevance algorithm”
a specialized content delivery system that provides
high yield content based on the search term.
43. ImageCLEF Medical Image Retrieval 2010
http://www.imageclef.org/2010/medical
◦ Data set: 77,000 images from articles published in
Radiology and RadioGraphics, including the text of the
captions and a link to the HTML of the full-text articles.
◦ 3 types of tasks:
Modality Classification: given an image, return its
modality (MR, CT, XR, etc.)
Ad-hoc retrieval: classic medical retrieval task, with 3
“flavors”: textual, mixed and semantic queries
Case-based retrieval: retrieve cases including images
that might best suit the provided case description.
44. Better user interfaces, which are responsive,
highly interactive, and capable of supporting
relevance feedback.
◦ In other words, address the “Performance Gap
Category” and the “Usability Gap Category”.
45. New applications of CBMIR, including:
◦ Teaching
◦ Research
◦ Diagnosis
◦ PACS and Electronic Patient Records
CBMIR evaluation using medical experts
Integration of local and global features
46. New descriptors
◦ Example: the Fuzzy Rule Based Compact Composite
Descriptor (CCD), which includes global image
features capturing both brightness and texture
characteristics in a 1D Histogram [Chatzichristofis &
Boutalis, 2009]
47. Partial match schemes (see [Hsu et al., 2009])
Source: [Hsu et al., 2009]
48. New devices (e.g., iPad)
49.
50. Advice for [young] researchers
◦ In this last part, I’ve compiled bits and pieces of
advice that I believe might help researchers who are
entering the field.
◦ They focus on research avenues that I personally
consider to be the most promising.
51. LOOK…
◦ at yourself (how do you search for images and
videos?)
◦ around (related areas and how they have grown)
◦ at Google (and other major players)
52. Which sites do you use?
◦ Why?
Which search options do you use?
◦ What do you do when the returned results aren’t
good?
What is the single most useful feature that
you wish those sites had?
What are your intentions and how do you
express them?
53. Semi-automatic image annotation
Tag recommendation systems
Story annotation engines
Content-based image filtering
Copyright detection
Watermark detection
◦ and many more
54. Google Similarity Search (VisualRank) [Jing &
Baluja, 2008]
Google Goggles (mobile visual search)
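The core idea behind VisualRank, as I understand it from [Jing & Baluja, 2008], is a PageRank-style iteration over a graph whose edge weights are visual similarities between images. A minimal sketch under that assumption; the similarity values, the damping factor, and the uniform damping vector are illustrative choices, and the feature matching that produces the similarities is not shown:

```python
from typing import List


def visual_rank(
    similarity: List[List[float]],
    damping: float = 0.85,   # assumed value, as commonly used for PageRank
    iterations: int = 50,
) -> List[float]:
    """Power iteration: rank = d * S_norm * rank + (1 - d) / n."""
    n = len(similarity)
    if n == 0:
        return []
    # Column-normalize so each column sums to 1; an image with no similar
    # neighbors spreads its weight uniformly.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    norm = [
        [similarity[i][j] / col_sums[j] if col_sums[j] else 1.0 / n for j in range(n)]
        for i in range(n)
    ]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            damping * sum(norm[i][j] * rank[j] for j in range(n)) + (1.0 - damping) / n
            for i in range(n)
        ]
    return rank


# Hypothetical pairwise similarities: images 0 and 1 look alike, image 2 is an outlier.
similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
ranks = visual_rank(similarity)
```

Images that many other images resemble accumulate rank, so the outlier ends up at the bottom of the list.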
55. THINK…
◦ mobile devices
◦ new devices and services
◦ social networks
◦ games
56. Google Goggles understands narrow-domain
search and retrieval
Several other apps for iPhone, iPad, and
Android (e.g., kooaba and Fetch!)
58. Web 2.0 has brought about:
◦ New data sources
◦ New usage patterns
◦ New understanding about the users, their needs,
habits, preferences
◦ New opportunities
◦ Lots of metadata!
◦ A chance to experience a true paradigm shift
Before: image annotation is tedious, labor-intensive,
expensive
After: image annotation is fun!
59. ◦ Google Image Labeler
◦ Games with a purpose (GWAP):
The ESP Game
Squigl
Matchin
60. UNDERSTAND…
◦ human intentions
◦ human emotions
◦ user’s preferences and needs
61. CREATE…
◦ better interfaces
◦ better user experience
◦ new business opportunities (added value)
62. Image Genius (sponsored by FAU / will
become a startup)
63. Fully functional online prototype of a medical image
retrieval system (MEDIX) with DICOM capabilities
64. Unsupervised ROI extraction from an image
(by Gustavo B. Borba, UTFPR, Brazil)
65. – I believe (but cannot prove…) that successful
VIR solutions will:
• combine content-based image retrieval (CBIR) with
metadata (high-level semantic-based image
retrieval)
• only be truly successful in narrow domains
• include the user in the loop
– Relevance Feedback (RF)
– Collaborative efforts (tagging, rating, annotating)
• provide friendly, intuitive interfaces
• incorporate results and insights from cognitive
science, particularly human visual attention,
perception, and memory
66. “Image search and retrieval” is not a problem,
but rather a collection of related problems that
look like one.
There is a great need for good solutions to
specific problems.
10 years after “the end of the early years”,
research in visual information retrieval still has
many open problems, challenges, and
opportunities.
67. Questions?
omarques@fau.edu