Evolving a Medical Image Similarity Search
Presented by Sujit Pal
Haystack 2018, Charlottesville, VA
April 10-11, 2018
| 2
• Early user of Solr at CNET before it was open-sourced
• Search at Healthline (consumer health)
 Lucene/Solr
 Taxonomy backed “Concept” Search
• Medical image classification at Elsevier
 Deep Learning / Caffe
 Machine Learning (Logistic Regression)
• Duplicate Image Detection
 Computer Vision / OpenCV, LIRE (Lucene Image Retrieval Engine)
 Deep Learning / Keras
• Medical Similarity Search
 Semantic rather than structural similarity
Background
| 3
• Ron Daniel
 Help with expertise in Computer Vision techniques
• Matt Corkum
 Caption based Image Search Platform
 Tooling and Integration for Image Search done against this platform
• Adrian Rosebrock
 PyImageSearch and OpenCV
• Doug Turnbull
 Elastic{ON} 2016 talk about Image Search
Acknowledgements
| 4
Image Search Workflow
• Internal application for image review and tagging
| 5
• Feature Extraction
 Converting images to feature vectors
• Indexing Strategies
 Represent vectors using (text based) search index
• Evaluation
 Search Quality metrics
Steps
| 6
• Global Features
 Color
 Texture (Edge)
• Quantize image
• Build Histogram
• Histogram is feature vector
• Descriptors
 RGB
 HSV
 Opponent
 CEDD
 FCTH
 JCD
Feature Extraction – Global Features
Image Credits: Shutterstock, 7-Themes.com, Kids Britannica, Pexels.com, and OpenCV Tutorials
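The quantize-then-histogram pipeline above can be sketched in a few lines of NumPy (a minimal illustration, not the LIRE implementation; the 8-bins-per-channel choice and L1 normalization are my assumptions):

```python
import numpy as np

def color_histogram(img, bins=8):
    """Quantize each RGB channel into `bins` levels, build the joint
    histogram, and use the flattened, normalized histogram as the
    image's feature vector."""
    # img: H x W x 3 uint8 array
    pixels = img.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    vec = hist.flatten()
    return vec / vec.sum()

# toy 4x4 all-black "image": every pixel falls in the first (dark) bin
img = np.zeros((4, 4, 3), dtype=np.uint8)
vec = color_histogram(img)
print(vec.shape)   # (512,) = 8 * 8 * 8 bins
print(vec[0])      # 1.0
```

The same recipe applies to texture descriptors: quantize a per-pixel response (e.g. edge orientation) instead of raw color.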
| 7
• Local Features
 Edges and Corners
 Scale Invariant Feature Transform (SIFT)
 Speeded up Robust Features (SURF)
 Difference of Gaussians (DoG)
• Tile image and compute features per tile
• Cluster features
Feature Extraction – Local Features
• Centroids are vocabulary words
• Image represented as histogram of vocab words
Image Credits: OpenCV Tutorials, ScienceDirect.com
| 8
Feature Extraction – Deep Learning Features
Image Credits: CAIS++, Distill.pub
• Deep Learning models outperform traditional models for CV tasks
• Lower layers work like edge and color detectors, higher layers like object
detectors
• Encodes semantics of image rather than just color, texture and shapes
• Learns transformation from image to vector as a series of convolutions
• Many high performing models trained on large image datasets available
| 9
Feature Extraction – Deep Learning Features (cont’d)
Image Credits: i-systems.github.io and ufldl.stanford.edu
• Deep Learning models are a sequence of convolution and pooling
operations
• Each successive layer applies a deeper stack of convolutions over a
larger part of the image (thanks to pooling)
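The two building blocks can be sketched directly in NumPy (a toy illustration; the step-edge image and the hand-picked vertical-edge kernel are my examples, whereas a real network learns its kernels):

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-d convolution (strictly, cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: each output cell summarizes a
    size x size patch, so later layers see a larger part of the image."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

# image with a vertical step edge, and a kernel that responds to it
img = np.zeros((6, 6)); img[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0]])
feat = max_pool(conv2d(img, kernel))
print(feat.shape)   # (3, 2): same edge information, coarser resolution
```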
| 10
• Idea of using convolutions for feature extraction not new to CV, e.g.,
used in Haar Cascades
• But traditional CV uses specific convolutions for a task to extract
features for that task
• Deep Learning starts with random convolutions and uses (image,
label) pairs to learn convolutions appropriate to task
Feature Extraction – Deep Learning Features (cont’d)
Image Credit: Greg Borenstein
| 11
Feature Extraction – Deep Learning Features (cont’d)
Image Credits: DeepLearning.net
• Image to vector transformation == sequence of learned convolutions and
pooling operations
• Remove classification layer from pre-trained network.
• Run images through truncated network to produce image vectors.
| 12
Indexing Strategies
• Naïve approaches
 Linear search – LIRE default
 Pre-compute K (approximate) nearest neighbors
• Text based indexes
 Index-able unit is document (stream of tokens from an alphabet)
 Image needs to be converted into a sequence of tokens from a “visual” alphabet
- Locality Sensitive Hashing (LSH)
- Metric Spaces Indexing
- Bag of Visual Words (BoVW)
• Text+Payload based indexes
 Represent vectors as payloads with custom similarity
• Tensor based indexes
 Supports indexing and querying of image feature vectors natively
 Uses approximate nearest neighbor techniques
 NMSLib – Non-Metric Space Library (ok for <= 1M vectors)
 FAISS – Facebook AI Similarity Search
• Hybrid indexes
 Vespa.ai – supports both text and tensor based queries
| 13
• Image vectors written out as “index0|score0 index1|score1 …”
• Query image vectorized and sparsified, then provided as a string
consisting of non-zero indices after sparsification, for example,
“index50 index54 index67”.
• Payload similarity implementation provided as Groovy script to
Elasticsearch 1.5 (ES) engine, returns cosine similarity
• Find similar images using the ES function_score_query
• Did not scale beyond a few hundred images in the index
• Recent ES versions require custom Java ScriptEngine
implementation registered as plugin, so probably better scaling now.
Indexing Strategy – Payloads + Custom Similarity
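The index-side and query-side transformations described above can be sketched as follows (the function names and the keep-top-N sparsification rule are my illustrative choices, not the production code):

```python
import numpy as np

def to_payload_string(vec, top=10):
    """Keep the `top` largest-magnitude components (sparsification) and
    render the vector in the 'index|score' payload format used at
    indexing time."""
    idx = np.argsort(-np.abs(vec))[:top]
    return " ".join(f"{i}|{vec[i]:.4f}" for i in sorted(idx))

def query_terms(vec, top=10):
    """The query side is just the non-zero indices after sparsification;
    the payload similarity script computes cosine similarity over the
    scores stored against the matching terms."""
    idx = np.argsort(-np.abs(vec))[:top]
    return " ".join(str(i) for i in sorted(idx))

vec = np.zeros(100); vec[[3, 42, 7]] = [0.9, 0.5, 0.1]
print(query_terms(vec, top=2))   # "3 42"
```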
| 14
• LSH – similar objects are hashed to the same bin.
• Assume image feature vectors V of rank R.
• Generate k values of vector Ai (also of rank R)
and bi from random normal distribution.
• Compute k values of hashes hi using the following
formula:
• If at least m of k hashes for a pair of images
match, then the images are near duplicates.
• No ranking of similarities possible.
• Good for finding near duplicates.
Indexing Strategy – Locality Sensitive Hashing
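A sketch of the hashing step, following the standard p-stable LSH scheme of h_i(V) = ⌊(A_i · V + b_i) / w⌋ (the bucket width w and the uniform draw for b_i are details of that scheme, assumed here since the slide's formula image was lost):

```python
import numpy as np

rng = np.random.default_rng(0)
R, k, w = 128, 16, 4.0            # vector rank, number of hashes, bucket width
A = rng.standard_normal((k, R))   # one random projection vector A_i per hash
b = rng.uniform(0, w, size=k)     # offsets b_i, uniform over [0, w)

def lsh_hashes(v):
    """h_i = floor((A_i . v + b_i) / w); nearby vectors tend to land in
    the same bucket for most of the k hashes."""
    return np.floor((A @ v + b) / w).astype(int)

v1 = rng.standard_normal(R)
v2 = v1 + 0.01 * rng.standard_normal(R)   # a near duplicate of v1
m = int(np.sum(lsh_hashes(v1) == lsh_hashes(v2)))
print(m)  # most of the k hashes match for near-duplicate vectors
```

Declaring the pair a near duplicate when m of the k hashes match is exactly the rule on the slide.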
| 15
• Also known as Perspective based Space Transformation
• Based on the idea that objects that are similar to a set of reference
objects are similar to each other.
• Randomly select k (≈ 2√N) images as reference objects RO
• Compute distance of each object from each reference image in RO
using the following distance formula:
• Posting list for each image is the m nearest reference objects
ordered by distance.
• Haven’t tried this, but looks promising.
Indexing Strategy – Metric Spaces Indexing
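The scheme above can be sketched as follows (the slide's specific distance formula did not survive extraction, so plain Euclidean distance stands in; data and m are toy values of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
N, R = 1000, 64
images = rng.standard_normal((N, R))

k = int(2 * np.sqrt(N))                       # ~2*sqrt(N) reference objects
ref_ids = rng.choice(N, size=k, replace=False)
refs = images[ref_ids]

def posting_list(v, m=5):
    """IDs of the m reference objects nearest to v, ordered by distance;
    this ordered list is what gets indexed as the image's 'text'."""
    d = np.linalg.norm(refs - v, axis=1)
    return ref_ids[np.argsort(d)[:m]]

pl = posting_list(images[0])
print(len(pl))  # 5
```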
| 16
• Briefly touched upon this when talking about Local Features
• Tile image, compute local descriptors (such as SIFT, SURF, etc) for
each tile
• Cluster these descriptors across all images
• Generate a vocabulary of Visual words out of the centroids of these
clusters
• Represent each image in index as a sequence of visual words
• During query, tile and compute local descriptors, then find the
closest words for each descriptor in vocabulary, and search using
this sequence of visual words.
• Used LIRE’s built-in support for generating a BoVW based index but
results not very encouraging.
Indexing Strategy – Bag of Visual Words (BoVW)
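Once the vocabulary (cluster centroids) exists, mapping an image's local descriptors to visual words is a nearest-centroid lookup; a minimal sketch with a toy 2-d descriptor space and hand-placed centroids (all values mine):

```python
import numpy as np

def to_visual_words(descriptors, centroids):
    """Map each local descriptor to the ID of its nearest centroid,
    giving the image's representation as a sequence of visual words."""
    # pairwise distances: (num_descriptors, num_words)
    d = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# toy vocabulary of 3 visual "words" in a 2-d descriptor space
centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc = np.array([[0.5, 0.1], [9.0, 1.0], [1.0, 9.0], [0.2, 0.3]])
words = to_visual_words(desc, centroids)
print(words.tolist())  # [0, 1, 2, 0]
```

The resulting word sequence is then indexed and searched exactly like text.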
| 17
• Produces approximate nearest neighbors
• Cluster image vectors into smaller clusters. Size of each cluster
should be chosen such that brute force KNN (with KD-Tree support
if available) is tractable
• For each cluster, compute K nearest neighbors for each image in
cluster
• Save ordered list of neighbor image IDs against each image
• At search time, the neighbors are simply looked up using the source
image ID
• Works well for my Similar Images functionality (closed system)
• For unknown query image, two step process to find the cluster and
then find K nearest neighbors
Indexing Strategy – Precompute K nearest neighbors
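The per-cluster brute-force step reduces to a pairwise-distance computation and a lookup table; a sketch on one cluster (sizes are toy values, and the clustering step itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(2)
vecs = rng.standard_normal((200, 32))   # vectors of one (small) cluster

def precompute_neighbors(vecs, k=5):
    """Brute-force K nearest neighbors for every image, stored as an
    ordered lookup table keyed by image ID; clustering first keeps
    each brute-force block this size and thus tractable."""
    d = np.linalg.norm(vecs[:, None, :] - vecs[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)         # exclude self-matches
    return {i: np.argsort(d[i])[:k].tolist() for i in range(len(vecs))}

table = precompute_neighbors(vecs)
print(len(table[0]))  # 5 -- search is now a dictionary lookup by image ID
```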
| 18
• Data Collection
 4 similarity levels (Essentially Identical, Very Similar, Similar, Different)
• Metrics
 Precision @k
 Mean Average Precision (MAP)
 Recall
 F1-score
 nDCG
 Correlation
Evaluation
| 19
• Similarity Page has a Reset Similarity button for each similar image.
• Default is Similar, overridden if needed and captured into logging
database
• About 2000 pairs (220 unique source images) captured using interface
Evaluation – Data Collection
| 20
• Essentially Identical and Very Similar count as a full hit (+1), Similar
counts as a half hit (+0.5), and Different as a miss (+0).
• Precision @k results
Evaluation – Precision @k
k precision
1 0.3287
3 0.1096
5 0.0657
10 0.0329
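The graded scoring rule above reduces to a short function (the label strings and the example result list are mine, for illustration):

```python
def precision_at_k(judgments, k):
    """Graded precision@k: Essentially Identical / Very Similar = 1.0,
    Similar = 0.5, Different = 0.0 -- the grading scheme from the deck."""
    gains = {"identical": 1.0, "very_similar": 1.0,
             "similar": 0.5, "different": 0.0}
    return sum(gains[j] for j in judgments[:k]) / k

# hypothetical judged results for one query image, best-ranked first
results = ["very_similar", "similar", "different", "identical", "different"]
print(precision_at_k(results, 3))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```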
| 21
• Distance Metric: Cosine Similarity
• Features used:
 Baseline: LIRE Global Features
 Best: vectors from Xception
Evaluation – Correlation Results
Metric Baseline Xception
Pearson -0.102 -0.566
Spearman -0.071 -0.495
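Note the correlations are negative because a *distance* is being compared against human similarity grades: larger distances should mean lower grades. A sketch with hypothetical grades and distances (all numbers mine):

```python
import numpy as np

# hypothetical human similarity grades (higher = more similar)
human = np.array([3, 3, 2, 1, 0, 2, 1, 0], dtype=float)
# hypothetical model distances for the same pairs (higher = less similar)
dist = np.array([0.1, 0.2, 0.4, 0.7, 0.9, 0.3, 0.8, 0.95])

pearson = np.corrcoef(human, dist)[0, 1]
print(pearson < 0)  # True: a good feature gives strongly negative correlation
```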
| 22
Future Work
• Include captions for image search
• We tried word2vec and skip-thoughts to generate caption vectors, but they
did not yield an appreciable improvement
• Two stage search, caption search + refine with image, or vice versa
• Investigate metric spaces indexing approach
• Investigate dimensionality reduction – since curse of dimensionality seems
to be a common issue mentioned in computer vision literature
• Investigate using indexing approaches that allow tensor search
• Incorporate outputs of multiple classifiers to create faceted search
functionality that can be overlaid on results
• By genre – radiology, data graphics, microscopy, etc.
• By anatomical part
• By specialty
• By keywords in caption
• By concepts in caption
My contact information:
sujit.pal@elsevier.com
Thank you!
