On-the-fly Visual Category Search in Web-scale Image Collections
Ken Chatfield - University of Oxford
May 2015
Motivation and Objectives
• Search large unannotated datasets of 1M+ images for object categories
• Do so in real time and without any prior knowledge of the categories to be searched for
'Regular' Category Retrieval
Pre-trained CNN (e.g. AlexNet) → SOFTMAX over the 1,000 ILSVRC classes
Example queries: car? lion? apple? bus?
On-the-fly Category Retrieval
Pre-trained CNN (e.g. AlexNet) → fc7 features → OTF classifier (e.g. linear SVM), trained on training data from the web
Proposed Solution
• Bootstrap training using images from the web
• Use highly compact ConvNet features plus compression as the basis of an OTF system
• Plus: a novel GPU architecture for iterative on-the-fly learning
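Architecture Outline
• Query (e.g. "Car") → training images sourced from Google Image Search → image encoder $\phi(I)$ → positive features $\phi(I^+)$
• Fixed negative pool of precomputed features $\phi(I^-)$
• Linear SVM trained on $\phi(I^+)$ vs. $\phi(I^-)$ → model $w$
• Target dataset (Flickr, Pinterest, etc.) with precomputed features $\phi(I_t)$ → ranking by $w^\top \phi(I_t)$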
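The outline above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the image encoder is replaced by synthetic random features, the linear SVM is trained with plain full-batch subgradient descent on the L2-regularised hinge loss (a swapped-in solver), and the names `train_linear_svm` and `rank_dataset`, as well as all parameter values, are hypothetical.

```python
import numpy as np


def train_linear_svm(pos, neg, lam=1e-2, lr=0.1, epochs=200):
    """Fit w (last entry is the bias) on positives (+1) vs. a fixed negative pool (-1).

    Full-batch subgradient descent on the L2-regularised hinge loss
        lam / 2 * ||w||^2 + mean_i max(0, 1 - y_i * w^T x_i)
    (the bias is regularised too, which is acceptable for a sketch).
    """
    X = np.vstack([pos, neg])
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # constant feature acts as the bias
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        active = y * (X @ w) < 1  # only margin violators contribute to the subgradient
        grad = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * grad
    return w


def rank_dataset(w, feats):
    """Score each target feature phi(I_t) by w^T phi(I_t); return indices best-first.

    This is one matrix-vector product: O(N * D) for N images of dimension D.
    """
    scores = feats @ w[:-1] + w[-1]
    return np.argsort(-scores), scores


# Synthetic stand-ins for encoder outputs (hypothetical data, not real CNN features).
rng = np.random.default_rng(0)
D = 128
shift = 4.0 * rng.normal(size=D)        # where the query category sits in feature space
pos = rng.normal(size=(50, D)) + shift  # phi(I+): web-sourced training images
neg = rng.normal(size=(500, D))         # phi(I-): fixed negative pool
target = rng.normal(size=(1000, D))     # phi(I_t): precomputed target-dataset features
target[:10] += shift                    # ten hidden positives at indices 0..9

w = train_linear_svm(pos, neg)
order, scores = rank_dataset(w, target)
print(order[:10])
```

Because the negative pool and the target features are precomputed, only the positives change per query; in this sketch that means the per-query work is training on a few hundred vectors plus a single scoring pass over the target set.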
Need for Speed
Ranking is the most critical stage: $w^\top \phi(I_t)$ must be computed for every image feature in the dataset, giving complexity $O(ND)$, where N is the number of images in the test set and D is the dimensionality of the image representation. It is therefore important to reduce the dimensionality of the representation:
• Obtain a 128-D representation from the CNN (488 MB / 1M images)
• Then compress further using binarization (122 MB / 1M images)
• Or using product quantization (30.5 MB / 1M images)
Fast Ranking = Compact Representation
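The storage figures above are consistent with counting bytes per image and reporting sizes in MiB (2^20 bytes). In the sketch below, the 1024-bit binary code length and the 32-byte PQ code are back-calculated from the 122 MB and 30.5 MB figures; they are assumptions, not values stated on the slides, and `index_size_mib` is a hypothetical helper.

```python
MIB = 2**20      # sizes on the slide appear to be in MiB (2^20 bytes)
N = 1_000_000    # images in the target dataset


def index_size_mib(bytes_per_image, n_images=N):
    """Memory for one stored code per image, in MiB."""
    return n_images * bytes_per_image / MIB


codes = {
    "128-D float32 feature": 128 * 4,                   # 512 bytes
    "binary code, assuming M = 1024 bits": 1024 // 8,   # 128 bytes
    "PQ code, assuming 32 one-byte sub-codes": 32,      # 32 bytes
}
for name, nbytes in codes.items():
    print(f"{name}: {nbytes} B/image -> {index_size_mib(nbytes):.1f} MiB")
# prints 488.3, 122.1 and 30.5 MiB respectively

# Ranking cost is O(N * D): one multiply-add per feature dimension per image.
print(f"multiply-adds per query at D = 128: {N * 128:,}")
```

The same arithmetic gives about 7.63 GiB for 2048-D float32 features (8,192 bytes per image), matching the memory figure quoted for CNN-M 2048 in the results table.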
Lower-dimensional Features
Taking the CNN-M network as base:
conv1 96x7x7 → conv2 256x5x5 → conv3 512x3x3 → conv4 512x3x3 → conv5 512x3x3 → fc6 4096-D (dropout) → fc7 4096-D (dropout) → ILSVRC softmax
Replace the 4096-D fc layer with 2048-D or 128-D layers
mAP on PASCAL VOC 2007 by feature dimensionality: 4096-D 79.89, 2048-D 80.10, 1024-D 79.91, 128-D 78.60
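Compression
• Binarization by embedding into Hamming space:
  $e: \mathbb{R}^D \to \mathbb{B}^M, \qquad b_i = \operatorname{sgn}(U x_i)$
  where $M > D$ and the $M \times D$ matrix $U$ is obtained by taking the first D columns of the orthogonal factor Q from the QR decomposition of a random $M \times M$ matrix
• Product quantization
  [Figure: product quantization schematic; a D-dimensional vector is split into sub-vectors of dimension d, each encoded by a sub-quantizer Q]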
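Both compression schemes above can be sketched with NumPy. The binarizer follows the construction on the slide (M = 256 is an arbitrary choice here). The product quantizer is trained with a tiny hand-written k-means per sub-space, and scoring $w^\top x$ directly from PQ codes through an S x K lookup table is a standard trick for PQ codes that is assumed here rather than stated on the slides. The slides do not say how the SVM is applied to binary codes, so only the encoding is shown for that case. All function names and sizes are hypothetical.

```python
import numpy as np


def random_rotation_binarizer(D, M, rng):
    """U = first D columns of Q from the QR decomposition of a random M x M matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(M, M)))
    return Q[:, :D]  # M x D with orthonormal columns


def binarize(X, U):
    """b_i = sgn(U x_i) as +/-1 values (zero maps to +1); real systems pack these into bits."""
    return np.where(X @ U.T >= 0, 1, -1).astype(np.int8)


def _nearest(Xs, C):
    """Index of the nearest centroid in C for every row of Xs."""
    return ((Xs[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)


def train_pq(X, n_sub, n_centroids, rng, iters=10):
    """Hypothetical helper: a few Lloyd (k-means) iterations per sub-space."""
    d = X.shape[1] // n_sub
    codebooks = []
    for s in range(n_sub):
        Xs = X[:, s * d:(s + 1) * d]
        C = Xs[rng.choice(len(Xs), n_centroids, replace=False)]  # fancy indexing copies
        for _ in range(iters):
            assign = _nearest(Xs, C)
            for k in range(n_centroids):
                members = Xs[assign == k]
                if len(members):  # keep the old centroid if a cluster empties
                    C[k] = members.mean(axis=0)
        codebooks.append(C)
    return codebooks


def pq_encode(X, codebooks):
    """One uint8 sub-code per sub-space (so at most 256 centroids each)."""
    d = codebooks[0].shape[1]
    return np.stack([_nearest(X[:, s * d:(s + 1) * d], C)
                     for s, C in enumerate(codebooks)], axis=1).astype(np.uint8)


def pq_scores(w, codes, codebooks):
    """Approximate w^T x for every coded image via an S x K lookup table."""
    d = codebooks[0].shape[1]
    table = np.stack([C @ w[s * d:(s + 1) * d] for s, C in enumerate(codebooks)])
    return table[np.arange(len(codebooks)), codes].sum(axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 128))  # stand-in for 128-D CNN features
w = rng.normal(size=128)          # stand-in for a trained SVM model
U = random_rotation_binarizer(D=128, M=256, rng=rng)
B = binarize(X, U)                # 2000 x 256 codes
codebooks = train_pq(X, n_sub=16, n_centroids=64, rng=rng)
codes = pq_encode(X, codebooks)   # 2000 x 16 bytes
approx = pq_scores(w, codes, codebooks)
print(np.corrcoef(approx, X @ w)[0, 1])
```

With PQ, ranking reduces to S table lookups and additions per image instead of D multiply-adds, and the table itself is built once per query.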
Evaluation Dataset
• PASCAL VOC 2007: 10,000 annotated images
• MIRFLICKR-1M: 1M unannotated images
• The aim is to evaluate CNN features for real-world photo retrieval
• Both datasets are disjoint from ImageNet (on which the CNN was trained) and place less focus on fine-grained retrieval
• MIRFLICKR-1M is used as a set of distractors for the VOC 2007 classes
• Remove false negatives (MIRFLICKR images that in fact contain the query class) and evaluate Precision @ K, where K = 100
• Or evaluate Precision @ K over MIRFLICKR-1M directly
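Precision @ K as used above can be written in a few lines. This is a minimal sketch: `precision_at_k` is a hypothetical name, and modelling the removed false negatives as items skipped before taking the top K is an assumption about the protocol.

```python
def precision_at_k(ranking, positives, k=100, removed=frozenset()):
    """Fraction of the top-k ranked items that are positives.

    `removed` holds items excluded from evaluation, e.g. distractor images found to
    contain the query class (the "false negatives" removed before scoring).
    """
    kept = [item for item in ranking if item not in removed][:k]
    return sum(item in positives for item in kept) / k


ranking = ["a", "b", "c", "d", "e"]
print(precision_at_k(ranking, positives={"a", "c"}, k=4))                 # 0.5
print(precision_at_k(ranking, positives={"a", "c"}, k=2, removed={"b"}))  # 1.0
```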
Retrieval Results
Results for two sample classes over VOC + distractor data (retrieving ~500 images from within 1M images, i.e. true positives make up 0.05% of the dataset)
[Figure: precision vs. rank (top 100) for the classes Sheep and Motorbike, comparing CNN 2048, CNN 128, CNN 128 PQ, CNN 128 rpbin and FK 512]
• Sheep: CNN 128 reaches precision 0.32 @ 100
• Motorbike: CNN 128 reaches precision 0.77 @ 100
Retrieval Results
Method           VOC training    Google training   Memory (1M images)
CNN-M 2048       55.4   95.4     90.9              7.63 GB
CNN-M 128        51.0   95.1     92.3              488 MB
CNN-M 128 BIN    50.1   94.0     n/a               122 MB
CNN-M 128 PQ     50.5   94.6     92.1              30.5 MB
FV               29.3   80.5     n/a               312.8 GB
Freeform Queries
VOC vs. Google training (CNN 128):
• 'Chair': Prec. 0.92 @ 100 with VOC training, 0.86 @ 100 with Google training
• 'Train': Prec. 1.0 @ 100 with VOC training, 1.0 @ 100 with Google training
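Instances & Faces too
• Instances: a RootSIFT extractor computes local descriptors, $\psi(I) \to x_i$, which pass through a VQ encoder and a Hamming encoder. Each of the N training images is matched against every target-dataset image $\psi(I_t)$ with spatial verification, and the ranking takes the max score over the N training images.
• Faces: a face extractor, $\psi(I) \to I_f$, finds faces in the N training images; a pre-trained face CNN encodes them as $\phi(I_f^+)$. A linear SVM trained against a negative pool $\phi(I^-)$ gives the model $w$, which ranks the face tracks $\phi(I_t)$ in the target dataset.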
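The instance branch ranks each target image by its best match over the N training images ("take max"). A minimal sketch of just that fusion step follows; the per-pair match scores (from RootSIFT, VQ, Hamming encoding and spatial verification) are assumed to be given, and `fuse_max` is a hypothetical name.

```python
import numpy as np


def fuse_max(score_matrix):
    """score_matrix[q, t] is the match score of training image q against target image t.

    Each target image keeps its best score over the N training images.
    """
    return score_matrix.max(axis=0)


scores = np.array([[0.1, 0.9, 0.0],   # training image 1 vs. three target images
                   [0.7, 0.2, 0.0]])  # training image 2
fused = fuse_max(scores)              # [0.7, 0.9, 0.0]
order = np.argsort(-fused)            # [1, 0, 2]: target images, best first
```

Max fusion lets a target image rank highly if it matches any single training image well, which suits instance search where each web image may show the object from a different viewpoint.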
Live Demo
1. Landing page: the user enters a text query term and selects a search modality (e.g. 'forest' using object category search)
2. Querying: a live view shows images downloaded from Google Image search as they are used to construct a visual appearance model on-the-fly
3. Ranked results: a ranked list of visually matching images is displayed within 1 to 30 seconds of entering the cold query
The system can be tried out live over a dataset of 5M+ images sourced from BBC News footage at: http://varro3.robots.ox.ac.uk:9090
Question: how can we adapt the standard GPU ConvNet pipeline for on-the-fly search?
We want:
• simultaneous feature computation and model training
• highly parallel operation, using a GPU-bound architecture
ConvNet-based Architecture
• Libraries such as Caffe allow ConvNet features to be computed quickly, entirely on the GPU
• CPU frontend: training images for the query (e.g. "Sheep") are fetched from Google Image Search into an image buffer; a batch sampler (batch size B) takes B/2 positive RGB images per batch
• The B/2 negatives per batch come from the fixed negative pool of precomputed CNN features
• GPU backend: the positive images pass through the conv stack and fc stack to produce CNN features, and an SVM loss layer updates the model $w$ using
  $\nabla = \frac{1}{B} \sum_{i=1}^{B} \mathbb{1}\left[ y_i\, w^\top x_i < 1 \right] y_i\, x_i$
• Every τ seconds, an inner product layer applies the current model $w$ to the precomputed CNN features of the target dataset (MIRFLICKR) to rank it
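The batch-wise training loop described above can be simulated on the CPU with NumPy. This is a sketch under stated assumptions, not the Caffe implementation: the positive features are synthetic and precomputed (the real system computes them from RGB images on the GPU), τ is counted in iterations rather than seconds, a small L2 shrinkage term is added, and `otf_train_and_rank` and all parameter values are hypothetical. Like the slide's update, it has no bias term.

```python
import numpy as np


def otf_train_and_rank(pos_feats, neg_pool, target, B=32, lr=0.05, lam=1e-3,
                       n_iters=300, tau=50, seed=0):
    """Minibatch linear SVM with periodic re-ranking of the target set.

    Each iteration samples B/2 positives and B/2 negatives and moves w along
        (1/B) * sum_i 1[y_i w^T x_i < 1] * y_i * x_i
    (the update term on the slide, i.e. minus the hinge-loss subgradient),
    plus a small L2 shrinkage (an assumption). Every `tau` iterations the whole
    target set is scored with the current w, like the inner product layer.
    """
    rng = np.random.default_rng(seed)
    half = B // 2
    y = np.concatenate([np.ones(half), -np.ones(half)])
    w = np.zeros(pos_feats.shape[1])
    rankings = []
    for it in range(1, n_iters + 1):
        X = np.vstack([pos_feats[rng.integers(len(pos_feats), size=half)],
                       neg_pool[rng.integers(len(neg_pool), size=half)]])
        active = y * (X @ w) < 1
        step = (y[active][:, None] * X[active]).sum(axis=0) / B
        w += lr * (step - lam * w)
        if it % tau == 0:
            rankings.append(np.argsort(-(target @ w)))
    return w, rankings


# Hypothetical features: positives and negatives sit on opposite sides of the
# origin, since this update (like the slide's) has no bias term.
rng = np.random.default_rng(1)
D = 128
shift = 0.5 * rng.normal(size=D)
pos = rng.normal(size=(30, D)) + shift       # web images for the query
neg = rng.normal(size=(2000, D)) - shift     # fixed negative pool
target = rng.normal(size=(5000, D)) - shift  # target dataset...
target[:20] += 2 * shift                     # ...with 20 hidden positives at indices 0..19

w, rankings = otf_train_and_rank(pos, neg, target)
print(len(rankings), np.mean(rankings[-1][:20] < 20))
```

Keeping the negatives as precomputed features means only half of each batch has to go through the conv and fc stacks, and re-ranking is a single matrix-vector product that can run between training iterations.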
Retrieval Results
• Images are fed into the network at a rate of 12 per second
• The dataset is ranked with the current model every ~0.2 seconds
• Most rankings stabilise in under 1 second
[Figure: Precision@100 vs. time (0 to 2 seconds) for models trained with 10, 20 and 30 images; annotated with sofa (0.15 s), sheep (0.36 s), bus (0.54 s) and horse (0.73 s)]
Continued Work
Currently working on the following extensions:
• Selecting negative training images more intelligently (e.g. selecting the most discriminative negative images per query from a larger 1M+ pool of non-class images)
• Establishing a confidence measure for images in the output ranking, so as to know when a query works well or not, and to source training images more intelligently
• Query attribute refinement (e.g. sporty + car)
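One generic way to make "most discriminative negatives" concrete is hard-negative mining: score a large non-class pool with the current model and keep the highest-scoring images for the next round of training. The sketch below shows only that generic idea; the slides do not say which selection method is being investigated, and `select_hard_negatives` is a hypothetical name.

```python
import numpy as np


def select_hard_negatives(w, neg_pool, k):
    """Indices of the k pool images that the current model w scores highest,
    i.e. the non-class images it most readily confuses with the query."""
    scores = neg_pool @ w
    return np.argsort(-scores)[:k]


w = np.array([1.0, 0.0])
pool = np.array([[0.1, 5.0], [2.0, 0.0], [-1.0, 0.0], [0.5, 1.0]])
print(select_hard_negatives(w, pool, k=2))  # [1 3]
```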
Related Publications
• "On-the-fly Learning for Visual Search of Large-scale Image and Video Datasets". Ken Chatfield, Relja Arandjelovic, Omkar Parkhi, Andrew Zisserman. IJMIR 2015
• "Efficient On-the-fly Category Retrieval using ConvNets and GPUs". Ken Chatfield, Karen Simonyan, Andrew Zisserman. ACCV 2014
• "Return of the Devil in the Details: Delving Deep into Convolutional Nets". Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman. BMVC 2014 (Best Paper Prize)
http://www.robots.ox.ac.uk/~ken

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

On-the-fly Visual Category Search in Web-scale Image Collections

  • 1. On-the-fly Visual Category Search in Web-scale Image Collections. Ken Chatfield, University of Oxford, May 2015
  • 2. Motivation and Objectives: • Search large unannotated datasets of 1M+ images for object categories • Do so in real time and without any prior knowledge
  • 3. ‘Regular’ Category Retrieval: a pre-trained CNN (e.g. AlexNet) with a softmax over the 1,000 ILSVRC classes
  • 5. On-the-fly Category Retrieval: training data from the web is passed through a pre-trained CNN (e.g. AlexNet), and its fc7 features are used to train an on-the-fly (OTF) classifier, e.g. a linear SVM
  • 7. Proposed Solution: • Bootstrap training using images from the web • Use highly compact ConvNet features plus compression as the basis of an OTF system • Plus: a novel GPU architecture for iterative on-the-fly learning
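  • 8. Architecture Outline: for a text query (e.g. ‘Car’), training images are sourced from Google Image Search and passed through an image encoder φ(I) to give positive features φ(I+). A linear SVM is trained against a fixed negative pool of precomputed features φ(I-), giving a weight vector w. Images in the target dataset (e.g. Flickr, Pinterest), whose features φ(I_t) are also precomputed, are then ranked by wᵀφ(I_t).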
  • 9. Need for Speed: in the same pipeline, ranking is the most critical stage, since every target-dataset feature φ(I_t) must be scored as wᵀφ(I_t).
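The ranking stage is a single matrix-vector product followed by a top-k selection. The sketch below is a minimal illustration, assuming the target-dataset features are held as a dense float32 N × D matrix; the function name `rank_dataset` and `top_k=100` are hypothetical choices, not taken from the talk.

```python
import numpy as np

def rank_dataset(w, features, top_k=100):
    """Score every row of `features` (N x D) by w^T phi(I_t) and return the
    indices of the top_k highest-scoring images, best first."""
    scores = features @ w                      # one inner product per image: O(N * D)
    k = min(top_k, scores.shape[0])
    # argpartition isolates the k largest scores in O(N); only those k are then sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]

# Usage on random stand-in data (hypothetical sizes, not the talk's dataset).
rng = np.random.default_rng(0)
features = rng.standard_normal((10_000, 128)).astype(np.float32)
w = rng.standard_normal(128).astype(np.float32)
ranking = rank_dataset(w, features, top_k=100)
```

Because the product touches every stored number once, halving D roughly halves ranking time, which motivates the compact representations that follow.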
  • 10. Fast Ranking = Compact Representation: ranking must compute wᵀx for every image feature in the dataset, giving complexity O(ND) (N: number of images in the test set; D: dimensionality of the image representation), so it is important to reduce the dimensionality of the image representation: • Obtain a 128-D representation from the CNN (488 MB / 1M images) • Then compress further using binarization (122 MB / 1M images) • Or using product quantization (30.5 MB / 1M images)
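The storage figures on this slide can be checked with simple arithmetic. The slide does not state the code sizes for binarization or PQ; the sketch below assumes a 1,024-bit binary code (M = 8D) and 32 one-byte PQ sub-quantizer indices, because those are the sizes that reproduce 122 MB and 30.5 MB. The "MB" figures match binary megabytes (MiB).

```python
def dataset_mib(bytes_per_image, n_images=1_000_000):
    """Storage for n_images codes of the given size, in MiB (2**20 bytes)."""
    return bytes_per_image * n_images / 2**20

D = 128
float_bytes = D * 4        # float32 128-D descriptor: 512 bytes per image
binary_bytes = 1024 // 8   # assumed M = 1024-bit code (8 x D): 128 bytes per image
pq_bytes = 32              # assumed 32 sub-quantizers x 1 byte (256 centroids each)

for name, b in [("CNN 128 float", float_bytes), ("binarized", binary_bytes), ("PQ", pq_bytes)]:
    print(f"{name:14s} {b:4d} B/image  {dataset_mib(b):7.1f} MiB / 1M images")

# Ranking cost O(ND): multiply-adds needed to score 1M images once.
print("multiply-adds per ranking:", 1_000_000 * D)
```

The same formula gives 2048 × 4 bytes × 1M ≈ 7.63 GiB for the uncompressed 2048-D features.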
  • 11. Lower-dimensional Features: taking the CNN-M network as base: conv1 96x7x7, conv2 256x5x5, conv3 512x3x3, conv4 512x3x3, conv5 512x3x3, fc6 4096-D (dropout), fc7 4096-D (dropout), ILSVRC softmax
  • 12. Lower-dimensional Features: replace the 4096-D fc layers with 2048-D and 128-D layers. With conv1 to conv5 unchanged: fc6 2048-D (dropout), fc7 2048-D (dropout), ILSVRC softmax
  • 13. Lower-dimensional Features: replace the 4096-D fc layers with 2048-D and 128-D layers. With conv1 to conv5 unchanged: fc6 128-D (dropout), fc7 128-D (dropout), ILSVRC softmax
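  • 15. Compression: • Binarization by embedding into Hamming space, $e: \mathbb{R}^D \to \mathbb{B}^M$, with $\mathbf{b}_i = \operatorname{sgn}(U\mathbf{x}_i)$, where $M > D$ and $U$ is obtained by taking the first $D$ columns of the QR decomposition (the orthogonal factor $Q$) of a random $M \times M$ matrix • Product Quantization: [Figure: a $D$-dimensional vector split into $S$ sub-vectors of dimension $d$, each quantized by a quantizer $Q$]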
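Both compression schemes can be sketched in a few lines of NumPy. The binarization follows the slide's recipe (U = first D columns of Q from a random M × M QR decomposition, b_i = sgn(Ux_i)). The slide does not say how w is applied to compressed codes, so the scoring functions here are assumptions: for binary codes we use (Uw)ᵀsgn(Ux), which approximates wᵀx because UᵀU = I; for PQ we use per-subspace lookup tables of centroid inner products. The PQ codebooks are learned with plain k-means; all sizes are illustrative.

```python
import numpy as np

def make_embedding(D, M, rng):
    """U: first D columns of Q from the QR decomposition of a random M x M matrix."""
    assert M > D
    Q, _ = np.linalg.qr(rng.standard_normal((M, M)))
    return Q[:, :D]                           # M x D with orthonormal columns

def binarize(X, U):
    bits = (X @ U.T) >= 0                     # sgn(U x_i), with 0 mapped to +1
    return np.packbits(bits, axis=1)          # N x (M / 8) bytes

def score_binary(codes, U, w, M):
    # Assumed asymmetric scoring: w^T x = (U w)^T (U x), with U x replaced by its signs.
    signs = np.unpackbits(codes, axis=1, count=M).astype(np.float32) * 2 - 1
    return signs @ (U @ w)

def train_pq(X, S, K, iters, rng):
    """K centroids per sub-space via plain k-means; returns an S x K x d array."""
    N, D = X.shape
    d = D // S
    books = []
    for s in range(S):
        sub = X[:, s * d:(s + 1) * d]
        C = sub[rng.choice(N, K, replace=False)].copy()
        for _ in range(iters):
            assign = ((sub[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
            for k in range(K):
                if np.any(assign == k):
                    C[k] = sub[assign == k].mean(0)
        books.append(C)
    return np.stack(books)

def encode_pq(X, books):
    S, K, d = books.shape
    codes = np.empty((X.shape[0], S), dtype=np.uint8)   # assumes K <= 256
    for s in range(S):
        sub = X[:, s * d:(s + 1) * d]
        codes[:, s] = ((sub[:, None, :] - books[s][None]) ** 2).sum(-1).argmin(1)
    return codes

def score_pq(codes, books, w):
    S, K, d = books.shape
    # table[s, k] = inner product of the s-th slice of w with centroid k of sub-space s.
    table = np.einsum("skd,sd->sk", books, w.reshape(S, d))
    return table[np.arange(S), codes].sum(1)            # sum of S table lookups per image

# Usage on random stand-in data (hypothetical sizes).
rng = np.random.default_rng(0)
D, M, S, K = 32, 64, 8, 16
X = rng.standard_normal((2000, D)).astype(np.float32)
w = rng.standard_normal(D).astype(np.float32)
U = make_embedding(D, M, rng)
bin_scores = score_binary(binarize(X, U), U, w, M)
books = train_pq(X, S, K, iters=10, rng=rng)
pq_codes = encode_pq(X, books)
pq_scores = score_pq(pq_codes, books, w)
exact = X @ w
```

With lookup tables, PQ scoring costs S additions per image instead of D multiply-adds, on top of the storage saving.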
  • 16. Evaluation Dataset: PASCAL VOC 2007 (10,000 annotated images) and MIRFLICKR-1M (1M unannotated images) • We want to evaluate CNN features for real-world photo retrieval • Both datasets are disjoint from ImageNet (on which the CNN was trained) and place less focus on fine-grained retrieval
  • 17. Evaluation Dataset: the MIRFLICKR-1M dataset is used as distractors
  • 18. Evaluation Dataset: using the MIRFLICKR-1M dataset as distractors, either remove false negatives and evaluate Precision @ K (where K = 100), or evaluate Precision @ K over MIRFLICKR-1M directly
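Precision @ K with false-negative removal can be sketched as follows. We assume "remove false negatives" means discarding distractor images that turn out to contain the query class (they are unlabelled, so they would otherwise count as errors); they are dropped from the ranking before counting. The function name and the `ignore` parameter are hypothetical.

```python
def precision_at_k(ranking, positives, k=100, ignore=frozenset()):
    """Fraction of the first k retained results that are true positives.

    ranking:   image ids, best first
    positives: ids annotated as the query class
    ignore:    distractor ids found to contain the class; skipped, not penalised
    """
    kept = [i for i in ranking if i not in ignore][:k]
    return sum(1 for i in kept if i in positives) / k

# Usage with hypothetical ids: "d1" is a distractor that actually shows the class.
ranking = ["a", "d1", "b", "d2", "c"]
positives = {"a", "b", "c"}
print(precision_at_k(ranking, positives, k=4))                 # 0.5
print(precision_at_k(ranking, positives, k=4, ignore={"d1"}))  # 0.75
```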
  • 19. Retrieval Results: results for two sample classes over VOC + distractor data (retrieving ~500 images from within 1M images, so true positives are 0.05% of the dataset). [Figure: precision vs. rank (up to 100) for the classes Sheep and Motorbike, comparing CNN 2048, CNN 128, CNN 128 PQ, CNN 128 rpbin and FK 512.] CNN 128: Prec. 0.32 @ 100 (Sheep) and Prec. 0.77 @ 100 (Motorbike)
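  • 21. Retrieval Results (slide column groups: VOC Training, two score columns; Google Training, one score column; memory for 1M images):
    Method         VOC training   Google training   Memory
    CNN-M 2048     55.4   95.4    90.9              7.63 GB
    CNN-M 128      51.0   95.1    92.3              488 MB
    CNN-M 128 BIN  50.1   94.0    n/a               122 MB
    CNN-M 128 PQ   50.5   94.6    92.1              30.5 MB
    FV             29.3   80.5    n/a               312.8 GB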
  • 24. VOC vs Google Training (CNN 128, Prec. @ 100): ‘Chair’: 0.92 with VOC training, 0.86 with Google training; ‘Train’: 1.0 with both
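  • 25. Instances & Faces too: • Instances: RootSIFT descriptors are extracted from the N training images (ψ(I) → x_i) and passed through a VQ encoder and a Hamming encoder; target-dataset images ψ(I_t) are matched against each training image with spatial verification, and the dataset is ranked by taking the max over the N training images • Faces: a face extractor crops faces from the N training images (ψ(I) → I_f), a pre-trained face CNN encodes them as φ(I_f+), a linear SVM is trained against a negative pool φ(I-) to give w, and face tracks φ(I_t) in the target dataset are ranked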
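For instance search, each target image is scored against all N training images and the scores are combined by taking the maximum. A minimal sketch of that aggregation step, assuming a precomputed matrix of per-pair match scores (for example, counts of spatially verified matches, which is an assumption; the slide does not name the score). The function name is hypothetical.

```python
def rank_by_max_score(match_scores):
    """match_scores[q][t]: score of training image q against dataset image t.

    Each dataset image keeps its best score over the N training images,
    then images are sorted best first. Returns (ranking, per-image scores).
    """
    n_targets = len(match_scores[0])
    best = [max(row[t] for row in match_scores) for t in range(n_targets)]
    order = sorted(range(n_targets), key=lambda t: best[t], reverse=True)
    return order, best

# Usage: two training images, three dataset images (hypothetical scores).
order, best = rank_by_max_score([[3, 0, 1], [0, 5, 1]])
print(order)  # [1, 0, 2]
```

Taking the max means a dataset image only needs to match one training image well, which suits queries whose web results show the object from varied viewpoints.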
  • 26. Live Demo: 1. Landing page: the user enters a text query term and selects a search modality (e.g. ‘forest’ using object category search) 2. Querying: a live view shows the images downloaded from Google Image Search as they are used to construct a visual appearance model on-the-fly 3. Ranked results: a ranked list of visually matching images is displayed within 1 to 30 seconds of entering the cold query. The system can be tried out live over a dataset of 5M+ images sourced from BBC News footage at: http://varro3.robots.ox.ac.uk:9090
  • 27. ConvNet-based Architecture: Question: how can we adapt the standard GPU ConvNet pipeline for on-the-fly search? Libraries such as Caffe allow for fast computation of ConvNet features entirely on the GPU. We want: • simultaneous feature computation and model training • highly parallel operation, by using a GPU-bound architecture
  • 28. ConvNet-based Architecture: for a query (e.g. ‘Sheep’), Google Image Search training images pass as RGB through the conv stack and fc stack to give CNN features, and an SVM model w is trained against a fixed negative pool of precomputed CNN features
  • 29. ConvNet-based Architecture: the CPU frontend collects Google Image Search training images (e.g. for ‘Sheep’) in an image buffer. A batch sampler (batch size B) draws B/2 positive RGB images, which the GPU backend passes through the conv and fc stacks, and B/2 negatives from the fixed negative pool of precomputed CNN features. An SVM loss layer then computes $\nabla = \frac{1}{B}\sum_{i=1}^{B} \mathbb{I}[y_i \mathbf{w}^\top \mathbf{x}_i < 1]\, y_i \mathbf{x}_i$
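One balanced minibatch update of the SVM loss layer can be sketched in NumPy. The slide's ∇ equals minus the subgradient of the mean hinge loss, so adding it to w is a descent step. The sketch assumes positive features are already computed (in the talk they come from the conv and fc stacks on the GPU), and the fixed learning rate and L2 weight `lam` are assumptions not stated on the slide; a decaying step size would be more typical.

```python
import numpy as np

def svm_step(w, pos_feats, neg_pool, B, lr, lam, rng):
    """One update with B/2 sampled positives and B/2 sampled negatives.

    direction = (1/B) * sum_i 1[y_i w^T x_i < 1] * y_i * x_i  (the slide's expression)
    """
    half = B // 2
    pos = pos_feats[rng.integers(0, len(pos_feats), half)]
    neg = neg_pool[rng.integers(0, len(neg_pool), half)]
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(half), -np.ones(half)])
    active = (y * (X @ w)) < 1                # samples inside the margin
    direction = (active * y) @ X / B          # (1/B) * sum of y_i x_i over active samples
    return w + lr * (direction - lam * w)     # descent step with assumed L2 shrinkage

# Usage with synthetic stand-ins for web positives and the precomputed negative pool.
rng = np.random.default_rng(0)
D = 128
neg_pool = rng.standard_normal((5000, D))
pos_feats = rng.standard_normal((30, D)) + 0.5
w = np.zeros(D)
for _ in range(200):
    w = svm_step(w, pos_feats, neg_pool, B=32, lr=0.1, lam=1e-3, rng=rng)
```

Sampling half of each batch from the positives keeps the classes balanced even though the negative pool is far larger than the handful of downloaded images.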
  • 31. ConvNet-based Architecture: the CPU frontend fills an image buffer with Google Image Search training images (e.g. for ‘Sheep’); the batch sampler (batch size B) sends B/2 positive RGB images through the conv and fc stacks on the GPU backend, together with B/2 negative precomputed CNN features from the fixed negative pool, to the SVM loss layer, which updates the model w. Every τ seconds, an inner product layer ranks the target dataset (MIRFLICKR), using its precomputed CNN features, with the current w.
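The interleaving of training and periodic re-ranking can be sketched as a simple loop. This is a single-threaded illustration under stated assumptions: `train_step`, `rank` and `clock` are hypothetical callables standing in for a GPU training batch, the inner-product ranking and a time source; the real system presumably runs these concurrently.

```python
def run_on_the_fly(steps, train_step, rank, tau, clock):
    """After each training step, re-rank the target dataset if at least
    `tau` seconds have passed since the previous ranking."""
    last = clock()
    rankings = []
    for _ in range(steps):
        train_step()
        now = clock()
        if now - last >= tau:
            rankings.append(rank())
            last = now
    return rankings

# Usage with a fake clock that advances 0.0625 s per call (exact in binary floating point).
t = [0.0]
def fake_clock():
    t[0] += 0.0625
    return t[0]

results = run_on_the_fly(20, train_step=lambda: None, rank=lambda: "ranking",
                         tau=0.25, clock=fake_clock)
print(len(results))  # 5
```

Injecting the clock keeps the scheduling logic testable; with a real system one would pass `time.monotonic`.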
  • 32. Retrieval Results: • Images are fed into the network at a rate of 12 per second • The dataset is ranked with the current model every ~0.2 seconds • Most rankings stabilise in under 1 second [Figure: Precision@100 vs. time in seconds for 10, 20 and 30 training images; snapshots at 0.15 s, 0.36 s, 0.54 s and 0.73 s; classes: sofa, sheep, bus, horse]
  • 43. Continued Work: currently working on the following extensions: • How to select negative training images more intelligently (e.g. selecting the most discriminative negative images per query from a larger pool of 1M+ non-class images) • How to establish a confidence measure for images in the output ranking, so that the system knows when a query works well or not and can source training images more intelligently • Query attribute refinement (e.g. sporty + car)
  • 44. Related Publications:
    “On-the-fly Learning for Visual Search of Large-scale Image and Video Datasets”, Ken Chatfield, Relja Arandjelovic, Omkar Parkhi, Andrew Zisserman. IJMIR 2015.
    “Efficient On-the-fly Category Retrieval using ConvNets and GPUs”, Ken Chatfield, Karen Simonyan, Andrew Zisserman. ACCV 2014.
    “Return of the Devil in the Details: Delving Deep into Convolutional Nets”, Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman. BMVC 2014 (Best Paper Prize).
    http://www.robots.ox.ac.uk/~ken