Deep Residual Hashing
Neural Network ​
for Image Retrieval
Edwin Efraín Jiménez Lepe
Thesis Advisor:
Dr. Andrés Méndez Vázquez
CINVESTAV, Unidad Guadalajara, Jalisco.
October 30th, 2017
Master Thesis Defense
Agenda
◎ Introduction
◎ Theoretical Framework
◎ Proposal
◎ Experimental Work and Results
◎ Conclusion and Future work
◎ Bibliography
2
1.
Introduction
3
46,200
posts
uploaded
1.8 million
snaps
created
4.1 million
videos
viewed
15,000 GIFs
sent via
messenger
70,017
hours
watched
In 2017, every minute
Source: http://www.visualcapitalist.com/happens-internet-minute-2017/
Instagram Youtube Facebook
Messenger
Netflix Snapchat
How do we search?
Results from search in Google
5
Text based (1970)
Search for a specific
label
dog, black and white, art
Manual annotation of images
is expensive and subjective (Tamura,
1984) [1]
Content based (1990)
Search for visual
features such as texture
and color (Zheng, 2016) [2]
How do we search?
Results from search in Google
6
“A picture is worth a thousand words”
7
According to (Yoo, 2002) [3]:
◎ X = {x1, …, xN} is a database with N images
◎ F = {f1, …, fN}, where fi is the feature vector associated with xi
Let T represent a mapping from the image space onto the h-dimensional
feature space, i.e.,
T : x → f,
where x ∈ X and f ∈ F.
The similarity between two images xi and xj can be measured with a
similarity function d(fi, fj).
Content Based Image Retrieval (CBIR)
8
The problem of retrieval can be posed as follows:
Given a query image q, retrieve a subset of images M from X such that:
d(T(q), T(m)) ≤ t, for all m ∈ M,
where t is a user-defined threshold.
Alternatively, the user can ask the system to return, say, the top-k images.
Content Based Image Retrieval (CBIR)
9
CBIR: how does it work?
Operation of CBIR system. Extracted from
randomcodingtopics.files.wordpress.com/2012/05/rorissa_figure11.png
10
Online
Offline
What is an image?
A matrix
M ∈ ℤ^(h × w × c)
whose integer values lie
in the range [0, 255],
where h is the height,
w the width,
and c the number
of channels.
Image from: https://ujwlkarn.files.wordpress.com/2016/08/8-gif.gif?w=192&h=192&zoom=2
11
Where is CBIR used?
Medicine
Research: Search for
unusual patterns in patient
records.
Diagnostics: similar cases
can be found [5]
Biodiversity Information Systems [6]
users can retrieve images of similar
living beings that exist in a specific country, or
that share the same environment and have a
similar shape
12
Where is CBIR used?
Digital libraries [7]
Forensics [8]
Retrieval based on image content has
been used for criminal images, fingerprints,
and crime-scene images.
13
Where is CBIR used?
Online Shopping
14
How has CBIR changed over time?
15
Image from (Sheng, 2016) [9]
Motivation
16
Semantic Gap
“the lack of coincidence between the information that
one can extract from the visual data and the
interpretation that the same data have for a user in a
given situation” (Smeulders, 2000) [4]
Viewpoint and pose; intra-class and inter-class variability
Objectives
General
◎ Propose a CNN model to generate a feature based code for image
indexing.
17
Specific
◎ Improve the current precision of retrieval in CBIR systems using a
deep learning approach to generate codes for image indexing in a
supervised manner.
◎ Analyze the influence of model depth on the mean average precision
(mAP) of the retrieval task.
◎ Evaluate the influence of the generated code length on the mAP of the
retrieval task.
2.
Theoretical
Framework
18
Artificial Neural Network
19
Image from: http://cs231n.github.io/assets/nn1
Artificial Neural Network
Image from: http://cs231n.github.io/assets/nn1
Perceptron
Perceptron
Image from: https://d4datascience.files.wordpress.com/2016/09/600px-artificialneuronmodel_english.png
Convolutional Neural Network
23
A neural network with two special layers:
◎ Convolution
◎ Subsampling (pooling)
Image from: https://www.researchgate.net/profile/Konstantin_Pervunin
Convolution
24
Image from: https://i1.wp.com/timdettmers.com/wp-
content/uploads/2015/03/convolution.png?resize=500%2C193
Convolution
25
Input (f):        Kernel (h):    Output (g):
3 3 2 1 0         2 1 0          12 12 17
0 0 1 3 1         0 2 2          10 17 19
3 1 2 2 3         2 1 0           9  6 14
2 0 0 2 2
2 0 0 0 1

g(m, n) = f(m, n) ∗ h(m, n) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} f(k, l) h(m − k, n − l)
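This example can be checked with a direct implementation of 2-D 'valid' convolution in plain Python (note the 180° kernel flip, which distinguishes true convolution from cross-correlation):

```python
# 2-D 'valid' convolution: flip the kernel, then slide a windowed dot product.
def conv2d_valid(f, h):
    hf = [row[::-1] for row in h[::-1]]  # 180-degree flip of the kernel
    kh, kw = len(hf), len(hf[0])
    return [[sum(f[r + i][c + j] * hf[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(f[0]) - kw + 1)]
            for r in range(len(f) - kh + 1)]

f = [[3, 3, 2, 1, 0],
     [0, 0, 1, 3, 1],
     [3, 1, 2, 2, 3],
     [2, 0, 0, 2, 2],
     [2, 0, 0, 0, 1]]
h = [[2, 1, 0],
     [0, 2, 2],
     [2, 1, 0]]
g = conv2d_valid(f, h)
print(g)  # [[12, 12, 17], [10, 17, 19], [9, 6, 14]], matching the slide
```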
Convolution
26
Input → Output; border modes: ‘full’, ‘valid’, ‘same’
Image from: http://deeplearning.net/software/theano/_images/numerical_no_padding_no_strides.gif
g(m, n) = f(m, n) ∗ h(m, n) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} f(k, l) h(m − k, n − l)
outputDim = (f − h + 2·padding) / stride + 1
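The output-size formula can be checked with a one-line helper (integer sizes assumed):

```python
# Output spatial size of a convolution: (f - h + 2*padding) // stride + 1
def output_dim(f, h, padding, stride):
    return (f - h + 2 * padding) // stride + 1

print(output_dim(5, 3, 0, 1))   # 'valid' 3x3 conv on a 5x5 input -> 3
print(output_dim(32, 3, 1, 1))  # 'same' 3x3 conv on a 32x32 input -> 32
```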
Max-Pooling
27
Image from: https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-
content/uploads/2017/06/28133544/pooling.png
Input Output
Max-Pooling
28
Input:            Output:
23  4 16 90       32 90
12 32 12 45       12 56
 5  7  8  9
 2 12 14 56

y_kij = max_{(p,q) ∈ R_ij} x_kpq
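A sketch of 2×2 max-pooling with stride 2, reproducing the example above:

```python
# 2x2 max-pooling with stride 2: keep the maximum of each non-overlapping block.
def max_pool_2x2(x):
    return [[max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
             for c in range(0, len(x[0]), 2)]
            for r in range(0, len(x), 2)]

x = [[23,  4, 16, 90],
     [12, 32, 12, 45],
     [ 5,  7,  8,  9],
     [ 2, 12, 14, 56]]
print(max_pool_2x2(x))  # [[32, 90], [12, 56]]
```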
Activation functions
30
sigmoid(x) = 1 / (1 + e^(−x))
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Rectified Linear Unit (ReLU)
31
𝑦𝑖𝑗 = max (𝑥𝑖𝑗, 0)
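The three activation functions can be sketched in plain Python, with tanh written out explicitly:

```python
import math

def sigmoid(x):
    # 1 / (1 + e^(-x)): squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # (e^x - e^(-x)) / (e^x + e^(-x)): squashes into (-1, 1).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    # max(x, 0): passes positives through, zeroes out negatives.
    return max(x, 0.0)

print(sigmoid(0.0))  # 0.5
print(relu(-3.0))    # 0.0
```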
ResNet, (He, 2016) [10]
32
Residual Block
ResNet
33
3.
Proposal
How to retrieve images
34
Deep Residual Hashing Neural Network
(DRHN)
35
Residual Block
ReLU(x)ij = max(xij, 0)
Element-wise addition: Add(x, z)ij = xij + zij
Batch normalization (Ioffe, 2015) [11] = BN_γ,β(x)
y = Add(BN_γ2,β2(conv(BN_γ1,β1(ReLU(conv(x, w1))), w2)), x)
conv(f, h)(m, n) = f(m, n) ∗ h(m, n) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} f(k, l) h(m − k, n − l)
36
DRHN Architecture
Layer                                  Output dim.
Input                                  3×32×32
BN_γ,β(ReLU(conv(x, w1)))              16×32×32
Residual Group(n)                      16×32×32
Residual Group(n)*                     32×16×16
Residual Group(n)*                     64×8×8
Residual Group(n)                      64×8×8
Residual Group(n)                      64×8×8
Residual Group(n)*                     128×4×4
Average Pooling Layer                  128
Hash Layer                             H
Fully Connected Layer with Softmax     10
37
DRHN-1 Architecture
Layer
Input
y = BN_γ,β(ReLU(conv(input, w1)))
y_i = Add(BN_γ2i,β2i(conv(BN_γ2i−1,β2i−1(ReLU(conv(y_i−1, w_2i))), w_2i+1)), y_i−1),
for i = 1, …, 6, with y_0 = y
Average Pooling Layer
Hash Layer
Fully Connected Layer with Softmax
38
DRHN-2 Architecture
Layer
Input
y = BN_γ,β(ReLU(conv(input, w1)))
y_i = Add(BN_γ2i,β2i(conv(BN_γ2i−1,β2i−1(ReLU(conv(y_i−1, w_2i))), w_2i+1)), y_i−1),
for i = 1, …, 12, with y_0 = y
Average Pooling Layer
Hash Layer
Fully Connected Layer with Softmax
39
DRHN-n Architecture
Layer
Input
y = BN_γ,β(ReLU(conv(input, w)))
[ y = Add(BN_γ,β(conv(BN_γ,β(ReLU(conv(y, w))), w)), y) ] × n
(one such stack of n residual blocks for each of the six Residual Groups)
Average Pooling Layer
Hash Layer
Fully Connected Layer with Softmax
DRHN
Softmax: σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k), for j = 1, …, K.
Training
Loss_i = −t_i log(z_i)
41
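The softmax and the cross-entropy training loss above can be sketched in plain Python (the logits and one-hot target below are illustrative):

```python
import math

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(t, p):
    # -sum_j t_j * log(p_j); with a one-hot target t this is -log(p_true).
    return -sum(ti * math.log(pi) for ti, pi in zip(t, p))

p = softmax([2.0, 1.0, 0.1])      # illustrative logits for K = 3 classes
print(p)
print(cross_entropy([1, 0, 0], p))
```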
DRHN
Generating binary codes
DRHN Generating binary index
42
Hash Layer (H) is a fully connected layer with a sigmoid activation
function s:
H = s(Wᵀx + b) ∈ ℝ^h
◎ x ∈ ℝ^d is the input of the layer
◎ b ∈ ℝ^h is a bias vector
◎ W ∈ ℝ^(d × h) are the weights that connect the units of x with H.
DRHN Generating binary index
43
Given an image I ∈ ℝ^(c × z × w),
○ where c is the number of channels,
○ z the height,
○ and w the width,
the layers of the model from the input layer to H perform the mapping
from ℝ^(c × z × w) to ℝ^h.
To obtain the binary code for I, as described by (Lin, 2015) [12], we:
1. Extract the output of H.
2. Binarize each activation with a threshold of 0.5:
b(H_i) = 1 if H_i > 0.5, 0 otherwise.
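A sketch of the hash layer followed by the 0.5-threshold binarization; the input, weights, and bias below are illustrative assumptions, not values from the thesis:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hash_layer(x, W, b):
    # H = s(W^T x + b): one sigmoid unit per bit of the code.
    return [sigmoid(sum(W[j][i] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]

def binarize(H):
    # Threshold each activation at 0.5 to obtain the binary code.
    return [1 if hi > 0.5 else 0 for hi in H]

x = [0.2, -1.0, 0.5]                          # illustrative d = 3 input
W = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]    # illustrative d x h weights, h = 2
b = [0.1, -0.2]
H = hash_layer(x, W, b)
print(binarize(H))
```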
DRHN Image Retrieval
44
◎ IM = {I1, I2, …, In}: dataset with n images
◎ IM_H = {H1, H2, …, Hn}: the corresponding binary codes
◎ Query image I_q
◎ Binary code of the query image, H_q
Return the elements of IM where:
d_H(H_q, H_i) ≤ t,
where d_H denotes the Hamming distance
and t is a user-defined threshold,
or the top-k if a specific number k of images is required.
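This retrieval step can be sketched in plain Python (the database codes below are illustrative):

```python
def hamming(a, b):
    # Number of bit positions where the two codes differ.
    return sum(ai != bi for ai, bi in zip(a, b))

def retrieve(query_code, codes, t):
    # Indices of database images whose code is within Hamming distance t.
    return [i for i, c in enumerate(codes) if hamming(query_code, c) <= t]

codes = [[0, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]  # illustrative database
print(retrieve([0, 1, 1, 0], codes, t=1))  # [0, 1]
```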
Overview
45
4.
Experimental work
And results
46
Dataset CIFAR-10
47
◎ 60,000 color images
32x32 pixels.
◎ 10 classes
◎ Training set
50,000 images
◎ Validation set
10,000 images
Evaluation Metric for Classification
48
ACC = (Σ_{i=1}^{N} cl(ŷ_i)) / N
cl(ŷ_i) = 1 if ŷ_i == y_i, 0 otherwise,
where ŷ_i is the predicted label and y_i the true label.
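The accuracy metric can be sketched as follows (the predictions and labels are illustrative):

```python
def accuracy(y_pred, y_true):
    # ACC = (1/N) * count of predictions that match the true label.
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

print(accuracy([1, 0, 2, 2], [1, 0, 2, 1]))  # 0.75
```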
Training
49
Classification accuracy during training.
Training
50
Loss during training.
Evaluation Metrics for Image Retrieval
51
P@K = (Σ_{i=1}^{k} X(i)) / k
X(i) = 1 if L(q) = L(i), 0 otherwise,
where L(q) denotes the label of query image q and L(i) denotes the label
of the i-th ranked image.
Evaluation Metrics for Image Retrieval
52
mAP = (1 / |Q|) Σ_{i=1}^{|Q|} (1 / m_q) Σ_{k=1}^{m_q} P@K
where |Q| represents the number of queries and m_q is the number of
result images for a given query q.
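A sketch of the mAP computation as defined above, averaging P@K over all cutoffs k = 1, …, m_q (the single query and its ranked labels are illustrative):

```python
def p_at_k(result_labels, query_label, k):
    # Fraction of the top-k results that share the query's label.
    return sum(l == query_label for l in result_labels[:k]) / k

def mean_average_precision(queries):
    # queries: list of (query_label, ranked result labels) pairs.
    total = 0.0
    for q_label, results in queries:
        m_q = len(results)
        total += sum(p_at_k(results, q_label, k)
                     for k in range(1, m_q + 1)) / m_q
    return total / len(queries)

# Illustrative: one query of class 'a' with ranked results ['a', 'a', 'b', 'a'].
print(mean_average_precision([('a', ['a', 'a', 'b', 'a'])]))
```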
Example of Evaluation
53
𝑞
Retrieved
images
P@K = (Σ_{i=1}^{k} X(i)) / k
k = 1: P@K = 1/1 = 1
Example of Evaluation
54
𝑞
Retrieved
images
P@K = (Σ_{i=1}^{k} X(i)) / k
k = 2: P@K = 2/2 = 1
Example of Evaluation
55
𝑞
Retrieved
images
P@K = (Σ_{i=1}^{k} X(i)) / k
k = 3: P@K = 2/3 ≈ 0.6667
Example of Evaluation
56
𝑞
Retrieved
images
P@K = (Σ_{i=1}^{k} X(i)) / k
k = 4: P@K = 3/4 = 0.75
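The four worked P@K values above can be reproduced with a short function; the 'dog'/'cat' labels stand in for the example's image classes (hits at ranks 1, 2 and 4, a miss at rank 3):

```python
def p_at_k(result_labels, query_label, k):
    # P@K: fraction of the top-k retrieved images sharing the query's label.
    return sum(l == query_label for l in result_labels[:k]) / k

results = ['dog', 'dog', 'cat', 'dog']  # illustrative ranked retrieval
for k in (1, 2, 3, 4):
    print(k, p_at_k(results, 'dog', k))
# k=1 -> 1.0, k=2 -> 1.0, k=3 -> 0.666..., k=4 -> 0.75
```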
We compare against
57
Unsupervised
◎ Locality Sensitive Hashing (LSH) [13]
◎ Spectral Hashing (SH) [14]
◎ Iterative Quantization (ITQ) [15]
Supervised
◎ Kernel Supervised Hashing (KSH) [16]
◎ Minimal Loss Hashing (MLH) [17]
◎ Binary Reconstruction Embedding (BRE) [18]
◎ ITQ-CCA [15]
Deep Learning Approaches
◎ CNNH, CNNH+ [19]
◎ DNNH [20]
◎ DHN [21]
◎ CNNBH [22]
◎ DLBHC [12]
Evaluation on CIFAR-10
58
• Validation set (10,000 images) as query images
• Training set (50,000 images) as database images
• 𝑘 = [1 … 1000]
• Average 𝑃@𝐾 for all query images
P@K for K = 1, …, 1000
60
Method 12 bits 32 bits 48 bits
DRHN-15 92.65 92.23 92.91
DLBHC 89.3 89.72 89.73
CNNBH 53.2 61.0 61.7
DHN 55.5 60.3 62.1
DNNH 55.2 55.8 58.1
CNNH+ 46.5 52.1 53.2
CNNH 43.9 50.9 52.2
KSH 30.3 34.6 35.6
ITQ-CCA 26.4 28.8 29.5
LSH 12.1 12.0 12.0
MAP of different proposals in CIFAR-10
61
mAP comparison of different depths in our method on the CIFAR-10 dataset
62
Precision comparison of different depths in our method on the CIFAR-10 dataset
63
Average time to generate the binary code for an image (ms)
64
Top 10 retrieved images from CIFAR-10 using DRHN-9 with different bit numbers.
65
Top 10 retrieved images for boat images in CIFAR-10 using DRHN-9 and h=[48, 128].
5.
Conclusion
And future work
66
67
◎We proposed a deep model of CNN based on the
ResNet to generate binary codes in a supervised
manner, allowing an efficient search of images based
on their content.
◎This method takes full advantage of the data labels and
the raw data, thanks to the properties of CNNs.
◎Experimental results show that we outperform
several state-of-the-art results on image retrieval on a
popular dataset (CIFAR-10).
68
◎The mAP of the retrieval task increases with model depth.
◎ A short binary code leads to similar results for
different image queries of the same class.
◎With an appropriate combination of model depth and
binary code length we can obtain good visual results.
Future Work
69
◎Train the model for the retrieval task on more complex
image datasets (with hundreds or thousands of classes,
such as ImageNet) to measure the proposal’s scalability.
◎Experiment with supervised methods that are not
based on deep learning, given the time and compute
cost of deep models.
Publications
70
One contribution to ICANN 2017 (26th International
Conference on Artificial Neural Networks), entitled Deep
Residual Hashing Network for Image Retrieval.
It is included in the Springer book Artificial Neural
Networks and Machine Learning – ICANN 2017, which
belongs to the series Lecture Notes in Computer Science
(LNCS).
6.
Bibliography
71
1. H. Tamura and N. Yokoya, “Image database systems: A survey”, Pattern
recognition, vol. 17, no. 1, pp. 29–43, 1984.
2. L. Zheng, Y. Yang, and Q. Tian, “SIFT meets CNN: A decade survey of
instance retrieval”, arXiv preprint arXiv:1608.01807, 2016.
3. H.-W. Yoo, D.-S. Jang, S.-H. Jung, J.-H. Park, and K.-S. Song, “Visual
information retrieval system via content-based approach”, Pattern
Recognition, vol. 35, no. 3, pp. 749–769, 2002.
4. Arnold W.M. Smeulders, Marcel Worring, Simone Santini, Amarnath
Gupta, and Ramesh Jain. 2000. Content-Based Image Retrieval at the
End of the Early Years. IEEE Transactions on Pattern Analysis and
Machine Intelligence 22, 12 (2000), 1349–1380.
5. H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler, “A review of
content based image retrieval systems in medical applications—clinical
benefits and future directions”, International journal of medical
informatics, vol. 73, no. 1, pp. 1–23, 2004
6. R. d. S. Torres, C. B. Medeiros, M. A. Gonçcalves, and E. A. Fox, “A digital
library framework for biodiversity information systems”, International
Journal on Digital Libraries, vol. 6, no. 1, pp. 3–17, 2006.
7. Zhu, B., Ramsey, M., & Chen, H. (2000). Creating a large-scale content-
based airphoto image digital library. IEEE Transactions on Image
Processing, 9(1), 163-167.
8. S. A. Gulhane, “Content based image retrieval from forensic image
databases”, International Journal of engineering Research and
Applications, vol. 1, no. 5, pp. 66–70
72
9. Zheng, L., Yang, Y., & Tian, Q. (2016). SIFT meets CNN: a decade survey of
instance retrieval. arXiv preprint arXiv:1608.01807.
10. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition”, in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2016, pp. 770–778
11. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep
network training by reducing internal covariate shift”, arXiv preprint
arXiv:1502.03167, 2015.
12. K. Lin, H.-F. Yang, J.-H. Hsiao, and C.-S. Chen, “Deep learning of binary
hash codes for fast image retrieval”, in Proceedings of the IEEE
conference on computer vision and pattern recognition workshops,
2015, pp. 27–35.
13. A. Gionis, P. Indyk, R. Motwani, et al., “Similarity search in high
dimensions via hashing”, in VLDB, vol. 99, 1999, pp. 518–529.
14. Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing”, in Advances in
neural information processing systems, 2009, pp. 1753–1760.
15. Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, “Iterative quantization:
A procrustean approach to learning binary codes for large-scale image
retrieval”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 35, no. 12, pp. 2916–2929, 2013.
16. W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang, “Supervised hashing
with kernels”, in Computer Vision and Pattern Recognition (CVPR), 2012
IEEE Conference on, IEEE, 2012, pp. 2074–2081
73
17. M. Norouzi and D. M. Blei, “Minimal loss hashing for compact binary
codes”, in Proceedings of the 28th international conference on machine
learning (ICML-11), 2011, pp. 353–360.
18. B. Kulis and T. Darrell, “Learning to hash with binary reconstructive
embeddings”, in Advances in neural information processing systems,
2009, pp. 1042–1050
19. R. Xia, Y. Pan, H. Lai, C. Liu, and S. Yan, “Supervised hashing for image
retrieval via image representation learning.”, in AAAI, vol. 1, 2014, p. 2
20. H. Lai, Y. Pan, Y. Liu, and S. Yan, “Simultaneous feature learning and
hash coding with deep neural networks”, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2015, pp.
3270–3278.
21. H. Zhu, M. Long, J. Wang, and Y. Cao, “Deep hashing network for efficient
similarity retrieval.”, in AAAI, 2016, pp. 2415–2421.
22. J. Guo and J. Li, “CNN based hashing for image retrieval”, arXiv preprint
arXiv:1509.01354, 2015.
74
Thanks!
Any questions?
You can find me at:
eejimenez@gdl.cinvestav.mx
edwinjimenezlepe
lepe92
“What is research, but a blind date with knowledge.”
— William Henry
Appendix
77
2-D Convolution
78
2-D Convolution (1)
79
◎ Input image f of size M × N, where M indicates rows and N columns.
◎ Kernel or filter h
Let f(m, n) = 0 if
n < 0 or n ≥ N,
m < 0 or m ≥ M.
Discrete convolution:
g(m, n) = f(m, n) ∗ h(m, n) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} f(k, l) h(m − k, n − l)
With the zero-padding above this reduces to:
g(m, n) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(m − k, n − l)
Continuous analogue:
g(m, n) = f(m, n) ∗ h(m, n) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(k, l) h(m − k, n − l) dk dl
2-D Convolution (2)
80
g(m, n) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(m − k, n − l)
Example
h = [ h[0,0] h[0,1] ; h[1,0] h[1,1] ]
Let h(m, n) = 0 if m, n > 1 or m, n < 0.
f = [ f[0,0] … f[0, N−1] ; ⋮ ⋱ ⋮ ; f[M−1, 0] … f[M−1, N−1] ]
2-D Convolution (2)
81
h = [ h[0,0] h[0,1] ; h[1,0] h[1,1] ],
f = [ f[0,0] … f[0, N−1] ; ⋮ ⋱ ⋮ ; f[M−1, 0] … f[M−1, N−1] ]
g(0,0) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(0 − k, 0 − l) = f(0,0) h(0,0)
g(0,1) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(0 − k, 1 − l) = f(0,0) h(0,1) + f(0,1) h(0,0)
g(1,0) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(1 − k, 0 − l) = f(0,0) h(1,0) + f(1,0) h(0,0)
2-D Convolution (3)
82
f =
17 24  1  8 15
23  5  7 14 16
 4  6 13 20 22
10 12 19 21  3
11 18 25  2  9

h =
1 3 1
0 5 0
2 1 2

g(0,0) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(0 − k, 0 − l) = 17 · 1 = 17
g(0,1) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(0 − k, 1 − l) = 17 · 3 + 24 · 1 = 75
g(1,0) = Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} f(k, l) h(1 − k, 0 − l) = 17 · 0 + 23 · 1 = 23
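These 'full' convolution entries can be checked with a direct implementation of the double sum, treating f and h as zero outside their supports:

```python
# 'Full' 2-D convolution via g(m,n) = sum_k sum_l f(k,l) * h(m-k, n-l).
def conv2d_full(f, h):
    M, N = len(f), len(f[0])
    P, Q = len(h), len(h[0])
    return [[sum(f[k][l] * h[m - k][n - l]
                 for k in range(M) for l in range(N)
                 if 0 <= m - k < P and 0 <= n - l < Q)
             for n in range(N + Q - 1)]
            for m in range(M + P - 1)]

f = [[17, 24,  1,  8, 15],
     [23,  5,  7, 14, 16],
     [ 4,  6, 13, 20, 22],
     [10, 12, 19, 21,  3],
     [11, 18, 25,  2,  9]]
h = [[1, 3, 1],
     [0, 5, 0],
     [2, 1, 2]]
g = conv2d_full(f, h)
print(g[0][0], g[0][1], g[1][0])  # 17 75 23
```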
2-D Convolution
84
CNN complexity
O(Σ_{l=1}^{d} n_{l−1} · s_l² · n_l · m_l²)
Where l is the index of a convolutional layer, and d is the depth
(number of convolutional layers).
n_l is the number of filters (also known as the “width”) of the l-th layer.
n_{l−1} is also the number of input channels of the l-th layer.
s_l is the spatial size (side length) of the filter.
m_l is the spatial size of the output feature map.
From Convolutional Neural Networks at Constrained Time Cost
(Kaiming He)
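A toy evaluation of this cost formula; the two-layer configuration below is an illustrative assumption, not a network from the thesis:

```python
# Time complexity of the convolutional layers: sum over layers of
# n_{l-1} * s_l^2 * n_l * m_l^2 (multiply-accumulates, up to a constant).
def conv_time_cost(layers):
    # layers: list of (n_prev, s, n, m) tuples, one per convolutional layer.
    return sum(n_prev * s**2 * n * m**2 for n_prev, s, n, m in layers)

# Illustrative 2-layer net: 3->16 channels on 32x32 maps, then 16->32 on 16x16.
layers = [(3, 3, 16, 32), (16, 3, 32, 16)]
print(conv_time_cost(layers))
```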
85
Why a CNN?
◎ Researchers have applied CNNs to enhance the
performance of image retrieval lately, since the top
layers of deep convolutional neural networks can
provide a high-level descriptor of the visual content of
images. (Fast content-based image retrieval using
Convolutional Neural Network and Hash Function)
86
Why a CNN?
◎ We need to extract the best features to retrieve
‘relevant’ images
◎ CNN models achieve state of the art results in image
classification
87
88
Deep Residual Hashing Neural Network
89
◎ 1.3 million high-resolution images
◎ 1000 different classes
Image from: http://danielnouri.org
Results of CNN on ImageNet
Results of CNN on ImageNet
90
Image from https://www.quora.com/What-is-the-state-of-the-art-today-on-ImageNet-classification-In-other-
words-has-anybody-beaten-Deep-Residual-Learning
Dataset CIFAR-100
91
This dataset is just like CIFAR-10, except it has 100
classes containing 600 images each. There are 500
training images and 100 testing images per class. The 100
classes in CIFAR-100 are grouped into 20 superclasses.
Each image comes with a "fine" label (the class to which it
belongs) and a "coarse" label (the superclass to which it
belongs).
mAP comparison of different depths in our method on the CIFAR-100 dataset.
Image retrieval precision with 128 bits on the CIFAR-100 dataset.
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?Watsoo Telematics
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 

Recently uploaded (20)

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutions
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 

Deep Residual Hashing Neural Network for Image Retrieval

  • 1. Deep Residual Hashing Neural Network ​ for Image Retrieval Edwin Efraín Jiménez Lepe Thesis Advisor: Dr. Andrés Méndez Vázquez CINVESTAV, Unidad Guadalajara, Jalisco. October 30th, 2017 Master Thesis Defense
  • 2. Agenda ◎ Introduction ◎ Theoretical Framework ◎ Proposal ◎ Experimental Work and Results ◎ Conclusion and Future work ◎ Bibliography 2
  • 4. 46,200 posts uploaded 1.8 million snaps created 4.1 million videos viewed 15,000 GIFs sent via messenger 70,017 hours watched In 2017, every minute Source: http://www.visualcapitalist.com/happens-internet-minute-2017/ Instagram Youtube Facebook Messenger Netflix Snapchat
  • 5. How do we search? Results from search in Google 5 Text based (1970) Search for a specific label dog, black and white, art Manual annotation of images is expensive and subjective (Tamura, 1984) [1]
  • 6. Content based (1990) Search for visual features as texture and color (Zheng, 2016) [2] How do we search? Results from search in Google 6
  • 7. “ A picture is worth a thousand words 7
  • 8. According to (Yoo, 2002) [3]: ◎ Let 𝑋 = {𝑥1, …, 𝑥𝑁} be a database with 𝑁 images ◎ Let 𝐹 = {𝑓1, …, 𝑓𝑁}, where 𝑓𝑖 is a feature vector associated with 𝑥𝑖. Let 𝑇 represent a mapping from the image space onto the h-dimensional feature space, i.e., 𝑇 : 𝑥 → 𝑓, where 𝑥 ∈ 𝑋 and 𝑓 ∈ 𝐹. The similarity between two images 𝑥𝑖 and 𝑥𝑗 can be measured using a similarity function 𝑑(𝑓𝑖, 𝑓𝑗). Content Based Image Retrieval (CBIR) 8
  • 9. The problem of retrieval can be posed as follows: Given a query image 𝑞, retrieve a subset of images 𝑀 from 𝑋 such that: 𝑑(𝑇(𝑞), 𝑇(𝑚)) ≤ 𝑡, for all 𝑚 ∈ 𝑀, where 𝑡 is a user-defined threshold. Alternatively, a user can ask the system to output, say, the top-𝑘 images. Content Based Image Retrieval (CBIR) 9
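The threshold and top-𝑘 retrieval rules above can be sketched in a few lines. This is a minimal illustrative sketch, not the thesis code; Euclidean distance is assumed as the similarity function 𝑑, and all names are made up for the example:

```python
import numpy as np

def retrieve(query_feat, feats, t):
    # Return indices m with d(T(q), T(m)) <= t; Euclidean distance
    # stands in for the similarity function d (an assumption here).
    dists = np.linalg.norm(feats - query_feat, axis=1)
    return np.where(dists <= t)[0].tolist()

def retrieve_top_k(query_feat, feats, k):
    # Alternative interface: indices of the k closest database images.
    dists = np.linalg.norm(feats - query_feat, axis=1)
    return np.argsort(dists)[:k].tolist()
```

In a real system the feature matrix `feats` would be precomputed offline, as the CBIR diagram on the next slide shows.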
  • 10. CBIR, how does it work? Operation of a CBIR system. Extracted from randomcodingtopics.files.wordpress.com/2012/05/rorissa_figure11.png 10 Online Offline
  • 11. What is an image? A matrix 𝑀 ∈ ℤ^(ℎ×𝑤×𝑐) whose integer values are in the range [0, 255], where ℎ is the height, 𝑤 is the width and 𝑐 indicates the number of channels. Image from: https://ujwlkarn.files.wordpress.com/2016/08/8-gif.gif?w=192&h=192&zoom=2 11
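The definition above maps directly onto an array. A toy sketch (the 2x2 RGB values are invented for illustration):

```python
import numpy as np

# A toy 2x2 RGB image: an integer matrix of shape (h, w, c)
# with values in [0, 255], matching the slide's definition.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [128, 128, 128]]], dtype=np.uint8)
h, w, c = img.shape
```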
  • 12. Where is CBIR used? Medicine. Research: search for unusual patterns in patient records. Diagnostics: similar cases can be found [5]. Biodiversity Information Systems [6]: users can get results with images of similar living beings that exist in a specific country, or those that share the environment and have a similar shape. 12
  • 13. Where is CBIR used? Digital libraries [7]. Forensics [8]: retrieval based on image content has been used for criminal images, fingerprints and scene images. 13
  • 14. Where is CBIR used? Online Shopping 14
  • 15. How has CBIR changed over time? 15 Image from (Zheng, 2016) [9]
  • 16. Motivation 16 Semantic Gap: “the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation” (Smeulders, 2000) [4]. Viewpoint and pose. Intra-class and inter-class variability.
  • 17. Objectives General ◎ Propose a CNN model to generate a feature-based code for image indexing. 17 Specific ◎ Improve the current precision of retrieval in CBIR systems using a deep learning approach to generate codes for image indexing in a supervised manner. ◎ Analyze the influence of model depth on the mean average precision (mAP) of the retrieval task. ◎ Evaluate the influence of the generated code length on the mAP of the retrieval task.
  • 19. Artificial Neural Network 19 Image from: http://cs231n.github.io/assets/nn1
  • 20. Artificial Neural Network Image from: http://cs231n.github.io/assets/nn1 Perceptron
  • 21. Artificial Neural Network Image from: http://cs231n.github.io/assets/nn1 Perceptron
  • 23. Convolutional Neural Network 23 A neural network with two special layers: ◎ Convolution ◎ Subsampling (pooling) Image from: https://www.researchgate.net/profile/Konstantin_Pervunin
  • 25. Convolution 25
Input (f):
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1
Kernel (h):
2 1 0
0 2 2
2 1 0
Output (g):
12 12 17
10 17 19
9 6 14
𝑔(𝑚, 𝑛) = 𝑓(𝑚, 𝑛) ∗ ℎ(𝑚, 𝑛) = Σ_{𝑘=−∞}^{∞} Σ_{𝑙=−∞}^{∞} 𝑓(𝑘, 𝑙) ℎ(𝑚 − 𝑘, 𝑛 − 𝑙)
  • 26. Convolution 26 Input Output modes: ‘full’, ‘valid’, ‘same’. Image from: http://deeplearning.net/software/theano/_images/numerical_no_padding_no_strides.gif 𝑔(𝑚, 𝑛) = 𝑓(𝑚, 𝑛) ∗ ℎ(𝑚, 𝑛) = Σ_{𝑘=−∞}^{∞} Σ_{𝑙=−∞}^{∞} 𝑓(𝑘, 𝑙) ℎ(𝑚 − 𝑘, 𝑛 − 𝑙), outputDim = (𝑓 − ℎ + 2·𝑃𝑎𝑑𝑑𝑖𝑛𝑔)/𝑠𝑡𝑟𝑖𝑑𝑒 + 1
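The ‘valid’ convolution and the output-size formula above can be sketched as follows (an illustrative NumPy sketch, not the thesis code; the kernel is flipped so the code computes true convolution, as in the definition of 𝑔 = 𝑓 ∗ ℎ):

```python
import numpy as np

def conv_output_dim(f, h, padding=0, stride=1):
    # The slide's formula: (f - h + 2*Padding) / stride + 1
    return (f - h + 2 * padding) // stride + 1

def conv2d_valid(f, h):
    # Naive 'valid' 2-D convolution over an image f with kernel h.
    f, h = np.asarray(f, float), np.asarray(h, float)
    hf = h[::-1, ::-1]                       # flip kernel: convolution, not correlation
    M = f.shape[0] - h.shape[0] + 1
    N = f.shape[1] - h.shape[1] + 1
    g = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            g[m, n] = (f[m:m + h.shape[0], n:n + h.shape[1]] * hf).sum()
    return g
```

Run on the 5x5 input and 3x3 kernel of the previous slide, this reproduces the 3x3 output shown there.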
  • 28. Max-Pooling 28
Input:
23 4 16 90
12 32 12 45
5 7 8 9
2 12 14 56
Output:
32 90
12 56
𝑦_{𝑘𝑖𝑗} = max_{(𝑝,𝑞) ∈ 𝑅_{𝑖𝑗}} 𝑥_{𝑘𝑝𝑞}
  • 29. Max-Pooling 29 𝑦_{𝑘𝑖𝑗} = max_{(𝑝,𝑞) ∈ 𝑅_{𝑖𝑗}} 𝑥_{𝑘𝑝𝑞}
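Non-overlapping max-pooling, as in the slide's formula, can be sketched like this (illustrative code, not from the thesis):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # y[i, j] = max over the (size x size) pooling region R_ij.
    H, W = x.shape
    out = np.empty((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out
```

Applied to the slide's 4x4 input with a 2x2 window, this yields the 2x2 output shown above.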
  • 30. Activation functions 30 sigmoid(𝑥) = 1/(1 + 𝑒^{−𝑥}), tanh(𝑥) = (𝑒^𝑥 − 𝑒^{−𝑥})/(𝑒^𝑥 + 𝑒^{−𝑥})
  • 31. Rectified Linear Unit (ReLU) 31 𝑦_{𝑖𝑗} = max(𝑥_{𝑖𝑗}, 0)
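The three activation functions on these slides are one-liners each (a minimal sketch following the formulas above):

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^-x), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = (e^x - e^-x) / (e^x + e^-x), output in (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # ReLU: y_ij = max(x_ij, 0)
    return np.maximum(x, 0)
```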
  • 32. ResNet, (He, 2016) [10] 32 Residual Block
  • 35. Deep Residual Hashing Neural Network (DRHN) 35 Residual Block
𝑅𝑒𝐿𝑈(𝑥_{𝑖𝑗}) = max(𝑥_{𝑖𝑗}, 0)
Element-wise addition: 𝐴𝑑𝑑(𝑥, 𝑧): 𝑦_{𝑖𝑗} = 𝑥_{𝑖𝑗} + 𝑧_{𝑖𝑗}
Batch normalization (Ioffe, 2015) [11]: 𝐵𝑁_{𝛾,𝛽}(𝑥)
𝑦 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾2,𝛽2}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾1,𝛽1}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑥, 𝑤1))), 𝑤2)), 𝑥)
𝑐𝑜𝑛𝑣(𝑓, ℎ) = 𝑓(𝑚, 𝑛) ∗ ℎ(𝑚, 𝑛) = Σ_{𝑘=−∞}^{∞} Σ_{𝑙=−∞}^{∞} 𝑓(𝑘, 𝑙) ℎ(𝑚 − 𝑘, 𝑛 − 𝑙)
  • 36. 36 DRHN Architecture
Layer | Output dim.
Input | 3x32x32
𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑥, 𝑤1))) | 16x32x32
Residual Group(n) | 16x32x32
Residual Group(n)* | 32x16x16
Residual Group(n)* | 64x8x8
Residual Group(n) | 64x8x8
Residual Group(n) | 64x8x8
Residual Group(n)* | 128x4x4
Average Pooling Layer | 128
Hash Layer | H
Fully Connected Layer with Softmax | 10
  • 37. 37 DRHN-1 Architecture
Layer
Input
𝑦 = 𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑖𝑛𝑝𝑢𝑡, 𝑤1)))
𝑦1 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾2,𝛽2}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾1,𝛽1}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤2))), 𝑤3)), 𝑦)
𝑦2 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾4,𝛽4}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾3,𝛽3}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦1, 𝑤4))), 𝑤5)), 𝑦1)
𝑦3 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾6,𝛽6}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾5,𝛽5}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦2, 𝑤6))), 𝑤7)), 𝑦2)
𝑦4 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾8,𝛽8}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾7,𝛽7}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦3, 𝑤8))), 𝑤9)), 𝑦3)
𝑦5 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾10,𝛽10}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾9,𝛽9}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦4, 𝑤10))), 𝑤11)), 𝑦4)
𝑦6 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾12,𝛽12}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾11,𝛽11}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦5, 𝑤12))), 𝑤13)), 𝑦5)
Average Pooling Layer
Hash Layer
Fully Connected Layer with Softmax
  • 38. 38 DRHN-2 Architecture
Input
𝑦 = 𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑖𝑛𝑝𝑢𝑡, 𝑤1)))
𝑦1 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾2,𝛽2}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾1,𝛽1}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤2))), 𝑤3)), 𝑦)
𝑦2 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾4,𝛽4}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾3,𝛽3}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦1, 𝑤4))), 𝑤5)), 𝑦1)
𝑦3 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾6,𝛽6}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾5,𝛽5}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦2, 𝑤6))), 𝑤7)), 𝑦2)
𝑦4 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾8,𝛽8}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾7,𝛽7}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦3, 𝑤8))), 𝑤9)), 𝑦3)
𝑦5 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾10,𝛽10}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾9,𝛽9}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦4, 𝑤10))), 𝑤11)), 𝑦4)
𝑦6 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾12,𝛽12}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾11,𝛽11}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦5, 𝑤12))), 𝑤13)), 𝑦5)
𝑦7 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾14,𝛽14}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾13,𝛽13}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦6, 𝑤14))), 𝑤15)), 𝑦6)
𝑦8 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾16,𝛽16}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾15,𝛽15}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦7, 𝑤16))), 𝑤17)), 𝑦7)
𝑦9 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾18,𝛽18}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾17,𝛽17}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦8, 𝑤18))), 𝑤19)), 𝑦8)
𝑦10 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾20,𝛽20}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾19,𝛽19}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦9, 𝑤20))), 𝑤21)), 𝑦9)
𝑦11 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾22,𝛽22}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾21,𝛽21}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦10, 𝑤22))), 𝑤23)), 𝑦10)
𝑦12 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾24,𝛽24}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾23,𝛽23}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦11, 𝑤24))), 𝑤25)), 𝑦11)
Average Pooling Layer
Hash Layer
  • 39. 39 DRHN-n Architecture
Layer
Input
𝑦 = 𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑖𝑛𝑝𝑢𝑡, 𝑤)))
𝑦 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾,𝛽}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤))), 𝑤)), 𝑦) × 𝑛
𝑦 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾,𝛽}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤))), 𝑤)), 𝑦) × 𝑛
𝑦 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾,𝛽}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤))), 𝑤)), 𝑦) × 𝑛
𝑦 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾,𝛽}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤))), 𝑤)), 𝑦) × 𝑛
𝑦 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾,𝛽}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤))), 𝑤)), 𝑦) × 𝑛
𝑦 = 𝐴𝑑𝑑(𝐵𝑁_{𝛾,𝛽}(𝑐𝑜𝑛𝑣(𝐵𝑁_{𝛾,𝛽}(𝑅𝑒𝐿𝑈(𝑐𝑜𝑛𝑣(𝑦, 𝑤))), 𝑤)), 𝑦) × 𝑛
Average Pooling Layer
Hash Layer
Fully Connected Layer with Softmax
  • 40. 40 DRHN Softmax: 𝜎(𝑧)_𝑗 = 𝑒^{𝑧_𝑗} / Σ_{𝑘=1}^{𝐾} 𝑒^{𝑧_𝑘}, for 𝑗 = 1, …, 𝐾. Training loss: 𝐿𝑜𝑠𝑠_𝑖 = −𝑡_𝑖 log(𝑧_𝑖)
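The softmax and the cross-entropy training loss above can be sketched as follows (illustrative code; with one-hot targets 𝑡, the loss reduces to the negative log-probability of the true class):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(probs, target_index):
    # Loss_i = -t_i * log(z_i), with t one-hot at target_index.
    return -np.log(probs[target_index])
```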
  • 42. DRHN Generating binary index 42 The Hash Layer (𝐻) is a fully connected layer with a sigmoid activation function 𝑠: 𝐻 = 𝑠(𝑊^⊤𝑥 + 𝑏) ∈ ℝ^ℎ ◎ 𝑥 ∈ ℝ^𝑑 is the input of the layer ◎ 𝑏 ∈ ℝ^ℎ is a bias ◎ 𝑊 ∈ ℝ^{𝑑×ℎ} represents the weights that connect the units of 𝑥 with 𝐻.
  • 43. DRHN Generating binary index 43 Given an image 𝐼 ∈ ℝ^{𝑐×𝑧×𝑤} ○ where 𝑐 represents the channels of the image ○ 𝑧 the height ○ 𝑤 the width, the layers of the model from the Input layer to 𝐻 perform the mapping from ℝ^{𝑐×𝑧×𝑤} to ℝ^ℎ. To obtain the binary code related to 𝐼, as described by (Lin, 2015) [12], we 1. Extract the output of 𝐻 2. Binarize the activations by a threshold to obtain the corresponding code. 3. For each element in 𝐻 we apply the thresholding function: 𝑠𝑖𝑔𝑛(𝐻_𝑖) = 1 if 𝐻_𝑖 > 0.5, 0 otherwise
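The hash-layer binarization above can be sketched in a few lines (a toy stand-in for the trained layer; `W`, `b` and `x` here are placeholders, not learned parameters):

```python
import numpy as np

def binary_code(x, W, b):
    # Hash layer H = s(W^T x + b), sigmoid-activated, then
    # thresholded at 0.5, as described by (Lin, 2015).
    H = 1.0 / (1.0 + np.exp(-(W.T @ x + b)))   # sigmoid activation
    return (H > 0.5).astype(int)
```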
  • 44. DRHN Image Retrieval 44 ◎ 𝐼𝑀 = {𝐼1, 𝐼2, …, 𝐼𝑛}, dataset with 𝑛 images ◎ 𝐼𝑀_𝐻 = {𝐻1, 𝐻2, …, 𝐻𝑛}, the corresponding binary codes ◎ Query image 𝐼𝑞 ◎ Binary code of the query image 𝐻𝑞. Return the elements of 𝐼𝑀 where: 𝑑_𝐻(𝐻𝑞, 𝐻𝑖) ≤ 𝑡, where 𝑑_𝐻 denotes the Hamming distance and 𝑡 is a user-defined threshold, or the top-𝑘 if a specific number 𝑘 of images is required.
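Hamming-radius retrieval over the binary codes can be sketched as follows (illustrative names, not the thesis code):

```python
import numpy as np

def hamming(a, b):
    # Number of differing bits between two binary codes.
    return int(np.sum(a != b))

def retrieve_hamming(Hq, codes, t):
    # Indices i of database codes with d_H(Hq, Hi) <= t.
    return [i for i, Hi in enumerate(codes) if hamming(Hq, Hi) <= t]
```

Because the codes are short bit vectors, this comparison is far cheaper than distances over real-valued feature vectors.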
  • 47. Dataset CIFAR-10 47 ◎ 60,000 color images 32x32 pixels. ◎ 10 classes ◎ Training set 50,000 images ◎ Validation set 10,000 images
  • 48. Evaluation Metric for Classification 48 𝐴𝐶𝐶 = (Σ_{𝑖=1}^{𝑁} 𝑐𝑙(𝑖)) / 𝑁, where 𝑐𝑙(𝑖) = 1 if the predicted label ŷ_𝑖 equals the true label 𝑦_𝑖, and 0 otherwise
  • 51. Evaluation Metrics for Image Retrieval 51 𝑃@𝐾 = (Σ_{𝑖=1}^{𝑘} 𝑋(𝑖)) / 𝑘, where 𝑋(𝑖) = 1 if 𝐿(𝑞) = 𝐿(𝑖), 0 otherwise; 𝐿(𝑞) denotes the label of query image 𝑞 and 𝐿(𝑖) denotes the label of the 𝑖th ranked image.
  • 52. Evaluation Metrics for Image Retrieval 52 𝑚𝐴𝑃 = (1/|𝑄|) Σ_{𝑖=1}^{|𝑄|} (1/𝑚_𝑞) Σ_{𝑘=1}^{𝑚_𝑞} 𝑃@𝐾, where |𝑄| represents the number of queries and 𝑚_𝑞 is the number of result images for a given query 𝑞.
  • 53. Example of Evaluation 53 𝑞 Retrieved images 𝑃@𝐾 = (Σ_{𝑖=1}^{𝑘} 𝑋(𝑖)) / 𝑘; k = 1: 𝑃@𝐾 = 1/1 = 1
  • 54. Example of Evaluation 54 𝑞 Retrieved images; k = 2: 𝑃@𝐾 = 2/2 = 1
  • 55. Example of Evaluation 55 𝑞 Retrieved images; k = 3: 𝑃@𝐾 = 2/3 ≈ 0.6667
  • 56. Example of Evaluation 56 𝑞 Retrieved images; k = 4: 𝑃@𝐾 = 3/4 = 0.75
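The P@K values walked through on these slides can be computed with a short sketch (illustrative code; `relevant` holds the 0/1 flags 𝑋(𝑖) for the ranked results):

```python
def precision_at_k(relevant, k):
    # P@K: fraction of the top-k results that share the query's label.
    return sum(relevant[:k]) / k

def average_precision(relevant):
    # Mean of P@K over all ranks for one query; averaging this
    # over all queries gives the mAP of the previous slide.
    m = len(relevant)
    return sum(precision_at_k(relevant, k) for k in range(1, m + 1)) / m
```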
  • 57. We compare against 57 Unsupervised ◎ Locality Sensitive Hashing (LSH) [13] ◎ Spectral Hashing (SH) [14] ◎ Iterative Quantization (ITQ) [15] Supervised ◎ Kernel Supervised Hashing (KSH) [16] ◎ Minimal Loss Hashing (MLH) [17] ◎ Binary Reconstruction Embedding (BRE) [18] ◎ ITQ-CCA [15] Deep Learning Approaches ◎ CNNH, CNNH+ [19] ◎ DNNH [20] ◎ DHN [21] ◎ CNNBH [22] ◎ DLBHC [12]
  • 58. Evaluation on CIFAR-10 58 • Validation set (10,000 images) as query images • Training set (50,000 images) as database images • 𝑘 = [1 … 1000] • Average 𝑃@𝐾 for all query images
  • 59. 𝑃@𝐾 for 𝐾 = 1, …, 1000 59
  • 60. MAP of different proposals on CIFAR-10 60
Method | 12 bits | 32 bits | 48 bits
DRHN-15 | 92.65 | 92.23 | 92.91
DLBHC | 89.3 | 89.72 | 89.73
CNNBH | 53.2 | 61.0 | 61.7
DHN | 55.5 | 60.3 | 62.1
DNNH | 55.2 | 55.8 | 58.1
CNNH+ | 46.5 | 52.1 | 53.2
CNNH | 43.9 | 50.9 | 52.2
KSH | 30.3 | 34.6 | 35.6
ITQ-CCA | 26.4 | 28.8 | 29.5
LSH | 12.1 | 12.0 | 12.0
  • 61. 61 mAP comparison of different depth in our method on CIFAR-10 dataset
  • 62. 62 precision comparison of different depth in our method on CIFAR-10 dataset
  • 63. 63 Average time to generate the binary code for an image (ms)
  • 64. 64 Top 10 retrieved images from CIFAR-10 using DRHN-9 with different bit numbers.
  • 65. 65 Top 10 retrieved images for boat images in CIFAR-10 using DRHN-9 and h=[48, 128].
  • 67. 67 ◎ We proposed a deep CNN model based on ResNet to generate binary codes in a supervised manner, allowing an efficient search of images based on their content. ◎ This method takes full advantage of data labels and raw data thanks to the properties of CNNs. ◎ Experimental results show that we can outperform several state-of-the-art results on image retrieval on a popular dataset (CIFAR-10).
  • 68. 68 ◎ The mAP in the retrieval task increases with model depth. ◎ A short binary code leads to similar results for different image queries of the same class. ◎ With an appropriate combination of model depth and binary code length we can obtain a good visual result.
  • 69. Future Work 69 ◎ Train the model for the retrieval task on more complex image datasets (with hundreds or thousands of classes), such as ImageNet, to measure the proposal’s scalability. ◎ Experiment with supervised methods that are not based on deep learning, given the time and compute consumption of deep models.
  • 70. Publications 70 One contribution to ICANN 2017 (26th International Conference on Artificial Neural Networks), entitled Deep Residual Hashing Network for Image Retrieval. It is included in the Springer book Artificial Neural Networks and Machine Learning - ICANN 2017, which belongs to the series Lecture Notes in Computer Science (LNCS).
  • 72. 1. H. Tamura and N. Yokoya, “Image database systems: A survey”, Pattern Recognition, vol. 17, no. 1, pp. 29–43, 1984. 2. L. Zheng, Y. Yang, and Q. Tian, “SIFT meets CNN: A decade survey of instance retrieval”, arXiv preprint arXiv:1608.01807, 2016. 3. H.-W. Yoo, D.-S. Jang, S.-H. Jung, J.-H. Park, and K.-S. Song, “Visual information retrieval system via content-based approach”, Pattern Recognition, vol. 35, no. 3, pp. 749–769, 2002. 4. A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349–1380, 2000. 5. H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler, “A review of content-based image retrieval systems in medical applications—clinical benefits and future directions”, International Journal of Medical Informatics, vol. 73, no. 1, pp. 1–23, 2004. 6. R. d. S. Torres, C. B. Medeiros, M. A. Gonçalves, and E. A. Fox, “A digital library framework for biodiversity information systems”, International Journal on Digital Libraries, vol. 6, no. 1, pp. 3–17, 2006. 7. B. Zhu, M. Ramsey, and H. Chen, “Creating a large-scale content-based airphoto image digital library”, IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 163–167, 2000. 8. S. A. Gulhane, “Content based image retrieval from forensic image databases”, International Journal of Engineering Research and Applications, vol. 1, no. 5, pp. 66–70. 72
  • 73. 9. Zheng, L., Yang, Y., & Tian, Q. (2016). SIFT meets CNN: a decade survey of instance retrieval. arXiv preprint arXiv:1608.01807. 10. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778 11. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift”, arXiv preprint arXiv:1502.03167, 2015. 12. K. Lin, H.-F. Yang, J.-H. Hsiao, and C.-S. Chen, “Deep learning of binary hash codes for fast image retrieval”, in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2015, pp. 27–35. 13. A. Gionis, P. Indyk, R. Motwani, et al., “Similarity search in high dimensions via hashing”, in VLDB, vol. 99, 1999, pp. 518–529. 14. Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing”, in Advances in neural information processing systems, 2009, pp. 1753–1760. 15. Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, “Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2916–2929, 2013. 16. W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang, “Supervised hashing with kernels”, in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE, 2012, pp. 2074–2081 73
  • 74. 17. M. Norouzi and D. M. Blei, “Minimal loss hashing for compact binary codes”, in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 353–360. 18. B. Kulis and T. Darrell, “Learning to hash with binary reconstructive embeddings”, in Advances in Neural Information Processing Systems, 2009, pp. 1042–1050. 19. R. Xia, Y. Pan, H. Lai, C. Liu, and S. Yan, “Supervised hashing for image retrieval via image representation learning”, in AAAI, vol. 1, 2014, p. 2. 20. H. Lai, Y. Pan, Y. Liu, and S. Yan, “Simultaneous feature learning and hash coding with deep neural networks”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3270–3278. 21. H. Zhu, M. Long, J. Wang, and Y. Cao, “Deep hashing network for efficient similarity retrieval”, in AAAI, 2016, pp. 2415–2421. 22. J. Guo and J. Li, “CNN based hashing for image retrieval”, arXiv preprint arXiv:1509.01354, 2015. 74
  • 75. Thanks! Any questions? You can find me at: eejimenez@gdl.cinvestav.mx edwinjimenezlepe lepe92
  • 76. “ What is research, but a blind date with knowledge. William Henry
  • 79. 2-D Convolution (1) 79 ◎ Input image 𝑓 of size 𝑀 × 𝑁, where 𝑀 indicates rows and 𝑁 columns. ◎ Kernel or filter ℎ. Let 𝑓(𝑚, 𝑛) = 0 if 𝑛 < 0, 𝑛 ≥ 𝑁, 𝑚 < 0, or 𝑚 ≥ 𝑀. Discrete: 𝑔(𝑚, 𝑛) = 𝑓(𝑚, 𝑛) ∗ ℎ(𝑚, 𝑛) = Σ_{𝑘=−∞}^{∞} Σ_{𝑙=−∞}^{∞} 𝑓(𝑘, 𝑙) ℎ(𝑚 − 𝑘, 𝑛 − 𝑙) = Σ_{𝑘=0}^{𝑀−1} Σ_{𝑙=0}^{𝑁−1} 𝑓(𝑘, 𝑙) ℎ(𝑚 − 𝑘, 𝑛 − 𝑙). Continuous: 𝑔(𝑚, 𝑛) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} 𝑓(𝑘, 𝑙) ℎ(𝑚 − 𝑘, 𝑛 − 𝑙) 𝑑𝑘 𝑑𝑙
  • 80. 2-D Convolution (2) 80 𝑔(𝑚, 𝑛) = Σ_{𝑘=0}^{𝑀−1} Σ_{𝑙=0}^{𝑁−1} 𝑓(𝑘, 𝑙) ℎ(𝑚 − 𝑘, 𝑛 − 𝑙). Example: ℎ = [ℎ(0,0) ℎ(0,1); ℎ(1,0) ℎ(1,1)], with ℎ(𝑚, 𝑛) = 0 if 𝑚, 𝑛 > 1 or 𝑚, 𝑛 < 0; 𝑓 = [𝑓(0,0) … 𝑓(0, 𝑁−1); … ; 𝑓(𝑀−1, 0) … 𝑓(𝑀−1, 𝑁−1)]
  • 81. 2-D Convolution (2) 81 𝑔(0,0) = Σ_{𝑘=0}^{𝑀−1} Σ_{𝑙=0}^{𝑁−1} 𝑓(𝑘, 𝑙) ℎ(0 − 𝑘, 0 − 𝑙) = 𝑓(0,0)ℎ(0,0); 𝑔(0,1) = 𝑓(0,0)ℎ(0,1) + 𝑓(0,1)ℎ(0,0); 𝑔(1,0) = 𝑓(0,0)ℎ(1,0) + 𝑓(1,0)ℎ(0,0)
  • 82. 2-D Convolution (3) 82
𝑓 =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
ℎ =
1 3 1
0 5 0
2 1 2
𝑔(0,0) = 17·1; 𝑔(0,1) = 17·3 + 24·1; 𝑔(1,0) = 17·0 + 23·1
  • 83. 2-D Convolution (3) 83 (repeats the example above)
  • 85. CNN complexity 𝑂(Σ_{𝑙=1}^{𝑑} 𝑛_{𝑙−1} · 𝑠_𝑙² · 𝑛_𝑙 · 𝑚_𝑙²), where 𝑙 is the index of a convolutional layer and 𝑑 is the depth (number of convolutional layers). 𝑛_𝑙 is the number of filters (also known as “width”) in the 𝑙-th layer; 𝑛_{𝑙−1} is also known as the number of input channels of the 𝑙-th layer. 𝑠_𝑙 is the spatial size (length) of the filter. 𝑚_𝑙 is the spatial size of the output feature map. From Convolutional Neural Networks at Constrained Time Cost (Kaiming He) 85
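The complexity sum above is easy to evaluate for a concrete architecture (an illustrative sketch; the layer tuple format is an assumption made for this example):

```python
def conv_flops(layers):
    # Time complexity sum n_{l-1} * s_l^2 * n_l * m_l^2, per
    # "Convolutional Neural Networks at Constrained Time Cost" (He).
    # Each layer is given as (n_in, filter_size, n_out, output_size).
    return sum(n_in * s * s * n_out * m * m
               for n_in, s, n_out, m in layers)
```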
  • 86. Why a CNN? ◎ Researchers have applied CNNs to enhance the performance of image retrieval lately, since the top layers of deep convolutional neural networks can provide a high-level descriptor of the visual content of images. (Fast content-based image retrieval using Convolutional Neural Network and Hash Function) 86
  • 87. Why a CNN? ◎ We need to extract the best features to retrieve ‘relevant’ images ◎ CNN models achieve state of the art results in image classification 87
  • 88. 88 Deep Residual Hashing Neural Network
  • 89. 89 ◎ 1.3 million high-resolution images ◎ 1000 different classes Image from: http://danielnouri.org Results of CNN on ImageNet
  • 90. Results of CNN on ImageNet 90 Image from https://www.quora.com/What-is-the-state-of-the-art-today-on-ImageNet-classification-In-other- words-has-anybody-beaten-Deep-Residual-Learning
  • 91. Dataset CIFAR-100 91 This dataset is just like CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
  • 92. 92mAP comparison of different depth in our method on CIFAR-100 dataset.
  • 93. 93Image retrieval precision with 128 bits on CIFAR-100 dataset.

Editor's Notes

  1. 𝐹 = {𝑓1, …, 𝑓𝑁}, where 𝑓𝑖 is a feature vector associated with 𝑥𝑖 and contains the relevant information required for measuring the similarity between images.
  2. Definition of an image or picture as a composition of light that is captured by a camera, and a digital picture, which is a two-dimensional matrix of pixels.
  3. Digital libraries from the Springer paper, forensics, and add the examples of Amazon and AliExpress.
  4. Digital libraries from the Springer paper, forensics, and add the examples of Amazon and AliExpress.
  5. Digital libraries from the Springer paper, forensics, and add the examples of Amazon and AliExpress.
  6. Digital libraries from the Springer paper, forensics, and add the examples of Amazon and AliExpress.
  7. Sigmoid between 0 and 1, tanh between -1 and 1
  8. Sigmoid between 0 and 1, tanh between -1 and 1
  9. Batch Normalization is a method to reduce internal covariate shift in neural networks, first described in (1), leading to the possible usage of higher learning rates. In principle, the method adds an additional step between the layers, in which the output of the layer before is normalized.
  10. Researchers have lately applied CNNs to enhance the performance of image retrieval, since the top layers of deep convolutional neural networks can provide a high-level descriptor of the visual content of images. (Fast content-based image retrieval using CNN and hash function)
  11. Researchers have lately applied CNNs to enhance the performance of image retrieval, since the top layers of deep convolutional neural networks can provide a high-level descriptor of the visual content of images. (Fast content-based image retrieval using CNN and hash function)
  12. Researchers have lately applied CNNs to enhance the performance of image retrieval, since the top layers of deep convolutional neural networks can provide a high-level descriptor of the visual content of images. (Fast content-based image retrieval using CNN and hash function)
  13. Researchers have lately applied CNNs to enhance the performance of image retrieval, since the top layers of deep convolutional neural networks can provide a high-level descriptor of the visual content of images. (Fast content-based image retrieval using CNN and hash function)
  14. This method takes full advantage of data labels (instead of requiring a triplet of images as other methods do)
  15. This method takes full advantage of data labels (instead of requiring a triplet of images as other methods do)