5. How do we search?
Results from search in Google
Text-based (1970)
Search for a specific label: "dog", "black and white", "art".
Manual annotation of images is expensive and subjective (Tamura, 1984) [1].
6. Content-based (1990)
Search for visual features such as texture and color (Zheng, 2016) [2].
8. According to (Yoo, 2002) [3]:
◎ Let 𝑋 = {𝑥_1, …, 𝑥_𝑁} be a database with 𝑁 images
◎ Let 𝐹 = {𝑓_1, …, 𝑓_𝑁}, where 𝑓_𝑖 is the feature vector associated with 𝑥_𝑖
Let 𝑇 represent a mapping from the image space onto the ℎ-dimensional feature space, i.e.,
𝑇 : 𝑥 → 𝑓,
where 𝑥 ∈ 𝑋 and 𝑓 ∈ 𝐹.
The similarity between two images 𝑥_𝑖 and 𝑥_𝑗 can be measured using a similarity function 𝑑(𝑓_𝑖, 𝑓_𝑗).
Content Based Image Retrieval (CBIR)
9. The problem of retrieval can be posed as follows:
Given a query image 𝑞, retrieve a subset of images 𝑀 from 𝑋 such that:
𝑑(𝑇(𝑞), 𝑇(𝑚)) ≤ 𝑡, 𝑚 ∈ 𝑀,
where 𝑡 is a user-defined threshold.
Alternatively, the user can ask the system to return, say, the top-𝑘 images.
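As a sketch of the two retrieval modes above (assuming Euclidean distance for 𝑑, which the slides leave unspecified; `retrieve`, `feats`, and the toy vectors are illustrative names, not part of the thesis):

```python
import numpy as np

def retrieve(q_feat, feats, t=None, k=None):
    """Return indices of database images matching a query.

    q_feat : feature vector T(q) of the query image.
    feats  : (N, h) array of database feature vectors T(x_i).
    t      : user-defined threshold (returns M = {m : d(T(q), T(m)) <= t}).
    k      : alternatively, return the top-k closest images.
    """
    # Euclidean distance stands in for the similarity function d.
    d = np.linalg.norm(feats - q_feat, axis=1)
    if t is not None:
        return np.flatnonzero(d <= t)
    return np.argsort(d)[:k]

feats = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
q = np.array([0.0, 0.0])
print(retrieve(q, feats, t=1.5))  # images within distance 1.5: [0 1]
print(retrieve(q, feats, k=2))    # two nearest images: [0 1]
```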
10. CBIR: how does it work?
Operation of a CBIR system, with an offline indexing phase and an online query phase. Extracted from
randomcodingtopics.files.wordpress.com/2012/05/rorissa_figure11.png
11. What is an image?
A matrix
𝑀 ∈ ℤ^{ℎ×𝑤×𝑐}
whose integer values lie in the range [0, 255],
where ℎ is the height, 𝑤 is the width, and 𝑐 is the number of channels.
Image from: https://ujwlkarn.files.wordpress.com/2016/08/8-gif.gif?w=192&h=192&zoom=2
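A minimal illustration of this definition with a hypothetical 2×2 RGB image (the pixel values are made up):

```python
import numpy as np

# A 2x2 RGB image: height h=2, width w=2, c=3 channels,
# integer values in [0, 255].
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

h, w, c = img.shape
print(h, w, c)               # 2 2 3
print(img.min(), img.max())  # 0 255 -- values stay within [0, 255]
```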
12. Where is CBIR used?
Medicine
Research: search for unusual patterns in patient records.
Diagnostics: similar cases can be found [5].
Biodiversity Information Systems [6]
Users can obtain results with images of similar living beings that exist in a specific country, or of those that share the environment and have a similar shape.
13. Where is CBIR used?
Digital libraries [7]
Forensics [8]
Retrieval based on image content has been used for criminal images, fingerprints, and scene images.
15. How has CBIR changed through time?
Image from (Sheng, 2016) [9]
16. Motivation
Semantic Gap
“the lack of coincidence between the information that
one can extract from the visual data and the
interpretation that the same data have for a user in a
given situation” (Smeulders, 2000) [4]
Viewpoint and pose. Intra-class and inter-class variability.
17. Objectives
General
◎ Propose a CNN model to generate a feature-based code for image indexing.
Specific
◎ Improve the current precision of retrieval in CBIR systems using a
deep learning approach to generate codes for image indexing in a
supervised manner.
◎ Analyze the influence of model depth on the mean average precision (mAP) of the retrieval task.
◎ Evaluate the influence of the generated code length on the mAP of the retrieval task.
23. Convolutional Neural Network
A neural network with two special layers:
◎ Convolution
◎ Subsampling (pooling)
Image from: https://www.researchgate.net/profile/Konstantin_Pervunin
42. DRHN: generating the binary index
The Hash Layer (𝐻) is a fully connected layer with a sigmoid activation function 𝑠:
𝐻 = 𝑠(𝑊^⊤𝑥 + 𝑏) ∈ ℝ^ℎ
◎ 𝑥 ∈ ℝ^𝑑 is the input of the layer
◎ 𝑏 ∈ ℝ^ℎ is a bias
◎ 𝑊 ∈ ℝ^{𝑑×ℎ} represents the weights that connect the units of 𝑥 with 𝐻.
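A minimal sketch of the hash layer's forward pass, with made-up toy sizes (`d`, `h`) and random weights in place of trained ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hash_layer(x, W, b):
    """H = s(W^T x + b): fully connected layer with sigmoid activation."""
    return sigmoid(W.T @ x + b)

rng = np.random.default_rng(0)
d, h = 8, 4                    # toy sizes: input dim d, code length h
W = rng.normal(size=(d, h))    # weights connecting x with H
b = np.zeros(h)                # bias
x = rng.normal(size=d)         # input of the layer
H = hash_layer(x, W, b)
print(H.shape)                 # (4,) -- an h-dimensional vector in (0, 1)
```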
43. DRHN: generating the binary index
Given an image 𝐼 ∈ ℝ^{𝑐×𝑧×𝑤},
○ where 𝑐 represents the number of channels of the image
○ 𝑧 the height
○ 𝑤 the width
the layers of the model from the input layer to 𝐻 perform the mapping from ℝ^{𝑐×𝑧×𝑤} to ℝ^ℎ.
To obtain the binary code associated with 𝐼, as described by (Lin, 2015) [12], we:
1. Extract the output of 𝐻.
2. Binarize the activations with a threshold to obtain the corresponding code: to each element of 𝐻 we apply
𝑏𝑖𝑛(𝐻_𝑖) = 1 if 𝐻_𝑖 > 0.5, and 0 otherwise.
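The thresholding step can be sketched as follows (the `binarize` helper name is illustrative):

```python
import numpy as np

def binarize(H, threshold=0.5):
    """Threshold the sigmoid activations of the hash layer into a binary code."""
    return (H > threshold).astype(np.uint8)

H = np.array([0.9, 0.2, 0.51, 0.5])
print(binarize(H))  # [1 0 1 0] -- note 0.5 maps to 0, since H_i > 0.5 is strict
```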
44. DRHN: image retrieval
◎ 𝐼𝑀 = {𝐼_1, 𝐼_2, …, 𝐼_𝑛}, a dataset with 𝑛 images
◎ 𝐼𝑀_𝐻 = {𝐻_1, 𝐻_2, …, 𝐻_𝑛}, the corresponding binary codes
◎ 𝐼_𝑞, a query image
◎ 𝐻_𝑞, the binary code of the query image
Return the elements of 𝐼𝑀 where:
𝑑_𝐻(𝐻_𝑞, 𝐻_𝑖) ≤ 𝑡,
where 𝑑_𝐻 denotes the Hamming distance and 𝑡 is a user-defined threshold,
or the top-𝑘 if a specific number 𝑘 of images is required.
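A sketch of this Hamming-distance retrieval over hypothetical 4-bit codes (`retrieve_by_code` and the toy codes are illustrative):

```python
import numpy as np

def hamming(a, b):
    """Hamming distance d_H between two binary codes (0/1 arrays)."""
    return int(np.count_nonzero(a != b))

def retrieve_by_code(H_q, codes, t=None, k=None):
    """Indices i with d_H(H_q, H_i) <= t, or the top-k closest codes."""
    d = np.array([hamming(H_q, H_i) for H_i in codes])
    if t is not None:
        return np.flatnonzero(d <= t)
    return np.argsort(d, kind="stable")[:k]

codes = np.array([[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0]])
H_q = np.array([1, 0, 1, 1])
print(retrieve_by_code(H_q, codes, t=1))  # [0 1] -- within Hamming radius 1
print(retrieve_by_code(H_q, codes, k=1))  # [0]  -- single nearest code
```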
52. Evaluation Metrics for Image Retrieval
𝑚𝐴𝑃 = (1/|𝑄|) ∑_{𝑖=1}^{|𝑄|} (1/𝑚_𝑞) ∑_{𝑘=1}^{𝑚_𝑞} 𝑃@𝑘
where |𝑄| represents the number of queries and 𝑚_𝑞 is the number of result images for a given query 𝑞.
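A small sketch that computes mAP exactly as the formula above is written, i.e. averaging 𝑃@𝑘 over 𝑘 = 1…𝑚_𝑞 (the toy relevance lists are made up for illustration):

```python
def precision_at_k(relevant, k):
    """P@k: fraction of the top-k ranked results that are relevant (0/1 list)."""
    return sum(relevant[:k]) / k

def mean_average_precision(results):
    """results: one 0/1 relevance list per query, in the system's ranking order.
    mAP = (1/|Q|) * sum_q (1/m_q) * sum_{k=1}^{m_q} P@k, with m_q = len(list)."""
    total = 0.0
    for relevant in results:
        m_q = len(relevant)
        total += sum(precision_at_k(relevant, k) for k in range(1, m_q + 1)) / m_q
    return total / len(results)

# Two toy queries with m_q = 3 results each (1 = relevant result).
queries = [[1, 0, 1], [0, 1, 1]]
print(round(mean_average_precision(queries), 4))  # 0.5556
```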
58. Evaluation on CIFAR-10
• Validation set (10,000 images) as query images
• Training set (50,000 images) as database images
• 𝑘 = [1 … 1000]
• Average 𝑃@𝑘 over all query images
67. Conclusions
◎ We proposed a deep CNN model based on ResNet to generate binary codes in a supervised manner, allowing an efficient search of images based on their content.
◎ This method takes full advantage of data labels and raw data thanks to the properties of CNNs.
◎ Experimental results show that we can outperform several state-of-the-art results on image retrieval on a popular dataset (CIFAR-10).
68. Conclusions (continued)
◎ The mAP of the retrieval task increases with model depth.
◎ A short binary code leads to similar results for different image queries of the same class.
◎ With an appropriate combination of model depth and binary code length we can obtain good visual results.
69. Future Work
◎Train the model for retrieval task in more complex
image datasets (with hundreds or thousands of classes)
to measure the proposal’s scalability (ImageNet).
◎ Experiment with supervised methods that are not based on deep learning, given the time and compute consumption of deep models.
70. Publications
One contribution to ICANN 2017 (26th International Conference on Artificial Neural Networks), entitled "Deep Residual Hashing Network for Image Retrieval".
It is included in the Springer book Artificial Neural Networks and Machine Learning – ICANN 2017, which belongs to the series Lecture Notes in Computer Science (LNCS).
72. 1. H. Tamura and N. Yokoya, “Image database systems: A survey”, Pattern
recognition, vol. 17, no. 1, pp. 29–43, 1984.
2. L. Zheng, Y. Yang, and Q. Tian, "SIFT meets CNN: A decade survey of instance retrieval", arXiv preprint arXiv:1608.01807, 2016.
3. H.-W. Yoo, D.-S. Jang, S.-H. Jung, J.-H. Park, and K.-S. Song, “Visual
information retrieval system via content-based approach”, Pattern
Recognition, vol. 35, no. 3, pp. 749–769, 2002.
4. A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349–1380, 2000.
5. H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler, “A review of
content based image retrieval systems in medical applications—clinical
benefits and future directions”, International journal of medical
informatics, vol. 73, no. 1, pp. 1–23, 2004
6. R. d. S. Torres, C. B. Medeiros, M. A. Gonçalves, and E. A. Fox, "A digital library framework for biodiversity information systems", International Journal on Digital Libraries, vol. 6, no. 1, pp. 3–17, 2006.
7. Zhu, B., Ramsey, M., & Chen, H. (2000). Creating a large-scale content-
based airphoto image digital library. IEEE Transactions on Image
Processing, 9(1), 163-167.
8. S. A. Gulhane, "Content based image retrieval from forensic image databases", International Journal of Engineering Research and Applications, vol. 1, no. 5, pp. 66–70.
73. 9. Zheng, L., Yang, Y., & Tian, Q. (2016). SIFT meets CNN: a decade survey of
instance retrieval. arXiv preprint arXiv:1608.01807.
10. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition”, in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2016, pp. 770–778
11. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep
network training by reducing internal covariate shift”, arXiv preprint
arXiv:1502.03167, 2015.
12. K. Lin, H.-F. Yang, J.-H. Hsiao, and C.-S. Chen, “Deep learning of binary
hash codes for fast image retrieval”, in Proceedings of the IEEE
conference on computer vision and pattern recognition workshops,
2015, pp. 27–35.
13. A. Gionis, P. Indyk, R. Motwani, et al., “Similarity search in high
dimensions via hashing”, in VLDB, vol. 99, 1999, pp. 518–529.
14. Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing”, in Advances in
neural information processing systems, 2009, pp. 1753–1760.
15. Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, “Iterative quantization:
A procrustean approach to learning binary codes for large-scale image
retrieval”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 35, no. 12, pp. 2916–2929, 2013.
16. W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang, “Supervised hashing
with kernels”, in Computer Vision and Pattern Recognition (CVPR), 2012
IEEE Conference on, IEEE, 2012, pp. 2074–2081
74. 17. M. Norouzi and D. M. Blei, “Minimal loss hashing for compact binary
codes”, in Proceedings of the 28th international conference on machine
learning (ICML-11), 2011, pp. 353–360.
18. B. Kulis and T. Darrell, “Learning to hash with binary reconstructive
embeddings”, in Advances in neural information processing systems,
2009, pp. 1042–1050
19. R. Xia, Y. Pan, H. Lai, C. Liu, and S. Yan, “Supervised hashing for image
retrieval via image representation learning.”, in AAAI, vol. 1, 2014, p. 2
20. H. Lai, Y. Pan, Y. Liu, and S. Yan, “Simultaneous feature learning and
hash coding with deep neural networks”, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2015, pp.
3270–3278.
21. H. Zhu, M. Long, J. Wang, and Y. Cao, “Deep hashing network for efficient
similarity retrieval.”, in AAAI, 2016, pp. 2415–2421.
22. J. Guo and J. Li, "CNN based hashing for image retrieval", arXiv preprint arXiv:1509.01354, 2015.
85. CNN complexity
𝑂(∑_{𝑙=1}^{𝑑} 𝑛_{𝑙−1} · 𝑠_𝑙² · 𝑛_𝑙 · 𝑚_𝑙²)
Where 𝑙 is the index of a convolutional layer, and 𝑑 is the depth
(number of convolutional layers).
𝑛_𝑙 is the number of filters (also known as the "width") in the 𝑙-th layer.
𝑛𝑙−1 is also known as the number of input channels of the 𝑙-th layer.
𝑠𝑙 is the spatial size (length) of the filter.
𝑚𝑙 is the spatial size of the output feature map.
From Convolutional Neural Networks at Constrained Time Cost
(Kaiming He)
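Plugging a made-up two-layer configuration into this sum (all sizes below are illustrative, not taken from the thesis):

```python
def conv_time_cost(layers):
    """Sum of per-layer terms n_{l-1} * s_l^2 * n_l * m_l^2 over conv layers.

    layers: list of (n_prev, s, n, m) tuples, one per convolutional layer l.
    """
    return sum(n_prev * s**2 * n * m**2 for n_prev, s, n, m in layers)

# (n_{l-1}, s_l, n_l, m_l) for a tiny hypothetical 2-layer net on a 32x32 RGB input:
# 3->16 filters of size 3x3 at 32x32 output, then 16->32 filters at 16x16 output.
layers = [(3, 3, 16, 32), (16, 3, 32, 16)]
print(conv_time_cost(layers))  # 1622016 multiply-accumulate operations
```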
86. Why a CNN?
◎ Researchers have applied CNNs to enhance the
performance of image retrieval lately, since the top
layers of deep convolutional neural networks can
provide a high-level descriptor of the visual content of
images. (Fast content-based image retrieval using
Convolutional Neural Network and Hash Function)
87. Why a CNN?
◎ We need to extract the best features to retrieve
‘relevant’ images
◎ CNN models achieve state of the art results in image
classification
89. Results of CNN on ImageNet
◎ 1.3 million high-resolution images
◎ 1000 different classes
Image from: http://danielnouri.org
90. Results of CNN on ImageNet
Image from https://www.quora.com/What-is-the-state-of-the-art-today-on-ImageNet-classification-In-other-
words-has-anybody-beaten-Deep-Residual-Learning
91. Dataset CIFAR-100
This dataset is just like the CIFAR-10, except it has 100
classes containing 600 images each. There are 500
training images and 100 testing images per class. The 100
classes in the CIFAR-100 are grouped into 20 superclasses.
Each image comes with a "fine" label (the class to which it
belongs) and a "coarse" label (the superclass to which it
belongs).
𝐹 = {𝑓_1, …, 𝑓_𝑁}, where 𝑓_𝑖 is a feature vector associated with 𝑥_𝑖 and contains the relevant information required for measuring the similarity between images.
Definition of an image or picture as a composition of light that is captured by a camera, versus a digital picture, which is a two-dimensional matrix of pixels.
Digital libraries from the Springer paper, forensics, and add the example of Amazon and AliExpress.
Sigmoid outputs lie between 0 and 1; tanh outputs lie between -1 and 1.
Batch Normalization is a method to reduce internal covariate shift in neural networks, first described in (1), allowing the use of higher learning rates. In principle, the method adds a step between layers in which the output of the previous layer is normalized.
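A minimal sketch of the normalization step described in the note above (per-feature normalization over a batch; `gamma` and `beta` stand in for the learnable scale and shift):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a (batch, features) array to zero mean and unit variance
    per feature, then scale and shift. gamma and beta are learned in practice."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 2.0], [3.0, 6.0]])
y = batch_norm(x)
print(y.mean(axis=0))  # approximately 0 for each feature
```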
This method takes full advantage of data labels (instead of requiring a triplet of images, as other methods do).