Perception and Intelligence Laboratory, Seoul National University

Sketch-based 3D Shape Retrieval using Convolutional Neural Networks
Fang Wang, Le Kang, Yi Li
Presented by Junho Cho
2015/11/06
Introduction
• Sketch-based 3D Shape Retrieval using Convolutional Neural Networks
• CVPR 2015 oral
• Retrieving 3D models from 2D sketches
• Author page: http://users.cecs.anu.edu.au/~yili/
• Demo & code: http://users.cecs.anu.edu.au/~yili/cnnsbsr/
Introduction and Early attempts
• Sketches are easy to input, yet rich enough to specify shapes.
• Early attempts used keywords or 3D shapes as queries.
• Directly matching 2D sketches to 3D models is difficult:
  • very different representations.
• Many methods project 3D models to multiple 2D views,
  then match the sketch to one of those views.
• Variations in both sketch styles and 3D models → low performance
Early attempts
• Finding the “best views” of a 3D model
  • Ideally, one viewpoint is similar to the query sketch.
  • Similarity scored using Gabor, dense SIFT, and GALIF features.
• No guarantee that the best views have viewpoints similar to the sketches.
• Finding the “best view” is an unsolved problem:
  • its definition remains elusive.
Contributions
1. Propose to learn feature representations for sketch-based shape retrieval,
   bypassing the dilemma of “best view” selection.
   • A minimalist approach, as opposed to multiple best views.
   • Just two views chosen randomly per 3D model, under a simple hypothesis
     (most models are upright).
   • Still outperforms prior methods, showing the features are learned effectively.
2. Two Siamese CNNs to learn similarities both within-domain and cross-domain.
   • Sketches and views have distinctive intrinsic properties.
   • Two different CNN models, one for sketches and one for model views.
   • Couples the two input sources into the same target space.
3. Outperforms the state of the art.
(Figure slide: Siamese CNN architecture)
CNN & Siamese Network
• CNN
  • CNNs effectively learn complicated mappings from raw images to the target.
  • Require less domain knowledge
    (vs. handcrafted features and shallow learning frameworks).
• Siamese Network
  • Two identical sub-convolutional networks.
  • Input given as pairs of samples.
  • Similar input pairs → similar output vectors
  • Dissimilar input pairs → dissimilar output vectors
  • Used in a weakly supervised metric-learning setting.
  • Applied to text classification, speech feature classification, and face verification.
Learning a Similarity Measure Discriminatively Using a Siamese Network
(S. Chopra, R. Hadsell, and Y. LeCun, CVPR 2005)
Basic Siamese CNN
• Takes two samples into separate but identical networks.
• Typical contrastive loss defined over pairs:
  • s1, s2 : two samples
  • y : binary similarity label (same: 0, different: 1)
  • D_w = ‖f(s1; w1) − f(s2; w2)‖₁ : distance between the two outputs
  • L(s1, s2, y) = (1 − y)·α·D_w² + y·β·e^(γ·D_w)
  • Set α = 1/C_p, β = C_n, γ = −2.77/C_n, where C_p = 0.2 and C_n = 10.
  • Constants follow “Learning a Similarity Measure Discriminatively Using a Siamese Network”.
• Input pairs labeled as similar → bring output vectors closer.
• Input pairs labeled as dissimilar → push output vectors away.
• Back-propagated gradients are computed individually on the two sample sets;
  the network is updated with the average of the two gradients.
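To make the loss concrete, here is a minimal NumPy sketch of this contrastive loss. The function name, the toy vectors, and the use of NumPy are our own; the formula and the constants C_p, C_n follow the slide.

```python
import numpy as np

# Constants as given on the slide (following Chopra et al., CVPR 2005).
C_P, C_N = 0.2, 10.0
ALPHA, BETA, GAMMA = 1.0 / C_P, C_N, -2.77 / C_N

def contrastive_loss(f1, f2, y):
    """Pairwise loss: y = 0 for a similar pair, y = 1 for a dissimilar pair."""
    d = np.abs(f1 - f2).sum()                       # D_w: L1 distance
    return (1 - y) * ALPHA * d**2 + y * BETA * np.exp(GAMMA * d)

# Toy usage: the same close pair is cheap if labeled similar, costly if dissimilar.
f_a, f_b = np.array([0.2, 0.9]), np.array([0.25, 0.85])
print(contrastive_loss(f_a, f_b, y=0))   # small loss: pair is already close
print(contrastive_loss(f_a, f_b, y=1))   # large loss: close but labeled dissimilar
```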
(Figure slide: basic Siamese CNN architecture, with binary similarity label y = 0 or 1.)
Learning feature representations for sketch-based 3D shape retrieval
Illustrated example of cross-domain matching
• Two domains: sketches & 3D model views.
• (a) At first, the two point sets are mixed in the feature space.
• Learn the correct mapping using pair similarities within each domain (s-s, v-v)
  as well as their cross-domain relations (s-v), jointly.
• (b) Then the two point sets are correctly aligned in the feature space.
• After cross-domain metric learning, matching can be performed
  sketch - sketch, view - view, and sketch - view.
Siamese Network for cross-domain matching
• The basic Siamese network is used for samples from the same domain (e.g., s-s, v-v).
• For the cross-domain setting, propose to extend it to two Siamese networks:
  one for the view domain, one for the sketch domain.
• Define a within-domain loss & a cross-domain loss.
• Better performance (compared to the basic Siamese network)!
• Newly defined loss function:
  • L(s1, s2, v1, v2, y) = L(s1, s2, y) + L(v1, v2, y) + L(s1, v2, y)
    (similarity of sketches + similarity of views + cross-domain similarity)
  • s1, v1: a sketch and a view from the same class
  • s2, v2: a sketch and a view from the same class
  • y : binary similarity label
• Note: category labels are not actually used in the framework.
  • Whether a sketch would be described as a desk, a hand, or a face doesn’t matter;
    only pair similarity does.
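A sketch of how the three-term loss ties the two networks together. The function names and the stand-in feature extractors (`sketch_net`, `view_net` as plain callables) are hypothetical; the three-term structure follows the slide, reusing the contrastive loss from the previous slide.

```python
import numpy as np

def pair_loss(f1, f2, y, alpha=5.0, beta=10.0, gamma=-0.277):
    """Contrastive loss from the previous slide (alpha = 1/C_p, beta = C_n, gamma = -2.77/C_n)."""
    d = np.abs(f1 - f2).sum()
    return (1 - y) * alpha * d**2 + y * beta * np.exp(gamma * d)

def cross_domain_loss(sketch_net, view_net, s1, s2, v1, v2, y):
    """Within-domain sketch loss + within-domain view loss + cross-domain loss."""
    fs1, fs2 = sketch_net(s1), sketch_net(s2)   # sketch-domain features
    fv1, fv2 = view_net(v1), view_net(v2)       # view-domain features
    return (pair_loss(fs1, fs2, y)              # similarity of sketches
            + pair_loss(fv1, fv2, y)            # similarity of views
            + pair_loss(fs1, fv2, y))           # cross-domain similarity

# Toy usage with identity "networks" standing in for the two CNNs:
ident = lambda x: np.asarray(x, dtype=float)
print(cross_domain_loss(ident, ident,
                        [0.1, 0.9], [0.2, 0.8],      # two sketches
                        [0.12, 0.88], [0.18, 0.82],  # two views
                        y=0))
```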
• Same network design for both networks, but they are learned separately
  (unlike the basic Siamese network).
View definitions and line drawing rendering
• Only two views per 3D model, as opposed to multiple views.
  • Two were enough:
    1. Most 3D models are stored upright.
    2. Two viewpoints are randomly generated with an angle difference larger than 45°.
  • No focus on best views; comparing views is beyond the scope of the paper.
• From the chosen viewpoints, generate 2D line drawings:
  1. Closed boundaries
  2. Suggestive contours
     (D. DeCarlo et al., “Suggestive Contours for Conveying Shape”)
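The slides do not spell out the sampling procedure, so the following is only one plausible reading of "two random viewpoints with an angle difference larger than 45°" for upright models, via rejection sampling; all names here are ours.

```python
import numpy as np

def random_upper_viewpoint(rng):
    """Random unit view direction on the upper hemisphere (models are upright)."""
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    v[2] = abs(v[2])          # flip below-horizon directions; norm is unchanged
    return v

def sample_two_viewpoints(min_angle_deg=45.0, seed=0):
    """Rejection-sample two viewpoints whose angular difference exceeds the minimum."""
    rng = np.random.default_rng(seed)
    v1 = random_upper_viewpoint(rng)
    while True:
        v2 = random_upper_viewpoint(rng)
        angle = np.degrees(np.arccos(np.clip(v1 @ v2, -1.0, 1.0)))
        if angle > min_angle_deg:
            return v1, v2
```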
Experiments
Datasets
• PSB / SBSR dataset
  • Widely used for 3D shape retrieval system evaluation.
  • 1814 3D models.
  • SBSR: 1814 hand-drawn sketches collected using Amazon Mechanical Turk.
• SHREC’13 & ’14 datasets
  • The PSB sketches in the SBSR dataset are not enough:
    the number of sketches per class is imbalanced, so evaluation can be biased.
  • SHREC’13: 1258 models; 80 sketch instances per class.
  • SHREC’14 is greatly enlarged: 8987 3D models.
    • Very hard: models come from various sources and are arbitrarily oriented.
Evaluation criteria
1. Precision-recall curve
2. mAP (mean average precision)
3. Nearest Neighbor (NN): top-1 retrieval accuracy
4. E-Measure (E):
   harmonic mean of the precision and recall over the top 32 retrieved items
5. First/Second Tier (FT/ST) and Discounted Cumulated Gain (DCG),
   as defined in the PSB statistics
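For reference, minimal implementations of two of these criteria; the helper names are ours, and `ranked_labels` is assumed to hold the class label of each retrieved item in rank order.

```python
def nn_accuracy(ranked_labels, query_label):
    """Nearest Neighbor (NN): is the top-1 retrieved item from the query's class?"""
    return float(ranked_labels[0] == query_label)

def e_measure(ranked_labels, query_label, n_relevant, k=32):
    """E-Measure: harmonic mean of precision and recall over the top-k items."""
    hits = sum(1 for lbl in ranked_labels[:k] if lbl == query_label)
    precision, recall = hits / k, hits / n_relevant
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```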
Experimental Settings
• Generating pairs for the Siamese network
  • Keep a reasonable proportion of similar and dissimilar pairs:
    10x more dissimilar pairs than similar pairs for successful training.
  • Randomly select 2 view pairs from the same category
    and 20 view samples from other categories.
  • Perform the random pairing anew for each training epoch.
• Data augmentation for the sketch set
  • Randomly perform affine transformations on each sketch sample
    to generate more variations (see the sketch below).
  • Two augmentations per sketch sample.
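A rough sketch of both steps, under our own assumptions about the data layout (`views_by_class` maps a class name to its rendered views). The 2-similar/20-dissimilar counts follow the slide; the affine distortion range is a guess, since the slides do not give one.

```python
import random
import numpy as np
from scipy.ndimage import affine_transform

def make_epoch_pairs(views_by_class, seed=0):
    """Re-pair samples for one epoch: per anchor, 2 similar pairs from its own
    category and 20 dissimilar pairs from other categories (a 1:10 ratio)."""
    rng = random.Random(seed)
    pairs, classes = [], list(views_by_class)
    for cls in classes:
        for anchor in views_by_class[cls]:
            for pos in rng.sample(views_by_class[cls], 2):      # similar (y = 0)
                pairs.append((anchor, pos, 0))
            for _ in range(20):                                 # dissimilar (y = 1)
                other = rng.choice([c for c in classes if c != cls])
                pairs.append((anchor, rng.choice(views_by_class[other]), 1))
    rng.shuffle(pairs)
    return pairs

def augment_sketch(img, rng):
    """One random affine variation of a sketch image (mild rotation/shear/scale)."""
    mat = np.eye(2) + rng.uniform(-0.1, 0.1, size=(2, 2))  # small distortion
    center = np.array(img.shape) / 2
    offset = center - mat @ center        # keep the sketch roughly centered
    return affine_transform(img, mat, offset=offset, cval=255)  # white background
```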
Computational cost
• Siamese CNN implemented in Theano.
  • 2.8 GHz CPU and GTX 780 GPU.
• With preprocessed view features, retrieval takes about 2 ms per query
  on the SHREC’13 dataset (see the sketch below).
• Training time is proportional to the total number of pairs and epochs:
  2.5 h for PSB/SBSR, 6 h for SHREC’13.
• No significant performance gain when increasing the number of views from 2 to 10,
  only increased computational cost and GPU memory.
  • Two views are enough.
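The ~2 ms query time is plausible because retrieval reduces to a distance computation over precomputed view features. A minimal sketch, using the same L1 distance as the loss above; the function name and the keep-best-view-per-model policy are our assumptions.

```python
import numpy as np

def retrieve(sketch_feature, view_features, model_ids, top_k=5):
    """Rank 3D models by L1 distance between one sketch feature and the
    precomputed view features (one row per rendered view)."""
    dists = np.abs(view_features - sketch_feature).sum(axis=1)  # L1 per view
    order = np.argsort(dists)                                   # closest first
    ranked, seen = [], set()
    for i in order:                       # keep only the best view of each model
        if model_ids[i] not in seen:
            seen.add(model_ids[i])
            ranked.append(model_ids[i])
        if len(ranked) == top_k:
            break
    return ranked
```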
Results & Conclusion
Results on PSB/SBSR dataset
• Works very well on popular classes (human, face, plane).
• Fine-grained categories are difficult to distinguish:
  shelf vs. box differ only in small details.
• Semantic ambiguity is very hard:
  barn vs. house differ in functionality, not in shape.
• The importance of viewpoint is decreased in this approach:
  planes have a high degree of freedom in pose, yet their retrieval results are still excellent.
• PSB/SBSR is a very imbalanced dataset:
  71 classes appear only in the test set, not in the training set.
• Can unseen classes still be retrieved?
  • Even the failure cases are sensible (e.g., a flower retrieved as a potting plant).
  • This demonstrates that the network learns similarity effectively.
Results on SHREC
• Visualization of the learned features:
  • PCA projects the features into 2D.
  • Green dots: sketches; yellow dots: views.
  • Similar shapes are grouped together automatically
    (animals, vehicles, …).
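The 2D visualization can be reproduced with plain PCA; a minimal sketch (function name and toy data are ours).

```python
import numpy as np

def pca_2d(features):
    """Project feature vectors (one per row) onto their top-2 principal components."""
    x = features - features.mean(axis=0)      # center the data
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T                       # (n_samples, 2) coordinates

# Toy usage: project sketch and view features, then plot green vs. yellow dots.
feats = np.random.default_rng(0).normal(size=(100, 64))
coords = pca_2d(feats)
```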
• Precision-recall curve: outperforms all compared methods.
  • About 10% higher precision at small recall.
  • The curve decreases much more slowly → more stable.
  • About 30% higher precision when recall reaches 1.
• Noticeable over-fitting during training,
  so the results could be even better.
• Standard metrics for comparison:
  performs better in every metric.
• Also compared with the basic Siamese network,
  where both sketches & views share the same network.
  • Confirms that the variations in the two domains are different:
    using the same features (hand-crafted or learned) for both domains is suboptimal.
• Within-domain retrieval
  • Already provided by the datasets, but used to recheck the method.
  • The view domain is more consistent than the sketch domain;
    inconsistency in sketches is the most challenging issue.
  • The method is powerful in learning features
    for both within-domain and cross-domain retrieval.
Conclusion
• Proposed to learn feature representations for sketch-based 3D shape retrieval.
• Instead of computing “best views”, use predefined viewpoints and
  adopt two Siamese CNNs, one for views and one for sketches.
  • Bypasses the dilemma of best-view selection.
• Experiments show the method is superior.
Thank you
• Solving this through learning cross-domain similarities itself removes
  the issue of deciding best views.
• A minimalist approach, as opposed to multiple best views.
• The 3D models themselves are mostly upright.
• Shown to be far more effective than approaches that compare more than two views.
• This ultimately shows that the features were learned properly.
• Semantic-level matching:
  comprehensive shape representations, rather than a combination of shallow
  features that only capture low-level visual information.
• Learn features with CNNs; use a Siamese network.
• The two input sources have distinctive intrinsic properties → use 2 different
  CNN models, one for sketches and one for model views.
  • More power to capture the different properties in each domain.
• A loss function aligns the outputs of the two CNN models:
  it couples the two input sources into the same target space.
• Compare features directly using a simple distance function.
• Outperforms in precision-recall and NN; retrieval within each domain is effective,
  and computation is fast based on filtering.
Experimental Settings
• Stopping criteria
  • All three datasets had been split into training and testing sets, but no
    validation set was specified. The algorithm was therefore terminated after 50
    epochs for PSB/SBSR and 20 epochs for SHREC’13 (or until convergence).
    Multiple runs were performed and the mean values were reported.