Perception and Intelligence Laboratory, Seoul National University

Sketch-based 3D Shape Retrieval using Convolutional Neural Networks
Fang Wang, Le Kang, Yi Li
Presented by Junho Cho
2015/11/06
Introduction
• Sketch-based 3D Shape Retrieval using Convolutional Neural Networks
• CVPR 2015 oral
• Retrieving 3D models from 2D sketches
• Author page: http://users.cecs.anu.edu.au/~yili/
• Demo & code: http://users.cecs.anu.edu.au/~yili/cnnsbsr/
Introduction and Early attempts
• Sketches are easy to input, yet rich enough to specify shapes.
• Early attempts used keywords or 3D shapes as queries.
• Directly matching 2D sketches to 3D models is difficult:
  • very different representations.
• Many methods project 3D models to multiple 2D views,
  then match the sketch to one of those views.
• Variations in both sketch styles and 3D models → low performance
Early attempts
• Finding the “best views” of a 3D model
  • Ideally, one viewpoint is similar to the query sketch.
  • Similarity scored using Gabor, dense SIFT, and GALIF features.
• No guarantee that the best views have viewpoints similar to the sketches.
• Finding the “best view” is an unsolved problem:
  • its definition remains elusive.
Contributions
1. Propose to learn feature representations for sketch-based shape retrieval,
   bypassing the dilemma of “best view” selection.
   • A minimalist approach, as opposed to multiple best views.
   • Just two views chosen randomly per 3D model, under a simple hypothesis
     (most models are upright).
   • Still outperforms prior methods, showing the features are learned effectively.
2. Two Siamese CNNs to learn similarities both within-domain and cross-domain.
   • Sketches and views have distinctive intrinsic properties.
   • Two different CNN models, one for sketches and one for model views.
   • Couples the two input sources into the same target space.
3. Outperforms the state of the art.
(Figure slide: Siamese CNN architecture)
CNN & Siamese Network
• CNN
  • CNNs effectively learn complicated mappings from raw images to the target.
  • Require less domain knowledge
    (vs. handcrafted features and shallow learning frameworks).
• Siamese Network
  • Two identical sub-convolutional networks.
  • Input given as pairs of samples.
  • Similar input pairs → similar output vectors
  • Dissimilar input pairs → dissimilar output vectors
  • Used in a weakly supervised metric-learning setting.
  • Applied to text classification, speech feature classification, and face verification.
Learning a Similarity Measure Discriminatively Using a Siamese Network
(S. Chopra, R. Hadsell, and Y. LeCun, CVPR 2005)
Basic Siamese CNN
• Takes two samples into separate but identical networks.
• Typical contrastive loss defined over pairs:
  • s1, s2 : two samples
  • y : binary similarity label (same: 0, different: 1)
  • D_w = ‖f(s1; w1) − f(s2; w2)‖₁ : distance between the two outputs
  • L(s1, s2, y) = (1 − y)·α·D_w² + y·β·e^(γ·D_w)
  • Set α = 1/C_p, β = C_n, γ = −2.77/C_n, where C_p = 0.2 and C_n = 10.
  • Constants follow “Learning a Similarity Measure Discriminatively Using a Siamese Network”.
• Input pairs labeled as similar → bring output vectors closer.
• Input pairs labeled as dissimilar → push output vectors away.
• Back-propagated gradients are computed individually on the two sample sets;
  the network is updated with the average of the two gradients.
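To make the loss concrete, here is a minimal NumPy sketch of this contrastive loss. The function name, the toy vectors, and the use of NumPy are our own; the formula and the constants C_p, C_n follow the slide.

```python
import numpy as np

# Constants as given on the slide (following Chopra et al., CVPR 2005).
C_P, C_N = 0.2, 10.0
ALPHA, BETA, GAMMA = 1.0 / C_P, C_N, -2.77 / C_N

def contrastive_loss(f1, f2, y):
    """Pairwise loss: y = 0 for a similar pair, y = 1 for a dissimilar pair."""
    d = np.abs(f1 - f2).sum()                       # D_w: L1 distance
    return (1 - y) * ALPHA * d**2 + y * BETA * np.exp(GAMMA * d)

# Toy usage: the same close pair is cheap if labeled similar, costly if dissimilar.
f_a, f_b = np.array([0.2, 0.9]), np.array([0.25, 0.85])
print(contrastive_loss(f_a, f_b, y=0))   # small loss: pair is already close
print(contrastive_loss(f_a, f_b, y=1))   # large loss: close but labeled dissimilar
```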
(Figure slide: basic Siamese CNN architecture, with binary similarity label y = 0 or 1.)
Learning feature representations for sketch-based 3D shape retrieval
Illustrated example of cross-domain matching
• Two domains: sketches & 3D model views.
• (a) At first, the two point sets are mixed in the feature space.
• Learn the correct mapping using pair similarities within each domain (s-s, v-v)
  as well as their cross-domain relations (s-v), jointly.
• (b) Then the two point sets are correctly aligned in the feature space.
• After cross-domain metric learning, matching can be performed
  sketch - sketch, view - view, and sketch - view.
Siamese Network for cross-domain matching
• The basic Siamese network is used for samples from the same domain (e.g., s-s, v-v).
• For the cross-domain setting, propose to extend it to two Siamese networks:
  one for the view domain, one for the sketch domain.
• Define a within-domain loss & a cross-domain loss.
• Better performance (compared to the basic Siamese network)!
• Newly defined loss function:
  • L(s1, s2, v1, v2, y) = L(s1, s2, y) + L(v1, v2, y) + L(s1, v2, y)
    (similarity of sketches + similarity of views + cross-domain similarity)
  • s1, v1: a sketch and a view from the same class
  • s2, v2: a sketch and a view from the same class
  • y : binary similarity label
• Note: category labels are not actually used in the framework.
  • Whether a sketch would be described as a desk, a hand, or a face doesn’t matter;
    only pair similarity does.
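A sketch of how the three-term loss ties the two networks together. The function names and the stand-in feature extractors (`sketch_net`, `view_net` as plain callables) are hypothetical; the three-term structure follows the slide, reusing the contrastive loss from the previous slide.

```python
import numpy as np

def pair_loss(f1, f2, y, alpha=5.0, beta=10.0, gamma=-0.277):
    """Contrastive loss from the previous slide (alpha = 1/C_p, beta = C_n, gamma = -2.77/C_n)."""
    d = np.abs(f1 - f2).sum()
    return (1 - y) * alpha * d**2 + y * beta * np.exp(gamma * d)

def cross_domain_loss(sketch_net, view_net, s1, s2, v1, v2, y):
    """Within-domain sketch loss + within-domain view loss + cross-domain loss."""
    fs1, fs2 = sketch_net(s1), sketch_net(s2)   # sketch-domain features
    fv1, fv2 = view_net(v1), view_net(v2)       # view-domain features
    return (pair_loss(fs1, fs2, y)              # similarity of sketches
            + pair_loss(fv1, fv2, y)            # similarity of views
            + pair_loss(fs1, fv2, y))           # cross-domain similarity

# Toy usage with identity "networks" standing in for the two CNNs:
ident = lambda x: np.asarray(x, dtype=float)
print(cross_domain_loss(ident, ident,
                        [0.1, 0.9], [0.2, 0.8],      # two sketches
                        [0.12, 0.88], [0.18, 0.82],  # two views
                        y=0))
```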
• Same network design for both networks, but they are learned separately
  (unlike the basic Siamese network).
View definitions and line drawing rendering
• Only two views per 3D model, as opposed to multiple views.
  • Two were enough:
    1. Most 3D models are stored upright.
    2. Two viewpoints are randomly generated with an angle difference larger than 45°.
  • No focus on best views; comparing views is beyond the scope of the paper.
• From the chosen viewpoints, generate 2D line drawings:
  1. Closed boundaries
  2. Suggestive contours
     (D. DeCarlo et al., “Suggestive Contours for Conveying Shape”)
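The slides do not spell out the sampling procedure, so the following is only one plausible reading of "two random viewpoints with an angle difference larger than 45°" for upright models, via rejection sampling; all names here are ours.

```python
import numpy as np

def random_upper_viewpoint(rng):
    """Random unit view direction on the upper hemisphere (models are upright)."""
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    v[2] = abs(v[2])          # flip below-horizon directions; norm is unchanged
    return v

def sample_two_viewpoints(min_angle_deg=45.0, seed=0):
    """Rejection-sample two viewpoints whose angular difference exceeds the minimum."""
    rng = np.random.default_rng(seed)
    v1 = random_upper_viewpoint(rng)
    while True:
        v2 = random_upper_viewpoint(rng)
        angle = np.degrees(np.arccos(np.clip(v1 @ v2, -1.0, 1.0)))
        if angle > min_angle_deg:
            return v1, v2
```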
Experiments
Datasets
• PSB / SBSR dataset
  • Widely used for 3D shape retrieval system evaluation.
  • 1814 3D models.
  • SBSR: 1814 hand-drawn sketches collected using Amazon Mechanical Turk.
• SHREC’13 & ’14 datasets
  • The PSB sketches in the SBSR dataset are not enough:
    the number of sketches per class is imbalanced, so evaluation can be biased.
  • SHREC’13: 1258 models; 80 sketch instances per class.
  • SHREC’14 is greatly enlarged: 8987 3D models.
    • Very hard: models come from various sources and are arbitrarily oriented.
Evaluation criteria
1. Precision-recall curve
2. mAP (mean average precision)
3. Nearest Neighbor (NN): top-1 retrieval accuracy
4. E-Measure (E):
   harmonic mean of the precision and recall over the top 32 retrieved items
5. First/Second Tier (FT/ST) and Discounted Cumulated Gain (DCG),
   as defined in the PSB statistics
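For reference, minimal implementations of two of these criteria; the helper names are ours, and `ranked_labels` is assumed to hold the class label of each retrieved item in rank order.

```python
def nn_accuracy(ranked_labels, query_label):
    """Nearest Neighbor (NN): is the top-1 retrieved item from the query's class?"""
    return float(ranked_labels[0] == query_label)

def e_measure(ranked_labels, query_label, n_relevant, k=32):
    """E-Measure: harmonic mean of precision and recall over the top-k items."""
    hits = sum(1 for lbl in ranked_labels[:k] if lbl == query_label)
    precision, recall = hits / k, hits / n_relevant
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```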
Experimental Settings
• Generating pairs for the Siamese network
  • Keep a reasonable proportion of similar and dissimilar pairs:
    10x more dissimilar pairs than similar pairs for successful training.
  • Randomly select 2 view pairs from the same category
    and 20 view samples from other categories.
  • Perform the random pairing anew for each training epoch.
• Data augmentation for the sketch set
  • Randomly perform affine transformations on each sketch sample
    to generate more variations (see the sketch below).
  • Two augmentations per sketch sample.
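A rough sketch of both steps, under our own assumptions about the data layout (`views_by_class` maps a class name to its rendered views). The 2-similar/20-dissimilar counts follow the slide; the affine distortion range is a guess, since the slides do not give one.

```python
import random
import numpy as np
from scipy.ndimage import affine_transform

def make_epoch_pairs(views_by_class, seed=0):
    """Re-pair samples for one epoch: per anchor, 2 similar pairs from its own
    category and 20 dissimilar pairs from other categories (a 1:10 ratio)."""
    rng = random.Random(seed)
    pairs, classes = [], list(views_by_class)
    for cls in classes:
        for anchor in views_by_class[cls]:
            for pos in rng.sample(views_by_class[cls], 2):      # similar (y = 0)
                pairs.append((anchor, pos, 0))
            for _ in range(20):                                 # dissimilar (y = 1)
                other = rng.choice([c for c in classes if c != cls])
                pairs.append((anchor, rng.choice(views_by_class[other]), 1))
    rng.shuffle(pairs)
    return pairs

def augment_sketch(img, rng):
    """One random affine variation of a sketch image (mild rotation/shear/scale)."""
    mat = np.eye(2) + rng.uniform(-0.1, 0.1, size=(2, 2))  # small distortion
    center = np.array(img.shape) / 2
    offset = center - mat @ center        # keep the sketch roughly centered
    return affine_transform(img, mat, offset=offset, cval=255)  # white background
```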
Computational cost
• Siamese CNN implemented in Theano.
  • 2.8 GHz CPU and GTX 780 GPU.
• With preprocessed view features, retrieval takes about 2 ms per query
  on the SHREC’13 dataset (see the sketch below).
• Training time is proportional to the total number of pairs and epochs:
  2.5 h for PSB/SBSR, 6 h for SHREC’13.
• No significant performance gain when increasing the number of views from 2 to 10,
  only increased computational cost and GPU memory.
  • Two views are enough.
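The ~2 ms query time is plausible because retrieval reduces to a distance computation over precomputed view features. A minimal sketch, using the same L1 distance as the loss above; the function name and the keep-best-view-per-model policy are our assumptions.

```python
import numpy as np

def retrieve(sketch_feature, view_features, model_ids, top_k=5):
    """Rank 3D models by L1 distance between one sketch feature and the
    precomputed view features (one row per rendered view)."""
    dists = np.abs(view_features - sketch_feature).sum(axis=1)  # L1 per view
    order = np.argsort(dists)                                   # closest first
    ranked, seen = [], set()
    for i in order:                       # keep only the best view of each model
        if model_ids[i] not in seen:
            seen.add(model_ids[i])
            ranked.append(model_ids[i])
        if len(ranked) == top_k:
            break
    return ranked
```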
Results & Conclusion
Results on PSB/SBSR dataset
• Works very well on popular classes (human, face, plane).
• Fine-grained categories are difficult to distinguish:
  shelf vs. box differ only in small details.
• Semantic ambiguity is very hard:
  barn vs. house differ in functionality, not in shape.
• The importance of viewpoint is decreased in this approach:
  planes have a high degree of freedom in pose, yet their retrieval results are still excellent.
• PSB/SBSR is a very imbalanced dataset:
  71 classes appear only in the test set, not in the training set.
• Can unseen classes still be retrieved?
  • Even the failure cases are sensible (e.g., a flower retrieved as a potting plant).
  • This demonstrates that the network learns similarity effectively.
Results on SHREC
• Visualization of the learned features:
  • PCA projects the features into 2D.
  • Green dots: sketches; yellow dots: views.
  • Similar shapes are grouped together automatically
    (animals, vehicles, …).
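The 2D visualization can be reproduced with plain PCA; a minimal sketch (function name and toy data are ours).

```python
import numpy as np

def pca_2d(features):
    """Project feature vectors (one per row) onto their top-2 principal components."""
    x = features - features.mean(axis=0)      # center the data
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T                       # (n_samples, 2) coordinates

# Toy usage: project sketch and view features, then plot green vs. yellow dots.
feats = np.random.default_rng(0).normal(size=(100, 64))
coords = pca_2d(feats)
```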
• Precision-recall curve: outperforms all compared methods.
  • About 10% higher precision at small recall.
  • The curve decreases much more slowly → more stable.
  • About 30% higher precision when recall reaches 1.
• Noticeable over-fitting during training,
  so the results could be even better.
• Standard metrics for comparison:
  performs better in every metric.
• Also compared with the basic Siamese network,
  where both sketches & views share the same network.
  • Confirms that the variations in the two domains are different:
    using the same features (hand-crafted or learned) for both domains is suboptimal.
• Within-domain retrieval
  • Already provided by the datasets, but used to recheck the method.
  • The view domain is more consistent than the sketch domain;
    inconsistency in sketches is the most challenging issue.
  • The method is powerful in learning features
    for both within-domain and cross-domain retrieval.
Conclusion
• Proposed to learn feature representations for sketch-based 3D shape retrieval.
• Instead of computing “best views”, use predefined viewpoints and
  adopt two Siamese CNNs, one for views and one for sketches.
  • Bypasses the dilemma of best-view selection.
• Experiments show the method is superior.
Thank you
• Solving this through learning cross-domain similarities itself removes
  the issue of deciding best views.
• A minimalist approach, as opposed to multiple best views.
• The 3D models themselves are mostly upright.
• Shown to be far more effective than approaches that compare more than two views.
• This ultimately shows that the features were learned properly.
• Semantic-level matching:
  comprehensive shape representations, rather than a combination of shallow
  features that only capture low-level visual information.
• Learn features with CNNs; use a Siamese network.
• The two input sources have distinctive intrinsic properties → use 2 different
  CNN models, one for sketches and one for model views.
  • More power to capture the different properties in each domain.
• A loss function aligns the outputs of the two CNN models:
  it couples the two input sources into the same target space.
• Compare features directly using a simple distance function.
• Outperforms in precision-recall and NN; retrieval within each domain is effective,
  and computation is fast based on filtering.
Experimental Settings
• Stopping criteria
  • All three datasets had been split into training and testing sets, but no
    validation set was specified. The algorithm was therefore terminated after 50
    epochs for PSB/SBSR and 20 epochs for SHREC’13 (or until convergence).
    Multiple runs were performed and the mean values were reported.