Cross-domain Complementary Learning
with Synthetic Data for Multi-Person
Part Segmentation
Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun
University of Washington, Seattle
Microsoft, Redmond
International Conference on Computer Vision (ICCV), Demonstration, 2019
1
Outline
• Introduction
• Related works
• Proposed method
• Experiments
• On-going work and Conclusion
2
Human part segmentation
• Human part segmentation aims to partition each person in an image
into multiple semantically consistent regions.
Typically 14 parts: head, torso, left upper-arm, right upper-arm, left lower-arm, right lower-arm,
left hand, right hand, left thigh, right thigh, left shank, right shank, left foot, right foot
[Figure: Input Image | Part Segmentation]
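As a minimal sketch, the 14 parts listed above can be kept in a label table for a per-pixel segmentation map. Reserving ID 0 for background and the particular ID ordering are assumptions for illustration; the slides do not specify them.

```python
# Hypothetical label table for the 14 body parts named on the slide.
# Assumption: ID 0 is reserved for background; the ordering is illustrative.
PART_NAMES = [
    "head", "torso",
    "left upper-arm", "right upper-arm",
    "left lower-arm", "right lower-arm",
    "left hand", "right hand",
    "left thigh", "right thigh",
    "left shank", "right shank",
    "left foot", "right foot",
]

BACKGROUND_ID = 0
# Part IDs start at 1 so that 0 can mean "not a person".
PART_IDS = {name: i for i, name in enumerate(PART_NAMES, start=1)}


def part_name(label_id):
    """Map a per-pixel label ID back to a readable part name."""
    if label_id == BACKGROUND_ID:
        return "background"
    return PART_NAMES[label_id - 1]
```

A segmentation output is then an image-sized grid of these IDs, one per pixel.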
3
Challenges
• Pixel-level labeling of training data is very expensive and labor-intensive.
4
Previous works
• Prior work has explored synthetic data as an alternative.
• These methods train deep CNNs on the synthetic data.
[Figure: Samples of the synthetic training data and the synthetic labels [CVPR17]]
5
Previous works
Their method works well only in well-controlled, single-person
scenarios.
Learning from Synthetic Humans, CVPR 2017
[Figure: Input images | Output results]
6
The domain gap
• The discrepancy between the pixel-value distributions of
synthetic and real data makes it challenging to transfer
knowledge from the synthetic domain to the real domain.
[Figure: Synthetic image | Real images]
7
Related works on street-view segmentation
• Researchers also use graphics simulation to train
segmentation models for street-view images.
• They observe the same domain-gap issue.
Zhang et al., Fully Convolutional Adaptation Networks for Semantic Segmentation, CVPR 2018.
8
Related works on street-view segmentation
• Previous studies tried to address the domain-gap issue with
adversarial training.
• They use a discriminator to distinguish whether the input comes from the
source or the target domain.
[Tsai et al., ICCV 2019], [Tsai et al., CVPR 2018], [Ren et al., CVPR 2018], [Tzeng et al., CVPR 2017], [Ganin et al., ICML 2015]
[Figure: Graphics simulation | Real-world images]
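For reference, a generic form of the adversarial objective described above can be written as follows. The symbols are assumptions introduced here for illustration: $F$ is the feature extractor of the segmentation network, $D$ is the domain discriminator, and $x_s$, $x_t$ are samples from the source (graphics simulation) and target (real-world) domains. The cited papers differ in where the discriminator is applied and in the exact loss, so this is a sketch rather than any one paper's formulation.

```latex
\mathcal{L}_{D} = -\,\mathbb{E}_{x_s}\!\left[\log D(F(x_s))\right]
                  -\,\mathbb{E}_{x_t}\!\left[\log\bigl(1 - D(F(x_t))\bigr)\right],
\qquad
\mathcal{L}_{\mathrm{adv}} = -\,\mathbb{E}_{x_t}\!\left[\log D(F(x_t))\right]
```

The discriminator minimizes $\mathcal{L}_{D}$ (source labeled 1, target labeled 0), while the segmentation network adds $\mathcal{L}_{\mathrm{adv}}$ to its task loss so that target-domain features become hard to tell apart from source-domain features.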
9
Challenges
• Can we learn human part segmentation without data labeling?
• How can we learn human part segmentation from graphics simulations
and make the resulting model work well in real-world scenarios?
We propose a new approach, cross-domain complementary
learning (CDCL), to address these challenges.
10
Our multi-person synthetic data
• We create a new multi-person synthetic dataset that contains multiple
people performing various actions in a 3D room.
11
The idea
• We observe that real and synthetic humans both have
a skeleton (pose) representation.
12
Proposed method
• We propose to bridge the domains with skeletons and learn part
segmentation from synthetic data.
13
Proposed network: Module 1
[Diagram: Real Inputs → Backbone (ResNet101) → Head networks → Part Affinity Fields, Keypoint Maps → Skeletons]
The network architecture is similar to “Realtime Multi-Person
2D Pose Estimation using Part Affinity Fields,” CVPR 2017.
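A keypoint map of the kind Module 1 predicts can be illustrated with the Gaussian confidence-map target used in the cited Part Affinity Fields paper. The following is a minimal pure-Python sketch; the function name `keypoint_confidence_map`, the map size, and the value of `sigma` are illustrative assumptions, not values from the slides.

```python
import math


def keypoint_confidence_map(keypoints, height, width, sigma=1.5):
    """Build one confidence map for a single keypoint type.

    keypoints: list of (x, y) positions, one per visible person.
    Each person contributes a Gaussian peak exp(-d^2 / sigma^2);
    overlapping peaks are merged with a max so nearby peaks stay distinct.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                heatmap[y][x] = max(heatmap[y][x], math.exp(-d2 / sigma ** 2))
    return heatmap


# Two people in the image, so two peaks for the same keypoint type.
hm = keypoint_confidence_map([(2, 2), (6, 3)], height=6, width=9)
```

In a multi-person setting there is one such map per keypoint type, and the part affinity fields are what associate the detected peaks with individual people.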
14
Proposed network: Module 2
[Diagram: Synthetic Inputs → Backbone (ResNet101) → Head networks → Keypoint Maps, Part Affinity Fields, Body Part Maps → Skeletons, Body Part Segmentation]
15
Two modules are trained interchangeably
[Diagram, Module 1: Real Inputs → Backbone (ResNet101) → Head networks → Part Affinity Fields, Keypoint Maps → Skeletons]
[Diagram, Module 2: Synthetic Inputs → Backbone (ResNet101) → Head networks → Keypoint Maps, Part Affinity Fields, Body Part Maps → Skeletons, Body Part Segmentation]
[Diagram annotation: Parameter Sharing between the two modules]
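The training scheme on this slide can be sketched as a schedule that alternates between the two domains. This toy example counts parameter updates instead of computing gradients. It reads "trained interchangeably" as alternating one real batch with one synthetic batch, and it assumes that the backbone and the skeleton heads are the shared parameters; both readings are assumptions, and the class and function names are hypothetical.

```python
class SharedModel:
    """Toy stand-in for a network with a shared backbone and task heads.

    Instead of real gradients, it counts how often each parameter
    group would be updated, which is enough to show the schedule.
    """

    def __init__(self):
        self.updates = {"backbone": 0, "keypoint_head": 0,
                        "paf_head": 0, "part_head": 0}

    def train_step(self, domain):
        # Skeleton supervision (keypoints + part affinity fields) exists
        # in both domains; part labels exist only for synthetic data.
        heads = ["keypoint_head", "paf_head"]
        if domain == "synthetic":
            heads.append("part_head")
        for group in ["backbone"] + heads:
            self.updates[group] += 1
        return heads


def train_alternating(model, num_rounds):
    """Alternate one real batch (Module 1) and one synthetic batch (Module 2)."""
    log = []
    for _ in range(num_rounds):
        log.append(("real", model.train_step("real")))
        log.append(("synthetic", model.train_step("synthetic")))
    return log


model = SharedModel()
log = train_alternating(model, num_rounds=3)
```

After three rounds the shared groups have been updated by both domains, while the part head has only seen synthetic batches. The skeleton supervision is what ties the two domains together.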
16
Evaluation metric
• Intersection over Union (IoU) is one of the most commonly used
metrics in semantic segmentation.
• IoU is calculated separately for each body part category.
• We average over all categories to obtain the mean IoU (mIoU).
$$\mathrm{IoU} = \frac{\text{Area of Intersection}}{\text{Area of Union}} = \frac{|\text{Prediction} \cap \text{Ground truth}|}{|\text{Prediction} \cup \text{Ground truth}|}$$
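The metric above can be computed as in the following minimal sketch, which takes the predicted and ground-truth label maps as flat lists of class IDs. The function name `mean_iou` and the choice to skip classes that appear in neither map are assumptions; benchmark toolkits may handle empty classes and the background class differently.

```python
def mean_iou(pred, truth, num_classes):
    """Per-class IoU and their mean for two flat lists of label IDs.

    Classes absent from both prediction and ground truth have an empty
    union; they are skipped here rather than counted as 0 or 1.
    """
    ious = {}
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious[c] = inter / union
    if not ious:
        return ious, float("nan")
    return ious, sum(ious.values()) / len(ious)


# Six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
per_class, miou = mean_iou(pred, truth, num_classes=3)
```

Here class 0 has intersection 1 and union 3, class 1 has intersection 2 and union 3, and class 2 has intersection 1 and union 2, so the per-class IoUs are 1/3, 2/3, and 1/2, and their mean is 0.5.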
17
Evaluation benchmarks
• Pascal-Person-Parts dataset
  • 1716 training images
  • 1817 test images
• COCO-DensePose dataset
  • 26151 training images
  • 1508 test images
18
Comparison on Pascal and COCO (mIoU, %)
[Bar chart; values not recoverable. Methods: Synthetic Only; Adversarial Training; Fang et al., CVPR18; Ours; Chen et al., TPAMI18; Gong et al., CVPR17; Ours + Real part labels. Group annotations: "Use real part labels"; "Use additional real part labels"; "Ideal".]
19
Comparison on Pascal and COCO (mIoU, %)
[Bar chart with the same methods and group annotations as on the previous slide; added annotation: "Performance Gap".]
20
Comparison on Pascal and COCO (mIoU, %)
[Bar chart with the same methods and group annotations as on the previous slide; annotation: "Performance Gap".]
21
Comparison on Pascal and COCO (mIoU, %)
[Bar chart with the same methods and group annotations as on the previous slide; added annotation: "Relax labeling requirements!"]
22
Comparison on Pascal and COCO (mIoU, %)
[Bar chart with the same methods and group annotations as on the previous slide; added annotation: "Our performance upper bound".]
23
Qualitative comparison
[Figure: Training with Synthetic Data Only [CVPR17] | Ours]
24
Qualitative comparison
[Figure: Domain Adaptation with Adversarial Training [CVPR18] | Ours]
25
Ablation study
26
Synthetic training data analysis
27
Qualitative comparison
[1] Learning from Synthetic Humans, CVPR17.
28
Qualitative comparison
[1] Learning from Synthetic Humans, CVPR17.
29
General approach
• Our proposed cross-domain training approach is general and can be
extended to other applications, such as novel keypoint detection.
We can simply generate new labels on the synthetic data.
30
Novel keypoint detection
• In some applications, we need to detect additional keypoints (e.g., joints), such
as the hand tips, toes, pelvis, and spine.
• We create novel keypoints using the graphics simulator and train our model
to detect a new human skeleton that includes keypoints on the hands and feet.
[Figure: The definition of our newly created keypoints]
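One way a graphics simulator can provide keypoint labels without manual annotation is to project known 3D joint positions into the rendered image. The sketch below uses a standard pinhole camera model. The slides do not say how the labels were produced, so the camera model, the function name `project_joint`, the joint names, and all numeric values are assumptions for illustration.

```python
def project_joint(point_3d, focal_length, principal_point):
    """Project a 3D point in camera coordinates to 2D pixel coordinates.

    Pinhole model: u = f * X / Z + cx, v = f * Y / Z + cy.
    Returns None for points at or behind the camera plane.
    """
    x, y, z = point_3d
    if z <= 0:
        return None
    cx, cy = principal_point
    return (focal_length * x / z + cx, focal_length * y / z + cy)


# Hypothetical 3D joints (metres, camera coordinates) from a simulator.
joints_3d = {"left_toe": (0.1, 0.9, 3.0), "pelvis": (0.0, 0.0, 3.0)}
labels_2d = {
    name: project_joint(p, focal_length=600.0, principal_point=(320.0, 240.0))
    for name, p in joints_3d.items()
}
```

Because the simulator knows every joint position exactly, adding a new keypoint type only requires choosing a new point on the 3D model and projecting it.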
31
Qualitative results
32
Conclusion
• We find that human pose is very effective for bridging the real and
synthetic domains in multi-person part segmentation.
• We introduce an effective framework that leverages information in
both real and synthetic images for multi-person part segmentation.
• Our method can be extended to generate labels for keypoints, such
as those on the hands and feet, in real images without human labeling.
33
On-going work and future directions
• Reconstruct 3D human mesh from a single image
without ground truth training labels
34
On-going work and future directions
• Training data labeling for 3D body shape is very expensive.
First stage: Ask workers to label parts.
Second stage: Ask workers to label the corresponding points on a 3D human model.
Sampled points: uniformly sampled points within the part.
Guler et al., “DensePose: Learning image-to-surface correspondence,” CVPR 2018.
35
On-going work and future directions
• We plan to explore different approaches to learn
human 3D body shape from graphics simulations.
36
Thank you
37

More Related Content

What's hot

보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?광희 이
 
Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...Wesley De Neve
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsSangmin Woo
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksNAVER Engineering
 
Action Recognitionの歴史と最新動向
Action Recognitionの歴史と最新動向Action Recognitionの歴史と最新動向
Action Recognitionの歴史と最新動向Ohnishi Katsunori
 
Unsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 networkUnsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 network광희 이
 
Dario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringDario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringAdvanced-Concepts-Team
 
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases
Dynamic Two-Stage Image Retrieval from Large Multimodal DatabasesDynamic Two-Stage Image Retrieval from Large Multimodal Databases
Dynamic Two-Stage Image Retrieval from Large Multimodal DatabasesKonstantinos Zagoris
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
Obscenity Detection in Images
Obscenity Detection in ImagesObscenity Detection in Images
Obscenity Detection in ImagesAnil Kumar Gupta
 
LFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual ExplanationLFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual Explanation광희 이
 
184816386 x mining
184816386 x mining184816386 x mining
184816386 x mining496573
 
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural NetworksTemporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural NetworksUniversitat Politècnica de Catalunya
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...Ahmed Gad
 
Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia RecommendationWanjin Yu
 
Big-Data Analytics for Media Management
Big-Data Analytics for Media ManagementBig-Data Analytics for Media Management
Big-Data Analytics for Media Managementtechkrish
 
Usage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareUsage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareGlobalLogic Ukraine
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisAhmed Gad
 
Object Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online LearningObject Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online LearningJui-Hsin (Larry) Lai
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1San Kim
 

What's hot (20)

보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?
 
Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networks
 
Action Recognitionの歴史と最新動向
Action Recognitionの歴史と最新動向Action Recognitionの歴史と最新動向
Action Recognitionの歴史と最新動向
 
Unsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 networkUnsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 network
 
Dario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringDario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineering
 
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases
Dynamic Two-Stage Image Retrieval from Large Multimodal DatabasesDynamic Two-Stage Image Retrieval from Large Multimodal Databases
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and Python
 
Obscenity Detection in Images
Obscenity Detection in ImagesObscenity Detection in Images
Obscenity Detection in Images
 
LFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual ExplanationLFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual Explanation
 
184816386 x mining
184816386 x mining184816386 x mining
184816386 x mining
 
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural NetworksTemporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
 
Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia Recommendation
 
Big-Data Analytics for Media Management
Big-Data Analytics for Media ManagementBig-Data Analytics for Media Management
Big-Data Analytics for Media Management
 
Usage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareUsage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in Healthcare
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
 
Object Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online LearningObject Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online Learning
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1
 

Similar to Cross-domain complementary learning with synthetic data for multi-person part segmentation

AI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media PipeAI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media PipeIRJET Journal
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Wanjin Yu
 
AI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media PipeAI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media PipeIRJET Journal
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 
KaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningKaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningVan Huy
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareTigerGraph
 
Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.Fernando Constantino
 
brief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANsbrief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANsParham Zilouchian
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION cscpconf
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachLorenzo Cesaretti
 
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...Matti Luhtala
 
Age Estimation And Gender Prediction Using Convolutional Neural Network.pptx
Age Estimation And Gender Prediction Using Convolutional Neural Network.pptxAge Estimation And Gender Prediction Using Convolutional Neural Network.pptx
Age Estimation And Gender Prediction Using Convolutional Neural Network.pptxBulbul Agrawal
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 
HUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNING
HUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNINGHUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNING
HUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNINGIRJET Journal
 

Similar to Cross-domain complementary learning with synthetic data for multi-person part segmentation (20)

AI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media PipeAI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media Pipe
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
AI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media PipeAI Personal Trainer Using Open CV and Media Pipe
AI Personal Trainer Using Open CV and Media Pipe
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
Learning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep visionLearning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep vision
 
KaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningKaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep Learning
 
BTP Report.pdf
BTP Report.pdfBTP Report.pdf
BTP Report.pdf
 
Fa19_P1.pptx
Fa19_P1.pptxFa19_P1.pptx
Fa19_P1.pptx
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA Hardware
 
Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.Transfer Learning: Breve introducción a modelos pre-entrenados.
Transfer Learning: Breve introducción a modelos pre-entrenados.
 
MILA DL & RL summer school highlights
MILA DL & RL summer school highlights MILA DL & RL summer school highlights
MILA DL & RL summer school highlights
 
brief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANsbrief Introduction to Different Kinds of GANs
brief Introduction to Different Kinds of GANs
 
final ppt
final pptfinal ppt
final ppt
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
 
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
Proactive Rescue Work by Enhancing Situational Awareness: Modeling Resources,...
 
Age Estimation And Gender Prediction Using Convolutional Neural Network.pptx
Age Estimation And Gender Prediction Using Convolutional Neural Network.pptxAge Estimation And Gender Prediction Using Convolutional Neural Network.pptx
Age Estimation And Gender Prediction Using Convolutional Neural Network.pptx
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
HUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNING
HUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNINGHUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNING
HUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNING
 
Deep Meta Learning
Deep Meta Learning Deep Meta Learning
Deep Meta Learning
 

More from 哲东 郑

Visual saliency
Visual saliencyVisual saliency
Visual saliency哲东 郑
 
Image Synthesis From Reconfigurable Layout and Style
Image Synthesis From Reconfigurable Layout and StyleImage Synthesis From Reconfigurable Layout and Style
Image Synthesis From Reconfigurable Layout and Style哲东 郑
 
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Polysemous Visual-Semantic Embedding for Cross-Modal RetrievalPolysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval哲东 郑
 
Weijian image retrieval
Weijian image retrievalWeijian image retrieval
Weijian image retrieval哲东 郑
 
Scops self supervised co-part segmentation
Scops self supervised co-part segmentationScops self supervised co-part segmentation
Scops self supervised co-part segmentation哲东 郑
 
Video object detection
Video object detectionVideo object detection
Video object detection哲东 郑
 
C2 ae open set recognition
C2 ae open set recognitionC2 ae open set recognition
C2 ae open set recognition哲东 郑
 
Sota semantic segmentation
Sota semantic segmentationSota semantic segmentation
Sota semantic segmentation哲东 郑
 
Deep randomized embedding
Deep randomized embeddingDeep randomized embedding
Deep randomized embedding哲东 郑
 
Semantic Image Synthesis with Spatially-Adaptive Normalization
Semantic Image Synthesis with Spatially-Adaptive NormalizationSemantic Image Synthesis with Spatially-Adaptive Normalization
Semantic Image Synthesis with Spatially-Adaptive Normalization哲东 郑
 
Instance level facial attributes transfer with geometry-aware flow
Instance level facial attributes transfer with geometry-aware flowInstance level facial attributes transfer with geometry-aware flow
Instance level facial attributes transfer with geometry-aware flow哲东 郑
 
Learning to adapt structured output space for semantic
Learning to adapt structured output space for semanticLearning to adapt structured output space for semantic
Learning to adapt structured output space for semantic哲东 郑
 
Unsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image GenerationUnsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image Generation哲东 郑
 
Graph based global reasoning networks
Graph based global reasoning networks Graph based global reasoning networks
Graph based global reasoning networks 哲东 郑
 
Variational Discriminator Bottleneck
Variational Discriminator BottleneckVariational Discriminator Bottleneck
Variational Discriminator Bottleneck哲东 郑
 
GNorm and Rethinking pre training-ruijie
GNorm and Rethinking pre training-ruijieGNorm and Rethinking pre training-ruijie
GNorm and Rethinking pre training-ruijie哲东 郑
 
Smoothed manifold
Smoothed manifoldSmoothed manifold
Smoothed manifold哲东 郑
 

More from 哲东 郑 (20)

Visual saliency
Visual saliencyVisual saliency
Visual saliency
 
Image Synthesis From Reconfigurable Layout and Style
Image Synthesis From Reconfigurable Layout and StyleImage Synthesis From Reconfigurable Layout and Style
Image Synthesis From Reconfigurable Layout and Style
 
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Polysemous Visual-Semantic Embedding for Cross-Modal RetrievalPolysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
 
Weijian image retrieval
Weijian image retrievalWeijian image retrieval
Weijian image retrieval
 
Scops self supervised co-part segmentation
Scops self supervised co-part segmentationScops self supervised co-part segmentation
Scops self supervised co-part segmentation
 
Video object detection
Video object detectionVideo object detection
Video object detection
 
Center nets
Center netsCenter nets
Center nets
 
C2 ae open set recognition
C2 ae open set recognitionC2 ae open set recognition
C2 ae open set recognition
 
Sota semantic segmentation
Sota semantic segmentationSota semantic segmentation
Sota semantic segmentation
 
Deep randomized embedding
Deep randomized embeddingDeep randomized embedding
Deep randomized embedding
 
Semantic Image Synthesis with Spatially-Adaptive Normalization
Semantic Image Synthesis with Spatially-Adaptive NormalizationSemantic Image Synthesis with Spatially-Adaptive Normalization
Semantic Image Synthesis with Spatially-Adaptive Normalization
 
Instance level facial attributes transfer with geometry-aware flow
Instance level facial attributes transfer with geometry-aware flowInstance level facial attributes transfer with geometry-aware flow
Instance level facial attributes transfer with geometry-aware flow
 
Learning to adapt structured output space for semantic
Learning to adapt structured output space for semanticLearning to adapt structured output space for semantic
Learning to adapt structured output space for semantic
 
Unsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image GenerationUnsupervised Learning of Object Landmarks through Conditional Image Generation
Unsupervised Learning of Object Landmarks through Conditional Image Generation
 
Graph based global reasoning networks
Graph based global reasoning networks Graph based global reasoning networks
Graph based global reasoning networks
 
Style gan
Style ganStyle gan
Style gan
 
Vi2vi
Vi2viVi2vi
Vi2vi
 
Variational Discriminator Bottleneck
Variational Discriminator BottleneckVariational Discriminator Bottleneck
Variational Discriminator Bottleneck
 
GNorm and Rethinking pre training-ruijie
GNorm and Rethinking pre training-ruijieGNorm and Rethinking pre training-ruijie
GNorm and Rethinking pre training-ruijie
 
Smoothed manifold
Smoothed manifoldSmoothed manifold
Smoothed manifold
 

Cross-domain complementary learning with synthetic data for multi-person part segmentation

  • 1. Cross-domain Complementary Learning with Synthetic Data for Multi-Person Part Segmentation Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun University of Washington, Seattle Microsoft, Redmond International Conference on Computer Vision (ICCV), Demonstration, 2019 1
  • 2. Outline • Introduction • Related works • Proposed method • Experiments • On-going work and Conclusion 2
  • 3. Human part segmentation • Human part segmentation aims at partitioning the persons in an image into multiple semantically consistent regions. Typically 14 parts: head, torso, left upper-arm, right upper-arm, left lower-arm, right lower-arm, left hand, right hand, left thigh, right thigh, left shank, right shank, left foot, right foot. [Figure: input image and its part segmentation] 3
  • 4. Challenges • Pixel-level labeling of training data is very expensive and labor-intensive. 4
  • 5. Previous works • People have been exploring synthetic data as an alternative. • They trained deep CNNs on the synthetic data. Samples of the synthetic training data and the synthetic labels [CVPR17] 5
  • 6. Previous works • Their method works well only in well-controlled, single-person scenarios. Learning from Synthetic Humans, CVPR 2017. [Figure: input images and output results] 6
  • 7. The domain gap • The discrepancy of pixel-value distributions between the synthetic and real data makes transferring the knowledge from the synthetic to real domain challenging. Synthetic image Real images 7
  • 8. Related works on street-view segmentation • People are also trying to use graphics simulation for training a segmentation model for street-view images. • They also observe the domain-gap issue. Zhang et al, Fully Convolutional Adaptation Networks for Semantic Segmentation, CVPR 2018. 8
  • 9. Related works on street-view segmentation • Previous studies tried to address the domain-gap issue by using adversarial training. • They use a discriminator to distinguish whether the input is from the source or target domain. [Tsai et al, ICCV2019], [Tsai et al, CVPR 2018], [Ren et al, CVPR2018], [Tzeng et al, CVPR2017], [Ganin et al, ICML2015] Graphics simulation Real-world images 9
  • 10. Challenges • Can we learn human part segmentation without data labeling? • How can we learn human part segmentation from graphics simulations and make the resulting model work well in real-world scenarios? We propose a new approach, named cross-domain complementary learning (CDCL), to address these challenges. 10
  • 11. Our multi-person synthetic data • We create a new multi-person synthetic dataset which contains multiple persons performing various actions in a 3D room. 11
  • 12. The idea • We observe that real and synthetic humans both have a skeleton (pose) representation. 12
  • 13. Proposed method • We propose to bridge the domains with skeletons and learn part segmentation from synthetic data. 13
  • 14. Proposed network: Module 1 • Real Inputs → Backbone (ResNet101) → Head networks → Part Affinity Fields and Keypoint Maps → Skeletons. • The network architecture is similar to “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields,” in CVPR 2017. 14
  • 15. Proposed network: Module 2 • Synthetic Inputs → Backbone (ResNet101) → Head networks → Part Affinity Fields, Keypoint Maps, and Body Part Maps → Skeletons and Body Part Segmentation. 15
  • 16. Two modules are trained interchangeably • Module 1: Real Inputs → Backbone (ResNet101) → Head networks → Part Affinity Fields and Keypoint Maps → Skeletons. • Module 2: Synthetic Inputs → Backbone (ResNet101) → Head networks → Part Affinity Fields, Keypoint Maps, and Body Part Maps → Skeletons and Body Part Segmentation. • The two modules are linked by Parameter Sharing. 16
  • 17. Evaluation metric • Intersection over Union (IoU) is one of the most commonly used metrics in semantic segmentation. • IoU is calculated for each body part category separately. • We average over all categories to provide a mean IoU. $\mathrm{IoU} = \frac{\text{Area of Intersection}}{\text{Area of Union}} = \frac{|\text{Prediction} \cap \text{Ground truth}|}{|\text{Prediction} \cup \text{Ground truth}|}$ 17
  • 18. Evaluation benchmarks • Pascal-Person-Parts dataset • 1716 training images • 1817 test images • COCO-DensePose dataset • 26151 training images • 1508 test images 18
  • 19. Comparison on Pascal and COCO (mIoU, %) [Chart labels: Synthetic Only; Adversarial Training; Fang et al., CVPR18; Ours; Chen et al., TPAMI18; Gong et al., CVPR17; Ours + Real part labels. Group labels: Use real part labels; Use additional real part labels; Ideal.] 19
  • 20. Comparison on Pascal and COCO (mIoU, %) [Same chart as slide 19, annotated: Performance Gap] 20
  • 21. Comparison on Pascal and COCO (mIoU, %) [Same chart as slide 19, annotated: Performance Gap] 21
  • 22. Comparison on Pascal and COCO (mIoU, %) [Same chart as slide 19, annotated: Relax labeling requirements!] 22
  • 23. Comparison on Pascal and COCO (mIoU, %) [Same chart as slide 19, annotated: Our performance upper bound] 23
  • 24. Qualitative comparison Training with Synthetic Data Only [CVPR17] Ours 24
  • 25. Qualitative comparison Domain Adaptation with Adversarial Training [CVPR18] Ours 25
  • 27. Synthetic training data analysis 27
  • 28. Qualitative comparison [1] Learning from Synthetic Humans, CVPR17. 28
  • 29. Qualitative comparison [1] Learning from Synthetic Humans, CVPR17. 29
  • 30. General approach • Our proposed cross-domain training approach is general and can be extended to other applications, such as novel keypoint detection. We can simply generate new labels on the synthetic data 30
  • 31. Novel keypoint detection • In some applications, we need to detect other keypoints (e.g., joints) such as hand tips, toes, pelvis, and spine. • We create novel keypoints using the graphics simulator and train our model to detect a new human skeleton that includes keypoints on the hands and feet. The definition of our newly created novel keypoints 31
  • 33. Conclusion • We find that human pose is very effective for bridging the real and synthetic domains for multi-person part segmentation. • We introduce an effective framework that leverages information in both real and synthetic images for multi-person part segmentation. • Our method can be extended to generate labels for keypoints, such as those on the hands and feet, in real images without human labeling. 33
  • 34. On-going work and future directions • Reconstruct 3D human mesh from a single image without ground truth training labels 34
  • 35. On-going work and future directions • Training data labeling for 3D body shape is very expensive. First stage: Ask workers to label parts Second stage: Ask workers to label the corresponding points on 3D human model Sampled points: uniformly sampled points within the part Guler et al, “DensePose: Learning image-to-surface correspondence,” CVPR 2018. 35
  • 36. On-going work and future directions • We plan to explore different approaches to learn human 3D body shape from graphics simulations. 36
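
The mean IoU metric from the evaluation-metric slide (per-category IoU, then an average over categories) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function name `mean_iou`, the integer label-map input format, and the choice to skip categories that are absent from both prediction and ground truth are all assumptions made here.

```python
import numpy as np


def mean_iou(pred, gt, num_classes):
    """Return ({class_id: IoU}, mean IoU) for two integer label maps.

    Assumption: `pred` and `gt` have the same shape and hold class ids
    in the range [0, num_classes).
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    per_class = {}
    for c in range(num_classes):
        pred_c = pred == c  # pixels predicted as category c
        gt_c = gt == c      # pixels labeled as category c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            # Category absent from both maps; skipped here (an assumption,
            # benchmarks may handle this case differently).
            continue
        intersection = np.logical_and(pred_c, gt_c).sum()
        per_class[c] = float(intersection) / float(union)

    if not per_class:
        return per_class, float("nan")
    return per_class, float(np.mean(list(per_class.values())))


# Toy 2x2 example with three hypothetical categories.
pred = [[0, 1], [1, 2]]
gt = [[0, 1], [2, 2]]
per_class, miou = mean_iou(pred, gt, num_classes=3)
print(per_class, miou)
```

In the toy example, category 0 matches exactly, while categories 1 and 2 each share one pixel out of a two-pixel union, so the mean is pulled below 1 by the two partially matched categories.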