SlideShare a Scribd company logo
1 of 15
Download to read offline
PR-12 presentation
M3D-GAN: Multi-Modal
Multi-Domain Translation
with Universal Attention
Authors: Shuang Ma, Daniel
McDuff, Yale Song
Presented by Taesu Kim
arXiv:1907.04378v1 [cs.CV] 9 Jul 2019
Introduction
Introduction
Architecture overview
Training Objectives
Universal Attention Module
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Refer to speech synthesis with style tokens
Architecture for each module
❖ PreNet
❖ Speech and Image: 32x32 2D convolution
❖ Text: 128-D character embedding and fully connected layer [256, 128]
Experiments
Input-output pairs for train/test phase
Image-to-image
Image-to-image
Text-to-image
Quantitative results
Ablation study
❖ It shows the contribution of our attention module in producing a highly
structured latent space, which helps the model generate more realistic and
diverse results.
❖ When we add the latent regression, the performance is further improved.
❖ It shows that, the distance regularization plays an important role in controlling
the learned latent code.
❖ It illustrates that the attention mechanism does help in producing better results.
Conclusion
❖ We present M3D-GAN for cross-modal cross-domain translation, which
consists of modality-specific subnets with a universal attention module
that learns to encode modality/domain information in a unified way.
❖ We show how the same architecture can be applied to a wide variety of
tasks.
❖ The universal attention module we propose can learn a highly
structured latent space by means of information bottleneck.
❖ Leveraging the same architecture for different problems is
advantageous as it allows us to develop shared representations in the
abstracted unified space.

More Related Content

Similar to PR12-179 M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022VasileiosMezaris
 
RoBERTa: language modelling in building Indonesian question-answering systems
RoBERTa: language modelling in building Indonesian question-answering systemsRoBERTa: language modelling in building Indonesian question-answering systems
RoBERTa: language modelling in building Indonesian question-answering systemsTELKOMNIKA JOURNAL
 
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...Antonio Tejero de Pablos
 
Multimedia And Contiguity Principles Casey Susan
Multimedia And Contiguity Principles Casey SusanMultimedia And Contiguity Principles Casey Susan
Multimedia And Contiguity Principles Casey SusanCasey Susan
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedIRJET Journal
 
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...Willy Marroquin (WillyDevNET)
 
RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...
RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...
RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...Amanuel Alemayehu
 
Knowledge distillation deeplab
Knowledge distillation deeplabKnowledge distillation deeplab
Knowledge distillation deeplabFrozen Paradise
 
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...IJCNCJournal
 
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...IJCNCJournal
 
A Study on MDE Approaches for Engineering Wireless Sensor Networks
A Study on MDE Approaches  for Engineering Wireless Sensor Networks A Study on MDE Approaches  for Engineering Wireless Sensor Networks
A Study on MDE Approaches for Engineering Wireless Sensor Networks Ivano Malavolta
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ijnlc
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...kevig
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...kevig
 
An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...
An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...
An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...IJCI JOURNAL
 
MODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSE
MODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSEMODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSE
MODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSEijcsit
 
Beginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptxBeginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptxIshaq Khan
 

Similar to PR12-179 M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention (20)

Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
 
Use CNN for Sequence Modeling
Use CNN for Sequence ModelingUse CNN for Sequence Modeling
Use CNN for Sequence Modeling
 
RoBERTa: language modelling in building Indonesian question-answering systems
RoBERTa: language modelling in building Indonesian question-answering systemsRoBERTa: language modelling in building Indonesian question-answering systems
RoBERTa: language modelling in building Indonesian question-answering systems
 
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
 
Multimedia And Contiguity Principles Casey Susan
Multimedia And Contiguity Principles Casey SusanMultimedia And Contiguity Principles Casey Susan
Multimedia And Contiguity Principles Casey Susan
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing Impaired
 
Cmpe 255 Short Story Assignment
Cmpe 255 Short Story AssignmentCmpe 255 Short Story Assignment
Cmpe 255 Short Story Assignment
 
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
 
RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...
RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...
RuCORD: Rule-based Composite Operation Recovering and Detection to Support Co...
 
Knowledge distillation deeplab
Knowledge distillation deeplabKnowledge distillation deeplab
Knowledge distillation deeplab
 
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
 
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...
 
A Study on MDE Approaches for Engineering Wireless Sensor Networks
A Study on MDE Approaches  for Engineering Wireless Sensor Networks A Study on MDE Approaches  for Engineering Wireless Sensor Networks
A Study on MDE Approaches for Engineering Wireless Sensor Networks
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
 
An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...
An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...
An Ensemble Approach To Improve Homomorphic Encrypted Data Classification Per...
 
MODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSE
MODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSEMODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSE
MODIGEN: MODEL-DRIVEN GENERATION OF GRAPHICAL EDITORS IN ECLIPSE
 
Beginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptxBeginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptx
 

More from Taesu Kim

PR12-193 NISP: Pruning Networks using Neural Importance Score Propagation
PR12-193 NISP: Pruning Networks using Neural Importance Score PropagationPR12-193 NISP: Pruning Networks using Neural Importance Score Propagation
PR12-193 NISP: Pruning Networks using Neural Importance Score PropagationTaesu Kim
 
PR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
PR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head ModelsPR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
PR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head ModelsTaesu Kim
 
PR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
PR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual MetricPR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
PR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual MetricTaesu Kim
 
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networksPR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networksTaesu Kim
 
Issues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsIssues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsTaesu Kim
 
PR-043: HyperNetworks
PR-043: HyperNetworksPR-043: HyperNetworks
PR-043: HyperNetworksTaesu Kim
 

More from Taesu Kim (6)

PR12-193 NISP: Pruning Networks using Neural Importance Score Propagation
PR12-193 NISP: Pruning Networks using Neural Importance Score PropagationPR12-193 NISP: Pruning Networks using Neural Importance Score Propagation
PR12-193 NISP: Pruning Networks using Neural Importance Score Propagation
 
PR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
PR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head ModelsPR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
PR12-165 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
 
PR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
PR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual MetricPR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
PR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
 
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networksPR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
 
Issues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsIssues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applications
 
PR-043: HyperNetworks
PR-043: HyperNetworksPR-043: HyperNetworks
PR-043: HyperNetworks
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

PR12-179 M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

  • 1. PR-12 presentation M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention Authors: Shuang Ma, Daniel McDuff, Yale Song Presented by Taesu Kim arXiv:1907.04378v1 [cs.CV] 9 Jul 2019
  • 6. Universal Attention Module Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Refer to speech synthesis with style tokens
  • 7. Architecture for each module ❖ PreNet ❖ Speech and Image: 32x32 2D convolution ❖ Text: 128-D character embedding and fully connected layer [256, 128]
  • 9. Input-output pairs for train/test phase
  • 14. Ablation study ❖ It shows the contribution of our attention module in producing a highly structured latent space, which helps the model generate more realistic and diverse results. ❖ When we add the latent regression, the performance is further improved. ❖ It shows that, the distance regularization plays an important role in controlling the learned latent code. ❖ It illustrates that the attention mechanism does help in producing better results.
  • 15. Conclusion ❖ We present M3D-GAN for cross-modal cross-domain translation, which consists of modality-specific subnets with a universal attention module that learns to encode modality/domain information in a unified way. ❖ We show how the same architecture can be applied to a wide variety of tasks. ❖ The universal attention module we propose can learn a highly structured latent space by means of information bottleneck. ❖ Leveraging the same architecture for different problems is advantageous as it allows us to develop shared representations in the abstracted unified space.