A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions

Multimodal Language Understanding
for Carry and Place tasks
Aly Magassouba, Komei Sugiura and Hisashi Kawai
National Institute of Information and Communications Tech., Japan
Our target: service robots that understand ambiguous speech
Social Background
• Shortage of manpower that can physically
support people with disabilities
Challenge
• Understanding ambiguous instructions
from the linguistic and visual context in an
end-to-end approach
Ambiguity
• “Put away the sugar and milk bottle”
• Meaning: “Put the sugar on the kitchen
shelf and the milk in the fridge”
The difference between our approach and the literature is Generative
Adversarial Net (GAN) data augmentation in latent space
Related work:
• Dialog-based approach [Kollar10]
– Time consuming
• End-to-end approach [Hatori18]
– Grasping task/Large dataset
• LAC-GAN [Sugiura17]
– Single modality
Novelty:
– Multimodal spoken language
understanding with GAN data
augmentation
• Key technology
– GAN data augmentation in latent space
– Different from classic GAN [Goodfellow14] used for generation
[Figure: classic GAN with Generator and Discriminator (real/fake); examples from [Bousmalis17] or [Zhang17]]
Theoretical background of MultiModal Classifier GAN (MMC-GAN)
Cost function of Extractor
Cost function of Generator based on
Wasserstein method
Cost function of Discriminator
• Data augmentation in latent space makes learning more data-efficient [Sugiura17]
• Extractor was fully connected, not adapted to visual and multimodal inputs
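The three cost functions named above can be sketched in plain Python. This is only a minimal sketch under assumptions the slide does not confirm: the Extractor cost is taken to be cross-entropy over the target-area classes, and the Generator and Discriminator costs follow the standard Wasserstein (WGAN) form applied to latent features. All function names and inputs are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def extractor_loss(probs, labels):
    """Assumed Extractor cost: cross-entropy between predicted class
    probabilities (one list per sample) and integer class labels."""
    return -mean([math.log(p[y]) for p, y in zip(probs, labels)])


def discriminator_loss(d_real, d_fake):
    """Standard Wasserstein critic cost: E[D(fake)] - E[D(real)].
    Here 'real' scores are for Extractor latent features and 'fake'
    scores are for Generator latent features (an assumption)."""
    return mean(d_fake) - mean(d_real)


def generator_loss(d_fake):
    """Standard Wasserstein generator cost: -E[D(fake)]."""
    return -mean(d_fake)
```

Minimising `discriminator_loss` pushes scores for real latent features up and scores for generated ones down, while minimising `generator_loss` pushes the generated features toward higher scores.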
Structure of Extractor
Building Carry-and-Place Multimodal Dataset for validating our method
Input (a)
• Instruction: “Put the coke bottle on the table”
• Context: “the bottle has been grasped”
• Depth image
Output label
• A1 = Very likely target area
Input (b)
• Instruction: “Bring this towel to the kitchen shelf”
• Context: “the robot is holding the towel”
• Depth image
Output label
• A4 = Unlikely target area
Data set distribution
Label  Samples
A1     212
A2     432
A3     398
A4     240
Total  1282
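The class balance implied by these counts can be checked with a few lines of Python. This is a small sketch, not part of the original slides: it computes each label's share of the data and the accuracy a trivial classifier would reach by always predicting the most frequent label, which gives a floor for reading accuracy figures. The variable names are illustrative.

```python
# Sample counts per target-area label, as listed in the data set distribution.
counts = {"A1": 212, "A2": 432, "A3": 398, "A4": 240}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Label a constant "always predict the most frequent class" model would output.
majority_label = max(counts, key=counts.get)

for label, share in shares.items():
    print(f"{label}: {counts[label]:4d} samples ({share:.1%})")
print(f"Majority-class accuracy ({majority_label}): {shares[majority_label]:.1%}")
```

The counts sum to the stated total of 1282, and the largest class (A2) covers roughly a third of the samples.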
MMC-GAN is more accurate thanks to the data augmentation property

Method          GAN type  Instruction  Instruction+Context  Image  Instruction+Context+Image
CNN (baseline)  -         59.4         60.2                 61.1   82.2
MMC-GAN         GAN       57.5*        59.5*                58.1   85.3
MMC-GAN         CGAN      56.4*        56.7*                58.2   86.2
MMC-GAN         WGAN      61.8         62.7                 59.7   84.4

*Not all trials converge
Metric = test-set accuracy (%)
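The gap between each MMC-GAN variant and the CNN baseline can be read off the table with a short script. This is an illustrative sketch only: the numbers are copied from the table, the names `baseline`, `mmc_gan` and `gains` are hypothetical, and the starred entries (where not all trials converged) are treated as plain values.

```python
columns = ["Instruction", "Instruction+Context", "Image", "Instruction+Context+Image"]

# Test-set accuracy per input modality, copied from the results table.
baseline = [59.4, 60.2, 61.1, 82.2]
mmc_gan = {
    "GAN": [57.5, 59.5, 58.1, 85.3],   # first two entries: not all trials converged
    "CGAN": [56.4, 56.7, 58.2, 86.2],  # first two entries: not all trials converged
    "WGAN": [61.8, 62.7, 59.7, 84.4],
}


def gains(scores, reference):
    """Accuracy difference (in points) between a method and the reference."""
    return [round(s - r, 1) for s, r in zip(scores, reference)]


for gan_type, scores in mmc_gan.items():
    deltas = gains(scores, baseline)
    summary = ", ".join(f"{c}: {d:+.1f}" for c, d in zip(columns, deltas))
    print(f"MMC-GAN ({gan_type}) vs CNN baseline -> {summary}")
```

With all three inputs combined, every MMC-GAN variant is above the baseline, while the single-modality columns are mixed.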
• MMC-GAN outperforms classic DNN
• A multimodal approach is required to solve the carry-and-place task
• WGAN is more stable
Sample results: MMC-GAN emphasizes the relationship between
linguistic and visual features
[Figures: examples of correct and incorrect predictions; confusion matrix]
Summary
• Contribution
– Multimodal spoken language understanding with GAN data augmentation
• Method
– A GAN based on latent-space features that classifies target areas
from ambiguous instructions
• Results
– Our method outperforms the DNN baseline
– Multimodal inputs are required to solve carry-and-place tasks