Multimodal intelligence integrating diverse data with deep learning
Diagram labels: Computer Vision, Natural Language Processing, Multimodal, Machine Learning, Deep Learning
Robust/few-shot learning theory: ACCV’18, ICIP’19
ConvNets architectures: BMVC’13, ICIP’19
Fine-grained recognition: ICME’13, CLEF’13
Large-scale image tagging: CVPR’10, ECCV’10, ICPR’16
Medical image analysis: ISBI’18, CIKM’19, Neurocomputing’19
Visual aesthetics analysis: ACMMM’18
Scene text erasing: WACV’20
Visual relationship detection: ISM’17, ICIP’20
Word representation learning: IJCNLP’17, ICLR’18
Machine translation: MT’17, ACL’18, ACL’19, AAAI’20
Cross-lingual retrieval: EMNLP’15
Image/video caption generation: COLING’16, LREC’18 (example caption: "a cat is trying to eat the food")
Visual phrase grounding: LREC’20
Multimodal machine translation: CICLING’19
Unsupervised discourse parsing: SIGDIAL’18, TACL’20
Neural input method: NAACL’19
• Discriminative initialization of Convolutional Neural Networks (CNNs) [BMVC’13]
◦ Closed-form initialization using Fisher Discriminant Analysis
• Frequency-domain CNNs [ICIP’19, MMAsia’19]
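To make "closed-form" concrete: for two classes, the Fisher discriminant direction has the standard closed form w ∝ S_w^{-1}(m_1 - m_0), where S_w is the within-class scatter and m_0, m_1 are the class means. The sketch below computes it for toy 2-D data in plain Python. The toy data, the function names, and the two-class 2-D setting are assumptions for illustration only; how such directions are turned into CNN filter weights in the BMVC’13 work is not shown here.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, m):
    """2x2 scatter matrix: sum of outer products (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Unit vector along S_w^{-1} (m1 - m0) for two classes of 2-D points."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter is the sum of the per-class scatters.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    # Explicit inverse of a 2x2 matrix (assumes it is non-singular).
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]


# Toy data: two square clusters separated along the x axis.
class0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
class1 = [(3, 0), (4, 0), (3, 1), (4, 1)]
w = fisher_direction(class0, class1)
```

No iterative optimization is involved: the direction comes directly from class statistics, which is what makes this kind of initialization closed-form.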
• Control the number of output words using a recurrent neural network
Jin et al., "Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging", In Proc. ICPR, 2016.
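One way to read "control the number of output words" is that a recurrent decoder emits tags one at a time and decides for itself when to stop by emitting an end token, so each image gets as many tags as it needs. A minimal sketch of that loop follows. The function `toy_step`, the score dictionaries, the 0.5 threshold, and the `<EOS>` token are hypothetical stand-ins for a trained RNN, not the model of Jin et al.

```python
EOS = "<EOS>"


def toy_step(image_scores, emitted):
    """Hypothetical stand-in for one RNN step: return the next token.

    Picks the highest-scoring tag not yet emitted, and returns EOS once
    the best remaining score falls below 0.5 or no tags remain.
    """
    remaining = {t: s for t, s in image_scores.items() if t not in emitted}
    if not remaining:
        return EOS
    best = max(remaining, key=remaining.get)
    return best if remaining[best] >= 0.5 else EOS


def annotate(image_scores, max_len=10):
    """Emit tags step by step until the decoder produces EOS."""
    tags = []
    while len(tags) < max_len:
        token = toy_step(image_scores, tags)
        if token == EOS:
            break
        tags.append(token)
    return tags


two_tags = annotate({"cat": 0.9, "food": 0.8, "car": 0.1})
one_tag = annotate({"sky": 0.6})
no_tags = annotate({"road": 0.2})
```

The number of tags is not fixed in advance: it depends on when the stop token is produced for each input.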
• Car type identification
• Plant species identification
◦ ImageCLEF 2013 Plant Identification Challenge (1st place)
• Character recognition
◦ ICDAR Script Identification Challenge (3rd place)
Example car classes: Acura RL, Mitsubishi Lancer, Toyota Camry, Audi S4, Honda Accord, Mercedes-Benz C-Class
• Stochastically switch between the cross-entropy loss (CCE) and the mean absolute error loss (MAE)
Hataya et al., "LOL: Learning to Optimize Loss Switching under Label Noise", 2018.
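The switching idea can be sketched as follows: for each training example, one of the two losses is chosen at random. This is a minimal illustration in plain Python, assuming a fixed switching probability `p_mae` (a hypothetical parameter) and MAE averaged over classes; how the switching is optimized in the cited work is not reproduced here.

```python
import math
import random


def cce(probs, label):
    """Cross-entropy for one example: -log of the true-class probability."""
    return -math.log(probs[label])


def mae(probs, label):
    """Mean absolute error between predicted probabilities and the one-hot label."""
    return sum(
        abs(p - (1.0 if k == label else 0.0)) for k, p in enumerate(probs)
    ) / len(probs)


def switched_loss(probs, label, p_mae, rng):
    """Use MAE with probability p_mae, otherwise CCE (p_mae is an assumed knob)."""
    if rng.random() < p_mae:
        return mae(probs, label)
    return cce(probs, label)


rng = random.Random(0)
probs = [0.7, 0.2, 0.1]  # toy softmax output for a 3-class problem
losses = [switched_loss(probs, 0, p_mae=0.5, rng=rng) for _ in range(5)]
```

Each entry of `losses` is either the CCE value or the MAE value for the same prediction, depending on the random draw.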
• Some "easy" examples are classified correctly from the earliest stage of training
• "Hard" examples matter more
Kishida et al., "Empirical Study of Easy and Hard Examples in CNN Training", ICONIP 2019.
• Co-segmentation: extract the objects common to multiple images
Chen et al., "Semantic Aware Attention Based Deep Object Co-segmentation", In Proc. ACCV, 2018.
Han et al., "Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images", In Proc. CIKM, 2019.
Han et al., "Combining Noise-to-Image and Image-to-Image GANs: Brain MR Image Augmentation for Tumor Detection", IEEE Access, Vol. 7, pp. 156966-156977, 2019.
• Erasing text in general images [WACV’20]
• Erasing general objects [Lazarski, 2018]
https://www.youtube.com/watch?v=JvTvyOeAGbU
• Diversification of decoding [ACL’19]
• Resource-efficient MT
◦ Compression of word vectors (99% off!) [ICLR’18]
◦ Rapid decoding [ACL’18, AAAI’20]
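A rough way to see how a large reduction in word-vector size is possible: instead of storing one dense vector per word, store a few small integer codes per word and rebuild the vector as a sum of shared codebook vectors. The sketch below illustrates this idea and the resulting storage arithmetic. The vocabulary size, dimensions, number of codebooks, and all names are hypothetical, and this is not claimed to be the exact method or the exact numbers of the ICLR’18 paper.

```python
def reconstruct(codes, codebooks):
    """Rebuild a word vector by summing one entry from each codebook."""
    dim = len(codebooks[0][0])
    vec = [0.0] * dim
    for book, code in zip(codebooks, codes):
        for i in range(dim):
            vec[i] += book[code][i]
    return vec


def storage_bytes(vocab, dim, n_books, book_size):
    """Return (dense_bytes, compact_bytes), assuming float32 values."""
    dense = vocab * dim * 4
    bits_per_code = (book_size - 1).bit_length()
    # Per-word codes, rounded up to whole bytes, plus the shared codebooks.
    code_bytes = (vocab * n_books * bits_per_code + 7) // 8
    book_bytes = n_books * book_size * dim * 4
    return dense, code_bytes + book_bytes


# Tiny example: two codebooks with two 2-D vectors each.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 2.0]],
]
vector = reconstruct([1, 0], codebooks)

# Hypothetical sizes: 100k words, 300 dims, 32 codebooks of 16 vectors.
dense, compact = storage_bytes(100_000, 300, 32, 16)
reduction = 1 - compact / dense
```

With these hypothetical settings the dense table needs 120 MB while codes plus codebooks need about 2.2 MB, a reduction of roughly 98%. The reconstruction quality depends on how the codes and codebooks are learned, which this sketch does not cover.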
[Figure: outputs for one input, beam search vs. proposed method; label: Syntactic Diversity]
• Obtain syntactic word features
[Figure labels: permutation matrix → update steps]
• Incorporate quantum walks into graph representation learning
"Translation" from video to text!
Input (frame sequence) → context vector → Output (word sequence)
Example output: <BOS> a woman is cooking in the kitchen <EOS>
Example captions: "a woman is slicing some vegetables", "a cat is trying to eat the food", "a dog is swimming in the pool"
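The "context vector" in the diagram can be sketched as an attention-weighted average of per-frame features, recomputed at each decoding step. The following is a minimal dot-product attention sketch in plain Python. The feature values, the decoder state, and the function names are hypothetical, and the lab's actual captioning model may compute the context differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(frame_features, decoder_state):
    """Return (context, weights) using dot-product attention over frames."""
    scores = [
        sum(f * h for f, h in zip(frame, decoder_state))
        for frame in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, frame_features))
        for i in range(dim)
    ]
    return context, weights


# Toy input: two frames with 2-D features.
frames = [[1.0, 0.0], [0.0, 1.0]]
uniform_ctx, uniform_w = context_vector(frames, [0.0, 0.0])
focused_ctx, focused_w = context_vector(frames, [10.0, 0.0])
```

When the decoder state matches no frame in particular, the context is the plain average of the frames; when it aligns with the first frame, almost all attention weight moves to that frame.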
• Multimodal machine translation [CICLING’19]
◦ Improve translation with the help of vision
• Phrase localization [LREC’20]
◦ Identify the image region for a given phrase
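• Presentations at prestigious conferences/journals
◦ ACL, AAAI, WACV, TACL, ICIP (2020)
◦ ACL, CIKM, Neurocomputing, 3DV, ICIP x3 (2019)
◦ ACL, ICLR, ACCV, SIGDIAL, LREC (2018)
◦ IJCNLP, ICDAR (2017)
• Awards
◦ Annual Meeting of the Association for Natural Language Processing: Outstanding Paper Award, Young Researcher Encouragement Award (2020)
◦ CVIM Research Group Encouragement Award (2020)
◦ Dean's Award, Graduate School of Information Science and Technology (2020)
◦ Meeting on Image Recognition and Understanding (MIRU): Student Encouragement Award (2019)
◦ IEICE Technical Committee on Medical Imaging: Encouragement Award (2019)
◦ Annual Meeting of the Association for Natural Language Processing: Best Paper Award (2018)
◦ NMT@ACL Outstanding Paper Award (2017)
◦ Annual Conference of the Japanese Society for Artificial Intelligence: Outstanding Paper Award, Student Encouragement Award x2 (2017)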
• Faculty: 1 (Nakayama)
• PhD students: 10
• Master’s students: 12 (4-5 per year)
• Secretary: 1
• Monday: Group meeting (2-3 h)
◦ Short progress report by everyone, discussion, study session
◦ Mainly organized by PhD students
• Wednesday: Main meeting (2-3 h)
◦ Progress reports (3-4 students)
◦ Presentation practice, etc.
• Others
◦ One-on-one meetings
◦ Project meetings
• No other required attendance hours
• Workstation (2 GPUs) for each student
• Shared machines
◦ 4 GPUs x 4
◦ 8 GPUs x 2
• Cloud computers
◦ University cloud system
◦ ABCI

FY2020 Lab Introduction: Nakayama Lab, The University of Tokyo
