TEXT TO IMAGE WITH GANS
Arthur Decloedt & Joppe Geluykens & Professor Moens
Context
What? - Why? - How?
Text to image
Text → Image (pixels)
Text to image (example)
Relevance
Interpretation
Natural-language commands
Approach
Generative model
Initial research questions
How does a neural network learn about the world it finds itself in?
● Effect on the output when a given attribute/relation changes in the input
● Visualization of learned features as a function of the number of layers
Can we improve the image representation?
● Visual Genome: contains extensive spatial relations in the text
Obstacles
1. Compute power
2. Data
3. Hyperparameters
Research questions (revised)
How well does an existing text-to-image model learn with limited resources?
● Effect on the output when a given attribute/relation changes in the input
● Visualization of learned features as a function of the constraints (number of layers, dimensions)
Can we improve the image representation?
● Visual Genome: contains extensive spatial relations in the text
Method
Deep Learning: neural networks
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and
organization in the brain. Psychological review, 65(6), 386.
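As an illustration of the perceptron cited above, here is a minimal sketch of the classic perceptron learning rule on a toy problem. The function names, the AND-gate data, the learning rate, and the epoch count are assumptions chosen for this example; none of them come from the slides.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs exceeds zero, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights by lr * error * input per sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy data (assumed for illustration): the logical AND of two binary inputs.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, x) for x, _ in AND_GATE]
print(predictions)
```

A single perceptron can only separate classes with a straight line, which is why deep networks stack many of them in layers.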
Deep Learning: neural networks
Training networks
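Networks of this kind are usually trained with gradient descent, with backpropagation supplying the gradients. The sketch below shows only the descent step, on a hand-picked two-parameter loss whose gradient is written out by hand. The loss function, starting point, learning rate, and step count are assumptions made for illustration.

```python
def loss(x, y):
    """Toy bowl-shaped loss with its minimum at (3, -1)."""
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def gradient(x, y):
    """Partial derivatives of the toy loss with respect to x and y."""
    return 2.0 * (x - 3.0), 2.0 * (y + 1.0)


def gradient_descent(start, lr=0.1, steps=100):
    """Repeatedly step against the gradient, in both dimensions at once."""
    x, y = start
    for _ in range(steps):
        grad_x, grad_y = gradient(x, y)
        x -= lr * grad_x
        y -= lr * grad_y
    return x, y


x_min, y_min = gradient_descent((0.0, 0.0))
print(x_min, y_min, loss(x_min, y_min))
```

In a real network the parameters number in the millions and the gradient comes from backpropagation rather than a closed-form expression, but the update rule is the same.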
Convolutional networks
https://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/
Convolutional networks: discriminative
Transposed Convolution: generative
Transposed Convolution (cont.)
https://github.com/vdumoulin/conv_arithmetic
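To make the upsampling behaviour of a transposed convolution concrete, here is a minimal one-dimensional sketch in plain Python. The function names and the example signal and kernel are assumptions for illustration. The size formula covers the common case without dilation or output padding.

```python
def transposed_output_size(in_size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution (no dilation, no output padding)."""
    return (in_size - 1) * stride - 2 * padding + kernel


def transposed_conv1d(signal, kernel, stride=1):
    """Scatter each input value, scaled by the kernel, into a longer output."""
    out = [0.0] * transposed_output_size(len(signal), len(kernel), stride)
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# A kernel of ones with stride 2 doubles the resolution by repeating each value.
upsampled = transposed_conv1d([1, 2, 3], [1, 1], stride=2)
print(upsampled)
```

Where a regular convolution gathers several input pixels into one output value, the transposed version spreads one input value over several output pixels, which is what lets a generator grow a small feature map into a full image.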
Text embedding
Learning Deep Representations of Fine-Grained Visual Descriptions,
https://arxiv.org/pdf/1605.05395.pdf
GAN
https://techcrunch.com/2017/06/20/gangogh/
GAN (cont.)
https://www.analyticsvidhya.com/blog/2017/06/introductory-generative-adversarial-networks-gans/
Objective
GENERATOR OBJECTIVE   DISCRIMINATOR LOSS
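The formulas on this slide did not survive extraction. As a rough sketch of what a generator objective and a discriminator loss look like for a text-conditioned GAN, the code below uses binary cross-entropy on scalar discriminator outputs. It assumes the discriminator scores three kinds of text-image pairs (real, generated, and mismatched) and that the three terms are summed without weights. The function names and that weighting are assumptions, not the authors' exact formulation.

```python
import math


def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability and one 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def generator_loss(d_fake):
    """The generator wants the discriminator to output 1 on generated images."""
    return bce(d_fake, 1.0)


def discriminator_loss(d_real, d_fake, d_mismatch):
    """Target 1 for real pairs, 0 for generated and for mismatched pairs."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0) + bce(d_mismatch, 0.0)


# Hypothetical discriminator outputs, picked only to exercise the functions.
print(generator_loss(0.1), generator_loss(0.9))
print(discriminator_loss(d_real=0.9, d_fake=0.1, d_mismatch=0.2))
```

The two losses pull in opposite directions on the generated images: a discriminator output near 1 is cheap for the generator and expensive for the discriminator.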
Evaluation
● Inception Score
● Distance metrics to the source distribution:
○ Fréchet Inception Distance
○ Sliced Wasserstein Distance
https://www.win.tue.nl/~wmeulema/research-overview.php
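The three metrics can be sketched in simplified form. In practice the Inception Score and FID are computed from the outputs and features of a pretrained Inception network, and SWD averages over many random projections. The code below shows only the underlying arithmetic on hand-made inputs: the score from given class probabilities, the Fréchet distance between two univariate Gaussians, and the Wasserstein distance between two equal-sized one-dimensional samples. All function names and inputs are assumptions for illustration.

```python
import math


def inception_score(class_probs):
    """exp of the mean KL divergence between each p(y|x) and the marginal p(y)."""
    n = len(class_probs)
    num_classes = len(class_probs[0])
    marginal = [sum(p[c] for p in class_probs) / n for c in range(num_classes)]
    mean_kl = 0.0
    for p in class_probs:
        mean_kl += sum(
            pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0.0
        )
    return math.exp(mean_kl / n)


def frechet_distance_1d(mean_a, std_a, mean_b, std_b):
    """Squared Fréchet distance between two univariate Gaussians."""
    return (mean_a - mean_b) ** 2 + (std_a - std_b) ** 2


def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Confident and diverse predictions score high; uniform predictions score 1.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
print(frechet_distance_1d(0.0, 1.0, 2.0, 1.0))
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

A higher Inception Score is better, while for the two distance metrics lower values mean the generated distribution sits closer to the source distribution.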
Baseline
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative adversarial text to image
synthesis. arXiv preprint arXiv:1605.05396.
StackGAN++
https://github.com/hanzhanggit/StackGAN-v2
Experiments
Dataset
CUB-200-2011
● 200 bird species, mostly American, with descriptions of them
● 11,788 images in total
○ Train: 8,855
○ Test: 2,933
Caltech-UCSD Birds-200-2011
Parameters
PARAMETER                      CURRENT EXPERIMENTS   FUTURE EXPERIMENTS
Batch size                     24                    ↑, ↓
Initial learning rate          0.0002                =
Noise dimension                100                   ↓
Text input vector dimension    128                   ↓
Tree depth                     1                     =, ↑
Maximum number of steps        300,000               =, ↓
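One way to keep such a parameter set and its variations together is a small immutable configuration object. The sketch below is an assumption about how this could be organised, not the authors' code. The field names are hypothetical, and the reduced values in `low_resource` are placeholders that only illustrate the "decrease" direction. Only the default values come from the slide.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters of the current experiments (field names are hypothetical)."""
    batch_size: int = 24
    initial_learning_rate: float = 0.0002
    noise_dim: int = 100
    text_embedding_dim: int = 128
    tree_depth: int = 1
    max_steps: int = 300_000


baseline = TrainingConfig()

# Placeholder values: a future low-resource run with smaller noise and text dimensions.
low_resource = replace(baseline, noise_dim=50, text_embedding_dim=64)

print(baseline)
print(low_resource)
```

Deriving each variant from the baseline with `replace` keeps untouched parameters identical across runs, which makes it easier to attribute a change in the results to the parameter that was varied.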
Results (0 iterations)
Results (13.3 × 10³ iterations)
Results (53 × 10³ iterations)
Results (83.6 × 10³ iterations)
Results (297.1 × 10³ iterations)
Evaluation results
DATASET: CUB       Our model   StackGAN++
Inception Score    4.16        4.04 ± 0.05*
FID                263.20      Not published
SWD                0.05        Not published
* used a non-standard model
METRIC ERROR EVALUATION
(in future experiments)
Future work
● Loss function
● More training
Better text-image correspondence
● Other datasets
● Spatial relations
Original research questions
https://xkcd.com/1838/
Q&A

Text-to-image synthesis with GANs

Editor's Notes

  1. FOCUS: sufficient depth. Pick a few slides where we really go deep. JOPPE
  2. JOPPE
  3. WHAT? Automatically generating an image that depicts the content of natural language; automatically translating text into the pixels of an image (not Google image search). A very recent research field, only now feasible thanks to powerful GPUs. JOPPE
  4. WHAT? Automatically generating an image that depicts the content of natural language; automatically translating text into the pixels of an image (not Google image search). A very recent research field, only now feasible thanks to powerful GPUs. JOPPE
  5. WHY? Interpreting the knowledge acquired by neural networks. Giving commands in natural language: a self-driving car can situate itself in visual space and understand text commands ("stop at the blue house"). Understanding how "common sense" comes about. Automatic book → film. JOPPE
  6. HOW? A generative model, implemented with a neural network: GAN. JOPPE
  7. JOPPE
  8. Obstacles: compute power, parallelization, datasets, variables and hyperparameters. JOPPE
  9. Research goal partly revised (visualization retained). Reproduce an existing text-to-image model. With limited resources, what is the result? What does the network learn under given constraints? JOPPE
  10. ARTHUR
  11. Consist of perceptrons organized in layers. Approximate all kinds of functions. A relatively old idea. GPU innovation → sharp increase in what is possible. Arthur
  12. Consist of perceptrons organized in layers. Approximate all kinds of functions. A relatively old idea. GPU innovation → sharp increase in what is possible. Arthur
  13. Gradient descent (in multiple dimensions!). Backpropagation. Arthur
  14. Filter over multiple pixels. Perceptrons take more (spatial) context into account. Arthur
  15. Filter over multiple pixels. Perceptrons take more (spatial) context into account. Arthur
  16. Upsampling. Generating features. Arthur
  17. Makes it easier to create structure. ARTHUR
  18. The model must be able to understand the input. It is pretrained on language, specifically for our dataset. Char-CNN-RNN structure. The RNN captures the sequential structure of English well. ARTHUR
  19. Two neural networks that learn against each other. The discriminator acts as the loss function. While the generator learns to generate, the discriminator learns to judge it. Arthur
  20. Arthur
  21. Optimization problem → define the quantities to optimize (L_G, L_D). Discriminator loss → loss function of the generator. Generator: is this accepted by the discriminator? L_BCE: classification performance of the discriminator on generated data; G wants this to always be "1" → minimize L_BCE. Discriminator: was the right judgment made? Indicates classification performance on real, generated, and mismatched text-image pairs. Minimize the discrepancy between (a) real data and a uniform target of 1, (b) generated data and a uniform target of 0, (c) mismatched pairs and a uniform target of 0. JOPPE
  22. IS: most widely used, explain. FID & SWD: estimate of the distance between two distributions. Arthur
  23. Single GAN. Arthur
  24. Multi-GAN model. Iterated generation. Iteratively increasing the resolution. Published 19 Oct 2017. ARTHUR
  25. JOPPE
  26. JOPPE
  27. Future experiments (low-resource setting)
  28. 0 JOPPE
  29. 13300 JOPPE
  30. 53000 JOPPE
  31. 83600 JOPPE
  32. 95000 JOPPE
  33. 297100 JOPPE
  34. Compared with the SOTA (on other datasets), these scores are considerably worse. But: we are working with limited resources. JOPPE
  35. Approach the SOTA (AttnGAN). JOPPE