Kyonggi Univ. AI Lab.
CPGAN: CONTENT-PARSING GENERATIVE ADVERSARIAL NETWORKS FOR TEXT-TO-IMAGE SYNTHESIS
2021.1.18
KyuYeol Jung (정규열)
Artificial Intelligence Lab
Kyonggi University
Index
- Background
- CP-GAN
  - Coarse-to-fine Generative Framework
  - Memory-Attended Text Encoder
  - Fine-grained Conditional Discriminator
- Experiments
- Conclusion
Background
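- Characteristics of previously proposed text-to-image models
  - Most of them were architectural proposals for converting text into images.
  - This is quite difficult, because the model has to interpret across two different modalities (text and image).
- CP-GAN
  - Focuses on the content parsed from both the text and the synthesized image.
  - Uses a memory structure.
  - Tailors the conditional discriminator to model the fine-grained relations between words and image sub-regions.
Source code: https://github.com/dongdongdong666/CPGAN
The training code is not included (it appears unlikely to be released).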
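- Overall architecture
• 1: Aligns words with their various visual contexts.
• 2: Generates the image guided by the meaning of the text.
• 3: Checks the consistency between the sentence and the generated image.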
- As of this presentation (January 2021), it is among the text-to-image algorithms with the highest Inception Score.
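The Inception Score is computed from the class probabilities that a pretrained Inception classifier assigns to generated images: it is high when each image is classified confidently and the images cover many classes. A minimal sketch of the standard formula, assuming the classifier's softmax outputs are already available as plain lists (running the Inception network itself is omitted):

```python
import math
from typing import Sequence


def inception_score(class_probs: Sequence[Sequence[float]]) -> float:
    """IS = exp( mean over images x of KL( p(y|x) || p(y) ) ).

    class_probs[i] is the classifier's softmax output p(y|x_i) for generated image i.
    """
    n = len(class_probs)
    num_classes = len(class_probs[0])
    # Marginal label distribution p(y), averaged over all generated images.
    marginal = [sum(p[c] for p in class_probs) / n for c in range(num_classes)]
    kl_sum = 0.0
    for p in class_probs:
        # KL(p(y|x) || p(y)); terms with p(y|x) = 0 contribute nothing.
        kl_sum += sum(pc * math.log(pc / mc) for pc, mc in zip(p, marginal) if pc > 0)
    return math.exp(kl_sum / n)
```

Two perfectly confident predictions on two different classes give a score of about 2, while identical predictions for every image give about 1, the minimum.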
CP-GAN
- CP-GAN: Coarse-to-fine Generative Framework
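[Figure: CP-GAN and Attn-GAN architectures compared]
Elements added on top of Attn-GAN:
1. Residual connections are applied -> information flows more easily between the generators.
2. The discriminator is split into unconditional and conditional parts.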
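The residual idea in item 1 can be illustrated with a toy coarse-to-fine loop: each stage upsamples the previous stage's features and adds a learned correction instead of recomputing them from scratch. This is a sketch under stated assumptions: the 1-D lists, nearest-neighbour upsampling and the `refiners` callables are hypothetical stand-ins for the real convolutional feature maps and refinement blocks.

```python
from typing import Callable, List

Feature = List[float]  # hypothetical 1-D stand-in for a feature map


def upsample2x(h: Feature) -> Feature:
    """Nearest-neighbour 2x upsampling of a 1-D feature vector."""
    return [v for v in h for _ in range(2)]


def residual_stage(h: Feature, refine: Callable[[Feature], Feature]) -> Feature:
    """One generator stage: upsample, then add a learned correction (the residual)."""
    up = upsample2x(h)
    delta = refine(up)
    return [a + b for a, b in zip(up, delta)]


def coarse_to_fine(h0: Feature, refiners: List[Callable[[Feature], Feature]]) -> List[Feature]:
    """Run every stage and keep the output of each scale (one image per generator)."""
    outputs = [h0]
    h = h0
    for refine in refiners:
        h = residual_stage(h, refine)
        outputs.append(h)
    return outputs
```

With refiners that return all zeros, each stage reduces to plain upsampling, which is the appeal of the residual design: a later stage only has to learn what to change, and the coarse stage's information passes through unchanged by default.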
- CP-GAN: Coarse-to-fine Generative Framework
  - Generator
  - Discriminator
Notation:
𝐼: image generated by the generator
X: encoded textual description (the encoding scheme differs from the original Attn-GAN)
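The generator and discriminator objectives appeared as equations in the original slide. As an assumption, the sketch below uses the joint unconditional + conditional cross-entropy form popularised by StackGAN++ and Attn-GAN, which matches the unconditional/conditional split described above; any additional terms CP-GAN may add are not reproduced. Inputs are discriminator probabilities in (0, 1): the unconditional ones score 𝐼 alone, the conditional ones score 𝐼 together with X.

```python
import math
from typing import Sequence


def _nll_real(probs: Sequence[float]) -> float:
    """Mean of -log(p): small when the discriminator outputs are close to 1 ("real")."""
    return -sum(math.log(p) for p in probs) / len(probs)


def _nll_fake(probs: Sequence[float]) -> float:
    """Mean of -log(1 - p): small when the discriminator outputs are close to 0 ("fake")."""
    return -sum(math.log(1.0 - p) for p in probs) / len(probs)


def discriminator_loss(real_uncond: Sequence[float], fake_uncond: Sequence[float],
                       real_cond: Sequence[float], fake_cond: Sequence[float]) -> float:
    """Unconditional term D(I) plus conditional term D(I, X), each averaged over real/fake."""
    uncond = 0.5 * (_nll_real(real_uncond) + _nll_fake(fake_uncond))
    cond = 0.5 * (_nll_real(real_cond) + _nll_fake(fake_cond))
    return uncond + cond


def generator_loss(fake_uncond: Sequence[float], fake_cond: Sequence[float]) -> float:
    """The generator wants both D(I) and D(I, X) to call its images real."""
    return 0.5 * (_nll_real(fake_uncond) + _nll_real(fake_cond))
```

Keeping the two terms separate lets the unconditional part judge realism alone while the conditional part judges whether the image fits the text.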
- CP-GAN: Memory-Attended Text Encoder
  - Conventional encoding
    - Can attend only to the image and sentence currently being trained on.
  - Proposed method
    - Also takes previously seen images and sentences into account.
- CP-GAN: Memory-Attended Text Encoder
  - Memory Construction
    - Aligns each word with its visual contexts (parsing).
Visual feature:
m: the memory, built by selecting the visual features with the highest attention scores and then processing them.
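A toy version of the memory step for a single word: score each candidate visual feature against the word, keep the highest-scoring ones, and pool them into m. The candidates would be visual features from previously seen images, per the slide above. The dot-product attention, the `top_k` cutoff and the score-weighted average are assumptions standing in for the unspecified "processing" step, not the paper's exact procedure.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_word_memory(word_emb: Sequence[float],
                      region_feats: Sequence[Sequence[float]],
                      top_k: int = 2) -> List[float]:
    """Hypothetical memory entry m for one word.

    1. Attention score of every candidate visual feature w.r.t. the word.
    2. Keep the top_k features with the highest scores.
    3. Pool them with a score-weighted average.
    """
    scores = softmax([dot(word_emb, r) for r in region_feats])
    best = sorted(range(len(region_feats)), key=lambda i: scores[i], reverse=True)[:top_k]
    weight_sum = sum(scores[i] for i in best)
    dim = len(region_feats[0])
    return [sum(scores[i] * region_feats[i][d] for i in best) / weight_sum
            for d in range(dim)]
```

With `top_k=1` the memory is simply the single best-matching visual feature; larger values blend several visual contexts of the same word.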
- CP-GAN: Memory-Attended Text Encoder
  - Text Encoding with Memory
    - Encodes the text using the previously constructed memory m.
    - The word embeddings (e) are used as well.
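How the memory m and the word embedding e can both feed the text encoding, sketched minimally: each word's feature is the concatenation [e ; m], and the sentence feature is their mean. The concatenation and mean pooling are hypothetical simplifications standing in for the learned encoder, and the dictionaries `embeddings` and `memory` are illustrative inputs.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def encode_with_memory(tokens: Sequence[str],
                       embeddings: Dict[str, Vector],
                       memory: Dict[str, Vector]) -> Tuple[List[Vector], Vector]:
    """Word features = [e_w ; m_w]; sentence feature = mean of the word features.

    Concatenation and mean pooling stand in for the learned encoder.
    """
    word_feats = [embeddings[t] + memory[t] for t in tokens]  # list concat = [e ; m]
    dim = len(word_feats[0])
    sentence = [sum(w[d] for w in word_feats) / len(word_feats) for d in range(dim)]
    return word_feats, sentence
```

The word-level features are what a word-to-region attention would consume, while the sentence feature plays the role of a global condition.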
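- CP-GAN: Fine-grained Conditional Discriminator
  - Semantically aligns the input natural-language description with the synthesized image, at the level of words and image sub-regions.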
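One way to make word-to-sub-region matching concrete is a DAMSM-style score (the matching model used in Attn-GAN): each word attends over the image regions, and the score measures how well each word is explained by its attended region context. This is an illustrative assumption, simplified to a plain mean over words instead of DAMSM's log-sum-exp pooling; the actual CP-GAN discriminator is a learned network, and `gamma` is a hypothetical attention-sharpness parameter.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / max(den, 1e-12)  # guard against zero vectors


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def word_region_score(words: Sequence[Sequence[float]],
                      regions: Sequence[Sequence[float]],
                      gamma: float = 5.0) -> float:
    """Average relevance between each word and its attended region context.

    Higher means the image sub-regions explain the words better.
    """
    per_word = []
    for w in words:
        # Attention of this word over all image sub-regions.
        attn = softmax([gamma * cosine(w, r) for r in regions])
        # Region context vector: attention-weighted sum of region features.
        ctx = [sum(a * r[d] for a, r in zip(attn, regions)) for d in range(len(w))]
        per_word.append(cosine(w, ctx))
    return sum(per_word) / len(per_word)
```

A discriminator built on such a score can penalise an image whose regions match the sentence as a whole but miss individual words.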
Experiments
- Quantitative evaluation
CP-GAN outperforms the compared models on all of the reported evaluation metrics.
- Quantitative evaluation
It performs well even with a relatively lightweight network.
- Qualitative evaluation
- Qualitative evaluation
- Results from running the model myself
Input: "Several airplanes are parked on an airport runway."
Input: "The room is situated on the dark side of the house."
Conclusion
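- Parses both the text and the image in order to match them semantically.
- Modifies the text and image encoder parts of Attn-GAN.
- Aims to strengthen the correspondence between words and image sub-regions.
  - fine-grained conditional discriminator
- Personal opinion
  - Performance is considerably improved over previous models.
  - It is also relatively lightweight compared with previous models.
  - However, the quality of the generated images is still unsatisfying.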
