Kyonggi Univ. AI Lab.
CPGAN : CONTENT-PARSING GENERATIVE
ADVERSARIAL NETWORKS FOR TEXT-TO-IMAGE SYNTHESIS
2021.1.18
정규열
Artificial Intelligence Lab
Kyonggi University
Index
• Background
• CP-GAN
  • Coarse-to-fine Generative Framework
  • Memory-Attended Text Encoder
  • Fine-grained Conditional Discriminator
• Experiments
• Conclusion
Background
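• Characteristics of previously proposed text-to-image models
  • Most proposals were architectural: new structures for converting text into images.
  • This approach is difficult because text and image must each be interpreted in terms of the other modality.
• CP-GAN
  • Focuses on parsing the content of both the text and the synthesized image.
  • Uses a memory structure.
  • Customizes the conditional discriminator to model fine-grained relations between words and image sub-regions.
Source code: https://github.com/dongdongdong666/CPGAN
Training code is not included (it seems unlikely to be released).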
• Overall architecture
  • 1: Aligns words with diverse visual contexts.
  • 2: Generates the image from a semantic perspective.
  • 3: Checks the consistency between the sentence and the generated image.
• As of this presentation, CP-GAN is among the algorithms with a high Inception Score.
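For reference, the Inception Score mentioned above is commonly defined as IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the class distribution an Inception classifier predicts for a generated image x and p(y) is its average over all generated images. The following is a minimal sketch, assuming the per-image class probabilities are already available; the function name and the toy inputs are illustrative and do not come from the slides.

```python
import math

def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: list of rows, each row a probability distribution over classes.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for row in probs:
        # KL divergence for one image; terms with p = 0 contribute nothing.
        total_kl += sum(p * math.log(p / marginal[k])
                        for k, p in enumerate(row) if p > 0.0)
    return math.exp(total_kl / n)

# Toy inputs (made up): confident and diverse predictions score higher.
sharp = [[1.0, 0.0], [0.0, 1.0]]
flat = [[0.5, 0.5], [0.5, 0.5]]
print(inception_score(sharp), inception_score(flat))
```

With these toy inputs the confident, diverse predictions score about 2 (the number of classes), while the uninformative ones score 1, the minimum.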
CP-GAN
• CP-GAN: Coarse-to-fine Generative Framework
[Figure: CP-GAN and Attn-GAN architectures]
Elements added on top of Attn-GAN:
1. Residual connections are applied, which eases information transfer between generators.
2. The discriminator is split into unconditional and conditional parts.
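The two additions listed above can be sketched as follows. This is only an illustration under stated assumptions: the binary cross-entropy form, the equal weighting of the two discriminator terms, and the names `bce`, `discriminator_loss`, and `residual_stage` are hypothetical and are not taken from the slides or the released code.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # guards against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discriminator_loss(real_uncond, fake_uncond, real_cond, fake_cond):
    """Sum of an unconditional term (is the image real?) and a
    conditional term (does the image match the text?).

    Each argument is the discriminator's probability output for one sample.
    Equal weighting of the two terms is an assumption.
    """
    unconditional = bce(real_uncond, 1.0) + bce(fake_uncond, 0.0)
    conditional = bce(real_cond, 1.0) + bce(fake_cond, 0.0)
    return unconditional + conditional

def residual_stage(hidden, refine):
    """Add the refinement to the previous stage's hidden state (skip connection)."""
    return [h + r for h, r in zip(hidden, refine(hidden))]

loss = discriminator_loss(0.9, 0.1, 0.9, 0.1)
next_hidden = residual_stage([1.0, 2.0], lambda h: [0.5 * x for x in h])
print(loss, next_hidden)
```

The skip connection lets the later generator stage reuse the earlier stage's features directly, which is the "easier information transfer" the slide refers to.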
  • Generator
  • Discriminator
Notation
𝐼: image produced by the generator
X: textual description
The text encoding scheme differs from that of Attn-GAN.
• CP-GAN: Memory-Attended Text Encoder
  • Conventional encoding
    • Can attend only to the image and sentence currently being trained on.
  • Proposed method
    • Also takes previously seen images and sentences into account.
  • Memory construction
    • Aligns each word with its visual contexts (parsing).
Visual feature:
m: memory entry built by selecting the visual features with the highest attention scores and then processing them
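The memory construction step described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the dot-product attention score, the choice of a single best region per image, and averaging as the "processing" step are all assumptions, and `build_memory` is a hypothetical name.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def build_memory(word_vec, images):
    """Build one memory entry m for a word from several images.

    word_vec: embedding of the word (assumed to live in the visual feature space).
    images: list of images, each given as a list of region feature vectors.
    """
    selected = []
    for regions in images:
        # Attention score of each region with respect to the word.
        scores = [dot(word_vec, r) for r in regions]
        best = max(range(len(regions)), key=lambda i: scores[i])
        selected.append(regions[best])
    dim = len(word_vec)
    # Assumed "processing" step: average the selected features over images.
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

# Toy data (made up): two images with two regions each.
images = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.2, 0.1], [3.0, 1.0]],
]
m = build_memory([1.0, 0.0], images)
print(m)
```

Because the entry is aggregated over many images that mention the word, it carries visual context beyond the single image-sentence pair currently being trained on.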
  • Text encoding with memory
    • Encodes the text using the memory m constructed in the previous step.
    • The word embedding (e) is applied together with it.
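One way to combine the memory m and the word embedding e is sketched below. The slides only state that both are used; the concatenation followed by a single linear layer, the dictionary lookups, and the names `fuse` and `encode_sentence` are assumptions made for illustration.

```python
def fuse(memory_vec, embedding, weights, bias):
    """Fuse a word's memory vector m with its embedding e via one linear layer."""
    joint = memory_vec + embedding  # list concatenation: [m ; e]
    return [sum(w * x for w, x in zip(row, joint)) + b
            for row, b in zip(weights, bias)]

def encode_sentence(words, memory, embeddings, weights, bias):
    """Return one fused feature per word; a sequence model would consume these."""
    return [fuse(memory[w], embeddings[w], weights, bias) for w in words]

# Toy data (made up): 2-d memory, 1-d embedding, 2-d fused output.
memory = {"bird": [2.0, 0.5], "red": [0.0, 1.0]}
embeddings = {"bird": [1.0], "red": [-1.0]}
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
bias = [0.0, 0.0]
features = encode_sentence(["red", "bird"], memory, embeddings, weights, bias)
print(features)
```

In a full model the fused per-word features would then be passed to a recurrent or similar sequence encoder to produce word and sentence representations.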
• CP-GAN: Fine-grained Conditional Discriminator
  • Semantically matches the input natural-language description with the synthesized image.
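The idea of matching words against image sub-regions can be illustrated with a simple score. This sketch assumes cosine similarity and a best-region-per-word rule averaged over words; the actual discriminator's scoring function is not given in the slides, and `matching_score` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def matching_score(word_feats, region_feats):
    """Average, over words, of the best cosine similarity to any sub-region."""
    best_per_word = [max(cosine(w, r) for r in region_feats) for w in word_feats]
    return sum(best_per_word) / len(best_per_word)

# Toy features (made up): each word has a well-matching region in `aligned`.
words = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[2.0, 0.0], [0.0, 3.0]]
blurred = [[1.0, 1.0]]
print(matching_score(words, aligned), matching_score(words, blurred))
```

A conditional discriminator built on such a score rewards images in which every word of the description is grounded in some sub-region, rather than judging the image-sentence pair only as a whole.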
Experiments
• Quantitative evaluation
CP-GAN outperforms the compared models on all of the reported evaluation metrics.
CP-GAN also performs well with a comparatively lightweight network.
• Qualitative evaluation
실험
 직접 실행한 결과
Sever airplanes are parked
on an airport runway.
The room is situated on the dark side of the house.
Conclusion
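• The model parses both text and image in order to match them semantically.
  • It modifies the text and image encoders of Attn-GAN.
• It aims to strengthen the association between words and image sub-regions.
  • Fine-grained conditional discriminator
• Personal opinion
  • Performance has improved considerably over previous models.
  • The model is also relatively lightweight compared with previous models.
  • However, the quality of the generated images still leaves something to be desired.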
