Internship Challenge Presentation:
Font Generator
Author: Tomohiro Inoue
Date: 2021/06/18
Overview
I created a system that extracts style information from a single sample
image and generates an entire font set in a uniform style.
Input: 1 font image
Output: Image of the entire font set
Table of contents
• Background
• Method
• Results and discussion
• Future work
Background: Creating a font set
Creating a font set is very labor-intensive.
The only way is for the creator to prepare the images one by one.
The more characters there are, the longer it takes to create.
If it takes 3 min. per character:
• Alphabet: {A, B, C, …} → 52 classes = 2.6 h
• Kanji: {一, 二, 三, …} → 2136 classes = 106.8 h
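The estimates above follow from multiplying the class count by the per-character time. A minimal sketch of that arithmetic, where the function name is made up for illustration and the 3-minute rate is the one assumed on the slide:

```python
MINUTES_PER_CHARACTER = 3  # rate assumed on the slide


def hours_to_create(num_classes: int, minutes_per_char: float = MINUTES_PER_CHARACTER) -> float:
    """Total drawing time in hours for a font set with `num_classes` glyphs."""
    return num_classes * minutes_per_char / 60


alphabet_hours = hours_to_create(52)   # 52 classes, presumably upper- and lower-case letters
kanji_hours = hours_to_create(2136)    # class count quoted on the slide

print(f"Alphabet: {alphabet_hours:.1f} h")
print(f"Kanji: {kanji_hours:.1f} h")
```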
Background: Style consistency of font sets
It is not necessary to check every character to get a feel for a font set.
Figure: Font samples (the same "Lorem ipsum" passage set in Avenir Next, Baskerville, and Didot). You can get an idea of the overall atmosphere from just some of the letters.
Problem-setting: Generating an entire font set from a few samples
Consider the problem of generating an entire, unified font set from a subset
of the samples in that font set.
Figure: Font sample → Entire font set (ABCDEFG HIJKLMN OPQRSTU VWXYZ)
Hypothesis: How to generate an entire font set from a few samples?
It would be effective to extract the font style from a few samples and
generate each character from the font style and class information.
Figure: Font sample (A) → Extraction → $z^s$, combined with $z^c$ → Entire font set (ABCDEFG HIJKLMN OPQRSTU VWXYZ)
• $z^s$: Style information (Mincho, Gothic, etc.)
• $z^c$: Class information (A, B, …, etc.)
Overall system structure
Figure: Image $x$ → Style Extractor $E$ → Style Vector $z^s$; Input $z = (z^s, z^c)$ with Class Vector $z^c$ → Generator $G$ → Generated Images $G(z)$
Generator: GlyphGAN (1/4)
GlyphGAN (Hayashi et al., 2019) is a type of GAN that generates
consistent and diverse font sets.
Figure: Input $z = (z^s, z^c)$ (Style Vector, Class Vector) → Generator $G$ → Generated Images $G(z)$; Dataset images $x$ and Generated Images $G(z)$ → Discriminator $D$ → $y$
Generator: GlyphGAN (2/4)
The generator and discriminator are CNN-based models.
Figure: Generator (top) and Discriminator (bottom). Source: Hayashi et al., 2019.
Generator: GlyphGAN (3/4)
The input $z$ consists of style information and class information: $z = (z^s, z^c)$.
$z^c$: Class Vector (one-hot), e.g.
A → $[1, 0, \cdots, 0]^T$
B → $[0, 1, \cdots, 0]^T$
⋮
Z → $[0, 0, \cdots, 1]^T$
$z^s$: Style Vector, $z^s \in \mathbb{R}^n$, $z^s_i \sim U(-1, 1)$, e.g.
$[0.1, -0.7, \cdots, 0.5] \in \mathbb{R}^{100}$
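The input construction described above can be sketched in a few lines. This is only an illustration using the standard library: the helper names are hypothetical, and the order in which the style and class vectors are concatenated is an assumption, since the slide does not state it.

```python
import random

NUM_CLASSES = 26   # A to Z
STYLE_DIM = 100    # n = 100, as in the example on the slide


def one_hot(char: str, num_classes: int = NUM_CLASSES) -> list:
    """Class vector z^c: 1.0 at the index of `char` (A=0, ..., Z=25), 0.0 elsewhere."""
    index = ord(char) - ord("A")
    if not 0 <= index < num_classes:
        raise ValueError(f"unsupported character: {char!r}")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def sample_style(rng: random.Random, dim: int = STYLE_DIM) -> list:
    """Style vector z^s with each component drawn from U(-1, 1)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def make_input(z_s: list, char: str) -> list:
    """Generator input z: style vector followed by class vector (ordering is an assumption)."""
    return z_s + one_hot(char)


rng = random.Random(0)
z_s = sample_style(rng)
# Reusing one style vector for every class is what ties the 26 glyphs to a single style.
font_inputs = {c: make_input(z_s, c) for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
```

Each entry of `font_inputs` has 126 components here (100 style + 26 class); only the class part differs between characters.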
Generator: GlyphGAN (4/4)
Training is stabilized by adopting the loss function of WGAN-GP.
Figure: The training procedure of WGAN-GP
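For reference, the WGAN-GP objective as it is usually written in the original WGAN-GP paper. The slide does not spell it out, so the symbols $\tilde{x}$, $\hat{x}$, $\epsilon$, $\lambda$, $P_r$, and $P_g$ are assumptions borrowed from that standard formulation, with $\tilde{x} = G(z)$:

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \quad \epsilon \sim U(0, 1)

L_G = -\,\mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
```

The last term of $L_D$ is the gradient penalty: it pushes the norm of the discriminator's gradient toward 1 at points interpolated between real and generated images, which is what replaces weight clipping in the plain WGAN.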
Extractor: CNN-based model (1/2)
The extractor takes a single sample image as input and outputs a style vector.
Figure: Image $x$ → Style Extractor $E$ → Style Vector $z^s$ (Output)
Figure: The structure of the style extractor. The structure is the same as GlyphGAN’s Discriminator except for the last layer. Source: Hayashi et al., 2019.
Extractor: CNN-based model (2/2)
The training dataset for the extractor is created using the trained generator.
Figure: $z$ → $G(z)$ (Dataset for training); Image $x$ → Style Extractor $E$ → Style Vector $z^s$ (Output)
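A minimal sketch of how such training pairs might be assembled. `toy_generator` is a hypothetical stand-in for the trained GlyphGAN generator (it returns a fake 4-pixel "image"), and the pairing of each generated image with the style vector that produced it is my reading of the slide, not a confirmed detail of the implementation. The `mse` helper corresponds to the MSE criterion listed in the extractor's training settings.

```python
import random

STYLE_DIM = 100
NUM_CLASSES = 26


def toy_generator(z_s, class_index):
    """Hypothetical stand-in for the trained generator G; returns a fake 4-pixel image."""
    base = sum(z_s) / len(z_s)
    return [base + 0.01 * class_index * k for k in range(4)]


def build_extractor_dataset(num_styles, rng):
    """Pairs (image, z_s): the generated image is the input, the style vector is the target."""
    dataset = []
    for _ in range(num_styles):
        z_s = [rng.uniform(-1.0, 1.0) for _ in range(STYLE_DIM)]
        for class_index in range(NUM_CLASSES):
            image = toy_generator(z_s, class_index)
            dataset.append((image, z_s))
    return dataset


def mse(predicted, target):
    """Mean squared error between a predicted and a true style vector."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


dataset = build_extractor_dataset(num_styles=3, rng=random.Random(0))
print(len(dataset))  # 3 styles x 26 classes
```

Because the targets are the vectors that were fed to the generator, no manual labeling is needed; the cost is that the extractor only ever sees generated images during training.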
Generator: Training dataset
Dataset: Alphabet Characters Fonts Dataset
Number of data: 26 classes × 6561 font types
Figure: Sample images in the dataset
Generator: Training
Training settings
• batch size: 1024
• epochs: 2500
• optimizer: Adam (lr=0.0002)
• criterion: WGAN-GP
Figure: Learning curve (Wasserstein distance)
Generator: Examples of generation
Generated font sets for each Style Vector.
Extractor: Training dataset
Dataset: Generated by GlyphGAN’s Generator
Number of data: 26 classes × 10000 styles
Extractor: Training
Training settings
• batch size: 1024
• epochs: 1000
• optimizer: Adam (lr=0.0002)
• criterion: MSE
Figure: Learning curve (loss)
Examples of image generation
Figure (two examples): Input: 1 font image → Style extraction & Generation → Output: Image of the entire font set
Evaluation 1: Legibility of generated images (1/3)
Train a CNN-based multi-class character classifier.
Compare its accuracy on the dataset with its accuracy on the generated images.
Figure: The structure of the multi-font classifier. Source: Hayashi et al., 2019.
Evaluation 1: Legibility of generated images (2/3)
Number of data:
train: 26 classes × 6561 font types,
validation: 26 classes × 8429 font types
Training settings:
• batch size: 1024
• epochs: 100
• optimizer: Adam (lr=0.0002)
• criterion: cross entropy
Figure: Learning curves: accuracy (top) and loss (bottom)
Evaluation 1: Legibility of generated images (3/3)
Evaluation results
It can be confirmed that a certain level of readability has been achieved.
Accuracy:
• Training dataset (6561 font types): 97.0%
• Test dataset (8429 font types): 89.9%
• Generated fonts (10000 font types): 82.6%
Evaluation 2: Style extraction (1/3)
Calculate the average similarity (SSIM) between the fonts in the dataset and the
fonts generated by style extraction and generation.
Figure: Fonts in dataset → Style extraction & Generation → Generated fonts; Calculation of similarity (SSIM) between the two
Evaluation 2: Style extraction (2/3)
SSIM is a perception-based model that considers image degradation as
perceived change in structural information.
Figure: MSE vs. SSIM. Source: Wang and Bovik, 2009.
$\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$
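The formula can be evaluated directly from the pixel statistics. The sketch below computes it once over the whole image, which is a simplification: SSIM is normally computed over local windows and then averaged, and the slides do not say which variant or which constants were used. The constants $c_1 = (0.01 L)^2$ and $c_2 = (0.03 L)^2$, with $L$ the pixel value range, are the conventional defaults and are an assumption here, as is the function name.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM over whole images given as flat, equal-length sequences of pixel values."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population (1/n) variance and covariance; some implementations use 1/(n-1).
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


glyph = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]   # toy 8-pixel "image"
inverted = [1.0 - p for p in glyph]

print(ssim_global(glyph, glyph))      # identical images give the maximum score
print(ssim_global(glyph, inverted))   # inverted structure gives a negative score
```

An image compared with itself scores 1, and the score drops as the structure diverges, which is why the averages later in the deck are reported as similarities.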
Evaluation 2: Style extraction (3/3)
One character was randomly selected from each font set.
Evaluation results
Style extraction works to some extent, but not well enough.
Average similarity (SSIM):
• Training dataset (6561 font types): 67.2%
• Test dataset (8429 font types): 66.0%
Evaluation 3: Style consistency (1/2)
Calculate the average similarity (SSIM) between the font sets in the dataset and
the font sets generated by style extraction and generation.
Figure: Font set in dataset (ABCDEFG HIJKLMN OPQRSTU VWXYZ) → Sampling (A) → Style extraction & Generation → Generated font set (ABCDEFG HIJKLMN OPQRSTU VWXYZ); Calculation of similarity (SSIM) between the two font sets
Evaluation 3: Style consistency (2/2)
Evaluation results
The similarity between font sets is not high.
The low accuracy of the extractor may be a bottleneck.
Average similarity (SSIM):
• Training dataset (6561 font types): 52.4%
• Test dataset (8429 font types): 51.4%
Application demo
Current problem 1: Accuracy of the extractor
The extractor could be improved by adding, during training, an SSIM loss between
the original image and the image regenerated from its extracted style.
Figure: Image $x$ → Style Extractor $E$ → Input $z$ → Generator $G$ → Generated Images $G(z)$
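One way such a loss might be written, purely as an assumption about what this future work could look like. The weight $\alpha$ and the notation $\hat{z}$ are hypothetical; $z^s$ and $z^c$ are the style and class vectors that produced the training image $x$:

```latex
\hat{z} = \big(E(x),\, z^c\big), \qquad
L_E = \lVert E(x) - z^s \rVert_2^2 \;+\; \alpha \,\big(1 - \mathrm{SSIM}\big(x,\, G(\hat{z})\big)\big)
```

The first term is the existing MSE criterion on the style vector; the second penalizes the extractor when the regenerated image $G(\hat{z})$ is structurally different from the original, so the extractor is judged on the image it leads to rather than only on the vector it predicts.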
Current problem 2: Inefficiency of extractor training
After the generator is trained, the corresponding extractor needs to be trained.
This could be improved by using models such as VAE or flow-based models.
Figure: VAE (top) and Flow-based model (bottom)
• VAE: Image $x$ → Encoder $E$ → Latent $z$ → Decoder $D$ → Generated Images $D(z)$
• Flow: Image $x$ → Flow $f$ → Latent $z$ → Inverse $f^{-1}$ → Generated Images $f^{-1}(z)$
Current problem 3: Small dataset
The relationship between the size of the dataset and the accuracy of
generation needs to be investigated.
Figure: Results for Hiragana dataset
Number of data: 84 classes × 50 font types
Application examples of Style Extractor + GlyphGAN System
If a large enough dataset can be prepared, applications that reduce the burden on
creators can be considered.
e.g. A system to create assets in your own art style from a single sample.
Conclusion
What I made:
A system that combines Style Extractor and GlyphGAN to create an entire font
set from a single font image.
Level of achievement:
• A certain level of readability.
• The reproduction of style remains an issue.
References
• [1] Hayashi et al., “GlyphGAN: Style-Consistent Font Generation Based on Generative Adversarial Networks”, 2019.
• [2] Wang and Bovik, “Mean Squared Error: Love It or Leave It? A New Look at Signal Fidelity Measures”, 2009.

More Related Content

Similar to May internship challenge: Font Generator

Interactive Machine Learning Appendix
Interactive  Machine Learning AppendixInteractive  Machine Learning Appendix
Interactive Machine Learning Appendix
Zitao Liu
 
M sc thesis proposal v4
M sc thesis proposal v4M sc thesis proposal v4
M sc thesis proposal v4
Ashenafi Workie
 
Design Patterns - General Introduction
Design Patterns - General IntroductionDesign Patterns - General Introduction
Design Patterns - General Introduction
Asma CHERIF
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico
 
Computational Methods in Physics for Students
Computational Methods in Physics for StudentsComputational Methods in Physics for Students
Computational Methods in Physics for Students
MaheshPatil527151
 
Working with color and font
Working with color and fontWorking with color and font
Working with color and fontmyrajendra
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosa
Pharo
 
DiscoGAN
DiscoGANDiscoGAN
DiscoGAN
Il Gu Yi
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1
San Kim
 
Csmr06a.ppt
Csmr06a.pptCsmr06a.ppt
Artist Assistant AI(AAA)
Artist Assistant AI(AAA)Artist Assistant AI(AAA)
Artist Assistant AI(AAA)
Gunhee Lee
 
How to use transfer learning to bootstrap image classification and question a...
How to use transfer learning to bootstrap image classification and question a...How to use transfer learning to bootstrap image classification and question a...
How to use transfer learning to bootstrap image classification and question a...
Wee Hyong Tok
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]owenchambers11
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]owenchambers11
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]owenchambers11
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]owenchambers11
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]owenchambers11
 
Presentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python BasicsPresentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python Basics
Shibbir Ahmed
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
Joaquin Vanschoren
 

Similar to May internship challenge: Font Generator (20)

Interactive Machine Learning Appendix
Interactive  Machine Learning AppendixInteractive  Machine Learning Appendix
Interactive Machine Learning Appendix
 
M sc thesis proposal v4
M sc thesis proposal v4M sc thesis proposal v4
M sc thesis proposal v4
 
Design Patterns - General Introduction
Design Patterns - General IntroductionDesign Patterns - General Introduction
Design Patterns - General Introduction
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Computational Methods in Physics for Students
Computational Methods in Physics for StudentsComputational Methods in Physics for Students
Computational Methods in Physics for Students
 
Working with color and font
Working with color and fontWorking with color and font
Working with color and font
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosa
 
DiscoGAN
DiscoGANDiscoGAN
DiscoGAN
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1
 
Csmr06a.ppt
Csmr06a.pptCsmr06a.ppt
Csmr06a.ppt
 
Artist Assistant AI(AAA)
Artist Assistant AI(AAA)Artist Assistant AI(AAA)
Artist Assistant AI(AAA)
 
How to use transfer learning to bootstrap image classification and question a...
How to use transfer learning to bootstrap image classification and question a...How to use transfer learning to bootstrap image classification and question a...
How to use transfer learning to bootstrap image classification and question a...
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]
 
May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]May june 2010 scenario 4 [documentation]
May june 2010 scenario 4 [documentation]
 
Presentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python BasicsPresentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python Basics
 
Design Patterns
Design PatternsDesign Patterns
Design Patterns
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 

More from Ridge-i, Inc.

Unsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overviewUnsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overview
Ridge-i, Inc.
 
Continual Learning Introduction
Continual Learning IntroductionContinual Learning Introduction
Continual Learning Introduction
Ridge-i, Inc.
 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learning
Ridge-i, Inc.
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Ridge-i, Inc.
 
How to learn with non-reliable labels?
How to learn with non-reliable labels?How to learn with non-reliable labels?
How to learn with non-reliable labels?
Ridge-i, Inc.
 
How to learn with non-reliable labels? (Japanese version)
How to learn with non-reliable labels? (Japanese version)How to learn with non-reliable labels? (Japanese version)
How to learn with non-reliable labels? (Japanese version)
Ridge-i, Inc.
 
May internship challenge: User Authentication System only using image data: C...
May internship challenge: User Authentication System only using image data: C...May internship challenge: User Authentication System only using image data: C...
May internship challenge: User Authentication System only using image data: C...
Ridge-i, Inc.
 
May internship challenge: Estimating Distance between Two Balls App
May internship challenge: Estimating Distance between Two Balls AppMay internship challenge: Estimating Distance between Two Balls App
May internship challenge: Estimating Distance between Two Balls App
Ridge-i, Inc.
 

More from Ridge-i, Inc. (8)

Unsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overviewUnsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overview
 
Continual Learning Introduction
Continual Learning IntroductionContinual Learning Introduction
Continual Learning Introduction
 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learning
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
How to learn with non-reliable labels?
How to learn with non-reliable labels?How to learn with non-reliable labels?
How to learn with non-reliable labels?
 
How to learn with non-reliable labels? (Japanese version)
How to learn with non-reliable labels? (Japanese version)How to learn with non-reliable labels? (Japanese version)
How to learn with non-reliable labels? (Japanese version)
 
May internship challenge: User Authentication System only using image data: C...
May internship challenge: User Authentication System only using image data: C...May internship challenge: User Authentication System only using image data: C...
May internship challenge: User Authentication System only using image data: C...
 
May internship challenge: Estimating Distance between Two Balls App
May internship challenge: Estimating Distance between Two Balls AppMay internship challenge: Estimating Distance between Two Balls App
May internship challenge: Estimating Distance between Two Balls App
 

Recently uploaded

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

May internship challenge: Font Generator

  • 1. Internship Challenge Presentation: Font Generator Author: Tomohiro Inoue Date: 2021/06/18
  • 2. Overview I created a system that can extract style information from a single sample image and generate an entire font set with uniformity. 2 Output: Image of the entire font set Input: 1 font image
  • 3. Table of contents • Background • Method • Results and discussion • Future work 3
  • 4. Background: Creating a font set Creating a font set is very labor intensive. The only way is for the creator to prepare the images one by one. 4 The more characters there are, the longer it takes to create. If it takes 3 min. /character… Alphabet: {A, B, C, …} → 52 classes = 2.6 h Kanji: {一, 二, 三, …} → 2136 classes = 106.8 h
  • 5. Background: Style consistency of font sets It is not necessary to check all the fonts to get a feel for the font set. 5 Font samples. You can get an idea of the overall atmosphere from just some of the letters. Avenir Next Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Baskerville Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Didot Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
  • 6. Problem-setting: Generating an entire font set from a few samples Considering the problem of generating an entire unified font set from a subset of samples in the font set. 6 Font sample ABCDEFG HIJKLMN OPQRSTU VWXYZ Entire font set
  • 7. Hypothesis: How to generate an entire font set from a few samples? It would be effective to extract font styles from a few samples and generate them based on the font style and class information. 7 Font sample ABCDEFG HIJKLMN OPQRSTU VWXYZ A Entire font set 𝑧𝑠 Style information. Mincho or Gothic, etc. 𝑧𝑐 Class information. A, B, … etc. Extraction
  • 8. Table of contents • Background • Method • Results and discussion • Future work 8
  • 9. Overall system structure 9 Style Extractor 𝐸 𝑧𝑠 Style Vector 𝑥 Image 𝑧 𝑧𝑐 𝐺(𝑧) Generated Images Generator Input Class Vector 𝐺
  • 10. Generator: GlyphGAN (1/4) GlyphGAN (Hayashi et al., 2019) is a type of GAN that generates a consistent and diverse font sets. 10 𝑧 𝑧𝑠 𝑧𝑐 𝐺(𝑧) 𝑥 𝐷 Dataset Generated Images Generator Input Style Vector Class Vector Discriminator 𝑦 𝐺
  • 11. Generator: GlyphGAN (2/4) The generator and discriminator are CNN-based models. 11 Generator (top) and Discriminator (bottom) Photo by (Hayashi et al., 2019)
  • 12. Generator: GlyphGAN (3/4) Input consists of style information and class information. 12 𝑧 𝑧𝑠 𝑧𝑐 Input Style Vector Class Vector 𝑧𝑐 : Class Vector ex. A → [1,0, ⋯ , 0]𝑇 B → [0,1, ⋯ , 0]𝑇 ⋮ Z → [0,0, ⋯ , 1]𝑇 𝑧𝑠: Style Vector 𝑧𝑠 ∈ ℝ𝑛 , 𝑧𝑖 𝑠 ∼ 𝑈(−1,1) ex. [0.1, −0.7, ⋯ , 0.5] ∈ ℝ100
  • 13. Generator: GlyphGAN (4/4) Stable learning is achieved by introducing the loss function of WGAN-GP. The training of WGAN-GP 13
  • 14. Extractor: CNN-based model (1/2) Outputs a style vector with a single sample image as input. 14 Style Extractor 𝐸 𝑧 Output 𝑧𝑠 Style Vector 𝑥 Image The structure of the style extractor The structure is the same as GlyphGAN’s Discriminator except for the last layer. Photo by (Hayashi et al., 2019)
  • 15. Extractor: CNN-based model (2/2) Create a dataset using a trained generator. 15 𝑧 𝐺(𝑧) Dataset for training Style Extractor 𝐸 𝑧 Output 𝑧𝑠 Style Vector 𝑥 Image
  • 16. Table of contents • Background • Method • Results and discussion • Future work 16
  • 17. Generator: Training dataset. Dataset: Alphabet Characters Fonts Dataset. Number of data: 26 classes × 6561 font types. Figure: sample images in the dataset.
  • 18. Generator: Training. Training settings: batch size: 1024; epochs: 2500; optimizer: Adam (lr=0.0002); criterion: WGAN-GP. Figure: learning curve (Wasserstein distance).
  • 19. Generator: Examples of generation. Figure: generated font sets for each style vector.
  • 20. Extractor: Training dataset. Dataset: generated by GlyphGAN's Generator. Number of data: 26 classes × 10000 styles.
  • 21. Extractor: Training. Training settings: batch size: 1024; epochs: 1000; optimizer: Adam (lr=0.0002); criterion: MSE. Figure: learning curve (loss).
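The extractor's training data and criterion can be sketched as follows. This is an illustration under stated assumptions: `toy_generator` is a hypothetical stand-in for the trained GlyphGAN generator, the sketch assumes the regression target is the style vector $z_s$ that produced each image, and it assumes the generator input is the concatenation of $z_s$ and $z_c$. Only the dataset shape (26 classes per style) and the MSE criterion come from the slides.

```python
import random

STYLE_DIM = 100
NUM_CLASSES = 26


def toy_generator(z):
    """Hypothetical stand-in for the trained generator G: returns a 2x2 'image'."""
    style_mean = sum(z[:STYLE_DIM]) / STYLE_DIM
    class_index = z[STYLE_DIM:].index(1.0)
    return [[style_mean, class_index / NUM_CLASSES], [style_mean, 0.0]]


def build_extractor_dataset(generator, num_styles, rng):
    """Return (image, z_s) pairs: every class rendered in every sampled style."""
    pairs = []
    for _ in range(num_styles):
        z_s = [rng.uniform(-1.0, 1.0) for _ in range(STYLE_DIM)]
        for class_index in range(NUM_CLASSES):
            z_c = [0.0] * NUM_CLASSES
            z_c[class_index] = 1.0
            image = generator(z_s + z_c)   # concatenated input is an assumption
            pairs.append((image, z_s))     # target: the style that made the image
    return pairs


def mse(predicted, target):
    """Mean squared error between a predicted and a true style vector."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# The slides use 10000 styles; 3 keeps this sketch fast.
dataset = build_extractor_dataset(toy_generator, num_styles=3, rng=random.Random(0))
```

Because the targets come from the generator's own inputs, no labelled style data is needed. The trade-off is that the extractor only ever sees generated images during training, never real fonts.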
  • 22. Examples of image generation. Figure: Input: 1 font image → Style extraction & Generation → Output: image of the entire font set.
  • 23. Evaluation 1: Legibility of generated images (1/3) A CNN-based multi-class character classifier is trained, and its accuracy on the dataset is compared with its accuracy on the generated images. Figure: the structure of the multi-font classifier, from Hayashi et al., 2019.
  • 24. Evaluation 1: Legibility of generated images (2/3) Number of data: train: 26 classes × 6561 font types; validation: 26 classes × 8429 font types. Training settings: batch size: 1024; epochs: 100; optimizer: Adam (lr=0.0002); criterion: cross entropy. Figure: learning curves, accuracy (top) and loss (bottom).
  • 25. Evaluation 1: Legibility of generated images (3/3) Evaluation results (accuracy): Training dataset (6561 font types): 97.0%; Test dataset (8429 font types): 89.9%; Generated fonts (10000 font types): 82.6%. It can be confirmed that a certain level of readability has been achieved.
  • 26. Evaluation 2: Style extraction (1/3) Calculate the average similarity (SSIM) between the fonts in the dataset and the fonts generated by style extraction and generation. Diagram: fonts in dataset → Style extraction & Generation → generated fonts; the similarity (SSIM) is calculated between the two.
  • 27. Evaluation 2: Style extraction (2/3) SSIM is a perception-based model that considers image degradation as perceived change in structural information. $$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$ Figure: MSE vs. SSIM, from Wang and Bovik, 2009.
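The SSIM formula can be evaluated directly, as in the minimal sketch below. Several things here are assumptions, not details from the slides: statistics are computed once over the whole image, whereas common SSIM implementations average the score over local windows; population (1/N) variance is used; and the constants follow the conventional choice $c_1 = (0.01L)^2$, $c_2 = (0.03L)^2$ with $L$ the pixel value range. The function name is hypothetical.

```python
def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized images given as flat lists."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2   # conventional constant, assumed here
    c2 = (0.03 * data_range) ** 2   # conventional constant, assumed here
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


glyph = [0.0, 0.0, 1.0, 1.0]       # toy 2x2 image, flattened
inverted = [1.0, 1.0, 0.0, 0.0]    # same pixels with the structure reversed
same_score = ssim_global(glyph, glyph)
inverted_score = ssim_global(glyph, inverted)
```

An image compared with itself scores 1. The inverted image has the same mean and variance but negative covariance, so its score falls well below 1, which is the structural sensitivity the slide refers to.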
  • 28. Evaluation 2: Style extraction (3/3) One character from each font set was randomly selected. Evaluation results (average similarity): Training dataset (6561 font types): 67.2%; Test dataset (8429 font types): 66.0%. Style extraction works to some extent, but not well enough.
  • 29. Evaluation 3: Style consistency (1/2) Calculate the average similarity (SSIM) between the font sets in the dataset and the font sets generated by style extraction and generation. Diagram: font set in dataset (ABCDEFG HIJKLMN OPQRSTU VWXYZ) → Sampling (A) → Style extraction & Generation → generated font set (ABCDEFG HIJKLMN OPQRSTU VWXYZ); the similarity (SSIM) is calculated between the original and generated font sets.
  • 30. Evaluation 3: Style consistency (2/2) Evaluation results (average similarity): Training dataset (6561 font types): 52.4%; Test dataset (8429 font types): 51.4%. The similarity between font sets is not high. The low accuracy of the extractor may be a bottleneck.
  • 33. Current problem 1: Accuracy of the extractor. The extractor could be improved by using an SSIM loss between the original image and the image generated from its extracted style during training. Diagram: image $x$ → Style Extractor $E$ → generator input $z$ → Generator $G$ → generated images $G(z)$.
  • 34. Current problem 2: Inefficiency of extractor training. After the generator is trained, the corresponding extractor needs to be trained separately. This could be improved by using models such as VAEs or flow-based models, which learn the image-to-latent mapping together with generation. Figure: VAE (top): image $x$ → Encoder $E$ → latent $z$ → Decoder $D$ → generated images $D(z)$. Flow-based model (bottom): image $x$ → Flow $f$ → latent $z$ → Inverse $f^{-1}$ → generated images $f^{-1}(z)$.
  • 35. Current problem 3: Small dataset. The relationship between dataset size and the accuracy of generation needs to be investigated. Figure: results for the Hiragana dataset. Number of data: 84 classes × 50 font types.
  • 36. Application examples of Style Extractor + GlyphGAN System. If enough data can be prepared, applications that reduce the burden on creators can be considered, e.g. a system that creates assets in your own art style from a single sample.
  • 37. Conclusion. What I made: a system that combines Style Extractor and GlyphGAN to create an entire font set from a single font image. Level of achievement: a certain level of readability was achieved; the reproduction of style remains an issue.
  • 38. References. [1] Hayashi et al., "GlyphGAN: Style-Consistent Font Generation Based on Generative Adversarial Networks", 2019. [2] Wang and Bovik, "Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures", 2009.