May internship challenge: Font Generator

Internship Challenge Presentation:
Font Generator
Author: Tomohiro Inoue
Date: 2021/06/18

Overview
I created a system that can extract style information from a single sample
image and generate an entire font set with uniformity.
Input: 1 font image
Output: Image of the entire font set

Table of contents
• Background
• Method
• Results and discussion
• Future work

Background: Creating a font set
Creating a font set is very labor intensive.
The only way is for the creator to prepare the images one by one.
The more characters there are, the longer it takes to create.
If it takes 3 min./character:
• Alphabet: {A, B, C, …} → 52 classes = 2.6 h
• Kanji: {一, 二, 三, …} → 2136 classes = 106.8 h
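The estimate above is a simple multiplication, reproduced here as a small sketch. The function name `hours_to_draw` is hypothetical and the 3 minutes per character is the slide's own assumption.

```python
MINUTES_PER_CHARACTER = 3  # assumed drawing time per character, as on the slide


def hours_to_draw(num_classes: int,
                  minutes_per_character: float = MINUTES_PER_CHARACTER) -> float:
    """Total time in hours to draw every character class by hand."""
    return num_classes * minutes_per_character / 60


for name, num_classes in [("Alphabet", 52), ("Kanji", 2136)]:
    print(f"{name}: {num_classes} classes = {hours_to_draw(num_classes):.1f} h")
```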

Background: Style consistency of font sets
It is not necessary to look at every character to get a feel for a font set.
Figure: Font samples (Avenir Next, Baskerville, Didot), each showing the same placeholder text. You can get an idea of the overall atmosphere from just some of the letters.

Problem-setting: Generating an entire font set from a few samples
Consider the problem of generating an entire, stylistically unified font set from a small subset of its characters.
Figure: A font sample and the entire font set (A to Z) to be generated from it.

Hypothesis: How to generate an entire font set from a few samples?
It should be effective to extract the font style from a few samples and then generate each character from that style and its class information.
Figure: Extraction of $z_s$ from a font sample (A), followed by generation of the entire font set (A to Z).
• $z_s$: Style information. Mincho or Gothic, etc.
• $z_c$: Class information. A, B, … etc.


Overall system structure
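Figure: An image $x$ is passed to the Style Extractor $E$, which outputs a style vector $z_s$. Together with a class vector $z_c$, it forms the input $z$, from which the Generator $G$ produces the generated images $G(z)$.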

Generator: GlyphGAN (1/4)
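GlyphGAN (Hayashi et al., 2019) is a type of GAN that generates consistent and diverse font sets.
Figure: The input $z$, made up of a style vector $z_s$ and a class vector $z_c$, is fed to the Generator $G$ to produce generated images $G(z)$. The Discriminator $D$ receives generated images $G(z)$ and dataset images $x$, and outputs $y$.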

The generator and discriminator are CNN-based models.
Figure: Generator (top) and Discriminator (bottom), from Hayashi et al. (2019).

Input consists of style information and class information.
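Input: $z = (z_s, z_c)$, where $z_s$ is the Style Vector and $z_c$ is the Class Vector.
$z_c$: Class Vector (one-hot)
ex.
A → $[1, 0, \cdots, 0]^T$
B → $[0, 1, \cdots, 0]^T$
⋮
Z → $[0, 0, \cdots, 1]^T$
$z_s$: Style Vector
$z_s \in \mathbb{R}^n$, $z^s_i \sim U(-1, 1)$
ex.
$[0.1, -0.7, \cdots, 0.5] \in \mathbb{R}^{100}$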
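The input described above can be sketched with NumPy as follows. Joining the two vectors by concatenation is an assumption about the layout, and the helper names (`class_vector`, `sample_style_vector`, `make_input`) are hypothetical. Reusing one style vector across all 26 class vectors gives the inputs for one font set.

```python
import string

import numpy as np

CLASSES = string.ascii_uppercase  # 26 classes, A to Z
STYLE_DIM = 100                   # n = 100, as in the example above


def class_vector(char: str) -> np.ndarray:
    """One-hot class vector z_c for a single uppercase letter."""
    z_c = np.zeros(len(CLASSES))
    z_c[CLASSES.index(char)] = 1.0
    return z_c


def sample_style_vector(rng: np.random.Generator, n: int = STYLE_DIM) -> np.ndarray:
    """Style vector z_s with each element drawn from U(-1, 1)."""
    return rng.uniform(-1.0, 1.0, size=n)


def make_input(z_s: np.ndarray, z_c: np.ndarray) -> np.ndarray:
    """Generator input z: style and class vectors joined end to end (assumed layout)."""
    return np.concatenate([z_s, z_c])


rng = np.random.default_rng(0)
z_s = sample_style_vector(rng)
# The same style vector is paired with every class vector.
font_set_inputs = np.stack([make_input(z_s, class_vector(c)) for c in CLASSES])
print(font_set_inputs.shape)  # (26, 126)
```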

Training is stabilized by using the WGAN-GP loss function.
Figure: The training procedure of WGAN-GP
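For reference, the WGAN-GP objective in its usual form is sketched below. The symbols $\lambda$ (penalty weight), $\hat{x}$ (a random interpolation between a dataset image and a generated image), and $\epsilon$ do not appear on the slides and are named here for illustration only; the values actually used in this work are not stated.

```latex
\begin{align*}
L_D &= \mathbb{E}_{z}\bigl[D(G(z))\bigr] - \mathbb{E}_{x}\bigl[D(x)\bigr]
      + \lambda\, \mathbb{E}_{\hat{x}}\Bigl[\bigl(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\bigr)^2\Bigr], \\
L_G &= -\,\mathbb{E}_{z}\bigl[D(G(z))\bigr], \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z), \qquad \epsilon \sim U(0, 1).
\end{align*}
```

Under this formulation, the quantity $\mathbb{E}_{x}[D(x)] - \mathbb{E}_{z}[D(G(z))]$ serves as the estimate of the Wasserstein distance between dataset and generated images.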

Extractor: CNN-based model (1/2)
Takes a single sample image as input and outputs a style vector.
Figure: The structure of the style extractor. An image $x$ is passed to the Style Extractor $E$, which outputs a style vector $z_s$.
The structure is the same as GlyphGAN’s Discriminator except for the last layer.

Extractor: CNN-based model (2/2)
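The training dataset for the extractor is created with the trained generator.
Figure: Inputs $z$ are passed through the generator to produce images $G(z)$, which form the dataset for training. The Style Extractor $E$ takes an image $x$ and outputs a style vector $z_s$.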
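A minimal sketch of this dataset construction is shown below: each generated image is paired with the style vector that produced it, so the style vector can serve as the regression target. `toy_generator` is a hypothetical stand-in for the trained GlyphGAN generator (a fixed random projection, not a real model), and the 64×64 image size is an assumption.

```python
import numpy as np

STYLE_DIM, NUM_CLASSES, IMAGE_SIZE = 100, 26, 64  # image size is an assumption


def toy_generator(z: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained generator G."""
    return np.tanh(z @ weights).reshape(-1, IMAGE_SIZE, IMAGE_SIZE)


def build_extractor_dataset(num_styles: int, rng: np.random.Generator):
    """Return (images, style targets) with 26 images per sampled style."""
    weights = rng.normal(scale=0.1,
                         size=(STYLE_DIM + NUM_CLASSES, IMAGE_SIZE * IMAGE_SIZE))
    one_hot = np.eye(NUM_CLASSES)
    images, targets = [], []
    for _ in range(num_styles):
        z_s = rng.uniform(-1.0, 1.0, size=STYLE_DIM)
        styles = np.tile(z_s, (NUM_CLASSES, 1))   # same style for every class
        z = np.hstack([styles, one_hot])          # shape (26, 126)
        images.append(toy_generator(z, weights))  # input of the extractor
        targets.append(styles)                    # target of the extractor
    return np.concatenate(images), np.concatenate(targets)


images, targets = build_extractor_dataset(num_styles=4, rng=np.random.default_rng(0))
print(images.shape, targets.shape)  # (104, 64, 64) (104, 100)
```

With `num_styles=10000` this would match the 26 classes × 10000 styles used for the extractor.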


Generator: Training dataset
Dataset: Alphabet Characters Fonts Dataset
Number of data: 26 classes × 6561 font types
Sample images in the dataset

Generator: Training
Training settings
• batch size: 1024
• epochs: 2500
• optimizer: Adam (lr=0.0002)
• criterion: WGAN-GP
Learning curve (Wasserstein distance)

Generator: Examples of generation
Generated font sets for each Style Vector.

Extractor: Training dataset
Dataset: Generated by GlyphGAN’s Generator
Number of data: 26 classes × 10000 styles

Extractor: Training
Training settings
• epochs: 1000
• criterion: MSE
Learning curve (loss)

Examples of image generation
Figure: Input (1 font image) → Style extraction & Generation → Output (image of the entire font set).

Evaluation 1: Legibility of generated images (1/3)
Train a CNN-based multi-class classification model.
Compare its accuracy on the dataset with its accuracy on the generated images.
Figure: The structure of the multi-font classifier
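The comparison comes down to computing the same top-1 accuracy on each set of images. A minimal sketch, assuming the classifier outputs one score per class; for generated images the label is the class vector used at generation time. The numbers below are made up for illustration and are not results from this work.

```python
import numpy as np


def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of images whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))


# Toy scores for 4 images and 3 classes (illustrative values only).
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.8, 0.1],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 1, 1, 2])
print(top1_accuracy(logits, labels))  # 0.75
```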

Number of data:
train: 26 classes × 6561 font types,
validation: 26 classes × 8429 font types
Training settings:
• epochs: 100
• criterion: cross entropy
Learning curves: accuracy (top) and loss (bottom)

Evaluation results
The classifier reaches 82.6% accuracy on the generated fonts, compared with 89.9% on the test dataset, so a certain level of readability has been achieved.
Dataset                               Accuracy
Training dataset (6561 font types)    97.0%
Test dataset (8429 font types)        89.9%
Generated fonts (10000 font types)    82.6%

Evaluation 2: Style extraction (1/3)
Calculate the average similarity (SSIM) between the fonts in the dataset and the
fonts generated by style extraction and generation.
Figure: Fonts in dataset → Style extraction & Generation → Generated fonts, followed by calculation of similarity (SSIM) between the two.

SSIM is a perception-based model that considers image degradation as
perceived change in structural information.
Figure: MSE vs. SSIM, from Wang and Bovik (2009).
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$
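The formula can be evaluated directly with NumPy, as in the sketch below. This is a simplified variant that computes the statistics once over the whole image, whereas SSIM is commonly averaged over local windows; which variant was used in this work is not stated. The constants $c_1 = (0.01L)^2$ and $c_2 = (0.03L)^2$, with $L$ the pixel value range, are the conventional choice and are an assumption here.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM computed once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # conventional stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


rng = np.random.default_rng(0)
glyph = rng.uniform(0.0, 1.0, size=(64, 64))  # random stand-in for a glyph image
noisy = np.clip(glyph + rng.normal(scale=0.2, size=glyph.shape), 0.0, 1.0)
print(global_ssim(glyph, glyph))  # close to 1 for identical images
print(global_ssim(glyph, noisy))  # lower for a degraded copy
```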

One character from each font set was randomly selected.
Evaluation results
Style extraction works to some extent, but not well enough.
Figure: Average similarity

Evaluation 3: Style consistency (1/2)
Calculate the average similarity (SSIM) between the font sets in the dataset and
the font sets generated by style extraction and generation.
Figure: One character (A) is sampled from a font set in the dataset (A to Z) and passed through style extraction & generation to produce a generated font set (A to Z). The similarity (SSIM) between the two font sets is then calculated.

Evaluation 3: Style consistency (2/2)
Evaluation results
The similarity between font sets is not high.
The low accuracy of the extractor may be a bottleneck.
Figure: Average similarity


Current problem 1: Accuracy of the extractor
Improvements could be made by using an SSIM loss between the original and generated images during training.
Figure: An image $x$ is passed to the Style Extractor $E$ to obtain the input $z$, from which the Generator $G$ produces generated images $G(z)$.
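One way such a loss could be written is sketched below. The slides do not give a formula, so this form (one minus SSIM, with the extracted style combined with the class vector $z_c$ of the input character) is an assumption made for illustration.

```latex
\[
L_E = 1 - \mathrm{SSIM}\bigl(x,\; G(z)\bigr), \qquad z = \bigl(E(x),\, z_c\bigr)
\]
```

Since SSIM equals 1 only for identical images, minimizing $L_E$ would push the regenerated image toward the original, rather than only matching style vectors as the MSE criterion does.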

Current problem 2: Inefficiency of extractor training
After the generator is trained, the corresponding extractor needs to be trained.
This could be improved by using models such as VAE or flow-based models.
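Figure: VAE (top) and flow-based model (bottom).
• VAE: an Encoder $E$ maps an image $x$ to a latent $z$, and a Decoder $D$ produces generated images $D(z)$.
• Flow-based model: a Flow $f$ maps an image $x$ to a latent $z$, and the Inverse $f^{-1}$ produces generated images $f^{-1}(z)$.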

Current problem 3: Small dataset
The relationship between the size of the dataset and the accuracy of generation needs to be investigated.
Figure: Results for the Hiragana dataset
Number of data: 84 classes × 50 font types

Application examples of Style Extractor + GlyphGAN System
If a large enough dataset can be prepared, applications that reduce the burden on creators can be considered.
e.g. A system to create assets in your own art style from a single sample.

Conclusion
What I made:
A system that combines Style Extractor and GlyphGAN to create an entire font
set from a single font image.
Level of achievement:
• A certain level of readability.
• The reproduction of style remains an issue.

References
• [1] Hayashi et al., “GlyphGAN: Style-Consistent Font Generation Based on Generative Adversarial Networks”, 2019.
• [2] Wang and Bovik, “Mean Squared Error: Love It or Leave It? A New Look at Signal Fidelity Measures”, 2009.
