Evaluation of conditional images synthesis: generating a photorealistic image from a face sketch

INTRODUCTION APPLICATIONS
AGENDA CONCLUSION
RESULTS
IMPLEMENTATION
Candidate: Samantha Gallone
Supervisor: Prof. Andrea De Lorenzo

Agenda
What are GANs?
Ø Structure and functioning
Ø Conditional GANs and applications
Project implementation
Ø How was the dataset obtained?
Ø How are the networks structured?
Evaluation results
Ø How were the generated images tested?
Ø What can be said about their quality?
What’s next?
Ø Main limitations
Ø Three suggestions for future work

INTRODUCTION
What are Generative Adversarial Networks?
Ø Generative: learn a generative model
Ø Adversarial: trained in an adversarial setting
Ø Networks: use deep neural networks
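The adversarial setting can be summarised by the two-player minimax objective of Goodfellow et al. (2014). The symbols are the usual ones and do not appear on the slide: G is the generator, D the discriminator, p_data the data distribution and p_z the noise prior.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to tell real samples from generated ones, while G is trained to make that distinction as hard as possible.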

Conditional GANs
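In the standard conditional formulation (Mirza and Osindero, 2014), both networks additionally receive a condition y. Reading y as the input face sketch is an assumption made here to connect the formula to this project; the slide itself does not state the objective.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]
```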

APPLICATIONS
Possible applications
Ø Generation of digital characters for:
• video games,
• movies, and
• animations
Ø Generation of photorealistic renderings of suspects from sketches based on eyewitness descriptions
Ø Creation of fake identities

IMPLEMENTATION
Dataset preparation
Ø ArtLine + sketch simplification
Ø XDoG edge detector + sketch simplification
Network architecture
Ø StyleGAN2
Ø ReStyle + pixel2Style2pixel (pSp)

Dataset preparation - problem
Ø A very large dataset of image pairs (sketch + corresponding photo) is needed
Ø Datasets available online:
• CUHK Face Sketch FERET Database (CUFSF)
o 1,194 image pairs, each with a photo of a face and a sketch of it
• FFHQ (Flickr-Faces-HQ) dataset
o 70,000 face images
o no sketches

Dataset preparation – 1st approach
Ø ArtLine
Ø Learning to Simplify (LtS) / Mastering sketching
[Figure: original image next to the ArtLine, LtS, MSE+GAN, pencil1 and pencil2 outputs]

Dataset preparation – 2nd approach
Ø Extended Difference of Gaussians (XDoG)
Ø Mastering sketching
[Figure: original image next to the XDoG and MSE+GAN outputs]

StyleGAN2
Ø State-of-the-art deep learning generative model
Ø StyleGAN was introduced by NVIDIA in 2018 to produce realistic-looking images; StyleGAN2 is its revised successor
[Figure: (a) StyleGAN generator]

ReStyle
Ø Novel inversion scheme tasked with encoding real images into the extended 𝒲+ latent space of StyleGAN
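As a rough illustration of the iterative refinement idea behind ReStyle (Alaluf et al., 2021), the sketch below starts from an initial latent code and repeatedly adds an encoder-predicted residual, w_{t+1} = w_t + Δ_t. The linear `generator`, the `encoder` rule, the dimensions and the step size are toy stand-ins invented for this example; they are not the StyleGAN generator or the trained ReStyle encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM = 4, 16

# Hypothetical stand-in for a pretrained generator: a fixed linear map.
A = rng.normal(size=(IMAGE_DIM, LATENT_DIM))
# Step size chosen so that the toy updates below cannot diverge.
STEP = 1.0 / np.linalg.norm(A, ord=2) ** 2


def generator(w):
    """Toy generator: maps a latent code to a flattened 'image'."""
    return A @ w


def encoder(target, current_reconstruction):
    """Toy residual encoder: sees the target and the current
    reconstruction and predicts an offset for the latent code."""
    return STEP * A.T @ (target - current_reconstruction)


def invert(target, w_start, n_steps=5):
    """Iterative refinement: w_{t+1} = w_t + encoder(target, G(w_t))."""
    w = w_start.copy()
    errors = []
    for _ in range(n_steps):
        reconstruction = generator(w)
        errors.append(float(np.linalg.norm(target - reconstruction)))
        w = w + encoder(target, reconstruction)
    return w, errors


# The zero vector plays the role of the initial (average) latent code.
w_true = rng.normal(size=LATENT_DIM)
target = generator(w_true)
w_hat, errors = invert(target, np.zeros(LATENT_DIM))
print([round(e, 3) for e in errors])  # reconstruction error per step
```

In this toy setting the reconstruction error never increases from one step to the next, which mirrors the intent of the scheme: each pass corrects the previous estimate instead of predicting the latent code in one shot.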

pixel2Style2pixel
Ø Encoder network that directly generates a series of style vectors which are fed into a pretrained
StyleGAN generator

ReStyle – simplified encoder architecture
Ø Encoder architecture based on a variation of the pSp encoder
Ø All style features are derived from the final 16x16 feature map.

RESULTS
Generated images

Survey example
In your opinion, which image inspired this drawing?

Results – Correct vs incorrect responses

Results – Lowest % of correct responses

Results – Highest % of correct responses

Results – Further analysis
Ø The Spearman correlation coefficient was used to determine whether the time spent answering correlates with the percentage of correct responses
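For reference, the Spearman coefficient is the Pearson correlation computed on ranks instead of raw values. The sketch below implements it with NumPy only; the helper names are hypothetical and the two input lists are made-up placeholder numbers, not the survey data behind these slides.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Placeholder values for illustration only (NOT the survey results).
seconds_to_answer = [12.0, 30.5, 8.2, 45.0, 20.1, 15.3]
percent_correct = [55.0, 70.0, 60.0, 50.0, 65.0, 40.0]
print(spearman_rho(seconds_to_answer, percent_correct))
```

A value near +1 or -1 indicates a strong monotonic relationship, while a value near 0 indicates none; because only ranks are used, the measure is not sensitive to a few very slow respondents.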

CONCLUSION
Limitations
Ø Not able to generate images of all races equally well
Ø Generating images of children and young people is challenging
Ø Not able to capture some features, such as piercings, tattoos and freckles

Future work
Ø Address the current limitations
Ø Apply Stable Diffusion to generate photorealistic images from a face sketch
Ø Adapt the proposed model to other domains

THANK YOU FOR YOUR ATTENTION

Bibliography (I)
1) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014.
2) Yuval Alaluf, Or Patashnik, and Daniel Cohen-Or. ReStyle: A residual-based StyleGAN encoder via iterative refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021.
3) Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a StyleGAN encoder for image-to-image translation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021.
4) Edgar Simo-Serra, Satoshi Iizuka, and Hiroshi Ishikawa. Mastering Sketching: Adversarial Augmentation for Structured Prediction. ACM Transactions on Graphics (TOG), 37(1), 2018.
5) Edgar Simo-Serra, Satoshi Iizuka, Kazuma Sasaki, and Hiroshi Ishikawa. Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup. ACM Transactions on Graphics (SIGGRAPH), 35(4), 2016.
6) Holger Winnemöller, Jan Eric Kyprianidis, and Sven C. Olsen. XDoG: An extended difference-of-Gaussians compendium including advanced image stylization. Computers & Graphics, 36, 2012.

Bibliography (II)
7) NVIDIA. FFHQ dataset. https://github.com/NVlabs/ffhq-dataset.
8) Yu-Sheng Lin, Zhe-Yu Liu, Yu-An Chen, Yu-Siang Wang, Ya-Liang Chang, and Winston H. Hsu. xCos: An explainable cosine metric for face verification task. ACM Trans. Multimedia Comput. Commun. Appl., 17(3s), Nov. 2021.
9) Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
10) Omer Tov, Yuval Alaluf, Yotam Nitzan, Or Patashnik, and Daniel Cohen-Or. Designing an encoder for StyleGAN image manipulation. arXiv preprint arXiv:2102.02766, 2021.
11) H. J. Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. CosFace: Large margin cosine loss for deep face recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.

APPENDIX

Explainable Cosine Metric - xCos
Ø It is based on the insight that humans tend to compare
different facial features to determine whether two face
images belong to the same person.
Ø It is built using a grid-based feature extraction
approach, in which each image is divided into
multiple local regions.
Ø It uses the cosine similarity to compute the similarity
score
Ø It includes an attention mechanism that identifies the
specific facial features that contribute the most to the
similarity score
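The grid-based scoring described above can be sketched as an attention-weighted sum of per-cell cosine similarities. This is a simplified illustration, not the xCos implementation: the function names, the 7x7 grid, the 32-dimensional features and the uniform attention map are all assumptions, and in the real method both the features and the attention weights come from trained networks.

```python
import numpy as np


def grid_cosine_map(feat_a, feat_b, eps=1e-8):
    """Cosine similarity per grid cell for two (H, W, C) feature grids."""
    dot = np.sum(feat_a * feat_b, axis=-1)
    norms = np.linalg.norm(feat_a, axis=-1) * np.linalg.norm(feat_b, axis=-1)
    return dot / (norms + eps)


def xcos_like_score(feat_a, feat_b, attention):
    """Attention-weighted sum of the per-cell cosine similarities."""
    weights = attention / attention.sum()  # normalise weights to sum to 1
    return float(np.sum(weights * grid_cosine_map(feat_a, feat_b)))


# Random placeholder features standing in for two face images.
rng = np.random.default_rng(0)
face_a = rng.normal(size=(7, 7, 32))
face_b = rng.normal(size=(7, 7, 32))
uniform_attention = np.ones((7, 7))  # hypothetical: every cell counts equally

print(xcos_like_score(face_a, face_b, uniform_attention))
```

Inspecting `grid_cosine_map` and the attention weights side by side is what makes the score explainable: it shows which face regions agree and how much each region contributed.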

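Extended Difference of Gaussians (XDoG)
Ø Gaussian filter:
$$G_\sigma(x) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}$$
Ø Difference of two Gaussians with different $\sigma$:
$$D_{\sigma,k}(x) = G_\sigma(x) - G_{k\sigma}(x) \approx -(k - 1)\,\sigma^2\,\nabla^2 G$$
Ø XDoG:
$$D_{\sigma,k,\tau}(x) = G_\sigma(x) - \tau \cdot G_{k\sigma}(x)$$
$$T_{\epsilon,\varphi}(u) = \begin{cases} 1 & u \ge \epsilon \\ 1 + \tanh(\varphi \cdot (u - \epsilon)) & \text{otherwise} \end{cases}$$
$$T_{\epsilon,\varphi}(D_{\sigma,k,\tau} * I)$$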
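The formulas above translate fairly directly into code. The following is a minimal NumPy-only sketch for a greyscale image with values in [0, 1]; the function names, the kernel truncation at three standard deviations, the edge padding and all default parameter values are assumptions chosen for illustration, not the settings used to build the dataset.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Sampled, normalised 1-D Gaussian truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    offsets = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image, sigma):
    """Separable Gaussian blur (G_sigma * I) with edge replication."""
    kernel = gaussian_kernel_1d(sigma)
    radius = kernel.size // 2
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def xdog(image, sigma=1.0, k=1.6, tau=0.98, epsilon=0.01, phi=200.0):
    """Soft-thresholded difference of Gaussians: T(D_{sigma,k,tau} * I)."""
    dog = gaussian_blur(image, sigma) - tau * gaussian_blur(image, k * sigma)
    soft = 1.0 + np.tanh(phi * (dog - epsilon))
    return np.where(dog >= epsilon, 1.0, soft)


# Synthetic test image: dark left half, bright right half.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
sketch = xdog(step)
print(sketch.shape, float(sketch.min()), float(sketch.max()))
```

Convolution is linear, so blurring the image with each Gaussian and subtracting gives the same result as convolving with the difference kernel. The parameter phi controls how sharp the transition between white and dark strokes is.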

Learning to Simplify (LtS)
Ø Technique to simplify rough sketches
Ø It consists of a Fully Convolutional Network to simplify the image
Ø The authors trained it on pairs of rough and simplified sketches, with a weighted mean square error
criterion as the loss
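A weighted mean square error can be sketched generically as below. The function name and the hand-made weight map are hypothetical; how the weights are derived from the target sketch in the original LtS work is not reproduced here.

```python
import numpy as np


def weighted_mse(prediction, target, weights):
    """Mean of the per-pixel squared errors, each scaled by its weight."""
    return float(np.mean(weights * (prediction - target) ** 2))


# Toy 2x2 example with a hypothetical weight map that halves the
# contribution of the right-hand column.
prediction = np.array([[0.0, 1.0], [1.0, 0.0]])
target = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([[1.0, 0.5], [1.0, 0.5]])
print(weighted_mse(prediction, target, weights))
```

Per-pixel weights let the loss emphasise or de-emphasise regions of the sketch instead of treating every pixel equally.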

Mastering sketching
Ø Combines a fully convolutional network for sketch simplification with a discriminator network that
is able to distinguish real line drawings from those generated by the network
Ø It is trained as a variation of a conditional GAN in which a deterministic prediction is used instead of a
random input z
Ø For adversarial training, the prediction model S is trained together with the discriminator model D,
which is not conditioned on the input x.
S : x ↦ y = S(x)
D : y ↦ D(y) ∈ ℝ
