Perception and Intelligence Laboratory
Seoul National University
Seminar - Understanding Neural Representation
Junho Cho
16/02/05
• Texture Synthesis Using Convolutional Neural Networks
• NIPS2015
• Understanding Deep Image Representations by Inverting Them
• CVPR2015
• A Neural Algorithm of Artistic Style
Perception and Intelligence Lab., Copyright © 2015 2
Introduction
Mainly focusing on
• Texture Synthesis Using Convolutional Neural Networks
• NIPS2015
• Novel method of texture generation.
+
A bit of
• Understanding Deep Image Representations by Inverting Them
• CVPR2015
• Reconstruct the input image from feature maps.
And
• A Neural Algorithm of Artistic Style
• Synthesize the content of a photo with the style of an artwork.
Perception and Intelligence Lab., Copyright © 2015 3
Two Preliminaries and Their Combination
• A picture of yours in the style of Van Gogh
Perception and Intelligence Lab., Copyright © 2015 4
AI painter
Texture Synthesis Using
Convolutional Neural Networks
-NIPS2015
Part 01.
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
• Infer a generating process from an example texture.
• Produce arbitrarily many new samples of that texture.
• Evaluation by human inspection: success if a human can’t tell the difference.
Perception and Intelligence Lab., Copyright © 2015 6
Visual texture synthesis
Perception and Intelligence Lab., Copyright © 2015 7
Results of this work
Two main approaches to finding a texture-generating process
1. Non-parametric texture models
• Generate a new texture by resampling pixels or whole patches of the original texture.
• Produce high-quality natural textures very efficiently.
• But they are mechanistic procedures that randomize a source texture without a perceptual model.
2. Parametric texture models
• Take statistical measurements of an image; the texture is defined by those measurements.
• Generate new samples of a texture with the same measurements as the original.
• A visual texture can be uniquely described by the Nth-order joint histograms of its pixels.
• B. Julesz. Visual Pattern Discrimination. IRE Transactions on Information Theory, 8(2), February 1962.
• Texture models inspired by the linear response properties of the mammalian early visual system.
• Statistical measurements on filter responses (e.g., Gabor filters) rather than raw pixels.
This work is a parametric texture model.
Perception and Intelligence Lab., Copyright © 2015 8
Introduction
Perception and Intelligence Lab., Copyright © 2015 9
Demo
The previous method fails on these textures; the current method succeeds.
Perception and Intelligence Lab., Copyright © 2015 10
Demo
• code
Perception and Intelligence Lab., Copyright © 2015 11
Demo
• Proposes a parametric texture model
• Not a model of the early visual system (link)
• Uses a CNN – a functional model for the entire ventral stream
• A texture model parameterized by spatially invariant representations,
built on the hierarchical processing architecture of a CNN
• Better qualitative results.
Perception and Intelligence Lab., Copyright © 2015 12
Contribution
"Convolutional Networks: Unleashing the Potential of Machine Learning for Robust
Perception Systems," a Presentation from Yann LeCun of Facebook and NYU
VGG-19 network
• 16 conv layers, 5 pooling layers.
• Fully connected layers are not used.
• Uses the network trained for object
recognition on ImageNet.
• A CNN trained for object recognition is
also capable of capturing texture.
• MAX → AVG pooling
• Better gradient flow
• Weight rescaling: the mean activation of each
filter over images and positions is equal to one.
Perception and Intelligence Lab., Copyright © 2015 13
CNN
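As a rough illustration (not the authors' code), the sketch below applies these two modifications to torchvision's pretrained VGG-19. The paper's weight rescaling relies on a specially normalized set of pretrained weights that torchvision does not ship, so that step is only noted in a comment.

```python
# Sketch, assuming PyTorch/torchvision: VGG-19 feature extractor with the
# paper's modifications (no fully connected layers, MAX -> AVG pooling).
# The paper also rescales the pretrained weights so each filter's mean
# activation over images and positions equals one; omitted here because
# the normalized weights are not part of torchvision.
import torch.nn as nn
from torchvision.models import vgg19

def texture_vgg19() -> nn.Sequential:
    features = vgg19(pretrained=True).features  # conv/pool layers only, no FC head
    layers = [
        nn.AvgPool2d(kernel_size=2, stride=2)   # MAX -> AVG: better gradient flow
        if isinstance(layer, nn.MaxPool2d) else layer
        for layer in features
    ]
    model = nn.Sequential(*layers)
    for p in model.parameters():
        p.requires_grad_(False)  # weights stay fixed; only the image is optimized
    return model.eval()
```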
• Compute spatial summary statistics on the feature responses to obtain a stationary description of the texture.
• Find a new image with the same stationary description by performing gradient descent.
• Discard spatial information in the feature maps by taking correlations between feature maps:
• the Gram matrix.
Perception and Intelligence Lab., Copyright © 2015 14
Texture model
Perception and Intelligence Lab., Copyright © 2015 15
Gram matrices as Texture feature
• Gram matrix: $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$
• $F^l \in \mathbb{R}^{N_l \times M_l}$: feature maps of layer $l$
$N_l$: number of feature maps
$M_l$: size of each feature map
$F^l_{jk}$: activation of the $j$-th filter at position $k$ in layer $l$
• In the example,
$l$: conv3_1
$M_l$: $56 \times 56$
$N_l$: 256
A set of Gram matrices $\{G^1, G^2, \dots, G^L\}$ for some layers $1, \dots, L$ in the network
in response to a given texture provides a description of the texture.
Only convolutional layers are used.
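A minimal sketch of the Gram-matrix computation for one layer (PyTorch assumed, not the authors' code): each feature map is flattened into a row of $F^l$, and $G^l = F^l (F^l)^T$ sums products of filter activations over all positions $k$, discarding spatial arrangement.

```python
import torch

def gram_matrix(feature_map: torch.Tensor) -> torch.Tensor:
    """feature_map: (N_l, H, W) activations of one layer for one image."""
    n_l = feature_map.size(0)
    F = feature_map.view(n_l, -1)  # (N_l, M_l) with M_l = H * W
    return F @ F.t()               # (N_l, N_l): G_ij = sum_k F_ik * F_jk
```

For the conv3_1 example above, 256 feature maps of size 56 × 56 give an $F$ of shape 256 × 3136 and a 256 × 256 Gram matrix.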
• Correlations of filter responses have been used as texture features.
• A visual texture can be uniquely described by the Nth-order joint histograms of its pixels.
• B. Julesz. Visual Pattern Discrimination. IRE Transactions on Information Theory, 8(2), February 1962.
• D. J. Heeger and J. R. Bergen. Pyramid-based Texture Analysis/Synthesis. In Proceedings of the 22nd Annual
Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’95, pages 229–238, New York, NY,
USA, 1995. ACM.
• Compared to previous methods:
• Texture features move from a
linear filter bank → the feature space of a deep CNN.
• Correlations between feature responses in each layer of the network.
• The texture model is agnostic to spatial information.
• To achieve this, compute correlations between the responses of feature maps.
Perception and Intelligence Lab., Copyright © 2015 16
Why does correlation define texture?
$x$: input image, $\hat{x}$: generated image
$G^l$, $\hat{G}^l$: their respective Gram matrices
To generate a new texture, use gradient
descent from a white-noise image to find
another image $\hat{x}$ that matches the
Gram-matrix representation of the original image, minimizing
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - \hat{G}^l_{ij}\right)^2, \qquad \mathcal{L}(x, \hat{x}) = \sum_{l} w_l E_l$$
Perception and Intelligence Lab., Copyright © 2015 17
Texture generation
To be clear, we are not training the CNN;
its weights are fixed.
We are optimizing (training) the image $\hat{x}$ so that it has
texture (Gram matrices) similar to that of $x$.
• The derivative of $E_l$ with respect to the activations in layer $l$ can be computed analytically:
$$\frac{\partial E_l}{\partial \hat{F}^l_{ij}} = \begin{cases} \dfrac{1}{N_l^2 M_l^2}\left[\left(\hat{F}^l\right)^{T}\left(\hat{G}^l - G^l\right)\right]_{ji} & \text{if } \hat{F}^l_{ij} > 0 \\ 0 & \text{if } \hat{F}^l_{ij} < 0 \end{cases}$$
Perception and Intelligence Lab., Copyright © 2015 18
Texture generation – Gradient descent
• The gradient of $E_l$ with respect
to the pixels $\hat{x}$ can be computed
using standard backpropagation.
• The gradient $\frac{\partial \mathcal{L}}{\partial \hat{x}}$ is the input to a
numerical optimization strategy:
• L-BFGS,
• suited to this high-dimensional
optimization problem.
• Uses the same forward-backward pass
as in CNN training.
• A very complex model,
• but computation is efficient
with GPUs.
Perception and Intelligence Lab., Copyright © 2015 19
Texture generation – Gradient descent
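Putting the pieces together, a hedged sketch of the synthesis loop under the assumptions of the earlier sketches (`texture_vgg19`, `gram_matrix`); the layer indices and weights `w` are illustrative, and the paper's $1/(4N_l^2M_l^2)$ normalization is folded into the weights here.

```python
import torch

def grams_of(model, image, layer_indices):
    """Forward pass collecting Gram matrices at the chosen layers."""
    grams, h = [], image
    for i, layer in enumerate(model):
        h = layer(h)
        if i in layer_indices:
            grams.append(gram_matrix(h.squeeze(0)))
    return grams

def synthesize(model, texture, layer_indices, w, steps=500):
    # Target Gram matrices are computed once; the CNN is never trained.
    targets = [g.detach() for g in grams_of(model, texture, layer_indices)]
    x = torch.randn_like(texture, requires_grad=True)  # start from white noise
    opt = torch.optim.LBFGS([x], max_iter=steps)

    def closure():
        opt.zero_grad()
        loss = sum(wl * ((g - t) ** 2).sum()
                   for wl, g, t in zip(w, grams_of(model, x, layer_indices), targets))
        loss.backward()  # dL/dx via standard backpropagation
        return loss

    opt.step(closure)  # L-BFGS consumes the gradient wrt the pixels
    return x.detach()
```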
Perception and Intelligence Lab., Copyright © 2015 20
Result
• Each labeled layer includes all layers below it.
• E.g., pool4 includes pool1, pool2, pool3, and pool4.
• Constraining only low layers
→ results similar to noise.
• Increasing the number of layers
→ increasing degrees of naturalness.
• Above ‘pool4’, no further improvement.
• The last column shows that only local spatial
information is preserved,
• due to the receptive field size of the deep CNN.
More results: link
• Yellow box: artifact at the boundary
• This model discards global spatial information.
• Possible reasons:
• Some features encode information at the image boundary.
• Zero-padding?
Perception and Intelligence Lab., Copyright © 2015 21
More analysis
A: The number of parameters can be reduced.
B: Experiment on CaffeNet:
• not better than VGG;
• some grid artifacts appear:
• stride?
• larger filter size?
C: A randomly weighted VGG does not work
- shows the importance of the ImageNet-
pretrained model.
Perception and Intelligence Lab., Copyright © 2015 22
More analysis
• Test how well texture features capture object context.
• The Gram-matrix representation still predicts object identity.
• Texture still carries high-level information.
• Texture does not necessarily preserve the global structure of objects.
• Might provide insight into how CNNs encode object identity.
Perception and Intelligence Lab., Copyright © 2015 23
Texture representation for object recognition
• A new parametric texture model
• Computationally more expensive, but based on a CNN:
• any progress in deep CNNs is transferable to this texture synthesis method.
• Computing Gram matrices, i.e., transforming the CNN representation into a
stationary feature space, increases performance.
• Another use of stationary features: SPP-net
• improved object recognition and detection.
• A texture model inspired by biological vision
(a neuroscience point of view):
• hierarchical architecture and basic computational properties
similar to real neural systems.
• Original and synthesized textures are nearly indistinguishable.
• A compelling candidate model for studying visual information processing in the
brain.
Perception and Intelligence Lab., Copyright © 2015 24
Discussion
Understanding Deep Image
Representations by Inverting Them
-CVPR2015
Part 02.
Aravindh Mahendran, Andrea Vedaldi (VGG group)
Perception and Intelligence Lab., Copyright © 2015 26
Reconstruction from feature maps
Gradient descent
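A minimal sketch of such an inversion, under stated assumptions: the paper minimizes a normalized feature-reconstruction error plus natural-image priors (an $\alpha$-norm and total variation) with momentum gradient descent; the sketch keeps only a total-variation term and uses Adam for brevity, and `model`, `layer_idx`, and `tv_weight` are illustrative.

```python
import torch

def invert(model, image, layer_idx, steps=300, lr=0.05, tv_weight=1e-4):
    def features(z):  # run the truncated CNN up to the chosen layer
        h = z
        for i, layer in enumerate(model):
            h = layer(h)
            if i == layer_idx:
                break
        return h

    target = features(image).detach()
    x = torch.randn_like(image, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Feature-reconstruction error, normalized by the target's magnitude
        feat = ((features(x) - target) ** 2).sum() / (target ** 2).sum()
        # Total-variation prior keeps the reconstruction piecewise smooth
        tv = ((x[..., 1:, :] - x[..., :-1, :]) ** 2).sum() \
           + ((x[..., :, 1:] - x[..., :, :-1]) ** 2).sum()
        (feat + tv_weight * tv).backward()
        opt.step()
    return x.detach()
```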
Perception and Intelligence Lab., Copyright © 2015 27
Reconstruction from each layer in the CNN
Perception and Intelligence Lab., Copyright © 2015 28
Receptive field
Deep layers still contain rich information
A Neural Algorithm of Artistic Style
Part 03.
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
• Fine art had been exclusive to humans:
• only humans were able to interplay between the content and style of an image;
• no artificial system had been capable of it so far.
• NeuralArt offers a path forward to an algorithmic understanding of how
humans create and perceive artistic imagery.
Perception and Intelligence Lab., Copyright © 2015 30
Introduction
Obtain content from
• Understanding Deep Image Representations by Inverting Them
Obtain a representation of the style from
• Texture Synthesis Using Convolutional Neural Networks
Mix two representations!
Perception and Intelligence Lab., Copyright © 2015 31
How?
• High-level content is captured in higher layers,
which do not constrain the exact pixel values in reconstruction.
• Obtain style using the representation originally
designed to capture texture information.
• This creates images that match the style of
a given image on an increasing
scale while discarding
information about the global
arrangement of the scene.
Perception and Intelligence Lab., Copyright © 2015 32
How?
Almost the same as the input at the pixel level. Still retains rich
information.
• The representations of content and style in a CNN are separable:
• both representations can be manipulated independently.
• Mix the content and style representations from two different source images.
• Images are synthesized by finding an image that simultaneously matches:
1. the content representation of the photo, and
2. the style representation of the artwork.
Thus, while the global arrangement of the original photo is preserved, the colors and local
structures come from the artwork.
Perception and Intelligence Lab., Copyright © 2015 33
Intuition
Perception and Intelligence Lab., Copyright © 2015 34
Method
$p$: original photo, $a$: original artwork
$\hat{x}$: the image that is generated
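A hedged sketch of the mixed objective, $\mathcal{L}_{total} = \alpha\,\mathcal{L}_{content} + \beta\,\mathcal{L}_{style}$, building on the earlier sketches; `features_at`, the layer choices, and the default $\alpha$, $\beta$ are illustrative assumptions, not the authors' code.

```python
import torch

def features_at(model, image, layer_idx):
    # Feature maps of one layer (assumed helper, as in the inversion sketch)
    h = image
    for i, layer in enumerate(model):
        h = layer(h)
        if i == layer_idx:
            break
    return h

def total_loss(model, photo, artwork, x, content_layer, style_layers,
               style_weights, alpha=1.0, beta=1e3):
    # Content: match raw feature maps of the photo at one higher layer
    p = features_at(model, photo, content_layer).detach()
    l_content = 0.5 * ((features_at(model, x, content_layer) - p) ** 2).sum()

    # Style: match Gram matrices of the artwork over several layers
    l_style = x.new_zeros(())
    for w, layer in zip(style_weights, style_layers):
        g_a = gram_matrix(features_at(model, artwork, layer).squeeze(0)).detach()
        g_x = gram_matrix(features_at(model, x, layer).squeeze(0))
        l_style = l_style + w * ((g_x - g_a) ** 2).sum()

    return alpha * l_content + beta * l_style
```

Optimizing $\hat{x}$ against this loss with L-BFGS, exactly as in the texture-synthesis loop, yields the stylized image; the ratio $\alpha/\beta$ controls the content/style trade-off shown on the results slide.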
• The gradient of $E_l$ with respect
to the pixels $\hat{x}$ can be computed
using standard backpropagation.
• The gradient $\frac{\partial \mathcal{L}}{\partial \hat{x}}$ is the input to a
numerical optimization strategy:
• L-BFGS,
• suited to this high-dimensional
optimization problem.
• Uses the same forward-backward pass
as in CNN training.
• A very complex model,
• but computation is efficient
with GPUs.
Perception and Intelligence Lab., Copyright © 2015 35
Previously on Texture synthesis….
Perception and Intelligence Lab., Copyright © 2015 36
Results
The deeper the layer, the larger the local image
structures captured by the style representation,
due to increasing receptive field sizes
and feature complexity along the hierarchy.
Increasing $\alpha/\beta$: more emphasis on content.
Thank you