Polynomial networks are a class of network designs that treat a network as a high-degree polynomial expansion of the input. Recently, polynomial networks have demonstrated state-of-the-art performance in a range of tasks. Although polynomial networks have appeared for decades in machine learning and complex systems, their role in modern deep learning is not widely acknowledged.
In this tutorial we intend to bridge this gap and draw parallels between modern deep learning approaches and polynomial networks. We share recent developments on the topic and explain the required tools.
5. Deep-learning architectures
K. He, X. Zhang, S. Ren, J. Sun. 'Deep Residual Learning for Image Recognition.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
6. Deep-learning architectures
J Hu, L Shen, G Sun. ’Squeeze-and-excitation networks.’ In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
X. Wang, R. Girshick, A. Gupta, K. He. ’Non-local Neural Networks.’ In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
11. Non-local neural network is a 3rd degree polynomial
13. Self-Attention is a 3rd degree polynomial
14. Learning with polynomials, an old idea
- Mapping units [Hinton; 1985], "dynamic mapping" [v.d. Malsburg; 1981].
- Binocular+motion energy models [Adelson, Bergen; 1985], [Ohzawa, DeAngelis, Freeman; 1990], [Fleet et al.; 1994].
- Sigma-pi neural unit [Mel, Koch; 1990].
- Higher-order Boltzmann Machines / higher-order neural networks [Sejnowski; 1986].
- Subspace SOM [Kohonen; 1996], topographic ICA [Hyvarinen, Hoyer; 2000], [Karklin, Lewicki; 2003].
- Bilinear models [Tenenbaum and Freeman; 2000], [Olshausen; 1994], [Grimes, Rao; 2005].
- Higher-order Restricted Boltzmann Machines (RBMs) [Memisevic and Hinton; 2007], [Ranzato et al.; 2010].
- Gating mechanisms; LSTM [Hochreiter, Schmidhuber; 1997], multiplicative RNN [Sutskever, Martens, Hinton; 2011].
15. Group Method of Data Handling (GMDH)
- One of the first approaches to the systematic design of nonlinear relationships.
- Generates Partial Descriptions of the data (PDs) with two input variables.
- Shortcoming: tends to produce an overly complex network.
A Ivakhnenko. 'Polynomial theory of complex systems.' IEEE Transactions on Systems, Man, and Cybernetics, 1971.
16. Mapping Units / Higher Order Boltzmann Machines
- Hinton et al. (1985) and Sutskever et al. (2011) argue that multiplications (mapping units) allow for better modeling of conjunctions.
- Higher-order Boltzmann Machines and higher-order RBMs utilize multiplication in factorized representations; e.g., bilinear models factorize style and content.
17. Pi-Sigma network (PSN)
A single hidden layer learns multiple affine transformations of the data and multiplies them to obtain the output:

  h_{ji} = \sum_{k} w_{kji} x_k + \theta_{ji},
  y_i = \sigma( \prod_{j} h_{ji} ).

Y Shin, J Ghosh. 'The pi-sigma network: an efficient higher-order neural network for pattern classification and function approximation.' International Joint Conference on Neural Networks, 1991.
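A minimal numpy sketch of a single pi-sigma unit may help fix ideas; the function name, shapes, and the sigmoid choice for \sigma are illustrative assumptions, not the paper's code.

```python
import numpy as np

def pi_sigma_forward(x, W, theta):
    """One pi-sigma unit: J affine 'sigma' summations, then a 'pi' product.

    Shapes are illustrative: x (d,), W (J, d), theta (J,).
    """
    h = W @ x + theta                         # h_j = sum_k w_kj x_k + theta_j
    return 1.0 / (1.0 + np.exp(-np.prod(h)))  # sigmoid of the product of the h_j

# Example: a degree-3 pi-sigma unit on a 5-dimensional input.
rng = np.random.default_rng(0)
print(pi_sigma_forward(rng.normal(size=5), rng.normal(size=(3, 5)), np.zeros(3)))
```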
18. Sigma-Pi-Sigma Neural Network (SPSNN)
Composed of pi-sigma networks of different orders:

  f_{SPSNN} = \sum_{k=1}^{K} f_{PSN_k} = \sum_{k=1}^{K} \prod_{j=1}^{k} h_{jk}.

C Li. 'A sigma-pi-sigma neural network (SPSNN).' Neural Processing Letters, 2003.
19. Factorization Machines
- A second-degree polynomial net that combines features under sparse data.
- The weight matrix is mapped into a low-rank space using matrix factorization:

  \hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j,

  where the learnable parameters are w_0 ∈ R, w ∈ R^n and V ∈ R^{n×k} (k ≪ n).

S Rendle. 'Factorization Machines.' International Conference on Data Mining, 2010.
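As a sketch of how the prediction can be computed in time linear in n, the snippet below uses the standard reformulation of the pairwise term; variable names and shapes are assumptions for illustration.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-degree factorization machine prediction (sketch).

    Uses the O(nk) identity:
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ].
    Shapes: x (n,), w (n,), V (n, k).
    """
    s = V.T @ x                                                  # (k,)
    pairwise = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return w0 + w @ x + pairwise

rng = np.random.default_rng(0)
n, k = 10, 4
print(fm_predict(rng.normal(size=n), 0.1, rng.normal(size=n), rng.normal(size=(n, k))))
```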
20. Variations of Factorization Machines
Field-aware FM (FFM): different vectors are used when features of different fields are combined:

  \hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} ⟨v_{i,f_j}, v_{j,f_i}⟩ x_i x_j.

Field-weighted FM: adds a weight parameter for every pair of fields:

  \hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j r_{f_i,f_j}.

Higher-order FM: third-order or higher-order feature combinations.

Y Juan, Y Zhuang, W Chin, C Lin. 'Field-aware factorization machines for CTR prediction.' In ACM Conference on Recommender Systems, 2016.
J Pan, et al. 'Field-weighted factorization machines for click-through rate prediction in display advertising.' In World Wide Web Conference, 2018.
M Blondel, A Fujino, N Ueda, M Ishihata. 'Higher-order factorization machines.' In Advances in Neural Information Processing Systems (NeurIPS), 2016.
21. Multiplicative Recurrent Neural Networks (MRNN)
- Character-level language modeling tasks.
- Multiplicative (or "gated") connections:

  factor state sequence:  f_t = diag(W_{fx} x_t) · W_{fh} h_{t-1}
  hidden state sequence:  h_t = tanh(W_{hf} f_t + W_{hx} x_t)
  output sequence:        o_t = W_{oh} h_t + b_o.

I Sutskever, J Martens, G Hinton. 'Generating text with recurrent neural networks.' In International Conference on Machine Learning (ICML), 2011.
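A small numpy sketch of one MRNN step, assuming illustrative shapes (the diag(·) product reduces to an element-wise multiplication):

```python
import numpy as np

def mrnn_step(x_t, h_prev, Wfx, Wfh, Whf, Whx, Woh, bo):
    """One multiplicative-RNN step following the equations above.

    The input-dependent factor state gates the recurrent contribution.
    Shapes are illustrative: x_t (d,), h_prev (m,), factor state (r,).
    """
    f_t = (Wfx @ x_t) * (Wfh @ h_prev)    # f_t = diag(Wfx x_t) · Wfh h_{t-1}
    h_t = np.tanh(Whf @ f_t + Whx @ x_t)  # hidden state
    o_t = Woh @ h_t + bo                  # output
    return h_t, o_t

d, m, r = 4, 6, 5
rng = np.random.default_rng(0)
h, o = mrnn_step(rng.normal(size=d), np.zeros(m),
                 rng.normal(size=(r, d)), rng.normal(size=(r, m)),
                 rng.normal(size=(m, r)), rng.normal(size=(m, d)),
                 rng.normal(size=(d, m)), np.zeros(d))
```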
22. Sum-Product Networks (SPN)
H Poon, P Domingos. ‘Sum-product networks: A new deep architecture.’ In International Conference on Computer Vision Workshops, 2011.
24. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
25. Outline
1 Introduction
2 Higher-degree polynomial expansions
Notation
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
26. Formalism
- In machine learning tasks, we have (at least) one input and one output.
- The goal is to learn G(z): R^d → R^o, with z ∈ R^d the input.
- Neural networks use a composition of linear units and unitary non-linear units.
- We augment this structure and capture the higher-order correlations using tensors.
27. Hadamard product
Let Γ ∈ R^{2×3} and P ∈ R^{2×3}. The Hadamard product, denoted by '∗', is defined element-wise:

  Γ ∗ P = [γ_{(1,1)}ρ_{(1,1)}, γ_{(1,2)}ρ_{(1,2)}, γ_{(1,3)}ρ_{(1,3)}; γ_{(2,1)}ρ_{(2,1)}, γ_{(2,2)}ρ_{(2,2)}, γ_{(2,3)}ρ_{(2,3)}]    (1)

The Hadamard product of Γ ∈ R^{I×N} and P ∈ R^{I×N} results in a matrix of dimensions I × N.

Hadamard, J. 'Leçons sur la Propagation des Ondes et les Équations de l'Hydrodynamique', 1903.
Halmos, Paul R. 'Finite-dimensional vector spaces', Annals of Mathematics Studies, Princeton University Press, 1948.
28. Khatri-Rao product
Let Γ ∈ R^{2×3} and P ∈ R^{3×3}. The Khatri-Rao product, denoted by '⊙', is the column-wise Kronecker product:

  Γ ⊙ P = [γ_{(1,n)}ρ_{(1,n)}; γ_{(1,n)}ρ_{(2,n)}; γ_{(1,n)}ρ_{(3,n)}; γ_{(2,n)}ρ_{(1,n)}; γ_{(2,n)}ρ_{(2,n)}; γ_{(2,n)}ρ_{(3,n)}], with the nth column given by n = 1, 2, 3.    (2)

The Khatri-Rao product of Γ ∈ R^{I×N} and P ∈ R^{J×N} results in a matrix of dimensions (IJ) × N.

Khatri, C. G., and C. Radhakrishna Rao. 'Solutions to some functional equations and their applications to characterization of probability distributions.' Sankhyā: The Indian Journal of Statistics, Series A (1968): 167-180.
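A short numpy sketch of both products; the khatri_rao helper is our own illustrative implementation, not a library call.

```python
import numpy as np

G = np.arange(6, dtype=float).reshape(2, 3)   # Γ ∈ R^{2×3}
P = np.arange(9, dtype=float).reshape(3, 3)   # P ∈ R^{3×3}

# Hadamard product '∗': element-wise, needs matching shapes (here Γ with itself).
hadamard = G * G                              # shape (2, 3)

# Khatri-Rao product '⊙': column-wise Kronecker, shapes (I,N) and (J,N) -> (IJ, N).
def khatri_rao(A, B):
    I, N = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, N)

print(khatri_rao(G, P).shape)                 # (6, 3)
```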
30. Tensors
- Tensors → multi-dimensional arrays.
- The order is the number of dimensions, e.g., X ∈ R^{4×4×4} has order 3.
- Figure: illustration of a third-order tensor with modes i, j, k.
- Let W ∈ R^{I_1×···×I_M} and u ∈ R^{I_m} with m ∈ [1, . . . , M]. The mode-m vector product W ×_m u is:

  (W ×_m u)_{i_1,...,i_{m-1},i_{m+1},...,i_M} = \sum_{i_m=1}^{I_m} w_{i_1,...,i_M} u_{i_m}    (3)
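The mode-m vector product of Eq. (3) is a single contraction, e.g. via numpy's tensordot (a sketch; the 0-indexed axis m-1 corresponds to the slide's mode m):

```python
import numpy as np

def mode_m_vecprod(W, u, m):
    """Contract tensor W with vector u along the slide's mode m (1-indexed)."""
    return np.tensordot(W, u, axes=([m - 1], [0]))

W = np.random.rand(4, 4, 4)              # third-order tensor
u = np.random.rand(4)
print(mode_m_vecprod(W, u, 2).shape)     # (4, 4): the contracted mode disappears
```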
33. CP decomposition
- Goal: decompose a tensor W into a sequence of low-rank components.
- In matrix form: W_{(1)} ≐ U_{[1]} ( ⊙_{m=M}^{2} U_{[m]} )^T, where {U_{[m]}}_{m=1}^{M} are the factor matrices.
- Figure: CP decomposition of a third-order tensor.
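A numpy sketch that builds a rank-R third-order tensor from CP factors and checks the matricized form above (the C-order unfolding used here matches the formula up to the column ordering of the Khatri-Rao product; all sizes are illustrative):

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
U1, U2, U3 = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

W = np.einsum('ir,jr,kr->ijk', U1, U2, U3)       # CP reconstruction

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

W1 = W.reshape(I, -1)                            # mode-1 unfolding (C-order)
print(np.allclose(W1, U1 @ khatri_rao(U2, U3).T))  # True
```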
36. Outline
1 Introduction
2 Higher-degree polynomial expansions
Polynomial expansion with respect to an input vector
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
37. Polynomial approximation
Approximate the τ-th element G(z)_τ with an Nth-degree polynomial:

  (G(z))_τ ≈ β_τ + \sum_{i=1}^{d} w^{[1]}_{τ,i} z_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w^{[2]}_{τ,i,j} z_i z_j + · · · + \sum_{i=1}^{d} \sum_{j=1}^{d} · · · \sum_{k=1}^{d} w^{[N]}_{τ,i,j,...,k} z_i z_j · · · z_k    (4)

(the last term contains N summations). Both β_τ ∈ R and the set of tensors { W^{[n]}_τ ∈ R^{∏_{m=1}^{n} ×_m d} }_{n=1}^{N} are learnable parameters.
38. Polynomial approximation
Equation (4) can be written in tensor format as:

  (G(z))_τ ≈ β_τ + (w^{[1]}_τ)^T z + z^T W^{[2]}_τ z + · · · + W^{[N]}_τ \prod_{n=1}^{N} ×_n z    (5)

By stacking the polynomials for all elements τ ∈ [1, . . . , o], we obtain:

  G(z) ≈ \sum_{n=1}^{N} ( W^{[n]} \prod_{j=2}^{n+1} ×_j z ) + β    (6)

By the Stone-Weierstrass theorem, polynomials can approximate any continuous function on a compact set.
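For concreteness, here is a direct (unfactorized) numpy evaluation of Eq. (5) for N = 2; all shapes are illustrative. This brute-force form is exactly what becomes infeasible for large N, motivating the low-rank factorizations that follow.

```python
import numpy as np

d, o = 6, 3
rng = np.random.default_rng(0)
beta = rng.normal(size=o)
W1 = rng.normal(size=(o, d))        # first-degree weights
W2 = rng.normal(size=(o, d, d))     # second-degree weight tensor

z = rng.normal(size=d)
G = beta + W1 @ z + np.einsum('tij,i,j->t', W2, z, z)   # degree-2 expansion
print(G.shape)                      # (3,)
```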
39. Polynomial approximation - learnable parameters
- The learnable parameters of (6) scale as Θ(d^N).
- A solution to reduce them: demand each tensor W^{[n]} to be low-rank.
41. Outline
1 Introduction
2 Higher-degree polynomial expansions
Tensor decomposition per degree
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
42. Tensor decomposition per degree
First solution: demand each tensor W^{[n]} to be low-rank.
Apply the CP decomposition to each tensor W^{[n]}. Then, the expansion for N = 3 is:

  y = β + C^T_{1,[1]} z + ( C^T_{1,[2]} z ) ∗ ( C^T_{2,[2]} z ) + ( C^T_{1,[3]} z ) ∗ ( C^T_{2,[3]} z ) ∗ ( C^T_{3,[3]} z )    (7)

G Chrysos*, M Georgopoulos*, J Deng, J Kossaifi, Y Panagakis, A Anandkumar. 'Augmenting Deep Classifiers with Polynomial Neural Networks.' European Conference on Computer Vision (ECCV), 2022.
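A numpy sketch of Eq. (7): one CP-factorized term per degree, combined with Hadamard products. Taking C_{i,[n]} ∈ R^{d×o} keeps the snippet minimal; the shapes and helper name are illustrative assumptions.

```python
import numpy as np

d, o = 8, 4
rng = np.random.default_rng(0)
C = {n: [rng.normal(size=(d, o)) for _ in range(n)] for n in (1, 2, 3)}
beta = np.zeros(o)

def hadamard_of_projections(mats, z):
    out = np.ones(mats[0].shape[1])
    for M in mats:
        out = out * (M.T @ z)       # element-wise (Hadamard) product of projections
    return out

z = rng.normal(size=d)
y = beta + sum(hadamard_of_projections(C[n], z) for n in (1, 2, 3))
print(y.shape)                      # (4,)
```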
43. Khatri-Rao to Hadamard product
Lemma (Chrysos'19)
For a set of N matrices {A_{[ν]} ∈ R^{I_ν×K}}_{ν=1}^{N} and {B_{[ν]} ∈ R^{I_ν×L}}_{ν=1}^{N}, the following equality holds:

  ( ⊙_{ν=1}^{N} A_{[ν]} )^T · ( ⊙_{ν=1}^{N} B_{[ν]} ) = ( A^T_{[1]} · B_{[1]} ) ∗ . . . ∗ ( A^T_{[N]} · B_{[N]} ),    (8)

where the symbol '∗' denotes the Hadamard product.

G Chrysos, S Moschoglou, Y Panagakis, and S Zafeiriou. 'PolyGAN: High-order polynomial generators.' arXiv preprint arXiv:1908.06571.
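A quick numerical check of the lemma for N = 2 (the khatri_rao helper is the same illustrative one as before):

```python
import numpy as np

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

rng = np.random.default_rng(1)
A1, A2 = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
B1, B2 = rng.normal(size=(3, 2)), rng.normal(size=(5, 2))

lhs = khatri_rao(A1, A2).T @ khatri_rao(B1, B2)
rhs = (A1.T @ B1) * (A2.T @ B2)
print(np.allclose(lhs, rhs))    # True
```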
44. Factorization of Univariate Polynomials Over Finite Fields
- Berlekamp's algorithm (1970): practical only over small finite fields.
- Cantor-Zassenhaus algorithm (1981): probabilistic algorithm.
- Shoup's algorithm (1990): deterministic algorithm.
E Berlekamp. 'Factoring Polynomials Over Large Finite Fields.' Mathematics of Computation, 1970.
D Cantor, H Zassenhaus. 'A New Algorithm for Factoring Polynomials Over Finite Fields.' Mathematics of Computation, 1981.
V Shoup. 'On the deterministic complexity of factoring polynomials over finite fields.' Information Processing Letters, 1990.
45. Decoupling Multivariate Polynomials
Factorizing multivariate polynomials as a linear combination of univariate polynomials has been studied using tensor decompositions, e.g., using first-order information and the CP decomposition.
One obtains a decomposition of the form:

  f_i(u_1, . . . , u_m) = \sum_{j=1}^{r} w_{ij} · g_j( \sum_{k=1}^{m} v_{kj} u_k ),  ∀ i = 1, . . . , n,

with the decoupled representation in matrix form:

  f(u) = W g(V^T u).

P Dreesen, M Ishteva, J Schoukens. 'Decoupling Multivariate Polynomials Using First-Order Information and Tensor Decompositions.' SIAM Journal on Matrix Analysis and Applications, 2015.
46. Outline
1 Introduction
2 Higher-degree polynomial expansions
Π−nets: Joint decompositions across degrees
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
47. Π-nets: Third-degree expansion schematic - Model CCP
Figure: Third-degree expansion.
G Chrysos, S Moschoglou, Y Panagakis, and S Zafeiriou. 'PolyGAN: High-order polynomial generators.' arXiv preprint arXiv:1908.06571.
G Chrysos, S Moschoglou, G Bouritsas, Y Panagakis, J Deng, and S Zafeiriou. 'Π-nets: Deep Polynomial Neural Networks.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
51. Π−nets - Model CCP
- We use a coupled CP decomposition, i.e., factor sharing across different levels.
- To demonstrate the method, we assume a third-degree expansion, i.e., N = 3 in (6).
- Then, the expansion is:

  G(z) = β + W^{[1]} z + W^{[2]} ×_2 z ×_3 z + W^{[3]} ×_2 z ×_3 z ×_4 z    (9)
52. Π−nets - Third-degree expansion - Model CCP
We use the following factorizations:
- Let W^{[1]} = C U^T_{[1]} be the parameters for the first level of approximation.
- Assume W^{[2]} = W^{[2]}_{1:2} + W^{[2]}_{1:3}. We use a coupled CP decomposition, which results in the following matrix form:

  W^{[2]}_{(1)} = C ( U_{[3]} ⊙ U_{[1]} )^T + C ( U_{[2]} ⊙ U_{[1]} )^T.

- Let the third-degree parameters be:

  W^{[3]}_{(1)} = C ( U_{[3]} ⊙ U_{[2]} ⊙ U_{[1]} )^T.
53. Π−nets - Nth-degree expansion
The derivation can be extended to an arbitrary degree with the following recursive formulation:

  x_n = ( U^T_{[n]} z ) ∗ x_{n-1} + x_{n-1},    (CCP)

for n = 2, . . . , N with x_1 = U^T_{[1]} z and x = C x_N + β. The parameters C ∈ R^{o×k}, U_{[n]} ∈ R^{d×k} for n = 1, . . . , N are learnable.
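A minimal PyTorch sketch of the (CCP) recursion may make the structure concrete; the class name, layer sizes, and use of nn.Linear are illustrative assumptions, not the official Π-nets implementation (linked later in this deck).

```python
import torch
import torch.nn as nn

class CCP(nn.Module):
    """Sketch of x_n = (U_[n]^T z) * x_{n-1} + x_{n-1}, then x = C x_N + β."""
    def __init__(self, d, k, o, degree=3):
        super().__init__()
        self.U = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.C = nn.Linear(k, o)           # final affine map (includes β as bias)

    def forward(self, z):
        x = self.U[0](z)                   # x_1 = U_[1]^T z
        for U_n in self.U[1:]:
            x = U_n(z) * x + x             # Hadamard product plus skip connection
        return self.C(x)

net = CCP(d=8, k=16, o=4)
print(net(torch.randn(2, 8)).shape)        # torch.Size([2, 4])
```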
54. Π−nets - Alternative models
- Model CCP above assumes a certain factorization, e.g., W^{[2]} = W^{[2]}_{1:2} + W^{[2]}_{1:3}.
- New models can be derived by changing the assumptions.
- For instance, what if we assume that the tensors admit nested decompositions?
55. Π-nets: Model NCP
The model with nested decompositions, called NCP, for N = 3:
Figure: Third-degree expansion (with parameters A_{[n]}, S_{[n]}, B_{[n]}, b_{[n]}, C, and β).
56. Π-nets: Model NCP
The derivation can be extended to an arbitrary degree with the following recursive formulation:

  x_n = ( A^T_{[n]} z ) ∗ ( S^T_{[n]} x_{n-1} + B^T_{[n]} b_{[n]} ),    (NCP)

for n = 2, . . . , N with x_1 = ( A^T_{[1]} z ) ∗ ( B^T_{[1]} b_{[1]} ) and x = C x_N + β.
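A minimal PyTorch sketch of the (NCP) recursion; note that the product B^T_{[n]} b_{[n]} is folded into a single learnable vector here, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NCP(nn.Module):
    """Sketch of x_n = (A_[n]^T z) * (S_[n]^T x_{n-1} + b_[n])."""
    def __init__(self, d, k, o, degree=3):
        super().__init__()
        self.A = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.S = nn.ModuleList([nn.Linear(k, k, bias=False) for _ in range(degree - 1)])
        self.b = nn.ParameterList([nn.Parameter(torch.ones(k)) for _ in range(degree)])
        self.C = nn.Linear(k, o)

    def forward(self, z):
        x = self.A[0](z) * self.b[0]                 # x_1 = (A_[1]^T z) * (B_[1]^T b_[1])
        for A_n, S_n, b_n in zip(self.A[1:], self.S, self.b[1:]):
            x = A_n(z) * (S_n(x) + b_n)              # one NCP step
        return self.C(x)

print(NCP(8, 16, 4)(torch.randn(2, 8)).shape)        # torch.Size([2, 4])
```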
57. Π-nets: Product of polynomials
- The previous formulations, e.g. (CCP), require Θ(N) layers for an Nth-degree expansion.
- Can we achieve a higher-degree expansion with fewer parameters?
- Yes. For instance, by stacking lower-degree polynomials sequentially; see the sketch after the figure.
Figure: Stacking N polynomials of degree 2 results in a polynomial expansion of degree 2^N.
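A self-contained PyTorch sketch of the product-of-polynomials idea: composing three degree-2 blocks yields a degree-2^3 = 8 polynomial of the input. The block definition and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Quadratic(nn.Module):
    """A degree-2 polynomial block (one 2nd-degree CCP step); sizes illustrative."""
    def __init__(self, d, k, o):
        super().__init__()
        self.U1 = nn.Linear(d, k, bias=False)
        self.U2 = nn.Linear(d, k, bias=False)
        self.C = nn.Linear(k, o)

    def forward(self, z):
        x = self.U1(z)
        return self.C(self.U2(z) * x + x)   # degree-2 expansion of z

# Stacking three degree-2 blocks gives an overall degree of 2^3 = 8.
net = nn.Sequential(Quadratic(8, 16, 8), Quadratic(8, 16, 8), Quadratic(8, 16, 4))
print(net(torch.randn(2, 8)).shape)         # torch.Size([2, 4])
```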
58. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
60. SORT model
The model obtains the following formulation:

  x = U^T_{[1]} z + U^T_{[2]} z + ( U^T_{[1]} z ) ∗ ( U^T_{[2]} z ).    (10)

Y Wang, L Xie, C Liu, Y Zhang, W Zhang, A Yuille. 'SORT: Second-Order Response Transform for Visual Recognition.' International Conference on Computer Vision (ICCV), 2017.
61. Squeeze-and-Excitation network
Squeeze-and-Excitation network (SENet): the output of the SENet block Y^{SE} with respect to input X ∈ R^{hw×C} (h is the height, w the width) can be formulated as:

  Y^{SE} = ( X W_1 ) ∗ r( p( X W_1 ) W_2 ) = ( X W_1 ) ∗ ( 1⃗ [ (1/hw) 1⃗^T X W_1 ] W_2 ),    (11)

where W_1, W_2 are learnable parameters.

J Hu, L Shen, G Sun. 'Squeeze-and-Excitation Networks.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
62. Non-local (NL) neural network
Non-local (NL) neural network: the output of the non-local block Y^{NL} ∈ R^{N×C} with respect to input X ∈ R^{N×C} can be formulated as:

  Y^{NL} = ( X W_1 W_2^T X^T )( X W_3 ),    (12)

where W_1, W_2, W_3 ∈ R^{C×C} are learnable parameters.
Scales quadratically with the dimension N (i.e., O(N^2) complexity).

X Wang, R Girshick, A Gupta, K He. 'Non-local Neural Networks.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
63. Poly-NL
Poly-NL: the output Y^{Poly-NL} ∈ R^{N×C} is expressed with a third-degree polynomial net acting as the non-local self-attention block:

  Y^{Poly-NL} = ( Φ( X W_1 ∗ X W_2 ) ∗ X ) W_3,    (13)

with learnable parameters W_1, W_2, W_3 ∈ R^{C×C}.
Scales linearly with the dimension N (i.e., O(N) complexity).

F Babiloni, et al. 'Poly-NL: Linear Complexity Non-local Layers with Polynomials.' In International Conference on Computer Vision (ICCV), 2021.
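A small PyTorch sketch of Eq. (13), assuming Φ is an average pooling over the N positions (a global descriptor); names and shapes are illustrative assumptions.

```python
import torch

def poly_nl(X, W1, W2, W3):
    """Sketch of the Poly-NL block. X: (N, C); W1, W2, W3: (C, C)."""
    phi = ((X @ W1) * (X @ W2)).mean(dim=0, keepdim=True)  # Φ(·): (1, C), O(N) cost
    return (phi * X) @ W3                                  # broadcast Hadamard, (N, C)

X = torch.randn(64, 32)
W = [torch.randn(32, 32) for _ in range(3)]
print(poly_nl(X, *W).shape)   # torch.Size([64, 32])
```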
64. Linear Complexity Self-Attention with Polynomials
Poly-NL reformulates self-attention using only global descriptors and element-wise multiplications, achieving linear complexity O(N).
65. Poly-NL: Space and Time Complexity
Figure: Poly-NL achieves up to a 10× run-time speed-up and a 5× lower complexity overhead w.r.t. NL.
66. Non-local with lower-degree interactions
PDC-NL: Y = ( X W_1 W_2^T X^T )( X W_3 ) + ( X W_4 ) ∗ ( X W_5 ) + X W_6.
Includes first- to third-degree terms, extending NL (which contains only the third-degree term).

G Chrysos*, M Georgopoulos*, J Deng, J Kossaifi, Y Panagakis, A Anandkumar. 'Augmenting Deep Classifiers with Polynomial Neural Networks.' European Conference on Computer Vision (ECCV), 2022.
67. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
68. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Unconditional generation with polynomial networks
5 Future directions
69. Expressivity - Generation without activation functions
Results from a generator with convolutional layers without activations:
70. Expressivity of Π−nets
We consider image generation without activation functions between the layers. Synthesized images:
71. Expressivity of Π−nets
Linear interpolation in the latent space:
72. Image generation from a polynomial generator
73. Π−nets on non-Euclidean representation learning
Beyond image generation, polynomial nets perform well in non-Euclidean representation learning.
Code: https://github.com/grigorisg9gr/polynomial_nets
G Chrysos, S Moschoglou, G Bouritsas, J Deng, Y Panagakis, and S Zafeiriou. 'Deep Polynomial Neural Networks.' IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2021.
74. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Synthesizing unseen combinations
5 Future directions
75. Conditional data generation: Visual examples
Figure: Image-to-image translation examples.
Phillip Isola, et al. 'Image-to-Image Translation with Conditional Adversarial Networks.' Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Mehdi Mirza and Simon Osindero. 'Conditional Generative Adversarial Nets.' CoRR, 2014.
79. MLC-VAE - Our framework
We instead model each attribute combination with a different mean. The mean is obtained as:

  M(y_1, y_2) = W^{[1]} y_1 + W^{[2]} y_2 + W^{[12]} ×_2 y_1 ×_3 y_2,    (14)

for attributes y_1, y_2.

M Georgopoulos, G Chrysos, M Pantic, and Y Panagakis. 'Multilinear Latent Conditioning for Generating Unseen Attribute Combinations.' In International Conference on Machine Learning (ICML), 2020.
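A numpy sketch of the multilinear mean of Eq. (14); latent and attribute dimensions, and the one-hot encoding of attributes, are illustrative assumptions.

```python
import numpy as np

d, a1, a2 = 16, 5, 3
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(d, a1)), rng.normal(size=(d, a2))
W12 = rng.normal(size=(d, a1, a2))       # bilinear interaction tensor

def latent_mean(y1, y2):
    # W1 y1 + W2 y2 + W12 ×_2 y1 ×_3 y2
    return W1 @ y1 + W2 @ y2 + np.einsum('dij,i,j->d', W12, y1, y2)

mu = latent_mean(np.eye(a1)[0], np.eye(a2)[1])   # one-hot attribute combination
print(mu.shape)                                  # (16,)
```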
81. MLC-VAE - Multiplicative interactions
- Can we use additive interactions instead?
- Not really. Consider, for instance, synthesizing images with the attribute combination ('smile', 'closed mouth'): an additive model cannot capture how the two attributes jointly constrain the output.
82. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Conditional image generation with polynomial networks
5 Future directions
83. Diverse samples in conditional generation
Figure: In addition to the adversarial loss of GANs, regularization losses are typically used to enable diverse synthesis.
Q Mao, H Lee, H Tseng, S Ma, M Yang. 'Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
84. Conditional image generation - Introduction
1 Conditioning the generator still relies on the neural network for the expressivity.
2 Can we use high-degree polynomial expansions instead?
3 Assume zI, zII ∈ R^d are the input vectors. The goal is to learn a function G: R^d × R^d → R^o that captures the higher-order correlations between the elements of the two inputs.
85. CoPE: Nth-degree expansion - Model CCP
The recursive formulation of CoPE is given by:

  x_n = x_{n-1} + ( U^T_{[n,I]} zI + U^T_{[n,II]} zII ) ∗ x_{n-1},    (15)

for n = 2, . . . , N with x_1 = U^T_{[1,I]} zI + U^T_{[1,II]} zII and x = C x_N + β.
Figure: Nth-degree expansion for conditional generation.

G Chrysos, M Georgopoulos, and Y Panagakis. 'Conditional Generation Using Polynomial Expansions.' In Advances in Neural Information Processing Systems (NeurIPS), 2021.
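A minimal PyTorch sketch of the CoPE recursion of Eq. (15); the class name and all sizes are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class CoPE(nn.Module):
    """Sketch of x_n = x_{n-1} + (U_[n,I]^T zI + U_[n,II]^T zII) * x_{n-1}."""
    def __init__(self, d, k, o, degree=3):
        super().__init__()
        self.UI = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.UII = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.C = nn.Linear(k, o)

    def forward(self, zI, zII):
        x = self.UI[0](zI) + self.UII[0](zII)        # x_1
        for UI_n, UII_n in zip(self.UI[1:], self.UII[1:]):
            x = x + (UI_n(zI) + UII_n(zII)) * x      # one CoPE step
        return self.C(x)

print(CoPE(8, 16, 4)(torch.randn(2, 8), torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```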
87. Synthesized images with CoPE
(a) edges-to-handbags, (b) edges-to-shoes.
Figure: The first row depicts the conditional input (i.e., the edges). Rows 2-6 depict outputs as we vary zI (i.e., the noise).
88. Beyond two-variable expansion with CoPE
The recursive formulation can be extended beyond two-variable expansions. For three variables the formulation is:

  x_n = x_{n-1} + ( U^T_{[n,I]} zI + U^T_{[n,II]} zII + U^T_{[n,III]} zIII ) ∗ x_{n-1},    (16)

for n = 2, . . . , N with x_1 = U^T_{[1,I]} zI + U^T_{[1,II]} zII + U^T_{[1,III]} zIII and x = C x_N + β.
Code: https://github.com/grigorisg9gr/polynomial_nets_for_conditional_generation
89. Beyond two-variable expansion with CoPE
Synthesized images on conditional generation with 2 attributes:
Figure: (a) Each row/column depicts a different hair/eye color, respectively; (b) synthesized images per unique combination, varying the noise zI.
90. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Audio synthesis
5 Future directions
91. Audio representation
Time domain vs. frequency domain.
Figure source: https://www.nti-audio.com/en/support/know-how/fast-fourier-transform-fft
92. How to model the complex-valued frequency representations?
Real-valued neural networks (RVNNs) with one output channel for the magnitude of the complex-valued representations:
- Discard the phase information.
- Require phase reconstruction in a generative task.
RVNNs with two output channels for the complex-valued representations:
- Higher degree of freedom at the synaptic weighting.
- Lower generalization ability.
How about directly modelling the complex-valued representations?

A Hirose, S Yoshida. 'Generalization Characteristics of Complex-Valued Feedforward Neural Networks in Relation to Signal Coherence.' IEEE Transactions on Neural Networks and Learning Systems, 2012.
93. Mergelyan’s Theorem
Suppose K is a compact set in the plane whose complement is connected, and f is a continuous complex-valued function on K that is holomorphic in the interior of K. Then, for every ε > 0, there exists a polynomial P such that |f(x) − P(x)| < ε for all x ∈ K.

W Rudin. 'Real and Complex Analysis.' McGraw-Hill International Series, 1987.
94. Schematic of the generator
Figure: The APOLLO generator (Model BN) maps complex-valued random noise to an audio representation in the frequency domain, building the expansion degree by degree.

Yongtao Wu, G Chrysos, Volkan Cevher. 'Adversarial Audio Synthesis with Complex-valued Polynomial Networks.' 2022.
95. Model in the complex field
CFBN (nested CP decomposition with bias):
The recursive form for the Nth-degree expansion is:

  ỹ_n = ( Ẽ^T_{[n]} x̃ + ρ̃_{[n]} ) ∗ ( F̃^T_{[n]} ỹ_{n-1} + b̃_{[n]} ) + ỹ_{n-1},    (17)

for n = 2, . . . , N with ỹ_1 = ( Ẽ^T_{[1]} x̃ ) ∗ b̃_{[1]} and ỹ = H̃ ỹ_N + h̃, where we denote b̃_{[n]} = B̃^T_{[n]} β̃_{[n]} for n = 1, . . . , N.
97. Human evaluation
- Human evaluation of unsupervised audio generation on the SC09 dataset.
- From left to right in the histogram, the Mean Opinion Score (MOS) for all models and the real data is 1.61, 2.68, 2.73, 3.33, and 4.73, respectively.
Figure: MOS ratings for WaveGAN, TiFGAN, Π-nets, APOLLO, and the real data.
100. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
104. Complementary work on polynomial networks I
1 Polynomial networks can enlarge the hypothesis space [Jayakumar’20,
Fan’21].
2 Privacy-preserving applications require polynomial expansions
[Zhang’19].
3 Sample complexity (and similar theoretical bounds) might be simpler
to compute [Zhu’22].
4 Known (theoretical) results from neural networks might not be
directly applicable (e.g., implicit bias).
S Jayakumar, et al. ‘Multiplicative Interactions and Where to Find Them.’ In International Conference on Learning Representations (ICLR), 2020.
FL Fan, et al. ‘Expressivity and Trainability of Quadratic Networks.’ ArXiv preprint arXiv:2110.06081.
S Zhang, Y Gong, D Yu. ‘Encrypted Speech Recognition using Deep Polynomial Networks.’ In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
Z Zhu, et al. ‘Controlling the Complexity and Lipschitz Constant improves Polynomial Nets.’ In International Conference on Learning Representations (ICLR), 2022.
105. Theoretical characterization of polynomial networks
Figure: Double descent curve on polynomial regression (test loss vs. polynomial degree).
Source: https://windowsontheory.org/2019/12/05/deep-double-descent/
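For readers who want to reproduce such a curve, below is a minimal sketch using minimum-norm polynomial regression; the data, features, and degrees are arbitrary choices, and the exact shape of the curve depends on them (the test loss typically peaks near the interpolation threshold, degree ≈ number of samples, and can descend again beyond it).

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 20
x_tr = rng.uniform(-1.0, 1.0, n_train)
y_tr = np.sin(np.pi * x_tr) + 0.1 * rng.standard_normal(n_train)
x_te = np.linspace(-1.0, 1.0, 500)
y_te = np.sin(np.pi * x_te)

for deg in (5, 10, 19, 50, 200, 1000):
    Phi_tr = np.polynomial.legendre.legvander(x_tr, deg)  # Legendre features
    Phi_te = np.polynomial.legendre.legvander(x_te, deg)
    w = np.linalg.pinv(Phi_tr) @ y_tr  # minimum-norm least-squares solution
    print(f"degree {deg:4d}: test loss {np.mean((Phi_te @ w - y_te) ** 2):.3g}")
```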
108. Optimization and training
1 Multiplications can make the loss surface less well-behaved [Schwarz et al.]. How should we adapt optimizers for polynomial architectures?
2 What is the interaction between model degree and implicit regularization in polynomial networks?
3 How should we initialize polynomial networks?
J Schwarz, S Jayakumar, R Pascanu, P Latham, Y W Teh. ’Powerpropagation: A sparsity inducing weight reparameterisation.’ In Advances in Neural Information Processing Systems (NeurIPS), 2021.
114. Architecture
1 Can we use other popular tensor factorizations, e.g., Tucker decomposition, to obtain useful architectures?
2 How can we evaluate the differences of those architectures?
3 How can we determine the degree required by the task at hand?
1 Is higher degree always better?
2 Where should we have this higher degree?
3 Is there a total degree that is sufficient for all standard tasks?
120. Architecture II
4 How can we express a joint tensor decomposition over all sequential
polynomial networks?
5 Can we represent all signals of interest with a sequence of polynomial
expansions?
6 How should we reason about activations often used in conjunction
with a polynomial form?
1 Are activations required?
2 Are they mostly there to make learning possible?
3 How do they modify the polynomial expansion?
123. Robustness of polynomial networks
1 A polynomial expansion with unconstrained input can produce extremely large output values (see the sketch below).
2 How can we constrain the output range efficiently?
3 How can we make polynomial nets robust to (adversarial) noise?
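A toy sketch of point 1, with arbitrary sizes and Gaussian weights: each Hadamard step multiplies the state by another linear image of the input, so for inputs that are not small, the norm of the state grows geometrically with the degree.

```python
import numpy as np

rng = np.random.default_rng(0)
k, N = 16, 10
W = [rng.standard_normal((k, k)) for _ in range(N)]

for scale in (0.01, 0.1, 1.0):
    z = scale * rng.standard_normal(k)  # input with controlled magnitude
    x = W[0] @ z
    for n in range(1, N):
        x = x + (W[n] @ z) * x  # one more multiplicative interaction per degree
    print(f"input scale {scale}: ||x_N|| = {np.linalg.norm(x):.2e}")
```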
125. Thank you for your attention
1 We would like to thank Francesca Babiloni, Leello Dadi, Zhenyu Zhu
and Yongtao Wu for their help in preparing the tutorial.
2 Further information and materials can be found on
https://polynomial-nets.github.io/.
3 Contact us: grigorios.chrysos [at] epfl.ch.