Polynomial networks are a class of network designs that treat a network as a high-degree polynomial expansion of the input. Recently, polynomial networks have demonstrated state-of-the-art performance in a range of tasks. Although polynomial networks have appeared for decades in machine learning and complex systems, their role in modern deep learning is not widely acknowledged.
In this tutorial we intend to bridge this gap and draw parallels between modern deep learning approaches and polynomial networks. We share recent developments on the topic and explain the required tools.
5. Deep-learning architectures
K. He, X. Zhang, S. Ren, J. Sun. 'Deep Residual Learning for Image Recognition.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
6. Deep-learning architectures
J Hu, L Shen, G Sun. ’Squeeze-and-excitation networks.’ In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
X. Wang, R. Girshick, A. Gupta, K. He. ’Non-local Neural Networks.’ In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
11. Non-local neural network is a 3rd degree polynomial
13. Self-Attention is a 3rd degree polynomial
14. Learning with polynomials, an old idea
- Mapping units [Hinton; 1985], "dynamic mapping" [v.d. Malsburg; 1981].
- Binocular+motion energy models [Adelson, Bergen; 1985], [Ohzawa, DeAngelis, Freeman; 1990], [Fleet et al.; 1994].
- Sigma-pi neural unit [Mel, Koch; 1990].
- Higher-order Boltzmann Machines / higher-order neural networks [Sejnowski; 1986].
- Subspace SOM [Kohonen; 1996], topographic ICA [Hyvarinen, Hoyer; 2000], [Karklin, Lewicki; 2003].
- Bilinear models [Tenenbaum and Freeman; 2000], [Olshausen; 1994], [Grimes, Rao; 2005].
- Higher-order Restricted Boltzmann Machines (RBMs) [Memisevic and Hinton; 2007], [Ranzato et al.; 2010].
- Gating mechanisms; LSTM [Hochreiter, Schmidhuber; 1997], multiplicative RNN [Sutskever, Martens, Hinton; 2011].
15. Group Method of Data Handling (GMDH)
- One of the first approaches to the systematic design of nonlinear relationships.
- Generates Partial Descriptions of the data (PDs) with two input variables.
- Shortcoming: tends to produce an overly complex network.
A Ivakhnenko. 'Polynomial theory of complex systems.' IEEE Transactions on Systems, Man, and Cybernetics, 1971.
16. Mapping Units / Higher Order Boltzmann Machines
- Hinton et al. (1985) and Sutskever et al. (2011) argue that multiplications (mapping units) allow for better modeling of conjunctions.
- Higher-order Boltzmann Machines and higher-order RBMs utilize multiplication in factorized representations; e.g., bilinear models factorize style and content.
17. Pi-Sigma network (PSN)
A single hidden layer learns multiple affine transformations of the data and multiplies them to obtain the output:

  h_{ji} = \sum_{k} w_{kji} x_k + \theta_{ji},
  y_i = \sigma( \prod_{j} h_{ji} ).

Y Shin, J Ghosh. 'The pi-sigma network: an efficient higher-order neural network for pattern classification and function approximation.' International Joint Conference on Neural Networks, 1991.
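A minimal numpy sketch of a single pi-sigma unit may help fix ideas; the function name, shapes, and the sigmoid choice for \sigma are illustrative assumptions, not the paper's code.

```python
import numpy as np

def pi_sigma_forward(x, W, theta):
    """One pi-sigma unit: J affine 'sigma' summations, then a 'pi' product.

    Shapes are illustrative: x (d,), W (J, d), theta (J,).
    """
    h = W @ x + theta                         # h_j = sum_k w_kj x_k + theta_j
    return 1.0 / (1.0 + np.exp(-np.prod(h)))  # sigmoid of the product of the h_j

# Example: a degree-3 pi-sigma unit on a 5-dimensional input.
rng = np.random.default_rng(0)
print(pi_sigma_forward(rng.normal(size=5), rng.normal(size=(3, 5)), np.zeros(3)))
```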
18. Sigma-Pi-Sigma Neural Network (SPSNN)
Composed of pi-sigma networks of different orders:

  f_{SPSNN} = \sum_{k=1}^{K} f_{PSN_k} = \sum_{k=1}^{K} \prod_{j=1}^{k} h_{jk}.

C Li. 'A sigma-pi-sigma neural network (SPSNN).' Neural Processing Letters, 2003.
19. Factorization Machines
- A second-degree polynomial net that combines features under sparse data.
- The weight matrix is mapped into a low-rank space using matrix factorization:

  \hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j,

  where the learnable parameters are w_0 ∈ R, w ∈ R^n and V ∈ R^{n×k} (k ≪ n).

S Rendle. 'Factorization Machines.' International Conference on Data Mining, 2010.
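As a sketch of how the prediction can be computed in time linear in n, the snippet below uses the standard reformulation of the pairwise term; variable names and shapes are assumptions for illustration.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-degree factorization machine prediction (sketch).

    Uses the O(nk) identity:
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ].
    Shapes: x (n,), w (n,), V (n, k).
    """
    s = V.T @ x                                                  # (k,)
    pairwise = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return w0 + w @ x + pairwise

rng = np.random.default_rng(0)
n, k = 10, 4
print(fm_predict(rng.normal(size=n), 0.1, rng.normal(size=n), rng.normal(size=(n, k))))
```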
20. Variations of Factorization Machines
Field-aware FM (FFM): different vectors are used when features of different fields are combined:

  \hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} ⟨v_{i,f_j}, v_{j,f_i}⟩ x_i x_j.

Field-weighted FM: adds a weight parameter for every pair of fields:

  \hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j r_{f_i,f_j}.

Higher-order FM: third-order or higher-order feature combinations.

Y Juan, Y Zhuang, W Chin, C Lin. 'Field-aware factorization machines for CTR prediction.' In ACM Conference on Recommender Systems, 2016.
J Pan, et al. 'Field-weighted factorization machines for click-through rate prediction in display advertising.' In World Wide Web Conference, 2018.
M Blondel, A Fujino, N Ueda, M Ishihata. 'Higher-order factorization machines.' In Advances in Neural Information Processing Systems (NeurIPS), 2016.
21. Multiplicative Recurrent Neural Networks (MRNN)
- Character-level language modeling tasks.
- Multiplicative (or "gated") connections:

  factor state sequence:  f_t = diag(W_{fx} x_t) · W_{fh} h_{t-1}
  hidden state sequence:  h_t = tanh(W_{hf} f_t + W_{hx} x_t)
  output sequence:        o_t = W_{oh} h_t + b_o.

I Sutskever, J Martens, G Hinton. 'Generating text with recurrent neural networks.' In International Conference on Machine Learning (ICML), 2011.
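A small numpy sketch of one MRNN step, assuming illustrative shapes (the diag(·) product reduces to an element-wise multiplication):

```python
import numpy as np

def mrnn_step(x_t, h_prev, Wfx, Wfh, Whf, Whx, Woh, bo):
    """One multiplicative-RNN step following the equations above.

    The input-dependent factor state gates the recurrent contribution.
    Shapes are illustrative: x_t (d,), h_prev (m,), factor state (r,).
    """
    f_t = (Wfx @ x_t) * (Wfh @ h_prev)    # f_t = diag(Wfx x_t) · Wfh h_{t-1}
    h_t = np.tanh(Whf @ f_t + Whx @ x_t)  # hidden state
    o_t = Woh @ h_t + bo                  # output
    return h_t, o_t

d, m, r = 4, 6, 5
rng = np.random.default_rng(0)
h, o = mrnn_step(rng.normal(size=d), np.zeros(m),
                 rng.normal(size=(r, d)), rng.normal(size=(r, m)),
                 rng.normal(size=(m, r)), rng.normal(size=(m, d)),
                 rng.normal(size=(d, m)), np.zeros(d))
```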
22. Sum-Product Networks (SPN)
H Poon, P Domingos. ‘Sum-product networks: A new deep architecture.’ In International Conference on Computer Vision Workshops, 2011.
24. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
25. Outline
1 Introduction
2 Higher-degree polynomial expansions
Notation
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
26. Formalism
- In machine learning tasks, we have (at least) one input and one output.
- The goal is to learn G(z): R^d → R^o, with z ∈ R^d the input.
- Neural networks use a composition of linear units and unitary non-linear units.
- We augment this structure and capture the higher-order correlations using tensors.
27. Hadamard product
Let Γ ∈ R^{2×3} and P ∈ R^{2×3}. The Hadamard product, denoted by '∗', is defined element-wise:

  Γ ∗ P = [γ_{(1,1)}ρ_{(1,1)}, γ_{(1,2)}ρ_{(1,2)}, γ_{(1,3)}ρ_{(1,3)}; γ_{(2,1)}ρ_{(2,1)}, γ_{(2,2)}ρ_{(2,2)}, γ_{(2,3)}ρ_{(2,3)}]    (1)

The Hadamard product of Γ ∈ R^{I×N} and P ∈ R^{I×N} results in a matrix of dimensions I × N.

Hadamard, J. 'Leçons sur la Propagation des Ondes et les Équations de l'Hydrodynamique', 1903.
Halmos, Paul R. 'Finite-dimensional vector spaces', Annals of Mathematics Studies, Princeton University Press, 1948.
28. Khatri-Rao product
Let Γ ∈ R^{2×3} and P ∈ R^{3×3}. The Khatri-Rao product, denoted by '⊙', is the column-wise Kronecker product:

  Γ ⊙ P = [γ_{(1,n)}ρ_{(1,n)}; γ_{(1,n)}ρ_{(2,n)}; γ_{(1,n)}ρ_{(3,n)}; γ_{(2,n)}ρ_{(1,n)}; γ_{(2,n)}ρ_{(2,n)}; γ_{(2,n)}ρ_{(3,n)}], with the nth column given by n = 1, 2, 3.    (2)

The Khatri-Rao product of Γ ∈ R^{I×N} and P ∈ R^{J×N} results in a matrix of dimensions (IJ) × N.

Khatri, C. G., and C. Radhakrishna Rao. 'Solutions to some functional equations and their applications to characterization of probability distributions.' Sankhyā: The Indian Journal of Statistics, Series A (1968): 167-180.
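A short numpy sketch of both products; the khatri_rao helper is our own illustrative implementation, not a library call.

```python
import numpy as np

G = np.arange(6, dtype=float).reshape(2, 3)   # Γ ∈ R^{2×3}
P = np.arange(9, dtype=float).reshape(3, 3)   # P ∈ R^{3×3}

# Hadamard product '∗': element-wise, needs matching shapes (here Γ with itself).
hadamard = G * G                              # shape (2, 3)

# Khatri-Rao product '⊙': column-wise Kronecker, shapes (I,N) and (J,N) -> (IJ, N).
def khatri_rao(A, B):
    I, N = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, N)

print(khatri_rao(G, P).shape)                 # (6, 3)
```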
30. Tensors
- Tensors → multi-dimensional arrays.
- The order is the number of dimensions, e.g., X ∈ R^{4×4×4} has order 3.
- Figure: illustration of a third-order tensor with modes i, j, k.
- Let W ∈ R^{I_1×···×I_M} and u ∈ R^{I_m} with m ∈ [1, . . . , M]. The mode-m vector product W ×_m u is:

  (W ×_m u)_{i_1,...,i_{m-1},i_{m+1},...,i_M} = \sum_{i_m=1}^{I_m} w_{i_1,...,i_M} u_{i_m}    (3)
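The mode-m vector product of Eq. (3) is a single contraction, e.g. via numpy's tensordot (a sketch; the 0-indexed axis m-1 corresponds to the slide's mode m):

```python
import numpy as np

def mode_m_vecprod(W, u, m):
    """Contract tensor W with vector u along the slide's mode m (1-indexed)."""
    return np.tensordot(W, u, axes=([m - 1], [0]))

W = np.random.rand(4, 4, 4)              # third-order tensor
u = np.random.rand(4)
print(mode_m_vecprod(W, u, 2).shape)     # (4, 4): the contracted mode disappears
```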
33. CP decomposition
- Goal: decompose a tensor W into a sequence of low-rank components.
- In matrix form: W_{(1)} ≐ U_{[1]} ( ⊙_{m=M}^{2} U_{[m]} )^T, where {U_{[m]}}_{m=1}^{M} are the factor matrices.
- Figure: CP decomposition of a third-order tensor.
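A numpy sketch that builds a rank-R third-order tensor from CP factors and checks the matricized form above (the C-order unfolding used here matches the formula up to the column ordering of the Khatri-Rao product; all sizes are illustrative):

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
U1, U2, U3 = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

W = np.einsum('ir,jr,kr->ijk', U1, U2, U3)       # CP reconstruction

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

W1 = W.reshape(I, -1)                            # mode-1 unfolding (C-order)
print(np.allclose(W1, U1 @ khatri_rao(U2, U3).T))  # True
```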
36. Outline
1 Introduction
2 Higher-degree polynomial expansions
Polynomial expansion with respect to an input vector
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
37. Polynomial approximation
Approximate the τ-th element G(z)_τ with an Nth-degree polynomial:

  (G(z))_τ ≈ β_τ + \sum_{i=1}^{d} w^{[1]}_{τ,i} z_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w^{[2]}_{τ,i,j} z_i z_j + · · · + \sum_{i=1}^{d} \sum_{j=1}^{d} · · · \sum_{k=1}^{d} w^{[N]}_{τ,i,j,...,k} z_i z_j · · · z_k    (4)

(the last term contains N summations). Both β_τ ∈ R and the set of tensors { W^{[n]}_τ ∈ R^{∏_{m=1}^{n} ×_m d} }_{n=1}^{N} are learnable parameters.
38. Polynomial approximation
Equation (4) can be written in tensor format as:

  (G(z))_τ ≈ β_τ + (w^{[1]}_τ)^T z + z^T W^{[2]}_τ z + · · · + W^{[N]}_τ \prod_{n=1}^{N} ×_n z    (5)

By stacking the polynomials for all elements τ ∈ [1, . . . , o], we obtain:

  G(z) ≈ \sum_{n=1}^{N} ( W^{[n]} \prod_{j=2}^{n+1} ×_j z ) + β    (6)

By the Stone-Weierstrass theorem, polynomials can approximate any continuous function on a compact set.
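For concreteness, here is a direct (unfactorized) numpy evaluation of Eq. (5) for N = 2; all shapes are illustrative. This brute-force form is exactly what becomes infeasible for large N, motivating the low-rank factorizations that follow.

```python
import numpy as np

d, o = 6, 3
rng = np.random.default_rng(0)
beta = rng.normal(size=o)
W1 = rng.normal(size=(o, d))        # first-degree weights
W2 = rng.normal(size=(o, d, d))     # second-degree weight tensor

z = rng.normal(size=d)
G = beta + W1 @ z + np.einsum('tij,i,j->t', W2, z, z)   # degree-2 expansion
print(G.shape)                      # (3,)
```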
39. Polynomial approximation - learnable parameters
- The learnable parameters of (6) scale as Θ(d^N).
- A solution to reduce them: demand each tensor W^{[n]} to be low-rank.
41. Outline
1 Introduction
2 Higher-degree polynomial expansions
Tensor decomposition per degree
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
42. Tensor decomposition per degree
First solution: demand each tensor W^{[n]} to be low-rank.
Apply the CP decomposition to each tensor W^{[n]}. Then, the expansion for N = 3 is:

  y = β + C^T_{1,[1]} z + ( C^T_{1,[2]} z ) ∗ ( C^T_{2,[2]} z ) + ( C^T_{1,[3]} z ) ∗ ( C^T_{2,[3]} z ) ∗ ( C^T_{3,[3]} z )    (7)

G Chrysos*, M Georgopoulos*, J Deng, J Kossaifi, Y Panagakis, A Anandkumar. 'Augmenting Deep Classifiers with Polynomial Neural Networks.' European Conference on Computer Vision (ECCV), 2022.
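A numpy sketch of Eq. (7): one CP-factorized term per degree, combined with Hadamard products. Taking C_{i,[n]} ∈ R^{d×o} keeps the snippet minimal; the shapes and helper name are illustrative assumptions.

```python
import numpy as np

d, o = 8, 4
rng = np.random.default_rng(0)
C = {n: [rng.normal(size=(d, o)) for _ in range(n)] for n in (1, 2, 3)}
beta = np.zeros(o)

def hadamard_of_projections(mats, z):
    out = np.ones(mats[0].shape[1])
    for M in mats:
        out = out * (M.T @ z)       # element-wise (Hadamard) product of projections
    return out

z = rng.normal(size=d)
y = beta + sum(hadamard_of_projections(C[n], z) for n in (1, 2, 3))
print(y.shape)                      # (4,)
```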
43. Khatri-Rao to Hadamard product
Lemma (Chrysos'19)
For a set of N matrices {A_{[ν]} ∈ R^{I_ν×K}}_{ν=1}^{N} and {B_{[ν]} ∈ R^{I_ν×L}}_{ν=1}^{N}, the following equality holds:

  ( ⊙_{ν=1}^{N} A_{[ν]} )^T · ( ⊙_{ν=1}^{N} B_{[ν]} ) = ( A^T_{[1]} · B_{[1]} ) ∗ . . . ∗ ( A^T_{[N]} · B_{[N]} ),    (8)

where the symbol '∗' denotes the Hadamard product.

G Chrysos, S Moschoglou, Y Panagakis, and S Zafeiriou. 'PolyGAN: High-order polynomial generators.' arXiv preprint arXiv:1908.06571.
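A quick numerical check of the lemma for N = 2 (the khatri_rao helper is the same illustrative one as before):

```python
import numpy as np

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

rng = np.random.default_rng(1)
A1, A2 = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
B1, B2 = rng.normal(size=(3, 2)), rng.normal(size=(5, 2))

lhs = khatri_rao(A1, A2).T @ khatri_rao(B1, B2)
rhs = (A1.T @ B1) * (A2.T @ B2)
print(np.allclose(lhs, rhs))    # True
```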
44. Factorization of Univariate Polynomials Over Finite Fields
- Berlekamp's algorithm (1970): practical only over small finite fields.
- Cantor-Zassenhaus algorithm (1981): probabilistic algorithm.
- Shoup's algorithm (1990): deterministic algorithm.
E Berlekamp. 'Factoring Polynomials Over Large Finite Fields.' Mathematics of Computation, 1970.
D Cantor, H Zassenhaus. 'A New Algorithm for Factoring Polynomials Over Finite Fields.' Mathematics of Computation, 1981.
V Shoup. 'On the deterministic complexity of factoring polynomials over finite fields.' Information Processing Letters, 1990.
45. Decoupling Multivariate Polynomials
Factorizing multivariate polynomials as a linear combination of univariate polynomials has been studied using tensor decompositions, e.g., using first-order information and the CP decomposition.
One obtains a decomposition of the form:

  f_i(u_1, . . . , u_m) = \sum_{j=1}^{r} w_{ij} · g_j( \sum_{k=1}^{m} v_{kj} u_k ),  ∀ i = 1, . . . , n,

with the decoupled representation in matrix form:

  f(u) = W g(V^T u).

P Dreesen, M Ishteva, J Schoukens. 'Decoupling Multivariate Polynomials Using First-Order Information and Tensor Decompositions.' SIAM Journal on Matrix Analysis and Applications, 2015.
46. Outline
1 Introduction
2 Higher-degree polynomial expansions
Π−nets: Joint decompositions across degrees
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
47. Π-nets: Third-degree expansion schematic - Model CCP
Figure: Third-degree expansion.
G Chrysos, S Moschoglou, Y Panagakis, and S Zafeiriou. 'PolyGAN: High-order polynomial generators.' arXiv preprint arXiv:1908.06571.
G Chrysos, S Moschoglou, G Bouritsas, Y Panagakis, J Deng, and S Zafeiriou. 'Π-nets: Deep Polynomial Neural Networks.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
51. Π−nets - Model CCP
- We use a coupled CP decomposition, i.e., factor sharing across different levels.
- To demonstrate the method, we assume a third-degree expansion, i.e., N = 3 in (6).
- Then, the expansion is:

  G(z) = β + W^{[1]} z + W^{[2]} ×_2 z ×_3 z + W^{[3]} ×_2 z ×_3 z ×_4 z    (9)
52. Π−nets - Third-degree expansion - Model CCP
We use the following factorizations:
- Let W^{[1]} = C U^T_{[1]} be the parameters for the first level of approximation.
- Assume W^{[2]} = W^{[2]}_{1:2} + W^{[2]}_{1:3}. We use a coupled CP decomposition, which results in the following matrix form:

  W^{[2]}_{(1)} = C ( U_{[3]} ⊙ U_{[1]} )^T + C ( U_{[2]} ⊙ U_{[1]} )^T.

- Let the third-degree parameters be:

  W^{[3]}_{(1)} = C ( U_{[3]} ⊙ U_{[2]} ⊙ U_{[1]} )^T.
53. Π−nets - Nth-degree expansion
The derivation can be extended to an arbitrary degree with the following recursive formulation:

  x_n = ( U^T_{[n]} z ) ∗ x_{n-1} + x_{n-1},    (CCP)

for n = 2, . . . , N with x_1 = U^T_{[1]} z and x = C x_N + β. The parameters C ∈ R^{o×k}, U_{[n]} ∈ R^{d×k} for n = 1, . . . , N are learnable.
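A minimal PyTorch sketch of the (CCP) recursion may make the structure concrete; the class name, layer sizes, and use of nn.Linear are illustrative assumptions, not the official Π-nets implementation (linked later in this deck).

```python
import torch
import torch.nn as nn

class CCP(nn.Module):
    """Sketch of x_n = (U_[n]^T z) * x_{n-1} + x_{n-1}, then x = C x_N + β."""
    def __init__(self, d, k, o, degree=3):
        super().__init__()
        self.U = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.C = nn.Linear(k, o)           # final affine map (includes β as bias)

    def forward(self, z):
        x = self.U[0](z)                   # x_1 = U_[1]^T z
        for U_n in self.U[1:]:
            x = U_n(z) * x + x             # Hadamard product plus skip connection
        return self.C(x)

net = CCP(d=8, k=16, o=4)
print(net(torch.randn(2, 8)).shape)        # torch.Size([2, 4])
```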
54. Π−nets - Alternative models
- Model CCP above assumes a certain factorization, e.g., W^{[2]} = W^{[2]}_{1:2} + W^{[2]}_{1:3}.
- New models can be derived by changing the assumptions.
- For instance, what if we assume that the tensors admit nested decompositions?
55. Π-nets: Model NCP
The model with nested decompositions, called NCP, for N = 3:
Figure: Third-degree expansion (with parameters A_{[n]}, S_{[n]}, B_{[n]}, b_{[n]}, C, and β).
56. Π-nets: Model NCP
The derivation can be extended to an arbitrary degree with the following recursive formulation:

  x_n = ( A^T_{[n]} z ) ∗ ( S^T_{[n]} x_{n-1} + B^T_{[n]} b_{[n]} ),    (NCP)

for n = 2, . . . , N with x_1 = ( A^T_{[1]} z ) ∗ ( B^T_{[1]} b_{[1]} ) and x = C x_N + β.
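A minimal PyTorch sketch of the (NCP) recursion; note that the product B^T_{[n]} b_{[n]} is folded into a single learnable vector here, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NCP(nn.Module):
    """Sketch of x_n = (A_[n]^T z) * (S_[n]^T x_{n-1} + b_[n])."""
    def __init__(self, d, k, o, degree=3):
        super().__init__()
        self.A = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.S = nn.ModuleList([nn.Linear(k, k, bias=False) for _ in range(degree - 1)])
        self.b = nn.ParameterList([nn.Parameter(torch.ones(k)) for _ in range(degree)])
        self.C = nn.Linear(k, o)

    def forward(self, z):
        x = self.A[0](z) * self.b[0]                 # x_1 = (A_[1]^T z) * (B_[1]^T b_[1])
        for A_n, S_n, b_n in zip(self.A[1:], self.S, self.b[1:]):
            x = A_n(z) * (S_n(x) + b_n)              # one NCP step
        return self.C(x)

print(NCP(8, 16, 4)(torch.randn(2, 8)).shape)        # torch.Size([2, 4])
```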
57. Π-nets: Product of polynomials
- The previous formulations, e.g. (CCP), require Θ(N) layers for an Nth-degree expansion.
- Can we achieve a higher-degree expansion with fewer parameters?
- Yes. For instance, by stacking lower-degree polynomials sequentially; see the sketch after the figure.
Figure: Stacking N polynomials of degree 2 results in a polynomial expansion of degree 2^N.
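A self-contained PyTorch sketch of the product-of-polynomials idea: composing three degree-2 blocks yields a degree-2^3 = 8 polynomial of the input. The block definition and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Quadratic(nn.Module):
    """A degree-2 polynomial block (one 2nd-degree CCP step); sizes illustrative."""
    def __init__(self, d, k, o):
        super().__init__()
        self.U1 = nn.Linear(d, k, bias=False)
        self.U2 = nn.Linear(d, k, bias=False)
        self.C = nn.Linear(k, o)

    def forward(self, z):
        x = self.U1(z)
        return self.C(self.U2(z) * x + x)   # degree-2 expansion of z

# Stacking three degree-2 blocks gives an overall degree of 2^3 = 8.
net = nn.Sequential(Quadratic(8, 16, 8), Quadratic(8, 16, 8), Quadratic(8, 16, 4))
print(net(torch.randn(2, 8)).shape)         # torch.Size([2, 4])
```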
58. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
60. SORT model
The model obtains the following formulation:

  x = U^T_{[1]} z + U^T_{[2]} z + ( U^T_{[1]} z ) ∗ ( U^T_{[2]} z ).    (10)

Y Wang, L Xie, C Liu, Y Zhang, W Zhang, A Yuille. 'SORT: Second-Order Response Transform for Visual Recognition.' International Conference on Computer Vision (ICCV), 2017.
61. Squeeze-and-Excitation network
Squeeze-and-Excitation network (SENet): the output of the SENet block Y^{SE} with respect to input X ∈ R^{hw×C} (h is the height, w the width) can be formulated as:

  Y^{SE} = ( X W_1 ) ∗ r( p( X W_1 ) W_2 ) = ( X W_1 ) ∗ ( 1⃗ [ (1/hw) 1⃗^T X W_1 ] W_2 ),    (11)

where W_1, W_2 are learnable parameters.

J Hu, L Shen, G Sun. 'Squeeze-and-Excitation Networks.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
62. Non-local (NL) neural network
Non-local (NL) neural network: the output of the non-local block Y^{NL} ∈ R^{N×C} with respect to input X ∈ R^{N×C} can be formulated as:

  Y^{NL} = ( X W_1 W_2^T X^T )( X W_3 ),    (12)

where W_1, W_2, W_3 ∈ R^{C×C} are learnable parameters.
Scales quadratically with the dimension N (i.e., O(N^2) complexity).

X Wang, R Girshick, A Gupta, K He. 'Non-local Neural Networks.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
63. Poly-NL
Poly-NL: the output Y^{Poly-NL} ∈ R^{N×C} is expressed with a third-degree polynomial net acting as the non-local self-attention block:

  Y^{Poly-NL} = ( Φ( X W_1 ∗ X W_2 ) ∗ X ) W_3,    (13)

with learnable parameters W_1, W_2, W_3 ∈ R^{C×C}.
Scales linearly with the dimension N (i.e., O(N) complexity).

F Babiloni, et al. 'Poly-NL: Linear Complexity Non-local Layers with Polynomials.' In International Conference on Computer Vision (ICCV), 2021.
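A small PyTorch sketch of Eq. (13), assuming Φ is an average pooling over the N positions (a global descriptor); names and shapes are illustrative assumptions.

```python
import torch

def poly_nl(X, W1, W2, W3):
    """Sketch of the Poly-NL block. X: (N, C); W1, W2, W3: (C, C)."""
    phi = ((X @ W1) * (X @ W2)).mean(dim=0, keepdim=True)  # Φ(·): (1, C), O(N) cost
    return (phi * X) @ W3                                  # broadcast Hadamard, (N, C)

X = torch.randn(64, 32)
W = [torch.randn(32, 32) for _ in range(3)]
print(poly_nl(X, *W).shape)   # torch.Size([64, 32])
```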
64. Linear Complexity Self-Attention with Polynomials
Poly-NL reformulates self-attention using only global descriptors and element-wise multiplications, achieving linear complexity O(N).
65. Poly-NL: Space and Time Complexity
Figure: Poly-NL achieves up to a 10× run-time speed-up and a 5× lower complexity overhead w.r.t. NL.
66. Non-local with lower-degree interactions
PDC-NL: Y = ( X W_1 W_2^T X^T )( X W_3 ) + ( X W_4 ) ∗ ( X W_5 ) + X W_6.
Includes first- to third-degree terms, extending NL (which contains only the third-degree term).

G Chrysos*, M Georgopoulos*, J Deng, J Kossaifi, Y Panagakis, A Anandkumar. 'Augmenting Deep Classifiers with Polynomial Neural Networks.' European Conference on Computer Vision (ECCV), 2022.
67. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
68. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Unconditional generation with polynomial networks
5 Future directions
69. Expressivity - Generation without activation functions
Results from a generator with convolutional layers without activations:
70. Expressivity of Π−nets
We consider image generation without activation functions between the layers. Synthesized images:
71. Expressivity of Π−nets
Linear interpolation in the latent space:
72. Image generation from a polynomial generator
73. Π−nets on non-Euclidean representation learning
Beyond image generation, polynomial nets perform well in non-Euclidean representation learning.
Code: https://github.com/grigorisg9gr/polynomial_nets
G Chrysos, S Moschoglou, G Bouritsas, J Deng, Y Panagakis, and S Zafeiriou. 'Deep Polynomial Neural Networks.' IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2021.
74. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Synthesizing unseen combinations
5 Future directions
75. Conditional data generation: Visual examples
Figure: Image-to-image translation examples.
Phillip Isola, et al. 'Image-to-Image Translation with Conditional Adversarial Networks.' Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Mehdi Mirza and Simon Osindero. 'Conditional Generative Adversarial Nets.' CoRR, 2014.
79. MLC-VAE - Our framework
We instead model each attribute combination with a different mean. The mean is obtained as:

  M(y_1, y_2) = W^{[1]} y_1 + W^{[2]} y_2 + W^{[12]} ×_2 y_1 ×_3 y_2,    (14)

for attributes y_1, y_2.

M Georgopoulos, G Chrysos, M Pantic, and Y Panagakis. 'Multilinear Latent Conditioning for Generating Unseen Attribute Combinations.' In International Conference on Machine Learning (ICML), 2020.
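A numpy sketch of the multilinear mean of Eq. (14); latent and attribute dimensions, and the one-hot encoding of attributes, are illustrative assumptions.

```python
import numpy as np

d, a1, a2 = 16, 5, 3
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(d, a1)), rng.normal(size=(d, a2))
W12 = rng.normal(size=(d, a1, a2))       # bilinear interaction tensor

def latent_mean(y1, y2):
    # W1 y1 + W2 y2 + W12 ×_2 y1 ×_3 y2
    return W1 @ y1 + W2 @ y2 + np.einsum('dij,i,j->d', W12, y1, y2)

mu = latent_mean(np.eye(a1)[0], np.eye(a2)[1])   # one-hot attribute combination
print(mu.shape)                                  # (16,)
```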
81. MLC-VAE - Multiplicative interactions
- Can we use additive interactions instead?
- Not really. Consider, for instance, synthesizing images with the attribute combination ('smile', 'closed mouth'): an additive model cannot capture how the two attributes jointly constrain the output.
82. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Conditional image generation with polynomial networks
5 Future directions
83. Diverse samples in conditional generation
Figure: In addition to the adversarial loss of GANs, regularization losses are typically used to enable diverse synthesis.
Q Mao, H Lee, H Tseng, S Ma, M Yang. 'Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis.' In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
84. Conditional image generation - Introduction
1 Conditioning the generator still relies on the neural network for the expressivity.
2 Can we use high-degree polynomial expansions instead?
3 Assume zI, zII ∈ R^d are the input vectors. The goal is to learn a function G: R^d × R^d → R^o that captures the higher-order correlations between the elements of the two inputs.
85. CoPE: Nth-degree expansion - Model CCP
The recursive formulation of CoPE is given by:

  x_n = x_{n-1} + ( U^T_{[n,I]} zI + U^T_{[n,II]} zII ) ∗ x_{n-1},    (15)

for n = 2, . . . , N with x_1 = U^T_{[1,I]} zI + U^T_{[1,II]} zII and x = C x_N + β.
Figure: Nth-degree expansion for conditional generation.

G Chrysos, M Georgopoulos, and Y Panagakis. 'Conditional Generation Using Polynomial Expansions.' In Advances in Neural Information Processing Systems (NeurIPS), 2021.
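A minimal PyTorch sketch of the CoPE recursion of Eq. (15); the class name and all sizes are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class CoPE(nn.Module):
    """Sketch of x_n = x_{n-1} + (U_[n,I]^T zI + U_[n,II]^T zII) * x_{n-1}."""
    def __init__(self, d, k, o, degree=3):
        super().__init__()
        self.UI = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.UII = nn.ModuleList([nn.Linear(d, k, bias=False) for _ in range(degree)])
        self.C = nn.Linear(k, o)

    def forward(self, zI, zII):
        x = self.UI[0](zI) + self.UII[0](zII)        # x_1
        for UI_n, UII_n in zip(self.UI[1:], self.UII[1:]):
            x = x + (UI_n(zI) + UII_n(zII)) * x      # one CoPE step
        return self.C(x)

print(CoPE(8, 16, 4)(torch.randn(2, 8), torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```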
87. Synthesized images with CoPE
(a) edges-to-handbags, (b) edges-to-shoes.
Figure: The first row depicts the conditional input (i.e., the edges). Rows 2-6 depict outputs as we vary zI (i.e., the noise).
88. Beyond two-variable expansion with CoPE
The recursive formulation can be extended beyond two-variable expansions. For three variables the formulation is:

  x_n = x_{n-1} + ( U^T_{[n,I]} zI + U^T_{[n,II]} zII + U^T_{[n,III]} zIII ) ∗ x_{n-1},    (16)

for n = 2, . . . , N with x_1 = U^T_{[1,I]} zI + U^T_{[1,II]} zII + U^T_{[1,III]} zIII and x = C x_N + β.
Code: https://github.com/grigorisg9gr/polynomial_nets_for_conditional_generation
89. Beyond two-variable expansion with CoPE
Synthesized images on conditional generation with 2 attributes:
Figure: (a) Each row/column depicts a different hair/eye color, respectively; (b) synthesized images per unique combination, varying the noise zI.
90. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
Audio synthesis
5 Future directions
91. Audio representation
Time domain vs. frequency domain.
Figure source: https://www.nti-audio.com/en/support/know-how/fast-fourier-transform-fft
92. How to model the complex-valued frequency representations?
Real-valued neural networks (RVNNs) with one output channel for the magnitude of the complex-valued representations:
- Discard the phase information.
- Require phase reconstruction in a generative task.
RVNNs with two output channels for the complex-valued representations:
- Higher degree of freedom at the synaptic weighting.
- Lower generalization ability.
How about directly modelling the complex-valued representations?

A Hirose, S Yoshida. 'Generalization Characteristics of Complex-Valued Feedforward Neural Networks in Relation to Signal Coherence.' IEEE Transactions on Neural Networks and Learning Systems, 2012.
93. Mergelyan’s Theorem
Suppose K is a compact set in the plane whose complement is connected, and f is a continuous complex-valued function on K that is holomorphic in the interior of K. Then, for every ε > 0, there exists a polynomial P such that |f(x) − P(x)| < ε for all x ∈ K.

W Rudin. 'Real and Complex Analysis.' McGraw-Hill International Series, 1987.
94. Schematic of the generator
Figure: The APOLLO generator (Model BN) maps complex-valued random noise to an audio representation in the frequency domain, building the expansion degree by degree.

Yongtao Wu, G Chrysos, Volkan Cevher. 'Adversarial Audio Synthesis with Complex-valued Polynomial Networks.' 2022.
95. Model in the complex field
CFBN (nested CP decomposition with bias):
The recursive form for the Nth-degree expansion is:

  ỹ_n = ( Ẽ^T_{[n]} x̃ + ρ̃_{[n]} ) ∗ ( F̃^T_{[n]} ỹ_{n-1} + b̃_{[n]} ) + ỹ_{n-1},    (17)

for n = 2, . . . , N with ỹ_1 = ( Ẽ^T_{[1]} x̃ ) ∗ b̃_{[1]} and ỹ = H̃ ỹ_N + h̃, where we denote b̃_{[n]} = B̃^T_{[n]} β̃_{[n]} for n = 1, . . . , N.
97. Human evaluation
- Human evaluation of unsupervised audio generation on the SC09 dataset.
- From left to right in the histogram, the Mean Opinion Score (MOS) for all models and the real data is 1.61, 2.68, 2.73, 3.33, and 4.73, respectively.
Figure: MOS ratings for WaveGAN, TiFGAN, Π-nets, APOLLO, and the real data.
100. Outline
1 Introduction
2 Higher-degree polynomial expansions
3 Object recognition with polynomial networks
4 Data generation with polynomial networks
5 Future directions
104. Complementary work on polynomial networks I
1 Polynomial networks can enlarge the hypothesis space [Jayakumar’20,
Fan’21].
2 Privacy-preserving applications require polynomial expansions
[Zhang’19].
3 Sample complexity (and similar theoretical bounds) might be simpler
to compute [Zhu’22].
4 Known (theoretical) results from neural networks might not be
directly applicable (e.g., implicit bias).
S Jayakumar, et al. ‘Multiplicative Interactions and Where to Find Them.’ In International Conference on Learning Representations (ICLR), 2020.
FL Fan, et al. ‘Expressivity and Trainability of Quadratic Networks.’ ArXiv preprint arXiv:2110.06081.
S Zhang, Y Gong, D Yu. ‘Encrypted Speech Recognition using Deep Polynomial Networks.’ In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
Z Zhu, et al. ‘Controlling the Complexity and Lipschitz Constant improves Polynomial Nets.’ In International Conference on Learning Representations (ICLR), 2022.
105. Theoretical characterization of polynomial networks
Figure: Double descent curve on polynomial regression (test loss vs. polynomial degree).
Source: https://windowsontheory.org/2019/12/05/deep-double-descent/
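For readers who want to reproduce such a curve, below is a minimal sketch using minimum-norm polynomial regression; the data, features, and degrees are arbitrary choices, and the exact shape of the curve depends on them (the test loss typically peaks near the interpolation threshold, degree ≈ number of samples, and can descend again beyond it).

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 20
x_tr = rng.uniform(-1.0, 1.0, n_train)
y_tr = np.sin(np.pi * x_tr) + 0.1 * rng.standard_normal(n_train)
x_te = np.linspace(-1.0, 1.0, 500)
y_te = np.sin(np.pi * x_te)

for deg in (5, 10, 19, 50, 200, 1000):
    Phi_tr = np.polynomial.legendre.legvander(x_tr, deg)  # Legendre features
    Phi_te = np.polynomial.legendre.legvander(x_te, deg)
    w = np.linalg.pinv(Phi_tr) @ y_tr  # minimum-norm least-squares solution
    print(f"degree {deg:4d}: test loss {np.mean((Phi_te @ w - y_te) ** 2):.3g}")
```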
108. Optimization and training
1 Multiplications can make the loss surface less well-behaved [Schwarz et al.]. How should we adapt optimizers for polynomial architectures?
2 What is the interaction between model degree and implicit regularization in polynomial networks?
3 How should we initialize polynomial networks?
J Schwarz, S Jayakumar, R Pascanu, P Latham, Y W Teh. ’Powerpropagation: A sparsity inducing weight reparameterisation.’ In Advances in Neural Information Processing Systems (NeurIPS), 2021.
114. Architecture
1 Can we use other popular tensor factorizations, e.g., Tucker decomposition, to obtain useful architectures?
2 How can we evaluate the differences of those architectures?
3 How can we determine the degree required by the task at hand?
1 Is higher degree always better?
2 Where should we have this higher degree?
3 Is there a total degree that is sufficient for all standard tasks?
120. Architecture II
4 How can we express a joint tensor decomposition over all sequential
polynomial networks?
5 Can we represent all signals of interest with a sequence of polynomial
expansions?
6 How should we reason about activations often used in conjunction
with a polynomial form?
1 Are activations required?
2 Are they mostly there to make learning possible?
3 How do they modify the polynomial expansion?
123. Robustness of polynomial networks
1 A polynomial expansion with unconstrained input can produce extremely large output values (see the sketch below).
2 How can we constrain the output range efficiently?
3 How can we make polynomial nets robust to (adversarial) noise?
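A toy sketch of point 1, with arbitrary sizes and Gaussian weights: each Hadamard step multiplies the state by another linear image of the input, so for inputs that are not small, the norm of the state grows geometrically with the degree.

```python
import numpy as np

rng = np.random.default_rng(0)
k, N = 16, 10
W = [rng.standard_normal((k, k)) for _ in range(N)]

for scale in (0.01, 0.1, 1.0):
    z = scale * rng.standard_normal(k)  # input with controlled magnitude
    x = W[0] @ z
    for n in range(1, N):
        x = x + (W[n] @ z) * x  # one more multiplicative interaction per degree
    print(f"input scale {scale}: ||x_N|| = {np.linalg.norm(x):.2e}")
```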
125. Thank you for your attention
1 We would like to thank Francesca Babiloni, Leello Dadi, Zhenyu Zhu
and Yongtao Wu for their help in preparing the tutorial.
2 Further information and materials can be found on
https://polynomial-nets.github.io/.
3 Contact us: grigorios.chrysos [at] epfl.ch.