Disentangled representation is the holy grail of representation learning: it factorizes data into human-understandable factors in an unsupervised way, moving us toward interpretable machine learning.
2. Overview
1 Background Knowledge
  - Information Quantities
  - Rate-Distortion Theory and Information Bottleneck
  - Variational Inference
2 Build up Frameworks for Disentangled Representations
3 Isolating Sources of Disentanglement
  - ELBO Surgery
  - Evaluate Disentanglement
  - Experiments
4 Conclusion
3. Before we start ...
What is disentanglement?
- Disentangled Representation = Factorized + Interpretable
- Reuse and generalize knowledge
- Extrapolate beyond the training data distribution
Questions to be answered in this series of discussions:
- (Part I) Why is VAE the main framework used to realize disentanglement? [Chen, 2018]
- (Part II) Why is there a trade-off between reconstruction and disentanglement? [Alemi, 2018]
- (Part II) Is disentanglement a task or a principle? [Achille, 2018]
5. Quick Review
Consider a beta decay process in which we observe N electrons:
↑↑↓↓↑ ... ↑
The number of possible states of N spins is
$$\frac{N!}{(pN)!\,((1-p)N)!} \sim \frac{N^N}{(pN)^{pN}\,((1-p)N)^{(1-p)N}} = \frac{1}{p^{pN}(1-p)^{(1-p)N}} = 2^{NS}$$
where S is the Shannon entropy per spin:
$$S = -p \log p - (1-p) \log(1-p)$$
The number of bits of information one gains by actually observing such a state is NS.
6. Quick Review
In the general case:
$$\frac{N!}{\prod_{i=1}^{k}(p_i N)!} \sim \frac{N^N}{\prod_{i=1}^{k}(p_i N)^{p_i N}} = 2^{NS}$$
such that
$$S = -\sum_i p_i \log p_i$$
7. Quick Review
Suppose a theory predicts a probability distribution Q for the final state while the correct distribution is P. After observing N decays, we will see outcome i approximately $p_i N$ times, which occurs with probability
$$P = \prod_i q_i^{p_i N} \cdot \frac{N!}{\prod_j (p_j N)!}$$
We already calculated $\frac{N!}{\prod_j (p_j N)!} \sim 2^{-N \sum_i p_i \log p_i}$, so
$$P \sim 2^{-N \sum_i p_i (\log p_i - \log q_i)}$$
The quantity in the exponent is what we call the relative entropy, or Kullback-Leibler divergence:
$$D_{KL}(p\,\|\,q) = \sum_i p_i (\log p_i - \log q_i)$$
8. Quick Review
Quantities:
- Entropy: $S_x = -\sum_x p(x) \log p(x)$, the information we do not know.
- Relative entropy (KL divergence): $D_{KL}(p(x)\,\|\,q(x)) = \sum_x p(x)\,(\log p(x) - \log q(x))$
- Mutual information:
  $$I(x; y) = S_x - S_{x|y} = S_y - S_{y|x} = S_{x,y} - S_{x|y} - S_{y|x}$$
  $$I(x; y) = D_{KL}\big(p(x, y)\,\|\,p(x)\,p(y)\big)$$
  Symmetric between x and y. Extreme cases: independence ($I = 0$) and a deterministic relation.
- Relations:
  Chain rule: $p(x, y) = p(x|y)\,p(y)$
  Bayes' rule: $p(y|x) = p(x|y)\,p(y)\,/\,p(x)$
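As a quick numerical check of these definitions, here is a minimal NumPy sketch; the function names and the toy joint distributions are our own illustrations, not from the slides.

```python
import numpy as np

def entropy(p):
    """S = -sum_x p(x) log p(x), in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) (log p(x) - log q(x))."""
    mask = p > 0
    return np.sum(p[mask] * (np.log2(p[mask]) - np.log2(q[mask])))

def mutual_information(pxy):
    """I(x; y) = D_KL(p(x, y) || p(x) p(y)) for a joint table pxy."""
    px = pxy.sum(axis=1, keepdims=True)   # p(x)
    py = pxy.sum(axis=0, keepdims=True)   # p(y)
    return kl_divergence(pxy.ravel(), (px * py).ravel())

# Extreme cases: an independent joint gives I = 0; a deterministic
# relation gives I = S_x (here 1 bit).
independent = np.outer([0.5, 0.5], [0.3, 0.7])
deterministic = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(independent))    # ~0.0
print(mutual_information(deterministic))  # ~1.0
```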
11. Rate-Distortion Theory
What makes a good encoding? Low rate, low distortion.
$$\min_{p(\tilde{x}|x)} I(X; \tilde{X}) \quad \text{s.t.} \quad d(X, \tilde{X}) < D$$
Theorem (Rate Distortion; Shannon and Kolmogorov)
Define $R(D)$ as the minimum achievable rate under distortion constraint D:
$$R(D) = \min_{p(\tilde{x}|x)\,:\,d(x,\tilde{x}) < D} I(X; \tilde{X})$$
Then an encoding that achieves this rate is
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta d(x,\tilde{x})}$$
13. Information System
Sending a signal from Alice to Bob: $X \to \tilde{X}$, where Y is the relevant information about X.
14. Information Bottleneck Theory
What makes a good encoding? Low rate, high relevance.
$$\min_{p(\tilde{x}|x)} I(\tilde{X}; X) \quad \text{s.t.} \quad I(\tilde{X}; Y) > L$$
Theorem (Information Bottleneck; Tishby, Pereira, and Bialek)
Define $R(L)$ as the minimum achievable rate while preserving L bits of mutual information:
$$R(L) = \min_{p(\tilde{x}|x)\,:\,I(\tilde{x};\,y) \geq L} I(X; \tilde{X})$$
Then an encoding that achieves this rate is
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta D_{KL}[p(y|x)\,\|\,p(y|\tilde{x})]}$$
15. Comparison
Rate-distortion (a good encoding has low rate, low distortion):
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta d(x,\tilde{x})}$$
Information bottleneck (a good code has low rate, high relevance):
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta D_{KL}[p(y|x)\,\|\,p(y|\tilde{x})]}$$
16. Structure of the Solution
On the structure of the solution:
$$\mathcal{L}[p(\tilde{x}|x)] = I(X; \tilde{X}) - \beta\, I(\tilde{X}; Y)$$
The Lagrange multiplier β acts as the trade-off parameter between the complexity of the representation and the preserved relevant information:
- $I(\tilde{X}; Y)$ is the measure of performance
- $I(X; \tilde{X})$ is the regularization term
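As a concrete illustration, here is a small self-consistent-iteration sketch of the optimal encoding above for a discrete joint p(x, y), in the style of the classic iterative IB algorithm; the function name, shapes, and smoothing constant are our assumptions.

```python
import numpy as np

def ib_encoder(pxy, n_clusters, beta, n_iters=200, seed=0):
    """Self-consistent iteration for the IB encoder p(x~|x) given a
    discrete joint pxy[x, y]. Returns the soft assignment matrix."""
    rng = np.random.default_rng(seed)
    px = pxy.sum(axis=1)                       # p(x)
    py_x = pxy / px[:, None]                   # p(y|x)
    enc = rng.random((len(px), n_clusters))    # random init of p(x~|x)
    enc /= enc.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        pt = enc.T @ px                        # p(x~)
        # p(y|x~) = sum_x p(y|x) p(x~|x) p(x) / p(x~)
        py_t = (enc * px[:, None]).T @ py_x / pt[:, None]
        # D_KL[p(y|x) || p(y|x~)] for every pair (x, x~)
        kl = np.sum(py_x[:, None, :]
                    * (np.log(py_x[:, None, :] + 1e-12)
                       - np.log(py_t[None, :, :] + 1e-12)), axis=2)
        # p(x~|x) proportional to p(x~) exp(-beta KL), normalized by Z(x, beta)
        enc = pt[None, :] * np.exp(-beta * kl)
        enc /= enc.sum(axis=1, keepdims=True)
    return enc
```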
18. Inference under Posterior
X: observation, input data
Z: latent variable, representation, embedding
$$\underbrace{p(z|x)}_{\text{posterior}} = \frac{\overbrace{p(x|z)}^{\text{likelihood}}\;\overbrace{p(z)}^{\text{prior}}}{\underbrace{p(x)}_{\text{evidence}}}$$
Because the evidence p(x) is intractable, there are two parallel ways to solve the inference problem:
- MCMC
- Variational Inference
19. Variational Inference
Propose a simpler, tractable distribution q(z|x) to approximate the posterior p(z|x):
$$D_{KL}(q(z|x)\,\|\,p(z|x)) = \mathbb{E}_{q(z|x)} \log\frac{q(z|x)\,p(x)}{p(x, z)} = \mathbb{E}_{q(z|x)} \log\frac{q(z|x)}{p(x, z)} + \log p(x)$$
Rearranging the two sides, we get
$$\log p(x) = D_{KL}(q(z|x)\,\|\,p(z|x)) + \mathbb{E}_{q(z|x)} \log\frac{p(x, z)}{q(z|x)}$$
20. Reduce KL Divergence to ELBO
Expanding $p(x, z) = p(x|z)\,p(z)$:
$$\log p(x) = D_{KL}(q(z|x)\,\|\,p(z|x)) + \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}[q(z|x)\,\|\,p(z)]$$
By the positivity of the KL divergence, the remaining terms form the Evidence Lower Bound (ELBO), per sample $x_n$:
$$\log p(x_n) \geq \mathbb{E}_{q(z|x_n)} \log\frac{p(x_n, z)}{q(z|x_n)} = \mathcal{L}_{\text{ELBO}}$$
$$\mathcal{L}_{\text{ELBO}} = \underbrace{\mathbb{E}_{q(z|x_n)}[\log p(x_n|z)]}_{\text{reconstruction}} - \underbrace{D_{KL}\big[q(z|x_n)\,\|\,p(z)\big]}_{\text{regularization}}$$
21. Implement ELBO using VAE
Figure: ELBO Structure in VAE
22. Implement ELBO using VAE
Figure: Reparameterization Trick for Back-propagation
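To make the two figures concrete, here is a minimal PyTorch sketch of a Gaussian-encoder VAE trained on the negative per-sample ELBO, with the reparameterization trick making the sampling step differentiable; the layer sizes and Bernoulli decoder are illustrative assumptions, not details from the slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=10):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)               # eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * eps   # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(x, x_logits, mu, logvar):
    # reconstruction: E_q[log p(x|z)] with one MC sample, Bernoulli decoder
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    # regularization: closed-form D_KL[q(z|x) || N(0, I)], diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```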
23. Build up Framework for Disentanglement
24. Build up the β-VAE Framework
Suppose the data generation process is affected by two types of factors:
$$p(x|z) \approx p(x|v, w)$$
where v are conditionally independent factors and w are conditionally dependent factors.
We maximize the likelihood of the observed data over the whole latent distribution, while the aim of disentanglement is to ensure that the inferred latents capture the generative factors v in an independent manner:
$$\max_\theta\; \mathbb{E}_{p_\theta(z)}[p_\theta(x|z)] \quad \text{s.t.} \quad D_{KL}(q(z|x)\,\|\,p(z)) < \epsilon$$
25. β-VAE
The objective function of β-VAE [Higgins, 2017] becomes
$$\mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - \beta\, D_{KL}[q(z|x)\,\|\,p(z)]$$
Understanding the effect of β:
- Reconstruction quality is a poor indicator of learnt disentanglement
- Good disentanglement often leads to blurry reconstructions
- A disentangled representation sacrifices capacity of the latent channel
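In code, the β-VAE objective is just the negative ELBO from the VAE sketch above with the KL term scaled by β; the value β = 4 is only an illustrative choice.

```python
def neg_beta_elbo(x, x_logits, mu, logvar, beta=4.0):
    # same two terms as neg_elbo above, with the KL penalty scaled by beta
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl  # beta > 1 trades reconstruction for disentanglement
```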
28. ELBO Surgery
Conjecture: two criteria may be important
- MI between the data variable and the latent variable
- Independence of the latent variables
29. ELBO Surgery
To further understand the ELBO, we rewrite it in terms of the average encoding distribution [Hoffman, 2017]. Identify each training example with a unique index $n \in \{1, 2, \ldots, N\}$ and define
$$q(z|n) = q(z|x_n), \qquad q(z, n) = p(n)\,q(z|n) = \tfrac{1}{N}\, q(z|n), \qquad p(n) = \tfrac{1}{N}$$
The marginal (aggregate) distribution is then
$$q(z) = \mathbb{E}_{p(n)}[q(z|n)] = \sum_n q(z|n)\,p(n)$$
33. ELBO TC-Decomposition
$$\text{Modified ELBO} = \underbrace{\mathbb{E}_{q(z,n)}[\log p(n|z)]}_{\text{Reconstruction}} \;-\; \alpha \underbrace{I_q(z; n)}_{\text{Index-Code MI}} \;-\; \beta \underbrace{D_{KL}\big[q(z)\,\|\,\textstyle\prod_j q(z_j)\big]}_{\text{Total Correlation}} \;-\; \gamma \underbrace{\textstyle\sum_j D_{KL}\big[q(z_j)\,\|\,p(z_j)\big]}_{\text{Dim-wise KL}}$$
Index-Code MI: $D_{KL}[q(z, n)\,\|\,q(z)\,p(n)] = I_q(z; n)$
- Some work drops this penalty to improve disentanglement
- Other work keeps this term to improve disentanglement, in line with IB
- The effect is dataset dependent
Total Correlation: $D_{KL}[q(z)\,\|\,\prod_j q(z_j)]$
- A heavier penalty on this term induces disentanglement
- TC forces the model to find statistically independent factors
Dim-wise KL: $\sum_j D_{KL}[q(z_j)\,\|\,p(z_j)]$
- Prevents the latent dimensions from deviating from their corresponding priors
34. Minibatch Sampling: Stochastic Estimation of log q(z)
Evaluating the density q(z) exactly requires a pass over the whole dataset, and a randomly chosen n will usually give q(z|n) close to zero. Inspired by importance sampling, for a given batch of samples $\{n_1, n_2, \ldots, n_M\}$ we can use an estimator that re-utilizes the batch:
$$\mathbb{E}_{q(z)}[\log q(z)] = \mathbb{E}_{q(z)}\big[\log \mathbb{E}_{n' \sim p(n')}[q(z|n')]\big] \approx \frac{1}{M} \sum_{i=1}^{M} \log\Big[\frac{1}{MN} \sum_{j=1}^{M} q\big(z(n_i)\,|\,n_j\big)\Big]$$
where $z(n_i)$ is a sample from $q(z|n_i)$. We treat q(z) as a mixture distribution in which the data index n indicates the mixture component.
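Below is a sketch of this minibatch estimator for a diagonal-Gaussian encoder; mu, logvar, and z are assumed to be (M, D) tensors from one batch, and the helper names are ours.

```python
import math
import torch

def log_gaussian(z, mu, logvar):
    """Diagonal-Gaussian log-density, summed over the last dimension."""
    log_prob = -0.5 * (math.log(2 * math.pi) + logvar
                       + (z - mu).pow(2) / logvar.exp())
    return log_prob.sum(-1)

def estimate_log_qz(z, mu, logvar, dataset_size):
    """Minibatch weighted estimate of log q(z) for each sample in the batch."""
    M = z.size(0)
    # log q(z(n_i) | n_j): row i indexes samples, column j mixture components
    log_qz_ij = log_gaussian(z.unsqueeze(1), mu.unsqueeze(0),
                             logvar.unsqueeze(0))
    # log q(z) ~ log [ (1 / MN) sum_j q(z(n_i) | n_j) ]
    return torch.logsumexp(log_qz_ij, dim=1) - math.log(M * dataset_size)
```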
35. Special Case: β-TCVAE
With minibatch sampling, it becomes feasible to assign different weights (α, β, γ) to the three terms:
$$\mathcal{L}_{\beta\text{-TC}} = \mathbb{E}_{q(z|n)p(n)}[\log p(n|z)] - \alpha\, I_q(z; n) - \beta\, D_{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big) - \gamma \sum_j D_{KL}\big(q(z_j)\,\|\,p(z_j)\big)$$
The proposed β-TCVAE uses α = γ = 1 and treats β as the hyper-parameter.
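Assembling the full decomposed loss from the pieces above is then a matter of bookkeeping; this sketch reuses log_gaussian from the previous block, and the default β = 6 is only an illustrative setting.

```python
import math
import torch
import torch.nn.functional as F

def btcvae_loss(x, x_logits, z, mu, logvar, dataset_size,
                alpha=1.0, beta=6.0, gamma=1.0):
    M = z.size(0)
    recon = F.binary_cross_entropy_with_logits(x_logits, x,
                                               reduction='sum') / M
    # per-sample log q(z|n) and log p(z) (standard-normal prior)
    log_q_zx = log_gaussian(z, mu, logvar)
    log_pz = log_gaussian(z, torch.zeros_like(z), torch.zeros_like(z))
    # (M, M, D): per-dimension log q(z_i | n_j) across the batch
    log_q_ij = -0.5 * (math.log(2 * math.pi) + logvar.unsqueeze(0)
                       + (z.unsqueeze(1) - mu.unsqueeze(0)).pow(2)
                       / logvar.exp().unsqueeze(0))
    log_qz = (torch.logsumexp(log_q_ij.sum(2), dim=1)
              - math.log(M * dataset_size))
    log_prod_qzj = (torch.logsumexp(log_q_ij, dim=1)
                    - math.log(M * dataset_size)).sum(1)
    mi = (log_q_zx - log_qz).mean()        # index-code MI
    tc = (log_qz - log_prod_qzj).mean()    # total correlation
    dwkl = (log_prod_qzj - log_pz).mean()  # dimension-wise KL
    return recon + alpha * mi + beta * tc + gamma * dwkl
```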
38. Evaluate Disentanglement: Mutual Information Gap
Suppose we have ground-truth factors $\{v_k\}_{k=1}^{K}$. Define the joint distribution
$$q(z_j, v_k) = \sum_{n=1}^{N} p(v_k)\,p(n|v_k)\,q(z_j|n)$$
so that the empirical mutual information is
$$I_n(z_j; v_k) = \mathbb{E}_{q(z_j, v_k)}\Big[\log \sum_n q(z_j|n)\,p(n|v_k)\Big] + S(z_j)$$
39. Feature Factor and Latent Variable
Figure: Correlation between factors and latent space
40. Evaluate Disentanglement: Mutual Information Gap
$$\text{MIG} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{S(v_k)} \Big( I(z_{j^{(k)}}; v_k) - \max_{j \neq j^{(k)}} I(z_j; v_k) \Big), \qquad j^{(k)} = \arg\max_j I(z_j; v_k)$$
where $0 \leq I(z_j; v_k) = S(v_k) - S(v_k|z_j) \leq S(v_k)$ naturally serves as the normalization condition. Benefits of this metric:
- Axis-alignment
- Compactness of representation
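Given a precomputed matrix of mutual informations, the gap itself is a few lines; the sketch below assumes mi[j, k] = I(z_j; v_k) and h[k] = S(v_k) were estimated elsewhere.

```python
import numpy as np

def mutual_information_gap(mi, h):
    """MIG from a matrix mi[j, k] = I(z_j; v_k) and entropies h[k] = S(v_k)."""
    sorted_mi = np.sort(mi, axis=0)[::-1]     # descending over latents j
    gaps = (sorted_mi[0] - sorted_mi[1]) / h  # normalized top-two gap per factor
    return gaps.mean()                        # average over the K factors
```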
52. Conclusion
In this paper:
- The regularization term in the ELBO contains several factors that naturally encourage disentangling
- Total correlation (independence of the latent variables) is the major factor forcing the model to learn statistically independent representations
- A new information-theoretic metric (MIG) quantifies disentanglement
53. References
Chen, T. Q., et al. Isolating Sources of Disentanglement in Variational Autoencoders. NeurIPS 2018.
Tishby, Naftali, et al. The Information Bottleneck Method. arXiv preprint physics/0004057 (2000).
Hoffman, Matthew D., and Matthew J. Johnson. ELBO Surgery: Yet Another Way to Carve Up the Variational Evidence Lower Bound. NIPS Workshop, 2016.
Higgins, Irina, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017.
Burgess, Christopher P., et al. Understanding Disentangling in beta-VAE. arXiv preprint arXiv:1804.03599 (2018).
Achille, Alessandro, et al. Emergence of Invariance and Disentanglement in Deep Representations. JMLR 2018.
Alemi, Alexander, et al. Fixing a Broken ELBO. ICML 2018.