Speaker: Taesung Park (Ph.D. student, UC Berkeley)
Date: June 2017
Taesung Park is a Ph.D. student at UC Berkeley in AI and computer vision, advised by Prof. Alexei Efros.
His research lies at the intersection of computer vision and computational photography, on problems such as generating realistic images and enhancing photo quality. He received a B.S. in mathematics and an M.S. in computer science from Stanford University.
Abstract:
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
However, for many tasks, paired training data will not be available.
We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples.
Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).
Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc.
Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
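The cycle-consistency objective described above has a simple numerical core: translate, translate back, and penalize the reconstruction error. The sketch below illustrates it with toy stand-in generators (real CycleGAN generators are convolutional networks; the functions `G` and `F` here are assumptions for illustration only):

```python
import numpy as np

def cycle_consistency_loss(G, F, x):
    """L1 penalty pushing F(G(x)) back toward x (one direction of the cycle)."""
    return np.mean(np.abs(F(G(x)) - x))

def full_cycle_loss(G, F, x, y):
    """Both directions of the cycle: F(G(x)) ~ x and G(F(y)) ~ y."""
    return cycle_consistency_loss(G, F, x) + cycle_consistency_loss(F, G, y)

# Toy "generators": G doubles pixel values, F halves them, so they are exact inverses.
G = lambda img: img * 2.0
F = lambda img: img / 2.0

x = np.random.rand(4, 3, 8, 8)   # batch of source-domain images
y = np.random.rand(4, 3, 8, 8)   # batch of target-domain images

print(full_cycle_loss(G, F, x, y))   # 0.0: the toy cycle reconstructs perfectly
```

In the full model this term is added to the adversarial losses of both mappings; here only the reconstruction part is shown.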
This presentation discusses multimodal deep learning and unsupervised feature learning from audio and video speech data. It introduces the McGurk effect, in which conflicting audio and visual speech cues are perceptually integrated. A bimodal deep autoencoder learns shared representations from audio and video input that outperform single-modality features on lip-reading tasks: the cross-modality features achieved a classification accuracy of 64.4% on the AVLetters dataset and 68.7% on the CUAVE dataset.
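The shared-representation idea can be sketched as a forward pass through a minimal bimodal encoder; the layer sizes and weights below are arbitrary assumptions, not the architecture from the talk:

```python
import numpy as np

def shared_representation(audio, video, W_a, W_v):
    """Encode each modality, then fuse into one shared code (a minimal stand-in
    for a bimodal autoencoder's shared hidden layer, fed to both decoders)."""
    h_audio = np.tanh(audio @ W_a)
    h_video = np.tanh(video @ W_v)
    return np.concatenate([h_audio, h_video], axis=1)

rng = np.random.default_rng(0)
audio = rng.standard_normal((2, 10))   # 2 samples, 10 audio features
video = rng.standard_normal((2, 20))   # 2 samples, 20 video (lip) features
W_a = rng.standard_normal((10, 4))     # hypothetical encoder weights
W_v = rng.standard_normal((20, 4))
print(shared_representation(audio, video, W_a, W_v).shape)   # (2, 8)
```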
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ... (Hung-yi Lee)
The document provides an overview of generative adversarial networks (GANs) and their applications to signal processing and natural language processing. It begins with a general introduction to GANs, including how they work, common issues, and potential solutions. Conditional GANs and unsupervised conditional GANs are also discussed. The document then outlines applications of GANs to signal processing and natural language processing.
This document summarizes and compares two popular Python libraries for graph neural networks - Spektral and PyTorch Geometric. It begins by providing an overview of the basic functionality and architecture of each library. It then discusses how each library handles data loading and mini-batching of graph data. The document reviews several common message passing layer types implemented in both libraries. It provides an example comparison of using each library for a node classification task on the Cora dataset. Finally, it discusses a graph classification comparison in PyTorch Geometric using different message passing and pooling layers on the IMDB-binary dataset.
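The mini-batching strategy both libraries use treats a batch as one big disjoint-union graph: node features are stacked, edge indices are offset, and a batch vector records which graph each node came from. Below is a minimal NumPy sketch of that idea (the data layout mimics PyTorch Geometric's `Data`/`Batch` convention, but this is not the library's actual code):

```python
import numpy as np

def batch_graphs(graphs):
    """Merge graphs into one disjoint union.

    Each graph is (node_features [n_i, d], edge_index [2, e_i]).
    Returns stacked features, offset edge indices, and a batch vector
    mapping every node to its graph id.
    """
    feats, edges, batch = [], [], []
    offset = 0
    for gid, (x, ei) in enumerate(graphs):
        feats.append(x)
        edges.append(ei + offset)               # shift node ids past earlier graphs
        batch.append(np.full(x.shape[0], gid))  # graph id of every node
        offset += x.shape[0]
    return np.vstack(feats), np.hstack(edges), np.concatenate(batch)

g1 = (np.ones((3, 4)), np.array([[0, 1], [1, 2]]))  # 3 nodes, 2 edges
g2 = (np.ones((2, 4)), np.array([[0], [1]]))        # 2 nodes, 1 edge
x, edge_index, batch = batch_graphs([g1, g2])
print(x.shape, edge_index.tolist(), batch.tolist())
# (5, 4) [[0, 1, 3], [1, 2, 4]] [0, 0, 0, 1, 1]
```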
Recent Progress on Single-Image Super-Resolution (Hiroto Honda)
This document summarizes recent progress in single image super resolution (SISR) techniques using deep convolutional neural networks. It discusses early networks like SRCNN and VDSR, as well as more advanced models such as SRResNet, SRGAN, and EDSR that utilize residual blocks and perceptual loss functions. The document notes that while SISR accuracy has improved significantly in recent years, achieving both high PSNR and natural perceptual quality remains challenging due to a distortion-perception tradeoff. It concludes that the application determines whether more accurate or plausible output is preferred.
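PSNR, the distortion metric referred to above, is just a log-scaled mean squared error; a minimal sketch:

```python
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB, the standard distortion metric in SISR."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                     # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((16, 16), 128.0)
noisy = ref + 10.0                              # uniform error of 10 gray levels
print(psnr(ref, noisy))                         # 10*log10(255^2/100) ~ 28.13 dB
```

Perceptual quality, by contrast, has no such closed form, which is exactly why the distortion-perception tradeoff exists.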
Hyperparameter Optimization with the CMA-ES Sampler, at Optuna Meetup #1 (Masashi Shibata)
Graph neural networks are a type of neural network that operates on graph structured data. They work by passing messages between nodes in a graph and aggregating information from neighboring nodes. Common graph neural network models include graph convolutional networks (GCNs) which use convolutional filters on graphs, and graph attention networks (GATs) which use attention mechanisms. GraphSAGE is another model that learns node representations by sampling and aggregating features from a node's local neighborhood. Graph neural networks have applications in tasks like node classification, link prediction, and graph classification and can be used to model many real-world problems that can be represented as graphs.
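The message-passing step of a GCN layer described above can be written in a few lines; this is the symmetric-normalization formulation from Kipf and Welling, sketched in NumPy on a two-node toy graph:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = D^-1/2 (A + I) D^-1/2 H W."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric degree normalization
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W

A = np.array([[0., 1.], [1., 0.]])  # two connected nodes
H = np.array([[1., 0.], [0., 1.]])  # one-hot node features
W = np.eye(2)                       # identity weights, so the mixing is visible
print(gcn_layer(A, H, W))           # each node averages itself and its neighbor
```

A nonlinearity (e.g. ReLU) would normally follow; GAT and GraphSAGE replace the fixed normalization with learned attention weights or sampled-neighborhood aggregation.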
This document discusses Wasserstein GAN (WGAN) and how it improves on traditional GANs. WGAN replaces the Jensen-Shannon divergence of the original GAN objective with the Wasserstein distance as its loss, which stabilizes training and reduces mode collapse. Unlike the JS divergence, the Wasserstein distance varies smoothly even when the real and generated distributions have disjoint supports, so gradients remain informative throughout training. Because the distance is computationally intractable to evaluate exactly, WGAN estimates it with a critic network and uses weight clipping to keep the critic Lipschitz continuous. Overall, WGAN yields more meaningful learning curves, and its hyperparameters are easier to tune than those of traditional GANs.
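The two mechanisms named above, the critic's Wasserstein estimate and weight clipping, are small enough to sketch directly; the linear critic below is a toy stand-in, not a trained network:

```python
import numpy as np

def clip_weights(weights, c=0.01):
    """WGAN's crude Lipschitz constraint: clamp every critic weight to [-c, c]."""
    return [np.clip(w, -c, c) for w in weights]

def critic_loss(critic, real, fake):
    """Negative of the critic's Wasserstein estimate: it maximizes
    mean f(real) - mean f(fake), so we minimize the negation."""
    return -(np.mean(critic(real)) - np.mean(critic(fake)))

critic = lambda x: x.sum(axis=1)          # toy stand-in linear critic
real = np.ones((8, 2))
fake = np.zeros((8, 2))
print(critic_loss(critic, real, fake))    # -2.0: the critic separates the samples

w = clip_weights([np.array([0.5, -0.3])])
print(w[0])                               # [ 0.01 -0.01]
```

Clipping is applied after every critic update; later work (WGAN-GP) replaces it with a gradient penalty.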
This document provides an overview of VAE-type deep generative models, especially RNNs combined with VAEs. It begins with notations and abbreviations used. The agenda then covers the mathematical formulation of generative models, the Variational Autoencoder (VAE), variants of VAE that combine it with RNNs (VRAE, VRNN, DRAW), a Chainer implementation of Convolutional DRAW, other related models (Inverse DRAW, VAE+GAN), and concludes with challenges of VAE-like generative models.
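Two building blocks common to every VAE variant listed above are the reparameterization trick and the closed-form Gaussian KL term; a minimal sketch:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping the sampling
    path differentiable with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))

rng = np.random.default_rng(0)
mu = np.zeros(4)
log_var = np.zeros(4)                       # sigma = 1
z = reparameterize(mu, log_var, rng)
print(z.shape)                              # (4,)
print(kl_to_standard_normal(mu, log_var))   # 0.0: posterior already equals the prior
```

The recurrent variants (VRAE, VRNN, DRAW) reuse exactly these two pieces inside an RNN loop, one latent draw per timestep.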
Super Tickets in Pre-trained Language Models (HyunKyu Jeon)
This document discusses finding "super tickets" in pre-trained language models by pruning attention heads and feed-forward layers. It shows a phase-transition phenomenon: lightly pruning BERT models improves generalization without degrading accuracy, while heavier pruning hurts it. The authors also propose "ticket sharing", a pruning approach for multi-task fine-tuning in which pruned weights are shared across tasks. Experiments on the GLUE benchmark show that the super-ticket and ticket-sharing methods consistently outperform unpruned baselines, with larger gains on smaller tasks. Analysis indicates that pruning reduces model variance and that some tasks share more task-specific knowledge than others.
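The core selection step, keeping only the most important attention heads, can be sketched as a masking operation; the importance scores below are hypothetical (the paper derives them from a sensitivity analysis, not shown here):

```python
import numpy as np

def prune_heads(head_scores, keep_ratio=0.5):
    """Return a binary mask keeping the highest-scoring attention heads.

    Toy stand-in for importance-based head selection: heads with the lowest
    scores are masked out (set to 0) and skipped at inference time.
    """
    n_keep = int(np.ceil(len(head_scores) * keep_ratio))
    keep = np.argsort(head_scores)[-n_keep:]   # indices of the top-scoring heads
    mask = np.zeros(len(head_scores), dtype=int)
    mask[keep] = 1
    return mask

scores = np.array([0.9, 0.1, 0.6, 0.3])        # hypothetical per-head importance
print(prune_heads(scores, keep_ratio=0.5))     # [1 0 1 0]: heads 0 and 2 survive
```

In ticket sharing, one such mask would be shared across all tasks' fine-tuning runs rather than computed per task.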
Synthesizer: Rethinking Self-Attention for Transformer Models (HyunKyu Jeon)
The closing slides simply thank the audience for listening and contain no further technical content.
This document summarizes Meta Back-Translation, a method for improving back-translation by training the backward model to directly optimize the performance of the forward model during training. The key points are:
1. Back-translation typically relies on a fixed backward model, which can lead the forward model to overfit to its outputs. Meta back-translation instead continually trains the backward model to generate pseudo-parallel data that improves the forward model.
2. Experiments show that Meta Back-Translation produces fewer pathological outputs, such as translations that differ greatly in length from their references. By flexibly controlling the diversity of the pseudo-parallel data, it also avoids both overfitting and underfitting of the forward model.
3. Related work likewise leverages monolingual data.
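For contrast with the meta-learned version, plain back-translation with a fixed backward model can be sketched in a few lines; the dictionary-based backward model here is a toy assumption:

```python
def back_translate(backward_model, target_sentences):
    """Standard back-translation: a target->source model turns monolingual
    target-side text into (pseudo-source, target) training pairs for the
    forward model. In Meta Back-Translation this backward model would itself
    keep training on the forward model's performance instead of staying fixed."""
    return [(backward_model(t), t) for t in target_sentences]

# Hypothetical toy backward model: word-by-word "translation" via a dictionary.
lexicon = {"bonjour": "hello", "monde": "world"}
backward = lambda s: " ".join(lexicon.get(w, w) for w in s.split())

pairs = back_translate(backward, ["bonjour monde"])
print(pairs)   # [('hello world', 'bonjour monde')]
```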
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning (HyunKyu Jeon)
This document summarizes the Maxmin Q-learning paper published at ICLR 2020. Maxmin Q-learning addresses the overestimation bias of Q-learning and the underestimation bias of Double Q-learning by maintaining N Q-functions and using their elementwise minimum to build the learning target. For both action selection and the target, it first takes the minimum Q-value across the N estimates for each action, then maximizes over actions. At each step, the algorithm selects a random subset of the Q-functions and updates it toward the maxmin target. This approach reduces the biases seen in prior methods.
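The maxmin target construction can be sketched directly; the Q-tables below are hypothetical values for one state and three actions:

```python
import numpy as np

def maxmin_target(q_tables, reward, next_state, gamma=0.9):
    """Maxmin Q-learning target: r + gamma * max_a min_i Q_i(s', a)."""
    # Elementwise minimum over the N Q-estimates, then maximize over actions.
    q_min = np.min([q[next_state] for q in q_tables], axis=0)
    return reward + gamma * np.max(q_min)

# Two hypothetical Q-tables over 1 state and 3 actions.
q1 = np.array([[1.0, 5.0, 3.0]])
q2 = np.array([[2.0, 0.0, 4.0]])
print(maxmin_target([q1, q2], reward=1.0, next_state=0))
# min over tables = [1, 0, 3]; max over actions = 3; target = 1 + 0.9*3 = 3.7
```

With N = 1 this reduces to ordinary Q-learning; larger N pushes the estimate lower, trading overestimation for underestimation.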