SEGAN: Speech Enhancement Generative
Adversarial Network
Santiago Pascual, Antonio Bonafonte, Joan Serra
TALP-UPC, BarcelonaTech & Telefonica Research Barcelona
santi.pascual@upc.edu
[arxiv] [github] [samples]
The GAN Epidemic
Figure credit: https://github.com/hindupuravinash/the-gan-zoo
Outline
1. Introduction
2. Generative Adversarial Networks
3. Speech Enhancement GAN
4. Experimental Setup
5. Results
6. Conclusions
Introduction
● Speech Enhancement: improve the intelligibility and quality of speech contaminated
by noise.
● Classic methods: Spectral subtraction, Wiener filtering, Statistical-model based
methods, Sub-space algorithms...
● Neural networks have also been applied to speech enhancement since the 80s!
● Most of the current systems are based on the short-time Fourier
analysis/synthesis framework: spectral domain features, signal assumptions.
● Significant improvements of speech quality are possible, especially when a clean
phase spectrum is known.
● Deep learning makes us wonder → What about using waveforms?
Generative Adversarial Networks (GAN)
We have a pair of networks, Generator (G) and Discriminator (D):
● They “fight” against each other during training → Adversarial Training
● G mission: make its pdf, Pmodel, as similar as possible to our training set
distribution Pdata → Try to make predictions so realistic that D can’t distinguish
● D mission: distinguish between G samples and real samples
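Adversarial Training (conceptual)
[Figure: a latent random variable z is fed to the Generator, which produces a sample. The Discriminator receives either that sample or a real-world sample, labels it Real or Fake, and the loss is computed from that decision.]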
Adversarial Training (Goodfellow et al. 2014)
We have networks G and D, and training set with pdf Pdata. Notation:
● θ(G), θ(D) (Parameters of model G and D respectively)
● x ~ Pdata (M-dim sample from training data pdf)
● z ~ N(0, I) (sample from prior pdf, e.g. N-dim normal)
● G(z) = ẍ ~ Pg (M-dim sample from G network)
D network receives x or ẍ inputs → decides whether input is real or fake. It is optimized to learn: x is
real (1), ẍ is fake (0) (binary classifier).
G network maps sample z to G(z) = ẍ → it is optimized to maximize D mistakes.
NIPS 2016 Tutorial: Generative Adversarial Networks. Ian Goodfellow
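The two objectives described above can be written as a single minimax game. This is the value function from the cited Goodfellow et al. (2014) formulation, restated in this deck's notation; P_z is a name assumed here for the prior over z.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim P_{data}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim P_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

D raises V by pushing D(x) toward 1 and D(G(z)) toward 0; G lowers V by pushing D(G(z)) toward 1.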
Adversarial Training (batch update)
● Pick a sample x from training set
● Show x to D and update weights to
output 1 (real)
Adversarial Training (batch update)
● G maps sample z to ẍ
● show ẍ and update weights to output 0 (fake)
Adversarial Training (batch update)
● Freeze D weights
● Update G weights to make D output 1 (just G weights!)
● Unfreeze D Weights and repeat
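The three batch-update slides above can be sketched as one alternating loop. This is a toy 1-D illustration, not the SEGAN code: the models D(x) = sigmoid(w*x + b) and G(z) = a*z + c, the function names, the learning rate and the target distribution N(3, 0.5) are all assumptions made for the example, and gradients are written out by hand.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical toy models:
#   D(x) = sigmoid(w * x + b), parameters d = [w, b]
#   G(z) = a * z + c,          parameters g = [a, c]

def d_step(d, g, x_real, z, lr):
    """Update D only: push D(x_real) toward 1 and D(G(z)) toward 0."""
    w, b = d
    x_fake = g[0] * z + g[1]
    p_real = sigmoid(w * x_real + b)
    p_fake = sigmoid(w * x_fake + b)
    # Loss: -log D(x) - log(1 - D(G(z))); d(loss)/d(logit) = p - target
    grad_w = (p_real - 1.0) * x_real + p_fake * x_fake
    grad_b = (p_real - 1.0) + p_fake
    return [w - lr * grad_w, b - lr * grad_b]

def g_step(d, g, z, lr):
    """Update G only, with D frozen: push D(G(z)) toward 1."""
    w, b = d
    a, c = g
    p_fake = sigmoid(w * (a * z + c) + b)
    # Loss: -log D(G(z)); the gradient reaches G by passing through D
    grad_x = (p_fake - 1.0) * w
    return [a - lr * grad_x * z, c - lr * grad_x]

def train(steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    d, g = [0.1, 0.0], [1.0, 0.0]
    for _ in range(steps):
        x_real = rng.gauss(3.0, 0.5)          # sample from the "training set"
        d = d_step(d, g, x_real, rng.gauss(0.0, 1.0), lr)
        g = g_step(d, g, rng.gauss(0.0, 1.0), lr)  # D is not modified here
    return d, g
```

Note that `g_step` reads D's parameters but returns only new G parameters, which is the "freeze D weights" step in miniature.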
Adversarial Training analogy
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D)
has to detect whether money is real or fake.
Key Idea: as D is trained to detect fraud, its parameters learn discriminative
features of “what is real/fake”. As backprop goes through D to G, information
leaks about what bank notes need in order to look real. This
makes G apply small corrections, step by step, to get closer and closer to what a
real sample would be.
● Caveat: this means GANs are not well suited to predicting discrete tokens, like
words, because in that discrete space there is no “small change” criterion to get
to a neighbouring word (but they can work in a word embedding space, for example)
Adversarial Training analogy (continued)
[Figure sequence: the counterfeiter (G) shows forged 100 notes to the police (D), who reject each attempt as FAKE for a new reason: "It's not even green", then "There is no watermark", then "Watermark should be rounded". In the last panel the police can only ask "Real?".]
After enough iterations, and if the counterfeiter is good enough (for the G
network this means “has enough parameters”), the police should be confused.
Conditioned GANs
GANs can be conditioned on information beyond z: text, labels, speech, etc.
z might capture random characteristics of the data (variabilities of plausible
futures), whilst the conditioning input c would determine the deterministic parts
For details on ways to condition GANs:
Ways of Conditioning Generative
Adversarial Networks (Kwak et al.)
Least Squares GAN
Main idea: shift to a loss function that provides smooth and non-saturating
gradients in D
● Because the sigmoid saturates in the binary classification loss, G gets little
information once D labels samples confidently → vanishing gradients keep G from learning
● Least squares loss improves learning by giving a notion of the distance between
Pmodel and Pdata
Least Squares Generative Adversarial
Networks, Mao et al. 2016
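For reference, the least-squares objectives from the cited Mao et al. paper can be written as below. The 0/1 target coding (fake = 0, real = 1) is chosen here to match the labels used earlier in the deck, and P_z is an assumed name for the prior over z.

```latex
\min_{D}\; V_{LSGAN}(D) =
  \tfrac{1}{2}\,\mathbb{E}_{x \sim P_{data}}\big[(D(x) - 1)^2\big]
  + \tfrac{1}{2}\,\mathbb{E}_{z \sim P_{z}}\big[D(G(z))^2\big]

\min_{G}\; V_{LSGAN}(G) =
  \tfrac{1}{2}\,\mathbb{E}_{z \sim P_{z}}\big[(D(G(z)) - 1)^2\big]
```

Because the penalty grows quadratically with the distance of D's output from its target, a generated sample still receives a useful gradient when D is confident about it.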
Speech Enhancement GAN
Requirements:
● End-to-end: no assumptions or discarding certain info from data
○ wav2wav.
● One-shot generation: no slow recursive operations as in WaveNet
○ fully convolutional structure.
● Many noise types and speakers learned with same shared parameters
○ generalize in those dimensions.
Underlying conv structure in the system
● Conv1D filters
● Virtual Batch Normalization: normalize layer responses with statistics
from a fixed reference batch + the current batch → statistics depend less on the
other samples in the current batch, which helps against GAN instability
● LeakyReLU/ParametricReLU:
○ α fixed (0.3) or learnable
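As a minimal sketch of the two activations listed above, written for a single scalar: the function names are illustrative, and the default slope 0.3 is the fixed α quoted on the slide. In the parametric variant α is simply another trainable weight, so the only extra piece needed is its gradient.

```python
def leaky_relu(x, alpha=0.3):
    """Identity for x >= 0, slope alpha for x < 0."""
    return x if x >= 0.0 else alpha * x

def prelu_alpha_grad(x):
    """d(output)/d(alpha) for PReLU: x for negative inputs, 0 otherwise."""
    return x if x < 0.0 else 0.0
```

For example, `leaky_relu(-2.0)` scales the input by 0.3 instead of zeroing it, so negative activations still pass a gradient.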
Speech Enhancement GAN
Two stages in Generator:
1. Encoder (Downconv): Project noisy signal
into a deterministic representation c and
concatenate to latent variable z ~ N(0, I)
2. Decoder (Deconv): Interpolate the
intermediate hidden features w/ learnable
params. until re-generation of clean speech.
Speech Enhancement GAN
G architecture:
● 22 1D convolutional layers with kernel width = 31
● Increase feature maps in encoder from 32 to 1024, and all the way back in the decoder to
one final waveform bounded in [-1, 1].
● Use of skip connections to shuttle the low-level features to the highest layers (avoiding
the bottleneck for these).
● Use of PReLU activations
D architecture:
● Same as G encoder except for having 2 input channels (Noisy, Real/Fake)
● LeakyReLUs with alpha = 0.3
● Virtual Batch Normalization to speed up convergence and improve stability
● Classification output unit
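The G architecture bullets above imply a simple progression of time lengths through the encoder. The sketch below assumes details the slides do not state: that half of the 22 layers (11) form the encoder, that every layer uses stride 2 with padding chosen so the length halves exactly, and that the input is a 16384-sample chunk as in the training setup. The function name is hypothetical.

```python
def encoder_lengths(n_samples=16384, n_layers=11, stride=2):
    """Time length after each strided conv layer (input length first)."""
    lengths = [n_samples]
    for _ in range(n_layers):
        # Assumption: padding makes each layer divide the length by the stride
        lengths.append(lengths[-1] // stride)
    return lengths
```

Under these assumptions the bottleneck representation c is only 8 time steps long, which is why the skip connections matter for carrying fine waveform detail to the decoder.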
Experimental setup: database
● Publicly available Edinburgh dataset: http://datashare.is.ed.ac.uk/handle/10283/1942
● Amount of data and types of speakers and noises fit our purposes
● Data re-sampled at 16kHz
● Training set:
○ 40 noise conditions: 10 types of noise w/ 4 SNR conditions {15, 10, 5, 0} dB
○ 10 sentences in each condition for every speaker
○ 28 speakers
● Test set (neither its noise conditions nor its speakers were seen during training):
○ 20 noise conditions: 5 types of noise w/ 4 SNR conditions each {17.5, 12.5, 7.5, 2.5} dB
○ 20 sentences for each condition per speaker
○ 2 speakers (male and female)
Experimental setup: training
● Show pairs of signals to “learn” a reconstruction loss.
● Use of L1 regularization to guide the GAN training.
Experimental setup: training
● Training set was chunked with 50% overlap → chunks of about 1 second each
(16384 samples).
● Batch size of 400 samples
● Training for 86 epochs
● Distributed training among up to 4 GPUs → ~ 18h total time
● Very low learning rates: 0.0002
● RMSprop optimizer
● L1 regularization weight: 100
● Pre-emphasis and de-emphasis with factor 0.95 helped get rid of high-frequency artifacts!
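The chunking and pre-emphasis steps listed above can be sketched in plain Python. This is an illustration under assumptions, not the released preprocessing code: the function names are hypothetical, pre-emphasis is taken to be the usual first-order filter y[n] = x[n] - 0.95*x[n-1], and trailing samples that do not fill a whole chunk are simply dropped.

```python
def pre_emphasis(x, coef=0.95):
    """y[n] = x[n] - coef * x[n-1]; the first sample is kept as is."""
    if not x:
        return []
    return [x[0]] + [x[n] - coef * x[n - 1] for n in range(1, len(x))]

def de_emphasis(y, coef=0.95):
    """Inverse filter: x[n] = y[n] + coef * x[n-1]."""
    x = []
    for n, v in enumerate(y):
        x.append(v if n == 0 else v + coef * x[n - 1])
    return x

def chunk(signal, size=16384, overlap=0.5):
    """Fixed-size chunks with the given fractional overlap.

    Assumption: a trailing remainder shorter than `size` is discarded.
    """
    stride = int(size * (1.0 - overlap))
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, stride)]
```

With the defaults, consecutive chunks start 8192 samples apart, so every sample away from the edges appears in two chunks. De-emphasis is applied to the network output to undo the filtering.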
Final G loss: LSGAN
adversarial term + weighted L1
regularization
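The combination named above could be computed roughly as follows. This is a sketch under assumptions: the function and argument names are hypothetical, D's outputs for the enhanced signals are passed in as plain numbers, the least-squares target for G is 1 (real), and the L1 term is averaged over samples rather than summed. The weight 100 is the value from the training slide.

```python
def segan_g_loss(d_fake_scores, enhanced, clean, l1_weight=100.0):
    """Least-squares adversarial term plus weighted L1 distance to clean speech."""
    # Adversarial part: G wants D to output 1 for its enhanced signals
    adv = 0.5 * sum((s - 1.0) ** 2 for s in d_fake_scores) / len(d_fake_scores)
    # L1 part (assumption: mean absolute error over samples)
    l1 = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return adv + l1_weight * l1
```

The large L1 weight means the reconstruction term dominates early in training, with the adversarial term acting as a refinement on top of it.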
Experimental setup: test
Objective evaluation:
● PESQ: Perceptual Evaluation of Speech Quality [-0.5, 4.5]: designed for
telephonic compression assessment
● COVL: MOS prediction of the overall effect [1, 5]
○ CSIG: Mean opinion score (MOS) prediction of the signal distortion attending only to the
speech signal [1, 5]
○ CBAK: MOS prediction of the intrusiveness of background noise [1, 5]
● SSNR: Segmental SNR [0, inf)
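Of the metrics above, segmental SNR is the simplest to write down: the SNR in dB is computed per frame and then averaged. The sketch below is a bare-bones version under assumptions: the function name and the frame length of 256 samples are illustrative, frames do not overlap, and frames with zero signal or zero error energy are skipped. Published implementations typically also clamp the per-frame values to a fixed dB range, which is omitted here.

```python
import math

def segmental_snr(clean, enhanced, frame_len=256):
    """Mean over frames of 10*log10(signal energy / error energy), in dB."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((a - b) ** 2 for a, b in zip(s, e))
        if signal_energy == 0.0 or error_energy == 0.0:
            continue  # assumption: undefined frames are skipped
        values.append(10.0 * math.log10(signal_energy / error_energy))
    return sum(values) / len(values) if values else float("nan")
```

For instance, a constant signal of 1.0 reproduced as 0.9 has an error energy 100 times smaller than the signal energy in every frame, giving about 20 dB.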
Experimental setup: test
Subjective evaluation (perceptual test):
● 20 sentences per utterance. Each utterance randomly shown for the different
systems. Subjects can listen/compare as many times as desired.
● Meaningful types of noise were picked by hand for both speakers (we had no
tags).
● 16 subjects rated, from 1 (bad) to 5 (excellent), a trade-off between:
○ How much noise is removed
○ How intact the signal remains after the enhancement process
Results: objective
Worse PESQ than the baseline, but better results on the other metrics. However,
CSIG/CBAK/COVL are more correlated with enhancement than PESQ. Considerably higher
SSNR → SEGAN removes much more noise.
Results: subjective
Conclusions
● An end-to-end speech enhancement method has been implemented within
the GAN framework.
● It works as a fully conv encoder-decoder structure, making it faster than
recursive solutions.
● The results show its effectiveness, but there is an improvement margin.
● Further development in architecture design is now the work in progress.
● Next stage: voice conversion by adaptation of this architecture.

More Related Content

What's hot

Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationJason Anderson
 
Visualizing and understanding neural models in NLP
Visualizing and understanding neural models in NLPVisualizing and understanding neural models in NLP
Visualizing and understanding neural models in NLPNaoaki Okazaki
 
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門Keiichiro Ono
 
Chapter7 回帰分析の悩みどころ
Chapter7 回帰分析の悩みどころChapter7 回帰分析の悩みどころ
Chapter7 回帰分析の悩みどころitoyan110
 
敵対的学習による統合型ソースフィルタネットワーク
敵対的学習による統合型ソースフィルタネットワーク敵対的学習による統合型ソースフィルタネットワーク
敵対的学習による統合型ソースフィルタネットワークNU_I_TODALAB
 
【論文紹介】How Powerful are Graph Neural Networks?
【論文紹介】How Powerful are Graph Neural Networks?【論文紹介】How Powerful are Graph Neural Networks?
【論文紹介】How Powerful are Graph Neural Networks?Masanao Ochi
 
[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text GenerationDeep Learning JP
 
20180118 一般化線形モデル(glm)
20180118 一般化線形モデル(glm)20180118 一般化線形モデル(glm)
20180118 一般化線形モデル(glm)Masakazu Shinoda
 
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズムDNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズムShinnosuke Takamichi
 
The review of 'Explaining nonlinear classification decisions with deep Taylor...
The review of 'Explaining nonlinear classification decisions with deep Taylor...The review of 'Explaining nonlinear classification decisions with deep Taylor...
The review of 'Explaining nonlinear classification decisions with deep Taylor...tetsuo ishigaki
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 
1 4.回帰分析と分散分析
1 4.回帰分析と分散分析1 4.回帰分析と分散分析
1 4.回帰分析と分散分析logics-of-blue
 
[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain RandomizationDeep Learning JP
 
因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説Shiga University, RIKEN
 
2 4.devianceと尤度比検定
2 4.devianceと尤度比検定2 4.devianceと尤度比検定
2 4.devianceと尤度比検定logics-of-blue
 
AIによるアニメ生成の挑戦
AIによるアニメ生成の挑戦AIによるアニメ生成の挑戦
AIによるアニメ生成の挑戦Koichi Hamada
 
[DL Hacks]Self-Attention Generative Adversarial Networks
[DL Hacks]Self-Attention Generative Adversarial Networks[DL Hacks]Self-Attention Generative Adversarial Networks
[DL Hacks]Self-Attention Generative Adversarial NetworksDeep Learning JP
 
質的変数の相関・因子分析
質的変数の相関・因子分析質的変数の相関・因子分析
質的変数の相関・因子分析Mitsuo Shimohata
 

What's hot (20)

Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image Generation
 
Conv-TasNet.pdf
Conv-TasNet.pdfConv-TasNet.pdf
Conv-TasNet.pdf
 
Visualizing and understanding neural models in NLP
Visualizing and understanding neural models in NLPVisualizing and understanding neural models in NLP
Visualizing and understanding neural models in NLP
 
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
 
Chapter7 回帰分析の悩みどころ
Chapter7 回帰分析の悩みどころChapter7 回帰分析の悩みどころ
Chapter7 回帰分析の悩みどころ
 
敵対的学習による統合型ソースフィルタネットワーク
敵対的学習による統合型ソースフィルタネットワーク敵対的学習による統合型ソースフィルタネットワーク
敵対的学習による統合型ソースフィルタネットワーク
 
【論文紹介】How Powerful are Graph Neural Networks?
【論文紹介】How Powerful are Graph Neural Networks?【論文紹介】How Powerful are Graph Neural Networks?
【論文紹介】How Powerful are Graph Neural Networks?
 
[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation
 
20180118 一般化線形モデル(glm)
20180118 一般化線形モデル(glm)20180118 一般化線形モデル(glm)
20180118 一般化線形モデル(glm)
 
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズムDNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
 
The review of 'Explaining nonlinear classification decisions with deep Taylor...
The review of 'Explaining nonlinear classification decisions with deep Taylor...The review of 'Explaining nonlinear classification decisions with deep Taylor...
The review of 'Explaining nonlinear classification decisions with deep Taylor...
 
一般化線形モデル (GLM) & 一般化加法モデル(GAM)
一般化線形モデル (GLM) & 一般化加法モデル(GAM) 一般化線形モデル (GLM) & 一般化加法モデル(GAM)
一般化線形モデル (GLM) & 一般化加法モデル(GAM)
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
1 4.回帰分析と分散分析
1 4.回帰分析と分散分析1 4.回帰分析と分散分析
1 4.回帰分析と分散分析
 
[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization
 
因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説
 
2 4.devianceと尤度比検定
2 4.devianceと尤度比検定2 4.devianceと尤度比検定
2 4.devianceと尤度比検定
 
AIによるアニメ生成の挑戦
AIによるアニメ生成の挑戦AIによるアニメ生成の挑戦
AIによるアニメ生成の挑戦
 
[DL Hacks]Self-Attention Generative Adversarial Networks
[DL Hacks]Self-Attention Generative Adversarial Networks[DL Hacks]Self-Attention Generative Adversarial Networks
[DL Hacks]Self-Attention Generative Adversarial Networks
 
質的変数の相関・因子分析
質的変数の相関・因子分析質的変数の相関・因子分析
質的変数の相関・因子分析
 

Similar to SEGAN Speech Enhancement GAN Title

Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...Universitat Politècnica de Catalunya
 
InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksZak Jost
 
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...Universitat Politècnica de Catalunya
 
Audio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfAudio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfssuser849b73
 
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...AI Frontiers
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022Kwanghee Choi
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processinganeetaanu
 
Adversarial_Examples_in_Audio_and_Text.pptx
Adversarial_Examples_in_Audio_and_Text.pptxAdversarial_Examples_in_Audio_and_Text.pptx
Adversarial_Examples_in_Audio_and_Text.pptxujjawalchaurasia1
 
Generative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural NetworksGenerative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural NetworksDenis Dus
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money LaunderingJim Dowling
 
Google and SRI talk September 2016
Google and SRI talk September 2016Google and SRI talk September 2016
Google and SRI talk September 2016Hagai Aronowitz
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsbutest
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Lattice Cryptography
Lattice CryptographyLattice Cryptography
Lattice CryptographyPriyanka Aash
 

Similar to SEGAN Speech Enhancement GAN Title (20)

Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
 
InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial Networks
 
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
 
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
 
Audio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfAudio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdf
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
 
12_applications.pdf
12_applications.pdf12_applications.pdf
12_applications.pdf
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processing
 
Real Time Speech Enhancement in the Waveform Domain
Real Time Speech Enhancement in the Waveform DomainReal Time Speech Enhancement in the Waveform Domain
Real Time Speech Enhancement in the Waveform Domain
 
Adversarial_Examples_in_Audio_and_Text.pptx
Adversarial_Examples_in_Audio_and_Text.pptxAdversarial_Examples_in_Audio_and_Text.pptx
Adversarial_Examples_in_Audio_and_Text.pptx
 
Generative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural NetworksGenerative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural Networks
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
Google and SRI talk September 2016
Google and SRI talk September 2016Google and SRI talk September 2016
Google and SRI talk September 2016
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical models
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 
Generating Creative Works with AI
Generating Creative Works with AIGenerating Creative Works with AI
Generating Creative Works with AI
 
Lattice Cryptography
Lattice CryptographyLattice Cryptography
Lattice Cryptography
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

SEGAN Speech Enhancement GAN Title

  • 1. SEGAN: Speech Enhancement Generative Adversarial Network Santiago Pascual, Antonio Bonafonte, Joan Serra TALP-UPC, BarcelonaTech & Telefonica Research Barcelona santi.pascual@upc.edu [arxiv] [github] [samples]
  • 2. The GAN Epidemic Figure credit: https://github.com/hindupuravinash/the-gan-zoo
  • 3. Outline 1. Introduction 2. Generative Adversarial Networks 3. Speech Enhancement GAN 4. Experimental Setup 5. Results 6. Conclusions
  • 4. Introduction ● Speech Enhancement: improve the intelligibility and quality of speech contaminated by noise. ● Classic methods: spectral subtraction, Wiener filtering, statistical-model-based methods, subspace algorithms... ● Neural networks have also been applied to speech enhancement since the 80s! ● Most current systems are based on the short-time Fourier analysis/synthesis framework: spectral domain features, signal assumptions. ● Significant improvements of speech quality are possible, especially when a clean phase spectrum is known. ● Deep learning makes us wonder → What about using waveforms?
  • 5. Generative Adversarial Networks (GAN) We have a pair of networks, Generator (G) and Discriminator (D): ● They “fight” against each other during training → Adversarial Training ● G mission: make its pdf, Pmodel, as similar as possible to our training set distribution Pdata → Try to make predictions so realistic that D can’t distinguish them from real samples ● D mission: distinguish between G samples and real samples
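  • 6. Adversarial Training (conceptual) [Diagram: a latent random variable z feeds the Generator, which outputs a sample; the Discriminator receives either that sample or a real-world sample, labels it Real or Fake, and the Loss is computed from that decision.]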
  • 7. Adversarial Training (Goodfellow et al. 2014) We have networks G and D, and training set with pdf Pdata. Notation: ● θ(G), θ(D) (Parameters of model G and D respectively) ● x ~ Pdata (M-dim sample from training data pdf) ● z ~ N(0, I) (sample from prior pdf, e.g. N-dim normal) ● G(z) = ẍ ~ Pg (M-dim sample from G network) D network receives x or ẍ inputs → decides whether input is real or fake. It is optimized to learn: x is real (1), ẍ is fake (0) (binary classifier). G network maps sample z to G(z) = ẍ → it is optimized to maximize D mistakes. NIPS 2016 Tutorial: Generative Adversarial Networks. Ian Goodfellow
  • 8. Adversarial Training (batch update) ● Pick a sample x from training set ● Show x to D and update weights to output 1 (real)
  • 9. Adversarial Training (batch update) ● G maps sample z to ẍ ● Show ẍ to D and update D weights to output 0 (fake)
  • 10. Adversarial Training (batch update) ● Freeze D weights ● Update G weights to make D output 1 (just G weights!) ● Unfreeze D Weights and repeat
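The three batch-update slides can be sketched as a toy example. This is only an illustration under strong assumptions: samples are scalars, G is a hypothetical linear map a*z + b, D is a hypothetical logistic unit, and the gradients of the standard binary cross-entropy losses are written out by hand. None of these names or model choices come from the slides.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def d_out(d, x):
    """Toy discriminator: probability that scalar x is real."""
    return sigmoid(d["w"] * x + d["c"])


def g_out(g, z):
    """Toy generator: map latent scalar z to a fake sample."""
    return g["a"] * z + g["b"]


def d_step(d, g, x_real, z, lr):
    """Update D only: push D(x_real) toward 1 and D(G(z)) toward 0."""
    x_fake = g_out(g, z)
    p_real, p_fake = d_out(d, x_real), d_out(d, x_fake)
    # Gradients of -log D(x_real) - log(1 - D(x_fake)) w.r.t. w and c.
    grad_w = -(1.0 - p_real) * x_real + p_fake * x_fake
    grad_c = -(1.0 - p_real) + p_fake
    return {"w": d["w"] - lr * grad_w, "c": d["c"] - lr * grad_c}


def g_step(d, g, z, lr):
    """Update G only (D is frozen): push D(G(z)) toward 1."""
    p_fake = d_out(d, g_out(g, z))
    # Gradient of -log D(G(z)) w.r.t. the fake sample, backpropagated through D.
    grad_x = -(1.0 - p_fake) * d["w"]
    return {"a": g["a"] - lr * grad_x * z, "b": g["b"] - lr * grad_x}
```

Calling d_step and then g_step in a loop reproduces the alternation described on the slides. Note that g_step reads D's parameters to compute the gradient but returns only new G parameters, which is what "freeze D weights" amounts to here.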
  • 11. Adversarial Training analogy Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. Key Idea: as D is trained to detect fraud, its parameters learn discriminative features of “what is real/fake”. As backprop goes through D to G, information leaks about the requirements for bank notes to look real. This makes G perform small corrections, step by step, to get closer and closer to what a real sample would be. ● Caveat: this means GANs are not suitable for predicting discrete tokens, like words, because in that discrete space there is no “small change” criterion to get to a neighbouring word (but they can work in a word embedding space, for example)
  • 12. Adversarial Training analogy Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. [Figure: two 100 bank notes. D's verdict: “It’s not even green” → FAKE]
  • 13. Adversarial Training analogy Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. [Figure: two 100 bank notes. D's verdict: “There is no watermark” → FAKE]
  • 14. Adversarial Training analogy Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. [Figure: two 100 bank notes. D's verdict: “Watermark should be rounded” → FAKE]
  • 15. Adversarial Training analogy Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. After enough iterations, and if the counterfeiter is good enough (in terms of the G network, this means it “has enough parameters”), the police should be confused. [Figure: D's verdict: “Real?”]
  • 16. Conditioned GANs GANs can be conditioned on extra information besides z: text, labels, speech, etc. z might capture random characteristics of the data (variabilities of plausible futures), whilst c would condition the deterministic parts. For details on ways to condition GANs: Ways of Conditioning Generative Adversarial Networks (Kwak et al.)
  • 17. Least Squares GAN Main idea: shift to a loss function that provides smooth and non-saturating gradients in D ● Because of sigmoid saturation in the binary classification loss, G gets no information when D gets to label true examples → vanishing gradients keep G from learning ● Least squares loss improves learning with a notion of distance of Pmodel to Pdata: Least Squares Generative Adversarial Networks, Mao et al. 2016
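The least-squares idea can be sketched as two small loss functions. This assumes the 0/1 target coding used earlier in the slides (real → 1, fake → 0) and that D outputs raw, unbounded scores with no sigmoid; the function names are hypothetical.

```python
def lsgan_d_loss(d_real, d_fake):
    """D loss: pull scores on real samples toward 1 and on fake samples toward 0."""
    real_term = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake_term = sum(s ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_g_loss(d_fake):
    """G loss: pull D's scores on generated samples toward 1."""
    return 0.5 * sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)


def lsgan_g_grad(score):
    """Derivative of 0.5 * (score - 1)^2 w.r.t. the score: linear in the distance to 1."""
    return score - 1.0
```

Because the gradient in lsgan_g_grad grows linearly with the distance between the score and its target, it does not vanish for samples D scores far from 1, which is the property the slide contrasts with sigmoid saturation.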
  • 18. Speech Enhancement GAN Requirements: ● End-to-end: no assumptions or discarding certain info from data ○ wav2wav. ● One-shot generation: no slow recursive operations as in WaveNet ○ fully convolutional structure. ● Many noise types and speakers learned with the same shared parameters ○ generalize in those dimensions.
  • 19. Underlying conv structure in the system ● Conv1D filters ● Virtual Batch Normalization: normalize layer responses with statistics from a fixed (reference) batch + the current batch → statistics depend less on the current batch, which mitigates GAN instability ● LeakyReLU/ParametricReLU: ○ α fixed (0.3) or learnable
  • 20. Speech Enhancement GAN Two stages in Generator: 1. Encoder (Downconv): Project noisy signal into a deterministic representation c and concatenate to latent variable z ~ N(0, I) 2. Decoder (Deconv): Interpolate the intermediate hidden features w/ learnable params. until re-generation of clean speech.
  • 21. Speech Enhancement GAN G architecture: ● 22 1D conv filters with kernel width = 31 ● Increase feature maps in the encoder from 32 to 1024, and all the way back in the decoder to one final waveform bounded in [-1, 1]. ● Use of skip-connections to shuttle the low-level features to the highest layer (avoiding the bottleneck for these). ● Use of PReLU activations D architecture: ● Same as the G encoder except for having 2 input channels (Noisy, Real/Fake) ● LeakyReLUs with alpha = 0.3 ● Virtual Batch Normalization to speed up convergence and improve stability ● Classification output unit
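To get a feel for how much the encoder compresses the signal in time, here is a rough sketch. It assumes that the 22 layers split evenly into 11 encoder and 11 decoder layers, that every layer uses stride 2 with padding that exactly halves (or doubles) the length, and that the input is the 16384-sample chunk mentioned on the training slide. The stride and padding are assumptions, not values stated on the slides.

```python
def encoder_lengths(n_samples=16384, n_layers=11, stride=2):
    """Temporal length after each strided conv layer (assumed stride, see lead-in)."""
    lengths = [n_samples]
    for _ in range(n_layers):
        lengths.append(lengths[-1] // stride)
    return lengths


def decoder_lengths(n_samples=16384, n_layers=11, stride=2):
    """The decoder is assumed to mirror the encoder, so the lengths run in reverse."""
    return list(reversed(encoder_lengths(n_samples, n_layers, stride)))
```

Under these assumptions the bottleneck is only 8 time steps long, which gives some intuition for why skip-connections are used to carry low-level detail past it.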
  • 22. Experimental setup: database ● Publicly available Edinburgh dataset: http://datashare.is.ed.ac.uk/handle/10283/1942 ● Amount of data and types of speakers and noises fit our purposes ● Data re-sampled at 16kHz ● Training set: ○ 40 noise conditions: 10 types of noise w/ 4 SNR conditions {15, 10, 5, 0} dB ○ 10 sentences in each condition for every speaker ○ 28 speakers ● Test set (neither its conditions nor its speakers were seen during training): ○ 20 noise conditions: 5 types of noise w/ 4 SNR conditions each {17.5, 12.5, 7.5, 2.5} dB ○ 20 sentences for each condition per speaker ○ 2 speakers (male and female)
  • 23. Experimental setup: training ● Show pairs of signals to “learn” a reconstruction loss. ● Use of L1 regularization to guide the GAN training.
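A minimal sketch of a generator loss that combines a least-squares adversarial term with an L1 reconstruction term between the enhanced and clean signals. The default weight of 100 is the value given on the following training slide; averaging the L1 term over samples (rather than summing) and the function name are assumptions made for illustration.

```python
def generator_loss(d_fake_scores, enhanced, clean, l1_weight=100.0):
    """Least-squares adversarial term plus weighted L1 distance to the clean signal."""
    # Adversarial term: G wants D to score its outputs as real (target 1).
    adv = 0.5 * sum((s - 1.0) ** 2 for s in d_fake_scores) / len(d_fake_scores)
    # Reconstruction term: mean absolute error per sample (averaging is an assumption).
    l1 = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return adv + l1_weight * l1
```

With a weight this large, the L1 term dominates early in training, which matches the slide's description of it as a guide for the adversarial training.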
  • 24. Experimental setup: training ● Training set was chunked with 50% overlap → canvases of about 1 second each (16384 samples). ● Batch size of 400 samples ● Training for 86 epochs ● Distributed training among up to 4 GPUs → ~18h total time ● Very low learning rate: 0.0002 ● RMSprop optimizer ● L1 regularization weight: 100 ● Pre-emphasis and de-emphasis with factor 0.95 helped to get rid of high-frequency artifacts! Final G loss: LSGAN adversarial + weighted L1 regularization
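The data preparation steps on this slide (chunking with 50% overlap, pre-emphasis before training and de-emphasis after generation) can be sketched as follows. The first-order filter form is the standard one; dropping a trailing partial chunk and the function names are assumptions.

```python
def preemphasis(signal, coef=0.95):
    """High-pass filter: y[n] = x[n] - coef * x[n-1], with y[0] = x[0]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coef * signal[n - 1] for n in range(1, len(signal))]


def deemphasis(signal, coef=0.95):
    """Inverse filter: x[n] = y[n] + coef * x[n-1], with x[0] = y[0]."""
    out = []
    for n, y in enumerate(signal):
        out.append(y if n == 0 else y + coef * out[-1])
    return out


def chunk(signal, size=16384, overlap=0.5):
    """Split a signal into fixed-size windows; a trailing partial window is dropped."""
    hop = int(size * (1.0 - overlap))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]
```

At the 16 kHz sampling rate given on the database slide, a 16384-sample window lasts 1.024 seconds, and a 50% overlap corresponds to a hop of 8192 samples.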
  • 25. Experimental setup: test Objective evaluation: ● PESQ: Perceptual Evaluation of Speech Quality [-0.5, 4.5]: designed for telephonic compression assessment ● COVL: MOS prediction of the overall effect [1, 5] ○ CSIG: Mean opinion score (MOS) prediction of the signal distortion attending only to the speech signal [1, 5] ○ CBAK: MOS prediction of the intrusiveness of background noise [1, 5] ● SSNR: Segmental SNR [0, inf)
  • 26. Experimental setup: test Subjective evaluation (perceptual test): ● 20 sentences per utterance. Each utterance randomly shown for the different systems. Subjects can listen/compare as many times as desired. ● Cherry-picking meaningful types of noise for both speakers (we had no tags, so by hand). ● 16 subjects rate a trade-off from 1 (bad) to 5 (excellent): ○ How much noise is removed ○ How intact the signal remains after the enhancement process
  • 27. Results: objective Worse PESQ than the baseline, but better on the other metrics. However, CSIG/CBAK/COVL are more correlated with enhancement than PESQ. Considerably higher SSNR → SEGAN removes much more noise.
  • 29. Conclusions ● An end-to-end speech enhancement method has been implemented within the GAN framework. ● It works as a fully convolutional encoder-decoder structure, making it faster than recursive solutions. ● The results show its effectiveness, but there is room for improvement. ● Further development of the architecture design is work in progress. ● Next stage: voice conversion by adaptation of this architecture.