Audio Inpainting
with
Generative Adversarial Network
Authors: Pirmin Philipp Ebner, Amr Eltelt
Published in arXiv [eess.AS], March 13, 2020
Presenter: Kuan Hsun Ho
Date: 2021/10/08
Outline
1. Introduction
-Background, Objective
2. Architecture
-GAN, WGAN, Long/Short Borders, Dual Discriminators WGAN, Loss Function
3. Methodology
-Dataset, Signal Preprocessing, Training, Evaluation Metrics
4. Results & Discussion
-Different Network, Different Dataset, Different Training Steps
5. Conclusion & Outlook
6. References
Introduction
01
● Corrupted audio files, information lost in audio transmission (e.g. VoIP), and
audio signals locally contaminated by noise are important problems in
various audio processing tasks such as music enhancement and restoration.
● A dropped connection in audio transmission -> data loss beyond a hundred
milliseconds. This has highly unpleasant consequences for a listener, and it is
hardly feasible to reconstruct the lost content from local information only.
● Audio features are high-dimensional, complex, and non-correlated ->
applying state-of-the-art models from image or video inpainting tends not to
work well [2].
● Approaches to restoring lost audio information include waveform substitution [5],
audio inter-/extrapolation [6,7], and audio inpainting [8].
Introduction
Background
● Reconstruction usually aims to provide coherent and meaningful information
while preventing audible artifacts, so that the listener remains unaware that any
problem occurred.
● For small gaps (< 50 ms), no synthesis is needed and only local statistics must be considered
=> sparsity-based techniques are used.
Otherwise, for long gaps, autoregressive modeling [7] / sinusoidal modeling [10,11] / GANs [13] are
used.
● Audio signals are composed of a fundamental frequency and overtones.
● With multiple different instruments in the training set, a wider frequency
spectrum has to be covered, and the generated audio signal is more strongly influenced by
noise.
Introduction
Background
● Audio inpainting of long gaps (500~550 ms) in audio content.
● Focus on waveform strategies rather than spectrogram or multimodal strategies.
● Use WGAN to generate audio with
1. global coherence.
2. good audio quality.
3. different types of instruments from different datasets.
=> ability to adapt to different audio signals
● Find an optimal model setting that generalizes rather than overfits to a specific dataset.
● There is no single metric that reliably evaluates the quality of a generated audio signal,
so we depend on human judgement using the objective difference grade (ODG)
technique for evaluation.
● In this unsupervised setting, the goal is not to generate audio that perfectly
matches the ground truth.
Introduction
Objective
Architecture
02
● GANs rely on two competing neural networks trained simultaneously in a two-player min-max
game: the generator produces new data from samples of a random variable; the
discriminator attempts to distinguish between these generated and real data.
● Generator: learns to fool the discriminator.
Discriminator: learns to better classify real and generated (fake) data.
● An advantage of GAN-based approaches is the rapid and straightforward sampling of large
amounts of audio.
● Several GAN approaches have been investigated to fill gaps in audio, such as WaveGAN [4] or
TiFGAN [14].
Architecture
GAN
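To make the min-max game concrete, here is a minimal sketch of one adversarial training step in TensorFlow. It is illustrative only: the dense layers, the 16384-sample waveform chunk, and the learning rates are assumptions, not the paper's architecture.

```python
import tensorflow as tf

# Minimal GAN training step (illustrative sketch; shapes are assumptions).
latent_dim = 100

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(latent_dim,)),
    tf.keras.layers.Dense(16384, activation="tanh"),   # fake waveform chunk
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(16384,)),
    tf.keras.layers.Dense(1),                          # realness logit
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_batch):
    z = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(z, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake, training=True)
        # Discriminator: classify real as 1 and fake as 0.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # Generator: fool the discriminator, i.e. push fake logits toward 1.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```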
● Improve the stability of learning.
○ Both generator and discriminator have to learn in correct directions.
● Get rid of problems like mode collapse.
○ Generator and discriminator have to be equally robust.
● Provide meaningful learning curves useful for debugging and hyperparameter
searches.
● Provide a guarantee of data diversity.
Architecture
WGAN
Architecture
WGAN
G : Z -> X (the generator maps latent noise to the data space); D : X -> [0,1] for a vanilla GAN, while the WGAN critic outputs an unbounded score; gradients backpropagate through both networks.
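As a hedged sketch, the Wasserstein critic loss with gradient penalty (the WGAN-GP variant popularized by WaveGAN) could look as follows. The slide does not state whether weight clipping or a gradient penalty is used, so the penalty term and its weight are assumptions.

```python
import tensorflow as tf

def critic_loss_wgan_gp(critic, real, fake, gp_weight=10.0):
    """Wasserstein critic loss with gradient penalty (illustrative sketch).

    The critic maximizes E[D(real)] - E[D(fake)]; here we return the
    negation to be minimized. The penalty pushes the critic's gradient
    norm toward 1 along real/fake interpolates (1-Lipschitz constraint).
    Inputs are assumed to be [batch, samples] waveform tensors.
    """
    wasserstein = tf.reduce_mean(critic(fake)) - tf.reduce_mean(critic(real))
    eps = tf.random.uniform([tf.shape(real)[0], 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp)
    grads = tape.gradient(scores, interp)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=-1) + 1e-12)
    penalty = tf.reduce_mean(tf.square(grad_norm - 1.0))
    return wasserstein + gp_weight * penalty
```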
● The approach applied here extracts the short-range borders and the long-range borders of
the missing audio content.
● Closely neighboring audio has proven successful for inpainting short missing segments; however, it fails
for long ones.
● A larger range of neighboring audio can tell us something about the long missing audio content.
Architecture
Long/Short Borders
Borders 1 length = 1 + 0.5 + 1 = 2.5 s (short-range).
Borders 2 length = 3 + 0.5 + 3 = 6.5 s (long-range).
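A minimal sketch of how the two border contexts around a 0.5 s gap could be sliced from a waveform. The sample rate, variable names, and zero-masking of the gap are assumptions for illustration.

```python
import numpy as np

def extract_borders(wave, gap_start, fs=16000, short_s=1.0, long_s=3.0, gap_s=0.5):
    """Slice short-range and long-range border context around a gap.

    Returns (short_ctx, long_ctx): audio surrounding the gap with the gap
    itself zeroed out, at total lengths 2.5 s and 6.5 s respectively.
    """
    gap = int(gap_s * fs)
    def context(border_s):
        b = int(border_s * fs)
        seg = wave[gap_start - b : gap_start + gap + b].copy()
        seg[b : b + gap] = 0.0   # mask the missing content
        return seg
    return context(short_s), context(long_s)

# Example: a 10 s signal with a gap starting at 4.0 s.
fs = 16000
wave = np.random.randn(10 * fs).astype(np.float32)
short_ctx, long_ctx = extract_borders(wave, gap_start=4 * fs, fs=fs)
assert len(short_ctx) == int(2.5 * fs) and len(long_ctx) == int(6.5 * fs)
```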
(Architecture of the proposed model, implemented in TensorFlow)
Architecture
Dual Discriminators WGAN
● To make the GAN perform well, we minimize the total loss of both discriminators and of the
generator, and expect convergence.
Architecture
Loss Function
(The accompanying figure shows the total D-loss and total G-loss during training for the different datasets.)
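A hedged sketch of how the dual-discriminator objective could combine the two critics: one scores the short-range context, the other the long-range context. Equal weighting of the two terms is an assumption; the slide only states that the total loss of both discriminators and the generator is minimized.

```python
import tensorflow as tf

def d2wgan_losses(d_short, d_long, fake_short, fake_long, real_short, real_long):
    """Total Wasserstein losses over two critics, one per border range.

    fake_short / fake_long are the generator output placed into the 2.5 s
    and 6.5 s contexts; equal weighting of the two critics is an assumption.
    """
    d_loss = (tf.reduce_mean(d_short(fake_short)) - tf.reduce_mean(d_short(real_short))
              + tf.reduce_mean(d_long(fake_long)) - tf.reduce_mean(d_long(real_long)))
    # The generator tries to raise both critics' scores on its output.
    g_loss = -(tf.reduce_mean(d_short(fake_short)) + tf.reduce_mean(d_long(fake_long)))
    return d_loss, g_loss
```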
Methodology
03
● We considered instrument sounds including piano, acoustic guitar, and piano together with
a string orchestra, using three different datasets.
1. The PIANO dataset contains only piano samples.
2. The SOLO dataset contains recordings of different instruments, including accordion, acoustic
guitar, cello, flute, saxophone, trumpet, violin, and xylophone. In this work, only the
acoustic guitar recordings are used.
3. The MAESTRO dataset contains recordings from the International Piano-e-Competition,
a piano performance competition where virtuoso pianists perform on Yamaha
claviers together with a string orchestra. The data must be split into training / validation
/ testing subsets so that the same composition, even if performed by multiple contestants, does
not appear in multiple subsets. (Important! See the grouped-split sketch below.)
Methodology
Dataset
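A minimal sketch of such a composition-aware split using scikit-learn's GroupShuffleSplit; the metadata arrays, group naming, and split fractions are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical metadata: one row per recording, grouped by composition,
# so a piece performed by multiple contestants stays in one subset.
recordings = np.array(["rec_%03d" % i for i in range(100)])
compositions = np.array(["piece_%02d" % (i % 20) for i in range(100)])

# Carve out test recordings first, then validation, grouping by composition.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
trainval_idx, test_idx = next(gss.split(recordings, groups=compositions))

gss_val = GroupShuffleSplit(n_splits=1, test_size=0.125, random_state=0)
train_idx, val_idx = next(gss_val.split(recordings[trainval_idx],
                                        groups=compositions[trainval_idx]))

# No composition appears in more than one subset.
assert not set(compositions[trainval_idx][val_idx]) & set(compositions[test_idx])
```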
(Training / validation / testing segmentation)
Methodology
Dataset
Methodology
Signal Preprocessing
The filter cutoff frequency is set below the Nyquist frequency to prevent aliasing; the audio signals are then
downsampled by a factor of 3.
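A one-line sketch of this preprocessing with SciPy, whose decimate applies an anti-aliasing low-pass filter before keeping every third sample; the 48 kHz source rate is an assumption for illustration.

```python
import numpy as np
from scipy.signal import decimate

# Anti-aliasing low-pass filter plus downsampling by 3 in one call.
fs = 48000                             # assumed source rate
x = np.random.randn(10 * fs)           # stand-in for a 10 s audio signal
y = decimate(x, 3, zero_phase=True)    # filters below the new Nyquist, keeps every 3rd sample
fs_new = fs // 3                       # 16 kHz
assert len(y) == len(x) // 3           # lengths divide evenly here
```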
● Both the generator and discriminator share the same hyperparameters:
Learning rate: 1e-4
Optimizer: Adam
Batch size: 64
Iterations (steps):
MAESTRO: WGAN 73 k, D2WGAN 56.3 k
PIANO, SOLO: WGAN 40 k, D2WGAN 53.6 k
● Parameters: 53,248
● 3 datasets x 2 models = 6 training runs
● The adjacent figure compares generated audio
and real audio.
Methodology
Training
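For concreteness, a sketch of the stated optimizer configuration; the Adam beta values follow the common WGAN-GP choice and are an assumption, since the slide gives only the learning rate, optimizer, and batch size.

```python
import tensorflow as tf

# Optimizer setup matching the stated hyperparameters (Adam, lr 1e-4, batch 64).
# The beta values follow the common WGAN-GP choice and are an assumption.
g_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5, beta_2=0.9)
d_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5, beta_2=0.9)
BATCH_SIZE = 64
```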
● Signal-to-noise ratios (SNRs) computed on time-domain waveforms or
magnitude spectrograms are not appropriate indicators for evaluating the model.
● Therefore, a user study was conducted, evaluating the inpainting quality by means of
objective difference grades (ODG [22]), which range from 0 (imperceptible difference) to -4 (very annoying).
● 50 samples of 6.5 s length for each dataset and model, in total 150 samples per model,
were rated by two independent listeners, blind to the source of the samples,
using ODG.
Methodology
Evaluation Metrics
Results & Discussion
04
Results & Discussion
Different Network
For PIANO,
● D-loss and G-loss converge in both models.
● No overfitting occurs.
=> WGAN slightly outperforms.
For SOLO,
● D-loss in WGAN has not converged.
● G-loss converges in both models.
=> D2WGAN clearly outperforms.
● The ODG results are tabulated as follows.
Results & Discussion
Different Network
(Annotated ODG differences from the table: +0.02, +0.08, +0.2, +0.1, -0.33, -0.07.)
● Inpainting quality: SOLO > PIANO > MAESTRO
● The SOLO dataset offers far more playing techniques and sound variations than the
PIANO dataset.
A guitar offers 144 tones covering four octaves; a piano offers 88 tones covering seven octaves.
● A lower-frequency signal carries information at a lower rate (fewer samples are needed to
reconstruct it), whereas a higher-frequency signal carries information at a faster rate (and hence
needs more samples for reconstruction).
Results & Discussion
Different Dataset
● Human ears are more sensitive to higher frequencies, as reflected in the
K-weighting filter.
● The MAESTRO dataset, a combination of piano together with a
string orchestra, is much harder to train on. It is also a huge dataset of
201 hours of music, and more than one instrument plays
at the same time.
Results & Discussion
Different Dataset
● ODG performance improves when the network is trained longer.
● The D2WGAN model was trained on the PIANO dataset for 140,000 steps and tested on both the
PIANO and MAESTRO datasets.
● The D2WGAN model showed a large improvement on both datasets. The results indicate that the
D2WGAN model was not overfitted and achieved excellent performance (both mean & std).
● One explanation may be that the model interprets the orchestra mainly as background noise. Thus,
small noise in the inpainted part is smoothed more strongly, and the impairment is therefore
perceptible but not annoying.
Results & Discussion
Different Training Steps
Conclusion & Outlook
05
● Audio inpainting of long gaps (500~550 ms) in audio content.
● Focus on waveform strategies rather than spectrogram or multimodal strategies.
● Use WGAN to generate audio with
1. global coherence.
2. good audio quality.
3. different types of instruments from different datasets.
=> ability to adapt to different audio signals
● Find an optimal model setting that generalizes rather than overfits to a specific dataset.
● There is no single metric that reliably evaluates the quality of a generated audio signal,
so we depend on human judgement using the objective difference grade (ODG)
technique for evaluation.
● In this unsupervised setting, the goal is not to generate audio that perfectly
matches the ground truth.
Introduction
Objective
● We analysed inpainting of long gaps (500 - 550 ms) in audio content using WGAN and D2WGAN.
● The proposed D2WGAN model is an architectural improvement that inpaints the missing
audio content using short-range and long-range borders.
● The newly proposed D2WGAN model slightly outperforms the classic WGAN model on all
three datasets (piano, guitar, piano with orchestra).
● The long-range borders combined with the short-range borders can potentially provide
information correlated with the gap content.
● A larger improvement was observed for SOLO and MAESTRO.
● The weaker results for PIANO can be explained by its lack of audible variation in
sound.
● As we aim for a generalized solution, we need a dataset with generalized audio
content.
Conclusion & Outlook
● By increasing the number of training steps, D2WGAN significantly improves performance
without overfitting.
● Better results can be achieved for audio datasets where a particular instrument is
accompanied by other instruments if we train the network only on that particular instrument
and neglect the other instruments.
● As for future work, several directions remain:
1. experimenting with the border lengths
2. adding more layers to the convolution operations or changing filter sizes
3. exploring multimodal strategies that combine waveforms and spectrogram
images
4. finding datasets that help strengthen reconstruction of high frequencies
5. training on a wide range of musical instruments
Conclusion & Outlook
References
06
[1] N. Perraudin, N. Holighaus, P. Majdak, and P. Balazs, “Inpainting of long audio segments with similarity graphs,” IEEE/ACM Transactions on Audio,
Speech, and Language Processing, 2018.
[2] Y.-L. Chang, K.-Y. Lee, P.-Y. Wu, H.-y. Lee, and W. Hsu, “Deep long audio inpainting,” arXiv:1911.06476v1, 2019.
[3] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, and K. Kavukcuoglu, “WaveNet: A generative
model for raw audio,” in SSW, p. 125, 2016.
[4] C. Donahue, J. McAuley, and M. Puckette, “Synthesizing audio with generative adversarial networks,” arXiv:1802.04208, 2018.
[5] D. Goodman, G. Lockhart, O. Wasem, and W.-C. Wong, “Waveform substitution techniques for recovering missing speech segments in packed voice
communications,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, no. 6, pp. 1440–1448, 1986.
[6] I. Kauppinen, J. Kauppinen, and P. Saarinen, “A method for long extrapolation of audio signals,” Journal of the Audio Engineering Society, vol. 49, no.
12, pp. 1167–1180, 2001.
[7] W. Etter, “Restoration of a discrete-time signal segment by interpolation based on the left-sided and right-sided autoregressive parameters,” IEEE
Transactions on Signal Processing, vol. 44, no. 5, pp. 1124–1135, 1996.
[8] A. Adler, V. Emiya, M. Jafari, M. Elad, R. Gribonval, and M. Plumbley, “Audio inpainting,” IEEE Transactions on Audio, Speech and Language
Processing, vol. 20, no. 3, pp. 922–932, 2012.
[9] K. Siedenburg, M. Dörfler, and M. Kowalski, “Audio inpainting with social sparsity,” SPARS Signal Processing with Adaptive Sparse Structured
Representations, 2013.
References
[10] M. Lagrange, S. Marchand, and J.-B. Rault, “Long interpolation of audio signals using linear prediction in sinusoidal modeling,” Journal of the Audio
Engineering Society, vol. 53, no. 10, pp. 891–905, 2005.
[11] A. Lukin and J. Todd, “Parametric interpolation of gaps in audio signals,” Audio Engineering Society Convention, vol. 125, 2008.
[12] Y. Bahat, Y. Schechner, and M. Elad, “Self-content-based audio inpainting,” Signal Processing, vol. 111, pp. 61–72, 2015.
[13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in
Neural Information Processing Systems, pp. 2672–2680, 2014.
[14] A. Marafioti, N. Holighaus, N. Perraudin, and P. Majdak, “Adversarial generation of time-frequency features with application in audio synthesis,”
arXiv preprint arXiv:1902.04072, 2019.
[15] T. Che, Y. Li, A. P. Jacob, Y. Bengio, and W. Li, “Mode regularized generative adversarial networks,” International Conference on Learning
Representations, 2017.
[16] A. Srivastava, L. Valkov, C. Russell, M. U. Gutmann, and C. Sutton, “VEEGAN: Reducing mode collapse in GANs using implicit variational
learning,” Advances in Neural Information Processing Systems, 2017.
[17] K. J. Liang, C. Li, G. Wang, and L. Carin, “Generative adversarial network training is a continual learning problem,” arXiv:1811.11083v1, 2018.
References
CREDITS: This presentation template was created
by Slidesgo, including icons by Flaticon, and
infographics & images by Freepik
Thanks