WaveNet
Fairies (Adonis Han 한상훈)
sanghan1990@naver.com
Introduction
• Raw audio generation
• WaveNet: very high temporal resolution (at least 16,000 samples per second)
Contributions
• Generates raw speech signals
• New architectures based on dilated causal convolutions
• A single model can generate different voices, conditioned on a speaker identity
Comments on WaveNet
WaveNet: a deep generative model of audio data that operates directly at the waveform level
Dilated convolutions
• exponentially increase the receptive field
• model long-range temporal dependencies
The model can be conditioned globally or locally
Causal convolutions
• Causal convolutions: the prediction at time step t cannot depend on future time steps (the ordering of the data cannot be violated)
• Same concept as the masked convolution
• No recurrent connections
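As a minimal sketch of the causality constraint above (not the deck's own code), a causal 1-D convolution can be written by left-padding the input with zeros so that output t only reads inputs at or before t. The function name `causal_conv1d` and the zero-padding convention are assumptions made for illustration.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution: y[t] = sum_k w[k] * x[t - k], with x[t] = 0 for t < 0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k_size = len(w)
    # Left-pad with zeros so that output t never reads an input after t.
    padded = np.concatenate([np.zeros(k_size - 1), x])
    return np.array([
        sum(w[k] * padded[t + k_size - 1 - k] for k in range(k_size))
        for t in range(len(x))
    ])

# Changing a later sample must not change earlier outputs.
y_a = causal_conv1d([1, 2, 3, 4], [1, 1])
y_b = causal_conv1d([1, 2, 3, 100], [1, 1])
print(y_a[:3], y_b[:3])
```

Because the padding is only on the left, the output has the same length as the input, which matches the "dimensionality of input = dimensionality of output" property of the model.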
Dilated causal convolutions
• Efficiently increase the receptive field
• Dilation pattern: 1, 2, 4, …, 512, 1, 2, 4, …, 512, 1, 2, 4, …, 512
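The effect of this dilation pattern can be checked with a short sketch. It assumes a filter width of 2 (an assumption here, not stated on the slide) and uses hypothetical helper names `dilated_causal_conv1d` and `receptive_field`; with stride 1 each layer adds (kernel_size - 1) * dilation samples to the receptive field.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k * dilation], with zeros before the start."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    pad = (len(w) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k in range(len(w)):
        start = pad - k * dilation  # tap k looks k * dilation samples into the past
        y += w[k] * padded[start:start + len(x)]
    return y

def receptive_field(dilations, kernel_size=2):
    """Number of input samples that can influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Three repetitions of 1, 2, 4, ..., 512 as listed on the slide.
dilations = [2 ** i for i in range(10)] * 3
print(receptive_field(dilations))            # 3070 samples
print(receptive_field(dilations) / 16000.0)  # seconds at an assumed 16 kHz rate
```

Pushing an impulse through layers with dilations 1, 2, 4 and weights [1, 1] spreads it over exactly 8 samples, which is one way to sanity-check the formula.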
Softmax distribution
• Raw audio: a sequence of 16-bit integer values, one per time step
• Softmax layer: would have to output 65,536 probabilities per time step
• Apply a μ-law companding transformation
• Quantize it to 256 possible values
Softmax distribution
A softmax distribution is used to model the conditional probability.
Audio signals are usually quantized to 16 bits, so representing them with a softmax would require 65,536 outputs per sample (too many).
Instead, μ-law (mu-law) companding is used. The human ear is sensitive to small changes when a sound is quiet and relatively insensitive even to large changes when it is loud, so the quantization is made nonlinear.
This way, encoding/decoding with fairly good quality is possible with only 8 bits (256 outputs).
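The companding step can be sketched as below, using the standard μ-law formula f(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255 on inputs scaled to [-1, 1]. The function names and the rounding convention for mapping to the 256 integer codes are assumptions for illustration, not the deck's implementation.

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compress x in [-1, 1] with mu-law and quantize to integer codes 0..mu."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Map [-1, 1] to 0..mu and round to the nearest code.
    return np.floor((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(codes, mu=255):
    """Invert mu_law_encode up to quantization error."""
    compressed = 2.0 * (np.asarray(codes, dtype=float) / mu) - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu

# 16-bit samples are first scaled to [-1, 1).
samples_int16 = np.array([-32768, -1000, 0, 1000, 32767], dtype=np.int16)
x = samples_int16 / 32768.0
codes = mu_law_encode(x)
restored = mu_law_decode(codes)
print(codes, np.abs(restored - x).max())
```

The round-trip error is much smaller for quiet samples than for loud ones, which is the nonlinearity the slide describes.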
Gated activation units
Residual and skip connections
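The diagrams for these two slides did not survive extraction. As a rough sketch of what they refer to, the block below follows the gated unit z = tanh(W_f * x) ⊙ σ(W_g * x) described in the WaveNet paper, followed by 1x1 convolutions feeding a residual path and a skip path. The shapes, the function names, and the use of separate 1x1 weights for the residual and skip outputs are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dilated_causal_conv(x, w, dilation):
    """x: (T, C_in), w: (K, C_in, C_out) -> (T, C_out); tap k reads x[t - k * dilation]."""
    steps, k_size = x.shape[0], w.shape[0]
    pad = (k_size - 1) * dilation
    padded = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    out = np.zeros((steps, w.shape[2]))
    for k in range(k_size):
        start = pad - k * dilation
        out += padded[start:start + steps] @ w[k]
    return out

def residual_block(x, w_filter, w_gate, w_res, w_skip, dilation):
    """One gated block: returns (input to the next layer, skip contribution)."""
    filt = np.tanh(dilated_causal_conv(x, w_filter, dilation))
    gate = sigmoid(dilated_causal_conv(x, w_gate, dilation))
    z = filt * gate                 # gated activation unit
    residual_out = x + z @ w_res    # 1x1 conv is a per-time-step matmul
    skip_out = z @ w_skip           # summed over all layers downstream
    return residual_out, skip_out

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 4))
w_f, w_g = rng.normal(size=(2, 4, 4)), rng.normal(size=(2, 4, 4))
res, skip = residual_block(x, w_f, w_g, rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), dilation=2)
print(res.shape, skip.shape)
```

In a full network the skip outputs of every block would be summed and passed through further 1x1 layers before the softmax.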
Joint probability
• Waveform
• The conditional probability distribution is modelled by a stack of convolutional layers (similar to PixelCNN)
• No pooling
• Dimensionality of input = dimensionality of output
• Output: softmax layer p(x)
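The equation on this slide was lost in extraction. Assuming it matched the WaveNet paper, the joint probability of a waveform x = (x_1, …, x_T) is factorised as a product of conditionals, each depending only on earlier samples:

```latex
\[
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)
\]
```

Each factor is the categorical distribution produced by the softmax layer for time step t.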
Dilation layer
Background
The key application that the authors of the dilated convolution have in mind is dense prediction: vision applications where the predicted object has a similar size and structure to the input image.
Examples include semantic segmentation with one label per pixel, image super-resolution, denoising, demosaicing, bottom-up saliency, keypoint detection, etc.
Dilation layer
In many such applications one wants to integrate information from different spatial scales and balance two properties:
1. local, pixel-level accuracy, such as precise detection of edges, and
2. integrating knowledge of the wider, global context.
To address this problem, people often use some kind of multi-scale convolutional neural network, which often relies on spatial pooling. Instead, the authors propose using layers of dilated convolutions, which address the multi-scale problem efficiently without increasing the number of parameters too much.
Dilation layer
In the visual system, receptive fields are volumes in visual space
dilated conv = atrous conv (from the French "à trous")
receptive field = center + surround
The convolution is performed using only the pixels marked by the red dots in the figure, so the receptive field can be enlarged without any loss of resolution.
It is called atrous conv because, within the whole receptive field, coefficients exist only at the red-dot positions and all other positions are filled with zeros.
Ref: http://www.inference.vc/dilated-convolutions-and-kronecker-factorisation/
Dilated Convolutions
Dilation layer
Advantages
1. Obtaining a large receptive field normally requires many parameters. With a dilated conv the receptive field grows but the number of parameters does not, which is a big advantage in terms of computation.
2. For a 7x7 receptive field, a normal filter has 49 parameters. A dilated conv has parameters only at the red-dot positions among those 49; the remaining 40 are all filled with zeros, so the computational cost is the same as for a 3x3 filter.
3. Because the receptive field grows, adjusting the dilation rate makes it possible to handle various scales. (Extracting information at various scales requires a wide receptive field, which a dilated conv provides without difficulty.)
-> A conventional CNN enlarges the receptive field by first reducing the size with a pooling layer and then applying the conv.
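The 7x7 versus 3x3 arithmetic above can be reproduced with a small sketch. It assumes a 3x3 kernel with dilation rate 3 (the setting that yields a 7x7 footprint); the helper names `effective_kernel_size` and `dilate_kernel` are hypothetical.

```python
import numpy as np

def effective_kernel_size(kernel_size, dilation):
    """Side length of the footprint covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilate_kernel(kernel, dilation):
    """Spread a square kernel's weights over its dilated footprint, zeros elsewhere."""
    size = effective_kernel_size(kernel.shape[0], dilation)
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::dilation, ::dilation] = kernel
    return out

kernel = np.ones((3, 3))
dilated = dilate_kernel(kernel, dilation=3)
print(dilated.shape)                                  # (7, 7)
print(int(np.count_nonzero(dilated)))                 # 9 learned weights
print(int(dilated.size - np.count_nonzero(dilated)))  # 40 zero positions
```

Real implementations do not materialise the zeros; they simply read the input at strided offsets, which is why the cost stays that of a 3x3 filter.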
EXAMPLE – MUSIC
• MagnaTagATune dataset: 200 hours; each 29-second clip is annotated with tags (genre, instrumentation, tempo, volume and mood of the music)
• YouTube piano dataset: 60 hours of solo piano music
• Enlarging the receptive field was crucial to obtain samples that sounded musical
• Conditional music models: generate music given a set of tags specifying e.g. genre or instruments
EXAMPLE – Multi-speaker speech generation
• English multi-speaker corpus from the CSTR Voice Cloning Toolkit (VCTK): 44 hours from 109 different speakers
• Not conditioned on text
• Generates non-existent but human language-like words in a smooth way, with realistic-sounding intonations
• Lack of long-range coherence
• Limited receptive field size (about 300 ms)
• Powerful enough to capture the characteristics of all 109 speakers
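To relate a receptive field measured in samples to the "about 300 ms" figure, the sketch below converts stacks of dilations 1, 2, …, 512 into milliseconds. The slide does not give the layer configuration used for this experiment, so the stack counts, the filter width of 2, and the 16 kHz sample rate are all assumptions chosen only to show the arithmetic.

```python
def receptive_field_ms(num_stacks, sample_rate=16000, kernel_size=2, max_exp=9):
    """Receptive field in milliseconds for repeated stacks of dilations 1..2**max_exp."""
    per_stack = (kernel_size - 1) * sum(2 ** i for i in range(max_exp + 1))
    samples = 1 + num_stacks * per_stack
    return 1000.0 * samples / sample_rate

# Hypothetical stack counts, for illustration only.
for stacks in (3, 5):
    print(stacks, receptive_field_ms(stacks))
```

Either way the context is a fraction of a second, which is consistent with the lack of long-range coherence noted above.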
EXAMPLE – Text-To-Speech
• Google's TTS datasets (English: 24.6 h, Mandarin: 34.8 h)
• Locally conditioned on linguistic features derived from the input texts
• Evaluation
• Subjective paired comparison tests: listeners choose the sample they preferred
• Mean opinion score (MOS) on a five-point scale (1: bad, 2: poor, 3: fair, 4: good, 5: excellent)
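For reference, a MOS is simply the mean of the listeners' 1 to 5 ratings. The sketch below also reports a normal-approximation 95% interval; the function name and the ratings are hypothetical and are not taken from the paper's results.

```python
import math
import statistics

def mean_opinion_score(ratings, z=1.96):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    if not ratings or any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers from 1 to 5")
    mean = float(statistics.mean(ratings))
    if len(ratings) < 2:
        return mean, float("nan")
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width

# Hypothetical ratings for one synthesized utterance.
score, ci = mean_opinion_score([5, 4, 4, 3, 4])
print(score, ci)
```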
Conditional WaveNets (cont.)
• Global conditioning
  - a single latent representation h that influences the output distribution across all timesteps
• Local conditioning
  - a second time series h(t), possibly with a lower sampling frequency than the raw audio
  - upsampled to the audio resolution using a transposed convolution
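A rough sketch of how the two conditioning modes enter the gated unit, following the form in the WaveNet paper: a projection of h is added inside both the tanh and the sigmoid. Here `f` and `g` stand for the outputs of the filter and gate convolutions, and all names and shapes are assumptions. For local conditioning, the sketch swaps the learned transposed convolution for simple repetition across time (`np.repeat`), which is a stand-in and not the method named on the slide.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_unit_global(f, g, h, v_f, v_g):
    """f, g: (T, C); h: (H,) one vector per utterance; v_f, v_g: (H, C)."""
    # h @ v is a (C,) bias that broadcasts over every time step.
    return np.tanh(f + h @ v_f) * sigmoid(g + h @ v_g)

def upsample_repeat(h_local, factor):
    """Stand-in upsampling: repeat each low-rate frame `factor` times along time."""
    return np.repeat(h_local, factor, axis=0)

def gated_unit_local(f, g, y, v_f, v_g):
    """y: (T, H) conditioning series already at audio rate; v_f, v_g: (H, C)."""
    return np.tanh(f + y @ v_f) * sigmoid(g + y @ v_g)

rng = np.random.default_rng(0)
f, g = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
v_f, v_g = rng.normal(size=(2, 3)), rng.normal(size=(2, 3))
speaker = np.array([1.0, 0.0])                       # e.g. a one-hot speaker identity
frames = upsample_repeat(rng.normal(size=(3, 2)), 2)  # 3 low-rate frames -> 6 steps
print(gated_unit_global(f, g, speaker, v_f, v_g).shape)
print(gated_unit_local(f, g, frames, v_f, v_g).shape)
```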
Dilated Convolution
Optimizer details
Build model 1
Optimizer
Compute – receptive field
Build model 2
Ref
• http://deepsound.io/wavenet_first_try.html
• --keras
• https://github.com/basveeling/wavenet
• --tensorflow
• https://github.com/ibab/tensorflow-wavenet
• https://tensorflow.blog/2016/09/09/wavenet-deepminds-new-model-for-audio/
• https://deepmind.com/blog/wavenet-generative-model-raw-audio/
• https://www.youtube.com/watch?v=nsrSrYtKkT8
• --github
• https://github.com/usernaamee/keras-wavenet
• https://github.com/munich-ai-labs/keras2-wavenet
• https://github.com/rampage644/wavenet
• http://www.modulabs.co.kr/DeepLAB_Paper/16552 (모두의 연구소) [PR12] WaveNet - A Generative Model for Raw Audio
• http://www.modulabs.co.kr/DeepLAB_Paper/15027 (모두의 연구소)

how to understand and implement the "WAVENET"

  • 2. Introduction ๏ต Raw audio generation ๏ต WaveNet: very high temporal resolution (16,000 samples)
  • 3. Contributions ๏ต Generate raw speech signals ๏ต New architectures based on dilated causal convolutions ๏ต Single model can be used to generate different voices, conditioned on a speaker identity
  • 4. Comment of Wavenet WaveNet: deep generative model of audio data that operate directly at the waveform level Dilated convolution โ€ข exponentially increase the receptive field โ€ข to model the long-range temporal dependencies Conditioned model with global or local way
  • 5. Causal convolutions โ€ข Causal convolutions (cannot violate the ordering) โ€ข Same concept of the masked convolution โ€ข No recurrent connections
  • 6. Dilated causal convolutions โ€ข Dilated causal convolutions โ€ข Efficiently increase the receptive field โ€ข 1, 2, 4, โ€ฆ, 512, 1, 2, 4, โ€ฆ, 512, 1, 2, 4, โ€ฆ, 512
  • 7. Softmax distribution โ€ข Raw audio: a sequence of 16-bit int. value / time step โ€ข Softmax layer: output 65,536 probabilities - law companding trasformation โ€ข Quantize it to 256 possible values
  • 8. Softmax distribution conditional probability ๋ฅผ modelingํ•˜๋Š”๋ฐ ์žˆ์–ด์„œ, softmax distributions ์„ ์‚ฌ์šฉํ•จ. Audio ์‹ ํ˜ธ๋Š” 16bit๋กœ quantization ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Œ ์ด๊ฑธ softmax๋กœ ํ‘œํ˜„ํ•˜๋ ค๋ฉด sample๋งˆ๋‹ค 65536 ๊ฐœ์˜ output์ด ํ•„์š”. (๋„ˆ๋ฌด ๋งŽ๋‹ค.) mu-law companding ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉ. ์‚ฌ๋žŒ์˜ ๊ท€๋Š” ์†Œ๋ฆฌ ํฌ๊ธฐ๊ฐ€ ์ž‘์„ ๋•Œ๋Š” ์ž‘์€ ๋ณ€ํ™”์—๋„ ๋ฏผ๊ฐ ์†Œ๋ฆฌ ํฌ๊ธฐ๊ฐ€ ํด ๋•Œ๋Š” ๋น„๊ต์  ํฐ ๋ณ€ํ™”์—๋„ ๋‘”๊ฐํ•จ. quantization์„ nonlinearํ•˜๊ฒŒ ํ•ด์คŒ. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด 8bit(256 outputs)๋กœ๋„ ๊ฝค ์ข‹์€ ์„ฑ๋Šฅ ์œผ๋กœ encoding/decoding์ด ๊ฐ€๋Šฅ
  • 14. Joint probability • Waveform x = {x_1, …, x_T} with joint probability p(x) = ∏_{t=1}^{T} p(x_t | x_1, …, x_{t-1}) • The conditional probability distribution is modelled by a stack of convolutional layers (similarly to PixelCNN) • No pooling • Dimensionality of input = dimensionality of output • Output: softmax layer
  • 15. Dilation layer: background • The key application the dilated convolution authors have in mind is dense prediction: vision applications where the predicted object has a similar size and structure to the input image. For example, semantic segmentation with one label per pixel; image super-resolution, denoising, demosaicing, bottom-up saliency, keypoint detection, etc.
  • 16. Dilation layer • In many such applications one wants to integrate information from different spatial scales and balance two properties: (1) local, pixel-level accuracy, such as precise detection of edges, and (2) knowledge of the wider, global context. • To address this problem, people often use some kind of multi-scale convolutional neural network, which often relies on spatial pooling. • Instead, the authors propose using layers of dilated convolutions, which address the multi-scale problem efficiently without increasing the number of parameters too much.
  • 17. Dilation layer • In the visual system, receptive fields are volumes in visual space; receptive field = center + surround • Dilated conv = atrous conv (from the French à trous, "with holes") • The convolution uses only the pixels at the red-dot positions in the figure. • The receptive field can be enlarged without losing resolution. • It is called atrous conv because, within the whole receptive field, only the red-dot positions carry coefficients and all the rest are filled with zeros.
  • 19. Dilation layer: advantages • 1. A large receptive field normally requires many parameters, but with dilated conv the receptive field grows while the number of parameters does not, which is a clear win in terms of computation. • 2. For a 7x7 receptive field, a normal filter needs 49 parameters; with dilated conv only the red-dot positions among those 49 carry parameters and the remaining 40 are zeros, so the compute cost is the same as for a 3x3 filter. • 3. Because the receptive field grows, adjusting the dilation factor makes it possible to handle a variety of scales. Extracting information at different scales requires a wide receptive field, which dilated conv provides without difficulty; conventional CNNs instead enlarged the receptive field by shrinking the feature map with pooling layers before convolving.
  • 24. EXAMPLE – MUSIC • MagnaTagATune dataset: 200 hours; each 29-second clip is annotated with tags (genre, instrumentation, tempo, volume and mood of the music) • YouTube piano dataset: 60 hours of solo piano music • Enlarging the receptive field was crucial to obtain samples that sounded musical • Conditional music models: generate music given a set of tags specifying e.g. genre or instruments
  • 25. EXAMPLE – Multi-speaker speech generation • English multi-speaker corpus from the CSTR Voice Cloning Toolkit (VCTK): 44 hours from 109 different speakers • Not conditioned on text • Generates non-existent but human language-like words in a smooth way, with realistic-sounding intonation • Lacks long-range coherence because of the limited receptive field size (about 300 ms) • A single model is powerful enough to capture the characteristics of all 109 speakers
  • 26. EXAMPLE – Text-To-Speech • Google's TTS dataset (English: 24.6 h, Mandarin: 34.8 h) • Locally conditioned on linguistic features derived from the input texts • Evaluation: subjective paired comparison tests (listeners choose the sample they prefer) and mean opinion score (MOS) (1: bad, 2: poor, 3: fair, 4: good, 5: excellent)
  • 27. (1: bad, 2: poor, 3: fair, 4: good, 5: excellent)
  • 29. Conditional WaveNets (cont.) • Global conditioning: a single h influences the output distribution across all timesteps • Local conditioning: a second time series with a lower sampling frequency than the raw audio, upsampled using a transposed convolution
  • 30. + Global Conditioning is characterized by a single latent representation h that influences the output distribution across all timesteps + For Local Conditioning, we have a second timeseries h(t), possibly with a lower sampling frequency
  • 37. Ref • http://deepsound.io/wavenet_first_try.html • Keras: https://github.com/basveeling/wavenet • TensorFlow: https://github.com/ibab/tensorflow-wavenet • https://tensorflow.blog/2016/09/09/wavenet-deepminds-new-model-for-audio/ • https://deepmind.com/blog/wavenet-generative-model-raw-audio/ • https://www.youtube.com/watch?v=nsrSrYtKkT8 • GitHub: https://github.com/usernaamee/keras-wavenet • https://github.com/munich-ai-labs/keras2-wavenet • https://github.com/rampage644/wavenet • http://www.modulabs.co.kr/DeepLAB_Paper/16552 (모두의 연구소) • [PR12] WaveNet - A Generative Model for Raw Audio • http://www.modulabs.co.kr/DeepLAB_Paper/15027 (모두의 연구소)
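
Slides 7 and 8 describe μ-law companding only in words. As a rough illustration, the sketch below uses the formula from the WaveNet paper, f(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255, and then maps the result to 256 integer classes. The function names `mu_law_encode` and `mu_law_decode` and the rounding scheme are assumptions made for this example, not code from the slides or from any of the referenced repositories.

```python
import math

MU = 255  # 8-bit mu-law gives MU + 1 = 256 quantization levels


def mu_law_encode(x, mu=MU):
    """Map one sample in [-1, 1] to an integer class index in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    # Companding: logarithmic compression of the magnitude, sign preserved.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest level.
    return int(math.floor((compressed + 1.0) / 2.0 * mu + 0.5))


def mu_law_decode(index, mu=MU):
    """Approximate inverse: class index in [0, mu] back to a float in [-1, 1]."""
    compressed = 2.0 * index / mu - 1.0
    magnitude = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(magnitude, compressed)


if __name__ == "__main__":
    for sample in (-1.0, -0.5, -0.01, 0.0, 0.01, 0.5, 1.0):
        index = mu_law_encode(sample)
        print(sample, index, round(mu_law_decode(index), 4))
```

Because the compression is logarithmic, quiet samples are spread over many more of the 256 levels than loud ones, which matches the remark on slide 8 that the ear is more sensitive to small changes at low volume.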

Editor's Notes

  1. PixelRNN: neural autoregressive generative models can model complex distributions (e.g. images and text) over thousands of random variables (e.g. 64x64 pixels)
  2. In this paper, softmax distributions are used to model the conditional probability. • Audio signals are often quantized to 16 bits; representing that with a softmax would require 65,536 outputs per sample (too many). • μ-law companding is used instead. The human ear is sensitive to small changes when the sound is quiet and relatively insensitive even to fairly large changes when it is loud → the quantization is made nonlinear. This way, 8 bits (256 outputs) are enough for fairly good encoding/decoding quality.
  3. http://www.inference.vc/dilated-convolutions-and-kronecker-factorisation/
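
The dilation schedule on slide 6 and the "no extra parameters" argument on slide 19 can be checked with a few lines of code. The sketch below is a minimal pure-Python illustration, assuming a filter width of 2 per layer (as in the figures of the WaveNet paper) and zero padding on the left; `receptive_field` and `causal_dilated_conv` are hypothetical helper names, not functions from any of the referenced implementations.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input samples that can influence one output sample."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)


def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution with holes between the filter taps.

    weights[0] multiplies the current sample and weights[k] the sample
    k * dilation steps in the past; samples before t = 0 count as zero,
    so an output never depends on future inputs.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Dilation schedule from slide 6: 1, 2, 4, ..., 512, repeated three times.
    schedule = [2 ** i for i in range(10)] * 3
    print(receptive_field(schedule))  # 3070 samples

    # Push an impulse through three layers with dilations 1, 2, 4.
    x = [1.0] + [0.0] * 15
    for d in (1, 2, 4):
        x = causal_dilated_conv(x, [1.0, 1.0], d)
    print(sum(1 for v in x if v != 0.0))  # 8 samples are reached
```

Each layer here has only two weights regardless of its dilation, yet the reach doubles with every layer. The experiments in the slides may use different stack configurations, so the 3070-sample figure applies only to the slide 6 schedule under the assumed filter width.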