Neural Speech Synthesis with
WaveNet and WaveNet 2
Grant Reaber
Head of Research, Respeecher
gr@respeecher.com
Why WaveNet?
WaveNet, and similar models like SampleRNN, are the only machine
learning models that directly generate the audio waveform (PCM).
The quality they produce is unmatched (the best-quality text-to-speech of any
system).
These models are an essential piece of any truly state-of-the-art system that
needs to generate sound (though impressive results are also possible with
other techniques)
These systems can be used to produce various kinds of audio, but our focus
will be on speech
Autoregressive Models
WaveNet (and SampleRNN) learn the distribution of each audio sample
conditional on all that have come before.
In symbols, the joint probability of a waveform $x_1, \dots, x_T$ is modeled as
$$p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})$$
Training and Generation
In training, we learn to predict each sample in a piece of audio given all that
have come before.
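Because WaveNet is convolutional, we can do this in parallel for a sequence
of samples, feeding the ground-truth past samples as input (teacher forcing).
(With RNNs, the hidden state has to be computed one step at a time, so
training cannot be parallelized over time in this way.)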
We generate audio sample by sample. Suppose we have generated the first
n samples. Then we draw from the predicted distribution for the (n+1)th
sample conditional on these n samples. Now we can compute a predicted
distribution for the (n+2)th sample conditional on these n+1 samples.
We can’t do this in parallel, and because of this the original WaveNet
required minutes to generate a second of audio and was impractical for
most applications.
Sampling from the distribution works much better than using beam search
to find a high-likelihood sequence, as is commonly done in machine
translation.
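The sample-by-sample generation loop described above can be sketched as follows. This is only an illustration: `toy_next_sample_distribution` is a hypothetical stand-in for a trained WaveNet (it just prefers values near the previous sample), and the 256 output bins are an assumption taken from the 8-bit representation discussed later in the talk.

```python
import random


def toy_next_sample_distribution(history, num_bins=256):
    """Hypothetical stand-in for a trained model.

    Returns a probability for each of num_bins possible next values,
    given all samples generated so far.
    """
    center = history[-1] if history else num_bins // 2
    weights = [1.0 / (1.0 + (b - center) ** 2) for b in range(num_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, rng):
    """Generate samples one at a time, each drawn from the predicted distribution."""
    samples = []
    for _ in range(num_samples):
        probs = toy_next_sample_distribution(samples)
        # Draw from the distribution rather than taking the most likely bin.
        next_sample = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        samples.append(next_sample)
    return samples


audio = generate(100, random.Random(0))
```

Each iteration needs the previous iteration's output, which is why this loop cannot be run in parallel over time.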
Conditional Autoregressive Models
Running generation by itself produces a kind of babbling.
For most applications, for instance text-to-speech, we want to control the
generated audio.
This can be done by training a conditioned model, where we condition on
some linguistic features derived from the input text. The same kind of
conditions can then be supplied at generation time to produce the speech we want.
We can also supply “global” (not changing in time) conditions for things like
speaker identity to produce speech from many different speakers with one
model.
Modeling a Sample
WaveNet uses a 256-bin softmax to represent audio (8-bit sample depth,
using mu-law encoding to have smaller bins near zero).
Training is slow at first, and it doesn’t scale to higher sample depth.
So WaveNet 2 uses a discretized mixture of logistic distributions instead (10
components according to the Tacotron 2 paper).
WaveNet 2 is functionally identical to WaveNet except for this change,
modeling 24kHz audio instead of 16kHz, and one hyperparameter change to
increase the receptive field, which we will mention later.
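As a rough illustration of the 8-bit mu-law representation mentioned above, here is a minimal sketch of the standard mu-law companding formula with mu = 255. The function names and the rounding scheme are assumptions for this example, not taken from the WaveNet code.

```python
import math

MU = 255  # 256 bins in total, indexed 0..255


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer bin index in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(index, mu=MU):
    """Map a bin index back to an approximate sample value in [-1, 1]."""
    compressed = 2.0 * index / mu - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


# Bins are narrow near zero and wide near full scale.
width_near_zero = mu_law_decode(129) - mu_law_decode(128)
width_near_one = mu_law_decode(255) - mu_law_decode(254)
```

Comparing `width_near_zero` with `width_near_one` shows the effect the slide describes: quiet samples are quantized much more finely than loud ones.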
Why WaveNet 2?
Although WaveNet 2 does make some very minor tweaks to the
architecture, which we have just discussed, by far its main contribution is a
technique to speed up generation by about 3000x.
20x realtime generation
Interesting technique to do this: first train a regular WaveNet, then use it to
train a model that produces audio in parallel rather than sample by sample.
The second, “distilled” model produces output that is just as high quality as
the original model’s output.
Modeling Speech (and other audio)
Log mel-scale magnitude spectrograms seem to compactly represent all the
information necessary to represent speech (cf. Tacotron 2)
80 channels x 80 Hz (vs. 1 channel x 16-44 kHz for PCM)
Computed algorithmically from PCM
Lossy, especially because phase information is discarded
Can be inverted with Griffin-Lim
But WaveNet does a better job of the inversion. It is used this way in
Tacotron 2, currently the best text-to-speech system.
Note that when WaveNet is inverting a generated spectrogram, as with
Tacotron 2, it can actually learn to correct errors in that spectrogram. (The
loss function does not enforce that the spectrogram inversion is correct,
only that the whole transformation is.)
Convolutions
Convolutions are parallelizable in training, and they respect the structure of
the sequence.
But they have two problems.
For autoregression, it is critical not to allow the model to see input from
the future because it won’t be available at generation time (and it would be
trivial for the model to predict the next sample if it could see it!)
The receptive field grows slowly with ordinary convolutions: after n
convolutions of width w, it is only on the order of nw. If, say, the audio is
16kHz, and we want 60ms of context, we need a receptive field of about
1000 samples.
Dilated Causal Convolutions
The looking-into-the-future problem is fixed by just shifting the windows of
the convolutions so they only see the past. (The fancy term for this is
“causal” convolution.)
The small receptive field problem is fixed by using discontinuous windows.
If we give every nth sample as input to the convolutional kernel, we say that
the convolution has a dilation factor of n (so ordinary convolutions have a
dilation factor of 1).
In WaveNet, successive convolutional layers have exponentially growing
dilation factors: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512.
So with 10 layers and window size 2, we can get a receptive field of 1024
samples.
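To make the causal, dilated convolution concrete, here is a minimal pure-Python sketch. It is deliberately slow and single-channel; the all-ones weights and the impulse test are assumptions chosen only to make the reach of the stack visible.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of x count as zero, so y[t] never depends on
    anything later than x[t] (this is the "causal" part).
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * x[idx]
        y.append(total)
    return y


def stack_output(x, dilations, weights=(1.0, 1.0)):
    """Apply one causal convolution (window size 2 by default) per dilation factor."""
    for d in dilations:
        x = causal_dilated_conv(x, weights, d)
    return x


# Feed in a single impulse and see which output positions it influences.
dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512
impulse_position = 5
signal = [0.0] * 1100
signal[impulse_position] = 1.0
response = stack_output(signal, dilations)
influenced = [t for t, v in enumerate(response) if v != 0.0]
```

No output before the impulse is affected, and the impulse is felt at 1024 consecutive output positions, which is the receptive field quoted on the slide.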
Convolution Hyperparameters
WaveNet used 30 layers of convolutions, three stacks of 10 convolutions
with dilation going from 1 to 512.
WaveNet 2 uses the same hyperparameters except that it uses a window
size of 3 instead of 2. (This is the “one hyperparameter change” mentioned
earlier.)
(DeepMind hasn’t released code and is not entirely explicit about its
hyperparameters, so this isn’t certain.)
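Taking the hyperparameters on this slide at face value (they are uncertain, as noted), the resulting receptive fields can be worked out with a short calculation. The formula assumes every layer uses the same window size and that the three stacks are simply applied one after another.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of a stack of dilated convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


one_stack = [2 ** i for i in range(10)]  # dilations 1 .. 512
three_stacks = one_stack * 3             # 30 layers

rf_wavenet = receptive_field(2, three_stacks)   # window size 2
rf_wavenet2 = receptive_field(3, three_stacks)  # window size 3

seconds_wavenet = rf_wavenet / 16000    # at 16 kHz
seconds_wavenet2 = rf_wavenet2 / 24000  # at 24 kHz
```

Under these assumptions the window-size change roughly doubles the receptive field in samples, which more than compensates for the move from 16 kHz to 24 kHz audio.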
The Rest of the Architecture
(Note on the architecture diagram: there should actually be separate
convolutions for the residual and skip connections, and for the tanh and
sigmoid gates.)
Best guess hyperparameters
The paper is scant on details about hyperparameters, but here are best
guesses based on a talk by an insider (see
https://github.com/ibab/tensorflow-wavenet/issues/227).
256 skip channels
512 dilation channels
512 residual channels
Conditioning
Global and local conditions are projected to the number of dilation
channels and added to the outputs of the filter and gate convolutions of the
gated activation units (before activation).
Separate projections are used for filter and gate, for global and local, and for
each layer (so a 30-layer WaveNet with both types of conditioning uses 120
projection matrices).
The local condition is often at a lower time resolution than the audio that
WaveNet models. If so, it is upsampled to the audio sample rate using
transposed convolutions.
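Written out, the conditioned gated activation unit for layer $k$ looks roughly like this. The WaveNet paper gives the global and local cases separately; combining them in one expression, and the symbol $U$ for the local projections, are assumptions made here for compactness.

```latex
z = \tanh\left(W_{f,k} * x + V_{f,k}^{T} h + U_{f,k} * y\right)
    \odot
    \sigma\left(W_{g,k} * x + V_{g,k}^{T} h + U_{g,k} * y\right)
```

Here $x$ is the layer input, $h$ is the global condition, $y$ is the upsampled local condition, $*$ is convolution (1x1 for the local projections), $\odot$ is element-wise multiplication, and $\sigma$ is the sigmoid. Each layer has four projections ($V_{f,k}$, $V_{g,k}$, $U_{f,k}$, $U_{g,k}$), which gives $30 \times 4 = 120$ for a 30-layer network.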
WaveNet 2
The rest of the talk will be about how WaveNet 2 is able to generate
samples 3000x faster than the original WaveNet.
The basic idea is to first train a WaveNet and then use it to train a student
network.
The student network takes noise $z_1, \dots, z_T$, drawn from a standard logistic
distribution, as input, and outputs, for each timestep $t$, parameters $s_t$ and $\mu_t$,
which are computed only from $z_1, \dots, z_{t-1}$, of a logistic distribution from
which $x_t$ will be drawn. The draw is controlled by $z_t$: $x_t = z_t s_t + \mu_t$.
The CDF of a logistic distribution with location $\mu$ and scale $s$ is
$$\frac{1}{1 + e^{-(x - \mu)/s}}$$
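The student's sampling step can be sketched as below. `toy_student_params` is a hypothetical placeholder for the student network (the real one is a WaveNet-style convolutional model); the only property it shares with the real thing is that $s_t$ and $\mu_t$ depend on earlier noise values only.

```python
import math
import random


def sample_standard_logistic(rng):
    """Draw from Logistic(0, 1) by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def toy_student_params(z_prefix):
    """Hypothetical stand-in for the student network.

    Returns (s_t, mu_t) computed only from z_1 .. z_{t-1}.
    """
    if not z_prefix:
        return 1.0, 0.0
    mean_prev = sum(z_prefix) / len(z_prefix)
    s_t = math.exp(0.1 * math.tanh(mean_prev))  # scale must be positive
    mu_t = 0.5 * z_prefix[-1]
    return s_t, mu_t


def student_transform(z):
    """Map noise z_1..z_T to samples x_t = z_t * s_t + mu_t."""
    x = []
    for t, z_t in enumerate(z):
        s_t, mu_t = toy_student_params(z[:t])
        x.append(z_t * s_t + mu_t)
    return x


rng = random.Random(0)
noise = [sample_standard_logistic(rng) for _ in range(8)]
audio = student_transform(noise)
```

Because all of the noise is available up front, no iteration of the loop in `student_transform` needs the result of another, so every timestep could be computed at once. That independence is where the speedup comes from.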
The Primary Training Objective
The student is trained to minimize the KL divergence from it to the teacher
WaveNet, $D_{KL}(P_S \| P_T) = H(P_S, P_T) - H(P_S)$.
It is trying to generate samples that the teacher WaveNet considers likely,
but at the same time it is trying to maximize its own entropy (so it will not
collapse to a mode of the teacher).
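A tiny discrete example may help show why the entropy term matters. The three-outcome distributions below are made up for illustration and have nothing to do with real audio.

```python
import math


def entropy(p):
    """H(P) = -sum p log p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum p log q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(P || Q) = sum p log(p / q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [0.5, 0.3, 0.2]
spread_student = [0.4, 0.4, 0.2]         # covers the teacher's distribution
collapsed_student = [0.98, 0.01, 0.01]   # sits on the teacher's mode

kl_spread = kl_divergence(spread_student, teacher)
kl_collapsed = kl_divergence(collapsed_student, teacher)
```

The collapsed student has the lower cross-entropy (it only emits what the teacher likes best), but its entropy is so low that its KL divergence ends up much higher than that of the spread-out student.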
For reasons of time, I will skip the procedure for estimating cross-entropy.
$H(P_S)$ can be simply estimated. Since the entropy of a logistic distribution
with scale parameter $s$ is $\ln s + 2$,
$$H(P_S) = \mathbb{E}_{z \sim L(0,1)}\left[\sum_{t=1}^{T} \ln s_t\right] + 2T$$
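The claim that a logistic distribution with scale $s$ has entropy $\ln s + 2$ can be checked numerically. The sketch below integrates $-p \ln p$ with a simple midpoint rule; the integration range and step count are arbitrary choices, and `student_entropy_estimate` is a hypothetical helper showing the estimate for a single draw of $z$.

```python
import math


def logistic_log_pdf(x, mu, s):
    """Log density of Logistic(mu, s): log of e^{-y} / (s * (1 + e^{-y})^2)."""
    y = (x - mu) / s
    return -y - math.log(s) - 2.0 * math.log1p(math.exp(-y))


def logistic_entropy_numeric(mu, s, half_width=40.0, steps=20000):
    """Approximate -integral of p(x) ln p(x) dx with the midpoint rule."""
    lo = mu - half_width * s
    dx = 2.0 * half_width * s / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        log_p = logistic_log_pdf(x, mu, s)
        total -= math.exp(log_p) * log_p * dx
    return total


def student_entropy_estimate(scales):
    """Entropy estimate from the scales s_1..s_T produced for one noise draw."""
    return sum(math.log(s) for s in scales) + 2.0 * len(scales)


numeric = logistic_entropy_numeric(mu=0.3, s=0.7)
closed_form = math.log(0.7) + 2.0
```

The location $\mu$ does not affect the entropy, which is why only the scales appear in the estimate.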
Power Loss
It turns out that minimizing the KL divergence alone produces a student that
just produces low-volume audio not resembling speech (perhaps resembling
whispering). This may be because whispering has a lot of entropy compared
to speech.
It is necessary to use an additional loss term that ensures that the power in
different frequency bands of the generated speech is on average about the
same as in human speech.
The loss is $\|\phi(g(z, c)) - \phi(y)\|^2$, where $\phi(x) = |\mathrm{STFT}(x)|^2$.
(Here $g(z, c)$ is the student's output for noise $z$ and condition $c$, and $y$ is
the corresponding real audio.)
It is really only average power that matters, not having the right power over
time, since averaging over time before computing the difference in the loss
didn’t make any noticeable difference to the results.
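A minimal sketch of the time-averaged variant of this loss follows. It uses a naive DFT with a rectangular window so that it needs only the standard library; the frame length, hop size, and lack of windowing are assumptions, and a real implementation would use an FFT library.

```python
import cmath
import math


def power_spectrogram(x, frame_len=64, hop=32):
    """Return |STFT(x)|^2 as a list of frames, each a list of per-bin powers."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames


def average_power(spec):
    """Average the power in each frequency bin over time."""
    return [sum(frame[k] for frame in spec) / len(spec) for k in range(len(spec[0]))]


def power_loss(generated, reference):
    """Squared difference between time-averaged power spectra."""
    pg = average_power(power_spectrogram(generated))
    pr = average_power(power_spectrogram(reference))
    return sum((a - b) ** 2 for a, b in zip(pg, pr))


reference = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
quiet = [0.1 * v for v in reference]  # same shape, much lower volume
loss_same = power_loss(reference, reference)
loss_quiet = power_loss(quiet, reference)
```

A low-volume output like `quiet` is penalized even though its waveform has the right shape, which is the failure mode this loss term is meant to prevent.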
Other Losses
With the KL loss and power loss, the student is already pretty good, but for
the best results, two other losses helped.
They used a perceptual loss similar to the style loss used in style transfer
but with features extracted by a WaveNet-like classifier trained to predict
phones from raw audio.
And they used a contrastive loss that penalizes the student for producing
samples that have high likelihood regardless of the condition.
Multiple Flows
Another thing that was not strictly necessary but improved the final result
was to chain several student models together with the output of one fed to
the input of the next.
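The final distribution is still logistic, because each flow is an affine
transformation of its input: the overall scale is the product of the per-flow
scales, and the overall location is the sum of the per-flow locations, each
multiplied by the scales of the flows that come after it.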
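For a single timestep, chaining flows amounts to composing affine maps, which can be checked directly. In this sketch the per-flow scales and locations are fixed, made-up numbers; in the real model each flow's parameters are computed by a network from its input.

```python
def compose_flows(z, flows):
    """Pass z through each flow in turn: x <- x * s + mu."""
    x = z
    for s, mu in flows:
        x = x * s + mu
    return x


def total_params(flows):
    """Collapse a chain of affine flows into one (s_tot, mu_tot)."""
    s_tot = 1.0
    mu_tot = 0.0
    for s, mu in flows:
        mu_tot = mu_tot * s + mu  # earlier locations get scaled by later flows
        s_tot *= s
    return s_tot, mu_tot


flows = [(2.0, 1.0), (0.5, -3.0), (4.0, 0.25)]  # (scale, location) per flow
z = 0.7
chained = compose_flows(z, flows)
s_tot, mu_tot = total_params(flows)
collapsed = z * s_tot + mu_tot
```

`chained` and `collapsed` agree, so the output of the whole chain is a single affine function of the logistic noise `z` and is therefore still logistic.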
Architecture of the Student
The student network is a WaveNet except that it doesn’t have skip
connections. (They don’t give the reason for this change.)
Questions?
Thanks for attending!
I hope we have time for a few questions
If you are interested in developing models like WaveNet and WaveNet 2,
Respeecher is hiring! Talk to me or our CEO Alex Serdiuk at the
conference, or send us a message on our web site, respeecher.com.

 
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Lviv Startup Club
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Lviv Startup Club
 
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Lviv Startup Club
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Lviv Startup Club
 
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Lviv Startup Club
 
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Lviv Startup Club
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Lviv Startup Club
 
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Lviv Startup Club
 
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Lviv Startup Club
 
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Lviv Startup Club
 
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Lviv Startup Club
 
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Lviv Startup Club
 
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Lviv Startup Club
 
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Lviv Startup Club
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Lviv Startup Club
 
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Lviv Startup Club
 

More from Lviv Startup Club (20)

Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
 
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
 
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
 
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
 
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
 
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
 
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
 
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
 
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
 
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
 
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
 
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
 
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
 
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
 
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
 

Recently uploaded

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Recently uploaded (20)

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 

Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neural nets”

  • 1. Neural Speech Synthesis with WaveNet and WaveNet 2. Grant Reaber, Head of Research, Respeecher, gr@respeecher.com
  • 2. Why WaveNet? WaveNet, and similar models like SampleRNN, are the only machine learning models that directly generate the audio waveform (PCM). The quality they produce is unmatched (the best-quality text-to-speech of any system). These models are an essential piece of any truly state-of-the-art system that needs to generate sound (though impressive results are also possible with other techniques). These systems can be used to produce various kinds of audio, but our focus will be on speech.
  • 3. Autoregressive Models: WaveNet (and SampleRNN) learn the distribution of each audio sample conditional on all that have come before. In symbols, the joint probability of a waveform $x_1, \ldots, x_T$ is modeled as $p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})$.
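The factorization on the autoregressive-models slide can be sketched in a few lines. This is a minimal illustration, not WaveNet itself: `conditional_probs` is a hypothetical stand-in for the network, and the four-level alphabet is an assumption made to keep the example small.

```python
import math


def conditional_probs(history, num_levels=4):
    """Toy stand-in for the network: p(x_t | x_1..x_{t-1}) over num_levels values.

    Hypothetical rule: repeating the previous sample is three times as likely
    as any other value; the first sample is uniform.
    """
    if not history:
        return [1.0 / num_levels] * num_levels
    weights = [3.0 if v == history[-1] else 1.0 for v in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(waveform, num_levels=4):
    """log p(x_1..x_T) as the sum of the conditional log-probabilities."""
    total = 0.0
    for t, x_t in enumerate(waveform):
        probs = conditional_probs(waveform[:t], num_levels)
        total += math.log(probs[x_t])
    return total
```

Because every conditional is a proper distribution, the joint probabilities of all waveforms of a given length sum to one, which is what makes the product a valid model of the whole waveform.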
  • 4. Training and Generation: In training, we learn to predict each sample in a piece of audio given all that have come before. Because WaveNet is convolutional, we can do this in parallel for a sequence of samples. (With RNNs, you could not do this and would use teacher forcing.) We generate audio sample by sample. Suppose we have generated the first n samples. Then we draw from the predicted distribution for the (n+1)th sample conditional on these n samples. Now we can compute a predicted distribution for the (n+2)th sample conditional on these n+1 samples. We can’t do this in parallel, and because of this the original WaveNet required minutes to generate a second of audio and was impractical for most applications. Sampling from the distribution works much better than using beam search to find a high-likelihood sequence, as is commonly done in machine translation.
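The sample-by-sample generation loop described on the training-and-generation slide might look like the sketch below. `predict_distribution` is a hypothetical placeholder for a trained model, and the four-level alphabet is an assumption. What matters is the data dependency: each draw needs all previous draws.

```python
import random


def predict_distribution(history, num_levels=4):
    """Hypothetical placeholder for the trained network's predicted distribution."""
    if not history:
        return [1.0 / num_levels] * num_levels
    weights = [3.0 if v == history[-1] else 1.0 for v in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0, num_levels=4):
    """Draw samples one at a time, feeding each draw back in as context."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = predict_distribution(samples, num_levels)
        # Sample from the predicted distribution rather than taking the argmax.
        x = rng.choices(range(num_levels), weights=probs, k=1)[0]
        samples.append(x)
    return samples
```

The loop body cannot start for sample n+2 until sample n+1 exists, which is why this procedure cannot be parallelized over time.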
  • 5. Conditional Autoregressive Models: Running generation by itself produces a kind of babbling. For most applications, for instance text-to-speech, we want to control the generated audio. This can be done by training a conditioned model, where we condition on some linguistic features derived from input texts. Conditions can then be supplied at generation time to produce whatever speech we like. We can also supply “global” (not changing in time) conditions for things like speaker identity, to produce speech from many different speakers with one model.
  • 6. Modeling a Sample: WaveNet uses a 256-bin softmax to represent audio (8-bit sample depth, using “mu encoding” to have smaller bins near zero). Training is slow at first, and the approach doesn’t scale to higher sample depths. So WaveNet 2 uses a discretized mixture of logistic distributions instead (10 components, according to the Tacotron 2 paper). WaveNet 2 is functionally identical to WaveNet except for this change, modeling 24kHz audio instead of 16kHz, and one hyperparameter change to increase the receptive field, which we will mention later.
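The “mu encoding” of a sample into 256 bins can be sketched as follows. This assumes the standard mu-law companding formula with mu = 255 and input samples scaled to [-1, 1]; the function names and the rounding convention are illustrative choices, not taken from the talk.

```python
import math

MU = 255  # 256 bins, indexed 0..255


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer bin index in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(index, mu=MU):
    """Map a bin index back to an approximate sample value in [-1, 1]."""
    compressed = 2 * index / mu - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)
```

Decoding adjacent indices shows the effect the slide mentions: the gap between neighbouring bins near zero is far smaller than the gap between neighbouring bins near full scale.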
  • 7. Why WaveNet 2? Although WaveNet 2 does make some very minor tweaks to the architecture, which we have just discussed, by far its main contribution is a technique to speed up generation by about 3000x, giving 20x realtime generation. The technique is interesting: first train a regular WaveNet, then use it to train a model that produces audio in parallel rather than sample by sample. The second, “distilled” model produces output that is just as high quality as the original model’s output.
  • 8. Modeling Speech (and other audio): Log mel-scale magnitude spectrograms seem to compactly represent all the information necessary to represent speech (cf. Tacotron 2). They have 80 channels x 80Hz (vs. 1 channel x 16-44kHz for PCM). They are computed algorithmically from PCM. They are lossy, especially because phase information is discarded. They can be inverted with Griffin-Lim, but WaveNet does a better job. They are used in Tacotron 2, currently the best text-to-speech system. Note that when WaveNet is inverting a generated spectrogram, as with Tacotron 2, it can actually learn to correct errors in that spectrogram. (The loss function does not enforce that the spectrogram inversion is correct, only that the whole transformation is.)
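The figures on the spectrogram slide imply some simple arithmetic, sketched below. The helper names are made up for illustration; the 80-channel and 80Hz numbers come from the slide, and the 24kHz rate is the WaveNet 2 sample rate mentioned earlier.

```python
def spectrogram_values_per_second(num_channels=80, frame_rate_hz=80):
    """Number of spectrogram values produced per second of audio."""
    return num_channels * frame_rate_hz


def samples_per_frame(sample_rate_hz, frame_rate_hz=80):
    """How many PCM samples each spectrogram frame has to account for."""
    return sample_rate_hz / frame_rate_hz


print(spectrogram_values_per_second())  # 6400
print(samples_per_frame(24000))         # 300.0
```

So a vocoder such as WaveNet has to produce hundreds of waveform samples for every conditioning frame, which is why the conditioning signal must be upsampled to the audio rate.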
  • 9. Convolutions: Convolutions are parallelizable in training, and they respect the structure of the sequence. But they have two problems. First, for autoregression, it is critical not to allow the model to see input from the future, because it won’t be available at generation time (and it would be trivial for the model to predict the next sample if it could see it!). Second, the receptive field grows slowly with ordinary convolutions: after n convolutions of width w, it is only on the order of nw. If, say, the audio is 16kHz and we want 60ms of context, we need a receptive field of about 1000 samples (0.06 s x 16,000 samples/s = 960).
  • 10. Dilated Causal Convolutions: The looking-into-the-future problem is fixed by just shifting the windows of the convolutions so they only see the past. (The fancy term for this is “causal” convolution.) The small receptive field problem is fixed by using discontinuous windows. If we give every nth sample as input to the convolutional kernel, we say that the convolution has a dilation factor of n (so ordinary convolutions have a dilation factor of 1). In WaveNet, successive convolutional layers have exponentially growing dilation factors: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512. So with 10 layers and window size 2, we get a receptive field of 1024 samples.
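A causal dilated convolution is short enough to write out directly. The sketch below is a single-channel, pure-Python illustration with zero padding on the left; the function names and the all-ones kernel used in the demonstration are assumptions for the example, not WaveNet's learned weights.

```python
def causal_dilated_conv(signal, kernel, dilation):
    """output[t] = sum over k of kernel[k] * signal[t - k * dilation].

    Indices before the start of the signal count as zero, so output[t]
    never depends on any input later than t.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += weight * signal[idx]
        out.append(acc)
    return out


def dilated_stack(signal, dilations, kernel=(1.0, 1.0)):
    """Apply one causal convolution per dilation factor, in order."""
    for dilation in dilations:
        signal = causal_dilated_conv(signal, kernel, dilation)
    return signal


# Push a single impulse through 10 layers with dilations 1, 2, ..., 512.
impulse = [1.0] + [0.0] * 1099
response = dilated_stack(impulse, [2 ** i for i in range(10)])
print(sum(1 for v in response if v != 0.0))  # 1024
```

The impulse influences exactly 1024 output positions, which matches the receptive field quoted on the slide for 10 layers with window size 2.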
  • 11. Convolution Hyperparameters: WaveNet used 30 layers of convolutions: three stacks of 10 convolutions with dilation going from 1 to 512. WaveNet 2 uses the same hyperparameters except that it uses a window size of 3 instead of 2. (This is the “one hyperparameter change” mentioned earlier.) (DeepMind doesn’t release code and is not entirely explicit about their hyperparameters, though, so this isn’t certain.)
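The effect of these hyperparameters on the receptive field can be checked with a small calculation. It uses the standard formula for a stack of stride-1 convolutions, in which each layer adds (window - 1) x dilation samples of context. The window size of 3 for WaveNet 2 is the slide's best guess, not a confirmed value.

```python
def receptive_field(window, dilations):
    """Receptive field, in samples, of stacked stride-1 dilated convolutions."""
    return 1 + (window - 1) * sum(dilations)


ONE_STACK = [2 ** i for i in range(10)]   # 1, 2, 4, ..., 512
THREE_STACKS = ONE_STACK * 3              # 30 layers in total

print(receptive_field(2, ONE_STACK))      # 1024
print(receptive_field(2, THREE_STACKS))   # 3070
print(receptive_field(3, THREE_STACKS))   # 6139
print(receptive_field(2, [1] * 30))       # 31, for 30 ordinary convolutions
```

Under these assumptions, 3070 samples at 16kHz is roughly 190ms of context, and 6139 samples at 24kHz is roughly 255ms.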
  • 12. The Rest of the Architecture 12 (Actually, there should be separate convs for the residual and skip connections and for the tanh and sigmoid gates.)
  • 13. Best guess hyperparameters 13 The paper is scant on details about hyperparameters, but here are best guesses based on a talk by an insider (see https://github.com/ibab/tensorflow-wavenet/issues/227): 256 skip channels, 512 dilation channels, 512 residual channels.
  • 14. Conditioning 14 Global and local conditions are projected to the number of dilation channels and added to the outputs of the filter and gate convolutions of the gated activation units (before activation). There are separate projections for filter and gate, for global and local conditions, and for each layer (so a 30-layer WaveNet with both types of conditioning uses 120 projection matrices). The local condition often has a lower time resolution than the WaveNet. If so, it is upsampled to the WaveNet sampling rate using transposed convolutions.
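A minimal sketch of one conditioned gated activation unit, under stated assumptions: the dictionary keys, shapes, and random toy inputs are hypothetical, the projections are plain matrices, and the local condition is assumed to be already upsampled to the audio rate (the transposed convolution is left out).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_unit(filter_out, gate_out, h_global, h_local, proj):
    """Gated activation with conditioning added before the nonlinearities.

    filter_out, gate_out: (channels, T) outputs of the filter and gate convolutions.
    h_global: (g_dim,) one vector for the whole utterance.
    h_local: (l_dim, T) local condition, assumed already at the audio rate.
    proj: the four projection matrices belonging to this layer.
    """
    f = (filter_out
         + (proj["filter_global"] @ h_global)[:, None]  # broadcast over time
         + proj["filter_local"] @ h_local)
    g = (gate_out
         + (proj["gate_global"] @ h_global)[:, None]
         + proj["gate_local"] @ h_local)
    return np.tanh(f) * sigmoid(g)

rng = np.random.default_rng(0)
channels, T, g_dim, l_dim = 8, 16, 4, 5  # toy sizes, not the real hyperparameters
proj = {
    "filter_global": rng.normal(size=(channels, g_dim)),
    "filter_local": rng.normal(size=(channels, l_dim)),
    "gate_global": rng.normal(size=(channels, g_dim)),
    "gate_local": rng.normal(size=(channels, l_dim)),
}
out = gated_unit(rng.normal(size=(channels, T)), rng.normal(size=(channels, T)),
                 rng.normal(size=g_dim), rng.normal(size=(l_dim, T)), proj)

n_layers = 30
n_projections = n_layers * len(proj)  # 4 matrices per layer
```

Four matrices per layer over 30 layers gives the 120 projections mentioned on the slide.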
  • 15. WaveNet 2 15 The rest of the talk will be about how WaveNet 2 is able to generate samples 3000x faster than the original WaveNet. The basic idea is to first train a WaveNet and then use it to train a student network. The student network takes noise z_1,…,z_T, drawn from a standard logistic distribution, as input, and outputs, for each timestep t, parameters s_t and μ_t, which are computed only from z_1,…,z_{t-1}, of a logistic distribution from which x_t will be drawn. The draw is controlled by z_t: x_t = z_t s_t + μ_t. The logistic distribution with location μ and scale s has the cumulative distribution function $F(x) = \frac{1}{1 + e^{-(x - \mu)/s}}$.
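To make the sampling rule concrete, here is a toy sketch in which a made-up causal function stands in for the student network. `toy_student` and its formulas are purely hypothetical; the only properties carried over from the talk are that s_t and μ_t depend on z_1,…,z_{t-1} alone and that x_t = z_t s_t + μ_t.

```python
import numpy as np

def toy_student(z):
    """Hypothetical stand-in for the student: s_t and mu_t see only z_1..z_{t-1}."""
    z_prev = np.concatenate(([0.0], z[:-1]))                # position t reads z_{t-1}
    context = np.cumsum(z_prev) / np.arange(1, len(z) + 1)  # running summary of the past
    mu = 0.5 * context
    s = np.exp(-0.1 * np.abs(context))                      # scales must be positive
    return s, mu

rng = np.random.default_rng(0)
T = 16000                                     # one second at an assumed 16 kHz
z = rng.logistic(loc=0.0, scale=1.0, size=T)  # standard logistic noise

# All timesteps are computed at once: no sample waits for the previous one.
s, mu = toy_student(z)
x = z * s + mu
```

Because the whole noise vector is available up front, every x_t can be computed in parallel, which is where the speedup over sample-by-sample generation comes from.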
  • 16. The Primary Training Objective 16 The student is trained to minimize the KL divergence from it to the teacher WaveNet, D_KL(P_S || P_T) = H(P_S, P_T) - H(P_S). It is trying to generate samples that the teacher WaveNet considers likely, but at the same time it is trying to maximize its own entropy (so it will not collapse to a mode of the teacher). For reasons of time, I will skip the procedure for estimating the cross-entropy. H(P_S) can be estimated simply. Since the entropy of a logistic distribution with scale parameter s is ln s + 2, $H(P_S) = \mathbb{E}_{z \sim L(0,1)}\left[\sum_{t=1}^{T} \ln s_t\right] + 2T$.
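The closed form ln s + 2 can be checked numerically, and the entropy estimate then reduces to averaging log-scales. The sketch below is illustrative only: the particular s and μ values, the sample count, and the `student_entropy_estimate` helper are assumptions made for the example.

```python
import numpy as np

def logistic_log_pdf(x, mu, s):
    """Log density of the logistic distribution with location mu and scale s."""
    u = (x - mu) / s
    return -u - np.log(s) - 2.0 * np.logaddexp(0.0, -u)

# Monte Carlo entropy of one logistic versus the closed form ln s + 2.
rng = np.random.default_rng(0)
s, mu = 0.3, 1.5
samples = rng.logistic(loc=mu, scale=s, size=200_000)
mc_entropy = -np.mean(logistic_log_pdf(samples, mu, s))
closed_form = np.log(s) + 2.0

def student_entropy_estimate(scales):
    """Estimate H(P_S) from scales of shape (batch, T), one row per noise draw z."""
    T = scales.shape[1]
    return float(np.mean(np.sum(np.log(scales), axis=1)) + 2.0 * T)
```

Note that the estimate needs only the scales s_t the student already outputs, so no extra sampling from the student's output distribution is required.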
  • 17. Power Loss 17 It turns out that minimizing the KL divergence alone yields a student that just produces low-volume audio not resembling speech (perhaps resembling whispering). This may be because whispering has a lot of entropy compared to speech. It is necessary to use an additional loss term that ensures that the power in different frequency bands of the generated speech is on average about the same as in human speech. The loss is $\| \phi(g(z, c)) - \phi(y) \|^2$, where $\phi(x) = |\mathrm{STFT}(x)|^2$, g(z, c) is the student's output for noise z and condition c, and y is the corresponding real audio. It is really only average power that matters, not having the right power over time, since averaging over time before computing the difference in the loss didn’t make any noticeable difference to the results.
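A sketch of the power loss in NumPy, with the STFT written out by hand. The frame length, hop size, and Hann window are assumptions for the example (the talk does not give them), and the `average_over_time` flag simply exposes the two variants mentioned above.

```python
import numpy as np

def power_spectrogram(x, frame_len=512, hop=128):
    """phi(x) = |STFT(x)|^2, returned with shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def power_loss(generated, target, average_over_time=True):
    """Squared difference between the power spectra of generated and real audio."""
    phi_g = power_spectrogram(generated)
    phi_y = power_spectrogram(target)
    if average_over_time:
        # Compare only the average power in each frequency band.
        phi_g = phi_g.mean(axis=0)
        phi_y = phi_y.mean(axis=0)
    return float(np.sum((phi_g - phi_y) ** 2))

rng = np.random.default_rng(0)
y = rng.normal(size=4000)  # stand-in for real audio
quiet = 0.1 * y            # a "whisper-like" low-volume version
loss_same = power_loss(y, y)
loss_quiet = power_loss(quiet, y)
```

A low-volume output is penalized even though its waveform shape matches, which is the failure mode this loss term is meant to rule out.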
  • 18. Other Losses 18 With the KL loss and the power loss, the student is already pretty good, but for the best results, two other losses helped. They used a perceptual loss similar to the style loss used in style transfer, but with features extracted by a WaveNet-like classifier trained to predict phones from raw audio. And they used a contrastive loss that penalizes the student for producing samples that have high likelihood regardless of the condition.
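  • 19. Multiple Flows 19 Another thing that was not strictly necessary but improved the final result was to chain several student models together, with the output of one fed to the input of the next. Because each flow applies the affine map x_t = z_t s_t + μ_t, the final distribution is still logistic, with parameters $s_{tot} = \prod_{i=1}^{N} s^i$ and $\mu_{tot} = \sum_{i=1}^{N} \mu^i \prod_{j>i}^{N} s^j$, where N is the number of flows.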
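Chaining flows of the form x = z·s + μ composes into a single affine map, which is why the result stays logistic. The sketch below checks this numerically with fixed, randomly chosen scales and locations. That is a simplification: in the real model each flow's parameters are computed by a network from the previous flow's output, and the array names here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_flows, T = 4, 8
scales = rng.uniform(0.5, 1.5, size=(n_flows, T))  # s^i for each flow i
locs = rng.normal(size=(n_flows, T))               # mu^i for each flow i
z = rng.logistic(size=T)                           # standard logistic noise

# Apply the flows one after another: the output of one is the input of the next.
x = z.copy()
for s_i, mu_i in zip(scales, locs):
    x = x * s_i + mu_i

# Equivalent single affine transform: total scale and total location.
s_tot = np.prod(scales, axis=0)
mu_tot = sum(locs[i] * np.prod(scales[i + 1:], axis=0) for i in range(n_flows))
x_direct = z * s_tot + mu_tot
```

The location of each flow is rescaled by every flow that comes after it, while the scales simply multiply.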
  • 20. Architecture of the Student 20 The student network is a WaveNet except that it doesn’t have skip connections. (They don’t give the reason for this change.)
  • 21. Questions? 21 Thanks for attending! I hope we have time for a few questions. If you are interested in developing models like WaveNet and WaveNet 2, Respeecher is hiring! Talk to me or our CEO Alex Serdiuk at the conference, or send us a message on our web site, respeecher.com.