WaveNet: an audio generative model based
on the PixelCNN architecture
Guided By: Ms. FABEELA ALI RAWTHER
Presented By: ABEY R HURTIS
Introduction
● This work explores raw audio generation techniques, inspired by recent advances in
neural autoregressive generative models
● The question this paper addresses is whether similar approaches can succeed in
generating wideband raw audio waveforms, which are signals with very high
temporal resolution (at least 16,000 samples per second)
Convolutional Neural Network(CNN)
● A CNN has far fewer parameters
than a traditional dense network,
because each filter is shared across
all input positions
● Fewer parameters mean lower space
and time complexity and a shorter
training time
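As a rough illustration of the parameter saving, the sketch below counts the weights of a dense layer and of a convolutional layer applied to the same input. The 32 x 32 RGB input, the 64 units/filters and the 3 x 3 kernel are assumed values chosen for the example, not figures from the slides, and the two layers produce outputs of different shapes, so only the parameter counts are being compared.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a k x k convolution; independent of image size."""
    return k * k * c_in * c_out + c_out


# Assumed example: a 32 x 32 RGB image.
height, width, channels = 32, 32, 3

# Dense layer mapping the flattened image to 64 units.
dense = dense_params(height * width * channels, 64)
# Convolution with 64 filters of size 3 x 3 over the same image.
conv = conv2d_params(3, channels, 64)

print(dense, conv)  # 196672 1792
```

The convolution's count does not change if the image gets larger, whereas the dense layer's count grows with the number of input pixels.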
PixelCNN
● The h features for each input position at every
layer in the network are split into three parts, each
corresponding to one of the RGB channels
● PixelCNN factorizes the joint distribution of the
pixels into conditionals, so pixel Xi is generated
conditioned on the pixels before it
● The dependencies between pixels follow raster
scan order (row by row, left to right within a row)
● Pixel Xi is generated for the R, G and B channels
in turn, and the channels depend on one another
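Written out, the factorization described above takes the following form, as given in the PixelRNN and PixelCNN papers listed in the references. The symbols are notation introduced here: x_i is the pixel written Xi on the slides, n x n is the image size, and x_{i,R}, x_{i,G}, x_{i,B} are its three colour channels.

```latex
% Joint distribution over the n^2 pixels, in raster-scan order
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})

% Each pixel is itself split into three successive channel predictions
p(x_i \mid \mathbf{x}_{<i}) =
    p(x_{i,R} \mid \mathbf{x}_{<i}) \,
    p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R}) \,
    p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})
```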
Generation of Pixels
[Figure: the image as a pixel sequence X1, ..., Xi, ..., Xn², shown once for each of the three colour channels]
Pixel Xi is generated for each of the three channels.
To generate a pixel, a softmax layer at the end of the architecture outputs a probability for every
possible intensity, given the context pixels before Xi; the intensity is then drawn from that distribution.
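The softmax step can be sketched as follows. This is a minimal illustration under stated assumptions: the 256 scores stand in for a network output for one channel of Xi (8-bit intensities are assumed), and the value 5.0 placed at intensity 200 is made up to give the distribution a visible peak.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_intensity(logits, rng):
    """Draw one intensity index according to the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
# Hypothetical network output for one channel of pixel Xi: 256 scores.
logits = [0.0] * 256
logits[200] = 5.0  # pretend the context makes intensity 200 likely

probs = softmax(logits)
value = sample_intensity(logits, rng)
print(value, round(probs[200], 3))
```

Sampling, rather than always taking the most probable intensity, is what lets the same model produce different images from the same context.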
Masked Convolution
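● A masked filter only sees pixels above
and to the left of the current pixel, so
its receptive field is limited
● Stacking masked filters leaves a blind
spot: some already generated pixels
(to the upper right) never reach the
current pixel
● To remove the blind spot a second
convolution stack is needed (a horizontal
stack is added to the vertical one)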
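A mask of this kind can be built as below. This is a sketch, not the authors' code: the "A"/"B" naming follows the convention of the PixelRNN paper and is not mentioned on the slides, and the additional per-channel (R, G, B) masking is left out for brevity.

```python
def make_mask(kernel_size: int, mask_type: str):
    """Return a kernel_size x kernel_size mask of 0/1 entries.

    Positions above the centre row, and left of the centre in the centre
    row, are kept. Type "A" also hides the centre; type "B" keeps it.
    """
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")
    c = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for row in range(kernel_size):
        for col in range(kernel_size):
            above = row < c
            left = row == c and col < c
            centre = row == c and col == c and mask_type == "B"
            if above or left or centre:
                mask[row][col] = 1
    return mask


for line in make_mask(3, "A"):
    print(line)
# [1, 1, 1]
# [1, 0, 0]
# [0, 0, 0]
```

Multiplying a filter's weights element-wise by this mask before the convolution zeroes every weight that would look at the current or a future pixel.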
Masked Convolution (cont…)
● The vertical stack of convolutions reads all the
rows above the current pixel
● The horizontal stack of convolutions reads the current
row, from the current pixel to all the pixels to its left
Gated PixelCNN
● With only a plain vertical and a plain horizontal convolution, more
complex interactions cannot be learned, although stacking layers may help
● A different model is needed because the existing model is not
expressive enough to capture the more important features
● The Gated PixelCNN fills this gap: it combines a filter and a gate,
multiplied element-wise, in both the horizontal and the vertical convolutions
⊙ element-wise multiplication
* convolution operation
Gated Activation
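The equation for this slide did not survive extraction. For reference, the gated activation unit as given in the Gated PixelCNN paper (reference 1) has the form below, where σ is the logistic sigmoid, k is the layer index, and the subscripts f and g mark the filter and gate weights.

```latex
\mathbf{y} = \tanh\!\left(W_{k,f} * \mathbf{x}\right) \odot \sigma\!\left(W_{k,g} * \mathbf{x}\right)
```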
Conditional PixelCNN
● Given a high-level image description represented as a latent vector h, we
seek to model the conditional distribution p(x|h)
● We model the conditional distribution by adding terms that depend on h to
the activations before the nonlinearities in the gated PixelCNN equation
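In the notation of the Gated PixelCNN paper, the conditioned activation adds a term in h inside both nonlinearities. Here * is convolution, ⊙ is element-wise multiplication, σ is the sigmoid, k is the layer index, and V_{k,f}, V_{k,g} are learned projection matrices for the filter and the gate.

```latex
\mathbf{y} = \tanh\!\left(W_{k,f} * \mathbf{x} + V_{k,f}^{\top}\mathbf{h}\right)
        \odot \sigma\!\left(W_{k,g} * \mathbf{x} + V_{k,g}^{\top}\mathbf{h}\right)
```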
Conditional PixelCNN(cont…)
● If h is a one-hot encoding that specifies a class, this is equivalent to adding a
class-dependent bias at every layer
● By mapping h to a spatial representation S = m(h) (which has the same dimensions
as the image but may have an arbitrary number of feature maps) with a
deconvolutional neural network m(), we obtain a location-dependent
bias
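The claim about one-hot vectors can be checked with a small example: projecting a one-hot h through a weight matrix simply picks out the row belonging to that class, which then acts as a bias. The matrix V and its values are hypothetical and only serve the illustration.

```python
def matvec_transpose(V, h):
    """Compute V^T h for V with one row per class, one column per feature map."""
    n_out = len(V[0])
    return [sum(V[c][j] * h[c] for c in range(len(h))) for j in range(n_out)]


# Hypothetical weights: 3 classes, 2 feature maps.
V = [
    [0.1, -0.2],   # class 0
    [0.5, 0.3],    # class 1
    [-0.4, 0.9],   # class 2
]
h = [0, 1, 0]  # one-hot vector selecting class 1

bias = matvec_transpose(V, h)
print(bias)  # [0.5, 0.3], the row of V for class 1
```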
WaveNet
● The joint probability of a waveform x = {x1,...,xT} is factored as a product of conditional
probabilities: p(x) = ∏_{t=1..T} p(xt | x1,...,xt−1)
● The conditional probability distribution is modelled by a stack of convolutional layers.
There are no pooling layers, and the output has the same time dimensionality as the input
● The model uses a softmax layer to output a categorical distribution over the next value xt
● The model is optimized to maximize the log-likelihood of the data w.r.t. the
parameters
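The training objective above can be illustrated with a toy calculation: the log-likelihood of a sequence is the sum of the log-probabilities that the model assigned to each observed sample. The three distributions below are made-up numbers standing in for softmax outputs.

```python
import math


def log_likelihood(samples, conditional_probs):
    """Sum of log p(x_t | x_1..x_{t-1}) over the sequence.

    conditional_probs[t] is the categorical distribution the model
    predicted for timestep t; samples[t] is the value actually observed.
    """
    return sum(math.log(conditional_probs[t][x]) for t, x in enumerate(samples))


# Toy example with 3 timesteps and 4 possible values per timestep.
samples = [2, 0, 3]
conditional_probs = [
    [0.1, 0.2, 0.6, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]

ll = log_likelihood(samples, conditional_probs)
joint = math.exp(ll)  # equals 0.6 * 0.7 * 0.25
print(ll, joint)
```

Maximizing this sum is the same as minimizing the per-timestep cross-entropy between the softmax output and the observed sample.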
Causal Convolution
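● The causal convolution ensures the conditional ordering is respected:
the prediction for timestep t cannot depend on any future timestep
● Audio is 1-D data, so a simple mask (or, equivalently, shifting the
output of an ordinary convolution by a few timesteps) is enough to
make the convolution causal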
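A causal 1-D convolution can be sketched by padding the input on the left only, so that each output sees the current and earlier samples and nothing later. The filter values are hypothetical; a real layer would learn them and operate on many channels.

```python
def causal_conv1d(x, weights):
    """1-D causal convolution: y[t] uses only x[t], x[t-1], ...

    weights[0] multiplies the current sample x[t], weights[1] multiplies
    x[t-1], and so on. The input is left-padded with zeros so the output
    has the same length as the input.
    """
    k = len(weights)
    padded = [0.0] * (k - 1) + list(x)
    out = []
    for t in range(len(x)):
        # padded[t + k - 1] is x[t]; step backwards for older samples.
        out.append(sum(weights[j] * padded[t + k - 1 - j] for j in range(k)))
    return out


x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 0.5]  # hypothetical filter of length 2

y = causal_conv1d(x, w)

# Changing a future sample must not change earlier outputs.
x_changed = [1.0, 2.0, 3.0, 100.0]
y_changed = causal_conv1d(x_changed, w)
print(y, y_changed[:3] == y[:3])  # [0.5, 1.5, 2.5, 3.5] True
```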
Causal convolution(cont…)
● At training time, the conditional predictions for all timesteps can be made in parallel
because all timesteps of the ground truth x are known
● At generation time the predictions are sequential: after each sample is predicted, it is fed back into
the network to predict the next sample
● The problem with causal convolutions is that they require many layers, or large filters, to
increase the receptive field
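The sequential generation loop can be sketched as follows. The function `predict_next` is a hypothetical stand-in for the trained network, and the four-value alphabet is an assumption to keep the example small; only the feed-back structure of the loop is the point.

```python
import random


def predict_next(history, n_values=4):
    """Stand-in for the trained network (hypothetical).

    Returns a categorical distribution over the next value given the
    samples generated so far. Here it simply favours repeating the last
    value; a real model would run its convolution stack.
    """
    probs = [1.0] * n_values
    if history:
        probs[history[-1]] += 2.0
    total = sum(probs)
    return [p / total for p in probs]


def generate(n_samples, rng):
    """Generate samples one at a time, feeding each back as input."""
    history = []
    for _ in range(n_samples):
        probs = predict_next(history)
        value = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        history.append(value)  # the new sample becomes part of the context
    return history


samples = generate(10, random.Random(0))
print(samples)
```

Because each call to `predict_next` needs the previous output, the loop cannot be parallelized across timesteps, which is why generation is slower than training.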
Dilated Convolution
● A dilated convolution is a convolution
where the filter is applied over an
area larger than its length by
skipping input values with a certain
step
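The definition above can be made concrete with a small sketch in which the filter taps are spaced `dilation` samples apart. The input and the length-2 filter are hypothetical values chosen so the sums are easy to follow.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Causal convolution whose taps are `dilation` samples apart.

    weights[j] multiplies x[t - j * dilation]; samples before the start of
    the signal are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [1.0, 1.0]  # hypothetical filter of length 2

print(dilated_causal_conv1d(x, w, dilation=1))  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
print(dilated_causal_conv1d(x, w, dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

With dilation 2 the same two weights span three input positions, skipping the one in between.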
Dilated Convolution
● Stacking more dilated layers
enlarges the receptive field
quickly while the model stays
computationally efficient
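The growth of the receptive field can be computed directly. The sketch assumes a filter length of 2 and compares five undilated layers with five layers whose dilation doubles each time; the last line uses the 1, 2, 4, ..., 512 dilation pattern described in the WaveNet paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal convolutions.

    Each layer extends the field by (kernel_size - 1) * dilation samples.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Five layers, filter length 2 (assumed for this comparison).
no_dilation = receptive_field(2, [1, 1, 1, 1, 1])
doubling = receptive_field(2, [1, 2, 4, 8, 16])

# Dilations doubling from 1 to 512, as in the WaveNet paper.
wavenet_block = receptive_field(2, [2 ** i for i in range(10)])

print(no_dilation, doubling, wavenet_block)  # 6 32 1024
```

Without dilation the receptive field grows linearly with depth; with doubling dilations it grows exponentially for the same number of layers and weights.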
Softmax Distributions
● Because raw audio is typically stored as a sequence of 16-bit integer values (one per
timestep), a softmax layer would need to output 65,536 probabilities per timestep to model
all possible values
● To make this more tractable, we first apply a µ-law companding transformation (ITU-T,
1988) to the data, and then quantize it to 256 possible values
f(xt) = sign(xt) · ln(1 + μ|xt|) / ln(1 + μ), where −1 < xt < 1 and μ = 255
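The companding and quantization step can be sketched in a few lines. The μ-law formula sign(x) · ln(1 + μ|x|) / ln(1 + μ) is the standard one; the rounding scheme used to map the result onto the integers 0..255 is an assumption, since the slides do not specify it.

```python
import math


def mu_law_encode(x, mu=255):
    """Compand x in [-1, 1] and quantize it to an integer in 0..mu."""
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] to [0, mu] and round to the nearest integer (assumed scheme).
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(q, mu=255):
    """Approximate inverse: map an integer in 0..mu back to [-1, 1]."""
    y = 2 * (q / mu) - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


for value in (-1.0, -0.1, 0.0, 0.1, 0.5, 1.0):
    q = mu_law_encode(value)
    print(value, q, round(mu_law_decode(q), 4))
```

The transform spends more of the 256 levels on small amplitudes, so quiet samples are reconstructed more finely than loud ones.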
Gated Activation Units
● We use the same gated activation unit as in the Gated PixelCNN. Audio is one-dimensional, so there is
no need for two separate convolution stacks
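A gated activation unit multiplies a tanh "filter" branch by a sigmoid "gate" branch, element by element. The sketch below applies it to made-up outputs of the two convolutions at three timesteps; in a real layer both inputs would come from dilated causal convolutions of the same signal.

```python
import math


def sigmoid(v):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def gated_activation(filter_out, gate_out):
    """Element-wise tanh(filter) * sigmoid(gate) over one layer's timesteps."""
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filter_out, gate_out)]


# Hypothetical outputs of the filter and gate convolutions at 3 timesteps.
filter_out = [2.0, -1.0, 0.5]
gate_out = [10.0, 0.0, -10.0]

z = gated_activation(filter_out, gate_out)
print(z)
```

A large positive gate passes the filter value almost unchanged, a gate of zero halves it, and a large negative gate suppresses it.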
Conditional Wavenet
● Given an additional input h, WaveNets can model the conditional distribution p(x|h) of the audio
given the input
● We condition the model on other inputs in two different ways: global conditioning and local
conditioning
● Global conditioning is characterised by a single latent representation h that influences the output
distribution across all timesteps, e.g. a speaker embedding
● For local conditioning, h is a second time series (e.g. linguistic features in a text-to-speech model),
possibly sampled at a lower rate than the audio
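The difference between the two conditioning modes can be sketched with plain lists. This is a simplification under stated assumptions: a single scalar stands in for the projected conditioning term, and the local features are brought to the audio rate by repeating each value, which is only one simple upsampling choice (the paper also describes a learned upsampling network).

```python
def global_bias(h_bias, n_steps):
    """Global conditioning: the same term is added at every timestep."""
    return [h_bias] * n_steps


def local_bias(features, upsample):
    """Local conditioning: a slower feature series is stretched to the
    audio rate by repeating each value `upsample` times."""
    return [f for f in features for _ in range(upsample)]


def add_conditioning(pre_activation, bias):
    """Add the conditioning term before the nonlinearity."""
    return [a + b for a, b in zip(pre_activation, bias)]


pre = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical convolution outputs

# Global: one value derived from, e.g., a speaker embedding (assumed value).
print(add_conditioning(pre, global_bias(1.0, len(pre))))

# Local: 3 feature frames, each covering 2 audio samples (assumed values).
print(add_conditioning(pre, local_bias([1.0, -1.0, 0.5], upsample=2)))
```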
Advantages and Limitations
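● Training is easy: predictions for all timesteps are computed in parallel
● Less space and time required, since convolutions need fewer parameters than dense layers
● Large receptive field from dilated convolutions
● Flexible conditioning (global and local)
● Limitation: generation is slow, because samples are produced one at a time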
Applications
● Music Generation with WaveNet
○ Enlarging the receptive field was crucial to obtain samples that sounded musical
○ Even with a receptive field of several seconds, the models did not enforce long-range consistency, which resulted
in second-to-second variations in genre, instrumentation, volume and sound quality
○ Conditional generation on genre or instruments works reasonably well
○ Datasets used: 1) MagnaTagATune 2) YouTube piano dataset
● Conditioning on ImageNet Classes
○ Given a one-hot encoding hi for the i-th class, we model p(x|hi)
○ Significantly improved the log-likelihood
○ We observed great improvements in the visual quality of the generated samples
○ We see that the generated classes are very distinct from one another, and that the corresponding objects, animals
and backgrounds are clearly produced. Furthermore, the images of a single class are very diverse: for example, the
model was able to generate similar scenes from different angles and lighting conditions
Conclusion
● Computationally more efficient than the earlier PixelRNN
● State-of-the-art performance on the ImageNet 32x32 and 64x64 datasets
● Class-conditional modelling generates realistic images corresponding to the given classes. On
human portraits the model is capable of generating new images of the same person in
different poses and lighting conditions
● High log-likelihood scores
● Future applications include generating new images with a certain object solely from a
single example image, and building variational autoencoders
References
1. Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu.
Conditional Image Generation with PixelCNN Decoders. arXiv:1606.05328v2 [cs.CV], 18 Jun 2016.
2. Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal
Kalchbrenner, Andrew Senior, Koray Kavukcuoglu. WaveNet: A Generative Model for Raw Audio.
arXiv:1609.03499, 2016.
3. Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. Pixel Recurrent Neural Networks.
arXiv:1601.06759v3 [cs.CV], 19 Aug 2016.
4. Bengio, Yoshua and Bengio, Samy. Modeling high dimensional discrete data with multi-layer neural
networks. pp. 400–406. MIT Press, 2000.
5. Agiomyrgiannakis, Yannis. Vocaine the vocoder and applications in speech synthesis. In ICASSP, pp. 4230–
4234, 2015.
References
6. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S
Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensorflow: Large-scale machine learning on
heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
7. Marc G Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos.
Unifying count-based exploration and intrinsic motivation. arXiv preprint arXiv:1606.01868, 2016.
8. Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components
estimation. arXiv preprint arXiv:1410.8516, 2014
THANK YOU

More Related Content

What's hot

2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagationKrish_ver2
 
Autoencoders
AutoencodersAutoencoders
AutoencodersCloudxLab
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceMaryamRehman6
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
Deep belief network.pptx
Deep belief network.pptxDeep belief network.pptx
Deep belief network.pptxSushilAcharya18
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 
Recurrent and Recursive Nets (part 2)
Recurrent and Recursive Nets (part 2)Recurrent and Recursive Nets (part 2)
Recurrent and Recursive Nets (part 2)sohaib_alam
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and originShubhankar Mohan
 
Fuzzy Set Theory
Fuzzy Set TheoryFuzzy Set Theory
Fuzzy Set TheoryAMIT KUMAR
 
Adaptive Resonance Theory
Adaptive Resonance TheoryAdaptive Resonance Theory
Adaptive Resonance TheoryNaveen Kumar
 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiShaishavShah8
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronMostafa G. M. Mostafa
 
Vanishing & Exploding Gradients
Vanishing & Exploding GradientsVanishing & Exploding Gradients
Vanishing & Exploding GradientsSiddharth Vij
 
Evaluating hypothesis
Evaluating  hypothesisEvaluating  hypothesis
Evaluating hypothesisswapnac12
 

What's hot (20)

Lstm
LstmLstm
Lstm
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagation
 
WaveNet.pdf
WaveNet.pdfWaveNet.pdf
WaveNet.pdf
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
 
Deep belief network.pptx
Deep belief network.pptxDeep belief network.pptx
Deep belief network.pptx
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Back propagation
Back propagationBack propagation
Back propagation
 
Recurrent and Recursive Nets (part 2)
Recurrent and Recursive Nets (part 2)Recurrent and Recursive Nets (part 2)
Recurrent and Recursive Nets (part 2)
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
Fuzzy Set Theory
Fuzzy Set TheoryFuzzy Set Theory
Fuzzy Set Theory
 
Adaptive Resonance Theory
Adaptive Resonance TheoryAdaptive Resonance Theory
Adaptive Resonance Theory
 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-ai
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's Perceptron
 
Vanishing & Exploding Gradients
Vanishing & Exploding GradientsVanishing & Exploding Gradients
Vanishing & Exploding Gradients
 
Evaluating hypothesis
Evaluating  hypothesisEvaluating  hypothesis
Evaluating hypothesis
 

Similar to WaveNet

convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningssusere5ddd6
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術CHENHuiMei
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
GNR638_Course Project for spring semester
GNR638_Course Project for spring semesterGNR638_Course Project for spring semester
GNR638_Course Project for spring semesterBijayChandraDasTECH0
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Single Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learningSingle Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learningAhan M R
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networksmilad abbasi
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural networkAbdullah Khan Zehady
 

Similar to WaveNet (20)

convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
Icon18revrec sudeshna
Icon18revrec sudeshnaIcon18revrec sudeshna
Icon18revrec sudeshna
 
TransNeRF
TransNeRFTransNeRF
TransNeRF
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
 
Mnist report
Mnist reportMnist report
Mnist report
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
GNR638_Course Project for spring semester
GNR638_Course Project for spring semesterGNR638_Course Project for spring semester
GNR638_Course Project for spring semester
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
 
Conv-TasNet.pdf
Conv-TasNet.pdfConv-TasNet.pdf
Conv-TasNet.pdf
 
Single Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learningSingle Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learning
 
CNN_AH.pptx
CNN_AH.pptxCNN_AH.pptx
CNN_AH.pptx
 
CNN_AH.pptx
CNN_AH.pptxCNN_AH.pptx
CNN_AH.pptx
 
Cnn
CnnCnn
Cnn
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural network
 

Recently uploaded

Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 

Recently uploaded (20)

Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 

WaveNet

  • 1. WaveNet an audio generative model based on the PixelCNN architecture Guided By Presented By Ms. FABEELA ALI RAWTHER ABEY R HURTIS
  • 2. Introduction ● This work explores raw audio generation techniques, inspired by recent advances in neural autoregressive generative models. ● The question this paper addresses is whether similar approaches can succeed in generating wideband raw audio waveforms which are signals with very high temporal resolution, at least 16,000 samples per second
  • 3. Convolutional Neural Network(CNN) ● A CNN has very less parameters compared to a traditional dense network ● Less parameters means the model has less space & time complexity and less training time
  • 4. PixelCNN ● The h features for each input position at every layer in the network are split into three parts, each corresponding to one of the RGB channels ● PixelCNN uses joint distribution of pixels to generate pixel Xi ● The pixels are interdependent is in raster scan order ● Pixel Xi is generated for R, G and B channels which is interdependent
  • 5. Generation of Pixels X1 Xi Xn2 X1 Xi Xn2 X1 Xi Xn2 pixel Xi is generated for the three channels. For generating the pixels a softmax layer is used at the end of the architecture, It outputs the most probable intensity w.r.t the context pixels before Xi
  • 6. Masked Convolution ● A filter with mask has a limited receptive field ● This leads to blind spot in the convolution ● To avoid the blind spot another convolution is needed ( a Horizontal convolution is added)
  • 7. Masked Convolution (cont…) ● The vertical stack of convolution reads all the rows above ● Horizontal stack of convolution reads the pixels from current pixel to all the pixel’s to it’s left
  • 8. Gated PixelCNN ● When using simply two convolution for vertical and horizontal more complex interactions cannot be learned but stacks of layer may help ● The need for a different model arise due to the fact that the existing model is not enough for mapping the more import features ● The Gated PixelCNN fills the gap in learning it combines the input of gate and filter to implement the horizontal and vertical convolution ⊙ elemental wise multiplication * Convolutional operation
  • 10. Conditional PixelCNN ● Given a high-level image description represented as a latent vector h, we seek to model the conditional distribution p(x|h) ● We model the conditional distribution by adding terms that depend on h to the activations before the nonlinearities in the gated PixelCNN equation
  • 11. Conditional PixelCNN(cont…) ● If h is one-hot encoded that specifies a class this is equivalent to adding a class dependent bias at every layer ● By mapping h to a spatial representation S=m(h) (which has the same dim as the image but may have an arbitrary number of feature maps) with a deconvolutional neural network m() then we obtain a location dependent bias
  • 12. WaveNet ● The joint probability of a waveform x = {x1,...,xT} is factored as a product of conditional probabilities as follows ● The conditional probability distribution is modelled by stack of convolution layers. There is no pooling and the output has the same time dimensionality as the input ● The model uses softmax for predicting the next value ● The model is optimized to maximize the log-likelihood of the data w.r.t the parameters
  • 13. Causal Convolution ● The causal convolution ensures the conditional probability is satisfied ● The audio is a 1- D data therefore simple masking tensor is used for convolution
  • 14. Causal Convolution (cont…) ● At training time, the conditional predictions for all timesteps can be made in parallel because all timesteps of the ground truth x are known ● At generation time the predictions are sequential: after each sample is predicted, it is fed back into the network to predict the next sample ● The problem with causal convolutions is that they require many layers or large filters to increase the receptive field
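A 1-D causal convolution can be sketched by padding the input on the left only, so that the output at position t reads inputs up to and including t and nothing later. This is a minimal pure-Python illustration with a hypothetical function name, not the WaveNet code; in the model, the output at position t is then used to predict sample t+1.

```python
def causal_conv1d(signal, weights):
    """Causal 1-D convolution with zero padding on the left.

    out[t] = sum_i weights[i] * signal[t - (K - 1) + i], where K is the
    filter length and out-of-range inputs count as zero. The output has
    the same length as the input.
    """
    k = len(weights)
    padded = [0.0] * (k - 1) + list(signal)
    return [
        sum(w * padded[t + i] for i, w in enumerate(weights))
        for t in range(len(signal))
    ]


# Changing a later sample leaves all earlier outputs unchanged.
a = causal_conv1d([1, 2, 3, 4], [0.5, 2.0])
b = causal_conv1d([1, 2, 3, 99], [0.5, 2.0])
```

Because every output depends only on past and present inputs, all timesteps can be computed in one pass during training, while generation has to call the network once per new sample.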
  • 15. Dilated Convolution ● A dilated convolution is a convolution where the filter is applied over an area larger than its length by skipping input values with a certain step
  • 16. Dilated Convolution (cont…) ● Stacking dilated convolution layers gives the model a very large receptive field with only a few layers, while preserving computational efficiency
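The effect of dilation on the receptive field can be sketched as follows. The filter length of 2 and the doubling dilations 1, 2, 4, ... are illustrative choices matching the pattern described in the WaveNet paper; the function names are hypothetical.

```python
def dilated_causal_conv1d(signal, weights, dilation):
    """Causal 1-D convolution whose filter taps are `dilation` steps apart.

    out[t] = sum_i weights[i] * signal[t - (K - 1 - i) * dilation], with
    zeros for out-of-range inputs. dilation=1 is an ordinary causal
    convolution.
    """
    k = len(weights)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(signal)
    return [
        sum(w * padded[t + i * dilation] for i, w in enumerate(weights))
        for t in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Four layers with doubling dilation versus four undilated layers.
dilated = receptive_field(2, [1, 2, 4, 8])   # 16 samples
plain = receptive_field(2, [1, 1, 1, 1])     # 5 samples
```

With doubling dilations the receptive field grows exponentially with depth, whereas undilated layers of the same filter length grow it only linearly.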
  • 17. Softmax Distributions ● Because raw audio is typically stored as a sequence of 16-bit integer values (one per timestep), a softmax layer would need to output 65,536 probabilities per timestep to model all possible values ● To make this more tractable, we first apply a μ-law companding transformation (ITU-T, 1988) to the data, f(xt) = sign(xt) ln(1 + μ|xt|) / ln(1 + μ), where -1 < xt < 1 and μ = 255, and then quantize it to 256 possible values
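A minimal sketch of μ-law companding followed by 256-level quantization, assuming samples have already been scaled to the range [-1, 1]. The rounding scheme that maps the companded value to an integer in 0..255, and the function names, are assumptions for illustration; implementations differ in this detail.

```python
import math


def mu_law_encode(x, mu=255):
    """Compand x in [-1, 1] with the mu-law and quantize to 0..mu."""
    x = max(-1.0, min(1.0, x))
    companded = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] onto [0, mu] and round to the nearest integer level.
    return int((companded + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(index, mu=255):
    """Approximate inverse of mu_law_encode: level index back to [-1, 1]."""
    companded = 2.0 * index / mu - 1.0
    magnitude = math.expm1(abs(companded) * math.log1p(mu)) / mu
    return math.copysign(magnitude, companded)
```

The companding spends more of the 256 levels on small amplitudes, so quiet samples are reconstructed more precisely than loud ones.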
  • 18. Gated Activation Units ● We use the same gated activation unit as in the gated PixelCNN; because audio is 1-D, there is no need for two separate convolution stacks
  • 19. Conditional WaveNet ● Given an additional input h, WaveNets can model the conditional distribution p(x|h) of the audio given this input ● We condition the model on other inputs in two different ways: global conditioning and local conditioning ● Global conditioning is characterised by a single latent representation h that influences the output distribution across all timesteps, e.g. a speaker embedding ● For local conditioning, the model is conditioned on a second time series, e.g. linguistic features
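Following the WaveNet paper (reference 2), the two conditioning modes can be written as extensions of the gated activation unit. Here k is the layer index, f and g denote filter and gate, W and V are learned weights, and y = f(h) is the local conditioning series upsampled to the audio rate; these symbols come from the paper rather than from the slides.

```latex
% Global conditioning: V^T h is broadcast over the time dimension
z = \tanh\left(W_{f,k} * x + V_{f,k}^{T} h\right) \odot \sigma\left(W_{g,k} * x + V_{g,k}^{T} h\right)

% Local conditioning: V_{f,k} * y is a 1x1 convolution over the upsampled series y = f(h)
z = \tanh\left(W_{f,k} * x + V_{f,k} * y\right) \odot \sigma\left(W_{g,k} * x + V_{g,k} * y\right)
```

In the global case the same bias is added at every timestep, while in the local case the added term varies along the time axis.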
  • 20. Advantages and Limitations ● Training is easy ● Less space and time required ● Better receptive field ● Better conditioning ● Slower Generation
  • 21. Applications ● Music Generation with WaveNet ○ Enlarging the receptive field was crucial to obtain samples that sounded musical ○ Even with a receptive field of several seconds, the models did not enforce long-range consistency, which resulted in second-to-second variations in genre, instrumentation, volume and sound quality ○ Conditional generation on genre or instruments works reasonably well ○ Datasets used: 1) MagnaTagATune 2) YouTube piano dataset ● Conditioning on ImageNet Classes ○ Given a one-hot encoding hi for the i-th class, we model p(x|hi) ○ Significantly improved the log-likelihood ○ We observed great improvements in the visual quality of the generated samples ○ We see that the generated classes are very distinct from one another, and that the corresponding objects, animals and backgrounds are clearly produced. Furthermore, the images of a single class are very diverse: for example, the model was able to generate similar scenes from different angles and lighting conditions
  • 23. Conclusion ● Computationally more efficient ● State-of-the-art performance on the ImageNet 32x32 and 64x64 datasets ● Class-conditional modelling generates realistic images corresponding to the given classes. On human portraits, the model is capable of generating new images of the same person in different poses and lighting conditions ● High log-likelihood scores ● Future applications include generating new images of a certain object solely from a single example image, and combining the model with variational autoencoders
  • 24. References 1. Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu. Conditional Image Generation with PixelCNN Decoders. arXiv:1606.05328v2 [cs.CV], 18 Jun 2016. 2. Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu. WaveNet: A Generative Model for Raw Audio. 3. Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. Pixel Recurrent Neural Networks. arXiv:1601.06759v3 [cs.CV], 19 Aug 2016. 4. Bengio, Yoshua and Bengio, Samy. Modeling high dimensional discrete data with multi-layer neural networks. pp. 400–406. MIT Press, 2000. 5. Agiomyrgiannakis, Yannis. Vocaine the vocoder and applications in speech synthesis. In ICASSP, pp. 4230–4234, 2015.
  • 25. References (cont…) 6. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016. 7. Marc G Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. Unifying count-based exploration and intrinsic motivation. arXiv preprint arXiv:1606.01868, 2016. 8. Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.