SlideShare a Scribd company logo
1 of 21
Introduction to Speech Recognition
with Deep learning
Speech Processing
Prof. Mahmoud Gad Allah
"Computers are able to see, hear
and learn. Welcome to the Future“
- Dave Waters
Speech recognition, or speech-to-text, is the ability of a machine or program to
identify words spoken aloud and convert them into readable text.
Speech Recognition with Deep Learning
Automatic speech recognition (ASR) refers to the task of recognizing
human speech and translating it into text.
 This research field has gained a lot of focus over the last decades. It is an important
research area for human-to-machine communication.
 Early methods focused on manual feature extraction and conventional techniques
such as Gaussian Mixture Models (GMM), the Dynamic Time Warping
(DTW) algorithm and Hidden Markov Models (HMM).
 More recently, neural networks such as recurrent neural networks (RNNs),
convolutional neural networks (CNNs) and in the last years Transformers, have been
applied on ASR and have achieved great performance.
What is Deep Learning?
Deep learning is a subset of methods for machine learning which is a field dedicated to
the study and development of machines that can learn (sometimes with the goal of
eventually attaining general artificial intelligence).
Why Call it “Deep Learning“?
Why Not Just “Artificial Neural Networks“?
 Geoffrey Hinton is a pioneer in the field of artificial neural networks and co-
published the first paper on the backpropagation algorithm for training multilayer
perceptron networks.
 He may have started the introduction of the phrasing “deep” to describe the
development of large artificial neural networks.
 Deep learning is a deep neural network with many hidden layers and many nodes
in every hidden layer.
Speech Recognition with Deep Learning
The overall flow of ASR can be represented as shown below:
Deep Learning Architecture
There are many variations of deep learning architecture for ASR. Two commonly used
approaches are:
 A CNN (Convolutional Neural Network) plus RNN-based (Recurrent Neural Network)
architecture that uses the CTC Loss algorithm to demarcate each character of the
words in the speech. eg. Baidu’s Deep Speech model.
 An RNN-based sequence-to-sequence network that treats each ‘slice’ of the
spectrogram as one element in a sequence eg. Google’s Listen Attend Spell (LAS)
model.
Deep Learning Architecture
CNN performs better than RNN in Speech Recognition ?
at Image Processing and Speech Emotional Recognition(generally Speech
Recognition)due to application of filters and MaxPooling which leads to elimination of
lighter pixels in Image Processing(compared to darker); and noise and ... (compared to
the voice phonemes)in Speech Recognition;accordingly , CNNs work by reducing an
image/speech to its key features and using the combined probabilities of the identified
features appearing together to determine a classification.While RNNs deals with,only,
sequential and there is no elimination and filtering which has made RNN fall back
compared to CNN in aforementioned tasks.The link below is a brief description of CNN.
Convolutional Neural Network (ConvNet/CNN)
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image,
assign importance (learnable weights and biases) to various aspects/objects in the image and be able to
differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other
classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets
have the ability to learn these filters/characteristics.
Convolutional Neural Network (ConvNet/CNN)
The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human Brain
and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a
restricted region of the visual field known as the Receptive Field. A collection of such fields overlap to cover the
entire visual area.
Convolutional Neural Network (ConvNet/CNN)
Convolutional neural networks refer to a sub-category of neural networks: they, therefore, have all the
characteristics of neural networks. However, CNN is specifically designed to process input images. Their
architecture is then more specific: it is composed of two main blocks.
Convolutional Neural Network (ConvNet/CNN)
The first block makes the particularity of this type of neural network since it functions as a feature extractor. To
do this, it performs template matching by applying convolution filtering operations. The first layer filters the
image with several convolution kernels and returns “feature maps”, which are then normalized (with an
activation function) and/or resized.
This process can be repeated several times: we filter the features maps obtained with new kernels, which gives
us new features maps to normalize and resize, and we can filter again, and so on. Finally, the values of the last
feature maps are concatenated into a vector. This vector defines the output of the first block and the input of the
second.
Convolutional Neural Network (ConvNet/CNN)
The second block is not characteristic of a CNN: it is in fact at the end of all the neural networks used for
classification. The input vector values are transformed (with several linear combinations and activation functions)
to return a new vector to the output. This last vector contains as many elements as there are classes: element i
represents the probability that the image belongs to class i. Each element is therefore between 0 and 1, and the
sum of all is worth 1. These probabilities are calculated by the last layer of this block (and therefore of the
network), which uses a logistic function (binary classification) or a softmax function (multi-class classification) as
an activation function.
The different layers of a CNN
1. The convolutional layer
 The convolutional layer is the key component of convolutional neural networks, and is always at least
their first layer.
Its purpose is to detect the presence of a set of features in the images received as input. This is done by
convolution filtering: the principle is to “drag” a window representing the feature on the image, and to
calculate the convolution product between the feature and each portion of the scanned image. A feature is
then seen as a filter: the two terms are equivalent in this context.
 The convolutional layer thus receives several images as input, and calculates the convolution of each of
them with each filter. The filters correspond exactly to the features we want to find in the images.
We get for each pair (image, filter) a feature map, which tells us where the features are in the image: the
higher the value, the more the corresponding place in the image resembles the feature.
 Unlike traditional methods, features are not pre-defined according to a particular formalism (for
example SIFT), but learned by the network during the training phase! Filter kernels refer to the
convolution layer weights.
The different layers of a CNN
2. The pooling layer
 This type of layer is often placed between two layers of convolution: it receives several feature maps and
applies the pooling operation to each of them.
The pooling operation consists in reducing the size of the images while preserving their important
characteristics.
To do this, we cut the image into regular cells, then we keep the maximum value within each cell. In practice,
small square cells are often used to avoid losing too much information. The most common choices are 2x2
adjacent cells that don’t overlap, or 3x3 cells, separated from each other by a step of 2 pixels
(thus overlapping).
 The pooling layer reduces the number of parameters and calculations in the network. This improves the
efficiency of the network and avoids over-learning.
The different layers of a CNN
3. The ReLu correction layer
 ReLU (Rectified Linear Units) refers to the real non-linear function defined by ReLU(x)=max(0,x). Visually,
it looks like the following:
 The ReLU correction layer replaces all negative values received as inputs by zeros. It acts as
an activation function.
The different layers of a CNN
4. The fully-connected layer
 The fully-connected layer is always the last layer of a neural network, convolutional or not — so it is not
characteristic of a CNN.
This type of layer receives an input vector and produces a new output vector. To do this, it applies a linear
combination and then possibly an activation function to the input values received.
 The last fully-connected layer classifies the image as an input to the network: it returns
a vector of size N, where N is the number of classes in our image classification problem.
Each element of the vector indicates the probability for the input image to belong to a
class.
References
• https://www.analyticsvidhya.com/blog/2020/10/what-is-the-convolutional-
neural-network-architecture/
• https://towardsdatascience.com/understand-the-architecture-of-cnn-
90a25e244c7
• https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-
cnn/
• https://clevertap.com/blog/neural-networks/
• https://towardsdatascience.com/audio-deep-learning-made-simple-automatic-
speech-recognition-asr-how-it-works-716cfce4c706

More Related Content

What's hot

Compression: Images (JPEG)
Compression: Images (JPEG)Compression: Images (JPEG)
Compression: Images (JPEG)danishrafiq
 
The motion estimation
The motion estimationThe motion estimation
The motion estimationsakshij91
 
Color Image Processing: Basics
Color Image Processing: BasicsColor Image Processing: Basics
Color Image Processing: BasicsA B Shinde
 
Image Representation & Descriptors
Image Representation & DescriptorsImage Representation & Descriptors
Image Representation & DescriptorsPundrikPatel
 
Raster animation
Raster animationRaster animation
Raster animationabhijit754
 
Fundamentals of Data compression
Fundamentals of Data compressionFundamentals of Data compression
Fundamentals of Data compressionM.k. Praveen
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkmustafa aadel
 
Image Processing: Spatial filters
Image Processing: Spatial filtersImage Processing: Spatial filters
Image Processing: Spatial filtersA B Shinde
 
Bit plane slicing
Bit plane slicingBit plane slicing
Bit plane slicingAsad Ali
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 
Interpixel redundancy
Interpixel redundancyInterpixel redundancy
Interpixel redundancyNaveen Kumar
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
05 histogram processing DIP
05 histogram processing DIP05 histogram processing DIP
05 histogram processing DIPbabak danyal
 

What's hot (20)

Compression: Images (JPEG)
Compression: Images (JPEG)Compression: Images (JPEG)
Compression: Images (JPEG)
 
The motion estimation
The motion estimationThe motion estimation
The motion estimation
 
Color Image Processing: Basics
Color Image Processing: BasicsColor Image Processing: Basics
Color Image Processing: Basics
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
Halftoning in Computer Graphics
Halftoning  in Computer GraphicsHalftoning  in Computer Graphics
Halftoning in Computer Graphics
 
Rgb and cmy color model
Rgb and cmy color modelRgb and cmy color model
Rgb and cmy color model
 
Image Representation & Descriptors
Image Representation & DescriptorsImage Representation & Descriptors
Image Representation & Descriptors
 
Image Compression
Image CompressionImage Compression
Image Compression
 
Raster animation
Raster animationRaster animation
Raster animation
 
JPEG
JPEGJPEG
JPEG
 
Gamma and Colour Space
Gamma and Colour SpaceGamma and Colour Space
Gamma and Colour Space
 
Fundamentals of Data compression
Fundamentals of Data compressionFundamentals of Data compression
Fundamentals of Data compression
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Image Processing: Spatial filters
Image Processing: Spatial filtersImage Processing: Spatial filters
Image Processing: Spatial filters
 
Bit plane slicing
Bit plane slicingBit plane slicing
Bit plane slicing
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Interpixel redundancy
Interpixel redundancyInterpixel redundancy
Interpixel redundancy
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
05 histogram processing DIP
05 histogram processing DIP05 histogram processing DIP
05 histogram processing DIP
 
Vector quantization
Vector quantizationVector quantization
Vector quantization
 

Similar to Speech Processing with deep learning

Deep Neural Network DNN.docx
Deep Neural Network DNN.docxDeep Neural Network DNN.docx
Deep Neural Network DNN.docxjaffarbikat
 
BASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxBASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxRiteshPandey184067
 
let's dive to deep learning
let's dive to deep learninglet's dive to deep learning
let's dive to deep learningMohamed Essam
 
Convolutional Neural Networks square measure terribly kind of like n.pdf
Convolutional Neural Networks square measure terribly kind of like n.pdfConvolutional Neural Networks square measure terribly kind of like n.pdf
Convolutional Neural Networks square measure terribly kind of like n.pdfpoddaranand1
 
deeplearning
deeplearningdeeplearning
deeplearninghuda2018
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learningRADO7900
 
Let_s_Dive_to_Deep_Learning.pptx
Let_s_Dive_to_Deep_Learning.pptxLet_s_Dive_to_Deep_Learning.pptx
Let_s_Dive_to_Deep_Learning.pptxMohamed Essam
 
Convolution neural networks
Convolution neural networksConvolution neural networks
Convolution neural networksFares Hasan
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
 
Top 10 deep learning algorithms you should know in
Top 10 deep learning algorithms you should know inTop 10 deep learning algorithms you should know in
Top 10 deep learning algorithms you should know inAmanKumarSingh97
 
A Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionA Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionIJCSIS Research Publications
 
Deep learning for Computer Vision intro
Deep learning for Computer Vision introDeep learning for Computer Vision intro
Deep learning for Computer Vision introNadav Carmel
 
Deep Learning Training at Intel
Deep Learning Training at IntelDeep Learning Training at Intel
Deep Learning Training at IntelAtul Vaish
 

Similar to Speech Processing with deep learning (20)

Deep Neural Network DNN.docx
Deep Neural Network DNN.docxDeep Neural Network DNN.docx
Deep Neural Network DNN.docx
 
BASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxBASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptx
 
Cnn
CnnCnn
Cnn
 
deep learning
deep learningdeep learning
deep learning
 
let's dive to deep learning
let's dive to deep learninglet's dive to deep learning
let's dive to deep learning
 
Convolutional Neural Networks square measure terribly kind of like n.pdf
Convolutional Neural Networks square measure terribly kind of like n.pdfConvolutional Neural Networks square measure terribly kind of like n.pdf
Convolutional Neural Networks square measure terribly kind of like n.pdf
 
deeplearning
deeplearningdeeplearning
deeplearning
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
Let_s_Dive_to_Deep_Learning.pptx
Let_s_Dive_to_Deep_Learning.pptxLet_s_Dive_to_Deep_Learning.pptx
Let_s_Dive_to_Deep_Learning.pptx
 
Convolution neural networks
Convolution neural networksConvolution neural networks
Convolution neural networks
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
DL.pdf
DL.pdfDL.pdf
DL.pdf
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
Mnist report
Mnist reportMnist report
Mnist report
 
Neural Network
Neural NetworkNeural Network
Neural Network
 
Top 10 deep learning algorithms you should know in
Top 10 deep learning algorithms you should know inTop 10 deep learning algorithms you should know in
Top 10 deep learning algorithms you should know in
 
A Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware DetectionA Survey of Deep Learning Algorithms for Malware Detection
A Survey of Deep Learning Algorithms for Malware Detection
 
Deep learning for Computer Vision intro
Deep learning for Computer Vision introDeep learning for Computer Vision intro
Deep learning for Computer Vision intro
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
Deep Learning Training at Intel
Deep Learning Training at IntelDeep Learning Training at Intel
Deep Learning Training at Intel
 

More from Mohamed Essam

Data Science Crash course
Data Science Crash courseData Science Crash course
Data Science Crash courseMohamed Essam
 
2.Feature Extraction
2.Feature Extraction2.Feature Extraction
2.Feature ExtractionMohamed Essam
 
Introduction to Robotics.pptx
Introduction to Robotics.pptxIntroduction to Robotics.pptx
Introduction to Robotics.pptxMohamed Essam
 
Introduction_to_Gui_with_tkinter.pptx
Introduction_to_Gui_with_tkinter.pptxIntroduction_to_Gui_with_tkinter.pptx
Introduction_to_Gui_with_tkinter.pptxMohamed Essam
 
Getting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptxGetting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptxMohamed Essam
 
OOP-Advanced_Programming.pptx
OOP-Advanced_Programming.pptxOOP-Advanced_Programming.pptx
OOP-Advanced_Programming.pptxMohamed Essam
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxMohamed Essam
 
1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptx
1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptx1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptx
1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptxMohamed Essam
 
2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptx2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptxMohamed Essam
 
Activation_function.pptx
Activation_function.pptxActivation_function.pptx
Activation_function.pptxMohamed Essam
 
Deep_Learning_Frameworks
Deep_Learning_FrameworksDeep_Learning_Frameworks
Deep_Learning_FrameworksMohamed Essam
 
Software Engineering
Software EngineeringSoftware Engineering
Software EngineeringMohamed Essam
 

More from Mohamed Essam (20)

Data Science Crash course
Data Science Crash courseData Science Crash course
Data Science Crash course
 
2.Feature Extraction
2.Feature Extraction2.Feature Extraction
2.Feature Extraction
 
Data Science
Data ScienceData Science
Data Science
 
Introduction to Robotics.pptx
Introduction to Robotics.pptxIntroduction to Robotics.pptx
Introduction to Robotics.pptx
 
Introduction_to_Gui_with_tkinter.pptx
Introduction_to_Gui_with_tkinter.pptxIntroduction_to_Gui_with_tkinter.pptx
Introduction_to_Gui_with_tkinter.pptx
 
Getting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptxGetting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptx
 
Linear_algebra.pptx
Linear_algebra.pptxLinear_algebra.pptx
Linear_algebra.pptx
 
OOP-Advanced_Programming.pptx
OOP-Advanced_Programming.pptxOOP-Advanced_Programming.pptx
OOP-Advanced_Programming.pptx
 
1.Basic_Syntax
1.Basic_Syntax1.Basic_Syntax
1.Basic_Syntax
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptx
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptx
 
1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptx
1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptx1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptx
1.What_if_Adham_Nour_tried_to_make_a_Machine_Learning_Model_at_Home.pptx
 
Clean_Code
Clean_CodeClean_Code
Clean_Code
 
Linear_Regression
Linear_RegressionLinear_Regression
Linear_Regression
 
2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptx2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptx
 
Naieve_Bayee.pptx
Naieve_Bayee.pptxNaieve_Bayee.pptx
Naieve_Bayee.pptx
 
Activation_function.pptx
Activation_function.pptxActivation_function.pptx
Activation_function.pptx
 
Deep_Learning_Frameworks
Deep_Learning_FrameworksDeep_Learning_Frameworks
Deep_Learning_Frameworks
 
Neural_Network
Neural_NetworkNeural_Network
Neural_Network
 
Software Engineering
Software EngineeringSoftware Engineering
Software Engineering
 

Recently uploaded

What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?Watsoo Telematics
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 

Recently uploaded (20)

What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 

Speech Processing with deep learning

  • 1. Introduction to Speech Recognition with Deep learning Speech Processing Prof. Mahmoud Gad Allah
  • 2. "Computers are able to see, hear and learn. Welcome to the Future“ - Dave Waters
  • 3. Speech recognition, or speech-to-text, is the ability of a machine or program to identify words spoken aloud and convert them into readable text.
  • 4. Speech Recognition with Deep Learning Automatic speech recognition (ASR) refers to the task of recognizing human speech and translating it into text.  This research field has gained a lot of focus over the last decades. It is an important research area for human-to-machine communication.  Early methods focused on manual feature extraction and conventional techniques such as Gaussian Mixture Models (GMM), the Dynamic Time Warping (DTW) algorithm and Hidden Markov Models (HMM).  More recently, neural networks such as recurrent neural networks (RNNs), convolutional neural networks (CNNs) and in the last years Transformers, have been applied on ASR and have achieved great performance.
  • 5. What is Deep Learning? Deep learning is a subset of methods for machine learning which is a field dedicated to the study and development of machines that can learn (sometimes with the goal of eventually attaining general artificial intelligence).
  • 6. Why Call it “Deep Learning“? Why Not Just “Artificial Neural Networks“?  Geoffrey Hinton is a pioneer in the field of artificial neural networks and co- published the first paper on the backpropagation algorithm for training multilayer perceptron networks.  He may have started the introduction of the phrasing “deep” to describe the development of large artificial neural networks.  Deep learning is a deep neural network with many hidden layers and many nodes in every hidden layer.
  • 7. Speech Recognition with Deep Learning The overall flow of ASR can be represented as shown below:
  • 8. Deep Learning Architecture There are many variations of deep learning architecture for ASR. Two commonly used approaches are:  A CNN (Convolutional Neural Network) plus RNN-based (Recurrent Neural Network) architecture that uses the CTC Loss algorithm to demarcate each character of the words in the speech. eg. Baidu’s Deep Speech model.  An RNN-based sequence-to-sequence network that treats each ‘slice’ of the spectrogram as one element in a sequence eg. Google’s Listen Attend Spell (LAS) model.
  • 9. Deep Learning Architecture CNN performs better than RNN in Speech Recognition ? at Image Processing and Speech Emotional Recognition(generally Speech Recognition)due to application of filters and MaxPooling which leads to elimination of lighter pixels in Image Processing(compared to darker); and noise and ... (compared to the voice phonemes)in Speech Recognition;accordingly , CNNs work by reducing an image/speech to its key features and using the combined probabilities of the identified features appearing together to determine a classification.While RNNs deals with,only, sequential and there is no elimination and filtering which has made RNN fall back compared to CNN in aforementioned tasks.The link below is a brief description of CNN.
  • 10. Convolutional Neural Network (ConvNet/CNN) A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics.
  • 11. Convolutional Neural Network (ConvNet/CNN) The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlap to cover the entire visual area.
  • 12. Convolutional Neural Network (ConvNet/CNN) Convolutional neural networks refer to a sub-category of neural networks: they, therefore, have all the characteristics of neural networks. However, CNN is specifically designed to process input images. Their architecture is then more specific: it is composed of two main blocks.
  • 13. Convolutional Neural Network (ConvNet/CNN) The first block makes the particularity of this type of neural network since it functions as a feature extractor. To do this, it performs template matching by applying convolution filtering operations. The first layer filters the image with several convolution kernels and returns “feature maps”, which are then normalized (with an activation function) and/or resized. This process can be repeated several times: we filter the features maps obtained with new kernels, which gives us new features maps to normalize and resize, and we can filter again, and so on. Finally, the values of the last feature maps are concatenated into a vector. This vector defines the output of the first block and the input of the second.
  • 14. Convolutional Neural Network (ConvNet/CNN) The second block is not characteristic of a CNN: it is in fact at the end of all the neural networks used for classification. The input vector values are transformed (with several linear combinations and activation functions) to return a new vector to the output. This last vector contains as many elements as there are classes: element i represents the probability that the image belongs to class i. Each element is therefore between 0 and 1, and the sum of all is worth 1. These probabilities are calculated by the last layer of this block (and therefore of the network), which uses a logistic function (binary classification) or a softmax function (multi-class classification) as an activation function.
  • 15. The different layers of a CNN 1. The convolutional layer  The convolutional layer is the key component of convolutional neural networks, and is always at least their first layer. Its purpose is to detect the presence of a set of features in the images received as input. This is done by convolution filtering: the principle is to “drag” a window representing the feature on the image, and to calculate the convolution product between the feature and each portion of the scanned image. A feature is then seen as a filter: the two terms are equivalent in this context.  The convolutional layer thus receives several images as input, and calculates the convolution of each of them with each filter. The filters correspond exactly to the features we want to find in the images. We get for each pair (image, filter) a feature map, which tells us where the features are in the image: the higher the value, the more the corresponding place in the image resembles the feature.  Unlike traditional methods, features are not pre-defined according to a particular formalism (for example SIFT), but learned by the network during the training phase! Filter kernels refer to the convolution layer weights.
  • 16.
  • 17. The different layers of a CNN 2. The pooling layer  This type of layer is often placed between two layers of convolution: it receives several feature maps and applies the pooling operation to each of them. The pooling operation consists in reducing the size of the images while preserving their important characteristics. To do this, we cut the image into regular cells, then we keep the maximum value within each cell. In practice, small square cells are often used to avoid losing too much information. The most common choices are 2x2 adjacent cells that don’t overlap, or 3x3 cells, separated from each other by a step of 2 pixels (thus overlapping).  The pooling layer reduces the number of parameters and calculations in the network. This improves the efficiency of the network and avoids over-learning.
  • 18.
  • 19. The different layers of a CNN 3. The ReLu correction layer  ReLU (Rectified Linear Units) refers to the real non-linear function defined by ReLU(x)=max(0,x). Visually, it looks like the following:  The ReLU correction layer replaces all negative values received as inputs by zeros. It acts as an activation function.
  • 20. The different layers of a CNN 4. The fully-connected layer  The fully-connected layer is always the last layer of a neural network, convolutional or not — so it is not characteristic of a CNN. This type of layer receives an input vector and produces a new output vector. To do this, it applies a linear combination and then possibly an activation function to the input values received.  The last fully-connected layer classifies the image as an input to the network: it returns a vector of size N, where N is the number of classes in our image classification problem. Each element of the vector indicates the probability for the input image to belong to a class.
  • 21. References • https://www.analyticsvidhya.com/blog/2020/10/what-is-the-convolutional- neural-network-architecture/ • https://towardsdatascience.com/understand-the-architecture-of-cnn- 90a25e244c7 • https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks- cnn/ • https://clevertap.com/blog/neural-networks/ • https://towardsdatascience.com/audio-deep-learning-made-simple-automatic- speech-recognition-asr-how-it-works-716cfce4c706