Introduction to Speech Processing using Librosa
Dr. Shikha Baghel
Postdoctoral Fellow
LEAP Lab, EE Department, IISc Bengaluru
Content
• Speech Signal
a) Introduction
b) Speech Production & Perception
c) Sampling Theorem
d) Need for Short Term Processing
e) Fundamental Frequency
f) Zero-Crossing Rate
g) Short Term Energy
h) Spectrogram
• Librosa Library
Speech Signal: An introduction
A primary medium for our day-to-day communication.
Applications
Speech recognition
Speech coding
Speech synthesis (text-to-speech conversion)
Speaker verification / recognition
Speech enhancement
Aids to the handicapped
Biomedical Applications
Image Credit: https://www.vectorstock.com/royalty-free-vector/bubble-people-bubbling-speech-communication-vector-25841557
Speech Production & Perception
Speech Production
• Speech signal is composed of a sequence of sound units (or phonemes).
• Sound unit production:
Video Credit: https://www.youtube.com/watch?v=JF8rlKuSoFM
Sampling Theorem
Image Credit: https://www.tutorialspoint.com/signals_and_systems/signals_sampling_theorem.htm
Sampling Rate: Number of samples per second
Sampling frequency (fs) ≥ 2 × maximum frequency (fm)
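To make the inequality concrete, the sketch below samples a sinusoid once at a rate that satisfies fs ≥ 2 × fm and once at a rate that violates it, then reads the strongest frequency off the FFT. The helper name `dominant_frequency` and the chosen values (a 3 kHz tone, 8 kHz and 4 kHz sampling rates) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def dominant_frequency(tone_hz, fs, duration=1.0):
    """Sample a sine at `fs` Hz and return the strongest frequency in its spectrum."""
    n = int(fs * duration)
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * tone_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

# fs = 8000 Hz satisfies fs >= 2 * 3000 Hz, so the tone is captured correctly.
ok = dominant_frequency(3000, fs=8000)
# fs = 4000 Hz violates the condition, so the tone shows up at a wrong frequency.
aliased = dominant_frequency(3000, fs=4000)
print(ok, aliased)
```

With these values the first call should report 3000 Hz, while the second reports about 1000 Hz, which is the alias at fs - fm.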
Need for Short Term Processing of Speech
• Speech is produced by a time-varying vocal tract system driven by a time-varying excitation.
• The speech signal is therefore non-stationary in nature.
• Most of the tools studied in signals and systems and in signal processing assume a time-invariant system and a time-invariant excitation, i.e., a stationary signal.
• Hence these tools are not directly applicable to speech processing.
• The speech signal may be treated as stationary when it is viewed in blocks of 10-30 ms.
• Hence, to process speech with these tools, it is viewed in blocks of 10-30 ms. Such processing is termed Short-Term Processing (STP).
Audio File Formats
• .mp3
A lossy compressed format. Compression discards some information, so the original signal cannot be recovered exactly.
• .flac
A lossless compressed format. The data is compressed, but the original signal can be reconstructed perfectly.
• .wav
Typically an uncompressed format. No audio quality is lost, but the file size is the largest of the three.
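The file-size remark for .wav can be checked with a short calculation: uncompressed 16-bit mono audio needs 2 bytes per sample plus a small header. The sketch below uses Python's standard `wave` module and writes to memory rather than to disk; the helper name `wav_bytes` and the choice of 16-bit mono at 16 kHz are assumptions made for illustration.

```python
import io
import wave
import numpy as np

def wav_bytes(signal, sr):
    """Encode a float signal in [-1, 1] as 16-bit mono PCM WAV and return the raw bytes."""
    pcm = (np.clip(signal, -1.0, 1.0) * 32767).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wf.setframerate(sr)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()

sr = 16000
one_second = np.zeros(sr)
data = wav_bytes(one_second, sr)
# Size grows linearly with duration: sr * 2 bytes of samples plus the header.
print(len(data))
```

Even one second of silence takes about 32 kB here, because nothing is compressed; a lossless or lossy codec would store the same content in fewer bytes.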
Windowing
Frame size: 25 ms and frame shift: 10 ms
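A minimal sketch of this framing step, assuming a 16 kHz sampling rate (so 25 ms is 400 samples and 10 ms is 160 samples) and a Hamming window. The slide does not name the window type, so the Hamming choice and the helper name `frame_signal` are assumptions.

```python
import numpy as np

def frame_signal(x, sr, frame_ms=25.0, shift_ms=10.0):
    """Split x into overlapping frames and apply a Hamming window to each frame."""
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * shift_ms / 1000.0))
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices that belong to frame i.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx] * np.hamming(frame_len)

sr = 16000
x = np.random.default_rng(0).standard_normal(sr)   # 1 s of placeholder audio
frames = frame_signal(x, sr)
print(frames.shape)   # (number of frames, samples per frame)
```

Each row of `frames` is one short-term block; the per-frame features on the following slides can be computed row by row.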
Audio as a function of time and Frequency
https://towardsdatascience.com/extract-features-of-music-75a3f9bc265d
Fundamental Frequency (F0)
• Rate at which the vocal folds vibrate.
• Fundamental frequency (F0) = 1 / time taken to complete one vocal-fold vibration
Video Credit: https://youtu.be/mJedwz_r2Pc
Image Credit: https://wiki.aalto.fi/pages/viewpage.action?pageId=149890776
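The relation F0 = 1 / period suggests a simple way to estimate F0: find the lag at which a voiced frame best matches a shifted copy of itself, and invert it. The sketch below does this with a plain autocorrelation. This is a simplified stand-in chosen for illustration, not the method from the slides; librosa ships more robust pitch trackers (`librosa.yin`, `librosa.pyin`). The helper name `estimate_f0`, the 50-400 Hz search range and the synthetic test frame are assumptions.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=50.0, fmax=400.0):
    """Estimate F0 of one voiced frame from the strongest autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Keep non-negative lags only: ac[k] is the autocorrelation at lag k.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)          # shortest period considered, in samples
    lag_max = int(sr / fmin)          # longest period considered, in samples
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sr / best_lag              # F0 = 1 / period

sr = 16000
t = np.arange(int(0.04 * sr)) / sr    # one 40 ms frame
# Synthetic "voiced" frame: 200 Hz fundamental plus two weaker harmonics.
frame = sum(a * np.sin(2 * np.pi * 200 * k * t)
            for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])
f0 = estimate_f0(frame, sr)
print(f0)
```

The frame must be longer than the longest period searched (here 320 samples for 50 Hz), which is why a 40 ms frame is used instead of 25 ms.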
Zero-Crossing Rate
• The zero-crossing rate is the rate of sign-changes along a signal, i.e., the rate at which
the signal changes from positive to negative or back.
• This feature has been used heavily in both speech recognition and music information
retrieval.
• It usually has higher values for highly percussive sounds like those in metal and rock.
https://www.analyticsvidhya.com/blog/2022/01/analysis-of-zero-crossing-rates-of-different-music-genre-tracks/
Zero-Crossing Rate
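A minimal sketch of the definition: count how often adjacent samples differ in sign and divide by the number of sample pairs. The helper name `zero_crossing_rate` and the two test signals (a 100 Hz tone and white noise) are illustrative assumptions. librosa offers a per-frame counterpart, `librosa.feature.zero_crossing_rate`.

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(x)
    return np.mean(negative[1:] != negative[:-1])

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t + 0.3)                 # smooth, low-frequency signal
noise = np.random.default_rng(0).standard_normal(sr)     # noise-like signal

zcr_tone = zero_crossing_rate(tone)
zcr_noise = zero_crossing_rate(noise)
print(zcr_tone, zcr_noise)
```

The tone crosses zero twice per cycle, giving a rate near 2 × 100 / 16000, while the noise changes sign on roughly half of all sample pairs. Noise-like sounds therefore show a much higher zero-crossing rate than smooth periodic ones.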
Short Term Energy
• The energy associated with speech is time-varying in nature.
• By the nature of its production, the speech signal consists of voiced, unvoiced and silence regions.
• Further, the energy of a voiced region is large compared to that of an unvoiced region, and a silence region has the least or negligible energy.
• Thus short-term energy can be used for voiced, unvoiced and silence classification of speech.
Short Term Energy
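A minimal sketch of short-term energy as the sum of squared samples in each frame, assuming 25 ms frames with a 10 ms shift at 16 kHz. The signal is synthetic: zeros stand in for silence, weak noise for an unvoiced region and a strong 150 Hz sinusoid for a voiced region. These stand-ins and the helper name `short_term_energy` are assumptions made for illustration.

```python
import numpy as np

def short_term_energy(x, frame_len, hop):
    """Sum of squared samples in each frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return np.sum(x[idx] ** 2, axis=1)

sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr // 2) / sr
silence = np.zeros(sr // 2)
unvoiced = 0.05 * rng.standard_normal(sr // 2)     # weak, noise-like stand-in
voiced = 0.8 * np.sin(2 * np.pi * 150 * t)         # strong, periodic stand-in
x = np.concatenate([silence, unvoiced, voiced])

energy = short_term_energy(x, frame_len=400, hop=160)
# Frames 0-47 lie in silence, 50-97 in the unvoiced part, 100 onward in the voiced part.
print(energy[:48].max(), energy[50:98].mean(), energy[100:].mean())
```

The three regions separate clearly by energy, which is the basis for using thresholds on short-term energy in voiced, unvoiced and silence classification. `librosa.feature.rms` provides a closely related per-frame measure.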
Spectrogram
• A spectrogram is a visual representation of the spectrum of frequencies of sound or
other signals as they vary with time.
• It’s a representation of frequencies changing with respect to time for given music
signals.
https://towardsdatascience.com/understanding-audio-data-fourier-transform-fft-spectrogram-and-speech-recognition-a4072d228520
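A spectrogram can be sketched by combining the earlier steps: cut the signal into short frames, window each frame, and take the magnitude of its Fourier transform. The version below uses NumPy only, with 25 ms frames, a 10 ms shift, a Hann window and a 512-point FFT. These settings, the helper name `spectrogram` and the 1 kHz test tone are assumptions; `librosa.stft` is the library counterpart.

```python
import numpy as np

def spectrogram(x, sr, frame_ms=25.0, shift_ms=10.0, n_fft=512):
    """Magnitude spectrogram in dB, shape (n_fft // 2 + 1, n_frames)."""
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * shift_ms / 1000.0))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[idx] * np.hanning(frame_len)
    # rfft zero-pads each windowed frame to n_fft samples before transforming.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T
    return 20.0 * np.log10(magnitude + 1e-10)   # small offset avoids log(0)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)                # 1 kHz test tone
S_db = spectrogram(x, sr)
freqs = np.fft.rfftfreq(512, d=1.0 / sr)        # frequency of each row in Hz
peak_hz = freqs[np.argmax(S_db.mean(axis=1))]
print(S_db.shape, peak_hz)
```

Rows correspond to frequency and columns to time, so a steady tone appears as one horizontal line, here at the row closest to 1 kHz.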
Librosa Library
Let's Explore