Speaker Identification from Voice
Guided by: Dr. S. R. Balasundaram (Professor)
Presented by: Triloki Gupta, M.Tech (Data Analytics), 205217006
Department of Computer Application
Content
● Introduction
● Motivation
● Problem statement
● Objective
● Model Architecture
● Features of model
● Implementation details
● About dataset
● Result
● Conclusion and Future work
● References
Introduction
• The fundamental purpose of speech is communication, i.e., the
transmission of messages.
• The speech signal conveys information about the identity of
the speaker.
• The area of speaker identification is concerned with extracting
the identity of the person speaking the utterance.
• Recent developments have made it possible to use speaker identification
in security systems.
Cont..
● Two common recognition tasks are:
○ speaker verification (determining whether a speaker’s
claimed identity is true or false) and
○ speaker identification (classifying the identity of an
unknown voice among a set of speakers).
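The difference between the two tasks can be sketched in a few lines. This is only an illustration: the speaker names, the similarity scores, and the 0.5 threshold are made-up assumptions, not values from the presentation.

```python
# Hypothetical similarity scores for one utterance (higher = closer match).
def identify(scores):
    """Identification: pick the enrolled speaker with the highest score."""
    return max(scores, key=scores.get)


def verify(scores, claimed_speaker, threshold=0.5):
    """Verification: accept the claimed identity only if its score passes a threshold."""
    return scores.get(claimed_speaker, float("-inf")) >= threshold


scores = {"alice": 0.12, "bob": 0.81, "carol": 0.07}
print(identify(scores))         # bob
print(verify(scores, "bob"))    # True
print(verify(scores, "carol"))  # False
```

Identification needs no claimed identity and always returns one of the enrolled speakers, while verification answers a yes/no question about a single claim.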
Motivation
● Speaker identification makes it possible to use the speaker's voice to
verify their identity.
● It can also control access to services such as voice dialing, telephone
banking, database access, voice mail, security control for confidential
information areas, and remote access to computers.
Problem Statement
● Understanding how to recognize complex, high-dimensional
voice/speech/audio data is one of the greatest challenges of our time.
● Traditional approaches such as Gaussian mixture models (GMMs) suffer from
an inherent assumption of linearity in speech signal dynamics. Such approaches
are prone to overfitting and generalize poorly.
Objective
● The objective of speaker identification is to determine the
identity of a speaker by machine on the basis of his/her voice.
● No identity is claimed by the user.
Model Architecture
[Figures: diagrams of the MLP, CNN, RNN, and LSTM models]
Work Flow
[Figure: workflow diagram]
Features of model
● The structure of CNNs (local connectivity, weight sharing, non-linear
activation functions, and pooling) gives some degree of invariance to small shifts
of speech features along the frequency axis, which is important for dealing with
speaker and environment variations.
● An RNN maintains a hidden state that retains information about the sequence
seen so far. This “memory” carries forward what has been calculated at earlier
steps. RNNs work well when we are dealing with short-term dependencies.
● LSTM (long short-term memory) is a recurrent neural network (RNN)
architecture designed to capture long-term dependencies, which plain RNNs
struggle with.
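The pooling point above can be illustrated with a toy example. This is a minimal sketch: the 8-bin "spectrum" and the pool size of 4 are illustrative assumptions, not the configuration used in the presentation.

```python
def max_pool(values, size):
    """Non-overlapping 1-D max pooling along the frequency axis."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


# Toy spectrum with a single spectral peak.
spectrum = [0, 0, 9, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 9, 0, 0, 0, 0]   # peak moved up by one bin, same pooling window
crossed = [0, 0, 0, 0, 9, 0, 0, 0]   # peak moved into the next pooling window

print(max_pool(spectrum, 4))  # [9, 0]
print(max_pool(shifted, 4))   # [9, 0]
print(max_pool(crossed, 4))   # [0, 9]
```

The pooled output is unchanged for a shift that stays inside one pooling window and changes once the peak crosses a window boundary, which is why the invariance is only partial.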
Implementation Details
● Creating a dataset of 14 speakers
● Data preprocessing
○ Feature extraction:
■ mfcc (Mel-frequency cepstral coefficients)
■ melspectrogram (mel-scaled spectrogram)
■ chroma_stft (chromagram from the short-time Fourier transform)
■ chroma_cqt (constant-Q chromagram)
■ chroma_cens (chroma energy normalized statistics)
● Building neural networks:
○ CNN
○ RNN
○ LSTM
● Person identification based on his/her voice
Cont..
● Sample of the features rendered as an image:
[Figure: sample feature image]
About dataset
● The dataset contains 1,330 voice recordings in 14 classes, with about 90 to
100 recordings per class. Each class is labeled with the speaker's name.
● Features are extracted with mfcc (Mel-frequency cepstral coefficients),
melspectrogram (mel-scaled spectrogram), chroma_stft (chromagram from the
short-time Fourier transform), chroma_cqt (constant-Q chromagram), and
chroma_cens (chroma energy normalized statistics). The neural networks are
trained with these features as input.
● A total of 200 features are extracted from each recording: 40 each from mfcc,
melspectrogram, chroma_stft, chroma_cqt, and chroma_cens.
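The 5 x 40 = 200 layout described above can be sketched as follows. The feature names match functions in the librosa library, but the slides do not name the library or say how each feature is reduced to 40 values. Averaging each coefficient over time is an assumption made here for illustration, and the input matrices are stand-in numbers rather than real audio features.

```python
from statistics import fmean

FEATURE_TYPES = ["mfcc", "melspectrogram", "chroma_stft", "chroma_cqt", "chroma_cens"]
VALUES_PER_TYPE = 40


def time_average(matrix):
    """Collapse a (coefficients x frames) matrix to one mean value per coefficient."""
    return [fmean(row) for row in matrix]


def build_feature_vector(matrices):
    """Concatenate the time-averaged blocks in a fixed order (assumed reduction)."""
    vector = []
    for name in FEATURE_TYPES:
        block = time_average(matrices[name])
        assert len(block) == VALUES_PER_TYPE
        vector.extend(block)
    return vector


# Stand-in data: 40 coefficients x 3 frames per feature type (not real audio).
fake = {
    name: [[float(i), float(i) + 1.0, float(i) + 2.0] for i in range(VALUES_PER_TYPE)]
    for name in FEATURE_TYPES
}
vector = build_feature_vector(fake)
print(len(vector))  # 200
```

Keeping the block order fixed means that position k in the vector always refers to the same coefficient of the same feature type across recordings.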
Results
● MLP: Test acc = 98.35%, Train acc = 86.67%, Train loss = 0.8480, Test loss = 0.0321
● CNN: Test acc = 99.17%, Train acc = 99.38%, Train loss = 0.0261, Test loss = 0.0248
● RNN: Test acc = 98.35%, Train acc = 96.04%, Train loss = 0.1229, Test loss = 0.0358
● LSTM: Test acc = 99.67%, Train acc = 99.58%, Train loss = 0.0312, Test loss = 0.0091
● GRU: Test acc = 97.52%, Train acc = 99.58%, Train loss = 0.0105, Test loss = 0.1984
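The reported numbers can be compared side by side with a short script. The metrics are copied from the Results slides. The "test loss greater than train loss" check is only a rough heuristic added here, not an analysis from the presentation.

```python
# Metrics as reported: (test acc %, train acc %, train loss, test loss).
results = {
    "MLP": (98.35, 86.67, 0.8480, 0.0321),
    "CNN": (99.17, 99.38, 0.0261, 0.0248),
    "RNN": (98.35, 96.04, 0.1229, 0.0358),
    "LSTM": (99.67, 99.58, 0.0312, 0.0091),
    "GRU": (97.52, 99.58, 0.0105, 0.1984),
}

# Model with the highest reported test accuracy.
best = max(results, key=lambda name: results[name][0])
print(best)  # LSTM

# Models whose test loss exceeds their train loss, a rough hint of weaker generalization.
flagged = [
    name
    for name, (_, _, train_loss, test_loss) in results.items()
    if test_loss > train_loss
]
print(flagged)  # ['GRU']
```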
Conclusion and Future Work
● The system identified 14 different speakers satisfactorily. These speakers
were the users whose samples were used to train the system, and the system
was tested on samples different from those used to train it.
● The test accuracy achieved by MLP, CNN, RNN, LSTM, and GRU was
98.35%, 99.17%, 98.35%, 99.67%, and 97.52%, respectively.
● Future work includes tagging individual speakers in mixed-voice recordings.
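Testing on samples that were not used for training can be set up with a per-speaker hold-out split. This is a sketch under stated assumptions: the presentation does not give its split ratio, so the 20% test fraction, the fixed seed, and the 95 recordings per speaker (14 x 95 = 1,330) are illustrative choices.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so every speaker appears in both parts."""
    by_speaker = defaultdict(list)
    for index, speaker in enumerate(labels):
        by_speaker[speaker].append(index)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_speaker.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Stand-in labels: 14 speakers with 95 recordings each.
labels = [f"speaker_{s:02d}" for s in range(14) for _ in range(95)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1064 266
```

Splitting within each speaker keeps the class balance similar in both parts, and no recording index appears in both lists.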
References
[1] M. Schmidt and H. Gish, “Speaker identification via support vector classifiers”, 1996 IEEE International Conference on Acoustics, Speech, and
Signal Processing Conference Proceedings.
[2] Amirsina Torfi, Jeremy Dawson and Nasser M. Nasrabadi, “Text-Independent Speaker Verification Using 3D Convolutional Neural Networks,”
arXiv:1705.09422v7, 2018.
[3] Mirco Ravanelli and Yoshua Bengio, “Speaker recognition from raw waveform with SincNET,” arXiv:1808.00158v2, 2018.
[4] Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan and Zhenyao Zhu, “Deep Speaker: an End-
to-End Neural Speaker Embedding System”, arXiv:1705.02304v1, May 2017
[5] Roberto Togneri and Daniel Pullella, “An Overview of Speaker Identification: Accuracy and Robustness Issues”, IEEE Circuits and Systems
Magazine, 09 June 2011
[6] R.V Pawar, P.P.Kajave, and S.N.Mali, “Speaker Identification using Neural Networks”, World Academy of Science, Engineering and
Technology, 12 2005
[7] Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu, “Convolutional Neural Networks for Speech
Recognition”, IEEE/ACM Transaction on audio, speech, and language processing, vol. 22, no. 10, october 2014
21
Department of Computer Application
22
Department of Computer Application

More Related Content

What's hot

Automated Fingerprint Identification Systems
Automated Fingerprint Identification SystemsAutomated Fingerprint Identification Systems
Automated Fingerprint Identification Systems
Rmcauley
 
Forensic analysis of soil
Forensic analysis of soilForensic analysis of soil
Forensic analysis of soil
Ketan Patil
 
FORENSIC EXAMINATION OF Soil
FORENSIC EXAMINATION OF SoilFORENSIC EXAMINATION OF Soil
FORENSIC EXAMINATION OF Soil
Chhavi Agarwal
 
Audio authentication techniques
Audio authentication techniquesAudio authentication techniques
Audio authentication techniques
priyanka pandey
 
Gait pattern.pptx
Gait pattern.pptxGait pattern.pptx
Gait pattern.pptx
MATANGI LAD
 
conventional methods of fingerprint development
conventional methods of fingerprint developmentconventional methods of fingerprint development
conventional methods of fingerprint development
faraharooj
 
Facial reconstruction
Facial reconstructionFacial reconstruction
Facial reconstruction
Anjali Awasthi
 
Age of documents (Questioned Document)
Age of  documents (Questioned Document)Age of  documents (Questioned Document)
Age of documents (Questioned Document)
Shreyas Patel
 
Brain Fingerprinting PPT
Brain Fingerprinting PPTBrain Fingerprinting PPT
Brain Fingerprinting PPT
Vishnu Mysterio
 
VSC ppt forensic science Shailesh Chaubey .pptx
VSC ppt  forensic science Shailesh Chaubey .pptxVSC ppt  forensic science Shailesh Chaubey .pptx
VSC ppt forensic science Shailesh Chaubey .pptx
SHAILESH CHAUBEY
 
Speaker identification based on temporal parameters
Speaker identification based on temporal parametersSpeaker identification based on temporal parameters
Speaker identification based on temporal parameters
Alexandria University
 
Forensic phonetics[1]
Forensic phonetics[1]Forensic phonetics[1]
Forensic phonetics[1]
PAHELI SHARMA
 
Brain fingerprinting
Brain fingerprintingBrain fingerprinting
Brain fingerprinting
pgrr
 
ESDA
ESDAESDA
Pattern recognition palm print authentication system
Pattern recognition palm print authentication systemPattern recognition palm print authentication system
Pattern recognition palm print authentication system
Mazin Alwaaly
 
Soil as forensic evidence
Soil as forensic evidenceSoil as forensic evidence
Soil as forensic evidence
Tejasvi Bhatia
 
restoration of toolmarks
restoration of toolmarksrestoration of toolmarks
restoration of toolmarks
Hemant Jain
 
19 Forensic Science Powerpoint Chapter 19 Forensic Footwear Evi
19  Forensic Science Powerpoint Chapter 19 Forensic Footwear Evi19  Forensic Science Powerpoint Chapter 19 Forensic Footwear Evi
19 Forensic Science Powerpoint Chapter 19 Forensic Footwear EviGrossmont College
 
Poroscopy and edgeoscopy
Poroscopy and edgeoscopyPoroscopy and edgeoscopy
Poroscopy and edgeoscopy
kiran malik
 
Tool marks and its forensic significance
Tool marks and its forensic significanceTool marks and its forensic significance
Tool marks and its forensic significance
Stina14
 

What's hot (20)

Automated Fingerprint Identification Systems
Automated Fingerprint Identification SystemsAutomated Fingerprint Identification Systems
Automated Fingerprint Identification Systems
 
Forensic analysis of soil
Forensic analysis of soilForensic analysis of soil
Forensic analysis of soil
 
FORENSIC EXAMINATION OF Soil
FORENSIC EXAMINATION OF SoilFORENSIC EXAMINATION OF Soil
FORENSIC EXAMINATION OF Soil
 
Audio authentication techniques
Audio authentication techniquesAudio authentication techniques
Audio authentication techniques
 
Gait pattern.pptx
Gait pattern.pptxGait pattern.pptx
Gait pattern.pptx
 
conventional methods of fingerprint development
conventional methods of fingerprint developmentconventional methods of fingerprint development
conventional methods of fingerprint development
 
Facial reconstruction
Facial reconstructionFacial reconstruction
Facial reconstruction
 
Age of documents (Questioned Document)
Age of  documents (Questioned Document)Age of  documents (Questioned Document)
Age of documents (Questioned Document)
 
Brain Fingerprinting PPT
Brain Fingerprinting PPTBrain Fingerprinting PPT
Brain Fingerprinting PPT
 
VSC ppt forensic science Shailesh Chaubey .pptx
VSC ppt  forensic science Shailesh Chaubey .pptxVSC ppt  forensic science Shailesh Chaubey .pptx
VSC ppt forensic science Shailesh Chaubey .pptx
 
Speaker identification based on temporal parameters
Speaker identification based on temporal parametersSpeaker identification based on temporal parameters
Speaker identification based on temporal parameters
 
Forensic phonetics[1]
Forensic phonetics[1]Forensic phonetics[1]
Forensic phonetics[1]
 
Brain fingerprinting
Brain fingerprintingBrain fingerprinting
Brain fingerprinting
 
ESDA
ESDAESDA
ESDA
 
Pattern recognition palm print authentication system
Pattern recognition palm print authentication systemPattern recognition palm print authentication system
Pattern recognition palm print authentication system
 
Soil as forensic evidence
Soil as forensic evidenceSoil as forensic evidence
Soil as forensic evidence
 
restoration of toolmarks
restoration of toolmarksrestoration of toolmarks
restoration of toolmarks
 
19 Forensic Science Powerpoint Chapter 19 Forensic Footwear Evi
19  Forensic Science Powerpoint Chapter 19 Forensic Footwear Evi19  Forensic Science Powerpoint Chapter 19 Forensic Footwear Evi
19 Forensic Science Powerpoint Chapter 19 Forensic Footwear Evi
 
Poroscopy and edgeoscopy
Poroscopy and edgeoscopyPoroscopy and edgeoscopy
Poroscopy and edgeoscopy
 
Tool marks and its forensic significance
Tool marks and its forensic significanceTool marks and its forensic significance
Tool marks and its forensic significance
 

Similar to Speaker identification

A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
CSCJournals
 
IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...
IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...
IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...
IRJET Journal
 
Text Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.pptText Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.ppt
Grace136708
 
Deep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event DetectionDeep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event Detection
Sai Kiran Kadam
 
Et25897899
Et25897899Et25897899
Et25897899
IJERA Editor
 
Deep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionDeep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker Recognition
Sai Kiran Kadam
 
D04812125
D04812125D04812125
D04812125
IOSR-JEN
 
Isolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkIsolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural network
eSAT Journals
 
Sound event detection using deep neural networks
Sound event detection using deep neural networksSound event detection using deep neural networks
Sound event detection using deep neural networks
TELKOMNIKA JOURNAL
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker Recognition
CSCJournals
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
IRJET Journal
 
Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
IJECEIAES
 
Review of Deep Neural Network Detectors in SM MIMO System
Review of Deep Neural Network Detectors in SM MIMO SystemReview of Deep Neural Network Detectors in SM MIMO System
Review of Deep Neural Network Detectors in SM MIMO System
ijtsrd
 
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based ModelReal-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
adil raja
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
IJCSEA Journal
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)
TANVIRAHMED611926
 
Voice Signal Synthesis using Non Negative Matrix Factorization
Voice Signal Synthesis using Non Negative Matrix FactorizationVoice Signal Synthesis using Non Negative Matrix Factorization
Voice Signal Synthesis using Non Negative Matrix Factorization
IRJET Journal
 
Speaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVMSpeaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVM
IRJET Journal
 
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic ProgrammingRealtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
adil raja
 

Similar to Speaker identification (20)

A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
 
Ijetcas14 426
Ijetcas14 426Ijetcas14 426
Ijetcas14 426
 
IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...
IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...
IRJET - Study on the Effects of Increase in the Depth of the Feature Extracto...
 
Text Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.pptText Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.ppt
 
Deep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event DetectionDeep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event Detection
 
Et25897899
Et25897899Et25897899
Et25897899
 
Deep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionDeep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker Recognition
 
D04812125
D04812125D04812125
D04812125
 
Isolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkIsolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural network
 
Sound event detection using deep neural networks
Sound event detection using deep neural networksSound event detection using deep neural networks
Sound event detection using deep neural networks
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker Recognition
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
 
Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
 
Review of Deep Neural Network Detectors in SM MIMO System
Review of Deep Neural Network Detectors in SM MIMO SystemReview of Deep Neural Network Detectors in SM MIMO System
Review of Deep Neural Network Detectors in SM MIMO System
 
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based ModelReal-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)
 
Voice Signal Synthesis using Non Negative Matrix Factorization
Voice Signal Synthesis using Non Negative Matrix FactorizationVoice Signal Synthesis using Non Negative Matrix Factorization
Voice Signal Synthesis using Non Negative Matrix Factorization
 
Speaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVMSpeaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVM
 
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic ProgrammingRealtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
 

More from Triloki Gupta

GCP Deployment- Vertex AI
GCP Deployment- Vertex AIGCP Deployment- Vertex AI
GCP Deployment- Vertex AI
Triloki Gupta
 
Flask-Python
Flask-PythonFlask-Python
Flask-Python
Triloki Gupta
 
Sign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols ClassificationSign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols Classification
Triloki Gupta
 
Introduction to pig.
Introduction to pig.Introduction to pig.
Introduction to pig.
Triloki Gupta
 
Naive Bayes Classifier using R.
Naive Bayes Classifier using R.Naive Bayes Classifier using R.
Naive Bayes Classifier using R.
Triloki Gupta
 
Meta analysis.
Meta analysis.Meta analysis.
Meta analysis.
Triloki Gupta
 
Enhancement of Old Images and Documents by Digital Image Processing Techniques.
Enhancement of Old Images and Documents by Digital Image Processing Techniques.Enhancement of Old Images and Documents by Digital Image Processing Techniques.
Enhancement of Old Images and Documents by Digital Image Processing Techniques.
Triloki Gupta
 

More from Triloki Gupta (7)

GCP Deployment- Vertex AI
GCP Deployment- Vertex AIGCP Deployment- Vertex AI
GCP Deployment- Vertex AI
 
Flask-Python
Flask-PythonFlask-Python
Flask-Python
 
Sign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols ClassificationSign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols Classification
 
Introduction to pig.
Introduction to pig.Introduction to pig.
Introduction to pig.
 
Naive Bayes Classifier using R.
Naive Bayes Classifier using R.Naive Bayes Classifier using R.
Naive Bayes Classifier using R.
 
Meta analysis.
Meta analysis.Meta analysis.
Meta analysis.
 
Enhancement of Old Images and Documents by Digital Image Processing Techniques.
Enhancement of Old Images and Documents by Digital Image Processing Techniques.Enhancement of Old Images and Documents by Digital Image Processing Techniques.
Enhancement of Old Images and Documents by Digital Image Processing Techniques.
 

Recently uploaded

Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 

Recently uploaded (20)

Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 

Speaker identification

  • 1. Speaker Identification from Voice Guided by: Dr.S.R.Balasundaram Presented by: Triloki Gupta (Professor) M.Tech(DataAnalytics) 205217006 1 Department of Computer Application
  • 2. Content ● Introduction ● Motivation ● Problem statement ● Objective ● Model Architecture ● Features of model ● Implementation details ● About dataset ● Result ● Conclusion and Future work ● References 2 Department of Computer Application
  • 3. Introduction • The fundamental purpose of speech is communication, i.e., the transmission of messages. • The speech signal conveys information about the identity of the speaker. • The area of speaker identification is concerned with extracting the identity of the person speaking the utterance. • Recent development has made it possible to use this in the security system. Department of Computer Application 3
  • 4. Cont.. ● Two common recognition tasks are: ○ speaker verification (determining whether a speaker’s claimed identity is true or false) and ○ speaker identification (classifying the identity of an unknown voice among a set of speakers). 4Department of Computer Application
  • 5. Motivation ● Speaker identification makes it possible to use a speaker's voice to verify their identity and to control access to services such as voice dialing, telephone banking, database access, voice mail, security control for confidential information areas, and remote access to computers.
  • 6. Problem Statement ● Recognizing complex, high-dimensional voice, speech, and audio data remains a major challenge. ● Traditional approaches based on Gaussian mixture models (GMMs) suffer from an inherent assumption of linearity in speech signal dynamics. Such approaches are prone to overfitting and generalize poorly.
  • 7. Objective ● The objective of speaker identification is to have a machine determine the identity of a speaker on the basis of their voice. ● No identity is claimed by the user.
  • 8. Model Architecture [Figures: MLP, CNN]
  • 9. Model Architecture (cont.) [Figures: RNN, LSTM]
  • 10. Work Flow [Figure: work flow]
  • 11. Features of model ● CNN: structural properties such as local connectivity, weight sharing, non-linear activation functions, and pooling give CNNs some degree of invariance to small shifts of speech features along the frequency axis, which is important for handling speaker and environment variations. ● RNN: an RNN maintains a hidden state that acts as a “memory” of what has been computed so far in a sequence. RNNs work well when dealing with short-term dependencies. ● LSTM: long short-term memory (LSTM) is a recurrent neural network (RNN) architecture designed to capture long-term dependencies in addition to short-term ones.
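The shift-tolerance claim for CNNs can be checked on a toy example. The sketch below is plain Python, not the network used in the slides; the input vector, the kernel, and the pooling width are made up for illustration. Weight sharing makes the convolution output shift along with the input, and max pooling then hides the shift as long as the peak stays inside the same pooling window.

```python
def conv1d(x, kernel):
    """Valid-mode 1-D cross-correlation: the same kernel weights are reused at every position."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def max_pool(x, width):
    """Non-overlapping max pooling over windows of the given width."""
    return [max(x[i:i + width]) for i in range(0, len(x) - width + 1, width)]


kernel = [0.5, 1.0, 0.5]                    # toy filter (hypothetical weights)
x = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]          # a single peak along the "frequency" axis
x_shifted = [0] + x[:-1]                    # same peak, moved up by one bin

a = conv1d(x, kernel)
b = conv1d(x_shifted, kernel)

# Weight sharing: the response to the shifted input is the shifted response.
print(b[1:] == a[:-1])

# Pooling: both inputs give the same pooled features for this small shift.
print(max_pool(a, 4), max_pool(b, 4))
```

The invariance is only partial: a shift that moves the peak across a pooling-window boundary changes the pooled output, which matches the "some degree of invariance" wording above.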
  • 12. Implementation Details ● Creating a dataset of 14 speakers ● Data preprocessing ○ Feature extraction: ■ mfcc (Mel-frequency cepstral coefficients) ■ melspectrogram (mel-scaled spectrogram) ■ chroma_stft (chromagram computed from a short-time Fourier transform) ■ chroma_cqt (chromagram computed from a constant-Q transform) ■ chroma_cens (chroma energy normalized statistics) ● Building neural networks: ○ CNN ○ RNN ○ LSTM ● Identifying a person based on their voice
  • 13. Implementation Details (cont.) ● Sample of the features rendered as an image: [Figure: feature sample]
  • 14. About dataset ● The dataset contains 1,330 voice recordings from 14 classes, with about 90 to 100 recordings per class. Each class is labeled with a speaker name. ● Features are extracted with mfcc (Mel-frequency cepstral coefficients), melspectrogram (mel-scaled spectrogram), chroma_stft (short-time Fourier transform chromagram), chroma_cqt (constant-Q chromagram), and chroma_cens (chroma energy normalized statistics). The neural networks are trained with these features as input. ● From each recording, 200 features are extracted: 40 each from mfcc, melspectrogram, chroma_stft, chroma_cqt, and chroma_cens.
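The 5 x 40 = 200 arithmetic can be sketched as follows. The slides do not say how each feature type is reduced to 40 values, so this example assumes that every extractor yields a matrix with 40 rows (one per coefficient or bin) and that each row is averaged over time frames; the averaging step, the helper names, and the dummy data are all assumptions. The feature names follow the slide (they appear to match function names in the librosa library), but no audio library is called here.

```python
from statistics import mean

# Feature names as listed on the slide; 40 values are kept from each.
FEATURE_TYPES = ["mfcc", "melspectrogram", "chroma_stft", "chroma_cqt", "chroma_cens"]
BINS_PER_TYPE = 40


def summarize(matrix):
    """Average each row (one coefficient or bin) over its time frames. Assumed step."""
    return [mean(row) for row in matrix]


def build_feature_vector(features):
    """Concatenate the per-type summaries into one flat vector of 5 * 40 = 200 values."""
    vector = []
    for name in FEATURE_TYPES:
        summary = summarize(features[name])
        if len(summary) != BINS_PER_TYPE:
            raise ValueError(f"{name}: expected {BINS_PER_TYPE} rows, got {len(summary)}")
        vector.extend(summary)
    return vector


# Dummy stand-in for real extractor output: 40 rows x 3 time frames per feature type.
dummy = {
    name: [[float(r), float(r) + 2.0, float(r) + 4.0] for r in range(BINS_PER_TYPE)]
    for name in FEATURE_TYPES
}

vec = build_feature_vector(dummy)
print(len(vec))  # 200
```

A fixed-length vector like this suits an MLP directly; for the CNN and recurrent models the same values would typically be reshaped, which the slides do not detail.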
  • 15. Results ● MLP: ○ Test acc = 98.35%, Train acc = 86.67%, Train loss = 0.8480, Test loss = 0.0321
  • 16. Results (cont.) ● CNN: ○ Test acc = 99.17%, Train acc = 99.38%, Train loss = 0.0261, Test loss = 0.0248
  • 17. Results (cont.) ● RNN: ○ Test acc = 98.35%, Train acc = 96.04%, Train loss = 0.1229, Test loss = 0.0358
  • 18. Results (cont.) ● LSTM: ○ Test acc = 99.67%, Train acc = 99.58%, Train loss = 0.0312, Test loss = 0.0091
  • 19. Results (cont.) ● GRU: ○ Test acc = 97.52%, Train acc = 99.58%, Train loss = 0.0105, Test loss = 0.1984
  • 20. Conclusion and Future Work ● The system identified 14 different speakers satisfactorily. These speakers were the users whose voice samples were used to train the system. The system was tested on samples different from those used for training. ● The test accuracies achieved by MLP, CNN, RNN, LSTM, and GRU were 98.35%, 99.17%, 98.35%, 99.67%, and 97.52%, respectively. ● Future work includes tagging individual speakers in mixed-voice recordings.
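The reported numbers can be compared side by side with a short script. The values are copied from the results slides, with train accuracies read as percentages (an assumption where the slides omit the percent sign); the dictionary layout and variable names are only for illustration.

```python
# Accuracy in percent and loss, as reported on the results slides.
reported = {
    "MLP":  {"train_acc": 86.67, "test_acc": 98.35, "train_loss": 0.8480, "test_loss": 0.0321},
    "CNN":  {"train_acc": 99.38, "test_acc": 99.17, "train_loss": 0.0261, "test_loss": 0.0248},
    "RNN":  {"train_acc": 96.04, "test_acc": 98.35, "train_loss": 0.1229, "test_loss": 0.0358},
    "LSTM": {"train_acc": 99.58, "test_acc": 99.67, "train_loss": 0.0312, "test_loss": 0.0091},
    "GRU":  {"train_acc": 99.58, "test_acc": 97.52, "train_loss": 0.0105, "test_loss": 0.1984},
}

# Model with the highest reported test accuracy.
best = max(reported, key=lambda m: reported[m]["test_acc"])

# Train-minus-test accuracy gap in percentage points; a large positive gap hints at overfitting.
gaps = {m: round(r["train_acc"] - r["test_acc"], 2) for m, r in reported.items()}

print("best test accuracy:", best)
for model, gap in gaps.items():
    print(f"{model}: train - test = {gap:+.2f} points")
```

On these figures the GRU is the only model with a train accuracy clearly above its test accuracy, while the MLP shows the opposite pattern.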
  • 21. References
    [1] M. Schmidt and H. Gish, “Speaker identification via support vector classifiers,” 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
    [2] A. Torfi, J. Dawson, and N. M. Nasrabadi, “Text-Independent Speaker Verification Using 3D Convolutional Neural Networks,” arXiv:1705.09422v7, 2018.
    [3] M. Ravanelli and Y. Bengio, “Speaker recognition from raw waveform with SincNet,” arXiv:1808.00158v2, 2018.
    [4] C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, X. Liu, Y. Cao, A. Kannan, and Z. Zhu, “Deep Speaker: an End-to-End Neural Speaker Embedding System,” arXiv:1705.02304v1, May 2017.
    [5] R. Togneri and D. Pullella, “An Overview of Speaker Identification: Accuracy and Robustness Issues,” IEEE Circuits and Systems Magazine, 9 June 2011.
    [6] R. V. Pawar, P. P. Kajave, and S. N. Mali, “Speaker Identification using Neural Networks,” World Academy of Science, Engineering and Technology, 12, 2005.
    [7] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional Neural Networks for Speech Recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, October 2014.