Under the guidance of
Dr. G. Pradhan
NIT Patna (ECE Dept.)
Name: Pammi Kumari
M.Tech 2nd year (ECE Dept.)
Roll No.: 1329005
• Introduction
• Summary of Literature review
• Issues in existing speaker verification systems
• Motivation for the present work
• Baseline speaker verification system
• Experimental results
• Proposal for future work
• To develop a voice-password-based speaker verification system.
• To study the impact of text mismatch on the performance of a voice-password-based speaker verification system.
• To develop a voice-password-based speaker verification system in text-independent mode.
• To explore methods for modeling speaker information under limited-data conditions.
• Most applications use short speech signals, around 3-5 s, but speaker verification systems perform poorly on short-duration speech.
• This performance degradation is due to phonetic variability between the training and testing speech data.
SPEAKER VERIFICATION: Speaker verification is the process of verifying the identity asserted by a claimant. It performs a one-to-one comparison between a newly input voiceprint and the stored voiceprint for the claimed identity.
Fig: Block diagram of speaker verification system. Input speech → Feature extraction → Similarity → Decision → Verification result. The Speaker ID (#M) selects the Reference model (Speaker #M) used by the Similarity block; the Decision block uses a Threshold.
Fig: Voice password speaker verification system.
Training: Speech → Pre-processing → Feature extraction → Model building → Reference model.
Testing: Speech and identity claim → Pre-processing → Feature extraction → Comparison with the reference model → Decision logic → Accept/reject.
• When a speaker makes an identity claim, the speech data is compared against the model of the speaker whose identity is claimed.
• A threshold is used to reach the decision.
• If the similarity of the test speech data to the target model is above the threshold, the speaker is accepted; otherwise the claim is rejected.
• This process involves a binary decision (accept/reject) about the claimed identity, regardless of the population size.
• Hence, the performance of the verification system does not depend on the size of the population.
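The threshold decision can be sketched in a few lines. This is a minimal illustration, assuming that a higher score means a closer match to the claimed speaker's model; the function name `verify`, the scores, and the threshold value are hypothetical and not taken from the slides.

```python
def verify(score: float, threshold: float) -> str:
    """Binary decision: accept the claim when the similarity score
    reaches the threshold, otherwise reject it (assumed convention:
    higher score = closer match)."""
    return "accept" if score >= threshold else "reject"


# Hypothetical similarity scores of three test utterances against the
# claimed speaker's model, with an illustrative threshold of 0.5.
decisions = [verify(s, threshold=0.5) for s in (0.82, 0.31, 0.50)]
print(decisions)
```

The decision involves only the claimed speaker's model and the threshold, which is why the number of enrolled speakers does not enter the computation.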
• In the first stage, pre-processing and feature extraction are performed over a database of speakers.
• In the second stage, models are built from the feature vectors, which represent speaker-specific characteristics.
• The third stage is the decision, which accepts or rejects the claimed identity of a speaker.
Fig: Basic block diagram of a biometric system. Blocks: Sensor, Pre-processing, Feature extraction, Template generator, Stored template, Matcher, Application device.
Text-dependent speaker verification: the system is based on the utterance of fixed, predetermined phrases.
Text-independent speaker verification: the reference utterance (what is spoken in training) and the test utterance (what is spoken in actual use) may have completely different content.
*Literature
• Research in the field of speaker recognition began in the 1950s at Bell Laboratories, using isolated digits [1].
• In 2000, the major elements of a Gaussian mixture model (GMM)-based speaker verification system, used successfully in several NIST Speaker Recognition Evaluations (SREs), were described.
• 1960-1990: most research focused on extracting speaker-specific information from the speech data and on developing text-dependent speaker verification systems.
• 1990-2005: speaker recognition methods shifted from template-based pattern matching to statistical modeling. Statistical modeling methods such as GMM and GMM-UBM were proposed.
• 2005-2014: most research focused on compensating for mismatches and developing practical verification systems. Compensation methods such as i-vectors and PLDA were proposed.
1. K. H. Davis et al., “Automatic recognition of spoken digits,” J.A.S.A., 24 (6), pp. 637-642, 1952.
• In the speech analysis stage, although techniques have been developed to improve speaker verification performance, no particular analysis technique is specifically meant for the limited-data condition.
• Segmental analysis under the limited-data condition yields few feature vectors, which leads to poor speaker models and, in turn, to degraded performance.
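The link between utterance duration and the number of feature vectors can be made concrete with a small calculation. This is a sketch under assumptions: one feature vector per analysis frame, a 20 ms frame with a 2 ms shift, and hypothetical durations of 3 s and 60 s; the helper `num_frames` is illustrative only.

```python
def num_frames(duration_ms: int, frame_ms: int = 20, shift_ms: int = 2) -> int:
    """Number of full analysis frames (one feature vector each) that fit
    in an utterance of the given duration."""
    if duration_ms < frame_ms:
        return 0  # too short for even one full frame
    return (duration_ms - frame_ms) // shift_ms + 1


short_utt = num_frames(3_000)    # hypothetical 3 s utterance
long_utt = num_frames(60_000)    # hypothetical 60 s utterance
print(short_utt, long_utt)
```

The count grows roughly linearly with duration, so a short utterance gives the model proportionally fewer vectors to estimate its parameters from.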
• Most applications use short speech signals, around 3-5 s, but speaker verification systems perform poorly on short-duration speech.
• This performance degradation is due to phonetic variability between the training and testing speech data.
• The phonetic variability may be reduced by artificially generating multiple utterances.
• Most SV systems perform score normalization using cohort-centric normalization. Speaker-centric score normalization may provide better results.
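The slides do not say which normalization variant is meant. As one possible reading, the sketch below shows a common cohort-based scheme in the style of T-norm: the test utterance is scored against a set of cohort (impostor) models, and the claimed-speaker score is standardized by the mean and standard deviation of those cohort scores. The function name and all score values are hypothetical.

```python
from statistics import mean, pstdev


def cohort_normalize(raw_score: float, cohort_scores: list[float]) -> float:
    """Standardize a raw verification score using cohort statistics
    (assumed T-norm-style scheme, not confirmed by the slides)."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero spread; cannot normalize")
    return (raw_score - mu) / sigma


# Hypothetical scores of one test utterance against two cohort models,
# and against the claimed speaker's model.
normalized = cohort_normalize(3.0, [0.0, 2.0])
print(normalized)
```

After normalization, a single threshold can be applied across speakers, because each score is expressed relative to how the same utterance scores against impostor models.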
• For the baseline speaker verification system, the following parameters are used:
  • VAD threshold: 0.1 of the average energy
  • Features: MFCC
  • Feature vector: 39 dimensions, 20 ms frame size with 2 ms shift
  • Modeling: GMM
  • GMM size: 8, 16, 32, 64
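The VAD setting can be illustrated with a short energy-based sketch. It assumes one reading of the parameter: a frame is kept as speech when its energy exceeds 0.1 times the average frame energy of the utterance. The function names, the toy signal, and the frame sizes in samples are hypothetical.

```python
def frame_energies(samples: list[float], frame_len: int, shift: int) -> list[float]:
    """Sum of squared samples for each full frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, shift)
    ]


def energy_vad(samples: list[float], frame_len: int, shift: int,
               ratio: float = 0.1) -> list[bool]:
    """Mark a frame as speech when its energy exceeds ratio * average
    frame energy (assumed interpretation of the 0.1 threshold)."""
    energies = frame_energies(samples, frame_len, shift)
    if not energies:
        return []
    threshold = ratio * (sum(energies) / len(energies))
    return [e > threshold for e in energies]


# Toy signal: 8 silent samples followed by 8 samples of amplitude 1.0,
# analysed with non-overlapping 4-sample frames for readability.
signal = [0.0] * 8 + [1.0] * 8
speech_flags = energy_vad(signal, frame_len=4, shift=4)
print(speech_flags)
```

In a real system the frame length and shift would be given in samples derived from the sampling rate, and only the frames flagged as speech would be passed on to feature extraction.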
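Fig: Experimental results, plot 1 (values as plotted): 34.613, 32.87, 32.0971, 32.4634
Fig: Experimental results, plot 2 (values as plotted): 27.4725, 25.1374, 23.6722, 22.6190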
• Extraction of features that reduce the impact of phonetic variability.
• Different residual or behavioral features may be extracted in addition to MFCC for speaker verification.
• This project considered the GMM modeling technique; future work may use other techniques, such as i-vectors.

DEVELOPMENT OF SPEAKER VERIFICATION UNDER LIMITED DATA AND CONDITION
