Text Independent Speaker recognitom framework for detecting criminals.ppt

A TEXT INDEPENDENT SPEAKER
RECOGNITION SYSTEM FOR
DETECTING CRIMINALS
An M.Sc. Oral Defence SEMINAR
By
AMUSAAFOLARIN IBRAHIM
(PG13 /0025)
Department of Computer Science
Federal University of Agriculture, Abeokuta
Supervisory team:
Prof. A. S. Sodiya
Dr. O. R. Vincent
Prof. A. A. A Agboola

OUTLINE
• Introduction
• Motivation
• Problem statement
• Research objectives
• Literature Review
• Research design and methodology
• Implementation
• Contribution to knowledge
• Recommendation
• Conclusion and future work
• References 2

INTRODUCTION
• In criminal law, kidnapping is the unlawful taking away or
transportation of a person against the person’s will, usually to
hold the person unlawfully (Gary, 2007).
• Kidnapping is a major challenge in the society that is yet to be
eradicated due to a number of factors ranging from political to
socio-economic problems.
• This may be done for ransom or in furtherance of another
crime, or in connection with a child custody dispute. The
typical behavior of a kidnapper after successfully abducting a
victim entails that the kidnapper contact the victim’s family for
one or more demands. 3

INTRODUCTION
• Speaker recognition is the process that enables machines to
understand and interpret the human speech by making use of
certain algorithms and verify the authenticity of a speaker with
the help of a database.(Teunen et al., 2000,Reynolds., 2000)
• Speaker recognition is the process of automatically recognizing
speaker’s voice on the basis of individual information included
in the input speech waves..(Parul and Dubey. 2012)
• Speaker recognition is generally divided into two tasks
• Speaker Verification(SV)
• Speaker Identification(SI) 4

INTRODUCTION
5
Fig.1 human speech production system.

INTRODUCTION
The voices consist of five major components
• Pitch (fundamental frequency)
• Tone
• Amplitude (Volume)
• Quality
• Loudness
Underlying premise of a speaker recognition of each
person’s voice differ in the components above to make it
uniquely distinguishable.
6

INTRODUCTION
Speaker recognition can be divided into three categories
• Text-dependent (TD)
• Text-independent (TI)
• Text-prompted
Many approaches have been proposed for TI speaker recognition
• Vector Quantization (VQ based method)
• Autoassociative neural network (AANN)
• Gaussian Mixture Model
• Matrix representation
• Decision trees
• K-nearest Neighbors (K-NN)
7

INTRODUCTION
The feature extraction used for this system are:
• Mel Frequency Cepstral Coefficient (MFCC)
• Modified Mel Frequency Cepstral Coefficient (MMFCC)
The classifier used for this system are;
• VQ based method
• Artificial neural network (Auto-associate neural network
(AANN))
8

MOTIVATION
• Many countries, especially developing countries like
Nigeria, are still faced with the problems of insecurity
such as armed robbery, kidnapping, etc. and there is the
need to develop systems that will be able to mitigate
against these problems.
• From the literature, the efficiency and reliability of
most existing speaker recognition systems are not
quite perfect for voice detection and there is a need
to introduce an improve system to detect unknown
voices. (Parul and Dubey.,2012).
9

PROBLEM STATEMENT
A number of speaker recognition systems have been developed
over the years. However, the most prevalent challenges facing
these systems include:
• Poor clustering of speaker models in the database leading to high computational
search time for matching a speaker model. (Parul and Dubey. R.B. 2012).
• Low noise reduction by existing algorithms leading to poor construction of speaker
model. (Sumithra et al., 2012)
• Inability of existing speaker recognition systems to detect unknown speakers.
(Kinnunen et al. 2009).
• Poor quality result from a single model is still susceptible to speaker recognition
system. (Visalakshi. R and Dhanalakshmi. P. 2014) 10

RESEARCH OBJECTIVES
The objectives of this work are to;
• Identify the strengths and weaknesses in the
existing approaches used for speaker
recognition system.
• An improved speaker recognition system
efficient detection of criminals.
• Simulate and evaluate the proposed system. 11

LITERATURE REVIEW
METHODS USED AND THEIR FEATURES EXTRACTOR
FEATURE EXTRACTION:
• Mel Frequency Cepstral Coefficient
• Mel Frequency Perceptual Linear Predicitive (ML-PLP)
• Perceptual Linear Predicitive (PLP)
• Modified Mel Frequency Cepstral Coefficient.
CLASSIFIERS;
• Vector quantization
• Artificial neural network (Auto associative neural network and
Radial basis function neural network)
• K-nearest Neighbors (K-NN)
12

LITERATURE REVIEW
• (Visalakshi. and Dhanalakshmi. 2014): This work
compared the efficiency of Radial basin function neural
network (RBFNN) and autoassociative neural network
(AANN) for the design of speaker recognition.
Strength: gives a better performance using AANN of 94.93%
than RBFNN of 89.1%
Weakness: Robustness of RBFNN affect the performance of
the system
• Sumithra et al., (2012): This combined the two feature
extraction techniques to design a speaker recognition system.
Strength: It is suitable for highly secured environment.
Weakness: It was not quite efficient 13

RELATED WORK
S/
N
AUTHORS
FEATURE
EXTRACTIONS
CLASSIFIERS STRENGTHS WEAKNESSES
1 Kinnunen. and
Haizhou., 2009
MFCC VQ, GMM, SVM
and others
Solve text
dependency and
speech duration
Encounter limited trained
data and unbalanced text
2 Visalahi and
Dhanalakshmi.,
2014
LPCC, LPC and
MFCC
RBFNN and
AANN
Gives a better
performance of
AANN of about
(94.93%)
Robustness of RBFNN
can affect the
performance of the
system
3 Revathi. and
Venkatarami et
al., 2009
PLP and MF-PLP Vector
quantization
Improve the
speech recognition
accuracy of 91%
Isolated use of PLP
reduces detection
accuracy
4 Nair et al., 2012 MFCC Hybrid algorithm Solve the problem
of enhancing
speech in noisy
environment
Better methods are
needed for
speech/silence detection
5 Sumithra et al.,
2012
MFCC and
MMFCC
Vector
quantization
It suitable for
highly secured
environment
Can only have better
efficiency if the speaker
modeling techniques are
used
14

S/N AUTHORS
FEATURE
EXTRACTIONS
CLASSIFIERS STRENGTHS WEAKNESSES
6 Hautamaki et
al., 2008
Affine
transformation
invariance
Graph matching It has the potential to
complement or replace
currently used statistical
and template based
methods
It can’t be used in real-life
speaker recognition system
only if the size of the
association graph grows fast
for large models.
7 Hanilci. and
Figen. 2009
MFCC VQ-Based and
PCA-Based
classifier
Opinion fussion
improves identification
rate effectively
PCA classifier is not better in
noisy or distorted speech
only slightly better in clean
speech.
8 Sekar, 2012 K-Nearest
Neighbor (KNN)
classifier
The accuracy rate of
combining RT and DCT
to extract features is
96%
No improvement in accuracy
for increment in number of
DCT coefficient
9 Parul and
Dubey. 2012
MFCC Vector
Quantization
R ecognition accuracy of
80%
Challenged by highly variant
input speech signal
10 Cuiling. and
Tiejun. 2008
Effective at defeating
FASRS
The last three disguise
pattern are weak with their
CRRs of 85%
15

METHODOLOGY
DESIGN CONSIDERATION
The following requirements are considered to design the system;
• Combining two classifiers together (autoassociative
neural network and vector quantization.
• Reducing computational time (search time) in the
database.
• AANN model is used to train the speech (noisy) which
ensures that the characteristics of the voices are
similar both in training and testing phrase.
• Reduce dimensionality of the feature vectors for easy
computation. 16

METHODOLOGY
17
Pre-
processing
Framing Windowing FastFourier
transform
(FFT)
MMFCC
FeatureExtractor
Speaker
database
Auto-associate
neural network
Vector
Quantization
Classifier
Cluster
search
Speaker ID
Voice
signal
Fig 2 Proposed architecture for
Text Independent Speaker
Recognition System

METHODOLOGY
• Data collection: Implementing this system, real life datasets
was collected from different speakers at different
environment like a market, hall. Also existing datasets was
used. (body conducted speech datasets) for both training
and testing phrase
• The data spitted into two categories which the vocal tracts
will pass through namely; training and testing. 27

IMPLEMENTATION
Hardware requirement
Minimum hardware requirements are;
• PC workstation with Intel core i3 processor at 2.40Hz
• 4GB of RAM
• 5.1 SVGA or VGA with desktop performance for windows
• 86.6MB/s primary disk data transfer
• 500GB hard disk capacity
• A mouse and a keyboard
Software requirement
Minimum software requirements are;
• Windows 7, windows 8, windows 10
• Java programming language (JDK 8), NetBeans 8.0.1.
• Sound forge version 5.0
• AANN java libraries
• VQ java libraries
28

IMPLEMENTATION
Feature extractor performance
Voice features were extracted from a sample of voices gotten from group of
persons using the feature extraction algorithms (MFCC)
29

IMPLEMENTATION
Feature extractor performance cont’d
MFCC is applied to the voice samples to get the feature sets
30

IMPLEMENTATION
Classifier performance
Neural net. training the voices during the training phrase
31
Current
error
(%)
Error
Improvement
(%)

IMPLEMENTATION
32
 Diagram for the Euclidean distances of different speakers in a cluster
Euclidean
distanc
e
data point

IMPLEMENTATION
 Euclidean distance of a speaker
33

RESULT
Analyzing how the system select a speaker comparing with
the speakers in a cluster.
Training phase;
Where all valid speakers are accounted.
Speaker A matches with speaker A
Speaker B matches with speaker B
Speaker C matches with speaker C
Speaker D matches with speaker D
Speaker E matches with speaker E
Speaker F matches with speaker F
Speaker G matches with speaker G
Speaker H matches with speaker H
34

RESULT
Testing phase
Speaker D matches with speaker A-------------NO
Speaker D matches with speaker B-----------NO
Speaker D matches with speaker C---------NO
Speaker D matches with speaker D----------YES
35

RESULT
Matching score
Table shows the accuracy of the system based on Biometrics Cyber Security (BCS) datasets which gives
94.12%
Table 1
38
.
Speaker Number of match Number of mismatch Accuracy %
1 5 1 92
2 6 0 100
3 6 0 100
4 6 0 100
5
6
7
8
9
10
11
12
13
14
15
6
6
6
6
6
6
6
6
6
6
6
0
0
0
0
0
0
0
0
0
0
0
100
100
100
100
100
100
100
100
100
100
100

RESULT
39
Matching score
The accuracy rate of the system based on real life datasets gotten from individuals in different environment is 88.24%.
Table 2
Speaker Number of match Number of mismatch Accuracy %
1 5 1 92
2 6 0 100
3 6 0 100
4 6 0 100
5
6
7
8
9
10
11
12
13
14
15
6
6
6
6
6
6
6
6
6
6
6
0
0
0
0
0
0
0
0
0
0
0
100
100
100
100
100
100
100
100
100
100
100

RESULT
False acceptance and false reject rates on test datasets with different
thresholds.
Table 4.
40
Thresholds False acceptance
rate%
False rejection
rate%
Total error rate% Equal error rate
%
0.5 3.3 4.7 8 4
1 2 5.3 7.3 3.7
1.5 0.7 4 4.7 2.4
2 2.7 6 8.7 7.5

RESULT
41
TISRS ANALYSES
Table 5
Speaker ID Total samples Recognition accuracy % Rejection accuracy %
1 6 91 84
2 6 90 81
3 6 94 86
4 6 91 80
5 6 95 84
6 6 90 85
7 6 90 82
8
9
10
11
12
13
14
15
6
6
6
6
6
6
6
6
97
92
95
93
96
94
97
91
80
82
87
81
80
87
88
82

RESULT
PARULAND DUBEY,.2012 ANALYSES
Table 6
42
Speaker ID Total samples Recognition
accuracy%
Rejection accuracy
%
1 5 80 85
2 5 79 80
3 5 81 84
4 5 83 81
5 5 86 80
6 5 85 83
7 5 87 82

RESULT
SEKAR., 2012 ANALYSES
The accuracy rate of Sekar. 2012 is 96%
Table 6
43
Speaker Number of match Number of
mismatch
Accuracy %
1 5 0 100
2 5 0 100
3 4 1 80
4 5 0 100
5 5 0 100

RESULT
Comparing TISRS with Parul and Dubey.,2012 and Sekar.,2012
Table 7
44
Systems Accuracy
rate/detection
speed %
Recognition
accuracy %
Rejection
accuracy
FFT
calculation
MFCC/MMFCC
feature extraction
speed
Total recognition
time
TISRS 94.12 93.07 83.27 110ms 3seconds 6-8 seconds
Parul and
Dubey, 2012
- 80 85 100ms 4-5 seconds 5-7 seconds
Sekar, 2012 96 - - - - -

CONCLUSION
In this study, the major goal of this work, is to create a speaker recognition system for
detecting criminals;
• The Speaker Recognition System has good accuracy and performance for recognition of
speaker based on their normal voices given in respective of their languages and dialect.
• Based on the disguised voices gotten from each speakers, it had a great and effective
performance of the system. The efficiency of the voice used varies based on the disguising type
of voice obtained from the speaker.
• The current system was designed to evaluate the performance of the algorithms provided under
different types of inputs. Based on the experiment, it was observed that the matching error of
the system basically results from;
--Insufficient number of corresponding extractor
--Large distortion
--Missing features and spurious features.
• The implemented algorithms are more accurate, detects a person easily with its voice and faster
than previous systems like Parul and Dubey (2012) while Sekar (2012) has better accuracy rate
compared to this method using BCS dataset but the developed system has a better accuracy rate
using real life datasets.
• It is difficult to achieve a very high classification rate and it is beneficial to incorporate the
information into the designed algorithm to improve its discrimination performance. 45

CONTRIBUTION TO KNOWLEDGE
The contributions are listed below;
• The introduction of combined classifier techniques for efficient clustering
and decision making.
• In today’s high rate of crimes, improved mechanism for detection of
criminals was introduced.
• The capturing and used of real life datasets has also made the new design
practicable.
46

RECOMENDATION
For actual reduction of criminals, it is recommended that the
system is implemented in the following systems;
• Time and Attendance Systems
• Access Control Systems
• Telephone-Banking/Booking
• Biometric Login to telephone aided shopping systems
Information and Reservation Services
• Security control for confidential information Forensic purposes
47

FUTURE WORK
• In the future, research should focus on combining one or two
other biometrics techniques for better authentication as stated in
the concluding part of the work.
• Furthermore, research can also consider the adoption of other
areas of biometrics in speaker recognition system. This will
reduce the rate of crime in the country
48

REFERENCES
Cuiling Zhang and Tiejun Tan 2008. Voice disguise and automatic speaker
recognition. Forensic science international.
Hanilçi Cemal and Figen Ertaş 2009. Principal Component Based
Classification for Text-Independent Speaker Identification. Department of
Electronic Engineering, Uludag University, Bursa, TURKEY.
Hautamäki Ville, Tomi Kinnunen, Pasi Fränti 2008. Text-independent speaker
recognition using graph matching. Pattern recognition letter.
Kinnunen, T., Haizhou, L. 2009. An overview of text-independent speaker
recognition from features to supervectors. Speech communication.
Parul, G. and Dubey R.B. 2012.Automatic speaker recognition system.
International Journal of Advanced Computer Research.
Revathi, A. and Venkataramani, Y. 2009. Iterative clustering approach for text
independent speaker identification using multiple features. International
Journal of Computer science & Information Technology (IJCSIT) 1(2).
49

Text Independent Speaker recognitom framework for detecting criminals.ppt

Recommended

Recommended

More Related Content

Similar to Text Independent Speaker recognitom framework for detecting criminals.ppt

Similar to Text Independent Speaker recognitom framework for detecting criminals.ppt (20)

Recently uploaded

Recently uploaded (17)

Text Independent Speaker recognitom framework for detecting criminals.ppt