DR. HARISINGH GOUR VISHWAVIDYALAYA
TOPIC :- VOICE ANALYSIS
SUBMITTED TO :-
Dr. NAVJOT KAUR KANWAL
(DEPARTMENT OF CRIMINOLOGY AND FORENSIC SCIENCE)
SUBMITTED BY :-
NIKHIL KUMAR SINGH
REGISTRATION NO. = Y19242514
 A voice is more than just a string of sounds. Voices are
inherently complex.
 They signal a great deal of information in addition to
the intended linguistic message: the speaker’s sex, for
example, or their emotional state or state of health.
 Some of this information is clearly of potential forensic
importance.
 However, the different types of information conveyed
by a voice are not signalled in separate channels, but
are convolved together with the linguistic message.
 Knowledge of how this occurs is necessary to interpret
the ubiquitous variation in speech, and to assess the
comparability of speech samples.
 Speaker identification is the process of
determining whether two or more recordings of
speech are from the same speaker.
 Speaker identification can be very effective,
contributing to both the conviction and the
elimination of suspects. In this task, a voice print
of an unknown speaker is analysed and then
compared with speech samples of known speakers.
 The unknown speaker is identified as the speaker
whose model best matches the input model; it is
the identification of a person from characteristics of
voices.
 It is the process of automatically recognising
who is speaking by using the speaker-specific
information contained in the speech wave. It can
be used to verify identities claimed by people
accessing systems; that is, it enables access
control of various services by voice.
 Applicable services include voice dialling,
banking over a telephone network, telephone
shopping, database access network,
information and reservation services, voice
mail, security control for confidential
information and remote access to computers.
 Another important application of speaker
recognition technology is as a forensic tool.
 Speaker identification in the forensic context is
usually about comparing voices.
 Probably the most common task involves the
comparison of one or more samples of an
offender’s voice with one or more samples of a
suspect’s voice.
 Voices are important things for humans. They
are the medium through which we do a lot of
communicating with the outside world: our
ideas, of course, but also our emotions and our
personality.
 Voices are also one of the media through which we
(successfully, most of the time) recognise other
humans who are important to us – members of our
family, media personalities, our friends and
enemies.
 Although evidence from DNA analysis is
potentially vastly more eloquent in its power than
evidence from voices, DNA can’t talk.
 It can’t be recorded planning, carrying out or
confessing to a crime. It can’t be so apparently
directly incriminating.
 Perhaps it is these features that contribute to the
interest and importance of forensic speaker
identification (FSI).
 Voices are extremely complex things, and some
of the inherent limitations of the forensic-
phonetic method are in part a consequence of
the interaction between their complexity and
the real world in which they are used.
 It is one of the aims of this paper to explain
how this comes about.
 The basic ideas we will focus on here include:
what speech sounds are like, what a voice is,
forensic speaker identification, voice comparison,
and forensic-phonetic speaker identification.
 The most common task in forensic speaker
identification involves the comparison of one
or more samples of an unknown voice
(sometimes called the questioned sample) with
one or more samples of a known voice.
 Often the unknown voice is that of the
individual alleged to have committed an
offence (hereafter called the offender) and the
known voice belongs to the suspect.
 Both prosecution and defence are then
concerned with being able to say whether the
two samples have come from the same person,
and thus being able either to identify the
suspect as the offender or to eliminate them
from suspicion.
 Sometimes it is important to be able to attach a
voice to an individual, or not, irrespective of
questions of guilt.
 In order to tell whether the same voice is
present in two or more speech samples, it must
be possible to tell the difference between, or
discriminate between voices.
 Put more accurately, it must be possible to
discriminate between samples from the voice of
the same speaker and samples from the voices
of different speakers.
 So identification in this sense is the secondary
result of a process of discrimination.
 The suspect may be identified as the offender
to the extent that the evidence supports the
hypothesis that questioned and suspect
samples are from the same voice.
 If not, no identification results.
 In this regard, therefore, the identification in
forensic speaker identification is somewhat
imprecise.
 In criminalistics, the identification process
seeks individualisation.
 Identifying a person or an object means that it
is possible to distinguish this person or object
from all others on the surface of the Earth.
 The forensic individualisation process can be
seen as a reduction process beginning from an
initial population to a single person.
 Recently, an investigation concerning the
inference of identity in forensic speaker
recognition has shown the inadequacy of the
main solutions proposed to assess the evidence
in this field.
 The concept of identity underlying the
verification and the identification tasks does
not correspond to the concept of identity
accepted in forensic science (Champod and
Meuwly, 2000).
 Speaker verification is the other common task
in speaker recognition.
 This is where ‘an identity claim from an
individual is accepted or rejected by comparing
a sample of his speech against a stored
reference sample by the individual whose
identity he is claiming’
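 The accept-or-reject decision described above can be sketched as a score compared against a threshold. This is only an illustrative sketch: the cosine similarity measure, the function names and the threshold value of 0.9 are assumptions chosen for the example, not something taken from the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify_claim(sample, reference, threshold=0.9):
    """Accept the identity claim only if the sample is close enough
    to the stored reference of the claimed speaker.
    The threshold is a hypothetical value for illustration."""
    return cosine_similarity(sample, reference) >= threshold

# Hypothetical feature vectors
stored_reference = [1.0, 2.0, 3.0]
print(verify_claim([1.0, 2.0, 3.1], stored_reference))  # similar sample
print(verify_claim([3.0, -1.0, 0.0], stored_reference))  # dissimilar sample
```

 In practice the threshold trades false acceptances against false rejections, so it has to be tuned on data.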
 The aim of speaker identification is, not
surprisingly, identification: ‘to identify an
unknown voice as one or none of a set of known
voices’.
 One has a speech sample from an unknown
speaker, and a set of speech samples from different
speakers the identity of whom is known.
 The task is to compare the sample from the
unknown speaker with the known set of samples,
and determine whether it was produced by any of
the known speakers.
 In speaker identification, the reference set of
known speakers can be of two types: closed or
open.
 This distinction refers to whether the set is known
to contain a sample of the unknown voice or not.
 A closed reference set means that it is known that
the owner of the unknown voice is one of the
known speakers.
 An open set means that it is not known whether the
owner of the unknown voice is present in the
reference set or not.
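 The difference between a closed and an open reference set can be sketched as follows. The function name, the score dictionary and the threshold are hypothetical; the scores are assumed to be similarity values where higher means a better match.

```python
def identify(scores, open_set=False, threshold=0.0):
    """Pick the best-matching known speaker.

    scores: dict mapping known-speaker name -> similarity score
    (hypothetical values, higher means more similar).
    Closed set: the unknown speaker is assumed to be in the set,
    so the best match is always returned.
    Open set: the best match is returned only if its score reaches
    the threshold; otherwise None ("none of the known voices").
    """
    best = max(scores, key=scores.get)
    if open_set and scores[best] < threshold:
        return None
    return best

scores = {"speaker_A": 0.4, "speaker_B": 0.7, "speaker_C": 0.2}
print(identify(scores))                                # closed set
print(identify(scores, open_set=True, threshold=0.8))  # open set, no match
```

 The forensic case is normally an open-set problem, which is why the best match alone is not enough to identify a speaker.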
 MFCC (Mel-Frequency Cepstral Coefficients)
 The easiest and most prevalent method of
extracting spectral features is to calculate the
Mel-Frequency Cepstral Coefficients (MFCC) of
the human voice.
 It is one of the most popular methods of feature
extraction used in speech recognition systems.
It is based on frequency domain using the Mel
scale which is based on the human ear scale.
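 The Mel scale mentioned above can be illustrated with a small conversion sketch. The source does not give a formula; the one used here (2595 * log10(1 + f / 700)) is one commonly used convention and is an assumption of this example.

```python
import math

def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mels (one common convention)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in Hz become smaller steps in mels at high frequencies,
# mirroring the reduced resolution of the ear at high frequencies.
for f in (500, 1000, 2000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1))
```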
 Time domain features are less accurate than
frequency domain features. The main aim of
feature extraction is to reduce the amount of
data in the speech signal before recognition.
 Steps involved in feature extraction are pre-
emphasis, framing, windowing, fast Fourier
transform, Mel-frequency filtering, logarithmic
function and discrete cosine transform
(Reynolds and Rose, 1995).
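 The last two steps in the list, the logarithm and the discrete cosine transform, can be sketched for a single frame. This is a minimal sketch under stated assumptions: the Mel filterbank energies are taken as given (hypothetical numbers), an unnormalised DCT-II is used, and the function name is made up for the example.

```python
import math

def cepstral_coefficients(filterbank_energies, num_coeffs):
    """Log-compress Mel filterbank energies and apply a DCT-II.

    filterbank_energies: positive energies, one per Mel filter
    (assumed to come from the earlier FFT and filtering steps).
    Returns the first num_coeffs cepstral coefficients.
    """
    # A small floor avoids log(0) for empty filters.
    log_energies = [math.log(max(e, 1e-10)) for e in filterbank_energies]
    n = len(log_energies)
    return [
        sum(log_energies[j] * math.cos(math.pi * k * (j + 0.5) / n)
            for j in range(n))
        for k in range(num_coeffs)
    ]

# Hypothetical energies from 8 Mel filters for one frame
energies = [5.2, 9.1, 7.4, 3.3, 1.8, 0.9, 0.5, 0.2]
print(cepstral_coefficients(energies, 4))
```

 Keeping only the first few coefficients is what gives the compact feature vector used by the later modelling stages.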
 The first step in MFCC is pre-emphasis, which
boosts the high frequencies of the speech signal
that are attenuated during speech production.
 Pre-emphasis is needed because the high-frequency
components of the speech signal have small
amplitude relative to the low-frequency
components. The higher frequencies are therefore
artificially boosted in order to improve their
signal-to-noise ratio.
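 Pre-emphasis is usually implemented as a first-order high-pass filter, y[n] = x[n] - a * x[n-1]. The sketch below assumes this form; the coefficient 0.97 is a commonly used value and is not given in the source.

```python
def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to a list of samples.

    alpha is an assumed, commonly used coefficient.
    The first sample is passed through unchanged.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]

# A slowly varying (low-frequency) signal is strongly attenuated,
# while a rapidly alternating (high-frequency) one is boosted.
print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
print(pre_emphasis([1.0, -1.0, 1.0, -1.0]))
```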
 Next is framing, which groups the samples
obtained by analog-to-digital conversion (ADC) of
the speech signal into short frames.
 The number of samples in each frame is chosen
as 256 and the number of samples overlapping
between adjacent frames is 128.
 Overlapping frames are used to acquire the
information from the boundaries of the frame.
 Discontinuities at the start and the end of each
frame cause undesirable effects in the
frequency response, so windowing is used to
smooth the discontinuities at the edges.
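 Framing with 256 samples per frame and an overlap of 128 samples, followed by windowing, can be sketched as below. The source does not name the window; a Hamming window is assumed here because it is a common choice, and the function names are made up for the example.

```python
import math

def frame_signal(signal, frame_len=256, overlap=128):
    """Split the signal into overlapping frames (trailing samples
    that do not fill a whole frame are dropped)."""
    hop = frame_len - overlap
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(num_points):
    """Hamming window (assumed window type)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_points - 1))
            for n in range(num_points)]

def window_frames(frames):
    """Taper every frame so that its edges fall smoothly towards zero."""
    if not frames:
        return []
    window = hamming(len(frames[0]))
    return [[x * w for x, w in zip(frame, window)] for frame in frames]

signal = [1.0] * 1024                 # hypothetical digitised signal
frames = frame_signal(signal)
windowed = window_frames(frames)
print(len(frames), len(frames[0]))    # number of frames, samples per frame
print(round(windowed[0][0], 2), round(windowed[0][128], 2))  # edge vs centre
```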
 In the discipline of speaker recognition a wide
range of methods and procedures are adopted
by the experts for identification.
 Such type of analysis involves a group of trained
phoneticians giving their judgement regarding the
similarity and dissimilarity between the two
speech events, after hearing the samples again and
again to find out some similarities in their
linguistic, phonetic and acoustic features.
 Human listeners are robust speaker recognizers
when presented with degraded speech.
 Listener performance is relatively free from
limitations such as the signal-to-noise ratio, speech
bandwidth, the amount of speech material, and
distortions occurring in the speech signal as a
result of speech coding, transmission systems, etc.
 In this technique, different utterances of the
speakers are segregated in respect of each speaker
by way of repeated listening of recorded
conversation.
 The segregated conversations of each speaker are
repeatedly heard to identify linguistic features and
phonetic features like articulation rate, flow of
speech, degree of vowels and consonant formation,
rhythm, striking time, pauses etc.
 There are cues in voice and speech behaviour,
which are individual and thus make it possible to
recognize the familiar voices.
 This involves semi-automatic measurement of
particular acoustic speech parameters, such as
vowel formants and articulation rate, which is
sometimes combined with the results of auditory
phonetic analysis by a human expert.
 In 1941, an electro-mechanical acoustic
spectrograph was developed by Dr. Ralph
Potter of Bell Telephone Laboratories, with the
idea of converting sounds into pictures (Kent RD,
Read C 2001).
 A sound spectrograph is an instrument which
gives a permanent record of the changing energy-
frequency distribution throughout the duration of
a speech wave.
 The spectrograms are the graphic displays of
the amplitude as a function of both frequency
and time.
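 A spectrogram in this sense can be sketched as a table of magnitudes with one row per time frame and one column per frequency bin. The sketch below uses a plain discrete Fourier transform on short frames; the frame length, hop size and function names are assumptions for illustration, and a real system would use an FFT and a window function.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitude of the discrete Fourier transform of one frame
    (bins 0 .. N/2 only, since the input is real-valued)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len=64, hop=32):
    """Rows are time frames, columns are frequency bins."""
    return [magnitude_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]

# Hypothetical test tone: 8 cycles per 64-sample frame
tone = [math.sin(2.0 * math.pi * 8 * t / 64) for t in range(256)]
rows = spectrogram(tone)
strongest_bins = [max(range(len(r)), key=r.__getitem__) for r in rows]
print(len(rows), strongest_bins)
```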
 Examiners visually inspect and compare
similarities or differences of patterns of the energy
distribution in the spectrograms.
 It is generally believed that formant structures and
other spectral characteristics which are evident
from a spectrogram are unique for each individual.
 The most widely used features are fundamental
frequencies, formant bandwidths, formant
frequencies, spectral composition of fricatives and
plosives for individual segments, and transitions.
 However, the main drawback of voiceprint analysis is that spectrograms
of the speech signal from the same individual show large intra-speaker
variation, because no speaker is actually capable of producing two
identical speech utterances (Gfroerer S 2003).
 This method is obviously neither objective nor superior to aural-
perceptual methods; it is basically a shifting of subjective judgement to
the visual domain.
 The objectivity, reliability and validity of the method have been the
subject of controversy.
 The method was widely used in the US, parts of Europe and other
countries until the 1980s, but it has since been losing ground.
 Although the FBI uses it for investigative purposes, most U.S. courts do
not accept voiceprint evidence.
 Today voiceprint identification is not used in forensic labs in Europe, but
it is still practised in developing countries such as China and Vietnam.
 This approach differs greatly from the earlier
methods used for identification, as it is both
universal and automatic.
 It is considered universal because it does not focus
on specific acoustic parameters and treats speech
as a continuously varying complex wave or
signal.
 Its automatic nature reduces the subjective
evaluation of speech material to a minimum.
 Most automatic identification systems today
involve techniques such as the following:
 The Gaussian Mixture Model (GMM) is a
parametric probability density function
represented as a weighted sum of Gaussian
component densities.
 It is used as a parametric model of the probability
distribution of measured features in biometric
systems.
 The GMM is used as a classifier to compare the
MFCC features extracted from the speech with
the stored templates.
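 The "weighted sum of Gaussian component densities" can be written out directly. The sketch below assumes diagonal covariances and hand-set model parameters (in a real system they would be estimated from training speech); all names and numbers are hypothetical.

```python
import math

def gaussian_density_diag(x, mean, variance):
    """Density of a Gaussian with diagonal covariance at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, variance):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def gmm_density(x, weights, means, variances):
    """Weighted sum of the component densities (weights sum to 1)."""
    return sum(w * gaussian_density_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))

def average_log_likelihood(features, gmm):
    """Mean log-likelihood of a set of feature vectors under one model."""
    weights, means, variances = gmm
    return sum(math.log(gmm_density(x, weights, means, variances))
               for x in features) / len(features)

# Two hypothetical speaker models over 1-dimensional features
speaker_a = ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]])
speaker_b = ([1.0], [[5.0]], [[1.0]])
unknown = [[0.1], [-0.2], [0.8]]

scores = {"A": average_log_likelihood(unknown, speaker_a),
          "B": average_log_likelihood(unknown, speaker_b)}
print(max(scores, key=scores.get))  # model that best explains the features
```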
 The long-term speech spectrum is an important cue for
determining voice quality. In this technique, a large
number of feature vectors is collected for each known
speaker.
 The average and variance of each component of the feature
vector are calculated, and the vector of mean values and the
vector of variances are used to model each speaker.
 A similar model is made for the unknown speaker.
 This technique is most useful for text-independent
recognition, where a large amount of data is required to
construct the speaker's model.
 This method is not beneficial if the utterances are too
short and contain an insufficient amount of data.
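 The long-term averaging model can be sketched as a mean vector and a variance vector per speaker. How two such models are compared is not specified in the source; the variance-normalised squared difference of the means used below is an assumption made for the example, as are the function names and data.

```python
def long_term_model(vectors):
    """Return (mean vector, variance vector) over all feature vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / count for d in range(dim)]
    variance = [sum((v[d] - mean[d]) ** 2 for v in vectors) / count
                for d in range(dim)]
    return mean, variance

def model_distance(model_a, model_b):
    """Assumed comparison: squared difference of the means, scaled by
    the average variance of each component (smaller is more similar)."""
    (mean_a, var_a), (mean_b, var_b) = model_a, model_b
    return sum((ma - mb) ** 2 / ((va + vb) / 2.0 + 1e-12)
               for ma, mb, va, vb in zip(mean_a, mean_b, var_a, var_b))

# Hypothetical 2-dimensional feature vectors
known_a = long_term_model([[1.0, 2.0], [3.0, 4.0]])
known_b = long_term_model([[10.0, 10.0], [12.0, 12.0]])
unknown = long_term_model([[2.0, 2.0], [2.0, 4.0]])
print(model_distance(unknown, known_a), model_distance(unknown, known_b))
```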
 The major disadvantage of long-term
averaging is that each speaker’s model consists
of a single cluster of data represented by an
average and variance vector.
 If the data contain multiple clusters of vectors,
the variance will be very high. Since human
speech is composed primarily of vowels, it is
natural to expect feature vectors to form
clusters, each one based on the pronunciation
of a specific vowel.
 In this technique, each speaker's model consists of several
clusters of data, along with their centroids.
 Vector quantization (VQ) reduces these sets of vectors to a
codebook, which provides an efficient way of building and
comparing models of speakers. VQ is used in several ways in
speaker recognition.
 In some systems it is used simply to compress data. In other
systems, VQ is a preprocessing step for other methods such
as hidden Markov models (HMMs).
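 Building a codebook of centroids and scoring new speech against it can be sketched as below. The source does not say how the codebook is trained; a simple k-means style procedure with a deterministic start is assumed here, and all names and data are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vector, codebook):
    """Index of the closest codeword."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

def train_codebook(vectors, size, iterations=10):
    """k-means style training (assumed method); needs len(vectors) >= size.
    Starts from evenly spaced training vectors."""
    step = max(1, len(vectors) // size)
    codebook = [list(vectors[i * step]) for i in range(size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[i] = [sum(c) / len(members) for c in zip(*members)]
    return codebook

def average_distortion(vectors, codebook):
    """Mean squared distance to the nearest codeword (lower = better match)."""
    return sum(squared_distance(v, codebook[nearest(v, codebook)])
               for v in vectors) / len(vectors)

training = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook = train_codebook(training, size=2)
print(codebook)
print(average_distortion(training, codebook))
print(average_distortion([[5.0, 5.0], [6.0, 6.0]], codebook))
```

 Speech from the speaker who trained the codebook should give a low average distortion, while speech from other speakers should give a higher one.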
 For text-dependent identification and verification several
codebooks are created or “trained” for each speaker, who
speaks a prescribed text several times.
 These codebooks are considered as the speaker’s template.
During the operational phase the same prescribed text is
spoken by the unknown person.
 The comparison is done on the basis of
observed differences or similarities between the
unknown person’s template, and each trained
template, after removing the variations in the
speaking rate.
 For text-independent speaker recognition a
single codebook is created for each speaker.
 The codebook is considered an accurate model
of the speaker because it is formed from a much
larger amount of speech than in the text-dependent
case.
 This method introduces a new factor affecting the
performance of the system, namely codebook
size. Larger codebooks do a better job of
characterizing a speaker's voice, but this results
in increased computational expense and the
danger of not producing results in real time, which
is a significant factor for verification.
 The advantage of this method is that it requires
only a small amount of data to create a speaker’s
model without causing any loss to the accuracy.
 Tendering tape-recorded conversations before law
courts as evidence has become almost common practice,
particularly in cases arising under the Prevention of
Corruption Act, where the conversation is recorded
by sending the complainant with a recording device to
the person demanding or offering a bribe.
 In civil cases also parties may rely upon tape records
of relevant conversation to support their version.
 In such cases the court has to face various questions
regarding admissibility, nature and evidentiary value
of such a tape- recorded conversation.
 The Indian Evidence Act, prior to its being
amended by the Information Technology Act,
2000, mainly dealt with evidence, which was in
oral or documentary form.
 Nothing was there to point out about the
admissibility, nature and evidentiary value of a
conversation or statement recorded in an electro-
magnetic device.
 Confronted with questions of this nature and
called upon to decide them, the law courts
in India as well as in England devised and
developed principles so that such evidence could
be received in law courts and acted upon (Adv KC
Suresh 2011).
 In India, voice identification is regularly carried
out at the Chandigarh forensic science
laboratories, and the Supreme Court has held that
voice identification data is admissible in court.
 In India at Bangalore, SRC Institute of Speech and
Hearing has the facility for voice analysis.
 The All India Institute of Speech and Hearing,
Mysore, which has been working in the field for
many years now, even wants to start a one-year
PG Diploma course in forensic voice analysis.
 The Michigan state police set up a voice
identification unit in 1966. Sound spectrograph
evidence was first admitted into a court in 1967
during a military trial (court-martial), United
States v. Wright.
 Judge Ferguson wrote a lengthy dissent, saying
that voice identification by sound spectrograph
did not meet the Frye standard of general
acceptance by the scientific community.(Lisa
Yount 2007)
 The first reported application of the voiceprint
technique in a criminal proceeding occurred in the
1966 case of People v. Straehle.
 The defendant, a police officer, had telephoned
the operator of an illicit gambling enterprise to
warn him of an impending police raid.
 Later, during a grand jury inquiry, the police
officer denied making the call.
 At the ensuing perjury trial, the prosecution
introduced voiceprints of the telephone calls
and sample voiceprints of the defendant's
voice, supported by the expert opinion of
Lawrence Kersta that all recordings were of the
defendant's voice.( John F Decker, et al., 1977)
 In 1976 the New York Supreme Court pointed out, in
the case of People v. Rogers, that fifty different trial
courts had admitted spectrographic voice identification
evidence, as had fourteen out of fifteen U. S. District
Court judges, and only two out of thirty- seven states
considering the issue had rejected admission.
 The Rogers court stated that this technique, when
accompanied by aural examination and conducted by a
qualified examiner, had now reached the level of
general scientific acceptance by those who would be
expected to be familiar with its use, and as such, has
reached the level of scientific acceptance and reliability
necessary for admission. (Adv KC Suresh 2011).
 The lead story from Washington Post this
morning is regarding a recording that was thought
to be Donald Trump.
 Trump denied the recording was his voice.
 Primeau Forensics was asked by the media to
perform a forensic voice identification test to
determine if the unknown voice in the Washington
Post story is the voice of Donald Trump.
 Primeau Forensics located a C-Span
interview from 1991 titled ‘Donald Trump on
Economic Recovery’.
 We chose this recording as the ‘known’ Donald
Trump voice for forensic comparison.
 We chose this older voice sample because it
was closer in time to the ‘unknown’ recording.
 The biometric software program that we used
is a Speech Pro Product titled ‘SIS 2’.
 We formatted each speech sample based on
training received from Owen Forensic Services
and loaded them into the biometric software.
 The result was a 98% mismatch meaning the
‘unknown’ voice recording that surfaced in the
Washington Post today is NOT the voice of
Donald Trump.
 As Cain explained in an article he wrote for the
Criminal Division of the U.S. Department of Justice —
in collaboration with Lonnie Smrkovski, chief of the
voiceprint unit of the Michigan State Police and Mindy
Wilson, a psychologist and private examiner practicing
in Lansing, Michigan — the fundamental principle of
voice identification rests on the fact that like a
fingerprint, every voice is unique and "individually
characteristic enough to distinguish it from others
through...analysis”.
 Fingerprints are identified through literal analysis;
voices are identified through comparative voiceprints.
 Cain points out that uniqueness in human speech is the
product of two general factors.
 "The first," he says, "lies in the sizes of the vocal cavities such as the
throat, nasal and oral cavities and the shape, length and tension in an
individual's vocal cords located in the larynx. The vocal cavities are
resonators, much like organ pipes, which reinforce some of the overtones
produced by the vocal cords, which produce formants or voiceprint bars.
The likelihood that two people would have exactly the same size and
configuration (is) very remote."
 The second factor in determining voice uniqueness is the manner in
which the "articulators" or muscles of speech are manipulated when an
individual is talking.
 The articulators include the lips, teeth, tongue, soft palate and jaw
muscles, "whose controlled interplay", Cain explains, "produces
intelligible speech...The likelihood that two persons could develop
identical use patterns of their articulators also appears to be very remote."
 While Cain agrees that "there is disagreement
in the so-called 'scientific community' on the
degree of accuracy with which examiners can
identify speakers under all conditions, there is
agreement that voices can, in fact, be
identified."
 GMM
 To obtain the results, the speech signal is
recorded. The system is trained on multiple
words such as Samosa, Dosa, Tea etc.
 The results for the word Samosa are shown.
 The speech signal which is recorded for the
word Samosa
 Short duration samples are more demanding and should be carefully
analysed.
 Dissimilarity in the language of questioned and specimen voice samples.
 Emotion variability in questioned and specimen sample.
 Misspoken or misread prompted phrases.
 Poorly recorded/noisy samples are difficult to analyse.
 Insufficient number of comparable words.
 Disguise in speech samples poses a problem in speaker identification.
 Extreme emotional state.
 Change in physical state of speaker (e.g. effect of alcohol).
 The speaker's attitude while speaking.
 Channel mismatch or mismatch in recording condition.
 Different pronunciation speed of the test data compared with the training
data.
 Speaker’s health.
 Aging (the vocal tract can drift away from models with age).
 Thus the system is able to recognize multiple words such as Samosa,
Dosa and Tea and convert them into text.
 The system is suited to environments with little ambient noise.
 The system provides good performance with respect to other systems.
 It can be concluded that GMM provides more accuracy.
 In view of the above discussion, it can be inferred that the comparison of
voice samples is quite complicated but entirely possible.
 The skill of the examiner, along with the chosen parameters and the
selection of an appropriate identification technique, is largely decisive and
can facilitate accurate and conclusive results.
 There have been many advances and successes in this field; however,
much remains to be done to overcome the daunting limitations which
still prevail and constrain the process.
 If we successfully overcome all such limitations, this technique with its
promising features will have an obvious advantage over the pre-existing
ones for establishing individual identity.
1. C. Champod and D. Meuwly, The inference of identity in forensic
speaker recognition, Speech Communication, vol. 31, pp. 193-203,
2000.
2. Reynolds, D.A., Rose, R.C.: Robust Text-Independent Speaker
Identification using Gaussian Mixture Speaker Models. IEEE
Transactions on Speech and Audio Processing 3(1) (1995)
72–83
3. Zetterholm E (2007) Detection of speaker characteristics using voice
imitation. Springer Berlin Heidelberg 4441: 192-205.
4. Braun A, Kunzel HJ (1998) Is forensic speaker identification
unethical - or can it be unethical not to do it? Forensic Linguistics 5:
10-21.
5. Kent RD, Read C (2001) The acoustic analysis of speech. University
of Wisconsin-Madison, A.I.T.B.S Publishers and Distributors, Delhi.
6. Samudravijaya K (2003) Speech and speaker recognition: a tutorial.
Tata Institute of Fundamental Research, Mumbai.
7. YA (2000) A research paper in forensic science. The University of Auckland,
New Zealand.
8. Gfroerer S (2003) Auditory-instrumental forensic speaker recognition.
Eurospeech, Geneva.
9. Harmegnies B, Landercy A (1988) Intra-speaker variability of the long
term speech pattern. Speech communication 7: 81-86.
10. Kekre HB, Sarode TK (2008) Speech data compression using vector
quantization. International journal of computer and information science
and engineering 2:8.
11. Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time
sequential images using hidden markov model. IEEE: 379-385.
12. Abdulla WH, Kasabov NK (1999) The concepts of hidden markov model
in speech recognition. Information Science Discussion Papers 99/09,
university of Otago, New Zealand: 1-40.
13. Bennani Y, Gallinari P (1995) Neural networks for discrimination and
modelization of speakers. Speech communication 17: 159-175.
14. Nakasone H, Beck SD (2001) Forensic automatic speaker identification.
paper presented at- a speaker odyssey, Crete, Greece.
15. https://www.deepdyve.com/lp/elsevier/interpol-survey-of-the-use-of-speaker-identification-by-law-vewW0By1LL?viewMode=multi#bsSignUpModal
16. https://indiankanoon.org/
voice recognition

More Related Content

What's hot

EAR PRINT.pptx
EAR PRINT.pptxEAR PRINT.pptx
EAR PRINT.pptx
MATANGI LAD
 
Latent fingerprint development
Latent fingerprint developmentLatent fingerprint development
Latent fingerprint development
Tejasvi Bhatia
 
ear print.pptx
ear print.pptxear print.pptx
ear print.pptx
MATANGI LAD
 
Development of Latent Fingerprints
Development of Latent FingerprintsDevelopment of Latent Fingerprints
Development of Latent Fingerprints
Hamza Mohammad
 
Fingerprint - Everything You Need To Know About Fingerprints
Fingerprint - Everything You Need To Know About FingerprintsFingerprint - Everything You Need To Know About Fingerprints
Fingerprint - Everything You Need To Know About Fingerprints
SwaroopSonone
 
TOOL MARKS
TOOL MARKSTOOL MARKS
VSC ppt forensic science Shailesh Chaubey .pptx
VSC ppt  forensic science Shailesh Chaubey .pptxVSC ppt  forensic science Shailesh Chaubey .pptx
VSC ppt forensic science Shailesh Chaubey .pptx
SHAILESH CHAUBEY
 
Forensic analysis of foot wear impression
Forensic analysis of foot wear impressionForensic analysis of foot wear impression
Forensic analysis of foot wear impression
SURYAKANT MISHRA
 
Touch dna
Touch dnaTouch dna
Touch dna
ATHULKSABU
 
Crime scene management
Crime scene managementCrime scene management
Crime scene management
Hafeez Bhutta
 
Bloodstain pattern analysis
Bloodstain pattern analysisBloodstain pattern analysis
Bloodstain pattern analysis
chaitra pradeep
 
Types of Crime Scenes
Types of Crime ScenesTypes of Crime Scenes
Types of Crime Scenes
Don Caeiro
 
Individual Characteristics.pptx
Individual Characteristics.pptxIndividual Characteristics.pptx
Individual Characteristics.pptx
Soham Bhattacharya
 
Gait forensic science
Gait forensic scienceGait forensic science
Gait forensic science
Nupur Walia
 
ESDA
ESDAESDA
Forensic Question Document Examination,
Forensic Question Document Examination,Forensic Question Document Examination,
Forensic Question Document Examination,
Ishan Tiwari
 
Small Particle Reagent Technique of Fingerprint Development
Small Particle Reagent Technique of Fingerprint DevelopmentSmall Particle Reagent Technique of Fingerprint Development
Small Particle Reagent Technique of Fingerprint Development
RitujaGharote
 
Pattern recognition palm print authentication system
Pattern recognition palm print authentication systemPattern recognition palm print authentication system
Pattern recognition palm print authentication system
Mazin Alwaaly
 
Improvised Firearms
Improvised FirearmsImprovised Firearms
Improvised Firearms
Ketan Patil
 
Class Characteristics of Handwriting.pptx
Class Characteristics of Handwriting.pptxClass Characteristics of Handwriting.pptx
Class Characteristics of Handwriting.pptx
Soham Bhattacharya
 

What's hot (20)

EAR PRINT.pptx
EAR PRINT.pptxEAR PRINT.pptx
EAR PRINT.pptx
 
Latent fingerprint development
Latent fingerprint developmentLatent fingerprint development
Latent fingerprint development
 
ear print.pptx
ear print.pptxear print.pptx
ear print.pptx
 
Development of Latent Fingerprints
Development of Latent FingerprintsDevelopment of Latent Fingerprints
Development of Latent Fingerprints
 
Fingerprint - Everything You Need To Know About Fingerprints
Fingerprint - Everything You Need To Know About FingerprintsFingerprint - Everything You Need To Know About Fingerprints
Fingerprint - Everything You Need To Know About Fingerprints
 
TOOL MARKS
TOOL MARKSTOOL MARKS
TOOL MARKS
 
VSC ppt forensic science Shailesh Chaubey .pptx
VSC ppt  forensic science Shailesh Chaubey .pptxVSC ppt  forensic science Shailesh Chaubey .pptx
VSC ppt forensic science Shailesh Chaubey .pptx
 
Forensic analysis of foot wear impression
Forensic analysis of foot wear impressionForensic analysis of foot wear impression
Forensic analysis of foot wear impression
 
Touch dna
Touch dnaTouch dna
Touch dna
 
Crime scene management
Crime scene managementCrime scene management
Crime scene management
 
Bloodstain pattern analysis
Bloodstain pattern analysisBloodstain pattern analysis
Bloodstain pattern analysis
 
Types of Crime Scenes
Types of Crime ScenesTypes of Crime Scenes
Types of Crime Scenes
 
Individual Characteristics.pptx
Individual Characteristics.pptxIndividual Characteristics.pptx
Individual Characteristics.pptx
 
Gait forensic science
Gait forensic scienceGait forensic science
Gait forensic science
 
ESDA
ESDAESDA
ESDA
 
Forensic Question Document Examination,
Forensic Question Document Examination,Forensic Question Document Examination,
Forensic Question Document Examination,
 
Small Particle Reagent Technique of Fingerprint Development
Small Particle Reagent Technique of Fingerprint DevelopmentSmall Particle Reagent Technique of Fingerprint Development
Small Particle Reagent Technique of Fingerprint Development
 
Pattern recognition palm print authentication system
Pattern recognition palm print authentication systemPattern recognition palm print authentication system
Pattern recognition palm print authentication system
 
Improvised Firearms
Improvised FirearmsImprovised Firearms
Improvised Firearms
 
Class Characteristics of Handwriting.pptx
Class Characteristics of Handwriting.pptxClass Characteristics of Handwriting.pptx
Class Characteristics of Handwriting.pptx
 

voice recognition

  • 1. DR. HARISINGH GOUR VISHWAVIDYALAYA. TOPIC: VOICE ANALYSIS. SUBMITTED TO: Dr. NAVJOT KAUR KANWAL (DEPARTMENT OF CRIMINOLOGY AND FORENSIC SCIENCE). SUBMITTED BY: NIKHIL KUMAR SINGH, REGISTRATION NO. Y19242514
  • 2. A voice is more than just a string of sounds. Voices are inherently complex. They signal a great deal of information in addition to the intended linguistic message: the speaker’s sex, for example, or their emotional state or state of health. Some of this information is clearly of potential forensic importance. However, the different types of information conveyed by a voice are not signalled in separate channels, but are convolved together with the linguistic message. Knowledge of how this occurs is necessary to interpret the ubiquitous variation in speech, and to assess the comparability of speech samples.
  • 3. Speaker identification is the process of determining whether two or more recordings of speech are from the same speaker. Speaker identification can be very effective, contributing to both the conviction and the elimination of suspects. In this task, a voice print of an unknown speaker is analysed and then compared with speech samples of known speakers. The unknown speaker is identified as the speaker whose model best matches the input model; it is the identification of a person from characteristics of their voice.
  • 4. It is the process of automatically recognising who is speaking by using the speaker-specific information included in the speech waves to verify identities being claimed by people accessing systems; that is, it enables access control of various services by voice. Applicable services include voice dialling, banking over a telephone network, telephone shopping, database access, information and reservation services, voice mail, security control for confidential information and remote access to computers. Another important application of speaker recognition technology is as a forensic tool.
  • 5. Speaker identification in the forensic context is usually about comparing voices. Probably the most common task involves the comparison of one or more samples of an offender’s voice with one or more samples of a suspect’s voice. Voices are important things for humans. They are the medium through which we do a lot of communicating with the outside world: our ideas, of course, but also our emotions and our personality.
  • 6. Voices are also one of the media through which we (successfully, most of the time) recognise other humans who are important to us: members of our family, media personalities, our friends and enemies. Although evidence from DNA analysis is potentially vastly more eloquent in its power than evidence from voices, DNA can’t talk. It can’t be recorded planning, carrying out or confessing to a crime. It can’t be so apparently directly incriminating. Perhaps it is these features that contribute to the interest and importance of forensic speaker identification (FSI).
  • 7. Voices are extremely complex things, and some of the inherent limitations of the forensic-phonetic method are in part a consequence of the interaction between their complexity and the real world in which they are used. It is one of the aims of this paper to explain how this comes about.
  • 8. The basic ideas we will focus on here are: what speech sounds are like, what a voice is, forensic speaker identification, voice comparison, and forensic-phonetic speaker identification.
  • 9. The most common task in forensic speaker identification involves the comparison of one or more samples of an unknown voice (sometimes called the questioned sample) with one or more samples of a known voice. Often the unknown voice is that of the individual alleged to have committed an offence (hereafter called the offender) and the known voice belongs to the suspect.
  • 10. Both prosecution and defence are then concerned with being able to say whether the two samples have come from the same person, and thus being able either to identify the suspect as the offender or to eliminate them from suspicion. Sometimes it is important to be able to attach a voice to an individual, or not, irrespective of questions of guilt.
  • 11. In order to tell whether the same voice is present in two or more speech samples, it must be possible to tell the difference between, or discriminate between, voices. Put more accurately, it must be possible to discriminate between samples from the voice of the same speaker and samples from the voices of different speakers. So identification in this sense is the secondary result of a process of discrimination.
  • 12. The suspect may be identified as the offender to the extent that the evidence supports the hypothesis that the questioned and suspect samples are from the same voice. If not, no identification results. In this regard, therefore, the identification in forensic speaker identification is somewhat imprecise.
  • 13. In criminalistics, the identification process seeks individualisation. Identifying a person or an object means that it is possible to distinguish this person or object from all others on the surface of the Earth. The forensic individualisation process can be seen as a reduction process beginning from an initial population and ending at a single person.
  • 14. Recently, an investigation concerning the inference of identity in forensic speaker recognition has shown the inadequacy of the main solutions proposed to assess the evidence in this field. The concept of identity underlying the verification and the identification tasks does not correspond to the concept of identity accepted in forensic science (Champod et al., 2000).
  • 15. Speaker verification is the other common task in speaker recognition. This is where ‘an identity claim from an individual is accepted or rejected by comparing a sample of his speech against a stored reference sample by the individual whose identity he is claiming’.
  • 16. The aim of speaker identification is, not surprisingly, identification: ‘to identify an unknown voice as one or none of a set of known voices’. One has a speech sample from an unknown speaker, and a set of speech samples from different speakers the identity of whom is known. The task is to compare the sample from the unknown speaker with the known set of samples, and determine whether it was produced by any of the known speakers.
  • 17.
  • 18. In speaker identification, the reference set of known speakers can be of two types: closed or open. This distinction refers to whether the set is known to contain a sample of the unknown voice or not. A closed reference set means that it is known that the owner of the unknown voice is one of the known speakers. An open set means that it is not known whether the owner of the unknown voice is present in the reference set or not.
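The closed/open distinction can be illustrated with a minimal sketch. It assumes that some comparison method has already produced one similarity score per known speaker (higher meaning more similar); the function name `identify`, the speaker labels, the scores and the threshold value are all hypothetical and not taken from the slides.

```python
def identify(scores, threshold=None):
    """Pick the best-matching known speaker from a dict of similarity scores.

    threshold=None models a closed set: the unknown speaker is assumed to be
    one of the known speakers, so the best match is always returned.
    A numeric threshold models an open set: if even the best match scores
    below it, the answer is "none of the known speakers" (None).
    """
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None
    return best


# Made-up similarity scores for three known speakers.
scores = {"speaker_a": 0.42, "speaker_b": 0.61, "speaker_c": 0.35}

closed_result = identify(scores)               # always names someone
open_result = identify(scores, threshold=0.8)  # may reject everyone
```

With these made-up numbers the closed-set call names `speaker_b`, while the open-set call returns `None` because no score reaches the threshold. Choosing that threshold is the hard part in practice and is not addressed here.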
  • 19. MFCC (Mel-Frequency Cepstral Coefficients). The easiest and most prevalent method of extracting spectral features is to calculate the Mel-Frequency Cepstral Coefficients (MFCC) from the human voice. It is one of the most popular methods of feature extraction used in speech recognition systems. It works in the frequency domain using the Mel scale, which is based on the scale of the human ear.
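As a small illustration of the Mel scale, the sketch below converts between hertz and mels. The slides do not say which Mel formula is meant; the version used here, mel = 2595 · log10(1 + f / 700), is one widely used convention and is an assumption, as are the function names.

```python
import math


def hz_to_mel(f_hz):
    # One common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in mels correspond to ever larger steps in hertz,
# mirroring the ear's coarser resolution at high frequencies.
low_step = mel_to_hz(600.0) - mel_to_hz(500.0)
high_step = mel_to_hz(2600.0) - mel_to_hz(2500.0)
```

Under this formula 1000 Hz maps to roughly 1000 mel, and `high_step` is several times larger than `low_step`.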
  • 20. Time domain features are less accurate than frequency domain features. The main aim of feature extraction is to reduce the size of the speech signal before the signal is recognised. The steps involved in feature extraction are pre-emphasis, framing, windowing, fast Fourier transform, Mel-frequency filtering, logarithmic function and discrete cosine transform (Douglas A, et al., 1995).
  • 21.
  • 22. The first step in MFCC is pre-emphasis, which is used to boost the high frequencies of a speech signal that are attenuated during speech production. Pre-emphasis is needed because the high frequency components of the speech signal have small amplitude with respect to the low frequency components. Therefore higher frequencies are artificially boosted in order to increase the signal-to-noise ratio. Next is framing, which groups the samples obtained by analog-to-digital conversion (ADC) of the speech signal into frames.
  • 23. The number of samples in each frame is chosen as 256 and the number of samples overlapping between adjacent frames is 128. Overlapping frames are used to capture the information at the boundaries of the frames. Discontinuities at the start and the end of a frame cause undesirable effects in the frequency response, so windowing is used to smooth out the discontinuities at the edges.
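The first three steps described above (pre-emphasis, framing with 256-sample frames overlapping by 128 samples, and windowing) could be sketched as follows. The pre-emphasis coefficient 0.97 and the Hamming window are conventional choices assumed here, since the slides name neither; the function names are hypothetical, and the later steps (FFT, Mel filtering, logarithm, DCT) are not shown.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, conventional value.
    # Expects a non-empty list of samples.
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_signal(signal, frame_len=256, overlap=128):
    # Consecutive frames start (frame_len - overlap) samples apart.
    hop = frame_len - overlap
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n_points):
    # Hamming window (an assumed choice): tapers both frame edges towards 0.08.
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def windowed_frames(signal, frame_len=256, overlap=128):
    window = hamming(frame_len)
    emphasised = pre_emphasis(signal)
    return [[x * w for x, w in zip(frame, window)]
            for frame in frame_signal(emphasised, frame_len, overlap)]


# Example: 1024 samples of a constant signal give 7 overlapping frames.
frames = windowed_frames([1.0] * 1024)
```

Each windowed frame would then be passed to the Fourier transform stage.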
  • 24. In the discipline of speaker recognition, a wide range of methods and procedures are adopted by experts for identification.
  • 25. This type of analysis involves a group of trained phoneticians giving their judgement on the similarity and dissimilarity between two speech events, after hearing the samples again and again to find similarities in their linguistic, phonetic and acoustic features. Human listeners are robust speaker recognisers when presented with degraded speech. Listener performance is comparatively free from limitations such as the signal-to-noise ratio, the speech bandwidth, the amount of speech material, and distortions occurring in the speech signal as a result of speech coding, transmission systems, etc.
  • 26. In this technique, the utterances in a recorded conversation are separated by speaker through repeated listening. The separated utterances of each speaker are then heard repeatedly to identify linguistic and phonetic features such as articulation rate, flow of speech, degree of vowel and consonant formation, rhythm, striking time, pauses etc. There are cues in voice and speech behaviour which are individual and thus make it possible to recognise familiar voices.
  • 27. This involves the semi-automatic measurement of particular acoustic speech parameters, such as vowel formants and articulation rate, which is sometimes combined with the results of auditory phonetic analysis by a human expert. In 1941, an electromechanical acoustic spectrograph was developed by Dr. Ralph Potter of Bell Telephone Laboratories, with the idea of converting sounds into pictures (Kent RD, Read C 2001).
  • 28. A sound spectrograph is an instrument that gives a permanent record of the changing energy-frequency distribution throughout the time of a speech wave. Spectrograms are graphic displays of amplitude as a function of both frequency and time.
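To make "amplitude as a function of both frequency and time" concrete, here is a minimal sketch that computes a spectrogram digitally with a short-time discrete Fourier transform. It illustrates the idea only and does not model the historical instrument; the frame length, hop size, Hann window, sample rate and test tone are all assumed values, and the plain DFT loop is chosen for clarity rather than speed.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per time frame."""
    # Hann window (an assumed choice) to reduce leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Plain DFT of the windowed frame at frequency bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# A 1000 Hz test tone sampled at 8000 Hz (both values are made up).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = spectrogram(tone)

# Strongest frequency bin in the first frame; each bin spans sample_rate / 64 Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time slice and each column one frequency band, which is the grid that a spectrogram image displays.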
  • 29. Examiners visually inspect and compare similarities or differences in the patterns of energy distribution in the spectrograms. It is generally believed that formant structures and other spectral characteristics which are evident from a spectrogram are unique for each individual. The most widely used features are fundamental frequencies, formant bandwidths, formant frequencies, spectral composition of fricatives and plosives for individual segments, and transitions.
  • 30. However, the main drawback of this voiceprint analysis is that spectrograms of the speech signal from the same individual will show large intra-speaker variations, because no speaker is actually capable of producing two identical speech utterances (Gfroerer S 2003). This method is obviously neither objective nor superior to aural-perceptual methods; it is basically a shifting of subjective judgement to the visual domain. The objectivity, reliability and validity of the method have been the subject of controversy. The method was widely used in the US, parts of Europe and other countries until the 1980s, but it has since been losing ground. The FBI uses it for investigative purposes, but most U.S. courts do not accept voiceprint evidence. Today voiceprint identification is not used in forensic labs in Europe, but it is still practised in developing countries such as China and Vietnam.
  • 31. This approach differs greatly from the earlier methods used for identification, as it is both universal and automatic. It is considered universal because it does not focus on specific acoustic parameters and considers speech as a continuously varying complex wave or signal. Its automatic nature reduces the subjective evaluation of any speech material to a minimum. Most such automatic identification systems today involve techniques such as the following:
  • 32. The Gaussian Mixture Model (GMM) is a parametric probability density function which is represented as a weighted sum of Gaussian component densities. It is used as a parametric model of the probability distribution of measured features in biometric systems. The GMM is used as a classifier to compare the MFCC features extracted from the speech with the stored templates.
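The "weighted sum of Gaussian component densities" can be sketched as below. This assumes diagonal covariances (one variance per feature dimension) and strictly positive weights, neither of which the slides specify; training the mixture is not shown, and every name and number in the example is hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    # Log density of a Gaussian with diagonal covariance (an assumed simplification).
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    # log( sum_k w_k * N(x | mean_k, var_k) ), computed with the log-sum-exp
    # trick so that very small densities do not underflow to zero.
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def score_utterance(frames, gmm):
    # Average per-frame log-likelihood of an utterance under one speaker model.
    return sum(gmm_log_likelihood(f, *gmm) for f in frames) / len(frames)


# Two toy speaker models over 2-D features: (weights, means, variances).
model_a = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
model_b = ([0.5, 0.5], [[3.0, 3.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])

# Made-up feature vectors standing in for MFCC frames of the unknown voice.
frames = [[0.1, -0.2], [0.3, 0.1], [0.8, 1.1]]
best = max([("a", model_a), ("b", model_b)],
           key=lambda item: score_utterance(frames, item[1]))[0]
```

Here the unknown frames lie near the components of `model_a`, so it receives the higher average log-likelihood and is chosen as the better match.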
  • 33.
  • 34. The long-term speech spectrum is used as an important cue for determining voice quality. In this technique, a large number of feature vectors is collected for each known speaker. The average and variance of each component of the feature vector are calculated, and the vector of mean values and the vector of variances are used to model each speaker. A similar model is made for the unknown speaker. This technique is most useful for text-independent recognition, where a large amount of data is required to construct the speaker’s model. This method will not be beneficial if the utterances are too short and contain an insufficient amount of data.
  • 35. The major disadvantage of long-term averaging is that each speaker’s model consists of a single cluster of data represented by an average and variance vector. If the data contain multiple clusters of vectors, the variance will be very high. Since human speech is composed primarily of vowels, it is natural to expect feature vectors to form clusters, each one based on the pronunciation of a specific vowel.
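A minimal sketch of long-term averaging follows. Each speaker is modelled by a mean vector and a variance vector, as described above; the variance-weighted distance used to compare two models is an assumed, illustrative choice, since the slides do not say how models are compared, and the function names are hypothetical.

```python
def long_term_model(vectors):
    """Return (mean vector, variance vector) over all feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dim)]
    return mean, var


def model_distance(known, unknown, eps=1e-9):
    # Squared difference of the means, scaled by the known speaker's variances.
    # This particular distance is an assumption made for illustration.
    (mean_k, var_k), (mean_u, _) = known, unknown
    return sum((a - b) ** 2 / (v + eps)
               for a, b, v in zip(mean_k, mean_u, var_k))


# The weakness noted above: two tight clusters (say, two different vowels)
# collapse into one mean that sits between them, with a large variance.
two_clusters = [[0.0], [10.0], [0.0], [10.0]]
mean, var = long_term_model(two_clusters)
```

For `two_clusters` the mean is 5.0 and the variance 25.0, even though no observed vector is anywhere near 5.0.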
  • 36.  In vector quantization (VQ), each speaker’s model consists of several clusters of data, represented by their centroids.  VQ reduces these sets of vectors to a codebook, which provides an efficient way of building and comparing models of speakers.  VQ is used in several ways in speaker recognition.  In some systems it is used simply to compress data.  In other systems, VQ is a preprocessing step for other methods such as HMMs.  For text-dependent identification and verification, several codebooks are created or “trained” for each speaker, who speaks a prescribed text several times.  These codebooks are considered the speaker’s template.  During the operational phase the same prescribed text is spoken by the unknown person.
  • 37.  The comparison is based on the observed differences or similarities between the unknown person’s template and each trained template, after removing variations in speaking rate.  For text-independent speaker recognition a single codebook is created for each speaker.
  • 38.  The codebook is considered an accurate model of the speaker because it is formed from a much larger amount of speech than in the text-dependent case.  This method introduces a new factor affecting the performance of the system: codebook size.  Larger codebooks do a better job of characterizing a speaker’s voice, but this results in increased computational expense and the danger of not producing results in real time, which is a significant factor for verification.  The advantage of this method is that it requires only a small amount of data to create a speaker’s model without any loss of accuracy.
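The codebook idea above can be sketched as follows. This is a minimal illustration under stated assumptions: the codebook is trained with plain k-means (one common choice, not necessarily the one the slides have in mind), the unknown speaker is matched to the codebook with the lowest average quantization distortion, and all function names, speaker labels and toy vectors are hypothetical.

```python
import numpy as np

def train_codebook(features, size, iterations=20, seed=0):
    """Reduce a speaker's feature vectors to `size` centroids using k-means."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # start from `size` distinct training vectors
    codebook = features[rng.choice(len(features), size=size, replace=False)].copy()
    for _ in range(iterations):
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(size):
            members = features[nearest == k]
            if len(members) > 0:            # keep the old centroid if a cluster empties
                codebook[k] = members.mean(axis=0)
    return codebook

def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codebook entry."""
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

def closest_codebook(features, codebooks):
    """Return the name of the codebook with the lowest average distortion."""
    scores = {name: average_distortion(features, cb) for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Toy training data for two hypothetical speakers (assumed values).
train_a = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
train_b = [[10.0, 10.0], [10.1, 10.0], [11.0, 11.0], [11.1, 11.0]]
codebooks = {
    "speaker_A": train_codebook(train_a, size=2),
    "speaker_B": train_codebook(train_b, size=2),
}
best, scores = closest_codebook([[0.05, 0.0], [1.0, 1.05]], codebooks)
print(best, scores)
```

The `size` argument is the codebook size discussed above: raising it lets the codebook follow the speaker's clusters more closely, at the cost of more distance computations per comparison.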
  • 39.  Tendering tape-recorded conversations as evidence before law courts has become almost common practice, particularly in cases arising under the Prevention of Corruption Act, where the conversation is recorded by sending the complainant with a recording device to the person demanding or offering a bribe.  In civil cases, too, parties may rely on tape recordings of relevant conversations to support their version.  In such cases the court has to face various questions regarding the admissibility, nature and evidentiary value of the tape-recorded conversation.
  • 40.  The Indian Evidence Act, before it was amended by the Information Technology Act, 2000, dealt mainly with evidence in oral or documentary form.  It said nothing about the admissibility, nature and evidentiary value of a conversation or statement recorded on an electromagnetic device.  Confronted with questions of this nature and called upon to decide them, the law courts in India as well as in England devised and developed principles so that such evidence could be received in law courts and acted upon. (Adv KC Suresh 2011)
  • 41.  In India, voice identification is regularly carried out at the Forensic Science Laboratory in Chandigarh, and the Supreme Court has held that voice identification data is admissible in court.  In Bangalore, the SRC Institute of Speech and Hearing has a facility for voice analysis.  The All India Institute of Speech and Hearing, Mysore, which has been working in the field for many years, wants to start a one-year PG Diploma course in forensic voice analysis.
  • 42.  The Michigan State Police set up a voice identification unit in 1966.  Sound spectrograph evidence was first admitted into a court in 1967 during a military trial (court-martial), United States v. Wright.  Judge Ferguson wrote a lengthy dissent, saying that voice identification by sound spectrograph did not meet the Frye standard of general acceptance by the scientific community. (Lisa Yount 2007)  The first reported application of the voiceprint technique in a criminal proceeding occurred in the 1966 case of People v. Straehle.
  • 43.  The defendant, a police officer, had telephoned the operator of an illicit gambling enterprise to warn him of an impending police raid.  Later, during a grand jury inquiry, the police officer denied making the call.  At the ensuing perjury trial, the prosecution introduced voiceprints of the telephone calls and sample voiceprints of the defendant's voice, supported by the expert opinion of Lawrence Kersta that all recordings were of the defendant's voice. (John F Decker et al. 1977)
  • 44.  In 1976 the New York Supreme Court pointed out, in the case of People v. Rogers, that fifty different trial courts had admitted spectrographic voice identification evidence, as had fourteen out of fifteen U.S. District Court judges, and that only two out of thirty-seven states considering the issue had rejected admission.  The Rogers court stated that this technique, when accompanied by aural examination and conducted by a qualified examiner, had reached the level of general scientific acceptance by those who would be expected to be familiar with its use, and as such had reached the level of scientific acceptance and reliability necessary for admission. (Adv KC Suresh 2011)
  • 45.  The lead story from the Washington Post this morning concerns a recording that was thought to be of Donald Trump.  Trump denied that the voice on the recording was his.  Primeau Forensics was asked by the media to perform a forensic voice identification test to determine whether the unknown voice in the Washington Post story is the voice of Donald Trump.  Primeau Forensics located a C-Span interview from 1991 titled ‘Donald Trump on Economic Recovery’.  We chose this recording as the ‘known’ Donald Trump voice for forensic comparison.
  • 46.  We chose this older voice sample because it was closer in time to the ‘unknown’ recording.  The biometric software program that we used is a Speech Pro Product titled ‘SIS 2’.  We formatted each speech sample based on training received from Owen Forensic Services and loaded them into the biometric software.  The result was a 98% mismatch meaning the ‘unknown’ voice recording that surfaced in the Washington Post today is NOT the voice of Donald Trump.
  • 47.  As Cain explained in an article he wrote for the Criminal Division of the U.S. Department of Justice (in collaboration with Lonnie Smrkovski, chief of the voiceprint unit of the Michigan State Police, and Mindy Wilson, a psychologist and private examiner practicing in Lansing, Michigan), the fundamental principle of voice identification rests on the fact that, like a fingerprint, every voice is unique and "individually characteristic enough to distinguish it from others through...analysis".  Fingerprints are identified through literal analysis; voices are identified through comparative voiceprints.  Cain points out that uniqueness in human speech is the product of two general factors.
  • 48.  "The first," he says, "lies in the sizes of the vocal cavities such as the throat, nasal and oral cavities and the shape, length and tension in an individual's vocal cords located in the larynx. The vocal cavities are resonators, much like organ pipes, which reinforce some of the overtones produced by the vocal cords, which produce formants or voiceprint bars.  The likelihood that two people would have exactly the same size and configuration (is) very remote."  The second factor in determining voice uniqueness is the manner in which the "articulators" or muscles of speech are manipulated when an individual is talking.  The articulators include the lips, teeth, tongue, soft palate and jaw muscles, "whose controlled interplay", Cain explains, "produces intelligible speech...The likelihood that two persons could develop identical use patterns of their articulators also appears to be very remote."
  • 49.  Cain agrees that while "there is disagreement in the so-called 'scientific community' on the degree of accuracy with which examiners can identify speakers under all conditions, there is agreement that voices can, in fact, be identified."
  • 50.  GMM  To obtain the results, the speech signal is recorded.  The system is trained on multiple words such as Samosa, Dosa and Tea.  The results for the word Samosa are shown.  Figure: the speech signal recorded for the word Samosa.
  • 55.  Short-duration samples are more demanding and should be carefully analysed.  Dissimilarity in the language of the questioned and specimen voice samples.  Emotional variability between the questioned and specimen samples.  Misspoken or misread prompted phrases.  Poorly recorded or noisy samples are difficult to analyse.  Insufficient number of comparable words.  Disguise in speech samples poses a problem in speaker identification.  Extreme emotional state.  Change in the physical state of the speaker (e.g. the effect of alcohol).  The attitude with which the speech is delivered by the speaker.  Channel mismatch or mismatch in recording conditions.  Different speaking rate in the test data compared with the training data.  Speaker’s health.  Aging (the vocal tract can drift away from models with age).
  • 56.  Thus the system is able to recognize multiple words such as Samosa, Dosa and Tea and convert them into text.  The system is suited to an environment with little ambient noise.  The system provides good performance compared with other systems.  It can be concluded that GMM provides higher accuracy.  In light of the above discussion, it can be inferred that the comparison of voice samples is quite complicated but entirely possible.  The skill of the examiner, along with the chosen parameters and the selection of an appropriate identification technique, is largely decisive and can facilitate accurate and conclusive results.  There have been many advances and successes in this field; however, much remains to be done to overcome the daunting limitations that still constrain the process.  If all such limitations are successfully overcome, this technique, with its promising features, will have an obvious advantage over the pre-existing ones for establishing individual identity.
  • 57. 1. Champod C, Meuwly D (2000) The inference of identity in forensic speaker recognition. Speech Communication 31: 193-203. 2. Reynolds DA, Rose RC (1995) Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 3(1): 72-83. 3. Zetterholm E (2007) Detection of speaker characteristics using voice imitation. Springer Berlin Heidelberg 4441: 192-205. 4. Braun A, Kunzel HJ (1998) Is forensic speaker identification unethical - or can it be unethical not to do it? Forensic Linguistics 5: 10-21. 5. Kent RD, Read C (2001) The acoustic analysis of speech. University of Wisconsin-Madison, A.I.T.B.S Publishers and Distributors, Delhi. 6. Samudravijaya K (2003) Speech and speaker recognition: a tutorial. Tata Institute of Fundamental Research, Mumbai.
  • 58. 7. YA (2000) A research paper in forensic science. The University of Auckland, New Zealand. 8. Gfroerer S (2003) Auditory-instrumental forensic speaker recognition. Eurospeech, Geneva. 9. Harmegnies B, Landercy A (1988) Intra-speaker variability of the long term speech pattern. Speech Communication 7: 81-86. 10. Kekre HB, Sarode TK (2008) Speech data compression using vector quantization. International Journal of Computer and Information Science and Engineering 2:8. 11. Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time sequential images using hidden Markov model. IEEE: 379-385. 12. Abdulla WH, Kasabov NK (1999) The concepts of hidden Markov model in speech recognition. Information Science Discussion Papers 99/09, University of Otago, New Zealand: 1-40. 13. Bennani Y, Gallinari P (1995) Neural networks for discrimination and modelization of speakers. Speech Communication 17: 159-175. 14. Nakasone H, Beck SD (2001) Forensic automatic speaker identification. Paper presented at A Speaker Odyssey, Crete, Greece.