This document discusses voice analysis and its applications in forensic science. It covers several key points:
1) Voices convey a great deal of information beyond language, such as the speaker's identity, emotions, and health. This makes voice analysis potentially useful for forensic purposes.
2) Common forensic tasks involving voice analysis include speaker identification, where an unknown voice is compared to voice samples from known individuals, and voice comparison to determine if two recordings come from the same speaker.
3) Automatic speaker recognition techniques use algorithms like Gaussian mixture models and mel-frequency cepstral coefficients to analyze and compare voice recordings without human interpretation. These techniques aim to provide more objective and universal analysis compared to older auditory-based methods.
2. A voice is more than just a string of sounds. Voices are
inherently complex.
They signal a great deal of information in addition to
the intended linguistic message: the speaker’s sex, for
example, or their emotional state or state of health.
Some of this information is clearly of potential forensic
importance.
However, the different types of information conveyed
by a voice are not signalled in separate channels, but
are convolved together with the linguistic message.
Knowledge of how this occurs is necessary to interpret
the ubiquitous variation in speech, and to assess the
comparability of speech samples.
3. Speaker identification is the process of
determining whether two or more recordings of
speech are from the same speaker.
Speaker identification can be very effective,
contributing both to the conviction and to the
elimination of suspects. In this task, a voice
print of an unknown speaker is analysed and
then compared with speech samples of known
speakers.
The unknown speaker is identified as the speaker
whose model best matches the input model; it is
the recognition of a person from the
characteristics of his or her voice.
4. It is the process of automatically recognising
who is speaking by using the speaker specific
information included in the speech waves to
verify identities claimed by people accessing
systems, i.e., it enables access control of
various services by voice.
Applicable services include voice dialling,
banking over a telephone network, telephone
shopping, database access network,
information and reservation services, voice
mail, security control for confidential
information and remote access to computers.
Another important application of speaker
recognition technology is as a forensic tool.
5. Speaker identification in the forensic context is
usually about comparing voices.
Probably the most common task involves the
comparison of one or more samples of an
offender’s voice with one or more samples of a
suspect’s voice.
Voices are important things for humans. They
are the medium through which we do a lot of
communicating with the outside world: our
ideas, of course, but also our emotions and our
personality.
6. Voices are also one of the media through which we
(successfully, most of the time) recognise other
humans who are important to us – members of our
family, media personalities, our friends and
enemies.
Although evidence from DNA analysis is
potentially vastly more eloquent in its power than
evidence from voices, DNA can’t talk.
It can’t be recorded planning, carrying out or
confessing to a crime. It can’t be so apparently
directly incriminating.
Perhaps it is these features that contribute to the
interest and importance of FSI.
7. Voices are extremely complex things, and some
of the inherent limitations of the forensic-
phonetic method are in part a consequence of
the interaction between their complexity and
the real world in which they are used.
It is one of the aims of this paper to explain
how this comes about.
8. The basic ideas we will focus on here are:
what speech sounds are like, what a voice is,
forensic speaker identification, voice
comparison, forensic-phonetic speaker
identification, etc.
9. The most common task in forensic speaker
identification involves the comparison of one
or more samples of an unknown voice
(sometimes called the questioned sample) with
one or more samples of a known voice.
Often the unknown voice is that of the
individual alleged to have committed an
offence (hereafter called the offender) and the
known voice belongs to the suspect.
10. Both prosecution and defence are then
concerned with being able to say whether the
two samples have come from the same person,
and thus being able either to identify the
suspect as the offender or to eliminate them
from suspicion.
Sometimes it is important to be able to attach a
voice to an individual, or not, irrespective of
questions of guilt.
11. In order to tell whether the same voice is
present in two or more speech samples, it must
be possible to tell the difference between, or
discriminate between voices.
Put more accurately, it must be possible to
discriminate between samples from the voice of
the same speaker and samples from the voices
of different speakers.
So identification in this sense is the secondary
result of a process of discrimination.
12. The suspect may be identified as the offender
to the extent that the evidence supports the
hypothesis that questioned and suspect
samples are from the same voice.
If not, no identification results.
In this regard, therefore, the identification in
forensic speaker identification is somewhat
imprecise.
13. In criminalistics, the identification process
seeks individualisation.
Identifying a person or an object means that it
is possible to distinguish this person or object
from all others on the surface of the Earth.
The forensic individualisation process can be
seen as a reduction process beginning from an
initial population to a single person.
14. Recently, an investigation concerning the
inference of identity in forensic speaker
recognition has shown the inadequacy of the
main solutions proposed to assess the evidence
in this field.
The concept of identity underlying the
verification and the identification tasks does
not correspond to the concept of identity
accepted in forensic science (Champod et
al., 2000).
15. Speaker verification is the other common task
in speaker recognition.
This is where ‘an identity claim from an
individual is accepted or rejected by comparing
a sample of his speech against a stored
reference sample by the individual whose
identity he is claiming’
16. The aim of speaker identification is, not
surprisingly, identification: ‘to identify an
unknown voice as one or none of a set of known
voices’.
One has a speech sample from an unknown
speaker, and a set of speech samples from different
speakers the identity of whom is known.
The task is to compare the sample from the
unknown speaker with the known set of samples,
and determine whether it was produced by any of
the known speakers.
18. In speaker identification, the reference set of
known speakers can be of two types: closed or
open.
This distinction refers to whether the set is known
to contain a sample of the unknown voice or not.
A closed reference set means that it is known that
the owner of the unknown voice is one of the
known speakers.
An open set means that it is not known whether the
owner of the unknown voice is present in the
reference set or not.
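The closed-set versus open-set distinction can be sketched in a few lines of Python. The speaker labels, scores, and threshold below are hypothetical illustrations, assuming that comparing the unknown sample against each known model yields one log-likelihood score per speaker:

```python
# Hypothetical log-likelihood scores of an unknown sample against each
# known speaker's model (higher = better match); values are illustrative.
scores = {"spk1": -48.2, "spk2": -61.7, "spk3": -59.9}

# Closed set: the unknown voice is known to belong to one of the
# reference speakers, so the best-scoring model wins outright.
closed_set_decision = max(scores, key=scores.get)

# Open set: the unknown voice may belong to none of them, so the best
# score must also clear an acceptance threshold (chosen here so the
# sample is rejected, for illustration).
THRESHOLD = -45.0
best = max(scores, key=scores.get)
open_set_decision = best if scores[best] > THRESHOLD else "none of the known speakers"

print(closed_set_decision)  # best match within the closed set
print(open_set_decision)    # rejected: no model clears the threshold
```

The open-set case is the forensically realistic one: the true offender may simply not be among the known speakers.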
19. MFCC (Mel-Frequency Cepstral Coefficients)
The easiest and most prevalent method of
extracting spectral features is calculating the
Mel-Frequency Cepstral Coefficients (MFCC) of
the human voice.
It is one of the most popular methods of feature
extraction used in speech recognition systems.
It operates in the frequency domain, using the
Mel scale, which models the frequency scale of
the human ear.
20. Time-domain features are less accurate than
frequency-domain features. The main aim of
feature extraction is to reduce the size of the
speech signal before recognition of the
signal.
The steps involved in feature extraction are pre-
emphasis, framing, windowing, Fast Fourier
Transform, Mel-frequency filtering, a logarithmic
function, and the Discrete Cosine Transform
(Douglas A. et al., 1995).
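The pipeline listed above can be sketched in NumPy. The frame length of 256 samples and the 128-sample overlap follow the values given on a later slide; the number of Mel filters (26) and cepstral coefficients (13) are common illustrative choices, not values from this text:

```python
import numpy as np

def mfcc_frames(signal, sr=16000, frame_len=256, hop=128,
                n_filters=26, n_ceps=13):
    """MFCC sketch: pre-emphasis, framing, windowing, FFT,
    Mel filtering, log, and DCT."""
    # 1. Pre-emphasis: boost high frequencies lost in speech production.
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Framing: 256-sample frames overlapping by 128 samples.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx]
    # 3. Windowing: a Hamming window smooths the frame edges.
    frames *= np.hamming(frame_len)
    # 4. FFT: power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    # 5. Mel-frequency filtering: triangular filters on the Mel scale.
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 6. Logarithm of the filterbank energies.
    log_e = np.log(np.maximum(power @ fbank.T, 1e-10))
    # 7. DCT decorrelates and keeps the first n_ceps coefficients.
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1) / (2 * n_filters)))
    return log_e @ dct.T  # shape: (n_frames, n_ceps)
```

Each row of the returned matrix is one feature vector describing roughly 16 ms of speech (at a 16 kHz sampling rate), which is what the classifiers described later operate on.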
22. The first step in MFCC is pre-emphasis, which
is used to boost the high frequencies of a speech
signal that are lost during speech production.
Pre-emphasis is needed because the high-frequency
components of the speech signal have small
amplitude with respect to the low-frequency
components; the higher frequencies are therefore
artificially boosted in order to increase the
signal-to-noise ratio.
Next is framing, which divides the digitized
signal obtained by analog-to-digital conversion
(ADC) of the speech signal into blocks (frames).
23. The number of samples in each frame is chosen
as 256 and the number of samples overlapping
between adjacent frames is 128.
Overlapping frames are used to acquire the
information from the boundaries of the frame.
Discontinuities at the start and the end of a
frame cause undesirable effects in the
frequency response, so windowing is used to
eliminate the discontinuities at the edges.
24. In the discipline of speaker recognition a wide
range of methods and procedures are adopted
by the experts for identification.
25. Such type of analysis involves a group of trained
phoneticians giving their judgement regarding the
similarity and dissimilarity between the two
speech events, after hearing the samples again and
again to find out some similarities in their
linguistic, phonetic and acoustic features.
Human listeners are robust speaker recognizers
when presented with degraded speech.
Listener performance is nevertheless limited by
factors such as the signal-to-noise ratio, the
speech bandwidth, the amount of speech material,
and distortions introduced into the speech signal
by speech coding, transmission systems, etc.
26. In this technique, different utterances of the
speakers are segregated in respect of each speaker
by way of repeated listening of recorded
conversation.
The segregated conversations of each speaker are
repeatedly heard to identify linguistic and
phonetic features like articulation rate, flow of
speech, degree of vowel and consonant formation,
rhythm, striking time, pauses, etc.
There are cues in voice and speech behaviour,
which are individual and thus make it possible to
recognize the familiar voices.
27. This involves the semi-automatic
measurements of particular acoustic speech
parameters such as vowel formants,
articulation rate, which is sometimes combined
with the results of auditory phonetic analysis
by a human expert.
In 1941, an electromechanical acoustic
spectrograph was developed by Dr. Ralph
Potter of Bell Telephone
28. Laboratories, with the idea of converting
sounds into pictures (Kent RD, Read C 2001). A
sound
spectrograph is an instrument which is able to
give a permanent record of changing energy-
frequency distribution throughout the time of a
speech wave.
The spectrograms are the graphic displays of
the amplitude as a function of both frequency
and time.
29. Examiners visually inspect and compare
similarities or differences of patterns of the energy
distribution in the spectrograms.
It is generally believed that formant structures and
other spectral characteristics which are evident
from a spectrogram are unique for each individual.
The most widely used features are fundamental
frequencies, formant bandwidths, formant
frequencies, spectral composition of fricatives and
plosives for individual segments, and transitions.
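The energy-frequency-time display described above can be computed with SciPy's `spectrogram` routine; the rising tone used here is a synthetic stand-in for a speech recording, and the frame parameters match those mentioned earlier for MFCC:

```python
import numpy as np
from scipy.signal import spectrogram

sr = 16000
t = np.arange(sr) / sr
# Synthetic stand-in for a speech wave: a tone rising from 200 Hz.
sig = np.sin(2 * np.pi * (200 + 300 * t) * t)

# Sxx[f, n] is the energy at frequency bin f in time frame n --
# the changing energy-frequency distribution a spectrogram records.
freqs, times, Sxx = spectrogram(sig, fs=sr, nperseg=256, noverlap=128)
```

An examiner's visual comparison of formant patterns corresponds to comparing ridges of high energy in `Sxx` across two such displays.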
30. However, the main drawback of this voiceprint analysis is that
spectrograms of the speech signal from the same individual show large
intra-speaker variations, because no speaker is actually capable of
producing two identical speech utterances (Gfroerer S 2003).
This method is obviously neither objective nor superior to aural-
perceptual methods; it is basically a shifting of subjective judgement to
the visual domain.
The objectivity, reliability and validity of the method have been
discussed controversially.
The method was widely used in the US, parts of Europe and other
countries until the 1980s, but in the present scenario it has been losing
ground.
Although the FBI still uses it for investigative purposes, most U.S.
courts do not accept voiceprint evidence.
Today voiceprint identification is not used in forensic labs in Europe,
but it is still practised in developing countries such as China, Vietnam,
etc.
31. This approach differs greatly from the earlier
identification methods in that it is both
universal and automatic.
It is considered universal because it does not
focus on specific acoustic parameters but treats
the speech as a continuously varying complex
wave or signal.
Its automatic nature reduces the subjective
evaluation of any speech material to a minimum.
Most such automatic identification systems
today involve techniques like:
32. The Gaussian Mixture Model (GMM) is a
parametric probability density function which
is represented as a weighted sum of Gaussian
component densities.
It is used as a parametric model of the
probability distribution of measured features
in biometric systems.
The GMM is used as a classifier to compare the
features extracted by MFCC with the stored
templates.
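A minimal sketch of GMM-based closed-set identification using scikit-learn's `GaussianMixture`. The speaker names and the Gaussian "MFCC-like" training clouds are synthetic illustrations, not real enrollment data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical enrollment data: MFCC-like feature vectors per known
# speaker (synthetic Gaussian clouds, 13 coefficients per frame).
train = {
    "alice": rng.normal(0.0, 1.0, size=(500, 13)),
    "bob":   rng.normal(3.0, 1.0, size=(500, 13)),
}

# One GMM per speaker: a weighted sum of Gaussian component densities
# modelling that speaker's feature distribution.
models = {
    name: GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(feats)
    for name, feats in train.items()
}

def identify(features):
    """Closed-set identification: the speaker whose GMM assigns the
    highest average log-likelihood to the test frames wins."""
    scores = {name: gmm.score(features) for name, gmm in models.items()}
    return max(scores, key=scores.get)

test_utterance = rng.normal(3.0, 1.0, size=(200, 13))
print(identify(test_utterance))
```

In a forensic setting the raw best-match decision would normally be replaced by a likelihood ratio, but the scoring machinery is the same.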
34. The long- term speech spectrum is used as an important cue
of determining the voice quality . In this technique, large
number of feature vectors is collected for each known
speaker.
The average and variance of each component of the feature
vector are calculated, and vector of mean value, and vector
of the variances, is used to model each speaker.
A similar model is made for the unknown speaker.
This technique is most useful for text-independent
recognition, where a large amount of data is
required for construction of the speaker's model.
The method is not beneficial if the utterances are
too short or contain an insufficient amount of data.
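Long-term averaging can be sketched as follows: each speaker is reduced to one mean vector and one variance vector over all their feature frames, and the unknown speaker's model is compared against each known model. The variance-weighted distance used here is an illustrative choice, not a standard from the source:

```python
import numpy as np

def long_term_model(features):
    # Each speaker's model: the mean and the variance of every
    # component of the feature vector, over all collected frames.
    return features.mean(axis=0), features.var(axis=0)

def model_distance(a, b):
    # Illustrative variance-weighted distance between two models.
    (mu_a, var_a), (mu_b, var_b) = a, b
    return float(np.sum((mu_a - mu_b) ** 2 / (var_a + var_b)))

rng = np.random.default_rng(1)
known = {
    "spk1": long_term_model(rng.normal(0.0, 1.0, (1000, 13))),
    "spk2": long_term_model(rng.normal(2.0, 1.0, (1000, 13))),
}
# Build the same kind of model for the unknown speaker, then pick
# the known model it lies closest to.
unknown = long_term_model(rng.normal(2.0, 1.0, (400, 13)))
best = min(known, key=lambda k: model_distance(known[k], unknown))
print(best)
```

The single mean/variance pair is exactly the "single cluster" limitation the next slide criticizes.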
35. The major disadvantage of long-term
averaging is that each speaker’s model consists
of a single cluster of data represented by an
average and variance vector.
If the data contain multiple clusters of vectors,
the variance will be very high. Since human
speech is composed primarily of vowels, it is
natural to expect feature vectors to form
clusters, each one based on the pronunciation
of a specific vowel
36. In this technique, each speaker’s model
consists of several clusters of data, along
with their centroids.
Vector quantization (VQ) reduces these sets of vectors to a codebook, which
provides an efficient way of building and comparing speaker
models. VQ is used in several ways in speaker
recognition.
In some systems it is used simply to compress data; in other
systems, VQ is a preprocessing step for other methods such
as HMMs.
For text-dependent identification and verification, several
codebooks are created or “trained” for each speaker, who
speaks a prescribed text several times.
These codebooks are considered the speaker’s template.
During the operational phase the same prescribed text is
spoken by the unknown person.
37. The comparison is based on the
observed differences or similarities between the
unknown person’s template and each trained
template, after removing the variations in
speaking rate.
For text-independent speaker recognition, a
single codebook is created for each speaker.
38. The codebook is considered an accurate model
of the speaker because it is formed from a much
larger amount of speech than in the text-dependent
case.
This method introduces a new factor affecting the
performance of the system: codebook
size. A larger codebook does a better job of
characterizing a speaker’s voice, but this results
in increased computational expense and the
danger of not producing results in real time, which
is a significant factor for verification.
The advantage of this method is that it requires
only a small amount of data to create a speaker’s
model without any loss of accuracy.
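A text-independent VQ comparison of the kind described above can be sketched as follows; the hand-rolled k-means, the codebook size of four, and the synthetic "vowel-like" feature clusters are illustrative assumptions, not the presentation's own implementation:

```python
import numpy as np

def train_codebook(vectors, k=4, iters=20, seed=0):
    """Build a VQ codebook with simple k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword.
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each codeword to the centroid of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = vectors[labels == j].mean(axis=0)
    return codebook

def distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

rng = np.random.default_rng(2)
# Synthetic feature vectors clustered around vowel-like centroids.
spk_a = np.concatenate([rng.normal(c, 0.2, size=(100, 2))
                        for c in ([0, 0], [1, 0], [0, 1], [1, 1])])
spk_b = np.concatenate([rng.normal(c, 0.2, size=(100, 2))
                        for c in ([4, 4], [5, 4], [4, 5], [5, 5])])
codebooks = {"A": train_codebook(spk_a), "B": train_codebook(spk_b)}

# The unknown speaker is assigned to the codebook with the
# lowest average quantization distortion.
unknown = rng.normal([4.5, 4.5], 0.5, size=(200, 2))
best = min(codebooks, key=lambda n: distortion(unknown, codebooks[n]))
print(best)
```

The codebook-size trade-off from the slide shows up directly here: raising `k` lowers distortion for the true speaker but multiplies the cost of every distance computation.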
39. Tendering tape-recorded conversations
before law courts as evidence,
particularly in cases arising under the Prevention of
Corruption Act, where such conversations are recorded
by sending the complainant with a recording device to
the person demanding or offering a bribe, has now
become common practice.
In civil cases, too, parties may rely upon tape recordings
of relevant conversations to support their version of events.
In such cases the court has to face various questions
regarding the admissibility, nature and evidentiary value
of such tape-recorded conversations.
40. The Indian Evidence Act, prior to its
amendment by the Information Technology Act,
2000, mainly dealt with evidence in
oral or documentary form.
It said nothing about the
admissibility, nature and evidentiary value of a
conversation or statement recorded on an
electromagnetic device.
Confronted with questions of this nature
and called upon to decide them, the law courts
in India, as well as in England, devised and
developed principles so that such evidence could
be received in law courts and acted upon. (Adv. K.C.
Suresh, 2011)
41. In India, the Forensic Science Laboratory at
Chandigarh regularly conducts voice identification
examinations, and the Supreme Court has
held that voice identification evidence is admissible in
court.
In Bangalore, the SRC Institute of Speech and
Hearing has facilities for voice analysis.
The All India Institute of Speech and Hearing,
Mysore, which has been working in the field for
many years now, even wants to start a one-year
PG Diploma course in forensic voice analysis.
42. The Michigan State Police set up a voice
identification unit in 1966. Sound spectrograph
evidence was first admitted into a court in 1967
during a military trial (court-martial), United
States v. Wright.
Judge Ferguson wrote a lengthy dissent, saying
that voice identification by sound spectrograph
did not meet the Frye standard of general
acceptance by the scientific community. (Lisa
Yount, 2007)
The first reported application of the voiceprint
technique in a criminal proceeding occurred in the
1966 case of People v. Straehle.
43. The defendant, a police officer, had telephoned
the operator of an illicit gambling enterprise to
warn him of an impending police raid.
Later, during a grand jury inquiry, the police
officer denied making the call.
At the ensuing perjury trial, the prosecution
introduced voiceprints of the telephone calls
and sample voiceprints of the defendant's
voice, supported by the expert opinion of
Lawrence Kersta that all the recordings were of the
defendant's voice. (John F. Decker et al., 1977)
44. In 1976 the New York Supreme Court pointed out, in
the case of People v. Rogers, that fifty different trial
courts had admitted spectrographic voice identification
evidence, as had fourteen out of fifteen U.S. District
Court judges, and only two out of thirty-seven states
considering the issue had rejected admission.
The Rogers court stated that this technique, when
accompanied by aural examination and conducted by a
qualified examiner, had reached the level of
general scientific acceptance among those who would be
expected to be familiar with its use, and as such had
attained the scientific acceptance and reliability
necessary for admission. (Adv. K.C. Suresh, 2011)
45. The lead story in the Washington Post this
morning concerned a recording that was thought
to be of Donald Trump.
Trump denied the recording was his voice.
Primeau Forensics was asked by the media to
perform a forensic voice identification test to
determine whether the unknown voice in the Washington
Post story was that of Donald Trump.
Primeau Forensics located a C-SPAN
interview from 1991 titled ‘Donald Trump on
Economic Recovery’.
We chose this recording as the ‘known’ Donald
Trump voice for forensic comparison.
46. We chose this older voice sample because it
was closer in time to the ‘unknown’ recording.
The biometric software program we used
is a Speech Pro product titled ‘SIS 2’.
We formatted each speech sample based on
training received from Owen Forensic Services
and loaded them into the biometric software.
The result was a 98% mismatch, meaning the
‘unknown’ voice recording that surfaced in the
Washington Post today is NOT the voice of
Donald Trump.
47. As Cain explained in an article he wrote for the
Criminal Division of the U.S. Department of Justice,
in collaboration with Lonnie Smrkovski, chief of the
voiceprint unit of the Michigan State Police, and Mindy
Wilson, a psychologist and private examiner practising
in Lansing, Michigan, the fundamental principle of
voice identification rests on the fact that, like a
fingerprint, every voice is unique and "individually
characteristic enough to distinguish it from others
through...analysis”.
Fingerprints are identified through literal analysis;
voices are identified through comparative voiceprints.
Cain points out that uniqueness in human speech is the
product of two general factors.
48. "The first," he says, "lies in the sizes of the vocal cavities such as the
throat, nasal and oral cavities and the shape, length and tension in an
individual's vocal cords located in the larynx. The vocal cavities are
resonators, much like organ pipes, which reinforce some of the overtones
produced by the vocal cords, which produce formants or voiceprint bars.
The likelihood that two people would have exactly the same size and
configuration (is) very remote."
The second factor in determining voice uniqueness is the manner in
which the "articulators" or muscles of speech are manipulated when an
individual is talking.
The articulators include the lips, teeth, tongue, soft palate and jaw
muscles, "whose controlled interplay", Cain explains, "produces
intelligible speech...The likelihood that two persons could develop
identical use patterns of their articulators also appears to be very remote."
49. While Cain agrees that "there is disagreement
in the so-called 'scientific community' on the
degree of accuracy with which examiners can
identify speakers under all conditions, there is
agreement that voices can, in fact, be
identified."
50. GMM
To obtain the results, the speech signal is
recorded. The system is trained for multiple
words such as Samosa, Dosa and Tea.
The results for the word Samosa are shown.
The speech signal was recorded for the
word Samosa.
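The word-level arrangement just described, one GMM per vocabulary word, can be sketched as follows. The synthetic feature vectors and the use of scikit-learn's GaussianMixture are assumptions for illustration, since the presentation does not include its code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Synthetic stand-ins for per-word feature vectors (a real system
# would extract MFCCs from the recorded speech signal).
word_centres = {"Samosa": [0, 0], "Dosa": [4, 0], "Tea": [0, 4]}
train = {w: rng.normal(c, 0.4, size=(150, 2))
         for w, c in word_centres.items()}

# One GMM per vocabulary word, trained on that word's features.
word_models = {w: GaussianMixture(n_components=2, random_state=0).fit(f)
               for w, f in train.items()}

def recognize(features):
    """Return the word whose GMM gives the highest average log-likelihood."""
    return max(word_models, key=lambda w: word_models[w].score(features))

# Classify a new utterance drawn near the "Dosa" region.
test_utterance = rng.normal(word_centres["Dosa"], 0.4, size=(60, 2))
print(recognize(test_utterance))
```
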
55. Short-duration samples are more demanding and should be carefully
analysed.
Dissimilarity in the language of the questioned and specimen voice samples.
Emotional variability between the questioned and specimen samples.
Misspoken or misread prompted phrases.
Poorly recorded/noisy samples are difficult to analyse.
An insufficient number of comparable words.
Disguise in speech samples poses a problem in speaker identification.
Extreme emotional state.
A change in the physical state of the speaker (e.g. the effect of alcohol).
The attitude with which the speech is delivered by the speaker.
Channel mismatch, or a mismatch in recording conditions.
A different pronunciation speed in the test data compared with the training
data.
The speaker’s health.
Ageing (the vocal tract can drift away from its models with age).
56. Thus multiple words such as Samosa, Dosa and Tea
can be recognized and converted into text using this system.
This system is suitable for environments with little ambient noise.
The system provides good performance with respect to other systems.
It can be concluded that GMM provides greater accuracy.
In light of the above discussion, it can be inferred that the comparison of
voice samples is quite complicated but entirely possible.
The skill of the examiner, along with the chosen parameters and the selection
of an appropriate identification technique, is largely decisive and can
facilitate accurate and conclusive results.
There have been many advancements and successes in this field;
however, much remains to be done to overcome the daunting
limitations which still prevail and constrain the process.
If all such limitations are successfully overcome, this technique, with its
promising features, will have an obvious advantage over the pre-existing
ones for establishing individual identity.
57. 1. Champod C, Meuwly D (2000) The inference of identity in forensic
speaker recognition. Speech Communication 31: 193-203.
2. Reynolds DA, Rose RC (1995) Robust text-independent speaker
identification using Gaussian mixture speaker models. IEEE
Transactions on Speech and Audio Processing 3(1): 72-83.
3. Zetterholm E (2007) Detection of speaker characteristics using voice
imitation. Springer Berlin Heidelberg 4441: 192-205.
4. Braun A, Künzel HJ (1998) Is forensic speaker identification
unethical - or can it be unethical not to do it? Forensic Linguistics 5:
10-21.
5. Kent RD, Read C (2001) The acoustic analysis of speech. University
of Wisconsin-Madison; A.I.T.B.S. Publishers and Distributors, Delhi.
6. Samudravijaya K (2003) Speech and speaker recognition: a tutorial.
Tata Institute of Fundamental Research, Mumbai.
58. 7. YA (2000) A research paper in forensic science. The University of Auckland,
New Zealand.
8. Gfroerer S (2003) Auditory-instrumental forensic speaker recognition.
Eurospeech, Geneva.
9. Harmegnies B, Landercy A (1988) Intra-speaker variability of the long-term
speech pattern. Speech Communication 7: 81-86.
10. Kekre HB, Sarode TK (2008) Speech data compression using vector
quantization. International Journal of Computer and Information Science
and Engineering 2: 8.
11. Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential
images using hidden Markov model. IEEE: 379-385.
12. Abdulla WH, Kasabov NK (1999) The concepts of hidden Markov model
in speech recognition. Information Science Discussion Papers 99/09,
University of Otago, New Zealand: 1-40.
13. Bennani Y, Gallinari P (1995) Neural networks for discrimination and
modelization of speakers. Speech Communication 17: 159-175.
14. Nakasone H, Beck SD (2001) Forensic automatic speaker identification.
Paper presented at A Speaker Odyssey, Crete, Greece.