This paper presents an approach to speaker recognition that uses Mel-frequency spectral information to improve speech feature representation in a Vector Quantization (VQ) codebook based recognizer. The Mel-frequency front end extracts features from the speech signal to produce the training and testing vectors. The VQ codebook stage clusters the training vectors with the LBG algorithm and recognizes a speaker by matching test vectors against the resulting codebooks.
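The codebook idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names (`lbg`, `distortion`, `identify`), the splitting factor `eps`, the iteration count and the toy two-dimensional "feature vectors" are all assumptions made for the example. Real systems would feed in MFCC vectors instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(training, size, eps=0.01, iters=20):
    """LBG-style codebook training by repeated splitting.

    size is assumed to be a power of two; eps and iters are
    illustrative defaults, not values taken from the paper.
    """
    codebook = [mean_vector(training)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for v in training:
                clusters[nearest(v, codebook)].append(v)
            # Move each codeword to its cluster mean; keep it if the cluster is empty.
            codebook = [mean_vector(cl) if cl else cw for cl, cw in zip(clusters, codebook)]
    return codebook

def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)

def identify(test_vectors, codebooks):
    """Pick the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: distortion(test_vectors, codebooks[name]))

# Toy data standing in for per-speaker feature vectors (hypothetical values).
train_a = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [5.0, 5.1], [4.9, 5.0], [5.1, 4.9]]
train_b = [[8.0, 2.0], [8.2, 2.1], [7.8, 1.9], [12.0, 3.0], [12.2, 3.1], [11.8, 2.9]]
codebooks = {"A": lbg(train_a, 2), "B": lbg(train_b, 2)}
test_vectors = [[1.05, 0.95], [5.05, 4.95]]
print(identify(test_vectors, codebooks))
```

The multiplicative split `c * (1 + s)` follows the usual textbook description of LBG; it does not separate codewords whose coordinates are exactly zero, so an additive perturbation may be preferable for zero-mean features.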
Presentation slides discussing the theory and empirical results of a text-independent speaker verification system I developed based upon classification of MFCCs. Both minimum-distance classification and least-likelihood ratio classification using Gaussian Mixture Models were discussed.
Text-Independent Speaker Verification Report - Cody Ray
Provides an introduction to the task of speaker recognition, and describes a not-so-novel speaker recognition system based upon a minimum-distance classification scheme. We describe both the theory and practical details for a reference implementation. Furthermore, we discuss an advanced technique for classification based upon Gaussian Mixture Models (GMM). Finally, we discuss the results of a set of experiments performed using our reference implementation.
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn... - gt_ebuddy
Joint speech and speaker recognition using Hidden Markov Model/Vector Quantization (HMM/VQ) for speaker-independent speech recognition and a Gaussian Mixture Model (GMM) for speech-independent speaker recognition. MFCCs (Mel-Frequency Cepstral Coefficients) were used for feature extraction, together with delta, delta-delta and energy terms (39 coefficients).
Developed in Java with a client/server architecture; the web interface was developed in Adobe Flex.
This project was done at TU, IOE - Pulchowk Campus, Nepal.
For more details visit http://ganeshtiwaridotcomdotnp.blogspot.com
ABSTRACT OF PROJECT
A biometric is a physical characteristic unique to each individual. It is very useful for authentication and access control.
The designed system is a text-prompted voice biometric that combines a text-independent speaker verification system and a speaker-independent speech verification system, each implemented independently. The foundation for this joint system is that the speech signal conveys both the speech content and the speaker identity. Such systems are more secure against playback attacks, since the word to be spoken during authentication is not set in advance.
During the course of the project, various digital signal processing and pattern classification algorithms were studied. Short-time spectral analysis was performed to obtain MFCCs, energy and their deltas as features. The feature extraction module is the same for both systems. Speaker modeling was done with a GMM, and a left-to-right discrete HMM with VQ was used for isolated word modeling. The results of both systems were combined to authenticate the user.
The speech model for each word was pre-trained using utterances of 45 English words. The speaker model was trained on about 2 minutes of speech from each of 15 speakers. On individual word utterances, the recognition rate of the speech recognition system is 92% and that of the speaker recognition system is 66%. For longer utterances (>5 sec), the recognition rate of the speaker recognition system improves to 78%.
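As a rough illustration of how delta and delta-delta terms turn a static feature vector into a 39-dimensional one, the sketch below uses the common regression formula for deltas. It assumes 13 static values per frame (for example 12 MFCCs plus energy), which is a typical layout but is not stated in the abstract; the window `width=2`, the edge clamping and the function names are likewise assumptions.

```python
def deltas(frames, width=2):
    """Regression-based time derivatives over +/- width frames.

    Frames beyond the start or end are replaced by the first or last
    frame (edge clamping), which is one common convention.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out

def stack_features(static):
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]

# Ten hypothetical frames of 13 static coefficients that rise linearly in time.
static = [[float(t)] * 13 for t in range(10)]
features = stack_features(static)
print(len(features[0]))
```

For a linearly rising coefficient the interior deltas equal the slope and the delta-deltas are zero, which is a quick sanity check on the formula.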
Isolated words recognition using mfcc, lpc and neural network - eSAT Journals
Abstract: Automatic speech recognition is an important topic in speech processing. This paper presents the use of an Artificial Neural Network (ANN) for isolated word recognition. After pre-processing, voiced speech is detected based on energy and zero crossing rate (ZCR). Two feature sets are considered: Mel Frequency Cepstral Coefficients (MFCC) alone, and the combined features of MFCC and Linear Predictive Coding (LPC). A back-propagation network is used as the classifier. Recognition accuracy is higher when the combined LPC and MFCC features are used than with MFCC alone. Keywords: Pre-processing, Mel Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding (LPC), Artificial Neural Network (ANN).
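The energy and ZCR based detection mentioned above might look roughly like the following sketch. The thresholds `energy_min` and `zcr_max`, the frame sizes and the synthetic signals are hypothetical values chosen for the example; the abstract does not give the ones used in the paper.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Cut a signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_voiced(frame, energy_min=0.01, zcr_max=0.25):
    """Voiced speech tends to have high energy and a low zero crossing rate."""
    return short_time_energy(frame) > energy_min and zero_crossing_rate(frame) < zcr_max

# A 100 Hz tone sampled at 8 kHz stands in for voiced speech,
# and a weak alternating signal stands in for background hiss.
tone = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
hiss = [0.01 * (-1) ** n for n in range(400)]
print(is_voiced(tone), is_voiced(hiss))
```

In practice the thresholds are usually tuned per recording setup, often from a short stretch of known silence at the start of the signal.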
The following resources come from the 2009/10 B.Sc in Media Technology and Digital Broadcast (course number 2ELE0076) from the University of Hertfordshire. All the mini projects are designed as level two modules of the undergraduate programmes.
The task of speaker identification is to determine the identity of a speaker by machine. To recognize a voice, it must be familiar to the listener, whether a human being or a machine.
The objective of speaker identification is to determine the identity of a speaker by machine on the basis of his/her voice. No identity is claimed by the user.
GitHub Link: https://github.com/TrilokiDA/Speaker-Identification-from-Voice
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn - ijcsa
In this paper we present text-dependent speaker recognition, enhanced by first detecting the emotion of the speaker using hybrid FFBNN and GMM methods. The emotional state of the speaker influences the recognition system. The Mel-frequency Cepstral Coefficient (MFCC) feature set is used for experimentation. To recognize the emotional state of a speaker, a Gaussian Mixture Model (GMM) is used in the training phase and a Feed Forward Back Propagation Neural Network (FFBNN) in the testing phase. A speech database of 25 speakers recorded in five emotional states (happy, angry, sad, surprise and neutral) is used for experimentation. The results reveal that the emotional state of the speaker has a significant impact on the accuracy of speaker recognition.
This is a ppt on speech recognition system or automated speech recognition system. I hope that it would be helpful for all the people searching for a presentation on this technology
Voice Identification And Recognition System, Matlab - Sohaib Tallat
A simple yet complex approach to modern sophistication.
I made this project using the MFCC approach and then embedded the code in a graphical user interface. In the end I made a standalone application for the program using the deployment tools of Matlab.
Voice recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
This document describes how to build a simple, yet complete and representative automatic speaker recognition system. Such a speaker recognition system has potential in many security applications. For example, users have to speak a PIN (Personal Identification Number) in order to gain access to the laboratory door, or users have to speak their credit card number over the telephone line to verify their identity. By checking the voice characteristics of the input utterance, using an automatic speaker recognition system similar to the one that we will describe, the system is able to add an extra level of security.
Wavelet Based Noise Robust Features for Speaker Recognition - CSCJournals
Extracting and selecting the best parametric representation of the acoustic signal is the most important task in designing any speaker recognition system. A wide range of possibilities exists for parametrically representing the speech signal, such as Linear Prediction Coding (LPC), Mel Frequency Cepstrum Coefficients (MFCC) and others. MFCCs are currently the most popular choice for speaker recognition systems. One of their shortcomings is that the signal is assumed to be stationary within the given time frame, so they cannot analyze non-stationary signals and are therefore not well suited to noisy speech. To overcome this problem, several researchers have used different types of AM-FM modulation/demodulation techniques for extracting features from the speech signal. Other approaches propose using wavelet filterbanks for feature extraction. This paper proposes a technique that combines these approaches: features are extracted from the envelope of the signal and then passed through a wavelet filterbank. The proposed method is found to outperform the existing feature extraction techniques.
Speech Recognized Automation System Using Speaker Identification through Wire... - IOSR Journals
Abstract: This paper discusses the methodology for a project named “Speech Recognized Automation System using Speaker Identification through wireless communication”. The project presents the design of an automation system using wireless communication and speaker recognition implemented in Matlab code. The straightforward programming interface of Matlab makes it an ideal tool for speech analysis in the project. The automation system is useful for home appliances as well as in industry. The paper discusses the overall design of a wireless automation system that was built and implemented. Speech recognition centers on speech commands stored in a Matlab database, which are matched against the incoming voice command of the speaker. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to extract features of the speech and to recognize the speaker's speech. The system uses low-power RF ZigBee transceiver wireless communication modules, which are relatively cheap. It is intended to control lights, fans and other electrical appliances in a home or office using speech commands such as Light and Fan. Further, if security is not a big issue, the speech processor can be used to control the appliances without speaker identification. Keywords: Automation system, MATLAB code, MFCC, speaker identification, ZigBee transceiver.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S... - IDES Editor
In this paper, the improvement of an ASR system for the Hindi language, based on vector-quantized MFCCs as feature vectors and an HMM as classifier, is discussed. MFCC features are usually pre-processed before being used for recognition. One such pre-processing step is to compute delta and delta-delta coefficients and append them to the MFCCs to form the feature vector. This paper focuses on all digits in Hindi (zero to nine), using an isolated word structure. Performance of the system is evaluated by the Recognition Rate (RR). Combining the Delta MFCC (DMFCC) feature with the Delta-Delta MFCC (DDMFCC) feature gives approximately 2.5% further improvement in the RR, with no additional computational cost. The RR for speakers involved in the training phase is better than that for speakers who were not involved in the training phase. Word-wise RR is observed to be good for some digits with distinct phones.
The peer-reviewed International Journal of Engineering Inventions (IJEI) was started with a mission to encourage contributions to research in science and technology, and to encourage and motivate researchers in challenging areas of science and technology.
One of the common and easier techniques of feature extraction is the Mel Frequency Cepstral Coefficient (MFCC), which is used to extract a feature vector from the signal. It is used with dynamic feature extraction and provides a high performance rate compared to earlier techniques such as LPC. One of its major drawbacks, however, is robustness. Another feature extraction technique is Relative Spectral (RASTA) filtering. In effect, the RASTA filter band-passes each feature coefficient; in both the log-spectral and the cepstral domains, linear channel distortions appear as an additive constant. The high-pass portion of the equivalent band-pass filter alleviates the convolutional noise introduced in the channel. The low-pass filtering helps smooth frame-to-frame spectral changes. Compared to MFCC feature extraction, RASTA filtering reduces the impact of noise in signals and provides high robustness.
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique - CSCJournals
An automatic speaker recognition system is used to recognize an unknown speaker among several reference speakers by making use of speaker-specific information in their speech. In this paper, we introduce a novel, hierarchical, text-independent speaker recognition technique. Our baseline speaker recognition system, built using statistical modeling techniques, gives an accuracy of 81% on the standard MIT database, and our baseline gender recognition system gives an accuracy of 93.795%. We then propose and implement a novel state-space pruning technique that performs gender recognition before speaker recognition so as to improve the accuracy and timeliness of our baseline speaker recognition system. Based on experiments conducted on the MIT database, we demonstrate that our proposed system improves accuracy over the baseline system by approximately 2%, while reducing computational time by more than 30%.
Speech Analysis and synthesis using Vocoder - IJTET Journal
Abstract: In this paper, I propose speech analysis and synthesis using a vocoder. Voice conversion systems do not create new speech signals, but only transform existing ones. The proposed speech vocoding is different from speech coding. The analysis stage represents the speech signal with fewer bits, so that bandwidth efficiency can be increased; the synthesis stage reconstructs the speech signal from the received bits of information. Three aspects of analysis are discussed: pitch refinement, spectral envelope estimation and maximum voiced frequency estimation. A quasi-harmonic analysis model can be used to implement a pitch refinement algorithm, which improves the accuracy of the spectral estimation. A harmonic plus noise model is used to reconstruct the speech signal from the parameters. The goal is to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating phase information into the analysis and modeling process, and at synthesizing these three aspects at different pitch periods.
Intelligent Arabic letters speech recognition system based on mel frequency c... - IJECEIAES
Speech recognition is one of the important applications of artificial intelligence (AI). Speech recognition aims to recognize spoken words regardless of who is speaking them. The process involves extracting meaningful features from spoken words and then classifying these features into their classes. This paper presents a neural network classification system for Arabic letters. The paper studies the effect of changing the properties of the multi-layer perceptron (MLP) artificial neural network (ANN) to obtain optimized performance. The proposed system consists of two main stages. First, the recorded spoken letters are transformed from the time domain into the frequency domain using the fast Fourier transform (FFT), and features are extracted using mel frequency cepstral coefficients (MFCC). Second, the extracted features are classified using the MLP ANN with the back-propagation (BP) learning algorithm. The results show that the proposed system, with the extracted features, can classify spoken Arabic letters using two hidden layers with an accuracy of around 86%.
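The "mel frequency" step in an MFCC front end relies on a warping from hertz to the mel scale. The sketch below uses the widely quoted formula mel = 2595 log10(1 + f/700); the abstract does not say which variant the paper uses, and the 0 to 4000 Hz range and the choice of 20 filters are assumed example values.

```python
import math

def hz_to_mel(hz):
    """Common hertz-to-mel mapping (one of several variants in use)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filter_centers(low_hz, high_hz, n_filters):
    """Center frequencies in Hz of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Hypothetical filterbank layout: 20 filters between 0 Hz and 4000 Hz.
centers = filter_centers(0.0, 4000.0, 20)
print([round(c) for c in centers[:5]])
```

Because the spacing is uniform in mel rather than in hertz, the filters are packed densely at low frequencies and spread out at high frequencies, which mirrors the frequency resolution of human hearing.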
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T... - IJCSEA Journal
Speech is the most natural way of exchanging information. It provides an efficient means of man-machine communication through speech interfacing. Speech interfacing involves speech synthesis and speech recognition. Speech recognition allows a computer to identify the words that a person speaks into a microphone or telephone. The two main components normally used in speech recognition are a signal processing component at the front end and a pattern matching component at the back end. In this paper, a setup that uses Mel frequency cepstral coefficients at the front end and artificial neural networks at the back end has been developed to run experiments analyzing speech recognition performance. Various experiments have been performed by varying the number of layers and the type of network transfer function, which helps in deciding the network architecture to be used for acoustic modelling at the back end.
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... - TELKOMNIKA JOURNAL
Speech recognition can be defined as the process of converting voice signals into a sequence of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is better for speech recognition in the Indonesian language. This is important because a method that produces high accuracy for one language does not necessarily produce the same accuracy for other languages, since every language has different characteristics. This research may therefore help accelerate the use of automatic speech recognition for the Indonesian language. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results show that MFCC performs better than LPC for Indonesian speech recognition.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Speaker and Speech Recognition for Secured Smart Home Applications - Roger Gomes
The paper discusses the implementation of a robust text-independent speaker recognition system that uses MFCC extraction of feature vectors, matching with VQ and optimization with LBG. It also discusses the implementation of a text-dependent speech recognition system using the DTW algorithm, in the context of home automation.
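Dynamic time warping (DTW), mentioned above for text-dependent recognition, aligns two sequences that may be spoken at different speeds. The following is a minimal sketch on one-dimensional sequences, assuming an absolute-difference local cost and the basic three-way step pattern; a real recognizer would compare sequences of feature vectors (for example MFCC frames) and the sample sequences here are invented for illustration.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between a and b."""
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]

# Hypothetical stored command template and two incoming utterances.
template = [0, 1, 2, 3, 2, 1, 0]
same_word_slower = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
other_word = [3, 3, 3, 0, 0, 0, 3]
print(dtw_distance(template, same_word_slower), dtw_distance(template, other_word))
```

The slower rendition of the same contour aligns with zero cost, while the different contour does not, which is the property a template-matching recognizer relies on.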
Classification of Language Speech Recognition System - ijtsrd
This paper aims to implement a Classification of Language Speech Recognition System using feature extraction and classification. It is an automatic language speech recognition system: a software architecture that outputs digits from input speech signals. The system focuses on speaker-dependent isolated word recognition. A good quality microphone is required to record the speech signals. The system contains two main modules: feature extraction and feature matching. Feature extraction is the process of extracting a small amount of data from the voice signal that can later be used to represent each speech signal. Feature matching is the procedure that identifies the unknown speech signal by comparing features extracted from the voice input with those of a set of known speech signals, followed by the decision-making process. In this system, the Mel Frequency Cepstrum Coefficient (MFCC) is used for feature extraction, and Vector Quantization (VQ) using the LBG algorithm is used for feature matching. Khin May Yee | Moh Moh Khaing | Thu Zar Aung "Classification of Language Speech Recognition System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26546.pdf Paper URL: https://www.ijtsrd.com/computer-science/speech-recognition/26546/classification-of-language-speech-recognition-system/khin-may-yee
Similar to Speaker Recognition System using MFCC and Vector Quantization Approach (20)
Due to the availability of the internet and the evolution of embedded devices, the Internet of Things (IoT) can contribute usefully to the energy domain. The IoT will deliver a smarter grid that enables more information and connectivity throughout the infrastructure and into homes. Through the IoT, consumers, manufacturers and utility providers will find new ways to manage devices and ultimately conserve resources and save money by using smart meters, home gateways, smart plugs and connected appliances. In the future smart home, various devices will be able to measure and share their energy consumption, and actively participate in house-wide or building-wide energy management systems. This paper discusses the different approaches being taken worldwide to connect the smart grid. Full system solutions can be developed by combining hardware and software to address some of the challenges in building a smarter and more connected smart grid.
A Survey Report on : Security & Challenges in Internet of Things - ijsrd.com
In the era of computing technology, Internet of Things (IoT) devices are now popular in every domain, such as e-governance, e-Health, e-Home, e-Commerce and e-Trafficking. IoT is spreading from small to large applications in fields such as Smart Cities, Smart Grids and Smart Transportation. On one side, IoT provides facilities and services for society; on the other, IoT security is a crucial issue. IoT security is the area concerned with securing the connected devices and networks in the IoT. IoT is a vast area in which usability, performance, security and reliability are major challenges. Driven by market pressures, the IoT is growing exponentially, which proportionally increases the security threats involved. The relationship between security and the billions of devices connecting to the Internet cannot be described with existing mathematical methods. In this paper, we explore the opportunities the IoT makes possible, together with the security threats and challenges associated with it.
In today's emerging world of the Internet, everything is supposed to be connected with the help of billions of smart devices. Connecting all the devices used in our day-to-day life makes our life easy and trouble-free. We live in a world where we are used to having smart phones, smart cars, smart gadgets, smart homes and smart cities. Different institutes and researchers are working to create a smart world for us, but the real question we need to emphasize is how to make dumb devices with dissimilar hardware and communication technologies talk to each other, and what kind of mechanism to use for this across various protocols with little human interaction. The purpose is to identify the key areas for the application of IoT and a platform on which devices with different mechanisms and protocols can communicate through an integrated architecture.
Study on Issues in Managing and Protecting Data of IOT - ijsrd.com
This paper discusses a variety of issues in preserving and managing data produced by the IoT. Every second, large amounts of data are added or updated in IoT databases across a heterogeneous environment. Each phase of processing IoT data is demanding, including storing data, querying, indexing, transaction management and failure handling. We also address the problem of data integration and protection, since data arriving in the pool from diverse sources in different structures must be fitted into a single layout and travel securely. Finally, we discuss a standardized pathway for managing and defending data in a consistent manner.
Interactive Technologies for Improving Quality of Education to Build Collabor... - ijsrd.com
Today with advancement in Information Communication Technology (ICT) the way the education is being delivered is seeing a paradigm shift from boring classroom lectures to interactive applications such as 2-D and 3-D learning content, animations, live videos, response systems, interactive panels, education games, virtual laboratories and collaborative research (data gathering and analysis) etc. Engineering is emerging with more innovative solutions in the field of education and bringing out their innovative products to improve education delivery. The academic institutes which were once hesitant to use such technology are now looking forward to such innovations. They are adopting the new ways as they are realizing the vast benefits of using such methods and technology. The benefits are better comprehensibility, improved learning efficiency of students, and access to vast knowledge resources, geographical reach, quick feedback, accountability and quality research. This paper focuses on how engineering can leverage the latest technology and build a collaborative learning environment which can then be integrated with the national e-learning grid.
Internet of Things - Paradigm Shift of Future Internet Application for Specia...ijsrd.com
In the world more than 15% people are living with disability that also include children below age of 10 years. Due to lack of independent support services specially abled (handicap) people overly rely on other people for their basic needs, that excludes them from being financially and socially active. The Internet of Things (IoT) can give support system and a better quality of life as well as participation in routine and day to day life. For this purpose, the future solutions for current problems has been introduced in this paper. Daunting challenges have been considered as future research and glimpse of the IoT for specially abled person is given in the paper.
A Study of the Adverse Effects of IoT on Student's Lifeijsrd.com
Internet of things (IoT) is the most powerful invention and if used in the positive direction, internet can prove to be very productive. But, now a days, due to the social networking sites such as Face book, WhatsApp, twitter, hike etc. internet is producing adverse effects on the student life, especially those students studying at college Level. As it is rightly said, something which has some positive effects also has some of the negative effects on the other hand. In this article, we are discussing some adverse effects of IoT on student’s life.
Pedagogy for Effective use of ICT in English Language Learningijsrd.com
The use of information and communications technology (ICT) in education is a relatively new phenomenon and it has been the educational researchers' focus of attention for more than two decades. Educators and researchers examine the challenges of using ICT and think of new ways to integrate ICT into the curriculum. However, there are some barriers for the teachers that prevent them to use ICT in the classroom and develop supporting materials through ICT. The purpose of this study is to examine the high school English teachers’ perceptions of the factors discouraging teachers to use ICT in the classroom.
In recent years usage of private vehicles create urban traffic more and more crowded. As result traffic becomes one of the important problems in big cities in all over the world. Some of the traffic concerns are traffic jam and accidents which have caused a huge waste of time, more fuel consumption and more pollution. Time is very important parameter in routine life. The main problem faced by the people is real time routing. Our solution Virtual Eye will provide the current updates as in the real time scenario of the specific route. This research paper presents smart traffic navigation system, based on Internet of Things, which is featured by low cost, high compatibility, easy to upgrade, to replace traditional traffic management system and the proposed system can improve road traffic tremendously.
Ontological Model of Educational Programs in Computer Science (Bachelor and M...ijsrd.com
In this work there is illustrated an ontological model of educational programs in computer science for bachelor and master degrees in Computer science and for master educational program “Computer science as second competence†by Tempus project PROMIS.
Understanding IoT Management for Smart Refrigeratorijsrd.com
Lately the concept of Internet of Things (IoT) is being more elaborated and devices and databases are proposed thereby to meet the need of an Internet of Things scenario. IoT is being considered to be an integral part of smart house where devices will be connected to each other and also react upon certain environmental input. This will eventually include the home refrigerator, air conditioner, lights, heater and such other home appliances. Therefore, we focus our research on the database part for such an IoT’ fridge which we called as smart Fridge. We describe the potentials achievable through a database for an IoT refrigerator to manage the refrigerator food and also aid the creation of a monthly budget of the house for a family. The paper aims at the data management issue based on a proposed design for an intelligent refrigerator leveraging the sensor technology and the wireless communication technology. The refrigerator which identifies products by reading the barcodes or RFID tags is proposed to order the required products by connecting to the Internet. Thus the goal of this paper is to minimize human interaction to maintain the daily life events.
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...ijsrd.com
Double wishbone designs allow the engineer to carefully control the motion of the wheel throughout suspension travel. 3-D model of the Lower Wishbone Arm is prepared by using CAD software for modal and stress analysis. The forces and moments are used as the boundary conditions for finite element model of the wishbone arm. By using these boundary conditions static analysis is carried out. Then making the load as a function of time; quasi-static analysis of the wishbone arm is carried out. A finite element based optimization is used to optimize the design of lower wishbone arm. Topology optimization and material optimization techniques are used to optimize lower wishbone arm design.
A Review: Microwave Energy for materials processingijsrd.com
Microwave energy is a latest largest growing technique for material processing. This paper presents a review of microwave technologies used for material processing and its use for industrial applications. Advantages in using microwave energy for processing material include rapid heating, high heating efficiency, heating uniformity and clean energy. The microwave heating has various characteristics and due to which it has been become popular for heating low temperature applications to high temperature applications. In recent years this novel technique has been successfully utilized for the processing of metallic materials. Many researchers have reported microwave energy for sintering, joining and cladding of metallic materials. The aim of this paper is to show the use of microwave energy not only for non-metallic materials but also the metallic materials. The ability to process metals with microwave could assist in the manufacturing of high performance metal parts desired in many industries, for example in automotive and aeronautical industries.
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logsijsrd.com
With an expontial growth of World Wide Web, there are so many information overloaded and it became hard to find out data according to need. Web usage mining is a part of web mining, which deal with automatic discovery of user navigation pattern from web log. This paper presents an overview of web mining and also provide navigation pattern from classification and clustering algorithm for web usage mining. Web usage mining contain three important task namely data preprocessing, pattern discovery and pattern analysis based on discovered pattern. And also contain the comparative study of web mining techniques.
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEMijsrd.com
Application of FACTS controller called Static Synchronous Compensator STATCOM to improve the performance of power grid with Wind Farms is investigated .The essential feature of the STATCOM is that it has the ability to absorb or inject fastly the reactive power with power grid . Therefore the voltage regulation of the power grid with STATCOM FACTS device is achieved. Moreover restoring the stability of the power system having wind farm after occurring severe disturbance such as faults or wind farm mechanical power variation is obtained with STATCOM controller . The dynamic model of the power system having wind farm controlled by proposed STATCOM is developed . To validate the powerful of the STATCOM FACTS controller, the studied power system is simulated and subjected to different severe disturbances. The results prove the effectiveness of the proposed STATCOM controller in terms of fast damping the power system oscillations and restoring the power system stability.
Making model of dual axis solar tracking with Maximum Power Point Trackingijsrd.com
Now a days solar harvesting is more popular. As the popularity become higher the material quality and solar tracking methods are more improved. There are several factors affecting the solar system. Major influence on solar cell, intensity of source radiation and storage techniques The materials used in solar cell manufacturing limit the efficiency of solar cell. This makes it particularly difficult to make considerable improvements in the performance of the cell, and hence restricts the efficiency of the overall collection process. Therefore, the most attainable maximum power point tracking method of improving the performance of solar power collection is to increase the mean intensity of radiation received from the source used. The purposed of tracking system controls elevation and orientation angles of solar panels such that the panels always maintain perpendicular to the sunlight. The measured variables of our automatic system were compared with those of a fixed angle PV system. As a result of the experiment, the voltage generated by the proposed tracking system has an overall of about 28.11% more than the fixed angle PV system. There are three major approaches for maximizing power extraction in medium and large scale systems. They are sun tracking, maximum power point (MPP) tracking or both.
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...ijsrd.com
In day today's relevance, it is mandatory to device the usage of diesel in an economic way. In present scenario, the very low combustion efficiency of CI engine leads to poor performance of engine and produces emission due to incomplete combustion. Study of research papers is focused on the improvement in efficiency of the engine and reduction in emissions by adding ethanol in a diesel with different blends like 5%, 10%, 15%, 20%, 25% and 30% by volume. The performance and emission characteristics of the engine are tested observed using blended fuels and comparative assessment is done with the performance and emission characteristics of engine using pure diesel.
Study and Review on Various Current Comparatorsijsrd.com
This paper presents study and review on various current comparators. It also describes low voltage current comparator using flipped voltage follower (FVF) to obtain the single supply voltage. This circuit has short propagation delay and occupies a small chip area as compare to other current comparators. The results of this circuit has obtained using PSpice simulator for 0.18 μm CMOS technology and a comparison has been performed with its non FVF counterpart to contrast its effectiveness, simplicity, compactness and low power consumption.
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...ijsrd.com
Power dissipation is a challenging problem for today's system-on-chip design and test. This paper presents a novel architecture which generates the test patterns with reduced switching activities; it has the advantage of low test power and low hardware overhead. The proposed LP-TPG (test pattern generator) structure consists of modified low power linear feedback shift register (LP-LFSR), m-bit counter, gray counter, NOR-gate structure and XOR-array. The seed generated from LP-LFSR is EXCLUSIVE-OR ed with the data generated from gray code generator. The XOR result of the sequence is single input changing (SIC) sequence, in turn reduces the switching activity and so power dissipation will be very less. The proposed architecture is simulated using Modelsim and synthesized using Xilinx ISE9.2.The Xilinx chip scope tool will be used to test the logic running on FPGA.
Defending Reactive Jammers in WSN using a Trigger Identification Service.ijsrd.com
In the last decade, the greatest threat to the wireless sensor network has been Reactive Jamming Attack because it is difficult to be disclosed and defend as well as due to its mass destruction to legitimate sensor communications. As discussed above about the Reactive Jammers Nodes, a new scheme to deactivate them efficiently is by identifying all trigger nodes, where transmissions invoke the jammer nodes, which has been proposed and developed. Due to this identification mechanism, many existing reactive jamming defending schemes can be benefited. This Trigger Identification can also work as an application layer .In this paper, on one side we provide the several optimization problems to provide complete trigger identification service framework for unreliable wireless sensor networks and on the other side we also provide an improved algorithm with regard to two sophisticated jamming models, in order to enhance its robustness for various network scenarios.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the student’s details, driver’s details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Water Industry Process Automation and Control Monthly - May 2024.pdf
Speaker Recognition System using MFCC and Vector Quantization Approach
1. IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 9, 2013 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 1934
Abstract—This paper presents an approach to speaker
recognition using frequency spectral information with Mel
frequency for the improvement of speech feature
representation in a Vector Quantization codebook based
recognition approach. The Mel frequency approach extracts
the features of the speech signal to get the training and
testing vectors. The VQ codebook approach uses the training
vectors to form clusters with the help of the LBG algorithm,
so that speakers can be recognized accurately.
Key words: Speaker Recognition, MFCC, Mel Frequencies,
Vector Quantization.
I. INTRODUCTION
Speech is the most natural way of communicating. Unlike
other forms of identification, such as passwords or keys,
speech is the least intrusive biometric. A person’s voice
cannot be stolen, forgotten or lost; speaker recognition
therefore allows for a secure method of authenticating
speakers.
In a speaker recognition system, an unknown speaker
is compared against a database of known speakers, and the
best-matching speaker is given as the identification result.
The system relies on the speaker-specific information
carried in the speech waveform.
Earlier systems used traditional approaches. In this
paper, we intend to increase the efficiency and accuracy of
the system by making use of a new approach based on the
fusion of neural networks and clustering algorithms.
The general process of speaker identification involves two
stages:
1) Training Mode
2) Recognition Mode
In the training mode, a new speaker (with known identity) is
enrolled into the system’s database. In the recognition mode,
an unknown speaker gives a speech input and the system
makes a decision about the speaker’s identity.
II. BLOCK DIAGRAM
A real-time speaker identification system operates in two
main phases. During the first phase, speaker enrolment,
speech samples are collected from the speakers and used to
train their models. The collection of enrolled models is
also called the speaker database. In the second phase,
identification, a test sample from an unknown speaker is
compared against the speaker database.
Both phases share the same first step, feature extraction,
which extracts speaker-dependent characteristics from
speech. The main purpose of this step is to reduce the
amount of data while retaining speaker-discriminative
information. In the enrolment phase, these features are
then modelled and stored in the speaker database. This
process is represented in Fig. 1.
Fig. 1: Speaker Enrolment
In the identification step, the extracted features are
compared against the models stored in the speaker database.
Based on these comparisons the final decision about speaker
identity is made. This process is represented in Fig. 2.
Fig. 2: Speaker Authentication
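The two phases can be sketched in Python as follows. This is only a minimal illustration and not the system described in this paper: the helper names (`centroid`, `enrol`, `identify`), the use of a single mean vector as the speaker model and the toy two-dimensional feature vectors are all assumptions made for the example.

```python
import math


def centroid(vectors):
    """Return the mean of a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def enrol(database, speaker_id, features):
    """Enrolment phase: store a model for a speaker of known identity.

    The model here is just the centroid of the features (an assumption
    made to keep the sketch short).
    """
    database[speaker_id] = centroid(features)


def identify(database, features):
    """Identification phase: return the enrolled speaker whose model is
    closest (Euclidean distance) to the centroid of the test features."""
    probe = centroid(features)
    return min(database, key=lambda sid: math.dist(database[sid], probe))


# Toy feature vectors standing in for real extracted features.
database = {}
enrol(database, "alice", [[1.0, 2.0], [1.2, 1.8]])
enrol(database, "bob", [[5.0, 5.0], [5.2, 4.8]])
result = identify(database, [[1.1, 2.1], [0.9, 1.9]])
```

In this sketch the test features lie next to the first speaker's model, so `result` is the identifier that speaker was enrolled under. Both phases call the same helper on the incoming features, which mirrors the shared feature-processing step shown in the two block diagrams.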
III. PRE-PROCESSING
Speech is recorded by sampling the input, which results in a
discrete-time speech signal. Pre-processing makes this
discrete-time signal more amenable to the processing that
follows and improves feature extraction. It comprises
pre-emphasis, frame blocking and windowing, each of which
is described below.
Deepak Harjani (1), Mohita Jethwani (2), Ms. Mani Roja (3)
(1, 2) Student; (3) Associate Professor
(1, 3) Dept. of Electronics and Telecommunication; (2) Dept. of Computer Science
(1, 2, 3) Thadomal Shahani Engineering College, Mumbai, India
A. Pre-emphasis
Pre-emphasis is a technique used in speech processing to
enhance the high frequencies of the signal. There are two
main factors driving the need for pre-emphasis. Firstly, the
speech signal generally contains more speaker-specific
information in the higher frequencies than in the lower
frequencies. Secondly, pre-emphasis removes some of the
glottal effects from the vocal tract parameters. For voiced
sounds, which have a steep roll-off in the high-frequency
region, the glottal source has a slope of approximately
-12 dB/octave. However, when the acoustic energy radiates
from the lips, this causes a boost of roughly +6 dB/octave
to the spectrum. As a result, a speech signal recorded with
a microphone from a distance has a downward slope of
approximately -6 dB/octave compared to the true spectrum.
Applying pre-emphasis therefore flattens the spectrum, so
that it consists of formants of similar heights. The
spectrum of unvoiced sounds is already flat, so there is no
need to pre-emphasize them. This allows feature extraction
to focus on all aspects of the speech signal.
Pre-emphasis is implemented as a first-order Finite Impulse
Response (FIR) filter defined as:
$H(z) = 1 - \alpha z^{-1}$ (1.1)
Generally α is chosen to be between 0.9 and 0.95. We have
used α=0.95.
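In the time domain, this first-order FIR filter corresponds to the difference equation y(n) = x(n) - α·x(n-1). The following is a minimal Python sketch of that equation; the function name `pre_emphasis` and the choice to pass the first sample through unchanged (treating the sample before the start of the signal as zero) are assumptions made for the illustration.

```python
def pre_emphasis(signal, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1] to a list of samples.

    The sample before the start of the signal is taken to be zero
    (an assumption), so the first output equals the first input.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


# A constant (low-frequency) input is strongly attenuated after the
# first sample, while an alternating (high-frequency) input is boosted.
constant = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

With α = 0.95, the constant input settles to values near 0.05 while the alternating input grows to magnitudes near 1.95, which illustrates the high-frequency emphasis of the filter.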
B. Frame Blocking
Speech characteristics change slowly enough to be treated as
approximately constant over short intervals, so frame
blocking is used to split the continuous speech signal into
frames of the desired sample length. The signal is blocked
into frames of N samples, with the starts of adjacent frames
separated by M samples, where M < N. The first frame
consists of the first N samples. The second frame begins M
samples after the first frame and overlaps it by N - M
samples. Similarly, the third frame begins 2M samples after
the first frame (or M samples after the second frame) and
overlaps the first frame by N - 2M samples. This process
continues until all the speech is accounted for within one
or more frames. Typical values are N = 256 (equivalent to a
window of roughly 30 ms, and a power of two, which suits
the fast radix-2 FFT) and M = 100. [1]
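The frame blocking step can be sketched in Python as shown below. This is an illustrative sketch only: the function name `frame_signal` is hypothetical, and zero-padding the final frame so that every sample is covered is an assumption, since the text does not say how a partial last frame is handled.

```python
def frame_signal(signal, frame_len=256, hop=100):
    """Split a sequence of samples into overlapping frames.

    frame_len is N and hop is M in the text. The last frame is
    zero-padded to frame_len samples (an assumption) so that all
    of the speech is accounted for.
    """
    if not 0 < hop < frame_len:
        raise ValueError("expected 0 < hop < frame_len")
    frames = []
    start = 0
    while True:
        frame = list(signal[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))  # pad a short final frame
        frames.append(frame)
        if start + frame_len >= len(signal):
            break
        start += hop
    return frames


# 500 samples with N = 256 and M = 100: frames start at 0, 100, 200, 300.
samples = list(range(500))
frames = frame_signal(samples)
```

Consecutive frames share N - M = 156 samples, so the last 156 samples of one frame are the first 156 samples of the next.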
C. Windowing
The signal obtained after frame blocking has discontinuities
at the beginning and end of each frame. Windowing is used to
minimize these discontinuities: the spectral distortion is
reduced by using a window to taper the signal towards zero
at the beginning and at the end of each frame. The windows
available for this process include the rectangular,
triangular, Hanning and Hamming windows. If we define the
window as w(n), 0 ≤ n ≤ N-1, where N is the number of
samples in each frame, then the result of windowing is the
signal given by Equation 1.2:
$y_1(n) = x_1(n)\,w(n), \quad 0 \le n \le N-1$ (1.2)
Typically the Hamming window is used for the windowing
process, which has the form given in Equation 1.3:
$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$ (1.3)
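A short Python sketch of windowing a single frame follows. It uses the commonly quoted Hamming coefficients 0.54 and 0.46 and multiplies the frame by the window sample by sample; the function names `hamming` and `apply_window` are hypothetical and chosen only for this example.

```python
import math


def hamming(num_samples):
    """Return a Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if num_samples == 1:
        return [1.0]  # degenerate case: avoid dividing by zero
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_samples - 1))
        for n in range(num_samples)
    ]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [x * w for x, w in zip(frame, window)]


window = hamming(5)
windowed = apply_window([1.0] * 5, window)
```

The window is symmetric, peaks at the centre of the frame and falls to 0.08 at both ends, so the frame edges are strongly attenuated rather than set exactly to zero.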
IV. FEATURE EXTRACTION
The amount of data, generated during the speech production,
is quite large while the essential characteristics of the speech
process change relatively slowly and therefore, they require
less data. According to these matters feature extraction is a
process of reducing data while retaining speaker
discriminative information. [2]
In order to create a speaker profile, the speech signal must
be analyzed to produce some representation that can be used
as a basis for such a model. In speech analysis this is known
as feature extraction. Feature extraction allows for speaker
specific characteristics to be derived from the speech signal,
which are used to create a speaker model. The speaker
model uses a distortion measure to determine features which
are similar. This places importance on the features extracted,
to accurately represent the speech signal. Feature extraction
phase consists of transforming the speech signal in a set of
feature vectors called parameters. [3]
The aim of this transformation is to obtain a new
representation which is more compact, less redundant, and
more suitable for statistical modeling and calculation of
distances. Most of the speech parameterizations used in
speaker recognition systems rely on a cepstral
representation of the speech signal. [4]
A wide range of possibilities exists for parametrically
representing the speech signal for the speaker recognition
task, such as Linear Prediction Coding (LPC),
Mel-Frequency Cepstrum Coefficients (MFCC), and others.
MFCCs are perhaps the best known and most popular, and
they are used in this project.
V. MEL-FREQUENCY CEPSTRAL COEFFICIENTS
MFCCs are based on the known variation of the human
ear’s critical bandwidths with frequency; filters spaced
linearly at low frequencies and logarithmically at high
frequencies are used to capture the phonetically
important characteristics of speech. This is expressed in the
mel-frequency scale, which has linear frequency spacing
below 1000 Hz and logarithmic spacing above 1000 Hz.
[1]
Fig. 3: Block diagram of MFCC processor
A block diagram of the structure of an MFCC processor is
given in Figure 3. The speech input is typically recorded at a
sampling rate above 10000 Hz. This sampling frequency
is chosen to minimize the effects of aliasing in the
analog-to-digital conversion. Signals sampled at this rate
can capture all frequencies up to 5 kHz, which cover most of
the energy of sounds generated by humans. As discussed
previously, the main purpose of the MFCC processor is to
mimic the behaviour of the human ear. In addition, MFCCs
are shown to be less susceptible to the mentioned variations
than the speech waveforms themselves.
A. Fast Fourier Transform
Up to this point, all computations have been carried out in
the time domain. Since MFCC computation needs the samples
in the frequency domain, we use the Fast Fourier Transform
to convert each frame of N samples from the time
domain into the frequency domain. The FFT is a fast
algorithm for computing the Discrete Fourier Transform
(DFT), which is defined on the set of N samples {x_n} as in
Equation 2.1:
$X_k = \sum_{n=0}^{N-1} x_n\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1$ (2.1)
The spectrum obtained after the fast Fourier transform is
used for mel-frequency warping.
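As an illustration of this step, the sketch below evaluates the DFT directly from its definition, assumed here to be the standard form X_k = Σ x_n exp(-j2πkn/N), and compares it with NumPy's FFT routine. The function names and the test tone are illustrative choices, not part of the original system.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of the DFT definition (assumed standard form)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    # Matrix of complex exponentials exp(-j*2*pi*k*n/N) applied to the frame
    return np.exp(-2j * np.pi * k * n / N) @ x

def power_spectrum(frame):
    """|X_k|^2 for k = 0 .. N/2 of a real-valued frame, computed with the FFT."""
    return np.abs(np.fft.rfft(frame)) ** 2

# A synthetic frame: a cosine with exactly 8 cycles in N = 256 samples
frame = np.cos(2 * np.pi * 8 * np.arange(256) / 256)
spectrum = power_spectrum(frame)
peak_bin = int(np.argmax(spectrum))  # index of the strongest frequency bin
```

Because the frame is real-valued, only the first N/2 + 1 bins carry independent information, which is why `np.fft.rfft` is used for the power spectrum.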
B. Mel-Frequency Warping
MFCCs are typically computed by using a bank of
triangular-shaped filters, with the centre frequency of the
filter spaced linearly for frequencies less than 1000 Hz and
logarithmically above 1000 Hz. The bandwidth of each filter
is determined by the centre frequencies of the two adjacent
filters and is dependent on the frequency range of the filter
bank and number of filters chosen for design. But for the
human auditory system it is estimated that the filters have a
bandwidth that is related to the centre frequency of the filter.
Further it has been shown that there is no evidence of two
regions (linear and logarithmic) in the experimentally
determined Mel frequency scale.
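A conventional triangular mel filter bank can be sketched as follows. Several details are assumptions made for illustration and are not stated in this paper: the mel mapping mel(f) = 2595 log10(1 + f/700), which is one widely used approximation, the choice of 20 filters, the 256-point FFT, and the 10 kHz sampling rate. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """Common approximation of the mel scale (an assumption, not from the paper)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(num_filters=20, n_fft=256, sample_rate=10000):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    # num_filters + 2 edge points: each filter runs from the previous
    # centre to the next one, so adjacent centres set the bandwidth
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2), num_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((num_filters, len(bin_freqs)))
    for i in range(num_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: the smaller of the two slopes, clipped at zero outside the band
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = mel_filterbank()
# Applying the bank to a power spectrum of 129 bins gives 20 mel energies
flat_spectrum = np.ones(bank.shape[1])
mel_energies = bank @ flat_spectrum
```

In this construction the bandwidth of each filter follows from the centre frequencies of its two neighbours, as described above.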
Recent studies on the effectiveness of different
frequency regions of the speech spectrum for speaker
recognition report that a frequency-scale warping method
provided better performance than the standard Mel scale
filter bank. [4]
The frequency-scale warping is implemented by
using the bilinear transform method. The technique is
extremely flexible: a wide range of warping functions can be
obtained by suitably fixing a “warping factor” for the given
sampling frequency. We use this framework to carry out an
experimental study toward determining the optimal choice of
frequency-scale warping for the speaker recognition task.
Fig. 4: Frequency warping using second-order bilinear transform
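A nonlinear warping of the frequency scale can be effected
by the bilinear transformation, given as the transfer function
of a first-order all-pass filter:
$D(z) = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}}, \quad |\alpha| < 1$ (2.2)
where α is the real-valued warping factor. This mapping in
the z-plane maps the unit circle onto itself. From (2.2) we
can obtain
$\hat{\omega} = \arg\left(D(e^{-j\omega})\right) = \omega + 2 \tan^{-1}\left(\frac{\alpha \sin(\omega)}{1 - \alpha \cos(\omega)}\right)$ (2.3)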
More control over frequency warping can be achieved by
using a second order bilinear transform [4].
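The first-order warping can be checked numerically. The sketch below assumes the common all-pass form D(z) = (z^-1 - α) / (1 - α z^-1) with a real warping factor |α| < 1, for which the warped frequency is ω + 2 arctan(α sin ω / (1 - α cos ω)). The value α = 0.4 is purely illustrative and is not taken from the experiments reported here.

```python
import numpy as np

def allpass(z, alpha):
    """First-order all-pass D(z) = (z^-1 - alpha) / (1 - alpha * z^-1), real |alpha| < 1 (assumed form)."""
    return (1.0 / z - alpha) / (1.0 - alpha / z)

def warp_frequency(omega, alpha):
    """Closed-form warped frequency for the all-pass filter above."""
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega), 1.0 - alpha * np.cos(omega))

omega = np.linspace(0.0, np.pi, 9)   # original frequencies in rad/sample
alpha = 0.4                          # illustrative warping factor (assumption)
warped = warp_frequency(omega, alpha)

# D has unit magnitude on the unit circle, so exp(j * warped) should equal D(e^{-j*omega})
on_circle = allpass(np.exp(-1j * omega), alpha)
matches = np.allclose(np.exp(1j * warped), on_circle)
```

For a positive α the mapping stretches the low-frequency region while keeping the end points 0 and π fixed, which is the kind of behaviour a mel-like warping needs.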
C. Cepstrum
After frequency warping, we convert the log Mel spectrum
from the frequency domain back into the time domain. The
result obtained after the cepstrum operation is called the
Mel Frequency Cepstrum Coefficients (MFCC). Since the Mel
spectrum coefficients are real numbers, so are their
logarithms. Therefore we can convert them back into the time
domain by using a simple Discrete Cosine Transform. If we
denote the mel power spectrum coefficients that result from
the last step as S_k, k = 1, 2, …, K, we can calculate the
MFCCs as in Equation 2.4:
$\xi_n = \sum_{k=1}^{K} (\log S_k) \cos\left[ n \left( k - \tfrac{1}{2} \right) \frac{\pi}{K} \right]$ (2.4)
The cepstral representation of the speech spectrum provides
a good representation of the local spectral properties of the
signal for the given frame analysis. [3]
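The cepstrum step can be sketched directly from the cosine-transform definition, assuming Equation 2.4 takes the usual form ξ_n = Σ_{k=1..K} log(S_k) cos[n (k - 1/2) π / K]. Keeping 12 coefficients, starting the index at n = 1, and flooring the spectrum before the logarithm are assumptions made for this example. The function name is hypothetical.

```python
import numpy as np

def mfcc_from_mel_spectrum(mel_power, num_coeffs=12):
    """Cosine transform of the log mel spectrum for n = 1 .. num_coeffs."""
    S = np.asarray(mel_power, dtype=float)
    K = len(S)
    k = np.arange(1, K + 1)                   # filter index k = 1 .. K
    n = np.arange(1, num_coeffs + 1)[:, None] # coefficient index n = 1 .. num_coeffs
    basis = np.cos(n * (k - 0.5) * np.pi / K)
    # Assumption: a small floor avoids log(0) when a filter output is empty
    return basis @ np.log(np.maximum(S, 1e-12))

# Usage with synthetic mel energies from K = 20 filters
mel_power = np.linspace(1.0, 5.0, 20)
coeffs = mfcc_from_mel_spectrum(mel_power)
flat = mfcc_from_mel_spectrum(np.full(20, 3.0))  # a flat spectrum has no cepstral detail
```

A perfectly flat mel spectrum gives coefficients that are all zero for n ≥ 1, because the overall level would only appear in the n = 0 term, which this sketch leaves out.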
VI. FEATURE MATCHING AND SPEAKER RECOGNITION
Fig. 5: Conceptual diagram illustrating vector quantization codebook formation
The state-of-the-art in feature matching techniques used in
speaker recognition includes Dynamic Time Warping
(DTW), Hidden Markov Modelling (HMM), and Vector
Quantization (VQ). In this project, the VQ approach is
used because of its ease of implementation and high
accuracy. VQ is a process of mapping vectors from a large
vector space to a finite number of regions in that space.
Each region is called a cluster and can be represented by its
centre, called a codeword. The collection of all codewords is
called a codebook.
Figure 5 shows a conceptual diagram to illustrate this
recognition process. In the figure, only two speakers and
two dimensions of the acoustic space are shown.
The training vectors obtained are used to build a
speaker-specific VQ codebook for the speaker dictionary. The
LBG algorithm is used for clustering a set of L training
vectors into a set of M codebook vectors. This algorithm
designs an M-vector codebook in stages. [6] It starts by
designing a 1-vector codebook, then uses a splitting
technique on the codeword to initialize the search for a
2-vector codebook, and continues the splitting process until
the desired M-vector codebook is obtained. The algorithm is
based on a nearest-neighbour search procedure, which assigns
each training vector to the cluster associated with the
closest codeword. The centroid of each cluster is then taken
as the updated codeword, and the distortion, computed as the
sum of the distances of all training vectors from their
closest codewords, is used to decide whether the procedure
has converged. The point of convergence is used to decide the
result of the speaker recognition. [7]
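The splitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: M is taken to be a power of two, the splitting parameter `eps` and the convergence threshold `tol` are illustrative values, squared Euclidean distance is used as the distortion measure, and all function names are hypothetical. Scoring a test utterance by its average distortion against each speaker's codebook is a common decision rule that is assumed here.

```python
import numpy as np

def nearest_codeword(vectors, codebook):
    """Index of, and squared distance to, the closest codeword for every vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d.argmin(axis=1)
    return idx, d[np.arange(len(vectors)), idx]

def lbg(vectors, M, eps=0.01, tol=1e-3):
    """Grow a codebook by splitting: 1 -> 2 -> 4 -> ... -> M codewords."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = vectors.mean(axis=0, keepdims=True)   # 1-vector codebook
    while len(codebook) < M:
        # Split every codeword into a slightly perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        while True:
            # Nearest-neighbour search, then centroid update
            idx, dist = nearest_codeword(vectors, codebook)
            for j in range(len(codebook)):
                members = vectors[idx == j]
                if len(members):   # an empty cluster keeps its old codeword
                    codebook[j] = members.mean(axis=0)
            # Distortion of the codebook used for this assignment pass
            distortion = dist.mean()
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook

def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its closest codeword."""
    return nearest_codeword(np.asarray(vectors, dtype=float), codebook)[1].mean()

# Usage on synthetic 2-D "feature vectors" forming two clusters
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(1.0, 0.3, (50, 2)), rng.normal(9.0, 0.3, (50, 2))])
codebook = lbg(train, M=2)
```

To identify a speaker under the assumed decision rule, the feature vectors of the test utterance would be scored with `average_distortion` against every enrolled codebook, and the speaker whose codebook gives the smallest value would be selected.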
VII. EXPERIMENTAL RESULTS
The database consists of 20 distinct speakers including both
male and female speakers. It also contains 50 sound files
used for training and testing the Speaker Recognition
module. New sound files are recorded in real time for testing
the Continuous Speech Recognition module in clean and
noisy environments for both multi-speaker and speaker-
independent modes. Recognition rate of the trained VQ
Codebook model is defined as follows:
$RR = \frac{N_{correct}}{N_{total}} \times 100\%$ (3.1)
In Equation 3.1, RR is the recognition rate, N_correct is
the number of correctly recognized testing speech samples
per digit, and N_total is the total number of testing speech
samples per digit [8].
Environment   Number of Samples Tested   Number of Samples Recognized Accurately   Recognition Rate (%)
Clean         20                         19                                        95
Noisy         20                         16                                        80
Table 1: Overall Recognition Rate of Proposed Speaker Recognition System
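The rates in Table 1 follow directly from the recognition-rate definition, taken here as RR = N_correct / N_total × 100. The short check below reproduces them; the function name is hypothetical.

```python
def recognition_rate(n_correct, n_total):
    """Recognition rate in percent: N_correct / N_total * 100."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total

# Counts taken from Table 1
clean_rate = recognition_rate(19, 20)
noisy_rate = recognition_rate(16, 20)
```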
Feature Extraction Technique   Feature Recognition Technique   Recognition Rate (%)   Reference
LPC                            VQ and HMM                      62 to 96               [9]
MFCC                           VQ                              70 to 85               [10]
MFCC                           VQ                              88.88                  [2]
MFCC                           VQ                              57 to 100              [11]
Table 2: Comparative Performance of Various Speaker Recognition Studies
VIII. CONCLUSION
Performance was measured on the basis of accuracy and the
time taken to compute the feature recognition. It was
observed that the Speaker Recognition System performs well
in both clean and noisy environments, in both multi-speaker
and speaker-independent modes. The entire research process
was carried out using MATLAB R13 on an Intel i5 powered
machine. The recognition results in the clean environment
are considerably higher than those in the noisy environment
(95% versus 80% in Table 1).
ACKNOWLEDGMENT
The authors would like to thank the college authorities for
providing infrastructure to carry out experimental and
research work required. The authors would also like to thank
Ms Mani Roja, Associate Professor of the Electronics and
Telecommunication Department of Thadomal Shahani
Engineering College for reviewing the paper and guiding the
authors with her valuable feedback.
REFERENCES
[1] Santosh Gaikwad, “Feature Extraction Using Fusion
MFCC For Continuous Marathi Speech Recognition”.
[2] M. A. M. Abu Shariah, R. N. Ainon, R. Zainuddin, and
O. O. Khalifa, “Human Computer Interaction Using
Isolated-Words Speech Recognition Technology,”
IEEE Proceedings of The International Conference on
Intelligent and Advanced Systems (ICIAS’07), Kuala
Lumpur, Malaysia, pp. 1173-1178, 2007.
[3] Ibrahim Patel and Y. Srinivasa Rao, “Speech
Recognition using Hidden Markov Model with
MFCC-Sub band technique”.
[4] S. K. Podder, “Segment-based Stochastic Modelings
for Speech Recognition,” PhD Thesis, Department of
Electrical and Electronic Engineering, Ehime
University, Matsuyama 790-77, Japan, 1997.
[5] Pradeep Kumar P and Preeti Rao, “A Study of
Frequency-Scale Warping for Speaker Recognition,”
Englewood Cliffs, N.J., 1993.
[6] Clarence Goh Kok Leon, “Robust Computer Voice
Recognition Using Improved MFCC Algorithm,” 2009
International Conference on New Trends in
Information and Service Science.
[7] Ahmad A. M. Abushariah, “English Digits Speech
Recognition System Based on Hidden Markov
Models,” Conference on Computer and
Communication Engineering (ICCCE 2010).
[8] S. K. Podder, “Segment-based Stochastic Modelings
for Speech Recognition,” PhD Thesis, Department of
Electrical and Electronic Engineering, Ehime
University, Matsuyama 790-77, Japan, 1997.
[9] S. M. Ahadi, H. Sheikhzadeh, R. L. Brennan, and
G. H. Freeman, “An Efficient Front-End for Automatic
Speech Recognition,” IEEE International Conference
on Electronics, Circuits and Systems (ICECS2003),
Sharjah, United Arab Emirates, 2003.
[10] M. R. Hasan, M. Jamil, and M. G. Saifur Rahman,
“Speaker Identification Using Mel Frequency Cepstral
Coefficients,” 3rd International Conference on
Electrical and Computer Engineering, Dhaka,
Bangladesh, pp. 565-568, 2004.
[11] M. Z. Bhotto and M. R. Amin, “Bangali Text
Dependent Speaker Identification Using Mel Frequency
Cepstrum Coefficient and Vector Quantization,” 3rd
International Conference on Electrical and Computer
Engineering, Dhaka, Bangladesh, pp. 569-572, 2004.