Environmental Sound detection Using MFCC technique


Published in: Technology, Education

  1. 1. ENVIRONMENTAL NATURAL SOUND DETECTION AND CLASSIFICATION USING CONTENT-BASED RETRIEVAL (CBR) AND MFCC
     Project mentor: Shiladitya Pujari
     Project group members: Parth Sinha (20093043), Pankaj Kumar (20093013), Manas Sarkar (20093030), Ruchasri Nath (20093055)
  2. 2. MAIN TOPICS
     • Objective
     • Methodology
     • Result
     • Future scope & conclusion
  3. 3. OBJECTIVE
     • To develop an environmental sound detection and classification technique (using Content-Based Retrieval and MFCC) so that a computer system can recognize and classify sounds more accurately.
     • To use this technique to make computer systems more intelligent and reliable in understanding their environment.
  5. 5. WHAT ARE MFCCS?
     • In sound processing, the Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale of frequency.
     • Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").
     • The difference between the cepstrum and the Mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the Mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for a better representation of sound, for example in audio compression.
     • MFCCs are commonly derived as follows:
       1. Take the Fourier transform of (a windowed excerpt of) a signal.
       2. Map the powers of the spectrum obtained above onto the Mel scale, using triangular overlapping windows.
  6. 6. (CONTD.)
       3. Take the logs of the powers at each of the Mel frequencies.
       4. Take the discrete cosine transform of the list of Mel log powers, as if it were a signal.
       5. The MFCCs are the amplitudes of the resulting spectrum.
     • MFCCs are commonly used as features in speech recognition systems, such as systems that automatically recognize numbers spoken into a telephone. They are also common in speaker recognition, which is the task of recognizing people from their voices.
     • MFCCs are also increasingly used in music information retrieval applications such as genre classification and audio similarity measures.
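The five derivation steps listed on the two slides above can be sketched for a single frame as follows. This is a minimal illustration and not the project's implementation: the Hamming window, the 2595·log10(1 + f/700) Mel formula, the filter count (20), the coefficient count (12), the 8 kHz sample rate, and all function names are assumptions chosen for the example. A naive DFT is used so the sketch needs only the standard library.

```python
import cmath
import math


def hz_to_mel(hz):
    # One common Mel-scale formula (an assumption; the slides do not name one).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Step 1: Fourier transform of a windowed frame, returned as powers."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        powers.append(abs(acc) ** 2)
    return powers


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Step 2: triangular overlapping filters, equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=12):
    powers = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Step 3: log of the power collected by each Mel filter
    # (the small constant avoids log(0) for an empty band).
    log_energies = [math.log(sum(w * p for w, p in zip(row, powers)) + 1e-10)
                    for row in bank]
    # Steps 4 and 5: DCT-II of the Mel log powers; the outputs are the MFCCs.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz (toy input).
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc_frame(tone, 8000)
```

In practice an FFT routine would replace the naive DFT, and the signal would be split into many overlapping frames, each producing one coefficient vector.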
  7. 7. CBR
     • Content-Based Retrieval means that retrieval and search are based on analysis of the actual contents of the data (here, sound) rather than on metadata such as keywords, tags, or descriptions associated with the sounds.
     • In our project we use a multimedia database that provides Content-Based Retrieval.
  8. 8. METHODOLOGY (1)
     The major steps involved in the method are as follows:
     • Extracting features for classifying highly diversified natural sounds.
     • Forming clusters of sounds according to their feature similarity.
     • Finding a match for a particular sound query from the clusters.
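The clustering and matching steps above might look like the following nearest-centroid sketch. The slides do not say which clustering or distance measure was used, so grouping by a known label, the Euclidean distance, the function names, and the toy 2-D feature vectors are all assumptions made for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_clusters(labelled_features):
    """Group feature vectors by label and summarize each group by its centroid."""
    groups = {}
    for label, vec in labelled_features:
        groups.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in groups.items()}


def match(query, clusters):
    """Return the label of the cluster whose centroid is closest to the query."""
    return min(clusters, key=lambda label: euclidean(query, clusters[label]))


# Hypothetical toy data: two sound classes with 2-D feature vectors.
training = [
    ("rain", [1.0, 0.2]),
    ("rain", [1.2, 0.0]),
    ("bird", [4.0, 3.0]),
    ("bird", [4.2, 3.4]),
]
clusters = build_clusters(training)
best = match([3.9, 3.1], clusters)  # closest centroid is the "bird" cluster
```

Real feature vectors would be MFCC-based and much longer than two dimensions; the matching logic stays the same.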
  9. 9. METHODOLOGY (2)
     • First, we take the input sound (an audio signal of any format).
     • Then some preprocessing is done to normalize the signal.
     • Features are extracted from the audio signal.
     • Next comes the classification phase, which consists of two phases:
       • Training phase
       • Testing phase
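The slides do not specify what the normalization step does. One common choice, shown here purely as an assumption, is to remove the DC offset and scale the signal to unit peak amplitude so that recordings made at different levels become comparable. The function name `normalize` is hypothetical.

```python
def normalize(signal):
    """Remove the mean (DC offset) and scale so the largest magnitude is 1."""
    mean = sum(signal) / len(signal)
    centred = [s - mean for s in signal]
    peak = max(abs(s) for s in centred)
    if peak == 0:
        return centred  # silent or constant input: nothing to scale
    return [s / peak for s in centred]


# Usage on a toy signal with an offset of 4 and a peak deviation of 2.
scaled = normalize([2.0, 4.0, 6.0])
```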
  10. 10. METHODOLOGY (3)
     Fig: Mel Frequency Cepstral Coefficient pipeline
  11. 11. PROCESS DESCRIPTION
     Sampling
     • Sampling is the process of converting a continuous signal into a discrete signal. It can be applied to signals varying in space, time, or any other dimension, and similar results are obtained in two or more dimensions.
     Pre-emphasis
     • In the processing of electronic audio signals, pre-emphasis is a process designed to increase (within a frequency band) the magnitude of some (usually higher) frequencies relative to the magnitude of other (usually lower) frequencies, in order to improve the overall signal-to-noise ratio (SNR) by minimizing the adverse effects of phenomena such as attenuation distortion in later parts of the system.
     Windowing
     • In signal processing, a window function (also known as a tapering function) is a mathematical function that is zero-valued outside some chosen interval. For instance, a function that is constant inside the interval and zero elsewhere is called a rectangular window, a name that describes the shape of its graph.
     Fast Fourier Transform
     • A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform efficiently. FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.
     Absolute Value
     • In mathematics, the absolute value (or modulus) |a| of a real number a is the numerical value of a without its sign. The absolute value of a number may be thought of as its distance from zero.
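The pre-emphasis, windowing, Fourier transform, and absolute-value stages described above can be chained as in the sketch below. The first-order filter y[n] = x[n] - 0.97·x[n-1], the Hamming window, and the 64-sample frame are common choices assumed here for illustration, since the slides do not give the actual parameters. A naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def hamming(n):
    """Hamming window: tapers the frame edges to reduce spectral leakage."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Absolute value of the DFT for the non-negative frequency bins."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


# Toy frame: a sinusoid that completes exactly 8 cycles in 64 samples.
frame_len = 64
tone = [math.sin(2 * math.pi * 8 * t / frame_len) for t in range(frame_len)]

emphasised = pre_emphasis(tone)
windowed = [s * w for s, w in zip(emphasised, hamming(frame_len))]
spectrum = magnitude_spectrum(windowed)

# The strongest bin should sit at the tone's frequency (bin 8).
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

Because pre-emphasis is a linear filter, it changes the relative level of each frequency but not where the spectral peak of a pure tone appears.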
  12. 12. PROCESS DESCRIPTION (CONTINUED)
     Discrete Cosine Transform (DCT)
     • A DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but it uses only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are commonly used.
     Linear Discriminant Analysis (LDA)
     • Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.
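To make the idea of a separating linear combination concrete, here is a minimal sketch of Fisher's linear discriminant for two classes of 2-D points. It computes w = Sw⁻¹(μ_b − μ_a), where Sw is the within-class scatter matrix. The data points and function names are hypothetical, and the sketch is restricted to two dimensions so the 2×2 inverse can be written out by hand.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n]


def scatter(vectors, mu):
    """2x2 scatter matrix: sum of outer products of deviations from the mean."""
    sxx = sum((v[0] - mu[0]) ** 2 for v in vectors)
    sxy = sum((v[0] - mu[0]) * (v[1] - mu[1]) for v in vectors)
    syy = sum((v[1] - mu[1]) ** 2 for v in vectors)
    return [[sxx, sxy], [sxy, syy]]


def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mu_b - mu_a) that best separates the two classes."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(v, w):
    """Reduce a 2-D point to one dimension along the discriminant direction."""
    return v[0] * w[0] + v[1] * w[1]


# Hypothetical toy classes of 2-D feature vectors.
class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.5]]
class_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 6.5], [7.0, 6.0]]

w = fisher_direction(class_a, class_b)
proj_a = [project(v, w) for v in class_a]
proj_b = [project(v, w) for v in class_b]
```

After projection each point is a single number, and for this toy data a threshold between the two groups of projected values separates the classes.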
  13. 13. TRAINING AND TESTING
     Fig: Flowchart of Training Session
     Fig: Flowchart of Testing Session
  14. 14. RESULT
     • Using the approaches described above (MFCC and CBR) for the sound detection and classification system, we find that the recognition rate is high.
     • One remaining problem is the rejection rate, which is not yet good enough. If the sound being tested is already present in the database, the matching process is very accurate. If the sound is not present in the database, the system does not reject it (or stop the matching); instead, it matches it with the closest sounds in terms of features.
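One possible way to address the rejection problem described above, offered as a suggestion and not as something the project implemented, is to reject a query when even its nearest database entry is farther away than a chosen distance threshold. The database contents, the threshold value, and the function names below are hypothetical.

```python
import math


def nearest(query, database):
    """Return (label, distance) of the database entry closest to the query."""
    best_label, best_dist = None, math.inf
    for label, vec in database.items():
        d = math.dist(query, vec)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


def match_or_reject(query, database, max_distance):
    """Return the best label, or None when nothing is close enough."""
    label, dist = nearest(query, database)
    return label if dist <= max_distance else None


# Hypothetical database of one reference feature vector per sound class.
database = {"rain": [1.1, 0.1], "bird": [4.1, 3.2]}

known = match_or_reject([4.0, 3.0], database, max_distance=1.0)
unknown = match_or_reject([10.0, 10.0], database, max_distance=1.0)
```

The threshold would have to be tuned on held-out data: too small and known sounds are rejected, too large and unknown sounds are still matched to their nearest neighbour.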
  15. 15. CONCLUSION
     Future scope and applications
     • Environmental monitoring
     • Speaker recognition
     • Genre classification
     • Audio similarity measures
     • Robotic awareness
     Conclusion
     • This method of environmental sound detection and classification uses the MFCC pipeline to extract the features of a sound and CBR to retrieve sound features from the multimedia database.
     • The method can be applied in robotics, where sound detection and recognition may be possible up to a satisfactory level. If it is properly combined with computer vision, human-computer interaction can be improved considerably.
     • MFCC is an efficient feature extraction method because it is designed around human auditory perception.
     • Using more than one feature of a sound may improve the performance of the method, and applying a clustering technique may boost accuracy.
     • Another useful feature available today is Audio Spectrum Projection, provided by the MPEG-7 specification. Including this feature may improve the performance of the method.