SlideShare a Scribd company logo
1 of 4
AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS
                             MODELS
                                              Anvita Bajpai and B. Yegnanarayana
                                      Department of Computer Science and Engineering
                                Indian Institute of Technology Madras, Chennai- 600 036, India
                                                  anvita, yegna¡ @cs.iitm.ernet.in
                                               




                        ABSTRACT                                     et al. classify audio into five categories of television (TV)
                                                                     programs using spectral features [4]. Features based on
In this paper, we demonstrate the presence of audio-specific
                                                                     amplitude, zero-crossing, bandwidth, band energy in the
information in the linear prediction (LP) residual, obtained
                                                                     subbands, spectrum and periodicity properties, along with
after removing the predictable part of the signal. We empha-
                                                                     hidden Markov model (HMM) for classification are explored
size the importance of information present in the LP resid-
                                                                     for audio indexing applications in [5]. But it was shown
ual of audio signals, which if added to the spectral informa-
                                                                     that perceptually significant information of audio data is
tion, can give a better performing system. Since it is dif-
                                                                     present in the form of sequence of events, which can be
ficult to extract information from the residual using known
                                                                     obtained after removing the predictable part in the audio
signal processing algorithms, neural networks (NN) models
                                                                     data. Perceptually, there are some discriminating features
are proposed. In this paper, autoassociative neural networks
                                                                     present in the residual which could help in various audio
(AANN) models are used to capture the audio-specific infor-
                                                                     indexing tasks. The challenge lies in developing algorithms
mation from the LP residual of signals. Multilayer feedfor-
                                                                     to capture these perceptually significant features from the
ward neural networks (MLFFNN) models or multilayer per-
                                                                     residual, as it is difficult to extract information using known
ceptron (MLP) are used to classify the audio data using the
                                                                     signal processing algorithms.
audio-specific information captured by AANN models.

                                                                         Objective of this study is to explore the features in
                                                                     addition to the features that are currently used to improve the
                                                                     performance of an audio indexing system. In particular, fea-
                   1. INTRODUCTION
                                                                     tures not used explicitly or implicitly in the current system
In this era of information technology, the data that we use is       are being investigated. Many interesting and perceptually
mostly in the form of audio, video and multimedia. The data,         important features are present in the residual signal obtained
once recorded and stored digitally, conveys no significant            after removing the predictable part. Thus the main objective
information in order to organize and use it. The volume of           of this study is to explore the features present in the linear
data is large, and is increasing daily. Therefore it is difficult     prediction (LP) residual for audio clip classification task.
to organize the data manually. We need to have an automatic          The reason for considering the residual data for study is that
method to index the data, for further search and retrieval.          the residual part of the signal is generally subject to less
Audio plays an important role in classifying multimedia              degradation as compared to the system part [6]. The residual
data as it contains significant information, and is easier to         data contains higher order correlation among samples. As
process when compared to video data. For these reasons,              known signal processing and statistical techniques are not
commercial products of audio retrieval are emerging, e.g.,           suitable to capture this correlation, an autoassociative neural
(http://www.musclefish.com) [1]. Content-based classifica-             networks (AANN) model is proposed to capture these higher
tion of data into different categories is one important step for     order correlations among samples of the residual of the
building an audio indexing system.                                   audio data. AANN have already been studied to capture
                                                                     information from the residual data for tasks such as speaker
    In the traditional approach of audio indexing, audio             recognition [7]. Further, multilayer feedforward neural
is first converted to text, and then it is given to text-based        networks (MLFFNN) models or multilayer perceptron
search engines [2]. Drawbacks of this approach are: a£               (MLP) are proposed for decision making task using the
                                                             ¢




not having accurate speech recognizer, b£ not using speech           audio-specific information captured by AANN models.
                                          ¢




information present in form of prosody, and c£ not appli-
                                                     ¢




cable for non-speech data like music. An elaborate audio
content categorization is proposed by Wold et al. [1], which             The paper is organized as follows: Section 2 discusses
divides the audio content into sixteen groups. The authors           extraction of the LP residual from audio data. Section 3 dis-
have used mean, variance and autocorrelation of loudness,            cusses AANN models for capturing features in LP residual
pitch and bandwidth as audio features and a nearest neigh-           for the audio clip classification. Section 4 discusses MLP
borhood classifier for the task. The authors quote 81%                models for decision making. Section 5 presents the work-
classification accuracy for an audio database with 400 sound          flow of the system. The results of the experimental studies
files. Guo et al. [3] have used features consisting of total          are presented in Section 6. Various issues addressed in this
power, subband energies, bandwidth, pitch and MFCCs, and             paper and possible directions for the future study are summa-
support vector machines (SVMs) for classification. Wang               rized in Section 7.




                                                                   2299
2300
 which is non-speech and non-music). There are significant                                                                                                                                                                                                                                                                                                                                                                                                                       formation from the LP residual.
 selected for the study are speech, music and noise (audio                                                                                                                                                                                                                                                                                                                                                                                                                      section, we discuss methods to capture the audio-specific in-
 in audio is used for classification task. The components                                                                                                                                                                                                                                                                                                                                                                                                                        information could be perceived while listening. In the next
 components. The knowledge of these components present                                                                                                                                                                                                                                                                                                                                                                                                                          cannot be observed in the residual signal, the audio-specific
 while advertisement audio has music and speech as major                                                                                                                                                                                                                                                                                                                                                                                                                        shown in Figure 1. For some cases even if the difference
 components. For example, news audio has clean speech,                                                                                                                                                                                                                                                                                                                                                                                                                          within it. The residual signal of the five different classes are
 In an audio category, there could be one or more audio                                                                                                                                                                                                                                                                                                                                                                                                                         the most difficult class to study, as it has many variations
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                more in the case of football audio. Advertisement audio is
  AANN Models
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                speech and other background sounds, like noise. Noise is
  3.1 Use of Audio Component Knowledge for Building
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                is a part of cartoon audio. Cricket and football have casual
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                gory differs from the news speech in terms of prosody. Music
 information present in LP residual signal.                                                                                                                                                                                                                                                                                                                                                                                                                                     vironment) has clean speech, while speech for cartoon cate-
 for building AANN models for capturing the audio-specific                                                                                                                                                                                                                                                                                                                                                                                                                       variations among them. News audio (in closed studio en-
 section, we discuss the use of audio component knowledge                                                                                                                                                                                                                                                                                                                                                                                                                           The five classes considered for the present study show
 on the structure of the network [7]. In the following sub-
 The performance of the network does not critically depend
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ing to five different categories.
 of nodes in hidden layers have been decided experimentally.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Figure 1: LP residual for the segments of audio clips belong-
 sidered for study), and may vary for other audio. The number
 speech knowledge (which is suitable for the categories con-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       No. of Residual Samples
 ual signal to be considered for study has been derived using
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             400   350        300            250               200            150          100      50       0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 −0.5
 and output layer is 40. However, this the duration of resid-
 sampling frequency and hence the number of nodes in input
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0
 the residual signal is chosen. The signal is recorded at 8 kHz
                                                                                                                                                                                                                                                                                                                                                                                                                                                                               News




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0.5
 we are interested in the source features, the 5 ms duration of
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             400   350        300            250               200            150          100      50       0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 −0.5
 A tanh is used as the non-linear activation function. Since,
 40L, where L refers to linear units, and N to non-linear units.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0
 ture of the network used in our study is 40L 48N 12N 40N
                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Football




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0.5
 layer AANN model as shown in Figure 2 is used. The struc-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             400   350        300            250               200            150          100      50       0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 −0.5
 the audio information present in the LP residual signal, a five
 specific to the speaker in the LP residual [7]. For capturing
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0
 models have been shown to capture excitation source features
                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Cricket




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0.5
 forming an identity mapping of the input space [10]. AANN
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             400   350        300            250               200            150          100      50       0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 −0.5
     AANN models are feedforward neural networks per-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Residual Amplitude for Audio Clips




                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Cartoon




 audio-specific information.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0.5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             400   350        300            250               200            150          100      50       0
 Figure 2: Structure of AANN model used for capturing
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 −0.5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0
                                                                                                                                                                                                                                                        Layer
              Output Layer                                                                                                                                                                                                                                                                                                                                                                                                             Input Layer
                                                                                                                                                                                                                                                      Compression
                                                                                                                                                                                                                                                                                                                                                                                                                               H HHH
                                                                                                                                                                                                                                                                                                                                                                                                                                GGGG
                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Commercials




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0.5
                                                                                                                                                                                                                                                                                                                                                                                   4 44                                          GHGHGHGH
                                                                                                                                                                                                                                                         BI BI BI BI                                                                                                                 12121212
                                                                                                            bb bb
                                                                                                    56CDab‚56CDab‚56CDab‚56CDab‚
                                                                                                            aaaa
                                                                                                             ‚‚‚‚                                                                                                                                78BI78BI78BI78BI
                                                                                                                                                                                                                                                                                                                                c4  GHGHG                         39@439@439@4 12EFWXY`12EFWXY`12EFWXY`12EFWXY`
                                                                                                                                                                                                                                                                                                                                                                                                                                   `EFWX `EFWX `EFWX `EFWX
                                                                                                                                                                                                                                                                                                                                                                                                                                        EEEE
                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                              BIBI       78BI78BI78BI78BI
                                                                                   6abstw‚ƒ„56abst6abstw‚ƒ„56abst6abstw‚ƒ„56abst6abstw‚ƒ„56abst
                                              56ab56abab56CDab‚56CDab‚56CDab‚
                                                                                   5555                                                                                                                                                                                                                                                                                                                                              FFFF
                                                                                                                                                                                                                                                                                                                                         349@39@44349@39@44349@349@4 12EFWXY`12EFGHWX12EFWXY`12EFGHWX1EWY1EGW
                                                                                                                                                                                                                                                                                                                                         c34349349349 2222
                                                                                                                                                                                                                                                                                                                                       34349@349@349@ 1111
                                                                                                               5xx 5xx 5xx 5xx
                                                                                                                                                                                                                                                                                                                                                                                                                                   XXXX
                                                                                                                                                                                                                                                                                                                                                                                                                                             XY`XY`XY`XY`
                                                                                                                                                                                                                                                                                                                                                                                                                                          WWWWWWWW
                                                                                                                                                                                                   787878BI78BI78BI78BI                                                                                                                                                d 9@9@9@9@9@9@ EFY`EFY`EFY`EFY`
                                                                                                                                                                                                                                                                                                                                                                                        34@ 34@ 34@
                                                                                                                                                                                                                                                                                                                                                                                            2E2E2E2E
                                                                                   66abst66abstabst
                                                                                   )5555                                                                                                                                                                        BIBIBIBI
                                                                                   abstwxabstwx56‚ababstwxabstwx56‚ab56‚ab                                                                                                                                 PPPP
                                                                                                                                                                                                                                                                           QQQQ
                                                                                                                      ƒƒƒƒ                                                                                                                                   A778A778A778A778
                                                                                                                                                                                                                                                                          RRRR
                                                                                                                                                                                                                                                                            TT TT
                                                                                                                  ‚ƒ„6‚ƒ„6‚ƒ„6‚ƒ„6                                                                                                                                                                                                   d34d 444 2222
                                                                                                                                                                                                                                                                                                                                         34333 1111
                                                                                                                                                                                                                                                                                                                                           349@9@349@9@349@9@  12EFWXY`E`12EFWXY`E`1EWYE
                                                                                   stwx‚ƒ„stwx‚ƒ„stwx‚ƒ„stwx‚ƒ„                                                                                                                    7777
                     )056abƒ„56abƒ„6ƒ556abƒ„56abstwx‚ƒ„56abstwx‚ƒ„6ƒ556abstwx‚ƒ„x56abstwx‚ƒ„56abstwx‚ƒ„6ƒ556abstwx‚ƒ„x56abstwx‚ƒ„56abstwx‚ƒ„6ƒ556abstwx‚ƒ„x                                                                                                                                                                                                               hhh X1X1X1X1
                                                                                   bbbb
                                                                                   
                                                                                                                                                                                                                                                                     APAPAPAP
                                                                                   56a56a56a56a                 
                                                                                                                                                                                                                                                                         RSTUVRSTUVRSTUVRSTUV
                                                                                                                       „„„„
                                                                                                                                                                                                                                        8888
                                                                                                                                                                                                                                              8888                     QQQQ
                                                                                                                                                                                                                                                                                    QRQRQRQR                                                                                           ipipip 
                                                                                                                                                                                                                                                                                                                                                                                              444
                                                                                                                                                                                                                                                                                                                                                                                               pef pef pef
                                                                                                                                                                                                                                                                                                                                                                                                  333
                                                                                   56xƒabstw‚56xƒabstw‚56xƒabstw‚56xƒabstw‚                                                                                                                                                                                                     4444 12E12E12E12E
                                                                                                                                                                                        78UV77878APQRSTUV78APQRSTUVA78BI78APQRSTUV78APQRSTUV78ABI                                                                                33349@ef33349@ef349@ef `12EFWXY`12WX```12EFWXY`12WX1EWY1W
                                                                                                                                                                                                                                                                                                                                                                                                   ipqrefgipqrefgipqrefg W W W W
                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                       4 hhh ````
                                                                                                                         yyyy
                                                                                                                          
                                                                                                                                                                                                                                                                                                                                        3efgefgefg FWXYFWXYFWXYFWXY
                                                                                                                                                                                                                                              
                                                                                                                          aaaa
                                                                                                                           
                                                                                                                        €€€€
                                                                                                                           ssss
                                                                                                                            
                                                                                                                            ‚„‚„‚„‚„
                                                                                   
                                                                                   ƒ„ƒ„ƒ„ƒ„                                                                                                                                                                                                                                         qrhipqrhipqrhipqr FWXYFWXYFWXYFWXY
                                                                                                                                                                                                                                       78QRSTAPUV78QRSTAPUV78APQRSTUV7PQRT78QRSTAPUV78QRSTAPUV78APQRSTUV7PQRT
                                                       56ƒb„ssss                                                                                                                                                                                                                                                                                                                                                                                                         minimizing the mean squared error over an analysis frame.
                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                  3fff 
                                                                                                                                                                                                                                              
                                                                                   aaaa                                €€€€
                                                                                                                               xxxx
                                                                                                                                   €‚€‚€‚€‚
                                                                                                                                     xxxx                                                                                                                                                                                                                                                           hhh
                                                                                   tttt
                                                                                                                               uuuu                                                                                                                                                                                                   34e34efghip34e34e34efghip34efghip  Y`12EFWXY`Y`Y`Y`12EFWXY`1EWY !quot;
                                                                                                                                                                                                                                                                                                                                                   refghiqr34pefghiqr34pefghiqr34p 12WXY12EFWXY`12WXY12WXY12WXY12EFWXY`1EWY
                                                                                   bbbb
                                                                                   'bbbb                                 yyyy
                                                                                                                                 vwvwvwvw
                                                                                                                                       yyyy
                                                                                                                                        wwww                                                                                                                                                                                               4ggg 12E12E12E12E                             ggg
                                                                                                                                                                                                      78UV78UV878QRSTAPUV78APQRSTUV8QR78QRSTAPUV78APQRSTUV8QR                                                           %34efgghhipqr34p34efgghhipqr34p34efgghhipqr34p 112WXYY`11112WXYY`1WYY
                                                                                                                                                                                                                                              78QRSTUV78QRSTUV78QRSTUV78QRSTUV                                                                                                                                                                                                                                ©
                                                                                   abuvwxy€abuvwxy€56xƒbw„abuvwxy€abuvwxy€56xƒbw„56bwxƒ„
                                                                                   „„„„
                                                                                   ‚ƒ‚ƒ‚ƒ‚ƒ                                                                                                                                                                                                                                                                            %  
                                                                                                                                                                                                                                                                                      VVVV                               34q34efghipqr34efghipqr34efghipqr 2WWXY`2WWXY`2WWXY`2WWXY`
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            ¨
                      '(abuvƒ„abuvƒ„abbuvƒ„aa„abuvwxy€‚ƒ„abbstuvwwxxyy€€‚ƒ„aassty€‚‚„abuvwxy€‚ƒ„abbstuvwwxxyy€€‚ƒ„aassty€‚‚„abuvwxy€‚ƒ„abbstuvwwxxyy€€‚ƒ„aassty€‚‚„
                                                                                   abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„                                                                                                    78QRT78QRT78QRT78QRT
                                                                                   
                                                                                                                                                                                                                                                                                SSUVUSSUVUSSUVUSSUVU                                    $34qr34ghipqr34ghipqr34ghipqr Y`Y`Y`Y`      r rr                                                                   The linear prediction coefficients ak are determined by
                                                                                                                                                                                                                                              RUVRUVRUVRUV
                                                                                                                                                                                                                                              QQQQ
                                                                                                                                                                                                                                              8888
                                                                                                                                                                                                              78UV78UUVV78QRSSTUV78QRSTUVUV78QRSSTUV78QRSTUUVV
                                                                                   abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„
                                                                                                                                         „„„„                                                                                                                                         QSTQSTQSTQST
                                                                                                                                                                                                                                                                                               8888
                                                                                                                                                                                                                                                                                                                           $3434efipqr343434efipqr34efipqr 12WX`12WX`1W
                                                                                                                                                                                                                                                                                                                                                    #prprpr XXXX
                                                                                                                                   ƒƒƒƒ                                                                                          7777
                                                                                                                                                                                                                                              7777                                                                         #34qriii WWWW
                                                                                                                                                                                                                                                                                                                                                    ripipip XXXX




LP Residual
                                                                                                                                                                                                                                                                                                                                                                                                                                                                 LP Residual
                                                                                                                                                                                                                                                                                                                                                       33434ghipqr34ghipqr34ghipqr Y`WXY`WY`Y`Y`WXY`WWYW
                                                                                                                                                                                                                                              8888                                                                                             qrghpqrghghpqrghpqrghgh WXYYWXYWXYWXYYY
                                                                                   uvwxy€‚ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒ„uvwxy€‚ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„                                                                        7777
                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                              VVVV
                                                                                                                                                                                                                                              STUSTUSTUSTU
                                                                                   
                                                                                                                                                                                                                78UV7788UV78STUV7788QQRSTUV78STUV7788QQRSTUV                                                                                 ipqrighq3ipqrighq3ipqrighq3 ````
                                                           uvƒ„abuvƒƒ„„uvwxy€‚ƒ„abuvwxy€‚ƒ„ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒ„ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒƒ„„                                                                                                                                                                                                    qrghq34iprrghq34iprr34ghipqrr Y`WXY`XY`Y`Y`WXY`XWY
                                                                                   uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                              VVVV
                                                                                                                                                                                                                                              TUTUTUTU                                                                                                qripqrghipqr3ipipqrghipqr3ipipqrghipqr3ip Y`Y`Y`Y`YY
                                                                                                                                                                                                                                              78S78S78S78S
                                                                                                                                                                                                                                UV78UV78STUV78STUV
                                                                                                                                               vvvv
                                                                                                                                       „„„„
                                                                                                                                           ‚ƒ‚ƒ‚ƒ‚ƒ                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                             ipqripqripqr
                                                                                                                                                                                                                                              STUUVSTUUVSTUUVSTUUV
                                                                                   uuuu                                                  vv vv
                                                                                                                                                
                                                                      uvƒ„uvƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„                                                                                                                                                                                                                                                                                                                                                                                        k¥ 1
                                                                                   „„„„
                                                                                   ‚‚‚‚
                                                                                   ƒƒƒƒ                                                                                                                                                                                                                                                                rqrqqq `` `` `` ``
                                                                                                                                                                                                                                                                                                                                                                 qqipqripqripqr YYYY
                                                                                                                                                                                                                                              STUVSTUVSTUVSTUV
                                                                                                                                                                                                                                              VVVV                                                                                                                                                                                                                                                                                                       ¢




                                                                                   „u„u„u„u
                                                                             ƒ„uƒ„vƒ„u‚ƒ„vƒ„u‚ƒ„vƒ„u‚ƒ„v
                                                                                   ƒƒƒƒ                                                                                                                                                                                                                                                                     iii Y`YYY`
                                                                                                                                                                                                                                                                                                                                                                   pqripqrpqrpqripqripqr Y`Y`Y
                                                                                                                                                                                                                                              UVUVSTUVUVUVSTUV                                                                                               qrqripqrqrqripqripqr YYY
                                                                                                                                                                                                                                     UVUVVUUVSTUVVUUVSTUVVU
                                                                                                                                                                                                                                              UVUVUVUV                                                                                                                                                                                                                                             £                                              ¡ ¢          £ ¦          ¡ ¢
                                                                                                                                                                                                                                                                                                                                                                     qrqrqripqrqqrqrqripqrqqripqrq
                                                                                                                                                                                                                                          UVUVUVUVUVUV                                    UV UV UV UV                                                                                                                                                                                         k£                                      s¢ n£          s  n£          s¢ n£      e¢ n£
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                (2)
                                                                                                                                                                                                                                                                                                                             12                                                                                                                                                                                                         §
                                                                                                                                                                                                                                                                                                                                                                       qrqrqrqrqrqrqrqr
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             ∑ ak s¢ n
                   40                                                                                                                                                                                                                                                                                                                                                                                                                                Nodes 40
                                                                                                                                                                                                                                                                                                                                                                           qrqrqrqr
                                                                                                                                                                                                                                                                                                                                                                             qrqrqrqrqrqr                                                                                                                          p
                                                                                                                                                                                                                                                                                                                                                                                       48
                                                                                                                                                                                       48
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 given by,
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ple value is termed as prediction error or residual, which is
 information from the LP residual [9].
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     The difference between the actual, and predictable sam-
 propose neural network models to capture the audio-specific
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               k¥ 1
 such an information may involve non-linear processing, we
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ¢
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              £                               £¡
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              ¤¢
 the LP residual is not clear, and also since the extraction of                                                                                                                                                                                                                                                                                                                                                                                                                                                     k£                                              s  n£
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                (1)                                                   ∑ ak s¢ n
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    p
 cific set of parameters to represent the audio information in
 tions among the samples of the residual signal. Since spe-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                past p samples as,
 the audio features may be present in the higher order rela-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    If s¢ n£ is the present sample, then it is predicted by the
 acteristics are present in the LP residual. We conjecture that
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                past p samples, where p represents the order for prediction.
 the audio production system. But the excitation source char-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ysis each sample is predicted as a linear weighted sum of the
 does not contain any significant second order correlations of
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                nal using linear prediction (LP) analysis [8]. In the LP anal-
 tures through the autocorrelation functions, the LP residual
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The first step is to extract the LP residual from the audio sig-
 Since LP analysis extracts the second order statistical fea-
                   INFORMATION IN LP RESIDUAL                                                                                                                                                                                                                                                                                                                                                                                                                                                       CLIP CLASSIFICATION
              3. AANN MODELS FOR CAPTURING AUDIO                                                                                                                                                                                                                                                                                                                                                                                                                                         2. SIGNIFICANCE OF LP RESIDUAL FOR AUDIO
AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS
AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS

More Related Content

What's hot

Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...IDES Editor
 
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 AudioNovel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audioinventy
 
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITY
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITYDATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITY
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITYcsandit
 
SCHEME OF WORK 2010
SCHEME OF WORK 2010SCHEME OF WORK 2010
SCHEME OF WORK 2010SMS
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...karthik annam
 
Scheme of work est 2011
Scheme of work est 2011Scheme of work est 2011
Scheme of work est 2011SMS
 
Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...IJECEIAES
 

What's hot (17)

1801 1805
1801 18051801 1805
1801 1805
 
speech enhancement
speech enhancementspeech enhancement
speech enhancement
 
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
 
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 AudioNovel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
 
2012APMC Conference Presentation
2012APMC Conference Presentation2012APMC Conference Presentation
2012APMC Conference Presentation
 
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITY
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITYDATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITY
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITY
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
 
SCHEME OF WORK 2010
SCHEME OF WORK 2010SCHEME OF WORK 2010
SCHEME OF WORK 2010
 
L026070074
L026070074L026070074
L026070074
 
E44082429
E44082429E44082429
E44082429
 
Ijetcas14 426
Ijetcas14 426Ijetcas14 426
Ijetcas14 426
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...
 
A2 l
A2 lA2 l
A2 l
 
Scheme of work est 2011
Scheme of work est 2011Scheme of work est 2011
Scheme of work est 2011
 
Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...
 

Viewers also liked

Anvita Dynamic Fontson Web Feb2001
Anvita Dynamic Fontson Web Feb2001Anvita Dynamic Fontson Web Feb2001
Anvita Dynamic Fontson Web Feb2001guest6e7a1b1
 
Water management questionnaire for layouts
Water management questionnaire for layoutsWater management questionnaire for layouts
Water management questionnaire for layoutsbiomeshubha
 
Krishna lilac - IUWM
Krishna lilac - IUWMKrishna lilac - IUWM
Krishna lilac - IUWMbiomeshubha
 
Ready to Move in Flats/Apartments in Noida Extension BY LandlinkeR
Ready to Move in Flats/Apartments in Noida Extension BY LandlinkeRReady to Move in Flats/Apartments in Noida Extension BY LandlinkeR
Ready to Move in Flats/Apartments in Noida Extension BY LandlinkeRLand Linker
 
Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015
Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015
Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015Fred Feldon
 
Permissions to be taken to dig a Borewell
Permissions to be taken to dig a BorewellPermissions to be taken to dig a Borewell
Permissions to be taken to dig a Borewellbiomeshubha
 
Costs of Recharge Wells and other items for RWH
Costs of Recharge Wells and other items for RWHCosts of Recharge Wells and other items for RWH
Costs of Recharge Wells and other items for RWHbiomeshubha
 

Viewers also liked (11)

Anvita Dynamic Fontson Web Feb2001
Anvita Dynamic Fontson Web Feb2001Anvita Dynamic Fontson Web Feb2001
Anvita Dynamic Fontson Web Feb2001
 
Water management questionnaire for layouts
Water management questionnaire for layoutsWater management questionnaire for layouts
Water management questionnaire for layouts
 
Krishna lilac - IUWM
Krishna lilac - IUWMKrishna lilac - IUWM
Krishna lilac - IUWM
 
Ready to Move in Flats/Apartments in Noida Extension BY LandlinkeR
Ready to Move in Flats/Apartments in Noida Extension BY LandlinkeRReady to Move in Flats/Apartments in Noida Extension BY LandlinkeR
Ready to Move in Flats/Apartments in Noida Extension BY LandlinkeR
 
Focus Group
Focus GroupFocus Group
Focus Group
 
Mundial
MundialMundial
Mundial
 
Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015
Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015
Amatyc Ignite by Feldon and Tchertchian New Orleans, LA, 2015
 
Resume
ResumeResume
Resume
 
Permissions to be taken to dig a Borewell
Permissions to be taken to dig a BorewellPermissions to be taken to dig a Borewell
Permissions to be taken to dig a Borewell
 
Costs of Recharge Wells and other items for RWH
Costs of Recharge Wells and other items for RWHCosts of Recharge Wells and other items for RWH
Costs of Recharge Wells and other items for RWH
 
Eletro2
Eletro2Eletro2
Eletro2
 

Similar to AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS

IRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural NetworkIRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural NetworkIRJET Journal
 
Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351Editor IJARCET
 
A computationally efficient learning model to classify audio signal attributes
A computationally efficient learning model to classify audio  signal attributesA computationally efficient learning model to classify audio  signal attributes
A computationally efficient learning model to classify audio signal attributesIJECEIAES
 
Audio Features Based Steganography Detection in WAV File
Audio Features Based Steganography Detection in WAV FileAudio Features Based Steganography Detection in WAV File
Audio Features Based Steganography Detection in WAV Fileijtsrd
 
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET Journal
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNIJCSEA Journal
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNIJCSEA Journal
 
Utterance based speaker identification
Utterance based speaker identificationUtterance based speaker identification
Utterance based speaker identificationIJCSEA Journal
 
Anvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 PresentationAnvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 Presentationguest6e7a1b1
 
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET Journal
 
IRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET Journal
 
Literature Survey for Music Genre Classification Using Neural Network
Literature Survey for Music Genre Classification Using Neural NetworkLiterature Survey for Music Genre Classification Using Neural Network
Literature Survey for Music Genre Classification Using Neural NetworkIRJET Journal
 
Speech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machineSpeech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machineIJECEIAES
 
Optimized audio classification and segmentation algorithm by using ensemble m...
Optimized audio classification and segmentation algorithm by using ensemble m...Optimized audio classification and segmentation algorithm by using ensemble m...
Optimized audio classification and segmentation algorithm by using ensemble m...Venkat Projects
 
IRJET- Musical Instrument Recognition using CNN and SVM
IRJET-  	  Musical Instrument Recognition using CNN and SVMIRJET-  	  Musical Instrument Recognition using CNN and SVM
IRJET- Musical Instrument Recognition using CNN and SVMIRJET Journal
 
Recognition of music genres using deep learning.
Recognition of music genres using deep learning.Recognition of music genres using deep learning.
Recognition of music genres using deep learning.IRJET Journal
 
Automatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningAutomatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningIRJET Journal
 
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUESCONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUESAM Publications
 
Anvita Wisp 2007 Presentation
Anvita Wisp 2007 PresentationAnvita Wisp 2007 Presentation
Anvita Wisp 2007 Presentationguest6e7a1b1
 

Similar to AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS (20)

IRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural NetworkIRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural Network
 
Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351
 
A computationally efficient learning model to classify audio signal attributes
A computationally efficient learning model to classify audio  signal attributesA computationally efficient learning model to classify audio  signal attributes
A computationally efficient learning model to classify audio signal attributes
 
Audio Features Based Steganography Detection in WAV File
Audio Features Based Steganography Detection in WAV FileAudio Features Based Steganography Detection in WAV File
Audio Features Based Steganography Detection in WAV File
 
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
 
Servey
ServeyServey
Servey
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANN
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANN
 
Utterance based speaker identification
Utterance based speaker identificationUtterance based speaker identification
Utterance based speaker identification
 
Anvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 PresentationAnvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 Presentation
 
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
 
IRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound Recognition
 
Literature Survey for Music Genre Classification Using Neural Network
Literature Survey for Music Genre Classification Using Neural NetworkLiterature Survey for Music Genre Classification Using Neural Network
Literature Survey for Music Genre Classification Using Neural Network
 
Speech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machineSpeech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machine
 
Optimized audio classification and segmentation algorithm by using ensemble m...
Optimized audio classification and segmentation algorithm by using ensemble m...Optimized audio classification and segmentation algorithm by using ensemble m...
Optimized audio classification and segmentation algorithm by using ensemble m...
 
IRJET- Musical Instrument Recognition using CNN and SVM
IRJET-  	  Musical Instrument Recognition using CNN and SVMIRJET-  	  Musical Instrument Recognition using CNN and SVM
IRJET- Musical Instrument Recognition using CNN and SVM
 
Recognition of music genres using deep learning.
Recognition of music genres using deep learning.Recognition of music genres using deep learning.
Recognition of music genres using deep learning.
 
Automatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningAutomatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep Learning
 
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUESCONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
 
Anvita Wisp 2007 Presentation
Anvita Wisp 2007 PresentationAnvita Wisp 2007 Presentation
Anvita Wisp 2007 Presentation
 

Recently uploaded

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS

  • 1. AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS MODELS Anvita Bajpai and B. Yegnanarayana Department of Computer Science and Engineering Indian Institute of Technology Madras, Chennai- 600 036, India anvita, yegna¡ @cs.iitm.ernet.in   ABSTRACT et al. classify audio into five categories of television (TV) programs using spectral features [4]. Features based on In this paper, we demonstrate the presence of audio-specific amplitude, zero-crossing, bandwidth, band energy in the information in the linear prediction (LP) residual, obtained subbands, spectrum and periodicity properties, along with after removing the predictable part of the signal. We empha- hidden Markov model (HMM) for classification are explored size the importance of information present in the LP resid- for audio indexing applications in [5]. But it was shown ual of audio signals, which if added to the spectral informa- that perceptually significant information of audio data is tion, can give a better performing system. Since it is dif- present in the form of sequence of events, which can be ficult to extract information from the residual using known obtained after removing the predictable part in the audio signal processing algorithms, neural networks (NN) models data. Perceptually, there are some discriminating features are proposed. In this paper, autoassociative neural networks present in the residual which could help in various audio (AANN) models are used to capture the audio-specific infor- indexing tasks. The challenge lies in developing algorithms mation from the LP residual of signals. Multilayer feedfor- to capture these perceptually significant features from the ward neural networks (MLFFNN) models or multilayer per- residual, as it is difficult to extract information using known ceptron (MLP) are used to classify the audio data using the signal processing algorithms. audio-specific information captured by AANN models. Objective of this study is to explore the features in addition to the features that are currently used to improve the performance of an audio indexing system. In particular, fea- 1. INTRODUCTION tures not used explicitly or implicitly in the current system In this era of information technology, the data that we use is are being investigated. Many interesting and perceptually mostly in the form of audio, video and multimedia. The data, important features are present in the residual signal obtained once recorded and stored digitally, conveys no significant after removing the predictable part. Thus the main objective information in order to organize and use it. The volume of of this study is to explore the features present in the linear data is large, and is increasing daily. Therefore it is difficult prediction (LP) residual for audio clip classification task. to organize the data manually. We need to have an automatic The reason for considering the residual data for study is that method to index the data, for further search and retrieval. the residual part of the signal is generally subject to less Audio plays an important role in classifying multimedia degradation as compared to the system part [6]. The residual data as it contains significant information, and is easier to data contains higher order correlation among samples. As process when compared to video data. For these reasons, known signal processing and statistical techniques are not commercial products of audio retrieval are emerging, e.g., suitable to capture this correlation, an autoassociative neural (http://www.musclefish.com) [1]. Content-based classifica- networks (AANN) model is proposed to capture these higher tion of data into different categories is one important step for order correlations among samples of the residual of the building an audio indexing system. audio data. AANN have already been studied to capture information from the residual data for tasks such as speaker In the traditional approach of audio indexing, audio recognition [7]. Further, multilayer feedforward neural is first converted to text, and then it is given to text-based networks (MLFFNN) models or multilayer perceptron search engines [2]. Drawbacks of this approach are: a£ (MLP) are proposed for decision making task using the ¢ not having accurate speech recognizer, b£ not using speech audio-specific information captured by AANN models. ¢ information present in form of prosody, and c£ not appli- ¢ cable for non-speech data like music. An elaborate audio content categorization is proposed by Wold et al. [1], which The paper is organized as follows: Section 2 discusses divides the audio content into sixteen groups. The authors extraction of the LP residual from audio data. Section 3 dis- have used mean, variance and autocorrelation of loudness, cusses AANN models for capturing features in LP residual pitch and bandwidth as audio features and a nearest neigh- for the audio clip classification. Section 4 discusses MLP borhood classifier for the task. The authors quote 81% models for decision making. Section 5 presents the work- classification accuracy for an audio database with 400 sound flow of the system. The results of the experimental studies files. Guo et al. [3] have used features consisting of total are presented in Section 6. Various issues addressed in this power, subband energies, bandwidth, pitch and MFCCs, and paper and possible directions for the future study are summa- support vector machines (SVMs) for classification. Wang rized in Section 7. 2299
  • 2. 2300 which is non-speech and non-music). There are significant formation from the LP residual. selected for the study are speech, music and noise (audio section, we discuss methods to capture the audio-specific in- in audio is used for classification task. The components information could be perceived while listening. In the next components. The knowledge of these components present cannot be observed in the residual signal, the audio-specific while advertisement audio has music and speech as major shown in Figure 1. For some cases even if the difference components. For example, news audio has clean speech, within it. The residual signal of the five different classes are In an audio category, there could be one or more audio the most difficult class to study, as it has many variations more in the case of football audio. Advertisement audio is AANN Models speech and other background sounds, like noise. Noise is 3.1 Use of Audio Component Knowledge for Building is a part of cartoon audio. Cricket and football have casual gory differs from the news speech in terms of prosody. Music information present in LP residual signal. vironment) has clean speech, while speech for cartoon cate- for building AANN models for capturing the audio-specific variations among them. News audio (in closed studio en- section, we discuss the use of audio component knowledge The five classes considered for the present study show on the structure of the network [7]. In the following sub- The performance of the network does not critically depend ing to five different categories. of nodes in hidden layers have been decided experimentally. Figure 1: LP residual for the segments of audio clips belong- sidered for study), and may vary for other audio. The number speech knowledge (which is suitable for the categories con- No. of Residual Samples ual signal to be considered for study has been derived using 400 350 300 250 200 150 100 50 0 −0.5 and output layer is 40. However, this the duration of resid- sampling frequency and hence the number of nodes in input 0 the residual signal is chosen. The signal is recorded at 8 kHz News 0.5 we are interested in the source features, the 5 ms duration of 400 350 300 250 200 150 100 50 0 −0.5 A tanh is used as the non-linear activation function. Since, 40L, where L refers to linear units, and N to non-linear units. 0 ture of the network used in our study is 40L 48N 12N 40N Football 0.5 layer AANN model as shown in Figure 2 is used. The struc- 400 350 300 250 200 150 100 50 0 −0.5 the audio information present in the LP residual signal, a five specific to the speaker in the LP residual [7]. For capturing 0 models have been shown to capture excitation source features Cricket 0.5 forming an identity mapping of the input space [10]. AANN 400 350 300 250 200 150 100 50 0 −0.5 AANN models are feedforward neural networks per- 0 Residual Amplitude for Audio Clips Cartoon audio-specific information. 0.5 400 350 300 250 200 150 100 50 0 Figure 2: Structure of AANN model used for capturing −0.5 0 Layer Output Layer Input Layer Compression H HHH GGGG Commercials 0.5 4 44 GHGHGHGH BI BI BI BI 12121212 bb bb 56CDab‚56CDab‚56CDab‚56CDab‚ aaaa ‚‚‚‚ 78BI78BI78BI78BI c4 GHGHG 39@439@439@4 12EFWXY`12EFWXY`12EFWXY`12EFWXY` `EFWX `EFWX `EFWX `EFWX EEEE BIBI 78BI78BI78BI78BI 6abstw‚ƒ„56abst6abstw‚ƒ„56abst6abstw‚ƒ„56abst6abstw‚ƒ„56abst 56ab56abab56CDab‚56CDab‚56CDab‚ 5555 FFFF 349@39@44349@39@44349@349@4 12EFWXY`12EFGHWX12EFWXY`12EFGHWX1EWY1EGW c34349349349 2222 34349@349@349@ 1111 5xx 5xx 5xx 5xx XXXX XY`XY`XY`XY` WWWWWWWW 787878BI78BI78BI78BI d 9@9@9@9@9@9@ EFY`EFY`EFY`EFY` 34@ 34@ 34@ 2E2E2E2E 66abst66abstabst )5555 BIBIBIBI abstwxabstwx56‚ababstwxabstwx56‚ab56‚ab PPPP QQQQ ƒƒƒƒ A778A778A778A778 RRRR TT TT ‚ƒ„6‚ƒ„6‚ƒ„6‚ƒ„6 d34d 444 2222 34333 1111 349@9@349@9@349@9@ 12EFWXY`E`12EFWXY`E`1EWYE stwx‚ƒ„stwx‚ƒ„stwx‚ƒ„stwx‚ƒ„ 7777 )056abƒ„56abƒ„6ƒ556abƒ„56abstwx‚ƒ„56abstwx‚ƒ„6ƒ556abstwx‚ƒ„x56abstwx‚ƒ„56abstwx‚ƒ„6ƒ556abstwx‚ƒ„x56abstwx‚ƒ„56abstwx‚ƒ„6ƒ556abstwx‚ƒ„x hhh X1X1X1X1 bbbb APAPAPAP 56a56a56a56a RSTUVRSTUVRSTUVRSTUV „„„„ 8888 8888 QQQQ QRQRQRQR ipipip 444 pef pef pef 333 56xƒabstw‚56xƒabstw‚56xƒabstw‚56xƒabstw‚ 4444 12E12E12E12E 78UV77878APQRSTUV78APQRSTUVA78BI78APQRSTUV78APQRSTUV78ABI 33349@ef33349@ef349@ef `12EFWXY`12WX```12EFWXY`12WX1EWY1W ipqrefgipqrefgipqrefg W W W W 4 hhh ```` yyyy 3efgefgefg FWXYFWXYFWXYFWXY aaaa €€€€ ssss ‚„‚„‚„‚„ ƒ„ƒ„ƒ„ƒ„ qrhipqrhipqrhipqr FWXYFWXYFWXYFWXY 78QRSTAPUV78QRSTAPUV78APQRSTUV7PQRT78QRSTAPUV78QRSTAPUV78APQRSTUV7PQRT 56ƒb„ssss minimizing the mean squared error over an analysis frame. 3fff aaaa €€€€ xxxx €‚€‚€‚€‚ xxxx hhh tttt uuuu 34e34efghip34e34e34efghip34efghip Y`12EFWXY`Y`Y`Y`12EFWXY`1EWY !quot; refghiqr34pefghiqr34pefghiqr34p 12WXY12EFWXY`12WXY12WXY12WXY12EFWXY`1EWY bbbb 'bbbb yyyy vwvwvwvw yyyy wwww 4ggg 12E12E12E12E ggg 78UV78UV878QRSTAPUV78APQRSTUV8QR78QRSTAPUV78APQRSTUV8QR %34efgghhipqr34p34efgghhipqr34p34efgghhipqr34p 112WXYY`11112WXYY`1WYY 78QRSTUV78QRSTUV78QRSTUV78QRSTUV © abuvwxy€abuvwxy€56xƒbw„abuvwxy€abuvwxy€56xƒbw„56bwxƒ„ „„„„ ‚ƒ‚ƒ‚ƒ‚ƒ % VVVV 34q34efghipqr34efghipqr34efghipqr 2WWXY`2WWXY`2WWXY`2WWXY` ¨ '(abuvƒ„abuvƒ„abbuvƒ„aa„abuvwxy€‚ƒ„abbstuvwwxxyy€€‚ƒ„aassty€‚‚„abuvwxy€‚ƒ„abbstuvwwxxyy€€‚ƒ„aassty€‚‚„abuvwxy€‚ƒ„abbstuvwwxxyy€€‚ƒ„aassty€‚‚„ abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„ 78QRT78QRT78QRT78QRT SSUVUSSUVUSSUVUSSUVU $34qr34ghipqr34ghipqr34ghipqr Y`Y`Y`Y` r rr The linear prediction coefficients ak are determined by RUVRUVRUVRUV QQQQ 8888 78UV78UUVV78QRSSTUV78QRSTUVUV78QRSSTUV78QRSTUUVV abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„ „„„„ QSTQSTQSTQST 8888 $3434efipqr343434efipqr34efipqr 12WX`12WX`1W #prprpr XXXX ƒƒƒƒ 7777 7777 #34qriii WWWW ripipip XXXX LP Residual LP Residual 33434ghipqr34ghipqr34ghipqr Y`WXY`WY`Y`Y`WXY`WWYW 8888 qrghpqrghghpqrghpqrghgh WXYYWXYWXYWXYYY uvwxy€‚ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒ„uvwxy€‚ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒ„abuvwxy€‚ƒ„ 7777 VVVV STUSTUSTUSTU  78UV7788UV78STUV7788QQRSTUV78STUV7788QQRSTUV ipqrighq3ipqrighq3ipqrighq3 ```` uvƒ„abuvƒƒ„„uvwxy€‚ƒ„abuvwxy€‚ƒ„ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒ„ƒ„uvwxy€‚ƒ„abuvwxy€‚ƒƒ„„ qrghq34iprrghq34iprr34ghipqrr Y`WXY`XY`Y`Y`WXY`XWY uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„ VVVV TUTUTUTU qripqrghipqr3ipipqrghipqr3ipipqrghipqr3ip Y`Y`Y`Y`YY 78S78S78S78S UV78UV78STUV78STUV vvvv „„„„ ‚ƒ‚ƒ‚ƒ‚ƒ ipqripqripqr STUUVSTUUVSTUUVSTUUV uuuu vv vv uvƒ„uvƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„uv‚ƒ„ k¥ 1 „„„„ ‚‚‚‚ ƒƒƒƒ rqrqqq `` `` `` `` qqipqripqripqr YYYY STUVSTUVSTUVSTUV VVVV ¢ „u„u„u„u ƒ„uƒ„vƒ„u‚ƒ„vƒ„u‚ƒ„vƒ„u‚ƒ„v ƒƒƒƒ iii Y`YYY` pqripqrpqrpqripqripqr Y`Y`Y UVUVSTUVUVUVSTUV qrqripqrqrqripqripqr YYY UVUVVUUVSTUVVUUVSTUVVU UVUVUVUV £ ¡ ¢ £ ¦ ¡ ¢ qrqrqripqrqqrqrqripqrqqripqrq UVUVUVUVUVUV UV UV UV UV k£ s¢ n£ s  n£ s¢ n£ e¢ n£ (2) 12 § qrqrqrqrqrqrqrqr ∑ ak s¢ n 40 Nodes 40 qrqrqrqr qrqrqrqrqrqr p 48 48 given by, ple value is termed as prediction error or residual, which is information from the LP residual [9]. The difference between the actual, and predictable sam- propose neural network models to capture the audio-specific k¥ 1 such an information may involve non-linear processing, we ¢ £ £¡ ¤¢ the LP residual is not clear, and also since the extraction of k£ s  n£ (1) ∑ ak s¢ n p cific set of parameters to represent the audio information in tions among the samples of the residual signal. Since spe- past p samples as, the audio features may be present in the higher order rela- If s¢ n£ is the present sample, then it is predicted by the acteristics are present in the LP residual. We conjecture that past p samples, where p represents the order for prediction. the audio production system. But the excitation source char- ysis each sample is predicted as a linear weighted sum of the does not contain any significant second order correlations of nal using linear prediction (LP) analysis [8]. In the LP anal- tures through the autocorrelation functions, the LP residual The first step is to extract the LP residual from the audio sig- Since LP analysis extracts the second order statistical fea- INFORMATION IN LP RESIDUAL CLIP CLASSIFICATION 3. AANN MODELS FOR CAPTURING AUDIO 2. SIGNIFICANCE OF LP RESIDUAL FOR AUDIO