1) The document proposes using the linear prediction (LP) residual and neural networks to classify audio clips. The LP residual captures audio-specific information not present in spectral features alone.
2) Autoassociative neural networks (AANN) are used to capture information from the LP residual that is difficult to extract using signal processing. Multilayer perceptrons (MLP) then classify the audio using the AANN-extracted features.
3) The approach is tested on classifying clips into five categories: speech, music, noise, cartoon, and advertisements. Results show that the LP residual contains discriminative features that improve audio classification compared to spectral features alone.
AUDIO CLIP CLASSIFICATION USING LP RESIDUAL AND NEURAL NETWORKS MODELS
Anvita Bajpai and B. Yegnanarayana
Department of Computer Science and Engineering
Indian Institute of Technology Madras, Chennai- 600 036, India
{anvita, yegna}@cs.iitm.ernet.in
ABSTRACT

In this paper, we demonstrate the presence of audio-specific information in the linear prediction (LP) residual, obtained after removing the predictable part of the signal. We emphasize the importance of the information present in the LP residual of audio signals, which, if added to the spectral information, can give a better performing system. Since it is difficult to extract information from the residual using known signal processing algorithms, neural network (NN) models are proposed. In this paper, autoassociative neural network (AANN) models are used to capture the audio-specific information from the LP residual of signals. Multilayer feedforward neural network (MLFFNN) models, or multilayer perceptrons (MLP), are used to classify the audio data using the audio-specific information captured by the AANN models.

1. INTRODUCTION

In this era of information technology, the data that we use is mostly in the form of audio, video and multimedia. The data, once recorded and stored digitally, conveys no significant information in order to organize and use it. The volume of data is large, and is increasing daily. Therefore it is difficult to organize the data manually. We need an automatic method to index the data for further search and retrieval. Audio plays an important role in classifying multimedia data, as it contains significant information and is easier to process than video data. For these reasons, commercial products for audio retrieval are emerging, e.g., (http://www.musclefish.com) [1]. Content-based classification of data into different categories is an important step in building an audio indexing system.

In the traditional approach to audio indexing, audio is first converted to text, which is then given to text-based search engines [2]. Drawbacks of this approach are: (a) the lack of an accurate speech recognizer, (b) not using the speech information present in the form of prosody, and (c) not being applicable to non-speech data such as music. An elaborate audio content categorization is proposed by Wold et al. [1], which divides audio content into sixteen groups. The authors use the mean, variance and autocorrelation of loudness, pitch and bandwidth as audio features, and a nearest neighborhood classifier for the task. They quote 81% classification accuracy for an audio database of 400 sound files. Guo et al. [3] use features consisting of total power, subband energies, bandwidth, pitch and MFCCs, with support vector machines (SVMs) for classification. Wang et al. classify audio into five categories of television (TV) programs using spectral features [4]. Features based on amplitude, zero-crossing, bandwidth, band energy in the subbands, spectrum and periodicity properties, along with a hidden Markov model (HMM) for classification, are explored for audio indexing applications in [5]. But it has been shown that perceptually significant information in audio data is present in the form of a sequence of events, which can be obtained after removing the predictable part of the audio data. Perceptually, there are some discriminating features present in the residual which could help in various audio indexing tasks. The challenge lies in developing algorithms to capture these perceptually significant features from the residual, as it is difficult to extract the information using known signal processing algorithms.

The objective of this study is to explore features in addition to those currently used, in order to improve the performance of an audio indexing system. In particular, features not used explicitly or implicitly in current systems are investigated. Many interesting and perceptually important features are present in the residual signal obtained after removing the predictable part. Thus the main objective of this study is to explore the features present in the linear prediction (LP) residual for the audio clip classification task. The reason for considering the residual data is that the residual part of the signal is generally subject to less degradation than the system part [6]. The residual data contains higher-order correlations among samples. As known signal processing and statistical techniques are not suitable for capturing these correlations, an autoassociative neural network (AANN) model is proposed to capture the higher-order correlations among samples of the residual of the audio data. AANN models have already been studied for capturing information from the residual data for tasks such as speaker recognition [7]. Further, multilayer feedforward neural network (MLFFNN) models, or multilayer perceptrons (MLP), are proposed for the decision-making task using the audio-specific information captured by the AANN models.

The paper is organized as follows: Section 2 discusses extraction of the LP residual from audio data. Section 3 discusses AANN models for capturing features in the LP residual for audio clip classification. Section 4 discusses MLP models for decision making. Section 5 presents the workflow of the system. The results of the experimental studies are presented in Section 6. Various issues addressed in this paper and possible directions for future study are summarized in Section 7.
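To make the notion of "removing the predictable part" concrete: the LP residual is obtained by inverse filtering the signal with its own linear predictor, whose coefficients solve the autocorrelation normal equations. The sketch below is not from the paper; it is a minimal NumPy illustration of the autocorrelation method, with the function name `lp_residual` and the predictor order chosen here for illustration only.

```python
import numpy as np

def lp_residual(x, order=10):
    """Return the LP residual of x via inverse filtering.

    Predictor: x_hat[n] = sum_{k=1..order} a[k] * x[n-k],
    coefficients found by solving the autocorrelation
    normal equations  R a = r  (autocorrelation method).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[n - 1 : n + order]
    # Toeplitz system R[i, j] = r(|i - j|), right-hand side r(1..order).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    # Residual e[n] = x[n] - sum_k a[k] x[n-k]  (inverse filter A(z)).
    e = x.copy()
    for k in range(1, order + 1):
        e[k:] -= a[k - 1] * x[:-k]
    return e
```

On a strongly predictable signal such as a sinusoid, the residual energy is a small fraction of the signal energy; for real audio, what remains in the residual is exactly the kind of excitation information the paper proposes to model with AANNs.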