Anvita Audio Classification Presentation
Presentation Transcript

  • 1. Audio Clip Classification (Anvita Bajpai, anvita@mailcity.com)
  • 2. Source: http://www.hindu.com/thehindu/seta/2002/01/10/stories/2002011000080300.htm
  • 3. Exploding information
      – One hour of TV broadcast across the world amounts to 100 petabytes.
      – Source: http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html#tv
  • 4. Audio indexing
      – Reason for choosing audio data for study: easier to process, contains significant information
      – Indexing: a method of organizing data for further search and retrieval (example: book indexing)
      – Audio indexing: indexing non-text data using the audio part of it
  • 5. Example of an audio indexing system Source: J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen, R. Schwartz, and A. Srivastava. “Speech and language technologies for audio indexing and retrieval”, in Proc. of the IEEE, 88(8), pp. 1338-1353, 2000.
  • 6. More examples of audio indexing tasks
      – Spoken document retrieval
      – Speaker identification
      – Language identification
      – Music classification
      – Music/speech discrimination
      – Audio classification: an important step in building an audio indexing system
  • 7. Levels of information in audio signal
      – Subsegmental information: related to excitation source characteristics
      – Segmental information: related to system / physiological characteristics
      – Suprasegmental information: related to behavioural characteristics of audio
  • 8. Audio clip classification
      – A closed-set problem: classify a given audio clip into one of the following predefined categories
      – Advertisement, Cartoon, Cricket, Football, News
  • 9. Issues in audio clip classification
      – Feature extraction: an effective representation of the data that captures all significant properties of the audio for the task, and is robust under various conditions
      – Classification: formulation of a distance measure and rules/models
      – Training the models for the task
      – Testing: the actual classification task
      – Combining evidence from different systems
  • 10. Missing component in existing approaches and its importance
      – Features derived from spectral analysis carry significant properties of audio data at the segmental level, but miss information present at the subsegmental and suprasegmental levels
      – Perceptually significant information is present in the linear prediction (LP) residual of the signal, and it is complementary in nature to the spectral information
      – Subsegmental and suprasegmental information is not being used in current systems
  • 11. Presence of audio-specific information in LP residual (audio examples: original Aa1.wav and its residual Aa_res.wav)
  • 12. Extracting audio-specific information from LP residual
      – The LP residual may contain higher-order correlations among samples, which are difficult to extract using standard signal processing and statistical techniques
      – Hence autoassociative neural network (AANN) models are proposed to capture information from the residual; such models have been used to capture features for the speaker recognition task
      – Structure of network: 40L 48N 12N 48N 40L
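The LP residual in slide 12 is what remains of the signal after removing the part that linear prediction can explain. As a hedged sketch (not the code behind this work, which would normally use frame-wise LP analysis with the Levinson-Durbin recursion), the residual can be computed in plain Python by solving the least-squares normal equations directly:

```python
def lp_coefficients(signal, order):
    """Least-squares LP coefficients a[0..order-1] such that
    s[n] ~ a[0]*s[n-1] + ... + a[order-1]*s[n-order]."""
    n = len(signal)
    # Normal equations R a = r built from inner products of lagged signals.
    R = [[sum(signal[i - j] * signal[i - k] for i in range(order, n))
          for k in range(1, order + 1)] for j in range(1, order + 1)]
    r = [sum(signal[i] * signal[i - j] for i in range(order, n))
         for j in range(1, order + 1)]
    # Gaussian elimination with partial pivoting.
    for col in range(order):
        piv = max(range(col, order), key=lambda row: abs(R[row][col]))
        R[col], R[piv] = R[piv], R[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, order):
            f = R[row][col] / R[col][col]
            for k in range(col, order):
                R[row][k] -= f * R[col][k]
            r[row] -= f * r[col]
    a = [0.0] * order
    for row in range(order - 1, -1, -1):
        a[row] = (r[row] - sum(R[row][k] * a[k]
                               for k in range(row + 1, order))) / R[row][row]
    return a

def lp_residual(signal, order):
    """Prediction error e[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    a = lp_coefficients(signal, order)
    return [signal[n] - sum(a[k] * signal[n - 1 - k] for k in range(order))
            for n in range(order, len(signal))]
```

For a purely autoregressive signal the residual is exactly zero; for real audio, the residual retains the excitation-source information the slides discuss.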
  • 13. Use of audio component knowledge
      – Audio category: composed of one or more audio components
      – Audio component: specific to an audio category
      – Six components chosen for study: Music; Speech (conversational, cartoon, clean); Noise (football, cricket)
  • 14. Training phase of AANN models
      – One AANN model trained for each of the six components
      – Models trained for 2000 epochs (see AANN training error curve)
  • 15. Testing phase: confidence scores output by the 6 AANN models for a news test clip; (a) for a segment of the clip, (b) expanded version of the same. Duration of the total test clip is 10 sec
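The confidence scores of slide 15 reflect how well each component's AANN reconstructs the test frames. The exact score mapping is not stated in the slides, so the sketch below assumes a common choice, c = exp(-normalised reconstruction error); the component names and the `reconstruct` callables are placeholders standing in for trained AANN models:

```python
import math

def confidence(frame, reconstruct):
    """Confidence of one AANN model on one feature frame:
    c = exp(-||x - x_hat||^2 / ||x||^2); better reconstruction -> higher score."""
    x_hat = reconstruct(frame)
    err = sum((a - b) ** 2 for a, b in zip(frame, x_hat))
    norm = sum(a * a for a in frame) or 1.0
    return math.exp(-err / norm)

def component_scores(frames, models):
    """Average confidence of each component model over all frames of a clip."""
    return {name: sum(confidence(f, rec) for f in frames) / len(frames)
            for name, rec in models.items()}
```

The model whose reconstruction error is lowest on a frame yields the highest confidence, which is how the six per-component score traces in the slide arise.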
  • 16. Work flow diagram (of 6 components); MLP: multilayer perceptron
  • 17. MLP for decision making task
      – MLP used for decision making on the audio-specific information captured by the AANN models
      – Suitable for pattern recognition tasks; able to form complex decision surfaces using discriminative learning algorithms
      – Structure of MLP used: 6L 24N 12N 5N
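The "6L 24N 12N 5N" notation on slide 17 reads as a 6-unit linear input layer, hidden layers of 24 and 12 nonlinear units, and a 5-unit nonlinear output layer (one per audio category). A minimal forward pass, assuming tanh as the nonlinearity (the slides do not name it):

```python
import math
import random

def mlp_forward(x, weights):
    """Forward pass; each row of a weight matrix holds incoming weights
    plus a trailing bias. Hidden and output units use tanh (the 'N' units);
    the input layer is linear (the 'L' units), i.e. the raw input vector."""
    a = list(x)
    for W in weights:
        a = [math.tanh(sum(w * v for w, v in zip(row[:-1], a)) + row[-1])
             for row in W]
    return a

def random_mlp(sizes, seed=0):
    """Small random weights for a network with the given layer sizes."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(m + 1)] for _ in range(n)]
            for m, n in zip(sizes[:-1], sizes[1:])]
```

With sizes `[6, 24, 12, 5]`, the six AANN confidence scores go in and five category activations come out; training (e.g. by backpropagation) is omitted here.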
  • 18. (Contd.) MLP diagram: the input layer takes the confidence scores of the 6 component AANN models (M, S1, S2, S3, N1, N2); hidden layers of 24 and 12 nodes; output layer of 5 nodes, one per audio category
  • 19. Classification results (% of clips correctly classified)

        Audio class      DB1       DB2
        Advertisement    83.00%    43.50%
        Cartoon          88.00%    45.50%
        Cricket          86.00%    38.50%
        Football         90.50%    75.50%
        News             85.50%    63.30%
        Average          86.60%    53.26%

      DB1 – Data collected from a single TV channel; 200 clips, 40 of each category
      DB2 – Data collected across all broadcast channels; 1659 clips: Adv. 226, Cartoon 208, Cricket 318, Football 600, News 306
  • 20. Classification results for spectral features-based system [1] (% of clips correctly classified)

        Audio class      Spectral features-based    LP residual-based
                         DB1       DB2              DB1       DB2
        Advertisement    85.00%    65.00%           83.00%    43.50%
        Cartoon          90.00%    75.00%           88.00%    45.50%
        Cricket          90.00%    65.00%           86.00%    38.50%
        Football         92.50%    40.00%           90.50%    75.60%
        News             87.50%    65.30%           85.50%    63.30%
        Average          89.00%    62.06%           86.60%    53.26%

      Ref. [1] Gaurav Aggarwal, Features for Audio Indexing, M.Tech report, CSE Dept., IIT Madras, Apr. 2002
  • 21. Classification results from source and spectral features-based systems (Venn diagram)
      – A: all test audio clips (DB2)
      – System 1: clips recognised using the spectral features-based system
      – System 2: clips recognised using the excitation source (LP residual) based system
  • 22. Results of combined (subsegmental and segmental) system for DB2 (% of clips correctly classified)

        Audio class      Spectral-   LP residual-   Abstract-level   Rank + measurement-
                         based       based          combination      level combination
        Advertisement    65.00%      43.50%         83.00%           92.47%
        Cartoon          75.00%      45.50%         92.00%           98.55%
        Cricket          65.00%      38.50%         87.50%           88.67%
        Football         40.00%      75.60%         87.00%           91.16%
        News             65.30%      63.30%         86.30%           95.10%
        Average          62.06%      53.26%         87.25%           93.18%
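Slide 22 reports abstract-level and rank-plus-measurement-level combinations but does not give the combination rule. One simple measurement-level scheme, shown purely as an illustration and not as the method used here, is a weighted sum of normalised per-class scores from the two systems:

```python
def combine_scores(scores_a, scores_b, w=0.5):
    """Measurement-level combination: weighted sum of normalised per-class
    scores from two classifiers; the largest combined score wins."""
    def normalise(scores):
        total = sum(scores.values()) or 1.0
        return {c: s / total for c, s in scores.items()}
    na, nb = normalise(scores_a), normalise(scores_b)
    combined = {c: w * na[c] + (1 - w) * nb[c] for c in na}
    return max(combined, key=combined.get), combined
```

Such a sum rule lets one system's confident decision outweigh the other's weak one, consistent with the intuition behind the improved combined DB2 results.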
  • 23. Suprasegmental information in Hilbert envelope of LP residual of audio signal
  • 24. Suprasegmental information in LP residual for audio clip classification: autocorrelation samples of the Hilbert envelope of the LP residual for the 5 audio classes
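The autocorrelation sequence behind slide 24 can be illustrated with a plain normalised autocorrelation. Computing the Hilbert envelope itself needs an analytic-signal (FFT-based) step that is omitted here, so the input below is assumed to be an already-computed envelope:

```python
def autocorrelation(x, max_lag):
    """Normalised autocorrelation sequence r[0..max_lag], with r[0] = 1.
    Peaks at nonzero lags reveal periodicity, the suprasegmental cue."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    r0 = sum(v * v for v in xc) or 1.0
    return [sum(xc[i] * xc[i + k] for i in range(n - k)) / r0
            for k in range(max_lag + 1)]
```

A periodic envelope produces a strong autocorrelation peak at its period, which is the kind of class-specific structure the slide compares across the five categories.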
  • 25. Statistics of autocorrelation sequence. Correction: these are statistics of the autocorrelation sequence peaks of the Hilbert envelope (not of the LP residual)
  • 26. Statistics of autocorrelation sequence
  • 27. Scope of future work
      – Extending the framework to other audio indexing applications
      – Exploring methods to add suprasegmental information to the combined system (though far away..)
      – Building a multimedia indexing system
  • 28. Summary and conclusions
      – Need to organize audio data because of its large volume and its use in real-life applications
      – Presence of audio-specific information in the LP residual
      – AANN models' ability to capture subsegmental information from the residual for the task
      – Use of MLP for decision making on the information captured by the AANN models
      – Complementary nature of source information to the system information
      – Presence of audio-specific suprasegmental information in the LP residual
  • 29. Major contributions
      – Extraction of audio-specific information from the LP residual using neural network models
      – Showing the complementary nature of source and system information for the audio clip classification task
      – Showing the presence of audio-specific suprasegmental information in the LP residual
  • 30. References
      1. T. Zhang and C.-C. J. Kuo, "Content-based classification and retrieval of audio," in Conference on Advanced Signal Processing Algorithms, Architectures, and Implementations VIII, San Diego, California, July 1998, vol. 3461 of Proc. of SPIE.
      2. J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen, R. Schwartz, and A. Srivastava, "Speech and language technologies for audio indexing and retrieval," in Proc. of the IEEE, 88(8), pp. 1338-1353, 2000.
      3. Y. Wang, Z. Liu, and J. Huang, "Multimedia content analysis using audio and visual clues," IEEE SP Magazine, 17(6), Nov. 2000.
      4. M.A. Kramer, "Nonlinear principal component analysis using autoassociative neural networks," AIChE Journal, vol. 37, pp. 233-243, Feb. 1991.
      5. J. Makhoul, "Linear prediction: A tutorial review," in Proc. IEEE, vol. 63, pp. 561-580, 1975.
      6. B. Yegnanarayana, S.R.M. Prasanna, and K.S. Rao, "Speech enhancement using excitation source information," in Proc. Int. Conf. Acoust., Speech, Signal Processing, Orlando, FL, USA, May 2002.
      7. S.R.M. Prasanna, Ch.S. Gupta, and B. Yegnanarayana, "Autoassociative neural network models for speaker verification using source features," in Proc. Sixth Int. Conf. Cognitive Neural Systems, Boston University, Boston, USA, May-June 2002.
      8. B. Yegnanarayana, Artificial Neural Networks, Prentice Hall of India, New Delhi, 1999.
  • 31. Related publications
      1. Anvita Bajpai and B. Yegnanarayana, "Audio Clip Classification using LP Residual and Neural Networks Models," European Signal and Image Processing Conference (EUSIPCO-2004), Vienna, Austria, 6-10 September 2004.
      2. Anvita Bajpai and B. Yegnanarayana, "Exploring Features for Audio Indexing using LP Residual and AANN Models," accepted for the 17th International FLAIRS Conference (FLAIRS-2004), Miami Beach, Florida, 17-19 May 2004.
      3. Anvita Bajpai and B. Yegnanarayana, "Exploring Features for Audio Clip Classification using LP Residual and Neural Networks Models," International Conference on Intelligent Signal and Image Processing (ICISIP-2004), Chennai, India, 4-7 January 2004.
      4. Gaurav Aggarwal, Anvita Bajpai and B. Yegnanarayana, "Exploring Features for Audio Indexing," Indian Research Scholar Seminar (IRIS-2002), Indian Institute of Science, Bangalore, India, March 2002.
  • 32. The following are extra slides, not part of the main presentation
  • 33. Effect of the number of epochs used for AANN training: confidence scores output by the 6 AANN models for a news test clip
  • 34. Even well-trained humans don't always react the way they were trained.
      – Source: www.computer.org/computer/homepage/0103/random/r1014.pdf, by Bob Colwell
  • 35. Classification of audio using spectral features; extraction of features based on:
      – Volume: standard deviation and dynamic range of volume, volume undulation, 4 Hz modulation energy
      – Zero-crossing rate: standard deviation of ZCR, silence-to-nonsilence ratio
      – Pitch: pitch contour, pitch standard deviation, similar pitch ratio, pitch-to-nonpitch ratio
      – Spectrum: frequency centroid, bandwidth, ratio of energy in various frequency sub-bands
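Two of the simpler features on slide 35 can be sketched directly. The exact frame sizes and normalisations used in the reference system are not given, so these follow textbook definitions:

```python
def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ;
    high for noisy/unvoiced audio, low for voiced speech and tones."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def volume(frame):
    """Root-mean-square amplitude of the frame, a simple volume measure."""
    return (sum(v * v for v in frame) / len(frame)) ** 0.5
```

Per-clip statistics of these frame-level values (standard deviation of ZCR, dynamic range of volume, and so on) form the kind of feature vector the spectral system classifies.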
  • 36. Features for categorization of audio clips: 4 Hz modulation energy (panels: Cricket, Football, News)
  • 37. Features for categorization of audio clips (contd.): similar pitch ratio (panels: Cricket, Football, News)
  • 38. Importance of task-dependent features: standard deviation of ZCR (panels: Speaker 1, Speaker 2, Music)