Environmental Sound Recognition with
CELP-based Features
EnShuo Tsau, Seung-Hwan Kim and C.-C. Jay Kuo
Dept. of Electrical...

ISSCS2011

  1. Environmental Sound Recognition with CELP-based Features
     EnShuo Tsau, Seung-Hwan Kim and C.-C. Jay Kuo
     Dept. of Electrical Engineering, University of Southern California
     Los Angeles, CA 90089-2564
     http://viola.usc.edu/
  2. Outline
     • Environmental Sound Recognition (ESR) and Challenge
     • Conventional Audio Features
     • Motivation and Proposed Solution
     • Experimental Results
     • Conclusion and Future Work
  3. Environmental Sound Recognition (ESR)
     • Environmental sound
       - Restaurants, streets, parks, airports and train stations, hallways, etc.
     • Environmental Sound Recognition (ESR)
       - Uses audio information to assist activities
       - Easy to store and process
       - Robotic navigation and human-computer interaction
       - Not affected by poor lighting or camera-angle problems
       - Other applications: surveillance, search and rescue
     • Challenges of ESR
       - Similar sounds
       - Multiple generating sources
       - Noise
       - Unlike speech and music: unstructured, difficult to build a model
  4. Conventional Audio Features
     • Conventional features
       - MFCCs, MFCC derivatives, sub-band energy, fundamental frequency, LPCCs, energy, zero-crossing rate, spectral centroid, bandwidth, matching pursuit (MP)
     • Problems with conventional features
       - MFCCs
         · Describe the shape of the overall spectrum
         · Work well only for structured sounds such as speech and music
         · Performance degrades in the presence of noise
       - MP
         · Works relatively well for both structured and unstructured sounds
         · Requires significant computation
  5. Motivation for CELP-based Features
     • Comparison with MFCCs and MP (feature sets compared: CELP, MFCCs, MP) on:
       - Preserve data: features and data (reversible)
       - Easy implementation: ITU-T G.723.1
       - Low complexity: real time
       - Compact feature
       - Classification rate: ESR
       - Different applications: speech, music
       - Potential side benefits: mobile applications (5.3/6.3 kbps), fixed point
     • Diagram labels: Bit streams, Information, Features
  6. Code Excited Linear Prediction (CELP)
     • M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937–940, 1985
     • Analysis-by-synthesis linear prediction
       - Short Term Prediction (STP): linear prediction coefficients
       - Long Term Prediction (LTP): pitch T
       - Residual description
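The short-term prediction step named on this slide can be illustrated with a minimal sketch. It assumes the common autocorrelation method with the Levinson-Durbin recursion; it is not the G.723.1 reference code, and the function names and the synthetic test frame are hypothetical.

```python
import math
import random

def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for one frame (no windowing, for brevity)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Convention (an assumption of this sketch): x[n] is predicted as
    sum_k a[k] * x[n - k]. Returns (coefficients a[1..order], error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame, e.g. silence
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= (1.0 - k_m * k_m)
    return a[1:], err

if __name__ == "__main__":
    # Hypothetical 240-sample frame: a tone plus a little noise.
    rng = random.Random(0)
    frame = [math.sin(2 * math.pi * n / 40) + rng.uniform(-0.1, 0.1)
             for n in range(240)]
    coeffs, err = levinson_durbin(autocorrelation(frame, 10), 10)
    print(len(coeffs), err)
```

A 10th-order predictor matches the LPC order listed for the proposed features; a real codec would also window the frame and quantize the coefficients, which this sketch omits.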
  7. Proposed CELP Features
     • 240 samples/frame; 4 subframes/frame
     • Available CELP features from bit streams
       - LPC (Linear Prediction Coefficients), 10th order, or LSF (Line Spectral Frequencies)
       - Pitch lag
         · Open loop
         · Closed loop, 20 ≤ p ≤ 147
       - GAIN of the two excitations
         · Pitch filter (5 tap)
         · Fixed codebook pulse
       - POS
         · Location and sign of the fixed codebook pulse
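The framing and pitch-lag numbers on this slide can be made concrete with a small sketch. It assumes a plain normalized cross-correlation search over the quoted lag range, which is a generic open-loop estimator and not the G.723.1 procedure; the function names are hypothetical.

```python
import math

FRAME_LEN = 240              # samples per frame (from the slide)
SUBFRAMES = 4                # subframes per frame (from the slide)
MIN_LAG, MAX_LAG = 20, 147   # lag range quoted on the slide

def split_subframes(frame):
    """Split one frame into SUBFRAMES equal parts (60 samples each for 240)."""
    size = len(frame) // SUBFRAMES
    return [frame[i * size:(i + 1) * size] for i in range(SUBFRAMES)]

def open_loop_pitch(history, frame):
    """Return the lag in [MIN_LAG, MAX_LAG] with the best normalized correlation.

    `history` holds at least MAX_LAG samples that precede `frame`.
    """
    x = list(history) + list(frame)
    start = len(history)
    best_lag, best_score = MIN_LAG, -1.0
    for lag in range(MIN_LAG, MAX_LAG + 1):
        num = sum(x[n] * x[n - lag] for n in range(start, len(x)))
        den = sum(x[n - lag] ** 2 for n in range(start, len(x)))
        # Only positive correlations count as pitch candidates.
        score = (num * num / den) if (den > 0.0 and num > 0.0) else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

if __name__ == "__main__":
    # Hypothetical periodic signal with a period of 80 samples.
    signal = [math.sin(2 * math.pi * n / 80) for n in range(MAX_LAG + FRAME_LEN)]
    history, frame = signal[:MAX_LAG], signal[MAX_LAG:]
    print(len(split_subframes(frame)), open_loop_pitch(history, frame))
```

In a CELP coder the closed-loop search then refines this lag per subframe; here the values would be read from the bit stream rather than recomputed.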
  8. Proposed Solution
     • Data preprocessing: normalization, cleaning
     • Feature extraction
       - CELP: 11 dim
       - MFCC: 21 dim (full bank)
     • Classification: classifier (Bayesian network)
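The three stages on this slide can be sketched end to end. The sketch below swaps in a Gaussian naive Bayes classifier, the simplest special case of a Bayesian network, because the slides do not give the network structure; z-score normalization is likewise an assumption, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_zscore(rows):
    """Per-dimension mean and standard deviation of the training features."""
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)   # guard constant dimensions
    return means, stds

def apply_zscore(row, means, stds):
    """Normalize one feature vector with the training statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def fit_naive_bayes(rows, labels, var_floor=1e-6):
    """Store a log prior plus per-dimension mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        dims = len(members[0])
        mean = [sum(r[d] for r in members) / len(members) for d in range(dims)]
        var = [max(sum((r[d] - mean[d]) ** 2 for r in members) / len(members),
                   var_floor) for d in range(dims)]
        model[label] = (math.log(len(members) / len(rows)), mean, var)
    return model

def predict(model, row):
    """Return the class label with the highest log posterior."""
    def log_posterior(params):
        log_prior, mean, var = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, mean, var))
    return max(model, key=lambda label: log_posterior(model[label]))
```

For the combined feature set, an 11-dimensional CELP vector and a 21-dimensional MFCC vector for the same frame would simply be concatenated before normalization.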
  9. Experimental Setup and Result
     • 10 classes:
       - Transportation (3): airplane, motorcycle and train
       - Weather (4): rain, thunder, wind and stream
       - Rural areas (2): bird, insect
       - Indoor (1): restaurant
     • Feature extraction
       - Modified standard code of ITU-T G.723.1
     • Classifier
       - Bayesian network
  10. Comparison of Features
     Classification rate (%) by feature set and class:

     Features       Airplane     Bird   Insect    Motor     Rain    Rest.   Stream  Thunder    Train     Wind  Overall
     PITCH              77.8     28.8      1.1     27.1      1.2     62.6     10.5        0     29.1     21.2     26.8
     GAIN               66.3      8.5       44     18.5       32      8.3      8.3      2.4     15.9     11.5     22.2
     LPC                85.4     96.3     99.6     89.8     99.1     63.7       98       77     74.1     98.5     88.5
     CELP+GAIN          88.7     96.8     99.6     90.4       99     77.8     97.6     79.5     81.6     98.7       91
     CELP+GAIN+POS      92.6     99.5     98.7     73.7     96.3     55.9       96       30       61       93     81.3
     MFCC               87.8       90     95.8     86.2     76.8     69.4       77     43.2     86.9      100     82.5
     CELP               88.4     96.8     99.6     90.4       99     77.9     97.7     78.8     81.3     98.7     91.2
     CELP+MFCC          92.3     97.7     99.5     95.5       99     87.5     98.7     85.4     93.4     99.9     95.2

     [Figure: bar chart "Comparison of Features"; classification rate (%) per class and overall for MFCC, CELP and CELP+MFCC; callouts: "Short Term and Long Term Prediction", "Speech like"]
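As a quick check on how far each feature set sits from the MFCC baseline, the overall rates in the table above can be compared directly. This is only arithmetic on the reported numbers; the helper name is hypothetical.

```python
# Overall classification rates (%) copied from the comparison table.
OVERALL = {
    "PITCH": 26.8, "GAIN": 22.2, "LPC": 88.5, "CELP+GAIN": 91.0,
    "CELP+GAIN+POS": 81.3, "MFCC": 82.5, "CELP": 91.2, "CELP+MFCC": 95.2,
}

def margin_over(baseline, rates):
    """Difference in percentage points between each feature set and the baseline."""
    return {name: round(rate - rates[baseline], 1)
            for name, rate in rates.items() if name != baseline}

if __name__ == "__main__":
    margins = margin_over("MFCC", OVERALL)
    print(margins["CELP"], margins["CELP+MFCC"])
```

CELP alone is 8.7 points above MFCC and CELP+MFCC is 12.7 points above it, which brackets the roughly 10-point margin quoted in the conclusion.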
  11. Confusion Matrix of CELP Features
     Classification rate (%):

     Class      Airplane     Bird   Insect    Motor     Rain    Rest.   Stream  Thunder    Train     Wind
     Airplane       88.4        –        –        –        –      1.9        –      0.2      5.1      4.4
     Bird              –     96.8        –      0.1        –      1.6      0.3      0.2      1.1        –
     Insect            –        –     99.6        –        –      0.4        –        –        –        –
     Motor           0.1        –        –     90.4        –      5.7        –      0.3      3.5        –
     Rain              –        –        –        –       99      0.3      0.4      0.1        –        –
     Rest.             1      2.2        –      8.1      0.1     77.9      1.4      2.6      6.8      0.1
     Stream            –      0.2        –        –      0.3        1     97.7      0.2      0.5        –
     Thunder         1.9      0.6      0.1        3      0.3      7.5      3.8     78.8      3.4      0.7
     Train           5.1      0.7        –        5      0.1      7.1      0.1      0.7     81.3        –
     Wind              –        –        –        –        –        –        –      1.3        –     98.7
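The confusion matrix can be scanned programmatically for the weakest class and the largest confusions. The sketch below assumes that each row is the true class, each column the predicted class, and that a dash means zero; none of these is stated on the slide.

```python
CLASSES = ["Airplane", "Bird", "Insect", "Motor", "Rain",
           "Rest.", "Stream", "Thunder", "Train", "Wind"]

# Values copied from the slide; dashes are entered as 0.0 (an assumption).
MATRIX = [
    [88.4, 0.0, 0.0, 0.0, 0.0, 1.9, 0.0, 0.2, 5.1, 4.4],
    [0.0, 96.8, 0.0, 0.1, 0.0, 1.6, 0.3, 0.2, 1.1, 0.0],
    [0.0, 0.0, 99.6, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 90.4, 0.0, 5.7, 0.0, 0.3, 3.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 99.0, 0.3, 0.4, 0.1, 0.0, 0.0],
    [1.0, 2.2, 0.0, 8.1, 0.1, 77.9, 1.4, 2.6, 6.8, 0.1],
    [0.0, 0.2, 0.0, 0.0, 0.3, 1.0, 97.7, 0.2, 0.5, 0.0],
    [1.9, 0.6, 0.1, 3.0, 0.3, 7.5, 3.8, 78.8, 3.4, 0.7],
    [5.1, 0.7, 0.0, 5.0, 0.1, 7.1, 0.1, 0.7, 81.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.3, 0.0, 98.7],
]

def weakest_class(matrix, classes):
    """Class with the lowest diagonal (correct classification) rate."""
    return min(classes, key=lambda c: matrix[classes.index(c)][classes.index(c)])

def top_confusions(matrix, classes, count):
    """Largest off-diagonal entries as (row class, column class, rate)."""
    pairs = [(classes[i], classes[j], matrix[i][j])
             for i in range(len(classes)) for j in range(len(classes)) if i != j]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:count]

if __name__ == "__main__":
    print(weakest_class(MATRIX, CLASSES))
    for row_cls, col_cls, rate in top_confusions(MATRIX, CLASSES, 3):
        print(row_cls, "->", col_cls, rate)
```

Under these assumptions the restaurant class is both the weakest class and the most frequent destination of errors from thunder and train, which fits the slide's point that multi-source sounds are hard to separate.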
  12. Principal Component Analysis
     [Figure: "Principal Component Analysis"; classification rate (%) versus number of dimensions (1 to 31) for CELP, MFCC and MFCC+CELP]
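The dimension sweep in this figure relies on projecting feature vectors onto their leading principal components. A minimal sketch of that projection follows; it uses power iteration with deflation so that it needs only the standard library, which is a choice made for this example and not something the slides specify. All names are hypothetical.

```python
import math
import random

def covariance(rows):
    """Return (means, sample covariance matrix) of the feature vectors."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    return means, cov

def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    dims = len(cov)
    work = [row[:] for row in cov]
    components = []
    for _component in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        components.append(v)
        # Remove the found direction so the next pass finds the next one.
        for i in range(dims):
            for j in range(dims):
                work[i][j] -= eigval * v[i] * v[j]
    return components

def project(row, means, components):
    """Coordinates of one feature vector in the reduced space."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
```

Sweeping `k` from 1 up to the full feature dimension and retraining the classifier on the projected vectors would give one point per dimension, as on the curve.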
  13. Speed and Complexity
     • Speed
       - Feature extraction: real time
       - Classification
         · Training: depends on the classifier/kernel
         · Testing: fast, negligible cost

     Avg run time   Training (sec)   Testing (sec)
     CELP                      659               8
     MFCC                      672               9
     CELP+MFCC                 912              10
  14. Summary of ESR topic
     • Conclusion
       - A novel set of CELP-based features is proposed by exploring the CELP bit stream information
       - MFCCs, which represent filter-bank energy, are not well suited to ESR
       - CELP and CELP+MFCC outperform MFCC by a margin of about 10 percentage points (Bayesian network classifier) on the ESR problem
         · Long- and short-term prediction
         · More robust with respect to background noise
       - CELP offers low complexity, easy implementation and extensibility
       - Recognition based on CELP features is desirable since the additional effort required for feature extraction is almost negligible
  15. Conclusion
     • A novel set of CELP-based features is proposed by exploring the CELP bit stream information
     • MFCCs, which represent filter-bank energy, are not well suited to ESR
     • CELP and CELP+MFCC outperform MFCC by a margin of about 10 percentage points (Bayesian network classifier) on the ESR problem
       - Long- and short-term prediction
       - More robust with respect to background noise
     • CELP offers low complexity, easy implementation and extensibility
     • Recognition based on CELP features is desirable since the additional effort required for feature extraction is almost negligible
  16. Future Work
     • Explore more features
     • Speaker recognition and identification
     • Longer-term signature capture
  17. 17. 17 Q&A Thanks Q & A
