Face Detection Techniques / Emotional Speech Recognition Techniques

1. Part I: Face Detection Techniques / Part II: Emotional Speech Recognition Techniques. March 11, 2011 (Fri). Kim Sung-ho (김성호), Department of Electronic Engineering, Yeungnam University. Brown Bag Seminar
2. Part I: Research on Face Detection Techniques [presented at IPIU 2011]
   - Motivation
3. Proposed Object Representation Scheme
   - Viewpoint: for a 2D object, (object center, scale); for a 3D object, the 3D object pose
   - Figure/ground mask: boundary shape and figure/ground information
   - Local appearance: appearance codebook and part pose
   Together these form the joint appearance and shape model.
4. Visual Context in the Joint Appearance & Shape Model
   - How do we integrate these contextual cues? Combine bottom-up and top-down spatial and hierarchical context:
     - Part-part context (bottom-up): grouping property, with weak vs. strong neighbor support
     - Object-background context (top-down): supports contextually related categories
     - Part-whole context (bottom-up/top-down): predicts figure-ground, cooperatively
5. Mathematical Formulation for Categorization (1/2)
   - Solution: Category label, Viewpoint, Mask
   - Key issue: modeling the prior is difficult because the space is complex and high-dimensional
   - Our approach: model appearance and pose with a graphical model, specifically a directed graphical model (Bayesian network); one possible factorization is sketched below
   [Figure: directed graphical model with nodes V (viewpoint), M (figure-ground mask), F (figure/ground labels f1, b1, ...), A (appearance/codebook index), X (observed features), and {C,B} (category vs. background), connected by top-down and bottom-up links over N local features.]
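The slide names the variables but not the equations; the following is a minimal sketch of the MAP objective and one directed factorization consistent with those node labels. The exact conditional structure is an assumption for illustration, not the model reported in the paper.

```latex
% Sketch only: MAP estimate of category C, viewpoint V, and mask M given observed
% features X, with an assumed Bayesian-network factorization over N local features
% (f_i: figure/ground label, a_i: codebook index, x_i: observed descriptor).
\begin{align*}
(\hat{C}, \hat{V}, \hat{M}) &= \arg\max_{C,V,M} P(C, V, M \mid \mathbf{X}) \\
P(C, V, M, F, A, \mathbf{X}) &= P(C)\,P(V \mid C)\,P(M \mid V, C)\,
  \prod_{i=1}^{N} P(f_i \mid M, V, C)\,P(a_i \mid f_i, C)\,P(x_i \mid a_i)
\end{align*}
```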
6. Learning for Distributed Category Representation
   - CC: category-specific codebook, used for top-down inference
   - UC: universal codebook, used for bottom-up inference
   - Joint appearance and boundary with viewpoint (example categories: Car, Airplane)
   - Issue: how do we select the optimal codebook (CB) for category representation? (a clustering sketch follows below)
   - Previous constellation models use a fixed number of parts and therefore cannot handle large variations.
   - Why distributed? To handle large intra-class variations.
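The slides do not say how the universal and category-specific codebooks are built, so the following is only a plausible sketch: k-means clustering of local appearance descriptors, pooled over all categories for the universal codebook (UC) and per category for the category-specific codebooks (CC). The clustering method and codebook sizes are assumptions for illustration.

```python
# Sketch of building a universal codebook (UC) and category-specific codebooks (CC)
# by k-means clustering of local appearance descriptors. All sizes are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def build_codebooks(descriptors_by_category, uc_size=200, cc_size=50):
    """descriptors_by_category: dict mapping category name -> (n_i, d) descriptor array."""
    # Universal codebook: cluster descriptors pooled over every category (bottom-up use).
    all_desc = np.vstack(list(descriptors_by_category.values()))
    uc = KMeans(n_clusters=uc_size, n_init=4, random_state=0).fit(all_desc)

    # Category-specific codebooks: cluster each category separately (top-down use).
    cc = {
        name: KMeans(n_clusters=cc_size, n_init=4, random_state=0).fit(desc)
        for name, desc in descriptors_by_category.items()
    }
    return uc, cc
```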
7. Codebook Selection for Reducing Surface Markings
   - Focus: which codebook can reduce the effect of surface markings?
   - Our strategy:
     - Intermediate blurring
     - A statistical property: entropy
   [Figure: examples of repeatable parts vs. surface-marking parts.]
8. Entropy of Candidate Codebook Entries
   - Low entropy → surface markings; high entropy → semantic parts
   - Finding: high-entropy codebook entries should be selected to reduce the effect of surface markings (sketch below).
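A minimal sketch of the selection rule stated above. The slide does not say which distribution the entropy is computed over, so the per-codeword occurrence histogram used here is an assumption; only the rule "keep high-entropy entries" is taken from the slide.

```python
# Entropy-based codeword selection (sketch). The histogram each codeword's entropy
# is computed over is assumed, not specified in the slides.
import numpy as np

def shannon_entropy(counts, eps=1e-12):
    """Shannon entropy (bits) of a non-negative count vector."""
    p = counts / max(counts.sum(), eps)
    return float(-(p * np.log2(p + eps)).sum())

def select_high_entropy_codewords(occurrence_hists, threshold):
    """occurrence_hists: (n_codewords, n_bins) array of per-codeword histograms.
    Returns indices of codewords whose entropy exceeds the threshold."""
    entropies = np.array([shannon_entropy(h.astype(float)) for h in occurrence_hists])
    return np.where(entropies > threshold)[0]
```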
9. Inference Flow Related to the Category Model
   [Figure: inference flow. Dense features from the input are matched to the universal codebook (UCB); matches are grouped by similarity and proximity; part-part context estimates weights and part-whole context links the groups to the category model (category-specific codebooks, e.g. Car, Airplane, background); the category model produces multi-modal viewpoint and figure-ground mask hypotheses, which are combined into the final result.]
10. Demo of Categorization and Segmentation
11. Category Detection: Caltech Face Dataset [DB1]
   - About the face DB: 435 face images with clutter, 468 background images
   - Learning: 15 randomly selected face images and 15 randomly selected background images
   - Test: 200 novel face images and 200 novel background images

   Method        N train (unsegmented)   N train (segmented)   ROC EER (region error < 25%)
   [Weber00]     200                     0                     94.0%
   [Fergus03]    220                     0                     96.4%
   [Shotton05]   50                      10                    96.5%
   Ours          0                       15                    97.3%

   [DB1] http://www.robots.ox.ac.uk/~vgg/data3.html
   [Weber00] M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition", In Proc. ECCV, pp. 18-32, 2000.
   [Fergus03] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale invariant learning", In Proc. CVPR, 2003.
   [Shotton05] J. Shotton, A. Blake, and R. Cipolla, "Contour-based learning for object detection", In Proc. ICCV, 2005.
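For reference, the table reports ROC equal-error-rate numbers; the sketch below shows one common way to locate the equal-error operating point (where the false-positive and false-negative rates cross) from detector scores. The scores and labels are illustrative, not data from these experiments.

```python
# Sketch of computing the ROC equal-error-rate (EER) operating point.
import numpy as np

def roc_eer(scores, labels):
    """scores: detector confidences; labels: 1 for face, 0 for background."""
    order = np.argsort(-scores)            # sort detections by decreasing confidence
    labels = labels[order]
    tp = np.cumsum(labels)                 # true positives accepted at each threshold
    fp = np.cumsum(1 - labels)             # false positives accepted at each threshold
    tpr = tp / labels.sum()
    fpr = fp / (len(labels) - labels.sum())
    fnr = 1.0 - tpr
    i = np.argmin(np.abs(fpr - fnr))       # threshold where FPR and FNR cross
    return (fpr[i] + fnr[i]) / 2.0

scores = np.array([0.9, 0.8, 0.75, 0.6, 0.4, 0.3, 0.2, 0.1])  # illustrative only
labels = np.array([1, 1, 0, 1, 0, 1, 0, 0])
print("EER:", roc_eer(scores, labels))
```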
12. Examples of Face Detection
13. Final Inference by Boosted MCMC
   [Figure: test image, bottom-up viewpoints, bottom-up mask, hypothesized viewpoint, hypothesized mask, and the final inference result produced by boosted MCMC.]
14. Test Results in a Real Scene (KAIST)
   - Note: the model is learned on the Caltech DB and tested on real images.
15. Conclusions and Discussion
   - A joint appearance and boundary model with viewpoint is a suitable object model for categorization in cluttered scenes.
   - Visual contexts (part-part, part-whole, and object-background) can disambiguate figure-ground.
   - A Bayesian network can model both categorization and figure-ground segmentation.
   - Boosted MCMC provides efficient inference for cluttered objects.
   - Future work:
     - Modeling a more flexible figure-ground mask
     - Using boundary shape in the likelihood calculation
16. Part II: Research on Emotional Speech Recognition - Introduction
   - Speech: a sequence of elementary acoustic symbols
   - Information in speech: gender, age, accent, speaker identity, health, and emotion
   - Emotional speech recognition:
     - Has recently received increased attention
     - Convergence project: helps with the quantitative analysis of anti-Korean sentiment
17. Structure of Emotional Speech Recognition
   - Key components:
     - Feature extractor: MFCC
     - Classifier: SVM or nearest-class-mean classifier
   - Output: recognized emotions
18. Feature for Emotional Speech Recognition
   - Mel-frequency cepstral coefficients (MFCC)
     - Convey short-time energy information in the frequency domain
   - Pipeline: signal → Fourier transform (frequency domain) → map the power spectrum onto the mel scale → take the log of the mel energies → final MFCC: amplitudes of the resulting spectrum (a sketch follows below)
   - Mel scale: the frequency spacing at which humans perceive equal differences in pitch
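A minimal sketch of this feature extraction step, assuming the librosa library and per-clip averaging of frame-level coefficients; the 13 coefficients and default frame settings are common choices, not values given in the slides.

```python
# MFCC extraction sketch: one averaged MFCC vector per speech clip.
import numpy as np
import librosa

def clip_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)                     # load the speech clip
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                # average over frames

# feature = clip_mfcc("clip_001.wav")   # hypothetical file name
```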
19. Classifier: Support Vector Machine
   - Learning: find the optimal separating classifier in feature space
   - Recognition: classify new samples with the learned classifier
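A minimal sketch of this classifier stage with scikit-learn, assuming per-clip MFCC vectors like those in the sketch above; the kernel, parameters, and placeholder data are illustrative, not settings reported here.

```python
# SVM classifier sketch for 7 emotion classes over 13-dimensional MFCC features.
import numpy as np
from sklearn.svm import SVC

X_train = np.random.rand(150, 13)            # placeholder MFCC feature vectors
y_train = np.random.randint(0, 7, size=150)  # placeholder emotion labels (7 classes)

clf = SVC(kernel="rbf", C=1.0)               # learning: fit the maximum-margin classifier
clf.fit(X_train, y_train)
pred = clf.predict(np.random.rand(10, 13))   # recognition: classify new feature vectors
```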
20. Classifier: Nearest Class Mean
   - Learning: compute each class mean in feature space
   - Recognition: assign the test sample to the nearest class mean
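The nearest-class-mean rule is simple enough to sketch directly in NumPy; the data shapes mirror the SVM sketch above and are illustrative only.

```python
# Nearest-class-mean classifier sketch: store one mean per class, pick the closest.
import numpy as np

def fit_class_means(X, y):
    """Learning: one mean feature vector per class label."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_nearest_mean(means, x):
    """Recognition: label of the class whose mean is closest to x."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

X_train = np.random.rand(150, 13)
y_train = np.random.randint(0, 7, size=150)
means = fit_class_means(X_train, y_train)
print(predict_nearest_mean(means, np.random.rand(13)))
```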
21. Experiment 1 on the EMO Database
   - Composition:
     - 7 emotion classes (happy, angry, anxious, fearful, bored, disgusted, neutral)
     - 10 sentences
     - 10 voice actors (5 male, 5 female)
     - Language: German
   [Audio examples: anger, happy, boredom.]
22. Recognition Using the Nearest Class Mean Classifier
   - Learning: 150 randomly selected samples; test: 150 samples
   - Recognition rate: 47.0%
23. Recognition Using SVM
   - Recognition rate: 38.0%
   - The nearest class mean classifier outperforms the SVM here.
24. Experiment 2: Train on German → Test on Japanese
   - Surprise
   - Sadness
   - Happiness
   - Recognition is unstable because of the differences between German and Japanese.
25. Experiment 3: Train on Japanese → Test on Japanese
   - DB composition: 5 emotions, 57 speech clips (from episode 4 of "Clouds Above the Hill" (언덕 위의 구름))
   - Emotion classes: 'neutral', 'anger', 'happy', 'fright', 'sad'
26. Recognition Results Using the Nearest Class Mean Classifier: 56.7%
27. Recognition Results Using SVM: 86.6%
   - Here the SVM performs better.
28. Conclusions and Future Work
   - Conclusions:
     - Developed MFCC feature extraction and recognizers (SVM, nearest class mean classifier)
     - Recognition of the 7 German emotions reaches at most 47%.
     - Training on German → recognizing Japanese emotions performs very poorly.
     - Training on Japanese → recognizing Japanese emotions reaches 86.6%.
   - Future work:
     - Re-select emotion classes suited to "Clouds Above the Hill" (언덕 위의 구름)
     - Collect a larger DB and run more experiments
     - Derive and analyze overall emotion statistics for "Clouds Above the Hill"
