Face Detection and Emotional Speech Recognition Techniques

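  1. Part I: Face Detection Techniques. Part II: Emotional Speech Recognition Techniques. Friday, March 11, 2011. Sungho Kim, Department of Electronic Engineering, Yeungnam University. Brown Bag Seminar.
  2. Part I: Face Detection Research [presented at the IPIU 2011 conference]. Motivation.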
  3. Proposed Object Representation Scheme: a joint appearance and shape model.
       • Viewpoint: for a 2D object, (object center, scale); for a 3D object, the 3D object pose
       • Figure/ground mask: boundary shape, figure/ground information
       • Local appearance: appearance codebook, part pose
  4. Visual Context in the Joint Appearance & Shape Model. How should these contextual cues be integrated?
       • Context types (BU+TD; spatial context and hierarchical context): part-part context (bottom-up), part-whole context (bottom-up/top-down), object-background context (top-down)
       • Roles: → grouping property; → supporting the contextually related category; → predicting figure-ground
       • Diagram labels: weak neighbor support, strong neighbor support, cooperative
  5. Mathematical Formulation for Categorization (1/2). Solution: category label (C), viewpoint (V), mask (M). Key issue: the prior is difficult to model because it is complex and high-dimensional. Our approach: use a graphical model over appearance and pose, specifically a directed graphical model (Bayesian net). [Figure: Bayesian net with nodes V (viewpoint), M (figure-ground mask), F, A, X, G, and {C,B} (codebook index) over N parts, with top-down and bottom-up links; example codebook entries f1 to f5 and b1 to b6.]
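The inference target named on this slide can be written as a maximum a posteriori estimate. The following is only the generic Bayes-rule form, assuming X denotes the observed image features; the slide does not spell out how the Bayesian net factorizes these terms, so no factorization is shown.

```latex
(\hat{C}, \hat{V}, \hat{M})
  = \arg\max_{C, V, M} \; p(C, V, M \mid X)
  = \arg\max_{C, V, M} \; p(X \mid C, V, M)\, p(C, V, M)
```

The prior p(C, V, M) is the term the slide describes as hard to model directly, which is the stated reason for turning to a directed graphical model.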
  6. Learning for Distributed Category Representation. CC: category-specific codebook for top-down inference. UC: universal codebook for bottom-up inference. Representation: joint appearance and boundary with viewpoint (examples: car, airplane). Issue: how to select the optimal codebook (CB) for category representation? The previous constellation model uses a fixed number of parts → it cannot handle large variations. Why distributed? → To handle large intra-class variations.
  7. Codebook Selection: Reducing Surface Markings. Focus: which codebook can reduce the effect of surface markings? Our strategy: intermediate blurring; statistical property → entropy. Figure labels: repeatable part, surface-marking part.
  8. Entropy of Candidate Codebook. Low entropy → surface marking. High entropy → semantic parts. Finding: high-entropy codebooks should be selected to reduce surface markings.
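The selection rule on this slide can be illustrated with Shannon entropy. This is a minimal sketch, not the author's procedure: the slide does not say which distribution the entropy is taken over, so the histograms below (for example, match counts of a candidate across bins), the function names, and the threshold are all hypothetical.

```python
from math import log2

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as non-negative counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs)

def select_high_entropy(histograms, threshold):
    """Indices of candidates whose entropy exceeds the (hypothetical) threshold."""
    return [i for i, h in enumerate(histograms) if shannon_entropy(h) > threshold]

# Hypothetical candidate histograms, for illustration only.
histograms = [
    [4, 0, 0, 0],  # all mass in one bin: entropy 0 bits
    [1, 1, 1, 1],  # uniform over four bins: entropy 2 bits
]
selected = select_high_entropy(histograms, threshold=1.0)
```

Under these assumptions only the second candidate is kept, matching the slide's preference for high-entropy codebook entries.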
  9. Inference Flow Related to the Category Model. Diagram labels: input; dense feature; matching to UC; grouping (similarity & proximity); part-part context (estimate weight); part-whole context; category model (CB, UCB, CCB; car, airplane, ..., background); car category; multi-modal viewpoint; multi-modal figure-ground mask; final result.
  10. Demo of Categorization and Segmentation
  11. Category Detection: Caltech Face Dataset [DB1]
       • About the face DB: 435 face images with clutter; 468 background images
       • Learning: randomly select 15 faces and 15 background images
       • Test: 200 novel face images and 200 novel background images

      Method        N train (unsegmented)   N train (segmented)   ROC EER (region error < 25%)
      [Weber00]     200                     0                     94.0%
      [Fergus03]    220                     0                     96.4%
      [Shotton05]   50                      10                    96.5%
      Ours          0                       15                    97.3%

      [DB1] http://www.robots.ox.ac.uk/~vgg/data3.html
      [Weber00] M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition", In Proc. ECCV, pp. 18-32, 2000.
      [Fergus03] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale invariant learning", In CVPR, 2003.
      [Shotton05] J. Shotton, A. Blake, and R. Cipolla, "Contour-based learning for object detection", In ICCV, 2005.
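The table reports performance at the ROC equal error rate (EER), the operating point where the false-positive and false-negative rates coincide. A minimal sketch of how such a figure can be computed from detector scores follows; the scores are made up for illustration, the function name is hypothetical, and reading the table's percentages as 1 - EER is an assumption rather than something the slide states.

```python
def equal_error_rate(pos_scores, neg_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    best_gap, eer = None, None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        # A sample is accepted as a face when its score is >= t.
        fnr = sum(s < t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer

# Illustrative scores only; these are not data from the slides.
face_scores = [0.9, 0.8, 0.7, 0.3]
background_scores = [0.6, 0.4, 0.2, 0.1]
eer = equal_error_rate(face_scores, background_scores)
```

With a finite test set the two error rates rarely cross exactly, so the sketch returns their average at the threshold where they are closest.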
  12. Examples of Face Detection
  13. Figure panels: test image; bottom-up viewpoints; bottom-up mask; hypothesized viewpoint; hypothesized mask; final inference result by Boosted MCMC.
  14. Test Results in Real Scene (KAIST). Note: the model uses the Caltech DB and is tested on real images.
  15. Conclusions and Discussions
       • A joint appearance and boundary model with viewpoint is a suitable object model for object categorization in cluttered scenes.
       • Visual contexts (part-part, part-whole, and object-background context) can resolve ambiguous figure-ground assignments.
       • A Bayesian net can model both categorization and figure-ground segmentation.
       • Boosted MCMC can provide efficient inference for cluttered objects.
       • Future work: modeling a more flexible figure-ground mask; using boundary shape in the likelihood calculation.
  16. Part II: Emotional Speech Recognition Research - Introduction
       • Speech: a sequence of elementary acoustic symbols
       • Information in speech: gender, age, accent, speaker's identity, health, and emotion
       • Emotional speech recognition: attention to this area has increased recently
       • Convergence project: supports quantitative analysis of anti-Korean sentiment
  17. Structure of Emotional Speech Recognition. Key components: feature extractor (MFCC) and classifier (SVM or nearest class mean classifier). Output: recognized emotions.
  18. Feature for Emotional Speech Recognition: Mel Frequency Cepstral Coefficients (MFCC), which convey information about short-time energy in the frequency domain. Steps: signal → Fourier transform (frequency domain) → map the power spectrum onto the mel scale → take the log of the power in each mel band → final MFCC: the amplitudes of the resulting spectrum (in the standard formulation, obtained by a discrete cosine transform of the log powers). Mel scale: a frequency scale spaced according to the intervals at which humans perceive pitch differences.
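The mel mapping and the final log-plus-transform steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code used in the experiments: it uses the common 2595 * log10(1 + f / 700) mel formula, an unnormalized DCT-II for the last step, and made-up band energies; the function names are hypothetical, and windowing, the Fourier transform, and the triangular filterbank are omitted.

```python
from math import cos, log, log10, pi

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595 * log10 formula)."""
    return 2595.0 * log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    """Edges (in Hz) of n_bands bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

def cepstrum_from_mel_energies(mel_energies, n_coeffs):
    """Log of each mel-band energy followed by an unnormalized DCT-II."""
    logs = [log(e) for e in mel_energies]
    n = len(logs)
    return [sum(v * cos(pi * k * (i + 0.5) / n) for i, v in enumerate(logs))
            for k in range(n_coeffs)]

# Illustrative values only: four bands up to 4 kHz and made-up band energies.
edges = mel_band_edges(n_bands=4, f_min=0.0, f_max=4000.0)
coeffs = cepstrum_from_mel_energies([1.0, 2.0, 4.0, 8.0], n_coeffs=3)
```

Because the edges are equally spaced in mels, the bands grow wider in Hz as frequency increases, which reflects the coarser pitch resolution of human hearing at high frequencies.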
  19. Classifier: Support Vector Machine (operates in feature space). Learning: finding the optimal classifier. Recognition: performed by the learned classifier.
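The learning and recognition phases named on this slide can be illustrated with a small linear SVM trained by subgradient descent on the hinge loss. This is a hedged sketch only: the slides do not say which SVM implementation, kernel, or parameters were used, and the two-class toy data, function names, learning rate, and regularization weight below are hypothetical. A multi-class emotion task would additionally need a scheme such as one-vs-rest.

```python
def train_linear_svm(features, labels, lr=0.1, lam=0.01, epochs=100):
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):  # each label must be +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Inside the margin: move toward classifying x correctly.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regularizer acts on w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict_svm(w, b, x):
    """Recognition: the sign of the learned decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data for illustration only.
train_x = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
           [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
train_y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_x, train_y)
```

Learning produces the weight vector and bias; recognition then only evaluates the decision function on a new feature vector.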
  20. Classifier: Nearest Class Mean (operates in feature space). Learning: finding the class means. Recognition: finding the nearest class.
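The two phases of the nearest class mean classifier are simple enough to sketch directly. The following assumes Euclidean distance and uses made-up two-dimensional feature vectors in place of real MFCC features; the function names and labels are hypothetical.

```python
def fit_class_means(features, labels):
    """Learning: average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_nearest_mean(means, x):
    """Recognition: return the class whose mean is closest to x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(means, key=lambda y: sq_dist(means[y], x))

# Toy features for illustration only.
train_x = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
train_y = ["neutral", "neutral", "anger", "anger"]
means = fit_class_means(train_x, train_y)
label = predict_nearest_mean(means, [6.0, 4.0])
```

The model stores one mean vector per class, so it has very few parameters to estimate from the training samples.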
  21. Exp. 1 on EMO Database
       • Composition: 7 emotion classes (happy, angry, anxious, fearful, bored, disgusted, neutral)
       • 10 sentences
       • 10 voice actors (5 male, 5 female)
       • Language: German
       • Example clips: anger, happy, boredom
  22. Recognition using Nearest Class Mean Classifier. Learning: 150 samples (randomly selected); test: 150 samples. Recognition rate: 47.0%.
  23. Recognition using SVM. Recognition rate: 38.0%. The nearest class mean classifier outperforms the SVM.
  24. Exp. 2: Train on German → test on Japanese. Emotions: surprise, sadness, joy. Recognition is unstable because of the differences between German and Japanese.
  25. Exp. 3: Train on Japanese → test on Japanese. DB composition: 5 emotions, 57 speech clips (from episode 4 of "Clouds Over the Hill" (언덕 위의 구름)). Labels: 'neutral', 'anger', 'happy', 'fright', 'sad'.
  26. Recognition result using the nearest class mean classifier: 56.7%.
  27. Recognition result using SVM: 86.6%. The SVM performs better.
  28. Conclusions and Future Work
       • Conclusions
           • Developed MFCC feature extraction and recognizers (SVM, nearest class mean classifier).
           • Recognition of the 7 German emotion classes reaches at most 47%.
           • Training on German and recognizing Japanese emotions performs very poorly.
           • Training on Japanese and recognizing Japanese emotions reaches 86.6%.
       • Future work
           • Reselect the emotion categories suited to "Clouds Over the Hill".
           • Collect a larger DB and run more experiments.
           • Derive and analyze overall emotion statistics for "Clouds Over the Hill".
