
Face Detection Techniques and Emotional Speech Recognition Techniques



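  1. Part I: Face Detection Techniques; Part II: Emotional Speech Recognition Techniques. March 11, 2011 (Fri.). 김성호 (Sungho Kim), Department of Electronic Engineering, Yeungnam University. Brown Bag Seminar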
  2. Part I: Research on Face Detection Techniques (presented at IPIU 2011)
     - Motivation
  3. Proposed Object Representation Scheme: joint appearance and shape model
     - Viewpoint: (object center, scale) for a 2D object; 3D object pose for a 3D object
     - Figure/ground mask
     - Local appearance: appearance codebook, boundary shape, figure/ground information, part pose
  4. Visual Context in the Joint Appearance & Shape Model
     - How can these contextual cues be integrated? By combining bottom-up (BU) and top-down (TD) processing over spatial and hierarchical context:
       - Part–part context (bottom-up) → grouping property (weak neighbor support, strong neighbor support, cooperative)
       - Object–background context (top-down) → supporting contextually related categories
       - Part–whole context (bottom-up/top-down) → predicting figure-ground
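  5. Mathematical Formulation for Categorization (1/2)
     - Solution: category label (C), viewpoint (V), and mask (M)
     - Key issue: the prior is difficult to model because of its complex, high-dimensional structure
     - Our approach: use a graphical model, specifically a directed graphical model (Bayesian network), over appearance and pose
     - [Figure: Bayesian network with nodes V (viewpoint), M (figure-ground mask), F (codebook index), A (appearance), X (pose), and {C, B} (category or background), replicated over N parts, with top-down and bottom-up links; example graph with figure parts f1–f5 and background parts b1–b6 over nodes V, M, F, G, {C, B}.]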
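The slide's own equations did not survive extraction. As a hedged restatement only, assuming the goal is a joint MAP estimate of the three unknowns listed under "Solution" given observed image features $I$ (our notation, not the slides'), the problem can be written as follows; the slides' actual factorisation over F, A, X and the N part nodes is not recoverable from the text.

```latex
(\hat{C}, \hat{V}, \hat{M})
  = \arg\max_{C, V, M} P(C, V, M \mid I)
  = \arg\max_{C, V, M} P(I \mid C, V, M)\, P(C, V, M)
```

The factor $P(C, V, M)$ is the high-dimensional prior the slide flags as the key difficulty, which is what motivates factorising the joint distribution with a Bayesian network.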
  6. Learning for Distributed Category Representation
     - CC: category-specific codebook, used for top-down inference
     - UC: universal codebook, used for bottom-up inference
     - Representation: joint appearance and boundary with viewpoint (examples: car, airplane)
     - Issue: how to select the optimal codebook (CB) for category representation?
     - Previous constellation models use a fixed number of parts → they cannot handle large variations
     - Why distributed? → to handle large intra-class variations
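The slides do not say how the universal codebook (UC) or the category-specific codebooks (CC) are learned. A common way to build a visual codebook is to cluster local descriptors with k-means, so the sketch below assumes that; `build_codebook` is a hypothetical name, and the farthest-point initialisation is chosen only to keep the example deterministic. The returned labels play the role of codebook indices.

```python
import numpy as np

def build_codebook(descriptors, k, iters=50):
    """Hypothetical codebook builder: k-means over local descriptors.

    Returns (centers, labels): the k codewords and the codebook index of each descriptor.
    """
    X = np.asarray(descriptors, dtype=float)
    # Farthest-point initialisation: start at X[0], then repeatedly add the
    # descriptor farthest from all codewords chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assignment step: nearest codeword for each descriptor.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each codeword to the mean of its members
        # (keep the old codeword if it lost all members).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

In a distributed representation one would run this once on descriptors pooled over all categories (a UC-like codebook) and once per category (CC-like codebooks); the slides do not confirm this split.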
  7. Codebook Selection Reducing Surface Markings
     - Focus: which codebook entries reduce the effect of surface markings?
     - Our strategy:
       - Intermediate blurring
       - A statistical property → entropy
     - [Figure: repeatable part vs. surface-marking part]
  8. Entropy of Candidate Codebook
     - Low entropy → surface marking
     - High entropy → semantic parts
     - Finding: high-entropy codebook entries should be selected to reduce surface markings
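The slide does not specify which distribution the entropy is computed over. As a minimal sketch, assuming each candidate codebook entry is summarised by a hypothetical histogram of counts, entries can be scored by Shannon entropy and kept when they exceed a threshold, following the slide's finding that high-entropy entries correspond to semantic parts:

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as a list of non-negative counts."""
    total = sum(counts)
    # Empty bins contribute nothing (0 * log 0 is taken as 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def select_high_entropy(candidates, threshold):
    """Keep candidate entries whose histogram entropy is at least `threshold`.

    candidates: dict mapping a hypothetical entry id to its histogram of counts.
    """
    return [cid for cid, hist in candidates.items() if shannon_entropy(hist) >= threshold]
```

The threshold is a free parameter here; the slides give no value for it.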
  9. Inference Flow Related to the Category Model
     - Input image → dense features → matching to the universal codebook (UC) → grouping (similarity & proximity) with part–part context (estimated weights) → multi-modal viewpoint and multi-modal figure-ground mask → part–whole context with the category model → final result (e.g., car category)
     - Category model: universal codebook (UCB) and category-specific codebooks (CCB) for car, airplane, and background
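The grouping step is described only as "similarity & proximity". A minimal sketch, assuming each matched feature carries an image position and a descriptor, links two features when they are both spatially close and similar in appearance, then returns the connected components found with union-find; `group_features` and both thresholds are hypothetical:

```python
import math

def group_features(features, max_dist, max_desc_dist):
    """Group features by spatial proximity AND descriptor similarity.

    features: list of (x, y, descriptor) tuples, descriptor being a sequence of floats.
    Returns a sorted list of groups, each a list of feature indices.
    """
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(features)):
        xi, yi, di = features[i]
        for j in range(i + 1, len(features)):
            xj, yj, dj = features[j]
            close = math.hypot(xi - xj, yi - yj) <= max_dist   # proximity
            similar = math.dist(di, dj) <= max_desc_dist        # similarity
            if close and similar:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Requiring both cues keeps a nearby but differently-looking feature (such as background clutter) out of a part's group.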
  10. Demo of Categorization and Segmentation
  11. Category Detection: Caltech Face Dataset [DB1]
     - About the face DB
       - 435 face images with clutter
       - 468 background images
     - Learning
       - 15 randomly selected face images
       - 15 randomly selected background images
     - Test
       - 200 novel face images
       - 200 novel background images

     | Method | N train (unsegmented) | N train (segmented) | ROC EER (region error < 25%) |
     |---|---|---|---|
     | [Weber00] | 200 | 0 | 94.0% |
     | [Fergus03] | 220 | 0 | 96.4% |
     | [Shotton05] | 50 | 10 | 96.5% |
     | Ours | 0 | 15 | 97.3% |

     [DB1] [Weber00] M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," in Proc. ECCV, pp. 18–32, 2000.
     [Fergus03] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," in Proc. CVPR, 2003.
     [Shotton05] J. Shotton, A. Blake, and R. Cipolla, "Contour-based learning for object detection," in Proc. ICCV, 2005.
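The table reports performance at the ROC equal error rate (EER), the operating point where the false-accept rate equals the false-reject rate. The percentages appear, on our reading, to be the detection rate at that point, i.e. 1 − EER. A minimal sketch of estimating the EER from detector scores by sweeping a threshold, assuming higher scores mean "face" (the evaluation code behind the table is not given):

```python
def equal_error_rate(pos_scores, neg_scores):
    """Approximate EER by trying every observed score as a threshold.

    A sample is accepted as a face when its score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    best = None
    for t in thresholds:
        far = sum(s >= t for s in neg_scores) / len(neg_scores)  # background accepted
        frr = sum(s < t for s in pos_scores) / len(pos_scores)   # faces rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

For example, face scores `[0.9, 0.8, 0.7, 0.6]` against background scores `[0.1, 0.2, 0.3, 0.65]` give an EER of 0.25, which would be reported as 75% under the 1 − EER convention.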
  12. Examples of Face Detection
  13. Final Inference Result by Boosted MCMC
     - [Figure panels: test image; bottom-up viewpoints; bottom-up mask; hypothesized viewpoint; hypothesized mask; final inference result.]
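The slides do not describe how the boosted MCMC works. Assuming the usual data-driven MCMC idea, in which bottom-up hypotheses (such as the bottom-up viewpoints above) supply the proposal distribution and the top-down model scores them, a plain Metropolis–Hastings independence sampler looks like this; `mh_independence`, the state names, and the target are hypothetical:

```python
import math
import random

def mh_independence(log_target, states, proposal_probs, n_steps, seed=0):
    """Metropolis-Hastings with an independence proposal.

    log_target: state -> unnormalised log posterior (the top-down model score).
    states, proposal_probs: bottom-up hypotheses and their proposal probabilities q.
    """
    rng = random.Random(seed)
    q = dict(zip(states, proposal_probs))
    x = rng.choices(states, weights=proposal_probs)[0]  # start from a bottom-up guess
    samples = []
    for _ in range(n_steps):
        x_new = rng.choices(states, weights=proposal_probs)[0]
        # Independence-sampler acceptance ratio: p(x') q(x) / (p(x) q(x'))
        log_a = (log_target(x_new) + math.log(q[x])) - (log_target(x) + math.log(q[x_new]))
        if log_a >= 0 or rng.random() < math.exp(log_a):
            x = x_new
        samples.append(x)
    return samples
```

A good bottom-up proposal puts most of its mass near the posterior mode, so most proposals are accepted and the chain reaches plausible viewpoint and mask hypotheses quickly.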
  14. Test Results in Real Scenes (KAIST)
     - Note: the model is trained on the Caltech DB and tested on real images.
  15. Conclusions and Discussions
     - A joint appearance and boundary model with viewpoint is a suitable object model for object categorization in cluttered scenes.
     - Visual contexts (part–part, part–whole, object–background) help disambiguate figure and ground.
     - A Bayesian network can model both categorization and figure-ground segmentation.
     - Boosted MCMC can provide efficient inference for objects in clutter.
     - Future work
       - Modeling a more flexible figure-ground mask
       - Using boundary shape in the likelihood calculation
  16. Part II: Research on Emotional Speech Recognition Techniques - Introduction
     - Speech
       - A sequence of elementary acoustic symbols
     - Information in speech
       - Gender, age, accent, speaker identity, health, and emotion
     - Emotional speech recognition
       - Has recently received increasing attention
       - Convergence (interdisciplinary) project: supports quantitative analysis of anti-Korean sentiment
  17. Structure of Emotional Speech Recognition
     - Key components
       - Feature extractor: MFCC
       - Classifier: SVM or nearest class mean classifier
     - Output: recognized emotions
  18. Feature for Emotional Speech Recognition
     - Mel-frequency cepstral coefficients (MFCC)
       - Convey information about short-time energy in the frequency domain
     - Computation: signal → Fourier transform (frequency domain) → map the power spectrum onto the mel scale → take the log of the mel-band powers → take the discrete cosine transform (DCT) of the log mel powers; the final MFCCs are the amplitudes of the resulting spectrum
     - Mel scale: a frequency scale spaced according to the pitch differences humans can perceive
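The MFCC steps listed for this slide can be sketched in NumPy as follows. Frame length, hop, FFT size, and filter and coefficient counts are common speech-processing defaults assumed here (25 ms frames with a 10 ms hop at 16 kHz), not values from the slides, and `hz_to_mel` uses one common variant of the mel formula, m = 2595 log10(1 + f/700):

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula; other variants exist.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:                # pad very short inputs to one frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft       # short-time power spectrum
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T     # map onto the mel scale
    log_mel = np.log(mel_energy + 1e-10)                            # log of mel energies
    return dct_ii(log_mel, n_ceps)                                  # cepstral coefficients
```

For one second of 16 kHz audio this yields a (98, 13) matrix: one 13-coefficient vector per 25 ms frame.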
  19. Classifier: Support Vector Machine
     - [Figure: feature space]
     - Learning: find the optimal (maximum-margin) decision boundary in feature space
     - Recognition: classify new samples with the learned classifier
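The slides do not state the kernel or solver used for the SVM. As a minimal sketch, assuming a linear SVM, the block below trains binary classifiers by subgradient descent on the regularised hinge loss and combines them one-vs-rest to handle several emotions; in practice a library implementation such as LIBSVM would normally be used, and all names here are hypothetical:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Binary linear SVM (labels +1/-1) via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        margins = y * (X @ w + b)
        active = margins < 1                  # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)                # decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b

class OneVsRestSVM:
    def fit(self, X, y, **kw):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One binary SVM per emotion: that class (+1) vs. all others (-1).
        self.models_ = [train_linear_svm(X, np.where(y == c, 1.0, -1.0), **kw)
                        for c in self.classes_]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.stack([X @ w + b for w, b in self.models_], axis=1)
        return self.classes_[np.argmax(scores, axis=1)]  # highest-scoring class wins
```

A kernel SVM could behave quite differently on MFCC features, which may partly explain why the SVM and the nearest class mean classifier rank differently across the two databases.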
  20. Classifier: Nearest Class Mean
     - [Figure: feature space]
     - Learning: compute the mean feature vector of each class
     - Recognition: assign the class whose mean is nearest
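A nearest class mean classifier needs only the per-class mean vectors. The sketch below assumes each utterance has already been reduced to one feature vector (for example by averaging its frame-level MFCCs, which the slides do not specify); `NearestClassMean` and `recognition_rate` are hypothetical names:

```python
import numpy as np

class NearestClassMean:
    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Learning: one mean feature vector per emotion class.
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Recognition: squared Euclidean distance to every class mean, pick the nearest.
        d = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(d, axis=1)]

def recognition_rate(y_true, y_pred):
    """Fraction of test samples whose predicted emotion matches the label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Because it stores only one prototype per class, this classifier has almost nothing to overfit, which is one plausible reason it can beat a poorly tuned SVM on a small training set.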
  21. Exp. 1 on the EMO Database
     - Composition
       - 7 emotions (happy, angry, anxious, fearful, bored, disgusted, neutral)
       - 10 sentences
       - 10 voice actors (5 male, 5 female)
       - Language: German
     - [Figure examples: anger, happy, boredom]
  22. Recognition using the Nearest Class Mean Classifier
     - Learning: 150 randomly selected samples; test: 150 samples
     - Recognition rate: 47.0%
  23. Recognition using SVM
     - Recognition rate: 38.0%
     - The nearest class mean classifier outperforms the SVM on this data.
  24. Exp. 2: Train on German → test on Japanese
     - Surprise
     - Sadness
     - Joy
     - Recognition is unstable because of the differences between German and Japanese.
  25. Exp. 3: Train on Japanese → test on Japanese
     - DB composition: 5 emotions, 57 speech clips (episode 4 of "Clouds Above the Hill", 언덕 위의 구름)
     - Emotions: 'neutral', 'anger', 'happy', 'fright', 'sad'
  26. Recognition result using the nearest class mean classifier: 56.7%
  27. Recognition result using SVM: 86.6%
     - The SVM performs better on this data.
  28. Conclusions and Future Work
     - Conclusions
       - Developed MFCC feature extraction and classifiers (SVM, nearest class mean classifier).
       - Recognition of 7 German emotions reaches at most 47%.
       - Training on German and testing on Japanese gives very poor emotion recognition.
       - Training and testing on Japanese gives 86.6% emotion recognition.
     - Future work
       - Re-select emotion categories suited to "Clouds Above the Hill"
       - Acquire a larger DB and run more experiments
       - Derive and analyze overall emotion statistics for "Clouds Above the Hill"