Emotion Recognition
Supervisor:
Dr. Celia Shahnaz
Professor, Department of EEE
Bangladesh University of Engineering and Technology
Presented By:
Rafat Jamal Tazim (1406068)
Faria Armin (1406151)
• Enhancing naturalness in human-machine interaction
• Speech-to-speech translation systems
• Pain monitoring in patients
• Detection and treatment of depression and anxiety
The eNTERFACE’05 Audio-Visual Emotion Database [1]
• Public database
• 6 emotions: Anger, Disgust, Fear, Happiness, Sadness, Surprise
• 42 subjects (81% men, 19% women)
• 1166 video sequences
[Block diagram: overall system. Video frames yield RGB, HSV and YCbCr histograms, HOS and IQA parameters; audio yields PLPC, MFCC and WPDC (followed by PLPC and MFCC). Both streams form the final feature, which is split into train and test subjects for classification and decision.]
[Block diagram: video pipeline. Blocks: video, silent frame removal, preprocessing, face detection, RGB/HSV/YCbCr histograms, HOS, IQA parameters, final feature, train/test subjects, classification, decision.]
• Silent frames do not contain any useful information
• They are removed using Short-Term Energy (STE) thresholding of the audio
• Removal decreases redundancy and computational complexity
• Removal increases accuracy and efficiency
[Figure: audio waveform segmented into alternating silent and speech regions]
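The STE thresholding step can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names, the non-overlapping frames and the threshold value are assumptions chosen for the example.

```python
def short_term_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)


def speech_frame_mask(samples, frame_len, threshold):
    """Return one boolean per non-overlapping frame: True if the frame's
    short-term energy exceeds the (hypothetical) threshold."""
    mask = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mask.append(short_term_energy(frame) > threshold)
    return mask


# Silence, a loud burst, then near-silence again.
audio = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.01] * 4
mask = speech_frame_mask(audio, frame_len=4, threshold=0.1)
```

Video frames whose time span falls in a region marked False would then be dropped.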
• A 3D median filter suppresses noise
• Unsharp masking sharpens the image
• No gray-scale conversion is needed
• Different color spaces are used for feature extraction
[Figure: example frame after filtering and after sharpening]
• The Viola-Jones algorithm is applied to each frame
• The face region (ROI) is segmented
• This minimizes the effect of the background
• The segmented image is resized to 100 × 100 × 3 pixels
[Figure: ROI extraction and resizing]
Why Is the Entire Face Region Considered?
• The concentration of hemoglobin and the oxygenation under the skin vary with changes in a person’s emotional and physical state
• Subtle changes in the hue and saturation components of skin color have been observed [2]
• The entire face is used instead of only the eye and mouth regions
• The increase in accuracy compensates for the added complexity
Higher Order Statistical (HOS) Features
• Segmented images are converted from the RGB plane to the HSV, Lab, Luv, YCbCr and NTSC color planes
• Three HOS (Kurtosis, Skewness, Variance) are taken from each of the channels of the different color planes
Why?
• HOS give a far smaller set of features that are
- Relevant
- Non-redundant
- Distinguishable
in comparison to typical statistics such as mean and standard deviation
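The three statistics can be computed per channel as sketched below. The slides do not say which normalization is used, so population (biased) moments and non-excess kurtosis are assumed here; the function name is illustrative.

```python
def higher_order_statistics(values):
    """Return (variance, skewness, kurtosis) of a flat list of channel values.

    Assumes population moments and non-excess kurtosis (a normal
    distribution gives 3). The input must not be constant, otherwise
    the variance is zero and the ratios are undefined.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m2, m3 / m2 ** 1.5, m4 / m2 ** 2


variance, skewness, kurtosis = higher_order_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
```

For a 100 × 100 channel, `values` would be the 10,000 pixel values of that channel flattened into one list.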
RGB Histogram Features
• Each channel is quantized to 8 levels
• 512 (8 × 8 × 8) RGB histogram features are obtained
HSV Histogram Features
• The Hue channel is quantized to 16 levels
• The Saturation and Value channels are each quantized to 8 levels
• 1024 (16 × 8 × 8) HSV histogram features are obtained
YCbCr Histogram Features
• Each channel is quantized to 8 levels
• 512 (8 × 8 × 8) YCbCr histogram features are obtained
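The feature counts follow from a joint histogram over the quantized channels. The sketch below assumes 8-bit channel values in 0..255 and uniform quantization, neither of which is stated in the slides; `joint_histogram` is a hypothetical helper name.

```python
def joint_histogram(pixels, levels, max_value=256):
    """Count pixels in a joint 3-channel histogram.

    pixels: iterable of (c0, c1, c2) integer tuples in [0, max_value).
    levels: number of quantization levels per channel, e.g. (8, 8, 8).
    Returns a flat list with levels[0] * levels[1] * levels[2] bins.
    """
    l0, l1, l2 = levels
    hist = [0] * (l0 * l1 * l2)
    for c0, c1, c2 in pixels:
        q0 = c0 * l0 // max_value  # uniform quantization of each channel
        q1 = c1 * l1 // max_value
        q2 = c2 * l2 // max_value
        hist[(q0 * l1 + q1) * l2 + q2] += 1
    return hist


rgb_hist = joint_histogram([(0, 0, 0), (255, 255, 255), (255, 255, 255)], (8, 8, 8))
hsv_bins = len(joint_histogram([], (16, 8, 8)))
```

With (8, 8, 8) levels the histogram has 512 bins, and with (16, 8, 8) it has 1024, matching the counts on the slide.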
Image Quality Analysis Parameter Extraction
• A smoothed version of the input image is used as the reference image
• A filter with a Gaussian kernel of 0.5 generates the smoothed version of the input image
• The quality difference between the two images is measured by the following parameters
- Structural content
- Mean square error
- Peak signal to noise ratio
- Normalized cross correlation
- Average difference
- Maximum distance
- Normalized absolute error
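The seven parameters can be sketched as below. The slides do not give formulas, so commonly used full-reference definitions are assumed, with the normalized terms divided by the input image; the function name, the dictionary keys and the peak value of 255 are also assumptions.

```python
import math


def iqa_parameters(image, reference, peak=255.0):
    """Seven quality measures between an image and its smoothed reference.

    Both arguments are flat lists of pixel values of equal length.
    The image must not be all zeros (it is used for normalization).
    """
    n = len(image)
    diff = [x - r for x, r in zip(image, reference)]
    mse = sum(d * d for d in diff) / n
    psnr = math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)
    image_energy = sum(x * x for x in image)
    cross = sum(x * r for x, r in zip(image, reference))
    return {
        "structural_content": image_energy / sum(r * r for r in reference),
        "mean_square_error": mse,
        "peak_signal_to_noise_ratio": psnr,
        "normalized_cross_correlation": cross / image_energy,
        "average_difference": sum(diff) / n,
        "maximum_distance": max(abs(d) for d in diff),
        "normalized_absolute_error": sum(abs(d) for d in diff) / sum(abs(x) for x in image),
    }


params = iqa_parameters([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 30.0, 44.0])
```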
Visual feature counts:
• HOS features: 15
• RGB histogram features: 512
• HSV histogram features: 1024
• YCbCr histogram features: 512
• IQA parameters: 7
[Block diagram: audio pipeline. Blocks: audio, silence removal, pre-emphasis, framing and windowing, PLPC/MFCC/WPDC, temporal smoothing, HOS, Teager operator, PLPC/MFCC, final feature, train/test subjects, classification, decision.]
• Each audio channel provides slightly different values
• The choice of channel can change the accuracy
• Left, Right and Mono (mean of the two channels) signals are used
• First-order high-pass filter with pre-emphasis coefficient $a = -0.9785$
• Balances the frequency spectrum
• Improves SNR
$H(z) = 1 + a z^{-1}$
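In the time domain the filter above is $y(n) = x(n) + a\,x(n-1)$. A minimal sketch, assuming the sample before the start of the signal is zero (the function name is illustrative):

```python
def pre_emphasis(samples, a=-0.9785):
    """Apply H(z) = 1 + a z^-1, i.e. y[n] = x[n] + a * x[n-1].

    The sample before the first one is assumed to be zero.
    """
    out = []
    previous = 0.0
    for x in samples:
        out.append(x + a * previous)
        previous = x
    return out


filtered = pre_emphasis([1.0, 1.0, 1.0])
```

A constant (zero-frequency) input is attenuated to 1 + a = 0.0215 after the first sample, which is the high-pass behavior the slide describes.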
• The audio is down-sampled from 48 kHz to 16 kHz
• The audio is segmented into
- 25 ms frames (400 samples)
- with 10 ms overlap (160 samples)
• Over intervals of 10 ms to 25 ms the signal is quasi-stationary
• Each frame is multiplied by a Hamming window
• Windowing reduces the Gibbs phenomenon at the frame edges
[Figure: down-sampling and Hamming window]
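Framing and windowing at 16 kHz can be sketched as follows. The 160 samples are read literally as the overlap between consecutive frames, which gives a hop of 240 samples; if the 10 ms is meant as the frame shift instead, pass `overlap=240`. The function names are illustrative.

```python
import math


def hamming(n_samples):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def frame_signal(samples, frame_len=400, overlap=160):
    """Split the signal into overlapping frames and apply a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    hop = frame_len - overlap
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


frames = frame_signal([1.0] * 1000)
```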
Perceptual Linear Predictive Coefficients (PLPC)
• Represent the way the human ear perceives frequency ranges
• Useful for extracting emotion-related information
• 12th-order (length 13) PLPCs are extracted by applying a Bark filter bank
Bark Frequency Conversion Formula:
$\mathrm{Bark} = 13 \tan^{-1}\left(\frac{0.76 f}{1000}\right) + 3.5 \tan^{-1}\left(\frac{f^2}{7500^2}\right)$
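The Bark conversion formula translates directly into code. A minimal sketch, with the frequency `f_hz` in Hz (the function name is illustrative):

```python
import math


def hz_to_bark(f_hz):
    """Bark = 13 * atan(0.76 * f / 1000) + 3.5 * atan(f^2 / 7500^2)."""
    return 13.0 * math.atan(0.76 * f_hz / 1000.0) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


# Bark values over the 0 to 8 kHz band available at a 16 kHz sampling rate.
bark_values = [hz_to_bark(f) for f in range(0, 8001, 500)]
```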
Mel-Frequency Cepstral Coefficients (MFCC)
• Mimic the non-linear way the human ear perceives sound
• More discriminative at lower frequencies and less discriminative at higher frequencies
• 13 MFCCs are extracted using 13 Mel-frequency band filters
Mel Frequency Conversion Formula:
$M = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$
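The Mel conversion and the placement of the 13 filters can be sketched as below. The slides do not give the band limits, so 0 Hz to 8000 Hz (the Nyquist frequency at 16 kHz) and triangular filters equally spaced on the Mel scale are assumptions; the function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """M = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters=13, f_low=0.0, f_high=8000.0):
    """Edge frequencies in Hz of n_filters triangular filters that are
    equally spaced on the Mel scale (n_filters + 2 points in total)."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


edges = mel_filter_edges()
```

Filter k would span `edges[k]` to `edges[k + 2]` with its peak at `edges[k + 1]`, so the filters are narrow at low frequencies and wide at high frequencies.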
Temporal Smoothing
• Temporal smoothing of length 3 (taking into account one previous and one following frame) is applied to each frame
• Removes any sudden changes in features due to noisy speech samples
Smoothed Feature Vector:
$x_{sma}(n) = \frac{1}{W} \sum_{i=-\frac{W-1}{2}}^{\frac{W-1}{2}} x(n+i)$
x(n) = Feature vector
W = 3, Smoothing window
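The moving average can be sketched for one feature coefficient tracked across frames. The slides do not say how the first and last frames are handled, so averaging over only the neighbors that exist is an assumption, as is the function name.

```python
def moving_average(track, window=3):
    """Centered moving average of odd length `window` over a list of
    per-frame values. At the edges the average uses only the frames
    that exist."""
    half = (window - 1) // 2
    smoothed = []
    for n in range(len(track)):
        lo = max(0, n - half)
        hi = min(len(track), n + half + 1)
        smoothed.append(sum(track[lo:hi]) / (hi - lo))
    return smoothed


smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0])
spike = moving_average([0.0, 0.0, 3.0, 0.0, 0.0])
```

The isolated spike of 3.0 is spread to 1.0 over three frames, which is the suppression of sudden changes described on the slide.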
Higher Order Statistics (HOS) Features
• Three HOS (Kurtosis, Skewness and Variance) are taken
• HOS of the MFCCs and PLPCs of each frame are taken
Wavelet Packet Decomposition
• Presents another scale of perceptual frequency range
• A three-level wavelet packet decomposition is performed
• Both Coiflet and Daubechies filters are used
[Figure: wavelet packet decomposition tree. Level 0: (0,0); Level 1: (1,0), (1,1); Level 2: (2,0), (2,1); Level 3: (3,0), (3,1). The coefficients of the bold-faced nodes are used.]
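The tree in the figure can be illustrated with a three-level decomposition in which the low-pass branch is split again at each level. This sketch substitutes the Haar filter pair for the Coiflet and Daubechies filters named on the slide, purely to keep the example short; which nodes feed the later stages is not recoverable from the text, so all nodes are returned. The function names are illustrative.

```python
import math


def haar_step(signal):
    """One Haar analysis step: returns (approximation, detail), each half
    the input length. An odd trailing sample is ignored."""
    s = 1.0 / math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail


def decompose_tree(signal, levels=3):
    """Return a dict mapping (level, index) to coefficients for the nodes
    shown in the tree: (0,0), (1,0), (1,1), (2,0), (2,1), (3,0), (3,1)."""
    nodes = {(0, 0): list(signal)}
    current = nodes[(0, 0)]
    for level in range(1, levels + 1):
        approx, detail = haar_step(current)
        nodes[(level, 0)] = approx
        nodes[(level, 1)] = detail
        current = approx  # only the low-pass branch is split further
    return nodes


nodes = decompose_tree([1.0] * 8)
```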
Teager Energy Operator
• Non-linear time-domain operator
• Removes any sudden changes in features due to noisy speech samples
Energy Operator:
$\psi[x(n)] = x^2(n) - x(n-1)\,x(n+1)$
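The operator needs one sample on each side, so a sketch of it returns two fewer values than it receives. The function name is illustrative.

```python
def teager_energy(samples):
    """psi[x(n)] = x(n)^2 - x(n-1) * x(n+1) for every interior sample."""
    return [samples[n] ** 2 - samples[n - 1] * samples[n + 1]
            for n in range(1, len(samples) - 1)]


energy = teager_energy([1.0, 2.0, 3.0, 4.0])
```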
PLPC and MFCC of WPD Coefficients
• PLPCs and MFCCs are extracted from each of the four sets of wavelet coefficients
• The combination of three perceptual frequency scales provides information regarding emotions at greater scales and in finer detail
Audio feature counts:
• MFCC: 3
• PLPC: 3
• Wavelet packet MFCC and PLPC: 208
Final feature:
• Audio features: 214
• Visual features: 2070
[Figure: per-emotion recognition accuracy (Anger, Disgust, Fear, Happiness, Sadness, Surprise) of the Cubic SVM classifier for audio, visual and audio-visual features]
[Figure: per-emotion recognition accuracy (Anger, Disgust, Fear, Happiness, Sadness, Surprise) of the Fine KNN classifier for audio, visual and audio-visual features]
Classifier: Cubic SVM
Kernel Function: Cubic
Validation: 5-fold cross-validation
• Speech emotion recognition: 84.8%
• Visual emotion recognition: 96.3%
• Audio-visual emotion recognition: 97.1%
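Five-fold cross-validation partitions the samples into five folds and uses each fold once for testing. The sketch below assumes a simple interleaved, unshuffled split over sample indices; how the presenters actually assigned samples or subjects to folds is not stated. The function name is illustrative.

```python
def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, test_indices) pairs. Sample i belongs to
    fold i % k; no shuffling is applied."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        splits.append((train, test))
    return splits


splits = k_fold_indices(10)
fold_sizes = [len(test) for _, test in k_fold_indices(1166)]
```

The reported accuracy would then be the average over the five test folds.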
Classifier: Fine KNN
No. of Neighbors: 1
Distance Metric: Euclidean
• Speech emotion recognition: 87.3%
• Visual emotion recognition: 91.9%
• Audio-visual emotion recognition: 93.8%
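With one neighbor and the Euclidean metric, classification reduces to returning the label of the closest training vector. A minimal sketch with made-up two-dimensional features and labels (the real feature vectors have 214, 2070 or 2284 entries); the function name is illustrative.

```python
import math


def predict_1nn(train_features, train_labels, query):
    """Return the label of the training vector closest to `query`
    in Euclidean distance (requires Python 3.8+ for math.dist)."""
    best_index = min(range(len(train_features)),
                     key=lambda i: math.dist(train_features[i], query))
    return train_labels[best_index]


# Toy data, invented for the example.
features = [(0.0, 0.0), (10.0, 10.0)]
labels = ["Anger", "Happiness"]
prediction = predict_1nn(features, labels, (1.0, 1.0))
```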
Author | Audio Recognition Rate | Visual Recognition Rate | Audio-Visual Recognition Rate
Datcu et al. | 55.9% | 37.70% | 56.30%
Paleari et al. | 35.00% | 25.00% | 67.00%
Mansoorizadeh et al. | 33.00% | 37.00% | 71.00%
Gajsek et al. | 62.90% | 54.70% | 71.30%
Wang et al. | 38.00% | 58.00% | 76.00%
Jiang et al. | 52.19% | 46.78% | 66.54%
Huant et al. | 48.40% | 54.85% | 61.10%
Zhalehpour et al. | 72.95% | 38.22% | 76.40%
Proposed Method | 84.8% | 96.3% | 97.1%
[Figure: bar chart of audio, video and audio-video recognition rates]
[1] O. Martin, I. Kotsia, B. Macq, and I. Pitas, “The eNTERFACE’05 Audio-Visual emotion database,” in ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops, 2006.
[2] G. A. Ramirez, O. Fuentes, S. L. Crites, M. Jimenez, and J. Ordonez, “Color analysis of facial skin: Detection of emotional state,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2014.