THE THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-20)
MULTIMODAL (MULTI-INPUT) EMOTION RECOGNITION
• Reason:
o Richer information: Cues from different modalities can augment or complement each other, and hence lead to more sophisticated inference algorithms.
o Robustness to sensor noise: Information from a modality captured through sensors can often be corrupted by signal noise, be missing altogether when that modality is not expressed, or fail to be captured because of occlusion, sensor artifacts, etc. Such modalities are called ineffectual. Ineffectual modalities are especially prevalent in in-the-wild datasets.
DATASET
• IEMOCAP (2008)
• CMU-MOSEI (2018)
DATASET
• Comparison between CMU-MOSEI and IEMOCAP
CHALLENGE
• Challenge:
o Deciding which modalities should be combined, and how
o Lack of agreement on the most efficient mechanism for combining (fusing) multiple modalities
TECHNIQUES
• Early fusion:
o Sikka et al. (2013): Multiple Kernel Learning for Emotion Recognition in the Wild
o Majumder et al. (2018)
• Late fusion:
o Gunes et al. (2007): Multimodal emotion recognition from expressive faces, body gestures
o Lee et al. (2018): Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data
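The difference between the two fusion families can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: the feature sizes, the random linear classifiers, and the averaging rule used for late fusion are all assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def early_fusion(features, weight):
    """Concatenate per-modality features, then apply ONE joint classifier."""
    joint = np.concatenate(features, axis=-1)
    return softmax(joint @ weight)

def late_fusion(features, weights):
    """Apply one classifier PER modality, then average the class probabilities."""
    probs = [softmax(f @ w) for f, w in zip(features, weights)]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
n_classes = 6
dims = {"text": 5, "speech": 4, "face": 3}   # hypothetical feature sizes
feats = [rng.normal(size=d) for d in dims.values()]

# Random weights stand in for trained classifiers (illustration only).
w_joint = rng.normal(size=(sum(dims.values()), n_classes))
w_each = [rng.normal(size=(d, n_classes)) for d in dims.values()]

p_early = early_fusion(feats, w_joint)
p_late = late_fusion(feats, w_each)
```

Early fusion lets the classifier see cross-modal interactions directly, while late fusion keeps each modality's model independent and only merges decisions.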
RELATED WORK
• Multimodal comparison
Dataset   | Work                   | Method                | Modalities                     | F1 score / MA
IEMOCAP   | Kim et al. (2013)      | Deep Belief Network   | Motion capture and audio video | 72.8%
IEMOCAP   | Yoon et al. (2019)     | Multi-hop attention   | Text and speech                | 77.6%
IEMOCAP   | Majumder et al. (2018) | -                     | Text, audio and video          | 76.5%
CMU-MOSEI | Zadeh et al. (2018)    | Dynamic Fusion Graph  | Language, vision and acoustic  | 76.3%
CMU-MOSEI | Lee et al. (2018)      | -                     | Text and speech                | 89% / 84.08%
CMU-MOSEI | Sahay et al. (2018)    | Tensor fusion network | Text and audio                 | 66.8%
SOLUTION
The general diagram of M3ER
MODALITIES CHECK
• Purpose: filter out ineffectual modalities to improve accuracy on real-world data
o Use Canonical Correlation Analysis (CCA) to compute the correlation score, ρ, of every pair of input modalities
o Compute the correlation score for each pair {f_i, f_j}
o Check each score against an empirically chosen threshold (τ)
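The check above can be sketched with NumPy as follows. This is a hedged illustration rather than the M3ER implementation: the QR-plus-SVD formulation of CCA, the use of only the largest canonical correlation as ρ, the synthetic features, and the value τ = 0.5 are all assumptions. It also assumes more samples than feature dimensions and full-rank feature matrices.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between two feature matrices.

    X, Y: arrays of shape (n_samples, dim_x) and (n_samples, dim_y).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis of the centred X columns
    Qy, _ = np.linalg.qr(Yc)   # orthonormal basis of the centred Y columns
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))

def modality_check(features, tau):
    """Return {(name_i, name_j): (rho, passes)} for every modality pair."""
    names = list(features)
    result = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            rho = first_canonical_correlation(features[names[a]],
                                              features[names[b]])
            result[(names[a], names[b])] = (rho, rho >= tau)
    return result

rng = np.random.default_rng(0)
text = rng.normal(size=(400, 4))
# Speech is built to correlate with text; face is pure noise (simulated corruption).
speech = text @ rng.normal(size=(4, 3)) + 0.05 * rng.normal(size=(400, 3))
face = rng.normal(size=(400, 4))

checks = modality_check({"text": text, "speech": speech, "face": face}, tau=0.5)
```

In this toy setup every pair involving the noisy face features falls below τ, which is the signal one would use to flag that modality as ineffectual.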
REGENERATING PROXY FEATURE VECTORS
• Purpose: reduce noise by regenerating proxy feature vectors for the modalities found to be ineffectual or missing
o Find v_j = argmin_j d(v_j, f_f), where d is any distance metric
o Compute constants a_i ∈ R by solving the following linear system:
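The linear system itself did not survive on the slide, so the sketch below rests on an assumed reading: the effectual feature is written as a weighted sum of its nearest training neighbours, sum_i a_i v_i = f, and the same weights are applied to the row-aligned training features of the ineffectual modality. The function name, the Euclidean distance, and k = 8 are hypothetical choices.

```python
import numpy as np

def proxy_feature(f_ok, V_train, U_train, k=8):
    """Regenerate a proxy for an ineffectual modality.

    f_ok    : (d_v,) test-time feature of a modality that passed the check
    V_train : (n, d_v) training features of that same modality
    U_train : (n, d_u) training features of the ineffectual modality,
              row-aligned with V_train
    k       : number of nearest neighbours (hypothetical choice)
    """
    dist = np.linalg.norm(V_train - f_ok, axis=1)   # Euclidean d(v_j, f)
    nearest = np.argsort(dist)[:k]
    V_k, U_k = V_train[nearest], U_train[nearest]
    # Assumed linear system: sum_i a_i * v_i = f_ok (least-squares solution).
    a, *_ = np.linalg.lstsq(V_k.T, f_ok, rcond=None)
    return a @ U_k                                  # proxy = sum_i a_i * u_i

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 3))          # hidden linear map between modalities
V = rng.normal(size=(200, 4))        # effectual modality, training set
U = V @ T                            # ineffectual modality, training set
f = rng.normal(size=4)               # effectual feature at test time

proxy = proxy_feature(f, V, U, k=8)
```

When the two modalities really are linearly related, as in this synthetic example, the proxy reproduces f @ T; on real data the relation is only approximate.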
MULTIPLICATIVE MODALITY FUSION
• Idea: explicitly suppress the weaker (less expressive) modalities, which indirectly boosts the stronger (more expressive) modalities
o The loss for the i-th modality:
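The loss formula is missing from the slide, so the following is only a sketch under an assumed form: each modality's cross-entropy term is scaled by the product of (1 - p_j)^(β/(M-1)) over the other modalities, a common multiplicative-combination weighting. M3ER's own loss may differ; `beta` and the example probabilities are hypothetical.

```python
import math

def multiplicative_losses(p_true, beta=2.0):
    """Per-modality losses under an assumed multiplicative weighting.

    p_true : probability each modality assigns to the correct class
    beta   : hypothetical strength of the down-weighting (0 disables it)
    """
    m = len(p_true)
    losses = []
    for i, p_i in enumerate(p_true):
        # The weight shrinks when the OTHER modalities are already confident.
        weight = math.prod((1.0 - p_j) ** (beta / (m - 1))
                           for j, p_j in enumerate(p_true) if j != i)
        losses.append(-weight * math.log(p_i))
    return losses

p = [0.9, 0.5, 0.1]                                # strong, medium, weak modality
losses = multiplicative_losses(p, beta=2.0)
plain = multiplicative_losses(p, beta=0.0)         # ordinary cross-entropy terms
```

With β = 0 the weighting disappears and each term is plain cross-entropy; with β > 0 the weak modality is penalised far less because another modality already predicts the correct class confidently.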
MODALITY COMBINATION
• Requirement:
o Be able to process sophisticated, data-driven datasets (CMU-MOSEI, YouTube, ...) that contain noise, occlusion, etc.
o Increase reliability
• Proposed combination:
o Use single-hidden-layer LSTMs, each with output dimension 32.
o Use multiplicative fusion to combine the three 32-dimensional feature vectors.
o Concatenate this feature vector with the final value of the memory variable; the resulting 160-dimensional feature vector is passed through a 64-dimensional fully connected layer followed by a 6-dimensional fully connected layer to generate the network outputs.
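The final stage (160 to 64 to 6) can be sketched as a plain forward pass. Only the three layer sizes come from the slide; the ReLU activation, the softmax output, the random weights, and the random stand-in for the 160-dimensional joint vector are assumptions, and the LSTMs and memory variable are not modelled here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """Fully connected layer: affine map x @ w + b."""
    return x @ w + b

# Layer sizes stated on the slide: 160 -> 64 -> 6 (weights are untrained).
W1, b1 = rng.normal(scale=0.1, size=(160, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 6)), np.zeros(6)

def classifier_head(z):
    """Map the 160-dim joint vector to 6 output scores."""
    h = np.maximum(dense(z, W1, b1), 0.0)      # ReLU is an assumption
    logits = dense(h, W2, b2)
    e = np.exp(logits - logits.max())
    return e / e.sum()                          # softmax is an assumption

z = rng.normal(size=160)    # stand-in for [fused features ; memory variable]
probs = classifier_head(z)
```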
EXPERIMENTS
• Feature extraction:
o Text (f_t): pre-trained 300-dimensional GloVe word embeddings
o Speech: acoustic features extracted with the COVAREP software (Degottex et al., 2014), including 12 Mel-frequency cepstral coefficients, pitch, voiced/unvoiced segmenting features, glottal source parameters, peak slope parameters and maxima dispersion quotients
o Face: a combination of face embeddings obtained from state-of-the-art facial recognition models, facial action units, and facial landmarks, for CMU-MOSEI
EVALUATION
LIMITATION
• Often confuses certain class labels with one another
• Human perception of emotion at a given instant has no absolute precision
• Adding context to emotion recognition may be considered
THANK YOU
ENA HO


Editor's Notes

  1. Classification pipeline of the proposed method. Once the visual and audio features are extracted, we build a radial basis function (RBF) kernel from each descriptor. We then use MKL to optimally combine the feature kernels as input to the SVM classifier.
  2. A direct way to learn about the relationship between these two feature vectors would be to use a shallow model, which is a simple concatenation of the two feature vectors. However, since the correlations between the feature vectors from speech and text are highly non-linear, it is difficult for a shallow model to properly learn multimodal representations. Therefore, we utilize trainable attention mechanisms to learn nonlinear correlations between these feature vectors. Attention mechanisms also help retain information in the time domain by forming a temporal embedding between the two feature vectors. 2: Using the cross-validation method to integrate
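The kernel-combination pipeline in note 1 can be sketched as below. This is an illustration under stated assumptions, not the cited authors' code: the descriptors are random, `gamma` is arbitrary, and the kernel weights are fixed by hand, whereas MKL would learn them jointly with the SVM.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """K[a, b] = exp(-gamma * ||x_a - x_b||^2) for one descriptor."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, weights):
    """Convex combination of kernels; MKL would LEARN these weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

rng = np.random.default_rng(0)
visual = rng.normal(size=(10, 6))   # hypothetical visual descriptors
audio = rng.normal(size=(10, 4))    # hypothetical audio descriptors

K = combine_kernels([rbf_kernel(visual, 0.1), rbf_kernel(audio, 0.1)],
                    weights=[0.7, 0.3])   # fixed, illustrative weights
```

The combined matrix K is what a kernel SVM would consume as a precomputed kernel.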