Multi-Sensor Information Fusion and Correlation - Project presentation

In this research study, we developed a technique to analyze and correlate bimodal data sets using an energy-based fusion model, and further recognized the emotional component of these bimodal data sets using a Support Vector Machine classifier.

We have endeavored to map the audio and video features of the bimodal input onto a common energy scale.

The energy-based bimodal data fusion and emotion recognition model was implemented for the eNTERFACE database and three discrete emotions: happy, anger, and fear. The model shows 93.05% accuracy for subject-dependent data fusion and emotion recognition of the happy, anger, and fear emotions.

1. Multi-Sensor Information Fusion and Correlation
   A project by Priyanka Manchanda (9910103483) and Dr. Krishna Asawa, Jaypee Institute of Information Technology, Noida
2. AGENDA
   • Empirical Study
   • Problem Statement
   • Architectural Framework
   • Multi-Modal Data Fusion Tool
   • Experiments Conducted (Accuracy Calculation)
   • Approach to Solution (4 Modules)
   • Results
   • Applications
   • Conclusion
   • References
3. MULTI-MODAL DATA FUSION
   • Integrating multiple media along with their associated features or intermediate decisions
   • To perform a multimedia analysis task such as:
     ▫ semantic concept detection
     ▫ emotion recognition
     ▫ speaker detection
     ▫ human tracking
     ▫ event detection, etc.
4. ITSPOKE: An intelligent tutoring spoken dialogue system
5. SmartKom Architecture
6. Examples: an emotional kaleidoscope; the virtual agent Greta; E-Tree
7. PROBLEM STATEMENT
   To design and develop a technique to analyze and correlate bimodal data sets using an energy-based fusion model, and further recognize the emotional state from these bimodal data sets using a Support Vector Machine classifier.
   (Diagram: Video + Audio → Emotion)
8. PROBLEM STATEMENT
   (Diagram: multi-sensor, multi-modal data input → implementation of the energy-based fusion model → fused output)
9. THE BRAIN ENERGY MAPPING MODEL
   Rababaah, Aaron R. "Image-Based Multi-Sensor Data Representation and Fusion Via 2D Non-Linear Convolution." International Journal of Computer Science and Security (IJCSS) 6.2 (2012): 138.
10. TYPES OF FUSION
    • Feature Level Fusion
    • Decision Level Fusion
    • Hybrid Level Fusion
11. ENERGY MAPPING PHENOMENON
    (Diagram: modalities 1…n → features of each modality → energy computation → energy of each modality → fused energy → inference)
12. ARCHITECTURAL FRAMEWORK
13. MULTI-MODAL DATA FUSION TOOL
    To train the machine with data sets:

        MultiModalDataFusion.exe <path_to_input> <emotion_type> <path_to_train_file>

    Here, <emotion_type> can take the following values:
    • happy
    • anger
    • fear
    • neutral
    • sad
14. MULTI-MODAL DATA FUSION TOOL (OFFLINE)
    To test an unknown dataset with the trained machine model:

        MultiModalDataFusion.exe <path_to_input> test <path_to_train_file>

    To view the predicted emotion, open the file "result_final.txt". The file will contain one of the following classes:
    • 0 – Happy
    • 1 – Anger
    • 2 – Fear
    • 3 – Neutral
    • 4 – Sad
    View the file "result.txt" for the percentage of each emotion:

        <Happy per> <Anger per> <Fear per> <Neutral per> <Sad per>
15. MULTI-MODAL DATA FUSION TOOL (ONLINE)
16. EXPERIMENTS CONDUCTED
    • Energy-based bi-modal data fusion and emotion recognition model
    • Implemented for the eNTERFACE database and 3 discrete emotions:
      ▫ Happy
      ▫ Anger
      ▫ Fear
    • Accuracy ≈ 93% (subject dependent)
17. IMPLEMENTATION OF SOLUTION
18. MODULES
    • Module 1 – Sampling of Bi-Modal Input
    • Module 2 – Feature Extraction and Synchronization
    • Module 3 – Fusion via Energy Mapping
    • Module 4 – Emotion Recognition
19. Module 1 – Sampling of Bi-Modal Input
    Tools: ffprobe, ffmpeg
20. FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video.
    • -i input: the input file
    • -vn: disable video, keeping only the audio stream
    • -an: disable audio, keeping only the video stream
    • '-acodec codec (input/output)': set the audio codec
    • '-vcodec codec (output)': set the video codec
    Audio:

        ffmpeg.exe -i file_name -vn -acodec copy ./audio/output.wav

    Video:

        ffmpeg.exe -i file_name -vcodec copy -an ./video/output.mp4

    ffprobe (to compute the duration):

        ffprobe -i file_name -show_entries format=duration -v quiet -of csv="p=0" > duration.txt
21. Module 2 – Feature Extraction and Synchronization
22. BIMODAL INPUT PROCESSING
    Run two threads:

        hThreadArray[0] = CreateThread(
            NULL,                  // default security attributes
            0,                     // use default stack size
            VideoThread,           // thread function name
            NULL,                  // argument to thread function
            0,                     // use default creation flags
            &dwThreadIdArray[0]);  // returns the thread identifier

        hThreadArray[1] = CreateThread(
            NULL,                  // default security attributes
            0,                     // use default stack size
            AudioThread,           // thread function name
            NULL,                  // argument to thread function
            0,                     // use default creation flags
            &dwThreadIdArray[1]);  // returns the thread identifier
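    The slide shows only the creation of the two workers. A minimal sketch of the matching completion logic (the helper name is illustrative, not from the project source) has the main thread block until both workers finish and then release their handles:

        #include <windows.h>

        // Completes the pattern above: after CreateThread fills hThreadArray,
        // the main thread waits for both workers, then frees the handles.
        void JoinWorkerThreads(HANDLE hThreadArray[2])
        {
            // TRUE => wait until ALL handles are signalled; INFINITE => no timeout.
            WaitForMultipleObjects(2, hThreadArray, TRUE, INFINITE);
            CloseHandle(hThreadArray[0]);
            CloseHandle(hThreadArray[1]);
        }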
23. BIMODAL INPUT PROCESSING
    Synchronize using mutexes (a runnable sketch follows below):
    • Audio thread: Wait(Mutex_Audio) … Release(Mutex_Video)
    • Video thread: Wait(Mutex_Video) … Release(Mutex_Audio)
    • Main thread: Wait(Mutex_Sample), call audio thread, call video thread, Release(Mutex_Sample)
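    One caveat when turning this pseudocode into Win32 calls: a Win32 mutex may only be released by the thread that currently owns it, so the cross-thread Release pairs above map more naturally onto auto-reset events. A minimal sketch of the hand-off under that substitution (event names mirror the slide; the processing bodies are placeholders):

        #include <windows.h>

        // Each thread waits on its own event and signals the other's when done,
        // reproducing the alternating Wait/Release pattern from the slide.
        HANDLE Event_Audio = CreateEvent(NULL, FALSE, TRUE,  NULL); // auto-reset, starts signalled
        HANDLE Event_Video = CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset, starts blocked

        DWORD WINAPI AudioThread(LPVOID)
        {
            WaitForSingleObject(Event_Audio, INFINITE); // Wait(Mutex_Audio)
            // ... process one audio segment ...
            SetEvent(Event_Video);                      // Release(Mutex_Video)
            return 0;
        }

        DWORD WINAPI VideoThread(LPVOID)
        {
            WaitForSingleObject(Event_Video, INFINITE); // Wait(Mutex_Video)
            // ... process one video segment ...
            SetEvent(Event_Audio);                      // Release(Mutex_Audio)
            return 0;
        }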
24. Audio Processing Sub-Module
    (Diagram: Praat tool)
25. AUDIO SAMPLE PROCESSING
    Praat (also the Dutch word for "talk") is a free scientific software program for the analysis of speech in phonetics.
    Run the extract_audio.praat script:
    • Arguments – start time, end time
    • Output – segment of the audio sample (between start and end time)
    Run the audio_extract_features.praat script:
    • Arguments – start time, end time
    • Output – features (intensity) of the sample (between start and end time)
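    The slides name the scripts but not how they are launched. A hedged sketch of driving them from C++, assuming Praat is on the PATH and supports the --run command-line flag (the time arguments and helper name are illustrative):

        #include <cstdio>
        #include <cstdlib>

        // Runs the two Praat scripts named above for one audio segment.
        void RunPraatScripts(double startTime, double endTime)
        {
            char cmd[256];

            // Cut the audio segment between startTime and endTime.
            std::snprintf(cmd, sizeof(cmd),
                          "praat --run extract_audio.praat %f %f",
                          startTime, endTime);
            std::system(cmd);

            // Extract features (intensity) of the same segment.
            std::snprintf(cmd, sizeof(cmd),
                          "praat --run audio_extract_features.praat %f %f",
                          startTime, endTime);
            std::system(cmd);
        }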
26. Video Processing Sub-Module
    (Diagram: ffmpeg tool → DAFL library → extracted video features: facial coordinates)
27. VIDEO SAMPLE PROCESSING
    Extract a sample of 1 second:
    • '-ss position (input/output)': seek in the input file to position
    • '-t duration (output)': stop writing the output after its duration reaches duration (a number in seconds)
    • '-r fps': set the frame rate to fps frames per second
    • '-async samples_per_second': audio sync method

        ffmpeg -i ./video/output.mp4 -ss 1 -t 1 -r 20 -async 1 ./video/out1.mp4
28. DISCRETE AREA FILTERS (DAF) FACE DETECTOR LIBRARY
    • Step 1 – Define admissible pixels and their local neighborhoods of analysis.
    • Step 2 – Design a feature extractor which produces a collection of features for each admissible local neighborhood.
    Naruniec, Jacek, and Wladyslaw Skarbek. "Face detection by discrete Gabor jets and reference graph of fiducial points." In Rough Sets and Knowledge Technology (pp. 187-194). Springer Berlin Heidelberg, 2007.
29. DISCRETE AREA FILTERS (DAF) FACE DETECTOR LIBRARY
    • Step 3 – Design a classifier which decides whether the collection of features extracted from the given neighborhood of analysis could be face relevant.
    • Step 4 – Define a post-processing scheme which selects representative face-relevant points defining face locations.
30. DISCRETE AREA FILTERS (DAF) FACE DETECTOR LIBRARY
    Naruniec, Jacek, and Wladyslaw Skarbek. "Face detection by discrete Gabor jets and reference graph of fiducial points." In Rough Sets and Knowledge Technology (pp. 187-194). Springer Berlin Heidelberg, 2007.
31. Module 3 – Fusion via Energy Mapping
32. GRADIENT COMPUTATION
    Consider the 3x3 neighborhood of pixel E:

        A B C
        D E F
        G H I

    energy(E) = sqrt(xenergy^2 + yenergy^2)
    xenergy = a + 2d + g - c - 2f - i
    yenergy = a + 2b + c - g - 2h - i

    Each lowercase letter represents the brightness (sum of the red, blue, and green values) of the corresponding pixel. To compute the energy of edge pixels, pretend that the image is surrounded by a 1-pixel-wide border of black pixels (with 0 brightness).
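    A minimal C++ sketch of this formula, assuming the brightness values are already available in a row-major 2D array (the container and function names are illustrative):

        #include <cmath>
        #include <vector>

        // Gradient energy at pixel (x, y): energy = sqrt(xenergy^2 + yenergy^2),
        // with the 3x3 weights from the slide. Out-of-range pixels read as 0,
        // which implements the 1-pixel black border for edge pixels.
        double PixelEnergy(const std::vector<std::vector<double>>& brightness,
                           int x, int y)
        {
            const int h = (int)brightness.size();
            const int w = (int)brightness[0].size();
            auto at = [&](int r, int c) {
                return (r < 0 || r >= h || c < 0 || c >= w) ? 0.0 : brightness[r][c];
            };

            // Neighborhood layout: a b c / d e f / g hh i
            double a = at(y-1, x-1), b = at(y-1, x),  c = at(y-1, x+1);
            double d = at(y,   x-1),                  f = at(y,   x+1);
            double g = at(y+1, x-1), hh = at(y+1, x), i = at(y+1, x+1);

            double xenergy = a + 2*d + g - c - 2*f - i;
            double yenergy = a + 2*b + c - g - 2*hh - i;
            return std::sqrt(xenergy * xenergy + yenergy * yenergy);
        }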
33. GRADIENT COMPUTATION USING OPENCV
    Step 1 – Remove noise by blurring with a Gaussian filter:

        void GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY=0, int borderType=BORDER_DEFAULT)

        GaussianBlur(src, src, Size(3,3), 0, 0, BORDER_DEFAULT);

    Parameters:
    • src – input image
    • dst – output image of the same size and type as src
    • ksize – Gaussian kernel size
    • sigmaX – Gaussian kernel standard deviation in the X direction
    • sigmaY – Gaussian kernel standard deviation in the Y direction; if sigmaY is zero, it is set equal to sigmaX
    • borderType – pixel extrapolation method
34. GRADIENT COMPUTATION USING OPENCV
    Step 2 – Convert the image to grayscale:

        void cvtColor(InputArray src, OutputArray dst, int code, int dstCn=0)

        cvtColor(src, src_gray, COLOR_RGB2GRAY);

    Parameters:
    • src – input image
    • dst – output image of the same size and depth as src
    • code – color space conversion code
    • dstCn – number of channels in the destination image; if the parameter is 0, the number of channels is derived automatically from src and code
35. Step 3 – Apply the Laplace function:

        Mat abs_dst;
        Laplacian(src_gray, dst, ddepth, kernel_size, scale, delta, BORDER_DEFAULT);
        convertScaleAbs(dst, abs_dst);

        void Laplacian(InputArray src, OutputArray dst, int ddepth, int ksize=1, double scale=1, double delta=0, int borderType=BORDER_DEFAULT)

    Parameters:
    • src – source image
    • dst – destination image of the same size and the same number of channels as src
    • ddepth – desired depth of the destination image
    • ksize – aperture size used to compute the second-derivative filters
    • scale – optional scale factor for the computed Laplacian values; by default, no scaling is applied
    • delta – optional delta value added to the results prior to storing them in dst
    • borderType – pixel extrapolation method
36. GRADIENT COMPUTATION USING OPENCV

        convertScaleAbs(dst, abs_dst);

    Calculates absolute values and converts the result to 8-bit:

        void convertScaleAbs(InputArray src, OutputArray dst, double alpha=1, double beta=0)

    Parameters:
    • src – input array
    • dst – output array
    • alpha – optional scale factor
    • beta – optional delta added to the scaled values
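    The three OpenCV steps above can be chained into one helper. A sketch following the slides (the function name is illustrative; parameter values mirror the standard OpenCV Laplacian example rather than the project source):

        #include <opencv2/imgproc/imgproc.hpp>

        // Gaussian blur -> grayscale -> Laplacian -> absolute 8-bit result.
        cv::Mat LaplacianEnergyImage(const cv::Mat& src)
        {
            cv::Mat blurred, gray, dst, abs_dst;
            const int kernel_size = 3, scale = 1, delta = 0, ddepth = CV_16S;

            // Step 1 - remove noise with a 3x3 Gaussian filter.
            cv::GaussianBlur(src, blurred, cv::Size(3, 3), 0, 0, cv::BORDER_DEFAULT);

            // Step 2 - convert to grayscale.
            cv::cvtColor(blurred, gray, cv::COLOR_RGB2GRAY);

            // Step 3 - apply the Laplacian, then take absolute values as 8-bit.
            cv::Laplacian(gray, dst, ddepth, kernel_size, scale, delta,
                          cv::BORDER_DEFAULT);
            cv::convertScaleAbs(dst, abs_dst);
            return abs_dst;
        }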
37. Module 4 – Emotion Recognition
38. SUPPORT VECTOR MACHINE
    • In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyse data and recognize patterns, used for classification.
    • Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.
    Example:
    • The objects belong either to class GREEN or class RED.
    • The separating line defines a boundary: all objects on its right side are GREEN and all objects on its left are RED.
    • Any new object falling to the right is classified as GREEN (or as RED should it fall to the left of the separating line).
39. DESIGNING A MACHINE MODEL
    LIBSVM is an integrated software for support vector classification.
    Training the Support Vector Machine with the obtained data set:

        python checkdata.py energy.train   // check training data
        svm-train energy.train

    Data set format (a sketch of producing this file follows below):

        <label> 1:<Audio Energy> 2:<Video Energy>

    Labels: 1 – Happy, 2 – Anger, 3 – Fear, 4 – Neutral, 5 – Sad
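    A small sketch of writing energy.train in this format (the struct and function names are illustrative, not from the project source; only the file layout follows the slide):

        #include <fstream>
        #include <vector>

        // One fused observation: an emotion label (1-5 as above) plus the two
        // energy features, written as "<label> 1:<audio> 2:<video>" per line.
        struct EnergySample {
            int    label;        // 1 Happy, 2 Anger, 3 Fear, 4 Neutral, 5 Sad
            double audioEnergy;
            double videoEnergy;
        };

        void WriteLibsvmFile(const std::vector<EnergySample>& samples,
                             const char* path) // e.g. "energy.train"
        {
            std::ofstream out(path);
            for (const EnergySample& s : samples)
                out << s.label << " 1:" << s.audioEnergy
                    << " 2:" << s.videoEnergy << "\n";
        }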
40. CLASSIFY A SAMPLE
    Classify the data set into the appropriate emotion using the trained machine model:

        python checkdata.py energy.test   // check test data
        svm-predict energy.test energy.train.model energy.out

    Data set format:

        <label> 1:<Audio Energy> 2:<Video Energy>

    Label: a placeholder number not assigned to any class (anything except 1, 2, 3, 4, 5; say 9), since the true label is unknown.
    • Count the number of samples classified as Happy (1), Anger (2), Fear (3), Neutral (4), Sad (5).
    • Find the percentages and display the emotion with the maximum proportion (see the sketch below).
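    A sketch of that counting step, assuming svm-predict writes one predicted label (1-5) per line to energy.out (the helper name is illustrative):

        #include <fstream>
        #include <iostream>

        // Tallies predicted labels, prints each emotion's share, and reports
        // the emotion with the maximum proportion.
        void SummarisePredictions(const char* path) // e.g. "energy.out"
        {
            static const char* names[] = { "Happy", "Anger", "Fear", "Neutral", "Sad" };
            int counts[5] = {0}, total = 0, label;

            std::ifstream in(path);
            while (in >> label)
                if (label >= 1 && label <= 5) { ++counts[label - 1]; ++total; }
            if (total == 0) return;

            int best = 0;
            for (int i = 0; i < 5; ++i) {
                std::cout << names[i] << ": " << 100.0 * counts[i] / total << "%\n";
                if (counts[i] > counts[best]) best = i;
            }
            std::cout << "Predicted emotion: " << names[best] << "\n";
        }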
41. RESULTS
    • Total no. of samples = 648
    • Total no. of samples for each emotion = 216
    • Total no. of correctly recognised samples = 603
    • Accuracy = 603 / 648 ≈ 93%

    Confusion matrix (counts):

                 Happy   Anger   Fear
        Happy      200       9      7
        Anger        4     204      8
        Fear         4      13    199

    Confusion matrix (percentages):

                 Happy   Anger    Fear
        Happy   92.59%   4.17%   3.24%
        Anger    1.85%  94.44%   3.70%
        Fear     1.85%   6.02%  92.13%
42. FUTURE SCOPE / APPLICATIONS
    • Real-time emotion recognition
    • Expressive embodied conversational agent
    • Virtual counsellor
    • Virtual tutor
    • Questionnaires which analyse verbal and non-verbal behaviour
43. CONCLUSION
    • Contribution to computer vision and machine learning
    • Developed a command-line tool which uses energy-based bi-modal information fusion to train on, and test, the emotional state of a user
    • Obtained an accuracy of 93% for the eNTERFACE database
44. LINKS
    • Website: https://sites.google.com/site/bimodalfusion/
    • GitHub: https://github.com/pmanchanda/Multi-Modal-Data-Fusion
    • YouTube: http://youtu.be/FQeQIqIDx_Q
    • SlideShare: http://www.slideshare.net/pp11/multisensor-information-fusion-and-correl
45. REFERENCES
    [1] Picard, Rosalind W. Affective Computing. MIT Press, 2000.
    [2] Datcu, Dragos. Multimodal Emotion Recognition. Ph.D. thesis, TU Delft, 2009.
    [3] Rababaah, Aaron R. "Image-Based Multi-Sensor Data Representation and Fusion Via 2D Non-Linear Convolution." International Journal of Computer Science and Security (IJCSS) 6.2 (2012): 138.
    [4] Rababaah, Haroun, and Amir Shirkhodaie. "Energy Logic (EL): A Novel Fusion Engine of Multi-Modality Multi-Agent Data/Information Fusion for Intelligent Surveillance Systems." SPIE Defense, Security, and Sensing. International Society for Optics and Photonics, 2009.
46. REFERENCES
    [5] Naruniec, Jacek, and Wladyslaw Skarbek. "Face Detection by Discrete Gabor Jets and Reference Graph of Fiducial Points." In Rough Sets and Knowledge Technology (pp. 187-194). Springer Berlin Heidelberg, 2007.
    [6] Martin, Olivier, et al. "The eNTERFACE'05 Audio-Visual Emotion Database." Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on. IEEE, 2006.
47. THANK YOU