Multimodal Analysis of Stand-up Comedians
Audio, Video and Lexical Analysis
By: Yash Singh, Madhav Sharan, Sree Priyanka Uppu, Nandan PC, Harsh Fatepuria, Rahul Agrawal
This research project was done as part of USC's course CSCI 535: Multimodal Probabilistic Learning of Human Communication, under the guidance of Prof. Stefan Scherer.
Our team analyzed the performances of stand-up comedians by examining acoustic, visual, and linguistic features
for automatic humor recognition. We annotated stand-up comedy video segments as big, small, or
no laughter using a simple approach based on the product of laughter intensity and pitch values. Our
analysis helps determine the influence of three modalities on the audience's laughter: facial expressions (smile, certain
Action Units, emotion), acoustic features (pitch and intensity), and verbal features (pause timings, sentiment
of transcripts). For predictive analysis of humor we used standard supervised
learning classifiers such as boosted decision trees and Support Vector Machines (SVMs). Since humor is
sequential in nature, we also used Long Short-Term Memory (LSTM) networks for predictive
analysis.
3. Why stand-up comedians?
● We love watching stand-up
● They express a variety of emotions
● Feedback from the audience is available in the form of laughter
● Relatively new research domain
Motivation
4. H1 : Certain facial expressions could contribute to laughter
5. H2 : Pauses and word elongation contribute to laughter
Hypotheses
6. H3 : Voice modulation (pitch and intensity changes) can also play a crucial role
7. H4 : Laughter is sequential in nature, meaning small laughs could add up to bigger laughs.
8. • We collected 3 hours 46 minutes of data from ‘The Tonight Show
Starring Jimmy Fallon’ and ‘Late Night with Conan O’Brien’.
• 46 videos (11.76 GB), approximately 5 minutes each
• 27 male and 19 female artists
• The backdrop in the videos is dark
• For most of each video (80-90%), the artist faces the camera.
Data Collection
9. • For facial feature extraction, we manually blacked out the frames
in which the camera does not capture the artist’s face, which sets the
video features for those frames to 0.
• Audio features can also be 0, e.g., during a pause or while the
audience is laughing.
Pre-Processing
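A minimal sketch of this masking step. The slides describe blacking out frames in the video itself; this is an equivalent post-hoc version that zeroes the extracted features, assuming OpenFace's per-frame CSV with its 'success' flag (the file name is hypothetical):

import pandas as pd

def mask_missing_face(csv_path):
    # OpenFace writes one row per frame; 'success' is 1 when a face is tracked
    df = pd.read_csv(csv_path, skipinitialspace=True)
    feature_cols = [c for c in df.columns
                    if c not in ("frame", "face_id", "timestamp",
                                 "confidence", "success")]
    # Zero out all video features on frames with no tracked face
    df.loc[df["success"] == 0, feature_cols] = 0.0
    return df

frames = mask_missing_face("artist_01.csv")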
10. • Manually segment the videos based on punch lines
• Annotate the laughter level in each segment based on the product of
mean pitch and mean intensity:
o Big (55%-100%)
o Small (36%-55%)
o No laughter (0%-36%)
• A pitch range of 75 to 625 Hz yields a good sampling step of 10 ms
and covers a wide range of frequencies.
• The pitch of laughter varies across videos, so it is normalized to
the range [0, 1].
Data Annotation
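A minimal sketch of this annotation rule, assuming per-frame pitch and intensity values for each segment; the thresholds are the percentages above, and the exact scaling of the pitch-intensity product is our assumption:

import numpy as np

def minmax(x):
    # Normalize per video so laughter pitch/intensity lie in [0, 1]
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def laughter_level(pitch, intensity):
    # pitch, intensity: per-frame values (10 ms frames) for one segment,
    # already min-max normalized over the whole video
    score = np.mean(pitch) * np.mean(intensity)
    if score >= 0.55:
        return "big"
    if score >= 0.36:
        return "small"
    return "no laughter"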
11. openSMILE
• Extracted 5 low-level descriptors:
✧ Musical chroma features (tone)
✧ Prosody features (loudness and pitch)
✧ Energy (1)
✧ MFCC (13 coefficients, 0-12, from 26 Mel-frequency bands)
• All these features were captured at a frame step of 10 ms.
• Processing: aggregated the features by mean and standard deviation
for each segment
Feature Engineering - Audio
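A sketch of this per-segment aggregation, assuming the frame-level openSMILE output has been loaded into a pandas DataFrame with an added 'segment' id column:

import pandas as pd

def aggregate_segments(frames):
    # frames: one row per 10 ms frame, plus a 'segment' id column
    feature_cols = [c for c in frames.columns if c != "segment"]
    agg = frames.groupby("segment")[feature_cols].agg(["mean", "std"])
    # Flatten the (feature, statistic) column MultiIndex: 'pitch' -> 'pitch_mean'
    agg.columns = [f"{feat}_{stat}" for feat, stat in agg.columns]
    return agg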
12. OpenFace
• Extracted:
✧ Eye gaze direction vectors in world coordinates for both eyes
✧ Head location with respect to the camera (millimeters) and rotation (radians)
✧ 68 facial landmark locations in 2D pixel coordinates (x, y)
✧ 33 rigid and non-rigid shape parameters
✧ 11 Action Unit (AU) intensities and AU occurrences
• Processing: aggregated the features by mean and standard deviation
for each segment
Feature Engineering - Video
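The video features are aggregated the same way; a sketch that picks out the OpenFace column groups listed above by their usual prefixes, reusing aggregate_segments from the audio sketch ('frames' is assumed to be the masked OpenFace DataFrame with a 'segment' column added):

def select_openface_features(df):
    # Typical OpenFace column prefixes: gaze vectors, head pose,
    # 2D landmarks (x_*/y_*), shape parameters (p_*), and AUs
    prefixes = ("gaze_", "pose_", "x_", "y_", "p_", "AU")
    cols = [c for c in df.columns if c.startswith(prefixes)]
    return df[["segment"] + cols]

video_features = aggregate_segments(select_openface_features(frames))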
13. • Analyzed features such as Action Units, gaze (y and z directions), pose (head
rotation), various facial landmark points, frown, and eyebrow raise.
14. IBM Watson
● Pauses
● Last pause
● Word elongation
● Sentiments
Feature Engineering - Textual
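Watson Speech to Text can return per-word timestamps; a sketch of deriving the pause features from them, where the 0.3 s pause threshold and the duration-per-character elongation heuristic are our assumptions:

def pause_features(word_timestamps, min_pause=0.3):
    # word_timestamps: [[word, start_sec, end_sec], ...] as returned by
    # Watson Speech to Text when timestamps=true; assumes a non-empty list
    gaps = [nxt[1] - cur[2]
            for cur, nxt in zip(word_timestamps, word_timestamps[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    last_pause = gaps[-1] if gaps else 0.0  # silence before the final word
    # Crude elongation proxy: seconds of speech per character of the word
    elongation = max((end - start) / max(len(word), 1)
                     for word, start, end in word_timestamps)
    return {"num_pauses": len(pauses),
            "total_pause_time": sum(pauses),
            "last_pause": last_pause,
            "max_elongation": elongation}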
15. H1 : AU-related features
Feature Analysis - Visual
[Plots: AU 07 (Lid Tightener) and AU 14 (Dimpler)]
21. H1 : Certain facial expressions could contribute to laughter
H2 : Pauses and word elongation contribute to laughter
Results
22. H3 : Voice modulation (pitch and intensity changes) can also play a crucial role
XGBoost
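A minimal sketch of the XGBoost setup for the three laughter classes, assuming X holds the per-segment aggregated features and y the encoded labels; the hyperparameters are illustrative, not the project's:

from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# X: per-segment aggregated features, y: labels encoded 0=no, 1=small, 2=big
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = XGBClassifier(objective="multi:softprob", n_estimators=200,
                    max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))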
23. Early Fusion:
● Min. video frames = 100 frames/segment
● Min. audio frames = 30 frames/segment
● Min. text = 0 words/segment
● There is no good way of taking an equal number of frames from each
modality, which makes early fusion difficult (a padding sketch follows below)
H4 : Laughter is sequential in nature
LSTM - Challenge
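One common workaround for variable-length segments is zero-padding plus masking; a Keras sketch, where 'segments' is an assumed list of per-segment feature matrices and the layer sizes are illustrative:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Masking, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

# segments: list of per-segment feature arrays of shape (n_frames_i, n_features)
X = pad_sequences(segments, dtype="float32", padding="post")

model = Sequential([
    Masking(mask_value=0.0, input_shape=(X.shape[1], X.shape[2])),
    LSTM(64),
    Dense(3, activation="softmax"),  # big / small / no laughter
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Note that padding with 0 interacts with the pre-processing step, which also writes 0 for blacked-out frames; those real frames would be masked as well, so a distinct mask value may be preferable.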