This document discusses smart workflows for processing lecture recordings without human intervention. It describes using audio analysis to distinguish lecturer speech from noise to identify appropriate start and end trim points for recordings. The audio analysis results, including speech and non-speech segment durations, are placed in workflow properties that a second operation can use to automatically create an edit list file for processing the recordings.
3. ||
Which projector(s) is/are being used?
0, 1 or 2
If 2 projectors are being used, are they projecting the same thing?
15/02/2018 3
Dual-projector venues
5. ||
Bitrate
< 15 kbps is a black screen (maybe with a "no signal" message): drop
> 48.5 kbps is probably a presentation: keep
> 15 kbps but < 48.5 kbps? Not sure…
ffprobe blackdetect filter
first scale video to 160px wide at 1fps
Use filter to “detect video intervals that are (almost) completely black”
Calculate percentage of video that is empty
Drop the track if it’s > 90% empty
https://bitbucket.org/cilt/matterhorn_ansible/raw/master/templates/checkpresentations.py
https://bitbucket.org/cilt/matterhorn_ansible/raw/master/templates/videomatch.pl
How can we tell if a presentation stream is “empty”?
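The bitrate thresholds and the blackdetect percentage check above can be sketched in Python. The thresholds come from the slide; the regex assumes blackdetect's usual `black_start:… black_end:… black_duration:…` log lines, and the function names are illustrative, not the actual checkpresentations.py code.

```python
import re

def classify_by_bitrate(kbps):
    """Bitrate heuristic from the slide (kbps of the presentation track)."""
    if kbps < 15:
        return "drop"    # black screen, maybe a "no signal" message
    if kbps > 48.5:
        return "keep"    # probably a presentation
    return "unsure"      # fall through to blackdetect

# blackdetect log lines look like:
#   [blackdetect @ 0x...] black_start:0 black_end:12.4 black_duration:12.4
BLACK_RE = re.compile(r"black_start:[\d.]+ black_end:[\d.]+ black_duration:([\d.]+)")

def black_fraction(ffmpeg_log, total_duration_s):
    """Fraction of the video covered by (almost) completely black intervals."""
    black = sum(float(m.group(1)) for m in BLACK_RE.finditer(ffmpeg_log))
    return black / total_duration_s

def is_empty(ffmpeg_log, total_duration_s, threshold=0.9):
    """Drop the track if more than 90% of it is black (slide criterion)."""
    return black_fraction(ffmpeg_log, total_duration_s) > threshold
```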
6. ||
Bitrate
Bitrates differ by < 1%: assume they are the same: drop one
Bitrates differ by > 10%: assume they are different: keep both
Bitrates differ by > 1% but < 10%? Not sure…
ffmpeg signature filter
Downscale both videos to 256px wide at 1fps
Check to see if videos are binary-identical (md5 sum)
If different, use the ffmpeg signature filter to establish a similarity percentage
Drop one of the presentation streams if > 90% similar
ffmpeg -i video1.mp4 -i video2.mp4 -filter_complex "[0:v][1:v] signature=nb_inputs=2:detectmode=full" -map :v -f null -
Note: ffmpeg will sometimes segfault with this filter (https://trac.ffmpeg.org/ticket/6354)
How can we tell if two videos are “the same”?
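The bitrate comparison and the binary-identity check can be sketched as below. The 1%/10% thresholds and the MD5 step are from the slide; taking the relative difference against the larger of the two bitrates, and the function names, are illustrative assumptions.

```python
import hashlib

def bitrate_decision(kbps_a, kbps_b):
    """Decide whether two presentation streams carry the same content,
    based only on bitrate (slide thresholds)."""
    diff = abs(kbps_a - kbps_b) / max(kbps_a, kbps_b)
    if diff < 0.01:
        return "drop-one"   # assume they are the same
    if diff > 0.10:
        return "keep-both"  # assume they are different
    return "unsure"         # fall through to the signature filter

def same_file(path_a, path_b):
    """Binary-identity check via MD5 sum, as on the slide."""
    def md5(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()
    return md5(path_a) == md5(path_b)
```

Only the "unsure" case needs the (occasionally crashing) signature filter, which keeps the expensive comparison off the common path.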
7. ||
Mostly self-service recording studio
Set default username from venue booking calendar
(Office365 calendar)
Type in username and metadata (on keyboard)
RFID scanner (access card)
Clinical skills setting for Health Sciences student assessments and instructional videos
Touch screen: type in student ID (validated)
Hold up student ID card to camera (OCR)
RFID scanner
Auto-create a personal series for the user (download / edit)
Galicaster: who is the presenter?
12. ||
Goal: process recordings straight-through without human intervention, if:
• The recording is a timetabled course event
• Duration < 1 hour
• High confidence about trimming positions (“reasonable” start/end)
The existing Opencast silence detection workflow is not helpful for us, because all
venues have fall-back boundary microphones with DSPs that switch over automatically,
so there is always an audio signal.
Audio analysis
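The three criteria above amount to a simple gate in front of straight-through processing. A minimal sketch, assuming the trim confidence is a 0–1 score; the 0.9 threshold and the function name are illustrative (the slide only says "high confidence"):

```python
MAX_DURATION_MS = 60 * 60 * 1000  # slide criterion: duration < 1 hour

def can_auto_process(is_timetabled_course_event, duration_ms, trim_confidence,
                     min_confidence=0.9):
    """True only when all the straight-through criteria from the slide hold."""
    return (is_timetabled_course_event
            and duration_ms < MAX_DURATION_MS
            and trim_confidence >= min_confidence)
```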
13. ||
Description in MH-11767 (initially Voice Activity Detection)
Devan Govender trained an audio classifier (https://github.com/tyiannak/pyAudioAnalysis) with speech and silence/noise from a set of existing UCT recordings
With the model, the classifier distinguishes between lecturer speech and non-speech
So we can identify appropriate start and end trim points, and use a confidence level to decide between automatic and manual trimming
Distinguishing between speech and noise
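Deriving trim points from the classifier output can be sketched as below. This assumes the classifier yields time-ordered segments labelled speech or non-speech; the segment representation and function name are illustrative, not pyAudioAnalysis's API.

```python
def trim_points(segments):
    """Given time-ordered (start_ms, end_ms, label) segments from the
    classifier, return (trim_start, trim_end) spanning the first to the
    last speech segment, or None if no speech was detected."""
    speech = [(start, end) for start, end, label in segments if label == "speech"]
    if not speech:
        return None
    return speech[0][0], speech[-1][1]
```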
15. ||
Results of the audio analysis are placed into workflow properties:
audio_trim_duration=7200182
audio_trim_segments=0;3522000;3593000;4508000;4743000;5171000;5247000;5258000
audio_trim_segments_no=623
audio_trim_segments_speech_no=311
audio_trim_segments_speech_ms=2862000
audio_trim_segments_notspeech_no=312
audio_trim_segments_notspeech_ms=4338000
audio_trim_exec_time=110.916
The second workflow operation creates a SMIL file, either for manual editing or for automatic
processing (skipping the editor and processing the SMIL as if it had been edited)
Workflow operation results
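A second operation consuming these properties first has to read them back. A minimal sketch of parsing the key=value lines into a dict; the function name is illustrative and not an Opencast API:

```python
def parse_properties(text):
    """Parse key=value workflow properties (one per line) into a dict
    of string values; callers convert to int/float as needed."""
    props = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props
```

As a sanity check on the example above, the speech and non-speech totals (2,862,000 + 4,338,000 = 7,200,000 ms) nearly sum to audio_trim_duration (7,200,182 ms).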