This document presents a framework for automatically summarizing news videos and reports. It describes how news videos on trending topics are collected from major channels. Algorithms are developed for shot transition detection, speech recognition, and the detection of "talking head" shots and day/night shots. Video quality indicators are also extracted. The framework combines these analyses to generate a summary by identifying the key elements in the video.
Video summarization framework for newscasts and reports – work in progress
1. Video Summarization Framework for Newscasts and Reports – Work in Progress
Mikołaj Leszczuk
Michał Grega
Arian Koźbiał
Jarosław Gliwski
Krzysztof Wasieczko
Kamel Smaïli
2. Introduction
• 300 h of video uploaded to YouTube every min
• Average video length: 4 min & 20 s
• “We live in 140 characters era”
• How to assimilate main ideas carried by video?
• Best way: summarizing information
2017-11-17 2
3. Database of Video Sequences
• Focusing on summarization of newscasts & reports in our research
• Major news channels, such as Euronews, France24, BBC, Russia Today & Al Jazeera
• Data on trending topics
(based on Twitter)
• Topics:
• “Syria”
• “Real Madrid – FC Barcelona”
• “Animal rights”
• “Women’s rights”
• “Homosexual marriage”
• “Drug liberalization”
• “Death sentence”
• “Occupied territories”
• “Trump”
7. Metadata Extraction Algorithms
• Shot transition detection
• Speech recognition
• Detection of “talking head” shots
• Detection of day and night shots
• Video quality indicators
8. Shot Transition Detection
• Automated detection of transitions between shots in digital video
• Purpose: temporal segmentation of videos
• Based on PySceneDetect
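The idea of temporal segmentation can be illustrated with a minimal sketch. It is not PySceneDetect's implementation and does not call that library: it assumes each frame is already available as a flat list of grayscale intensities (0-255) and flags a hard cut whenever the mean absolute difference between consecutive frames exceeds a threshold. The function names and the threshold of 30.0 are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]  # assumed input: flattened grayscale intensities, 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel intensity change between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_cuts(frames: List[Frame], threshold: float = 30.0) -> List[int]:
    """Return every index i at which a new shot starts (hard cuts only).

    The threshold is an illustrative value, not one taken from the slides.
    """
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


def split_into_shots(n_frames: int, cuts: List[int]) -> List[Tuple[int, int]]:
    """Turn cut indices into (start, end) frame ranges, with end exclusive."""
    bounds = [0, *cuts, n_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Five tiny 4-pixel "frames": dark, dark, bright, bright, dark.
frames = [[10] * 4, [12] * 4, [200] * 4, [198] * 4, [20] * 4]
cuts = detect_cuts(frames)
shots = split_into_shots(len(frames), cuts)
```

A simple frame difference like this only catches abrupt cuts; gradual transitions such as fades or dissolves need a detector that looks at longer windows.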
9. Speech Recognition
• Automatic recognition & translation of spoken language into text (by computer)
• Each video frame is linked to its transcription, and vice versa
• Languages: English, French, Arabic
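The two-way link between frames and transcription can be sketched as follows. This is an assumption about how such a mapping might be stored, not the authors' implementation: it supposes the recognizer returns time-stamped text segments (sorted and non-overlapping) and that the frame rate is known. The `Segment` type and both function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive
    text: str


def text_for_frame(segments: List[Segment], frame: int, fps: float) -> Optional[str]:
    """Frame -> transcription: text spoken at the frame's timestamp, if any."""
    t = frame / fps
    starts = [s.start for s in segments]  # assumed sorted, non-overlapping
    i = bisect_right(starts, t) - 1       # last segment starting at or before t
    if i >= 0 and t < segments[i].end:
        return segments[i].text
    return None                           # frame falls in a silent gap


def frames_for_segment(segment: Segment, fps: float) -> Tuple[int, int]:
    """Transcription -> frames: first and last frame index inside the segment."""
    first = math.ceil(segment.start * fps)
    last = math.ceil(segment.end * fps) - 1
    return first, last


segments = [
    Segment(0.0, 2.0, "Good evening."),
    Segment(2.5, 4.0, "Here is the news."),
]
fps = 25.0
```

With these sample segments, frame 10 maps to "Good evening.", frame 50 falls in the pause between the two segments, and the second segment covers frames 63 to 99.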
10. Detection of “Talking Head” Shots (1/2)
Processing applied to each frame of a shot (Frame 1, Frame 2, …, Frame n):
• To grayscale
• Histogram equalization
• Face detection
• Mouth analysis
Measures computed for the shot:
• Percentage ratio of the number of frames with a face to the number of frames w/o a face
• Percentage ratio of the area of the face to the area of the frame
• Percentage of frames with more than one face
• Percentage ratio of frames with an open mouth to frames with a closed mouth
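The four shot-level measures can be sketched as below, assuming the per-frame face and mouth detections are already available. The detector itself is not shown, and the `Face` and `ShotMetrics` types are hypothetical. Two details are assumptions not stated on the slide: the face-area measure is averaged over the largest face of each frame that contains a face, and the mouth state is read from that same largest face.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Face:
    area_fraction: float  # face area divided by frame area (assumed detector output)
    mouth_open: bool      # result of the mouth analysis step (assumed)


@dataclass
class ShotMetrics:
    face_ratio: float        # frames with a face / frames without a face
    face_area: float         # mean area fraction of the largest face
    multi_face_ratio: float  # frames with more than one face / all frames
    open_mouth_ratio: float  # frames with open mouth / frames with closed mouth


def _ratio(numerator: int, denominator: int) -> float:
    """Ratio that stays defined when the denominator is zero."""
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def shot_metrics(frames: List[List[Face]]) -> ShotMetrics:
    """Aggregate per-frame detections (one list of faces per frame) for a shot."""
    with_face = [faces for faces in frames if faces]
    n_without = len(frames) - len(with_face)
    # Assumption: the largest face in a frame is the candidate speaker.
    largest = [max(faces, key=lambda f: f.area_fraction) for faces in with_face]
    n_open = sum(f.mouth_open for f in largest)
    n_closed = len(largest) - n_open
    n_multi = sum(len(faces) > 1 for faces in with_face)
    mean_area = sum(f.area_fraction for f in largest) / len(largest) if largest else 0.0
    return ShotMetrics(
        face_ratio=_ratio(len(with_face), n_without),
        face_area=mean_area,
        multi_face_ratio=_ratio(n_multi, len(frames)),
        open_mouth_ratio=_ratio(n_open, n_closed),
    )


# Four frames: three with a face (one of them with two faces), one without.
example = shot_metrics([
    [Face(0.125, True)],
    [Face(0.125, False)],
    [Face(0.125, True), Face(0.03125, False)],
    [],
])
```

Returning plain ratios rather than percentages keeps the values easy to compare against thresholds written as fractions (for example 0.20 for 20%).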
11. Detection of “Talking Head” Shots (2/2)
Decision flow from Start (all four conditions are combined with AND):
• Is the ratio of # of frames with a face to # of frames w/o a face >20%?
• Is the ratio of the area of the face to the area of the frame >3%?
• Is the ratio of # of frames with more than 1 face to # of frames in the shot <10%?
• Is the ratio of # of frames with an open mouth to # of frames with a closed mouth >20%?
• All true (T): Talking Head
• Any false (F): Not Talking Head
• Sensitivity: 88%
• Specificity: 100%
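The decision rule amounts to four threshold tests joined by AND, and can be sketched as a single function. The thresholds come from the slide; the function and parameter names are hypothetical, and the inputs are assumed to be fractions (0.20 for 20%). The two helper functions give the standard definitions of sensitivity and specificity for reference.

```python
def is_talking_head(
    face_ratio: float,        # frames with a face / frames without a face
    face_area: float,         # face area / frame area
    multi_face_ratio: float,  # frames with more than one face / frames in shot
    open_mouth_ratio: float,  # frames with open mouth / frames with closed mouth
) -> bool:
    """Classify a shot as a talking head only if all four conditions hold."""
    return (
        face_ratio > 0.20
        and face_area > 0.03
        and multi_face_ratio < 0.10
        and open_mouth_ratio > 0.20
    )


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual talking-head shots that the rule detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-talking-head shots that the rule correctly rejects."""
    return true_neg / (true_neg + false_pos)
```

Because every condition must hold, the rule is conservative: it will rather miss a talking head than report one wrongly, which is consistent with a specificity of 100% alongside a lower sensitivity.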
12. Detection of Day and Night Shots
• Based on a neural network
• Tested on >2000 photos
• Accuracy >90%