“Results on Video Summarization”
Mikołaj Leszczuk, Michał Grega, Jan Derkacz
2017-04-28
Video Summarization Framework Work-Flow
2
Shot Boundary Detection (SBD)
» Based on PySceneDetect
» Integrated
3
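The deck integrates SBD via PySceneDetect. Purely as an illustration of the underlying idea (not the project's actual code), a minimal content-based cut detector flags a boundary when consecutive frames differ too much; PySceneDetect's ContentDetector applies the same content-difference idea to HSV channels:

```python
def detect_cuts(frames, threshold=30.0):
    """Return frame indices where a shot boundary (cut) is detected.

    frames: list of equally sized grayscale frames, each a flat list
    of pixel intensities (0-255). A cut is declared when the mean
    absolute pixel difference between consecutive frames exceeds
    `threshold` -- a simplified, dependency-free version of the
    content-difference test behind PySceneDetect's ContentDetector.
    """
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts

# Two static shots: dark frames, then a hard cut to bright frames.
shot_a = [[10] * 16] * 3
shot_b = [[200] * 16] * 3
print(detect_cuts(shot_a + shot_b))  # [3] -- cut at frame index 3
```

The threshold value here is arbitrary; PySceneDetect tunes its equivalent per use case.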
Classification of Video Sequences
Video categories: A (News Report), B (Discussion), C (Video Stream)
» Based on the pattern of emerging faces
» Uses the technique of Hidden Markov Models
» Pending
4
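The slide names the technique (HMMs over patterns of emerging faces) but the component is pending, so no details are given. As a hedged sketch of how such a classifier could work, the standard HMM forward algorithm can score a per-shot binary face-presence sequence under one model per category and pick the most likely one; all model parameters below are invented for illustration:

```python
def forward_likelihood(obs, start, trans, emit):
    """Likelihood of an observation sequence under a discrete HMM
    (standard forward algorithm). `obs` is a list of symbols,
    here 0 = no face and 1 = face visible in a shot."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)

def classify(obs, models):
    """Pick the category whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Illustrative 2-state models (start, transition, emission), parameters
# invented for this sketch: a news report keeps an anchor's face on
# screen most of the time; a generic video stream rarely shows faces.
models = {
    "A (news report)": ([0.9, 0.1],
                        [[0.9, 0.1], [0.5, 0.5]],
                        [[0.1, 0.9], [0.6, 0.4]]),
    "C (video stream)": ([0.1, 0.9],
                         [[0.5, 0.5], [0.1, 0.9]],
                         [[0.2, 0.8], [0.9, 0.1]]),
}
print(classify([1, 1, 1, 0, 1, 1], models))  # "A (news report)"
```

In practice the transition and emission probabilities would be trained on labelled sequences rather than hand-set.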
Detection of “Talking Head” Shots (1/2)
» Based on Mouth Region of Interest processing
» Processed shot-by-shot
» Pipeline (from the slide's diagram): face detection → mouth movement detection → cascade classifier → Talking Head / Not Talking Head
5
Detection of “Talking Head” Shots (2/2)
» Face detection using Haar Cascades
» Sensitivity 88%, Specificity 100%
» Integrated
6
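The deck's speaker notes state three shot-level criteria for a talking head: the face covers at least 3% of the scene, a single face appears in at least 90% of the shot's frames, and the mouth open/closed ratio is 20% or higher. A minimal sketch of that decision rule (the flat conjunction is an illustrative simplification of the cascade-classifier pipeline on slide 5):

```python
def is_talking_head(face_area_ratio, single_face_frame_ratio, mouth_open_ratio):
    """Shot-level 'talking head' decision using the thresholds stated
    in the deck's speaker notes:
      - face at least 3% of the scene size,
      - exactly one face in at least 90% of the shot's frames,
      - mouth open/closed ratio of at least 20%.
    """
    return (face_area_ratio >= 0.03
            and single_face_frame_ratio >= 0.90
            and mouth_open_ratio >= 0.20)

print(is_talking_head(0.05, 0.95, 0.35))  # True  -- anchor speaking
print(is_talking_head(0.05, 0.95, 0.05))  # False -- static face, mouth closed
```

The three input ratios would come from the Haar-cascade face detector and the mouth-ROI processing described on these two slides.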
Detection of Day & Night Shots
» Based on a neural network
» Tested on >2000 photos
» Efficiency >90%
» Integrated
7
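The deck does not describe the network's architecture or features. As a deliberately tiny stand-in (not the authors' model), a single logistic neuron trained on one feature, mean frame brightness, already separates synthetic day/night data:

```python
import math

def train_day_night(samples, epochs=2000, lr=1.0):
    """Train a single logistic neuron on one feature: mean brightness
    in [0, 1]. `samples` is a list of (brightness, label) pairs with
    label 1 = day, 0 = night. A one-neuron model is an illustrative
    minimal stand-in for the deck's (unspecified) neural network."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x  # SGD step on the log-loss gradient
            b += lr * (y - p)
    return w, b

def predict(w, b, x):
    """1 = day, 0 = night."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

# Synthetic training data: bright daytime frames vs. dark night frames.
data = [(0.8, 1), (0.7, 1), (0.9, 1), (0.2, 0), (0.1, 0), (0.3, 0)]
w, b = train_day_night(data)
print(predict(w, b, 0.85), predict(w, b, 0.15))  # 1 0  (day, night)
```

A real day/night classifier would use richer features (colour histograms, sky regions) and a multi-layer network, which is presumably closer to what was tested on the >2000 photos.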
Video Quality Indicators
» Video quality assessment system for video sequences
» Quality of Experience (QoE)
» 13 quality parameters
» Temporal Activity (TA)
» Spatial Activity (SA)
» Integrated
8
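TA and SA correspond to the temporal and spatial perceptual information measures of ITU-T P.910 (TI/SI). A simplified, dependency-free sketch, assuming grayscale frames as flat pixel lists: SA is approximated as the spread of pixel intensities within a frame (P.910's SI uses the standard deviation of a Sobel-filtered frame instead), and TA as the spread of pixel-wise differences between consecutive frames:

```python
def _std(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

def spatial_activity(frame):
    """Simplified SA: std of pixel intensities in one frame.
    (P.910's SI applies a Sobel filter first; plain pixel std is
    used here to keep the sketch dependency-free.)"""
    return _std(frame)

def temporal_activity(prev_frame, frame):
    """Simplified TA (P.910 TI): std of the pixel-wise difference
    between consecutive frames."""
    return _std([b - a for a, b in zip(prev_frame, frame)])

flat = [100] * 8                                 # uniform frame
textured = [0, 255, 0, 255, 0, 255, 0, 255]      # high-detail frame
print(spatial_activity(flat))                    # 0.0 -- no spatial detail
print(spatial_activity(textured) > 100)          # True
print(temporal_activity(flat, flat))             # 0.0 -- static scene
print(temporal_activity(flat, textured) > 100)   # True -- large change
```

In the P.910 definitions, the per-sequence SI and TI are the maxima of these per-frame values over time.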
Recognition of Events for the Purpose of Summarizing Video Sequences
» Creation & implementation of algorithms to recognize motions/gestures & other events in video sequences
» Pending
9
By Comixboy at English Wikipedia, CC BY 2.5,
https://commons.wikimedia.org/w/index.php?curid=9672553
Database Statistics
» Number of videos indexed – 5423
» Number of frames indexed – 27 384 115
» Features indexed:
– Shot Boundary Detection
– 13 Video Quality Indicators
– Spatial Activity
– Temporal Activity
» Features pending (expected May 2017):
– Automatic Speech Recognition
– Day/Night
10
1st Version of Content Analysis & Video Summarization Components
11
[Chart: Spatial Activity and Temporal Activity (y-axis “Activity”, 0–120) versus Frame Number (1–7937) for video 5KPk3rkESlU]
Demonstration (Original)
12
Demonstration (Summarised)
13
Memes – Updated Schema
14
Evaluation of Multimedia Content Summarisation Algorithms
» Together with DEUSTO
» Review of State-of-the-Art
» Collaboration with Video Quality Experts Group
– Project: Quality Assessment for Recognition and Task-based multimedia applications (QART)
– Meeting in May 2017
» Pending
15

Editor's Notes

  • #7 Talking-head criteria: the face should be large enough (at least 3% of the scene size); there should be only one face (a single face in at least 90% of the frames in the scene); and the mouth open/closed ratio should be 20% or higher.
  • #16 We have a video of a real event, for example the election in France. The full recording is 15 minutes; we want to shorten it to 1.5 minutes. The algorithm cuts and processes the video, and we then want to measure how much content from the original made it into the summary. Someone (e.g. researchers from the project) watches the full 15 minutes and writes down the most important things they learned; ideally these would be journalists rather than engineers. We can then either ask viewers to write down what they learned from the summaries and compare it, via text mining, against the facts described by the professionals, or have specialists generate questions and run a viewing test. In each of these cases, viewers' prior knowledge before watching a summary is a confound that needs to be addressed in some way.
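Note #16 sketches comparing what viewers learn from a summary against facts listed by professionals. As a toy illustration of the text-mining variant (a deliberately crude keyword-overlap metric, not a proposed evaluation method), a fact counts as recovered if all of its keywords appear in the viewer's free-text notes:

```python
def fact_recall(expert_facts, viewer_notes):
    """Fraction of expert-listed facts whose keywords all appear in a
    viewer's free-text notes. A crude stand-in for the text-mining
    comparison described in note #16; a real evaluation would need
    stemming, synonym handling, and control for prior knowledge."""
    notes = set(viewer_notes.lower().split())
    covered = sum(
        1 for fact in expert_facts
        if all(word in notes for word in fact.lower().split())
    )
    return covered / len(expert_facts)

# Hypothetical example in the spirit of the French-election scenario.
experts = ["Macron wins", "turnout low", "Le Pen concedes"]
viewer = "I learned that Macron wins the election and that Le Pen concedes"
print(fact_recall(experts, viewer))  # 2 of 3 facts recovered
```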