M.Tech First Progress Presentation on Video Summarization
1. SEPTEMBER 26th, 2017
Neeraj Baghel
M.Tech, 178150005
Under the Supervision of
Prof. Charul Bhatnagar
Professor, Deptt. of CEA
GLA University, Mathura
2. OUTLINE
Video Summarization
Types of Video Summarization
Applications
Issues & Challenges
Tools & Datasets
Journals & Conferences
Researchers & Groups
References
3. Video Summarization
Video
• Video data is a great asset for information extraction and knowledge
discovery.
• Due to its size and variability, it is extremely hard for users to
monitor. [4]
Video Summarization
• Intelligent video summarization algorithms allow us to quickly browse a
lengthy video by capturing the essence and removing redundant
information. [4]
[4] Sharghi, Aidean, Jacob S. Laurel, and Boqing Gong. "Query-focused video summarization: Dataset, evaluation,
and a memory network based approach." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
[9] https://www.slideshare.net/MikolajLeszczuk/results-on-video-summarization (D.L.V 01/09/18)
Fig 1: Video Summarization Work Flow [9]
4. Types of Video Summarization
A video can be summarized in two different ways, as follows.
Fig 2: Video Summarization Technique Classification [7]
[7] Mundur, Padmavathi, Yong Rao, and Yelena Yesha. "Keyframe-based video summarization using Delaunay
clustering." International Journal on Digital Libraries 6.2 (2006): 219-232. (D.L.V 20/08/18)
5. Key Frame Extraction
Fig 3: Key Frame Extraction [8]
[8] Souza, Celso L. de, et al. "A unified approach to content-based indexing and retrieval of digital videos from
television archives." (2014). (D.L.V 05/09/18)
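The slide does not state how key frames are chosen, so the following is only a minimal sketch under one common assumption: a frame becomes a key frame when its color histogram differs enough from the last selected key frame. The function names, the L1 distance, and the threshold value are all hypothetical choices for illustration, and the histograms are toy data rather than output from a real video.

```python
from typing import List, Sequence

Histogram = Sequence[float]


def histogram_distance(a: Histogram, b: Histogram) -> float:
    """L1 distance between two normalized histograms (0 means identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_key_frames(histograms: List[Histogram], threshold: float = 0.5) -> List[int]:
    """Return indices of frames that differ enough from the last key frame.

    Assumption: each frame is already reduced to a normalized histogram.
    """
    if not histograms:
        return []
    key_frames = [0]  # the first frame always opens the summary
    for index in range(1, len(histograms)):
        last_key = histograms[key_frames[-1]]
        if histogram_distance(histograms[index], last_key) > threshold:
            key_frames.append(index)
    return key_frames


# Toy input: three visually distinct "shots" described by 4-bin histograms.
frames = [
    [0.70, 0.10, 0.10, 0.10],
    [0.68, 0.12, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.72, 0.08, 0.10],
    [0.10, 0.10, 0.10, 0.70],
]
print(select_key_frames(frames))  # prints [0, 2, 4]
```

Near-duplicate frames inside a shot are skipped, which is the redundancy removal that key frame extraction aims for; a real system would compute the histograms from decoded frames, for example with OpenCV.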
6. Video Skims
• This is also called a moving-image abstract, moving storyboard, or
summary sequence.
• The original video is segmented into parts, each of which is a video
clip of shorter duration.
[11] https://www.cs.cmu.edu/~msmith/skim_homepage.html
Fig 4: Automated Video Skimming Informedia Digital Video Library Project [11]
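The segmentation idea behind video skims can be sketched as follows. This is an illustrative outline only: the fixed segment length, the per-segment importance scores, the time budget, and all function names are assumptions that do not come from the slides or from the Informedia project.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def split_into_segments(duration: float, segment_length: float) -> List[Segment]:
    """Cut a video of the given duration into consecutive fixed-length clips."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        segments.append((start, end))
        start = end
    return segments


def build_skim(segments: List[Segment], scores: List[float], budget: float) -> List[Segment]:
    """Greedily keep the highest-scoring clips that fit in the time budget.

    The chosen clips are returned in temporal order so the skim plays
    as a shortened version of the original video.
    """
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    chosen = []
    used = 0.0
    for i in ranked:
        length = segments[i][1] - segments[i][0]
        if used + length <= budget:
            chosen.append(i)
            used += length
    return [segments[i] for i in sorted(chosen)]


segments = split_into_segments(duration=60.0, segment_length=10.0)
scores = [0.2, 0.9, 0.1, 0.7, 0.3, 0.8]  # hypothetical importance per clip
print(build_skim(segments, scores, budget=20.0))
# prints [(10.0, 20.0), (50.0, 60.0)]
```

Unlike key frames, the result is a sequence of short clips, so motion and audio within each selected part are preserved.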
7. Applications
Applications of video summarization can be divided into three
main categories:
1) Consumer Video Applications
Browsing the recorded content
View the interesting parts quickly
Fig 4: View The Interesting Parts Quickly [12]
[12] https://www.youtube.com/watch?v=OHAWwaYu2H0&t=46s (D.L.V 20/09/18)
8. Cont…
2) Image-Video Databases Management
Video search engine
Digital video library
Object indexing and retrieval
Automatic object labeling
Fig 5: Digital video library [13]
[13] https://www.searchenginejournal.com/deep-learning-powers-video-seo/175145/ (D.L.V 21/09/18)
10. Issues and Challenges
Some general issues and challenges related to video
summarization:
Loss of information
Computationally expensive
Difficulty of evaluating the performance of a video summarizer
No single video summarizer fits all users
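The last two challenges can be illustrated with a small sketch. The slides do not name an evaluation metric, so the F-measure over selected frame indices used here is an assumption (one common option), and the function names and toy data are hypothetical.

```python
from typing import List, Set


def f_measure(machine: Set[int], reference: Set[int]) -> float:
    """Harmonic mean of precision and recall over selected frame indices."""
    overlap = len(machine & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(machine)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def scores_per_user(machine: Set[int], user_summaries: List[Set[int]]) -> List[float]:
    """Score one machine summary against each user's reference summary."""
    return [f_measure(machine, user) for user in user_summaries]


machine_summary = {0, 1, 2, 3}
user_summaries = [
    {0, 1, 2, 3},  # user A picked the same frames
    {2, 3, 4, 5},  # user B overlaps on half of them
    {8, 9},        # user C picked a different part of the video
]
print(scores_per_user(machine_summary, user_summaries))  # prints [1.0, 0.5, 0.0]
```

The same machine summary scores very differently against different users, which is why a single reference summary is rarely enough for evaluation.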
11. Tools
MATLAB
MATLAB is a commercial product that is widely used in the image/video
processing community. It has an adequate image processing toolbox, as well
as toolboxes for Kalman filters, neural networks, genetic algorithms, and so
on. It runs on Windows, macOS, and Linux. For people researching vision
algorithms, the lack of source code is a serious drawback.
OpenCV
OpenCV is a library of programming functions mainly aimed at real-time
computer vision, originally developed by Intel. The library is cross-platform
and free for use under an open-source license (BSD in releases before 4.5.0,
Apache 2.0 since).
12. Datasets
UT Egocentric (UTE)
The dataset contains 4 videos from head-mounted cameras, each about 3-5
hours long. (Size: 1.4 GB)
SumMe
The dataset consists of 25 single-shot videos ranging in length from 1 to 6
minutes. It contains summaries created by 15 to 18 users, with the length of
each summary constrained to 5% to 15% of the original video. (Size: 2.2 GB)
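The SumMe length constraint translates into a concrete time budget per video. The sketch below works through that arithmetic; the function names are hypothetical and are not part of any SumMe tooling.

```python
from typing import Tuple


def summary_length_bounds(
    video_seconds: float, min_percent: int = 5, max_percent: int = 15
) -> Tuple[float, float]:
    """Shortest and longest allowed summary, in seconds, for one video."""
    return (video_seconds * min_percent / 100, video_seconds * max_percent / 100)


def within_length_constraint(summary_seconds: float, video_seconds: float) -> bool:
    """Check that a summary is 5% to 15% of the original video length."""
    shortest, longest = summary_length_bounds(video_seconds)
    return shortest <= summary_seconds <= longest


# The dataset's videos range from 1 to 6 minutes.
for minutes in (1, 6):
    print(minutes, summary_length_bounds(minutes * 60))
# prints:
# 1 (3.0, 9.0)
# 6 (18.0, 54.0)
```

So a 1-minute video allows a summary of 3 to 9 seconds, and a 6-minute video allows 18 to 54 seconds.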
13. Datasets Cont…
Dataset
YouTube-8M
YouTube-8M is a large-scale labeled video dataset that consists of millions of
YouTube video IDs and associated labels from a diverse vocabulary of 4700+
visual entities.
• Each video must be public and have at least 1000 views
• Each video must be between 120 and 500 seconds long
• Each video must be associated with at least one entity from the target
vocabulary
• Adult and sensitive content is removed (as determined by automated classifiers)
May 2018 version (current): 6.1M videos, 3862 classes, 3.0 labels/video, 2.6B
audio-visual features
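The selection criteria listed above can be expressed as a simple filter. This is a sketch only: the `VideoRecord` type and its field names are hypothetical and do not correspond to the actual YouTube-8M data format, and the adult-content check is left out because it depends on automated classifiers that are not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoRecord:
    """Hypothetical metadata for one candidate video."""
    video_id: str
    is_public: bool
    view_count: int
    duration_seconds: float
    entities: List[str] = field(default_factory=list)


def meets_selection_criteria(video: VideoRecord) -> bool:
    """Apply the public, views, duration, and vocabulary criteria."""
    return (
        video.is_public
        and video.view_count >= 1000
        and 120 <= video.duration_seconds <= 500
        and len(video.entities) >= 1
    )


candidates = [
    VideoRecord("a", True, 5000, 300, ["Guitar"]),
    VideoRecord("b", True, 200, 300, ["Guitar"]),   # too few views
    VideoRecord("c", True, 5000, 900, ["Guitar"]),  # too long
    VideoRecord("d", True, 5000, 300, []),          # no vocabulary entity
]
print([v.video_id for v in candidates if meets_selection_criteria(v)])  # prints ['a']
```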
14. Datasets Cont…
Dataset
MED Summaries
"MED Summaries" is a dataset for the evaluation of dynamic video
summaries. It contains annotations of 160 videos: a validation set of 60
videos and a test set of 100 videos. There are 10 event categories in the
test set. The currently available dataset is from 235 users; all images are in
bitmap (*.bmp) format, at a resolution of 800 × 600 pixels.
(Size: 12 GB)
15. Journals
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Image Processing
Springer - IPSJ Transactions on Computer Vision and Applications (CVA)
Elsevier - Computer Vision and Image Understanding
Elsevier - Pattern Recognition
IJCV - International Journal of Computer Vision
IJIPA - International Journal of Image Processing and Applications
IET - The Institution of Engineering and Technology
16. Conferences
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
IEEE International Conference on Image Processing (ICIP)
IEEE/CVF International Conference on Computer Vision (ICCV)
IEEE Winter Conference on Applications of Computer Vision (WACV)
ACCV - Asian Conference on Computer Vision
ECCV - European Conference on Computer Vision
CVIP - International Conference on Computer Vision and Image Processing, India
NCVPRIPG - National Conference on Computer Vision, Pattern Recognition,
Image Processing and Graphics, India
17. Research Group
Fei-Fei Li
Professor and Director, Stanford AI Lab
Computer Science Department
Feifeili@cs.stanford.edu
Stanford Computer Vision Lab
Animesh Garg
Professor, Stanford AI Lab
Computer Science Department
garg@cs.stanford.edu
18. Research Group
Aidean Sharghi
Center for Research in Computer Vision,
University of Central Florida
aidean.sharghi@gmail.com
Boqing Gong
Assistant Professor
Center for Research in Computer Vision
Department of Computer Science
University of Central Florida
boqingGo@outlook.com
Center for Research in Computer Vision,
University of Central Florida
19. Research Group
Abhishek Sarkar
Senior Research Scientist
International Institute of Information Technology
Hyderabad, INDIA
Abhishek.sarkar@iiit.ac.in
Dr. C. V. Jawahar
Researcher,
International Institute of Information Technology
Hyderabad, INDIA
jawahar@iiit.ac.in
International Institute of Information Technology
20. References
[1] Song, Yale, et al. "TVSum: Summarizing web videos using
titles." Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). 2015.
[2] Zhuang, Yueting, Ruogui Xiao, and Fei Wu. "Key issues in video summarization and
its application." Proceedings of the 2003 Joint Conference of the Fourth
International Conference on Information, Communications and Signal Processing
and the Fourth Pacific Rim Conference on Multimedia. Vol. 1. IEEE, 2003.
[3] Kansagara, Ravi, Darshak Thakore, and Mahasweta Joshi. "A study on video
summarization techniques." International Journal of Innovative Research in
Computer and Communication Engineering 2 (2014).
[4] Sharghi, Aidean, Jacob S. Laurel, and Boqing Gong. "Query-focused video
summarization: Dataset, evaluation, and a memory network based
approach." The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR). 2017.
[5] Ramesh, Animesh, et al. "Video Summarization: An Overview of Techniques."
[6] Sabbar, W.; Chergui, A.; Bekkhoucha, A., "Video summarization using shot
segmentation and local motion estimation," Innovative Computing Technology
(INTECH), 2012 Second International Conference on, pp. 190-193, 18-20
Sept. 2012.
[7] Mundur, Padmavathi, Yong Rao, and Yelena Yesha. "Keyframe-based video
summarization using Delaunay clustering." International Journal on Digital
Libraries 6.2 (2006): 219-232.
[8] Souza, Celso L. de, et al. "A unified approach to content-based indexing and
retrieval of digital videos from television archives." (2014).
[9] https://www.slideshare.net/MikolajLeszczuk/results-on-video-summarization
[10] Landy, Michael S., Yoav Cohen, and George Sperling. "HIPS: A Unix-based image
processing system." Computer Vision, Graphics, and Image Processing 25.3
(1984): 331-347.