SEPTEMBER 26th, 2017
Neeraj Baghel
M.Tech, 178150005
Under the Supervision of
Prof. Charul Bhatnagar
Professor, Deptt. of CEA
GLA University, Mathura
1/20
FIRST PROGRESS PRESENTATION
ON
VIDEO SUMMARIZATION
OUTLINE
 Video Summarization
 Types of Video Summarization
 Applications
 Issues & Challenges
 Tools & Datasets
 Journals & Conferences
 Researchers & Groups
 References
2/20
Video
• Video data is a great asset
for information extraction
and knowledge discovery.
• Due to its size and variability,
it is extremely hard for
users to monitor.[4]
Video Summarization
• Intelligent video
summarization algorithms
allow us to quickly browse a
lengthy video by capturing
the essence and removing
redundant information.[4]
3/20
Video Summarization
[4] Sharghi, Aidean, Jacob S. Laurel, and Boqing Gong. "Query-focused video summarization: Dataset, evaluation, and a
memory network based approach." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
[9] https://www.slideshare.net/MikolajLeszczuk/results-on-video-summarization (D.L.V 01/09/18)
Fig 1: Video Summarization Work Flow [9]
A video can be summarized in two different ways: key frame
extraction and video skimming.
4/20
Types of Video summarization
Fig 2: Video Summarization Technique Classification [7]
[7] Mundur, Padmavathi, Yong Rao, and Yelena Yesha. "Keyframe-based video summarization using Delaunay
clustering." International Journal on Digital Libraries 6.2 (2006): 219-232. (D.L.V 20/08/18)
Key Frame Extraction
Fig 3: Key Frame Extraction [8]
5/20
[8] Souza, Celso L. de, et al. "A unified approach to content-based indexing and retrieval of digital videos from
television archives." (2014). (D.L.V 05/09/18)
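A minimal sketch of one way key frames could be picked, assuming each frame is a flat list of 8-bit grayscale pixel values and that a new key frame is kept whenever its intensity histogram moves far enough from the last kept one. The function names, the bin count, and the threshold are illustrative assumptions, not the method shown in Fig 3.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of a non-empty frame of pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def select_key_frames(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices of frames that differ enough from the last key frame.

    The difference is the L1 distance between normalized histograms;
    the first frame is always kept (assumption of this sketch).
    """
    if not frames:
        return []
    key_indices = [0]
    last_hist = histogram(frames[0])
    for index in range(1, len(frames)):
        hist = histogram(frames[index])
        distance = sum(abs(a - b) for a, b in zip(hist, last_hist))
        if distance > threshold:
            key_indices.append(index)
            last_hist = hist
    return key_indices


# Toy "video": two dark frames, two bright frames, then a dark frame again.
dark = [10] * 16
bright = [200] * 16
video = [dark, dark, bright, bright, dark]
print(select_key_frames(video))  # [0, 2, 4]
```

Comparing against the last kept key frame, rather than the previous frame, keeps slow gradual changes from being missed, at the cost of sensitivity to the chosen threshold.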
Video Skims
• This is also called a moving-image abstract, moving story
board, or summary sequence.
• The original video is segmented into parts, each of which is a
video clip of shorter duration.
6/20
[11] https://www.cs.cmu.edu/~msmith/skim_homepage.html
Fig 4: Automated Video Skimming Informedia Digital Video Library Project [11]
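The segmentation idea behind video skims can be sketched roughly as follows, assuming fixed-length segments and a per-segment importance score supplied from elsewhere. The helper names, the greedy selection, and the scores are hypothetical and are not taken from the Informedia project.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def split_into_segments(duration: float, segment_length: float) -> List[Segment]:
    """Cut a video of the given duration into consecutive fixed-length segments."""
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        segments.append((start, end))
        start = end
    return segments


def build_skim(segments: Sequence[Segment], scores: Sequence[float], budget: float) -> List[Segment]:
    """Greedily keep the highest-scoring segments that fit in the time budget.

    The chosen segments are returned in chronological order so the skim
    plays back as a shorter version of the original video.
    """
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    chosen = []
    used = 0.0
    for i in ranked:
        length = segments[i][1] - segments[i][0]
        if used + length <= budget:
            chosen.append(i)
            used += length
    return [segments[i] for i in sorted(chosen)]


segments = split_into_segments(60.0, 10.0)
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]  # made-up importance scores
print(build_skim(segments, scores, budget=20.0))  # [(10.0, 20.0), (30.0, 40.0)]
```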
Applications
The application of video summarization can be divided into three
main categories:
1) Consumer Video Applications
 Browsing the recorded content
 View the interesting parts quickly
7/20
Fig 4: View The Interesting Parts Quickly [12]
[12] https://www.youtube.com/watch?v=OHAWwaYu2H0&t=46s (D.L.V 20/09/18)
Cont…
2) Image-Video Databases Management
 Video search engine
 Digital video library
 Object indexing and retrieval
 Automatic object labeling
8/20
Fig 5: Digital video library [13]
[13] https://www.searchenginejournal.com/deep-learning-powers-video-seo/175145/ (D.L.V 21/09/18)
Cont…
3) Surveillance
 Outdoor Perimeter Security
 Internet Security Systems
 Parking Lots
 Traffic Monitoring
Fig 6: Traffic Monitoring [14]
Fig 7: Outdoor Perimeter Security [14]
9/20
[14] https://www.framos.com/en/solutions/mobility/ (D.L.V 21/09/18)
Issues and Challenges
Some general issues and challenges related to video
summarization:
 Loss of information
 Computationally expensive
 Difficulty of evaluating the performance of a video summarizer
 No single video summarizer fits all users
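To make the evaluation issue concrete, here is a small sketch of one commonly used option: scoring the overlap between an automatic summary and a human-made reference summary with the F-measure. It assumes both summaries are given as sets of selected frame indices; the function name is illustrative, and the slides do not prescribe this metric.

```python
from typing import Set


def f_measure(predicted: Set[int], reference: Set[int]) -> float:
    """Harmonic mean of precision and recall over selected frame indices."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


predicted_frames = {1, 2, 3, 4}   # frames chosen by a summarizer (made up)
reference_frames = {3, 4, 5, 6}   # frames chosen by a human annotator (made up)
print(f_measure(predicted_frames, reference_frames))  # 0.5
```

Because different users pick different reference summaries, a score like this is usually computed against several annotators, which ties back to the last point in the list: no single summary fits all users.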
10/20
Tools
 Matlab
Matlab is a commercial product that is widely used in the image and
video processing community. It offers an image processing toolbox, as
well as toolboxes for Kalman filters, neural networks, genetic
algorithms, and so on. It runs on Linux, macOS, and Windows. For
people researching vision algorithms, the lack of source code is a
serious drawback.
 OpenCV
OpenCV is a library of programming functions mainly aimed at real-time
computer vision, originally developed by Intel. The library is cross-platform
and free for use under the open-source BSD license.
11/20
Datasets
 UT Egocentric (UTE)
The dataset contains 4 videos from head-mounted cameras, each about
3-5 hours long. (Size: 1.4 GB)
 SumMe
The dataset consists of 25 videos which are single-shot and range in length
from 1-6 minutes. The dataset contains summaries created by 15 to 18
users with the constraint in length being that the summaries should be 5%
to 15% of the original video. (Size: 2.2 GB)
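The 5% to 15% length constraint is simple arithmetic; a short sketch, assuming durations are measured in seconds (the function names are hypothetical):

```python
from typing import Tuple


def summary_length_bounds(video_seconds: float,
                          min_percent: int = 5,
                          max_percent: int = 15) -> Tuple[float, float]:
    """Shortest and longest allowed summary duration for a video."""
    return (video_seconds * min_percent / 100, video_seconds * max_percent / 100)


def is_valid_summary_length(summary_seconds: float, video_seconds: float) -> bool:
    """Check that a summary falls within the 5% to 15% length constraint."""
    low, high = summary_length_bounds(video_seconds)
    return low <= summary_seconds <= high


# A 200-second video allows summaries between 10 and 30 seconds.
print(summary_length_bounds(200))        # (10.0, 30.0)
print(is_valid_summary_length(20, 200))  # True
print(is_valid_summary_length(40, 200))  # False
```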
12/20
Datasets Cont…
Dataset
 YouTube-8M
YouTube-8M is a large-scale labeled video dataset that consists of millions of
YouTube video IDs and associated labels from a diverse vocabulary of 4700+
visual entities
• Each video must be public and have at least 1000 views
• Each video must be between 120 and 500 seconds long
• Each video must be associated with at least one entity from the target
vocabulary
• Adult & sensitive content is removed (as determined by automated classifiers)
May 2018 version (current): 6.1M videos, 3862 classes, 3.0 labels/video, 2.6B
audio-visual features
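The inclusion criteria listed above can be read as a filter over video metadata. The sketch below is only an illustration of those four rules: the `VideoMeta` record, its field names, and the sample values are hypothetical and do not reflect the actual YouTube-8M pipeline or file format.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class VideoMeta:
    """Hypothetical metadata record; field names are assumptions."""
    video_id: str
    is_public: bool
    views: int
    duration_seconds: float
    entities: List[str]
    flagged_sensitive: bool  # stands in for the automated classifier result


def is_eligible(video: VideoMeta, vocabulary: Set[str]) -> bool:
    """Apply the four listed criteria to one video."""
    return (
        video.is_public
        and video.views >= 1000
        and 120 <= video.duration_seconds <= 500
        and any(entity in vocabulary for entity in video.entities)
        and not video.flagged_sensitive
    )


vocabulary = {"guitar", "football"}  # made-up target vocabulary
ok_video = VideoMeta("a1", True, 2500, 300.0, ["guitar"], False)
too_short = VideoMeta("b2", True, 2500, 60.0, ["guitar"], False)
print(is_eligible(ok_video, vocabulary))   # True
print(is_eligible(too_short, vocabulary))  # False
```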
13/20
Datasets Cont…
Dataset
 MED Summaries
"MED Summaries" is a dataset for the evaluation of dynamic video
summaries. It contains annotations of 160 videos: a validation set of 60
videos and a test set of 100 videos. There are 10 event categories in the
test set. The currently available dataset is from 235 users; all images are in
bitmap (*.bmp) format. The resolution of these images is 800 × 600 pixels.
(Size: 12 GB)
14/20
Journals
 IEEE Transactions on Pattern Analysis and Machine Intelligence
 IEEE Transactions on Image Processing
 SPRINGER - IPSJ Transactions on Computer Vision and
Applications (CVA)
 ELSEVIER- Computer Vision and Image Understanding
 ELSEVIER-Pattern Recognition
 IJCV - International Journal of Computer Vision
 IJIPA- International Journal of Image Processing and Applications
 IET- The Institution of Engineering and Technology
15/20
Conferences
 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR)
 IEEE International Conference on Image Processing (ICIP)
 IEEE/CVF International Conference on Computer Vision (ICCV)
 IEEE Winter Conference on Applications of Computer Vision (WACV)
 ACCV - Asian Conference on Computer Vision
 ECCV - European Conference on Computer Vision
 CVIP- International Conference on Computer Vision and Image
Processing , India
 NCVPRIPG -National Conference on Computer Vision, Pattern
Recognition, Image Processing and Graphics , India
16/20
Research Group
17/20
Fei-Fei Li
Professor and Director, Stanford AI Lab
Computer Science Department
Feifeili@cs.stanford.edu
Stanford Computer Vision Lab
Animesh Garg
Professor, Stanford AI Lab
Computer Science Department
garg@cs.standford.edu
Research Group
18/20
Aidean Sharghi
Center for Research in Computer Vision,
University of Central Florida
aidean.sharghi@gmail.com
Boqing Gong
Assistant Professor
Center for Research in Computer Vision
Department of Computer Science
University of Central Florida
boqingGo@outlook.com
Center for Research in Computer Vision,
University of Central Florida
Research Group
19/20
Abhishek Sarkar
Senior Research Scientist
International Institute of Information Technology
Hyderabad, INDIA
Abhishek.sarkar@iiit.ac.in
Dr. C. V. Jawahar
Researcher,
International Institute of Information Technology
Hyderabad, INDIA
jawahar@iiit.ac.in
International Institute of Information Technology
References
[1] Song, Yale, et al. "Tvsum: Summarizing web videos using
titles." Proceedings of the IEEE conference on computer vision and pattern
recognition. 2015.
[2] Zhuang, Yueting, Ruogui Xiao, and Fei Wu. "Key issues in video summarization and
its application." Information, Communications and Signal Processing, 2003 and
Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint
Conference of the Fourth International Conference on. Vol. 1. IEEE, 2003.
[3] Kansagara, Ravi, Darshak Thakore, and Mahasweta Joshi. "A study on video
summarization techniques." International journal of innovative research in
computer and communication engineering 2 (2014).
[4] Sharghi, Aidean, Jacob S. Laurel, and Boqing Gong. "Query-focused video
summarization: Dataset, evaluation, and a memory network based
approach." The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR). 2017.
[5] Ramesh, Animesh, et al. "Video Summarization: An Overview of Techniques."
20/20
References
[6] Sabbar, W.; Chergui, A.; Bekkhoucha, A., "Video summarization using shot
segmentation and local motion estimation," Innovative Computing Technology
(INTECH), 2012 Second International Conference on, pp. 190-193, 18-20
Sept. 2012.
[7] Mundur, Padmavathi, Yong Rao, and Yelena Yesha. "Keyframe-based video
summarization using Delaunay clustering." International Journal on Digital
Libraries 6.2 (2006): 219-232.
[8] Souza, Celso L. de, et al. "A unified approach to content-based indexing and
retrieval of digital videos from television archives." (2014).
[9] https://www.slideshare.net/MikolajLeszczuk/results-on-video-summarization
[10] Landy, Michael S., Yoav Cohen, and George Sperling. "HIPS: A Unix-based image
processing system." Computer Vision, Graphics, and Image Processing 25.3
(1984): 331-347.
21/20
References
[11] https://www.cs.cmu.edu/~msmith/skim_homepage.html
[12] https://www.youtube.com/watch?v=OHAWwaYu2H0&t=46s
[13] https://www.searchenginejournal.com/deep-learning-powers-video-seo/175145/
[14] https://www.framos.com/en/solutions/mobility/
22/20