Key frame extraction methodology for video annotation

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 2, March-April (2013), pp. 221-228, © IAEME. Impact Factor (2013): 6.1302 (Calculated by GISI)

KEY FRAME EXTRACTION METHODOLOGY FOR VIDEO ANNOTATION

Ms. Khushboo Khurana (1), Dr. M. B. Chandak (2)
(1) M.Tech Scholar, CSE Department, SRCOEM, Nagpur, India
(2) Associate Professor and Head, CSE Department, SRCOEM, Nagpur, India

ABSTRACT

Recent advances in technology have made a tremendous amount of multimedia content available. As the amount of video content increases, systems that improve access to video are needed. One way to provide this is video annotation, which facilitates faster access to videos. The first step towards video annotation is the extraction of key frames: instead of analysing all the frames in the video, only the frames that contain its important information are used for further processing. In this paper, a key frame extraction method that assists the video annotation process is discussed. The key frames are found by computing the edge difference between consecutive frames; frames for which this difference exceeds a threshold are considered key frames.

KEYWORDS: Key frame extraction, edge difference, video annotation

1. INTRODUCTION

The world as a living space is shrinking; but are we really shrinking, or have we found a new horizon to live in? We are certainly expanding by leaps and bounds into a world measured in gigabytes and terabytes. Recent advances in technology have made tremendous amounts of multimedia information available to the general population. A video, in the simplest of words, is an agglomeration of data. With the ever-growing number of videos, systems for processing them need to be developed. Analyzing these videos as small data packets, to reduce human effort, is the need of the hour.
Video annotation is a promising and essential step for content-based video search and retrieval. It refers to attaching metadata to a video for faster and easier access. Extracting key frames from the video and analyzing only these frames, instead of all the frames present in the video, can greatly improve the performance of such systems. Analysis of these key frames can help in forming the annotations for the video.

A key frame is a frame that represents the salient content and information of the video. The extracted key frames must summarize the characteristics of the video, and the image characteristics of a video can be tracked through all the key frames in time sequence. A basic rule of key frame extraction is that it is better to extract a wrong key frame than to extract too few [1].

In this paper, we propose an algorithm for key frame extraction to facilitate the video annotation process. The algorithm uses the edge difference between two consecutive frames to measure the difference between their contents. Our approach is shot-based. In a shot-based method, the shots of the original video are first detected, and then one or more key frames are extracted from each shot. Methods of shot transition detection include pixel-based comparison, template matching and histogram-based methods [2-3]. Pixel-based methods are sensitive to the motion of objects, so they are suited to detecting transitions caused by camera and object movement; however, because every pixel is compared, they take more time. Template matching is prone to false detections when it is the only method used. Histogram-based methods entirely lose location information: for example, two images with similar histograms may have completely different content. We have therefore used the edge-based method.
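The histogram limitation mentioned above can be illustrated with a small, hypothetical example. The sketch below is not part of the paper's method; the helper names (histogram, pixel_difference) and the two 2 x 2 toy images are assumptions chosen only to show that identical histograms can hide completely different spatial content.

```python
from collections import Counter
from typing import List

Image = List[List[int]]  # toy gray-level image stored as rows of pixel values


def histogram(image: Image) -> Counter:
    """Count how often each gray level occurs; pixel positions are discarded."""
    return Counter(pixel for row in image for pixel in row)


def pixel_difference(a: Image, b: Image) -> int:
    """Sum of absolute pixel-wise differences; sensitive to pixel positions."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


# Hypothetical frames: a bright column on the left versus on the right.
bright_left = [[255, 0],
               [255, 0]]
bright_right = [[0, 255],
                [0, 255]]

same_histogram = histogram(bright_left) == histogram(bright_right)
position_aware_difference = pixel_difference(bright_left, bright_right)
```

Here same_histogram is True even though every pixel differs between the two images, which is the kind of case a location-aware measure such as an edge difference is intended to catch.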
The edge-based method considers the content of the frames.

The rest of this paper is organized as follows. Section 2 describes the uses of key frame extraction. Section 3 presents related work in the field of key frame extraction. In Section 4, the proposed approach is described with the help of an algorithm and a flowchart. Section 5 presents the results and, finally, we conclude in Section 6.

2. USES OF KEY-FRAME EXTRACTION

• Video transmission: In order to reduce the transfer load on the network and the transmission of invalid information, the transmission, storage and management techniques of video information become more and more important [1]. When a video is being transmitted, the use of key frames reduces the amount of data required in video indexing and provides the framework for dealing with the video content [4]. In [5], a key-frame-based on-line coding scheme for video transmission is proposed. Key frames are fixed in advance. Each frame can only choose the latest coded and reconstructed key frame as its reference frame. After coding and packetisation, compressed video packets are transmitted with differentiated service classes. The key frame is sent from the source along with difference values; using the key frame picture and the difference values, the picture is reconstructed at the destination.

• Video summarization: Video summarization is a compact representation of a video sequence. It is useful for various video applications such as video browsing and retrieval systems. A video summary can be a preview sequence made up of a collection of key frames, that is, a set of frames chosen from the video. Although key-frame-based video summarization may lose the spatio-temporal properties and audio content of the original video sequence, it is the simplest and most common method. When temporal order is maintained in selecting the key frames, users can locate specific video segments of interest by choosing a particular key frame using a browsing tool. Key frames are also effective in representing the visual content of a video sequence for retrieval purposes. Video indexes may be constructed based on visual features of key frames, and queries may be directed at key frames using image retrieval techniques [6].

• Video annotation: Video annotation is the extraction of information about a video and the addition of this information to the video, which can help in browsing, searching, analysis, retrieval, comparison and categorization. To annotate is to attach data to some other piece of data (i.e. to add metadata to data) [7]. A video is annotated to speed up access to it. Analyzing every video frame for this purpose is not worthwhile, so key frames are found and only these are analyzed for annotation.

• Video indexing: Key frames reduce the amount of data required in video indexing and provide a framework for dealing with the video content.

• If key frames are shown beside a video before it is downloaded over the internet, users can predict the content of the video and decide whether it is pertinent to their search.

• Other applications, such as creating chapter titles in DVDs and prints from video.

3. RELATED WORK

Work in the area of key frame extraction is carried out either in the spatial domain or in the compressed domain. In [8], key frames are extracted using the histogram difference between two consecutive frames. Jin-Woo Jeong, Hyun-Ki Hong, and Dong-Ho Lee proposed an approach in which a video shot and its corresponding key frame are detected based on the visual similarity between adjacent video frames. They used the Euclidean distance to measure the visual similarity between video frames. The first frame of each shot is selected as a key frame [9].
Janko Calic and Ebroul Izquierdo proposed an algorithm for scene change detection and key frame extraction [10]. It generates the frame difference metrics by analyzing statistics of the macro-block features extracted from MPEG videos. Temporal segmentation is used to detect the scene change. A more elaborate method is proposed in [11], which uses shot boundary detection to segment the video into shots and the k-means algorithm to determine cluster representatives for each shot, which are then used as key frames. The MPEG-7 Color Layout Descriptor (CLD) is used as a feature to compute differences between consecutive frames. As k-means is applied after the shot boundaries are found, the complexity of the method increases.

4. THE PROPOSED APPROACH

The first step towards video annotation is the extraction of key frames. The key frames must include the important frames so that they can describe the contents of the video in the later processing stages. After the important frames are extracted, only the key frame images are analyzed to produce the annotation, instead of the contents of all video frames. The number of frames should not be reduced to the extent that important information is no longer covered by the key frames. As the key frames are analyzed after the key frame extraction process, the extraction algorithm should not be very complex or time consuming.
4.1 ALGORITHM FOR KEY FRAME EXTRACTION FROM VIDEO

Not all the frames in a video contain important information; each frame is a slight variation of the previous frame. It is not meaningful to analyze all the frames, so we find those frames which contain important information. To detect key frames we use the edge difference to calculate the difference between two consecutive frames. Only when the difference exceeds a threshold is one of the two consecutive frames considered a key frame. The reason we choose the edge difference is that edges are content dependent. The detailed description of key frame extraction from the video is as follows:

Input: Video V, consisting of N frames
Output: Key frames for the input video

Algorithm Key frame Extraction
{
  Step 1: For each video frame k = 1 to (N-1)
  {
    1. Read frames V_k and V_{k+1}
    2. Obtain the gray level images of V_k and V_{k+1}
         G_k = gray image of V_k
         G_{k+1} = gray image of V_{k+1}
    3. Find the edge difference between G_k and G_{k+1} using the Canny edge detector.
       Let diff(k) be their difference.
         diff(k) = \sum_{i} \sum_{j} (G_k - G_{k+1})
       where i, j are the row and column indices
  }
  Step 2: Compute the mean and standard deviation
         Mean, M = \frac{1}{N-1} \sum_{k=1}^{N-1} diff(k)
         Standard deviation, S = standard deviation of diff(1), ..., diff(N-1)
  Step 3: Compute the threshold value
         Threshold = M + a \times S
       where a is a constant
  Step 4: Find the key frames
  for k = 1 to (N-1)
  {
    if diff(k) > Threshold
    {
      Write frame V_{k+1} as the output key frame
    }
  }
}
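Step 1 of the algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the binary edge maps are assumed to have been produced already by a Canny edge detector (not implemented here), the absolute value of each per-pixel difference is summed so that appearing and disappearing edges do not cancel, and the names edge_difference and consecutive_differences are hypothetical.

```python
from typing import List, Sequence

# Assumption: each frame has already been converted to a binary edge map
# (1 = edge pixel, 0 = background), e.g. by a Canny edge detector.
EdgeMap = Sequence[Sequence[int]]


def edge_difference(edges_a: EdgeMap, edges_b: EdgeMap) -> int:
    """Sum over rows i and columns j of |edges_a[i][j] - edges_b[i][j]|.

    The absolute value is an assumption of this sketch; it counts every
    pixel whose edge status changed between the two frames.
    """
    if len(edges_a) != len(edges_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(edges_a, edges_b)
    ):
        raise ValueError("edge maps must have the same dimensions")
    return sum(
        abs(p - q)
        for row_a, row_b in zip(edges_a, edges_b)
        for p, q in zip(row_a, row_b)
    )


def consecutive_differences(edge_maps: Sequence[EdgeMap]) -> List[int]:
    """Return diff(1), ..., diff(N-1) for N edge maps (stored 0-based)."""
    return [
        edge_difference(edge_maps[k], edge_maps[k + 1])
        for k in range(len(edge_maps) - 1)
    ]


# Hypothetical 2 x 3 edge maps: frames 1 and 2 are identical, frame 3 differs.
frame_1 = [[0, 1, 0], [0, 1, 0]]
frame_2 = [[0, 1, 0], [0, 1, 0]]
frame_3 = [[1, 0, 0], [1, 0, 0]]
example_diffs = consecutive_differences([frame_1, frame_2, frame_3])
```

For N frames the function returns N-1 values, matching the range k = 1 to N-1 used when the differences are compared against the threshold.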
Video V is given as the input; this video consists of N frames in total. We first read the 1st and 2nd frames, convert them to gray scale and find their edge difference using the Canny edge detector. The difference is stored in diff(1). Next the 2nd and 3rd frames are read and the edge difference of their gray scale images is computed; this difference is stored in diff(2). Then the 3rd and 4th frames are considered, then the 4th and 5th frames, and so on. The procedure is repeated for all N frames of the video, so that diff(k), for k = 1 to N-1, contains the differences between all the consecutive frames of the input video V. Fig. 1 demonstrates how the edge differences are computed. As shown in Fig. 1, the last difference is diff(k), where k = N-1.

The Canny edge detector gives a matrix for the difference between frames; hence diff(k) is calculated by summing the values over rows and columns to get a single difference value:

  diff(k) = \sum_{i} \sum_{j} (G_k - G_{k+1})

where i, j are the row and column indices.

After the frame differences are obtained, their mean and standard deviation are calculated (refer to step 2 of the algorithm). The threshold is then calculated using the formula:

  Threshold = mean + a \times standard deviation

where a is a constant. After trying various values, we used a = 2, as it gave the desired results. Only the differences that exceed the threshold are considered: when this happens, the content has changed significantly and the frame may contain important information. If the difference between two consecutive frames exceeds the threshold, the latter frame is considered the key frame. All the key frame images are stored in a folder.

4.2 Flowchart for key-frame extraction from video

The flowchart for key frame extraction from a video is shown in Fig. 2.

Fig. 2. Flowchart for key frame extraction
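The thresholding steps described above (mean, standard deviation, threshold, selection) can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name select_key_frames and the sample difference values are hypothetical, and the use of the sample standard deviation (statistics.stdev, the same normalization as Matlab's default std) is an assumption, since the paper does not state which normalization was used.

```python
import statistics
from typing import List, Sequence


def select_key_frames(diffs: Sequence[float], a: float = 2.0) -> List[int]:
    """Return 1-based indices of key frames.

    diffs[k-1] holds diff(k), the edge difference between frames k and k+1.
    When diff(k) exceeds mean + a * std, the latter frame (k+1) is selected.
    """
    if len(diffs) < 2:
        # The sample standard deviation needs at least two values.
        return []
    mean = statistics.mean(diffs)
    std = statistics.stdev(diffs)  # assumption: sample standard deviation
    threshold = mean + a * std
    return [k + 1 for k, d in enumerate(diffs, start=1) if d > threshold]


# Hypothetical differences for an 11-frame clip with one abrupt change
# between frames 4 and 5.
sample_diffs = [10, 12, 11, 50, 9, 10, 11, 12, 10, 11]
key_frames = select_key_frames(sample_diffs, a=2.0)
```

With these made-up values only diff(4) exceeds the threshold, so frame 5 is the single key frame. A smaller value of a would lower the threshold and select more frames, which is the trade-off behind choosing a = 2.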
5. RESULTS

Videos mainly from the transport domain, containing an airplane, bus, car or bike, were used as input to the system. The videos were downloaded from YouTube. The audio part of the video is not considered. Videos with slight movement of the camera and with little or no background change were used. We have implemented the algorithm in Matlab R2012a.

The input video containing an airplane had more than 500 frames; some of the frames are shown in Fig. 3.

Fig. 3. Frames of the input video

The edge difference between consecutive frames was found. The edge difference between the 1st and 2nd frames was 4138, between the 2nd and 3rd frames 3352, between the 3rd and 4th 4185, between the 4th and 5th 3564, and so on. After finding the edge differences between all the consecutive frames, the following values were computed:

  Max                  5734
  Min                  162
  Median               2725
  Mean                 2.8222e+03
  Standard deviation   1.3575e+03
  Threshold            5.5371e+03

Those frames whose edge difference exceeds the threshold value are considered key frames. Fig. 4 shows the frames extracted as key frames for the input video whose frames are shown in Fig. 3.

Fig. 4. Output key frames for airplane video

The result of key frame extraction on an input video containing a car and humans, along with the frame numbers, is shown in Fig. 5. This video had a still background with humans moving in the video. Analysis of these key frames can result in semantic annotation of the videos. The actions or events can also be analyzed.
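As a quick check of the airplane-video statistics reported above, the threshold can be recomputed from the printed mean and standard deviation with a = 2. The snippet below uses only the rounded values given in the text, so a small rounding discrepancy is expected; the variable names are illustrative.

```python
# Values as printed in the paper (rounded to five significant digits).
mean = 2.8222e+03
std = 1.3575e+03
reported_threshold = 5.5371e+03
max_difference = 5734
first_differences = [4138, 3352, 4185, 3564]  # diff(1) to diff(4)

a = 2  # constant chosen by the authors
recomputed_threshold = mean + a * std  # 2822.2 + 2 * 1357.5

# Gap between the recomputed and the reported threshold (rounding only).
rounding_gap = abs(recomputed_threshold - reported_threshold)

# None of the first four differences exceeds the threshold, while the
# maximum difference does, so at least one key frame is selected.
early_frames_selected = [d for d in first_differences if d > reported_threshold]
max_exceeds_threshold = max_difference > reported_threshold
```

The recomputed value of about 5537.2 agrees with the reported 5.5371e+03 to within the rounding of the printed digits, which is consistent with a = 2 having been used for this video.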
Fig. 5. Output key frames for car video

Fig. 6 shows the result for a video in which the content changes a great deal. In this video many cars are moving on the road. The result shows that each car is captured by the key frames.

Fig. 6. Output key frames for a video with a larger amount of content change
6. CONCLUSION AND FUTURE WORK

The key frames are extracted depending on the contents of the video and on how much those contents change. As seen in the first video, the number of key frames is small because the content of this video changes very little. In the third video example above, the change of content, or the amount of information in the video, is greater, so more frames are extracted as key frames. As the key frames need to be processed for annotation, important information must not be missed.

Our algorithm can be improved by further reducing the number of key frames extracted. This can be done by adding one more pass: after phase 1, the extracted key frames can again be given as input to the algorithm. This will reduce the redundant frames, that is, the frames which contain similar contents, but adding one more pass will increase the execution time. As the frames need to be analyzed for annotation after key frame extraction, some amount of redundancy can be tolerated rather than increasing the execution time. In future, we can design a video annotation system which will utilize the key frames obtained from the above algorithm.

REFERENCES

[1] G. Liu and J. Zhao, "Key Frame Extraction from MPEG Video Stream", Proceedings of the Second Symposium International Computer Science and Computational Technology (ISCSCT '09), China, 26-28 Dec. 2009, pp. 007-011.
[2] C. F. Lam and M. C. Lee, "Video segmentation using color difference histogram", Lecture Notes in Computer Science, New York: Springer Press, pp. 159-174, 1998.
[3] A. Hampapur, R. Jain, and T. Weymouth, "Production model based digital video segmentation", Multimedia Tools and Applications, vol. 1, no. 1, pp. 9-46, 1995.
[4] T. Liu, H. Zhang, and F. Qi, "A novel video key-frame-extraction algorithm based on perceived motion energy model", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 10, pp. 1006-1013, 2003.
[5] Q. Zhang and G. Liu, "A key-frame-based error resilient coding scheme for video transmission over differentiated services networks", in Proceedings of Packet Video 2007, 12-13 Nov. 2007, pp. 85-90.
[6] P. Mundur, Y. Rao, and Y. Yesha, "Keyframe-based Video Summarization using Delaunay Clustering", International Journal on Digital Libraries, vol. 6, no. 2, April 2006, pp. 219-232.
[7] K. Khurana and M. B. Chandak, "Study of Various Video Annotation Techniques", International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 1, January 2013.
[8] S. Thakare, "Intelligent Processing and Analysis of Image for shot Boundary Detection", International Journal of Engineering Research and Applications, vol. 2, no. 2, Mar-Apr 2012, pp. 366-369.
[9] J. Jeong, H. Hong, and D. Lee, "Ontology-based Automatic Video Annotation Technique In Smart TV Environment", IEEE Transactions on Consumer Electronics, vol. 57, no. 4, November 2011.
[10] J. Calic and E. Izquierdo, "Efficient Key-frame Extraction And Video Analysis", International Symposium on Information Technology, April 2002, IEEE.
[11] D. Borth, A. Ulges, C. Schulze, and T. M. Breuel, "Key frame Extraction for Video Tagging & Summarization", 2008.
[12] Reeja S R and N. P. Kavya, "Motion Detection for Video Denoising - The State of Art And The Challenges", International Journal of Computer Engineering & Technology (IJCET), vol. 3, no. 2, 2012, pp. 518-525, ISSN Print: 0976-6367, ISSN Online: 0976-6375.