Automatic Keyframe Selection based on Mutual Reinforcement Algorithm

Ventura, C.; Giró-i-Nieto, X.; Vilaplana, V.; Giribet, D.; Carasusan, E., "Automatic keyframe selection based on mutual reinforcement algorithm," 11th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 29-34, 17-19 June 2013. doi: 10.1109/CBMI.2013.6576548

This paper addresses the problem of video summarization through the automatic selection of a single representative keyframe. The proposed solution is based on the mutual reinforcement paradigm, where a keyframe is selected thanks to its highest and most frequent similarity to the rest of the considered frames. Two variations of the algorithm are explored: a first one where only frames within the same video are used (intra-clip mode) and a second one where the decision also depends on the previously selected keyframes of related videos (inter-clip mode). These two algorithms were evaluated by a set of professional documentalists from a broadcaster’s archive, and the results concluded that the proposed techniques outperform the semi-manual solution adopted so far in the company.

More details:
https://imatge.upc.edu/web/publications/automatic-keyframe-selection-based-mutual-reinforcement-algorithm

  • Good afternoon, I’m Carles Ventura, from the Universitat Politècnica de Catalunya, and I’m going to present the work titled Automatic Keyframe Selection based on Mutual Reinforcement Algorithm.
  • First of all, I would like to introduce the problem we are going to address. Given an input video file and its associated metadata, we want to design a system which selects the most representative keyframe. And what’s the application behind this problem?
  • This work results from our involvement in a national project called Buscamedia, with the collaboration of the broadcasting company Televisió de Catalunya. This company has a website where it uploads most of its broadcast videos so that users can watch them whenever they want. Users can perform a textual search to retrieve the desired videos, which are previewed using a selected keyframe. Our goal is to design an end-to-end system to automatically select this representative keyframe.
  • How is the representative keyframe currently selected by the broadcasting company? First, three frames are automatically extracted after an arbitrary time-based sampling. Then, a professional from the broadcasting company manually selects the most representative keyframe. The company considers this professional intervention essential to guarantee that the keyframe selection process fulfills multiple criteria inherent to this specialist job.
  • We propose two different approaches: the intra-clip mode and the inter-clip mode. The intra-clip mode consists of selecting the representative keyframe by using only the input video clip. We have designed a solution based on the maximum coverage criterion. According to this, the representative keyframe should correspond to the frame which is most visually similar to the rest of the frames within the video clip. On the other hand, the inter-clip mode uses not only the visual content but also the associated textual metadata. This mode takes advantage of all the previously uploaded videos on the web, as well as their representative keyframes, which are available. Therefore, the input metadata is used to retrieve videos with similar metadata.
  • Let’s see in more detail how the system is implemented. First of all, a set of frames is extracted from the input video. These frames result from a uniform downsampling process. Then, for each extracted frame, a visual descriptor is computed. In our implementation of the system, we have used the MPEG-7 Color Structure Descriptor, which is a global scale descriptor. Once the visual descriptors have been computed, a similarity graph is built and a graph-based technique called mutual reinforcement is applied over it. These two blocks will be detailed later. As a result of applying the mutual reinforcement algorithm, the frames are ranked. The frame with the highest score is selected as the representative keyframe if we are working in the intra-clip mode. Otherwise, in the inter-clip mode, these frames are reranked using a set of representative keyframes from videos which have been retrieved using the textual metadata associated with the input video. This reranking step will also be analyzed later.
  • Now, we’re going to see in more detail the similarity graph and the mutual reinforcement algorithm. The visual similarity graph is built by considering the images as nodes and assigning to each edge the similarity between the two images it connects. Then, the mutual reinforcement algorithm is applied over it. This is an iterative method that is computed as follows. First, a normalized score vector is initialized. The dimensionality of this vector corresponds to the number of extracted frames, and each component represents the score achieved by each frame. Then, at each iteration, the score vector is updated by computing each frame score as a weighted combination of the scores of the frames to which it is connected, using the similarity values as weights. Then, the updated score vector is normalized. This algorithm iterates until convergence or until a maximum number of iterations is reached. The intuition behind this algorithm is that frames which are most visually similar to the rest of the frames within the video get higher scores.
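The iteration just described can be sketched in a few lines of Python. This is a minimal illustration rather than the paper’s implementation: the similarity matrix `sim` is a toy stand-in for the MPEG-7 Color Structure Descriptor similarities, and the tolerance and iteration cap are hypothetical values.

```python
def mutual_reinforcement(sim, max_iters=100, tol=1e-9):
    """Rank frames by iteratively reinforcing scores through pairwise similarities."""
    n = len(sim)
    scores = [1.0 / n] * n  # normalized initial score vector
    for _ in range(max_iters):
        # Each frame's new score is a similarity-weighted combination
        # of the scores of the frames it is connected to.
        new = [sum(sim[i][j] * scores[j] for j in range(n)) for i in range(n)]
        norm = sum(new)  # L1 normalization keeps the vector bounded
        new = [s / norm for s in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            break  # converged
        scores = new
    return scores

# Toy example: frames 0-2 are mutually similar, frame 3 is an outlier.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.1],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
scores = mutual_reinforcement(sim)
keyframe = max(range(len(scores)), key=scores.__getitem__)  # frame 0
```

The frame with maximum coverage (frame 0, the one most similar to the rest) ends up with the highest score, matching the intuition above.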
  • Once we have retrieved the related videos, for which the representative keyframes are available, these are used to rerank the input video frames. By applying this reranking algorithm, each input video frame is assigned a new score which results from a linear combination of the frame score resulting from the mutual reinforcement algorithm and the visual similarity values between that frame and each representative keyframe of the related videos obtained by the textual searcher. Alpha zero represents the contribution of the intra-clip mode to the final scores. The weighting parameters alpha m can adopt a uniform distribution or depend on the textual similarity.
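The linear fusion can be sketched as follows. Assumptions are flagged in the comments: `mr_scores` stands for the intra-clip mutual-reinforcement scores, `kf_sim[i][m]` for the visual similarity between input frame `i` and the representative keyframe of the `m`-th retrieved video, and the uniform weighting with `alpha0 = 0.5` is a hypothetical choice (the paper also allows weights driven by textual similarity).

```python
def rerank(mr_scores, kf_sim, alpha0=0.5):
    """Inter-clip reranking: linear fusion of intra-clip scores and
    similarities to the representative keyframes of related videos."""
    n_frames = len(mr_scores)
    n_keyframes = len(kf_sim[0])
    # Uniform weighting: the remaining mass is split over the keyframes.
    alpha_m = (1.0 - alpha0) / n_keyframes
    return [
        alpha0 * mr_scores[i]
        + sum(alpha_m * kf_sim[i][m] for m in range(n_keyframes))
        for i in range(n_frames)
    ]

mr_scores = [0.5, 0.3, 0.2]   # intra-clip ranking: frame 0 leads
kf_sim = [[0.1, 0.2],         # frame 0 barely matches the retrieved keyframes
          [0.9, 0.8],         # frame 1 matches them well
          [0.2, 0.1]]
new_scores = rerank(mr_scores, kf_sim)  # frame 1 now ranks first
```

Frame 1, which has less coverage within the clip but resembles the retrieved keyframes, overtakes frame 0 after fusion, which is exactly the behavior described for the inter-clip mode.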
  • One of the requirements of the system imposed by the broadcasting company was that the selected representative keyframe should not include any textual caption. Therefore, a post-processing step was added, which consists of a text filter. This text filter is based on setting a maximum threshold over the energy coefficients in the different subbands of the Haar Wavelet Transform and applying morphology to avoid some false positives.
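A bare-bones sketch of the wavelet-energy test, under stated assumptions: a one-level 2D Haar decomposition is computed directly (the paper does not specify the decomposition depth), the threshold value is hypothetical, and the morphological false-positive clean-up is omitted.

```python
def haar_highfreq_energy(img):
    """One-level 2D Haar decomposition of a grayscale image (list of rows);
    returns the summed energy of the LH, HL and HH detail subbands."""
    h, w = len(img), len(img[0])
    energy = 0.0
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            lh = (a + b - c - d) / 4.0  # horizontal detail
            hl = (a - b + c - d) / 4.0  # vertical detail
            hh = (a - b - c + d) / 4.0  # diagonal detail
            energy += lh * lh + hl * hl + hh * hh
    return energy

THRESHOLD = 10.0  # hypothetical value; would be tuned on broadcast content
flat = [[128] * 8 for _ in range(8)]                        # smooth region
stripes = [[255 if x % 2 else 0 for x in range(8)] for _ in range(8)]
has_caption = haar_highfreq_energy(stripes) > THRESHOLD     # True
```

Sharp caption-like patterns (the striped block) concentrate energy in the detail subbands, while smooth regions contribute almost none, which is what the threshold test exploits.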
  • Now, we’re going to present some experimental results. We will start with a qualitative evaluation of the experiment. Then, we will present the results of a user-based evaluation.
  • Let’s start with the qualitative evaluation. This figure shows the result of applying the mutual reinforcement algorithm in the intra-clip mode on a video clip. Therefore, only the input video is used to rank its keyframes and select the most representative one. We can see that the type of frame with most coverage in the video clip obtains the highest score, followed by other frames which are not so frequent in the video. In this example, if we were using the post-processing text filter, the frame with the second highest score would be selected as the representative keyframe.
  • However, for more complex videos, the maximum coverage criterion is not enough to select the most representative keyframe. Here we have the result of applying the mutual reinforcement algorithm over a video with much more diversity. Due to space limitations, only the top-ranked frames are displayed. The images below are the representative keyframes from the videos textually retrieved. We can see that the frames which got the highest scores after applying the mutual reinforcement are not similar to the ones selected from the related videos.
  • However, if we rerank the frames also taking into account the similarity between the input video frames and the representative keyframes from the related videos, some frames which do not have much coverage within the video but are similar to these representative keyframes are also reinforced and achieve higher scores. We can see that the frames ranked at the first and third positions are very similar to 6 out of the 10 representative keyframes of the videos retrieved by the textual searcher.
  • We have also carried out a user-based evaluation using a Mean Opinion Score test. The proposed automatic techniques are assessed by users who, in our case, are the same professionals from the broadcasting company who are responsible for the current manual keyframe selection. Therefore, these users are qualified professionals who can determine whether a keyframe is valid, considering multiple criteria inherent to their specialist job. Users were asked to rate the selected keyframes on a 5-star scale with the following interpretation: (1) Unacceptable, (2) Fair, (3) Good, (4) Very good, and (5) Excellent. I would like to remark that users were asked to rate with only one star any keyframe that they considered not valid as representative.
  • For the user-based evaluation, two main domains were selected: News and Morning Show. The News domain contains short videos which are very diverse both visually and semantically. Therefore, this domain has the handicap that videos with similar metadata are likely not to be visually similar. This domain has been divided into three semantically different categories. On the other hand, the Morning Show domain contains longer videos, but they have been filmed in a controlled environment. As a consequence, there are similar shots between different videos of the same domain. This domain has been divided into two semantically different categories. 20 videos from the News domain and 30 videos from the Morning Show domain were evaluated.
  • In fact, the users were asked to evaluate not only our proposed automatic techniques, intra-clip and inter-clip, but also a baseline and the current system employed by the broadcasting company. As previously explained, the existing solution at the broadcasting company requires a professional to choose among three frames which have been extracted from the video after an arbitrary time-based sampling. On the other hand, we define a baseline method consisting of randomly selecting a keyframe from the video file. The keyframes resulting from these 4 techniques were shown to the user in a random order. Our goal is to improve on both the current and the random approaches.
  • From a global analysis of the results, we can see that the inter-clip mode performs better than the intra-clip mode for the Morning Show domain, but this behavior is inverted in the News domain. The reason is that the videos from the News domain are much more diverse and that the representative keyframes from the videos retrieved by textual similarity are not visually similar enough to the frames of the analyzed video. As a result, there is little influence on the ranking scores despite the linear fusion. The contribution of the inter-clip approach is especially remarkable for the Discussion category from the Morning Show domain, in which the target representative keyframes do not match the maximum coverage criterion within the clip, but visually similar representative keyframes are retrieved by the textual searcher.
  • In addition to the MOS analysis, we also evaluated the acceptance rate. The acceptance rate measures how often a technique selects a representative keyframe which is acceptable for the user, independently of the specific grade from 2 to 5 given to it. From this figure, I would like to highlight that the intra-clip approach achieves an average acceptance rate of 80.95% in the News domain and the inter-clip approach achieves an acceptance rate of 90% in the News domain.
  • I would like to mention that both the database and the results are available on the Image Processing Group website.
  • Finally, we’re going to draw the conclusions. We have presented a keyframe selection system based on the mutual reinforcement algorithm, which uses the maximum coverage criterion for the intra-clip approach. We have also proposed an inter-clip approach where the results given by the mutual reinforcement are reranked using visual similarity to representative keyframes from videos retrieved by textual similarity. Finally, we have evaluated both approaches with a MOS test performed by professionals, and we have come to the conclusion that the current semi-manual system employed in the broadcasting company can be replaced by the proposed automatic approach, and that the inter-clip technique can outperform the intra-clip mode in controlled environments.
  • In the inter-clip mode, the textual searcher is used to retrieve related videos from the textual metadata of the input video. We have considered two different methods to retrieve these related videos. One is a simple textual searcher which retrieves videos whose metadata contains words that also belong to the input video metadata. The second approach is based on TF-IDF descriptors. In this approach, we build a vocabulary from a training set of metadata files, and each vocabulary word has an associated value representing how often the term appears in the whole training dataset. Then, every metadata file is represented as a feature vector, and the vectors are compared using cosine similarity. Therefore, the TF-IDF approach downweights matchings between frequent words and upweights matchings between infrequent words.
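The TF-IDF retrieval step can be sketched as follows. This is a toy illustration: in the paper the vocabulary and weights come from a training set of broadcaster metadata files, whereas here both are built from a handful of hypothetical metadata strings, and tokenization is reduced to whitespace splitting.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each metadata string as a sparse TF-IDF vector (dict)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency of each term over the corpus.
    df = Counter(t for doc in tokenized for t in set(doc))
    vecs = []
    for doc in tokenized:
        tf = Counter(doc)
        # Frequent terms are downweighted, infrequent ones upweighted.
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "economy interview markets news",     # hypothetical query metadata
    "politics debate parliament news",
    "economy markets inflation news",
]
vecs = tfidf_vectors(docs)
# Rank the other two documents against the first one.
sims = [cosine(vecs[0], v) for v in vecs[1:]]  # the economy video wins
```

Note that "news", present in every document, gets zero IDF weight and contributes nothing to any match, which is exactly the downweighting of frequent terms described above.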
  • Transcript

    • 1. AUTOMATIC KEYFRAME SELECTION BASED ON MUTUAL REINFORCEMENT ALGORITHM C. Ventura, X. Giró-i-Nieto, V. Vilaplana et al
    • 2. 1. Introducing the problem 2  Automatic selection of the representative keyframe
    • 3. 1.1. What is the application? 3
    • 4. 1.2. Current implementation 4 ARBITRARY SAMPLING MANUAL SELECTION BY PROFESSIONAL
    • 5. 2. Designing the system 5  2 scenarios:  Intra-clip mode  Inter-clip mode Textual search to retrieve related videos Database
    • 6. 2.1. General scheme 6 Frame extraction 3 2 1 visual features Reranking Textual search 3 2 1 Inter-clip mode Similarity graph Mutual Reinforcement
    • 7. 2.2. Intra-clip mode 7  Mutual Reinforcement Algorithm (Joshi04)  Gets the frame with maximum coverage (Joshi04) D. Joshi et al. The story picturing engine: finding elite images to illustrate a story using mutual reinforcement. In MIR ‘04
    • 8. 2.3. Inter-clip mode 8  Reranking (based on Liu11) INTRA INPUT FRAMES INTER REPRESENTATIVE KEYFRAMES FROM TEXTUAL SEARCHER (Liu11) C. Liu et al. Query sensitive dynamic web video thumbnail generation. In ICIP ‘11
    • 9. 2.4. Post-processing block 9 Text filtering  Goal: To avoid representative keyframes with textual captions FRAME HAAR WAVELET MORPHOLOGY
    • 10. 3. Experiments 10   Qualitative evaluation Quantitative evaluation  MOS Test  Experimental dataset
    • 11. 3.1. Qualitative evaluation 11  Intra-clip mode
    • 12. 3.1. Qualitative evaluation 12  Inter-clip mode Ranking scores after mutual reinforcement (INTRA-CLIP MODE) Representative keyframes of the retrieved videos
    • 13. 3.1. Qualitative evaluation 13  Inter-clip mode  Final ranking scores after reranking:
    • 14. 3.2. Quantitative evaluation 14  MOS (Mean Opinion Score) test  Performed by TVC professionals  Scores BAD NON ACCEPTABLE GOOD ACCEPTABLE EXCELLENT
    • 15. 3.2. Quantitative evaluation 15  Database NEWS DOMAIN MORNING SHOW DOMAIN ECONOMY INTERVIEW INTERNATIONAL POLITICS DISCUSSION
    • 16. 3.2. Quantitative evaluation 16  MOS test 4 different approaches  Intra-clip  Inter-clip  Random  Current
    • 17. 3.2. Quantitative evaluation 17 NEWS MORNING SHOW
    • 18. 3.2. Quantitative evaluation 18 NEWS MORNING SHOW
    • 19. 3. Experiments 19  Database and results are available on:  imatge.upc.edu
    • 20. 4. Conclusions 20  Keyframe selection based on mutual reinforcement algorithm   To get the frame with maximum coverage within the video in the intra-clip approach Inter-clip approach Textual similarity to retrieve related videos  Linear fusion to get the new ranking scores   MOS test The semi-manual system (from TVC) can be replaced by the automatic approach.  Inter-clip approach outperforms intra-clip in controlled environments. 
    • 21. 21
    • 22. 2.3. Inter-clip mode 22  Textual search 2 modalities:  Textual searcher binary  TF-IDF descriptors
