
MediaEval 2017 - Interestingness Task: MediaEval 2017 Predicting Media Interestingness Task (Overview)


MediaEval 2017 Predicting Media Interestingness Task

Presenter: Claire-Hélène Demarty, Technicolor, France

Paper: http://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_4.pdf

Video: https://youtu.be/dWhSJuR5DuM

Authors: Claire-Hélène Demarty, Mats Sjöberg, Bogdan Ionescu, Thanh-Toan Do, Michael Gygli, Ngoc Q.K. Duong

Abstract: In this paper, the Predicting Media Interestingness task, which is running for the second year as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation, is presented. For the task, participants are expected to create systems that automatically select images and video segments that are considered to be the most interesting for a common viewer. All task characteristics are described, namely the task use case and challenges, the released data set and ground truth, the required participant runs and the evaluation metrics.


  1. 1. Predicting Media Interestingness Task Overview. Claire-Hélène Demarty – Technicolor, Mats Sjöberg – University of Helsinki, Bogdan Ionescu – University Polytehnica of Bucharest, Thanh-Toan Do – University of Adelaide, Michael Gygli – ETH & Gifs.com, Ngoc Q.K. Duong – Technicolor. MediaEval 2017 Workshop, Dublin, 13-16 September 2017. In its second year.
  2. 2. Derives from a use case at Technicolor  Helping professionals to illustrate a Video on Demand (VOD) web site by selecting some interesting frames and/or video excerpts for the posted movies. 2 Task definition 9/13/2017
  3. 3. Derives from a use case at Technicolor  Helping professionals to illustrate a Video on Demand (VOD) web site by selecting some interesting frames and/or video excerpts for the posted movies. 3 Task definition 9/13/2017
  4. 4. Derives from a use case at Technicolor: helping professionals to illustrate a Video on Demand (VOD) web site by selecting some interesting frames and/or video excerpts for the posted movies. 4 Task definition 9/13/2017. Definition: the frames and excerpts should help a viewer decide whether he/she is interested in watching the underlying movie. Emphasized in 2017.
  5. 5. Two subtasks: Image and Video. Image subtask: given a set of key-frames extracted from a movie, and Video subtask: given a set of video segments extracted from a movie, automatically identify those images/segments that viewers report to be interesting. Binary classification task on a per-movie basis, but confidence values are also required (a minimal sketch of this per-movie output is given after the slide list). 5 Task definition 9/13/2017
  6. 6. From Hollywood-like movie trailers or full-length movie extracts. Manual segmentation into shots/longer segments with a semantic meaning. Extraction of the middle key-frame of each shot/segment. 6 Dataset & additional features 9/13/2017 (Modified in 2017)
     Development set (78 trailers): 7,396 shots (9.0% interesting), 7,396 key-frames (11.6% interesting)
     Test set (26 trailers): 2,192 shots (11.3% interesting), 2,192 key-frames (11.9% interesting)
     Test set (4 movie extracts, ca. 15 min): 243 shots (11.5% interesting), 243 key-frames (22.6% interesting)
  7. 7. From Hollywood-like movie trailers or full-length movie extracts. Manual segmentation into shots/longer segments with a semantic meaning. Extraction of the middle key-frame of each shot/segment. 7 Dataset & additional features 9/13/2017 (Modified in 2017)
     Development set (78 trailers): 7,396 shots (9.0% interesting), 7,396 key-frames (11.6% interesting)
     Test set (26 trailers): 2,192 shots (11.3% interesting), 2,192 key-frames (11.9% interesting)
     Test set (4 movie extracts, ca. 15 min): 243 shots (11.5% interesting), 243 key-frames (22.6% interesting)
     Precomputed content descriptors:
     Low-level: dense SIFT, HoG, LBP, GIST, HSV color histograms, MFCC, fc7 and prob layers from AlexNet
     Mid-level: face detection and tracking-by-detection
     Segment-based: C3D features from the fc6 layer, averaged over each segment (added in 2017; a sketch of this averaging follows the slide list)
  8. 8. 8 Manual annotations 9/13/2017 (Modified in 2017). Pair comparison protocol: annotators compare pairs of images/segments; all pairs are aggregated in ONE SINGLE step into a ranking; a binary decision is then obtained by manual thresholding of that ranking (a toy sketch of pairwise-to-ranking aggregation follows the slide list). Annotators: >252 persons for video, >188 persons for image, from 22 countries.
  9. 9. Up to 5 runs per subtask! Image subtask: visual information, external data allowed (modified in 2017). Video subtask: BOTH audio and visual information, external data allowed (modified in 2017). 9 Required runs 9/13/2017
  10. 10. 2017 official measure: Mean Average Precision at 10 (MAP@10), computed over all movies (a sketch of this computation follows the slide list). Additional metrics are computed: the 2016 official measure (Mean Average Precision), plus false alarm rate, miss detection rate, precision, recall, F-measure, etc. 10 Evaluation metrics 9/13/2017 (Modified in 2017)
  11. 11. 11 Task participation 9/13/2017. Registrations: 32 teams, 18 countries. Submissions: 10 teams, of which 7 'experienced' teams. [Bar chart comparing 2016 and 2017 task participation: registrations, returned agreements, submitting teams, experienced teams, workshop attendance]
  12. 12. 12 Official results – Image subtask – 33 runs 9/13/2017 (* = organizers)
     Run | MAP@10 | MAP | Team (official ranking)
     me17in_DUT-MMSR_image_run2.txt | 0.1385 | 0.3075 | DUT_MMSR
     me17in_HKBU_image_5.txt | 0.1369 | 0.291 | HKBU
     me17in_DUT-MMSR_image_run3.txt | 0.1349 | 0.3052 | DUT_MMSR
     me17in_HKBU_image_3.txt | 0.1332 | 0.2898 | HKBU
     me17in_HKBU_image_2.txt | 0.132 | 0.2916 | HKBU
     me17in_HKBU_image_4.txt | 0.1315 | 0.2884 | HKBU
     me17in_DUT-MMSR_image_run1.txt | 0.131 | 0.3002 | DUT_MMSR
     me17in_DUT-MMSR_image_run4.txt | 0.1213 | 0.2887 | DUT_MMSR
     me17in_HKBU_image_1.txt | 0.1184 | 0.2812 | HKBU
     me17in_gibis_image_run1-required.txt | 0.1129 | 0.271 | GIBIS
     me17in_technicolor_image_run2.txt | 0.1054 | 0.2525 | Technicolor*
     me17in_gibis_image_run2.txt | 0.1029 | 0.2645 | GIBIS
     me17in_technicolor_image_run1.txt | 0.1028 | 0.2615 | Technicolor*
     me17in_RUC_image_run1-required.txt | 0.094 | 0.2655 | RUC
     me17in_gibis_image_run5.txt | 0.0939 | 0.2531 | GIBIS
     me17in_gibis_image_run3.txt | 0.0924 | 0.2502 | GIBIS
     me17in_gibis_image_run4.txt | 0.0916 | 0.2525 | GIBIS
     me17in_IITB_image_run2-required.txt | 0.0911 | 0.257 | IITB
     me17in_technicolor_image_run4.txt | 0.0875 | 0.2382 | Technicolor*
     me17in_technicolor_image_run5.txt | 0.0861 | 0.2347 | Technicolor*
     2016 BEST RUN | n/a | 0.2336 |
     me17in_technicolor_image_run3.txt | 0.0693 | 0.2244 | Technicolor*
     me17in_DUT-MMSR_image_run5histface.txt | 0.0649 | 0.2105 | DUT_MMSR
     me17in_Eurecom_image_run1-required.txt | 0.0587 | 0.2029 | Eurecom
     me17in_Eurecom_image_run2.txt | 0.0579 | 0.2016 | Eurecom
     me17in_LAPI_image_run3.txt | 0.0555 | 0.1873 | LAPI*
     me17in_LAPI_image_run4.txt | 0.0529 | 0.1851 | LAPI*
     me17in_IITB_image_run4-required.txt | 0.0521 | 0.2054 | IITB
     me17in_IITB_image_run1-required.txt | 0.05 | 0.1886 | IITB
     Baseline | 0.0495 | 0.1731 |
     me17in_IITB_image_run3-required.txt | 0.0494 | 0.2038 | IITB
     me17in_LAPI_image_run1.txt | 0.0463 | 0.1791 | LAPI*
     me17in_LAPI_image_run2.txt | 0.0442 | 0.1789 | LAPI
     me17in_DAIICT_image_run1-required.txt | 0.0406 | 0.1824 | DAIICT
     me17in_TCNJ-CS_image_run1-required.txt | 0.0126 | 0.1331 | TCNJ-CS
     2017 best run: MAP@10=0.1385, MAP=0.3075; 2016 best run: MAP=0.2336; Baseline: MAP@10=0.0495, MAP=0.1731
  13. 13. 13 Official results – Image subtask – best runs 9/13/2017 (* = organizers)
     Run | MAP@10 | MAP | Team (official ranking)
     me17in_DUT-MMSR_image_run2.txt | 0.1385 | 0.3075 | DUT_MMSR
     me17in_HKBU_image_5.txt | 0.1369 | 0.291 | HKBU
     me17in_gibis_image_run1-required.txt | 0.1129 | 0.271 | GIBIS
     me17in_technicolor_image_run2.txt | 0.1054 | 0.2525 | Technicolor*
     me17in_RUC_image_run1-required.txt | 0.094 | 0.2655 | RUC
     me17in_IITB_image_run2-required.txt | 0.0911 | 0.257 | IITB
     2016 BEST RUN | n/a | 0.2336 |
     me17in_Eurecom_image_run1-required.txt | 0.0587 | 0.2029 | Eurecom
     me17in_LAPI_image_run3.txt | 0.0555 | 0.1873 | LAPI*
     Baseline | 0.0495 | 0.1731 |
     me17in_DAIICT_image_run1-required.txt | 0.0406 | 0.1824 | DAIICT
     me17in_TCNJ-CS_image_run1-required.txt | 0.0126 | 0.1331 | TCNJ-CS
  14. 14. 14 Official results – Video subtask – 42 runs 9/13/2017 (* = organizers)
     Run | MAP@10 | MAP | Team (official ranking)
     me17in_Eurecom_video_run4.txt | 0.0827 | 0.2094 | Eurecom
     me17in_Eurecom_video_run5.txt | 0.0774 | 0.2002 | Eurecom
     me17in_LAPI_video_run4.txt | 0.0732 | 0.2028 | LAPI*
     me17in_Eurecom_video_run2.txt | 0.0732 | 0.196 | Eurecom
     me17in_Eurecom_video_run1-required.txt | 0.0717 | 0.2034 | Eurecom
     me17in_technicolor_video_run4.txt | 0.0641 | 0.1878 | Technicolor*
     me17in_Eurecom_video_run3.txt | 0.064 | 0.1964 | Eurecom
     me17in_DAIICT_video_run4.txt | 0.064 | 0.1885 | DAIICT
     me17in_RUC_video_run2.txt | 0.0637 | 0.1897 | RUC
     me17in_DAIICT_video_run1-required.txt | 0.0636 | 0.1867 | DAIICT
     me17in_gibis_video_run5.txt | 0.0628 | 0.183 | GIBIS
     me17in_gibis_video_run4.txt | 0.0624 | 0.1836 | GIBIS
     me17in_LAPI_video_run1.txt | 0.0619 | 0.1937 | LAPI*
     me17in_LAPI_video_run3.txt | 0.0619 | 0.1937 | LAPI*
     me17in_gibis_video_run3.txt | 0.0614 | 0.1877 | GIBIS
     me17in_technicolor_video_run5.txt | 0.0609 | 0.1918 | Technicolor*
     me17in_technicolor_video_run1.txt | 0.0589 | 0.1856 | Technicolor*
     me17in_RUC_video_run1-required.txt | 0.0589 | 0.183 | RUC
     me17in_DAIICT_video_run3.txt | 0.0585 | 0.1839 | DAIICT
     me17in_LAPI_video_run5.txt | 0.0571 | 0.1843 | LAPI*
     me17in_DAIICT_video_run5.txt | 0.0571 | 0.1838 | DAIICT
     me17in_LAPI_video_run2.txt | 0.0564 | 0.1819 | LAPI*
     2016 BEST RUN | n/a | 0.1815 |
     Baseline | 0.0564 | 0.1716 |
     me17in_technicolor_video_run3.txt | 0.0563 | 0.1825 | Technicolor*
     me17in_HKBU_video_1.txt | 0.0556 | 0.1813 | HKBU
     me17in_DAIICT_video_run2.txt | 0.0553 | 0.1812 | DAIICT
     me17in_gibis_video_run2-required.txt | 0.053 | 0.1807 | GIBIS
     me17in_IITB_video_run1-required.txt | 0.0525 | 0.1795 | IITB
     me17in_TCNJ-CS_video_run1-required.txt | 0.0524 | 0.1774 | TCNJ-CS
     me17in_DUT-MMSR_video_run5histface.txt | 0.0516 | 0.1791 | DUT-MMSR
     me17in_DUT-MMSR_video_run4.txt | 0.0482 | 0.1783 | DUT-MMSR
     me17in_DUT-MMSR_video_run3.txt | 0.0478 | 0.177 | DUT-MMSR
     me17in_IITB_video_run3-required.txt | 0.0474 | 0.17 | IITB
     me17in_HKBU_video_2.txt | 0.0468 | 0.1761 | HKBU
     me17in_HKBU_video_3.txt | 0.0468 | 0.1761 | HKBU
     me17in_technicolor_video_run2.txt | 0.0465 | 0.1768 | Technicolor*
     me17in_DUT-MMSR_video_run2.txt | 0.0465 | 0.1748 | DUT-MMSR
     me17in_HKBU_video_4.txt | 0.0463 | 0.1742 | HKBU
     me17in_HKBU_video_5.txt | 0.0445 | 0.1746 | HKBU
     me17in_IITB_video_run4-required.txt | 0.0445 | 0.1678 | IITB
     me17in_IITB_video_run2-required.txt | 0.0445 | 0.1675 | IITB
     me17in_DUT-MMSR_video_run1.txt | 0.0443 | 0.1734 | DUT-MMSR
     me17in_gibis_video_run1.txt | 0.0396 | 0.1667 | GIBIS
     2017 best run: MAP@10=0.0827, MAP=0.2094; 2016 best run: MAP=0.1815; Baseline: MAP@10=0.0564, MAP=0.1716
  15. 15. 15 Official results – Video subtask – best runs 9/13/2017 (* = organizers)
     Run | MAP@10 | MAP | Team (official ranking)
     me17in_Eurecom_video_run4.txt | 0.0827 | 0.2094 | Eurecom
     me17in_LAPI_video_run4.txt | 0.0732 | 0.2028 | LAPI*
     me17in_technicolor_video_run4.txt | 0.0641 | 0.1878 | Technicolor*
     me17in_DAIICT_video_run4.txt | 0.064 | 0.1885 | DAIICT
     me17in_RUC_video_run2.txt | 0.0637 | 0.1897 | RUC
     me17in_gibis_video_run5.txt | 0.0628 | 0.183 | GIBIS
     2016 BEST RUN | n/a | 0.1815 |
     Baseline | 0.0564 | 0.1716 |
     me17in_HKBU_video_1.txt | 0.0556 | 0.1813 | HKBU
     me17in_IITB_video_run1-required.txt | 0.0525 | 0.1795 | IITB
     me17in_TCNJ-CS_video_run1-required.txt | 0.0524 | 0.1774 | TCNJ-CS
     me17in_DUT-MMSR_video_run5histface.txt | 0.0516 | 0.1791 | DUT-MMSR
  16. 16. Reconfirmed that image interestingness is NOT video interestingness. Some significant improvement, especially for the image subtask. Dataset quality improved: increased number of iterations/annotations per sample, increased dataset size, longer movie extracts. Image subtask: all teams did better, best MAP@10=0.2105, best MAP=0.4343. Video subtask: 1 team clearly improved, 5 teams improved depending on their runs, best MAP@10=0.1678, best MAP=0.2637. 16 What we have learned about the TASK itself 9/13/2017
  17. 17. This year's trends: DNN as the (last) classification step is not the majority choice; dataset size…; multimodal equals audio+video ONLY (text was used only once); (mostly) no temporal approaches; (mostly) no use of external data; late fusion, dimension reduction; adding semantics/affect to the approaches; genre recognition as a pre-step; aesthetics-related features; movie context (contextual features, textual description). 17 What we have learned about the participants' systems 9/13/2017
  18. 18. This year's trends: DNN as the (last) classification step is not the majority choice; dataset size…; multimodal equals audio+video ONLY (text was used only once); (mostly) no temporal approaches; (mostly) no use of external data; late fusion, dimension reduction; adding semantics/affect to the approaches; genre recognition as a pre-step; aesthetics-related features; movie context (contextual features, textual description). Insights: what works for the images does not work for the videos; monomodal systems (no audio) did as well as multimodal systems; adding semantics/affect/context to the approaches is promising! 18 What we have learned about the participants' systems 9/13/2017
  19. 19. 19 9/13/2017 Thank you!
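
Per-movie output (slide 5). The sketch below illustrates the kind of per-movie output the task asks for: a binary interestingness decision plus a confidence value for every image/segment, with the decision derived here by ranking on confidence. The function name, the 10% cut-off, and the item identifiers are assumptions for illustration only; the official run-file format is specified in the task overview paper.

```python
# Hypothetical illustration of a per-movie output: a binary decision plus a
# confidence value per item. The 10% cut-off is an assumption, not the task's rule.
from typing import Dict, List, Tuple

def label_movie(scores: Dict[str, float], top_fraction: float = 0.1) -> List[Tuple[str, int, float]]:
    """Rank one movie's items by confidence and mark the top fraction as interesting (1) vs. not (0)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n_interesting = max(1, round(top_fraction * len(ranked)))
    return [(item, int(rank < n_interesting), confidence)
            for rank, (item, confidence) in enumerate(ranked)]

if __name__ == "__main__":
    # Hypothetical confidence scores for four shots of one movie.
    demo_scores = {"shot_0": 0.12, "shot_1": 0.87, "shot_2": 0.45, "shot_3": 0.03}
    for item, decision, confidence in label_movie(demo_scores):
        print(item, decision, f"{confidence:.2f}")
```

Ranking by confidence and cutting at a fixed fraction is only one way to obtain the binary decision; the task does not prescribe how participants derive it from their scores.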
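Segment-level C3D pooling (slide 7). Slide 7 mentions that the segment-based descriptors are C3D activations from the fc6 layer averaged over each segment. The snippet below is a minimal sketch of that pooling step, assuming the clip-level activations are already available as a NumPy array; the 4096-dimensional fc6 width is standard for C3D, but the array layout is an assumption.

```python
# Minimal sketch of segment-level pooling: average clip-level C3D fc6
# activations into one descriptor per segment. Shapes are illustrative.
import numpy as np

def average_segment_feature(clip_features: np.ndarray) -> np.ndarray:
    """Average a (num_clips, feature_dim) stack of C3D fc6 vectors into one segment descriptor."""
    if clip_features.ndim != 2:
        raise ValueError("expected a 2-D array of clip-level features")
    return clip_features.mean(axis=0)

if __name__ == "__main__":
    # Assumed layout: 7 clips in one segment, 4096-D fc6 activations each.
    fake_clips = np.random.rand(7, 4096)
    segment_descriptor = average_segment_feature(fake_clips)
    print(segment_descriptor.shape)  # (4096,)
```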
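Pairwise annotations to a ranking (slide 8). Slide 8 states that annotations are collected as pair comparisons and aggregated, in a single step, into a ranking that is then thresholded manually. The organizers' actual aggregation model is described in the overview paper; the toy sketch below only illustrates the general idea with a simple win-rate ranking and should not be read as the official procedure.

```python
# Toy illustration of turning pairwise "A is more interesting than B" judgments
# into a ranking via win rate. The organizers' real aggregation differs.
from collections import defaultdict
from typing import List, Tuple

def rank_by_win_rate(pairs: List[Tuple[str, str]]) -> List[str]:
    """pairs = list of (winner, loser) judgments; returns items sorted by win rate."""
    wins, games = defaultdict(int), defaultdict(int)
    for winner, loser in pairs:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return sorted(games, key=lambda item: wins[item] / games[item], reverse=True)

if __name__ == "__main__":
    judgments = [("shot_1", "shot_0"), ("shot_1", "shot_2"), ("shot_2", "shot_0")]
    print(rank_by_win_rate(judgments))  # ['shot_1', 'shot_2', 'shot_0']
```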
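MAP@10 (slide 10). Slide 10 names Mean Average Precision at 10 (MAP@10), averaged over all movies, as the 2017 official measure. The sketch below computes AP@10 per movie from confidence-ranked ground-truth labels and averages over movies; it follows one common AP@k convention, whereas the official scores were produced with the organizers' evaluation tools, so treat it as illustrative.

```python
# Sketch of MAP@10: average precision over the top-10 ranked items of each
# movie, then the mean over all movies. One common AP@k convention is used.
from typing import Dict, List

def average_precision_at_k(ranked_labels: List[int], k: int = 10) -> float:
    """ranked_labels: ground-truth 0/1 labels ordered by decreasing system confidence."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(ranked_labels[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / i
    denom = min(sum(ranked_labels), k)
    return precision_sum / denom if denom else 0.0

def mean_ap_at_k(per_movie_labels: Dict[str, List[int]], k: int = 10) -> float:
    """MAP@k = mean of AP@k over all movies."""
    return sum(average_precision_at_k(labels, k) for labels in per_movie_labels.values()) / len(per_movie_labels)

if __name__ == "__main__":
    movies = {"movie_a": [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
              "movie_b": [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]}
    print(round(mean_ap_at_k(movies), 4))
```

Because only the ten highest-confidence items of each movie are scored, MAP@10 rewards systems that place genuinely interesting items at the very top of each movie's ranking.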
