Paper: http://ceur-ws.org/Vol-2670/MediaEval_19_paper_12.pdf
Youtube: https://www.youtube.com/watch?v=38syd9aw2Hk
Van-Tu Ninh, Tu-Khiem Le, Duc-Tien Dang-Nguyen and Cathal Gurrin, Replay Detection and Multi-stream Synchronization in CS:GO Game Streams Using Content-based Image Retrieval and Image Signature Matching. Proc. of MediaEval 2019, 27-29 October 2019, Sophia Antipolis, France.
Abstract:
In this paper, the authors propose a data-driven approach to detect replays in CS:GO game videos and to find their sources among 10 players' streams. Specifically, the solution aims to determine the replays that lie between two logo-transition endpoints. For both tasks, the authors extract frames from the videos and apply image processing and retrieval techniques. In detail, they use a Bag of Visual Words model to detect the logo-transition endpoints, which may contain multiple replays in between, then employ an image-signature-matching algorithm for multi-stream synchronization and replay boundary refinement. The two tasks were posed in GameStory: The 2019 Video Game Analytics Challenge to retrieve critical moments in video games, which can be used for highlight summarization and further analysis.
Presented by Van-Tu Ninh
1. Replay Detection and Multi-stream Synchronization in
CS:GO Game Streams Using Content-based Image
Retrieval and Image Signature Matching
Van-Tu Ninh1, Tu-Khiem Le1, Duc-Tien Dang-Nguyen2, Cathal Gurrin1
1 Dublin City University, Ireland
2 University of Bergen, Norway
MediaEval 2019
10th Anniversary Workshop
27-29 October 2019
EURECOM, Sophia Antipolis, France
2. Introduction
● In 2019, the task posed 2 challenges1:
1) Replay detection and multi-stream synchronization
2) Game story summarization (optional)
1 GameStory Task at MediaEval 2019, Mathias Lux, et al. Proceedings of MediaEval 2019
4. Replay detection in broadcasting sports video1
1 Replay detection in broadcasting sports video, Xiaofeng Tong et al. Third International Conference on Image and Graphics (ICIG'04)
5. High-Confidence Near-Duplicate Image Detection1
1 High-Confidence Near-Duplicate Image Detection, Wei Dong et al. Proceedings of the 2nd ACM International Conference on Multimedia Retrieval.
6. Proposed Approach
[Pipeline diagram] The commentator stream goes through frame extraction and filtering, then logo-bounded video retrieval and image hashing; the player streams go through frame extraction and image hashing. An Elasticsearch engine (signature matching) determines the source streams, and the outputs are refined into the final replays.
7. Frame Extraction and Filtering
● We perform frame extraction at fps = 2 for the commentator stream only.
● Filter redundant consecutive frames:
○ Proportion of similar ORB features1: α > 0.6 (i.e., at least 300 of 500 features match).
○ Distance between the two color histograms: β < 0.2.
[Figure: runs of similar consecutive frames in the original sequence are collapsed to a single frame in the filtered sequence]
1 ORB: An efficient alternative to SIFT or SURF, Ethan Rublee et al. ICCV 2011.
9. Image Hashing and Matching
● We inherit the implementation1 of the paper An image signature for any kind of image, Wong et al.2
● We compare the L2-norm distance between the hashed signature of a logo-bounded video's frame and those of the players' streams to find its source.
● We use an Elasticsearch engine for fast similarity search in the large database of source-stream frames.
1 https://github.com/EdjoLabs/image-match
2 An Image Signature for Any Kind of Image, Wong et al. Proc. of the International Conference on Image Processing, 2002
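A minimal sketch of the distance step, assuming the normalized L2 distance used by the linked image-match implementation (signatures are treated as plain vectors here; the library's default near-duplicate threshold is around 0.4):

```python
import numpy as np

def normalized_l2_distance(sig_a, sig_b):
    """Normalized L2 distance between two image signatures:
    ||a - b|| / (||a|| + ||b||), giving 0 for identical signatures
    and values bounded by 1."""
    sig_a = np.asarray(sig_a, dtype=float)
    sig_b = np.asarray(sig_b, dtype=float)
    denom = np.linalg.norm(sig_a) + np.linalg.norm(sig_b)
    if denom == 0:
        return 0.0  # both signatures are all-zero
    return np.linalg.norm(sig_a - sig_b) / denom
```

Normalizing by the sum of the two norms keeps the score comparable across frames with different signature magnitudes.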
10. Replay detection and multi-stream synchronization
● We propose a heuristic to split the replays by combining the outputs of logo-bounded video retrieval and multi-stream synchronization, based on:
○ The synchronized source perspective's stream.
○ The gap between the synchronized source frame indices of two consecutive frames in the logo-bounded video.
○ The estimated duration of the replays.
Logo-bounded video frames → synchronized source streams:
Frame 1 – player 1 – frame 3005
Frame 2 – player 1 – frame 3050
... → Replay 1
Frame 203 – player 5 – frame 5004
Frame 204 – player 5 – frame 5012
... → Replay 2
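The splitting heuristic can be sketched as below. The 3.5 s gap threshold is taken from the talk; the source frame rate (60 fps) and the exact splitting conditions are assumptions for illustration.

```python
def split_replays(matches, source_fps=60, max_gap_seconds=3.5):
    """Split a logo-bounded video into replays.

    `matches` is an ordered list of (player_id, source_frame_index)
    pairs, one per synchronized frame of the logo-bounded video.
    A new replay starts when the source stream changes or the gap
    between consecutive synchronized source indices exceeds the
    threshold (3.5 s, converted to source frames)."""
    if not matches:
        return []
    max_gap = max_gap_seconds * source_fps  # 210 frames at 60 fps
    replays = [[matches[0]]]
    for prev, cur in zip(matches, matches[1:]):
        same_player = cur[0] == prev[0]
        small_gap = abs(cur[1] - prev[1]) <= max_gap
        if same_player and small_gap:
            replays[-1].append(cur)  # continue the current replay
        else:
            replays.append([cur])    # start a new replay
    return replays
```

On the example above, the change from player 1 to player 5 triggers a split, yielding Replay 1 and Replay 2.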
13. Drawbacks of our approach
● Our approach might fail to detect and split replays directly when:
○ A replay is not bounded at both endpoints by logos.
○ A player wanders around one location for too long.
○ There are smoke/flash-grenade scenes.
○ Multi-perspective scenes change too fast.
Hello everyone, my name is Van-Tu Ninh, and today I would like to present our work on the GameStory task.
Our aim is to find the replays in the CS:GO game streams and synchronize them with their source streams.
In 2018, the main task of this challenge was to analyze the given data of CS:GO game streams and summarize the main game story into a short video.
As introduced in the overview talk, the summarization task is optional, so we focus on the first problem: detecting the replays and determining their players' perspective streams.
The content of a replay frame in the commentator stream is the same as that of one of the players' perspective streams P1-P10, with small modifications to enhance the visualization.
However, applying direct local-feature matching to find a replay frame's source is difficult due to the large scale of the data extracted from the player streams.
Moreover, extracted local features such as SIFT are not distinctive enough for direct image matching.
Much work has been done on detecting replays, which are often the highlights of a match, in broadcast sports video.
A typical work, published at the International Conference on Image and Graphics in 2004, gave us the idea for solving the same problem in e-sports videos.
They detect logo transitions, then segment the video based on the logos to determine the replays.
Another related work, which shares a goal with one step of our approach, is finding near-duplicate images in a large-scale database.
As I have mentioned, a replay frame and its source frame have similar content but differ slightly in the visualization of other information such as time, round, player information, kills and deaths.
In general, our approach has three main steps: logo-bounded video retrieval, image hashing and matching, and combining the outputs to generate the final results.
In the first step, we eliminate redundant frames extracted from the commentator stream.
We compare the proportion of similar ORB features and the color histograms of two nearby frames, then apply thresholds for filtering.
The filtered frames are then fed into a Bag of Visual Words model to create a visual-word dictionary and BoVW vectors.
A logo frame obtained from the training data is then used to retrieve the frames containing the Intel Extreme Masters logo.
Each frame is horizontally center-cropped so that the retrieval model matches only images that contain the logo at the correct position.
At the end of this step, we obtain videos bounded by two logo endpoints, each of which may contain one or multiple replays.
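The horizontal center-crop mentioned here can be sketched as follows; `keep_ratio` is an assumed parameter, since the talk does not state the exact crop width.

```python
import numpy as np

def center_crop_horizontal(frame, keep_ratio=0.5):
    """Keep only the horizontally centered band of a frame
    (H x W x C array), so retrieval focuses on the logo region
    in the middle of the image."""
    h, w = frame.shape[:2]
    keep_w = int(w * keep_ratio)
    start = (w - keep_w) // 2
    return frame[:, start:start + keep_w]
```

Cropping before feature extraction discards the side regions, whose HUD overlays would otherwise contribute spurious visual words.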
After this step, we transform the images in both the logo-bounded videos and the source streams into hashes and use an Elasticsearch engine to find the most near-duplicate image of each replay frame in the large-scale database.
Hence, after this step, we obtain a list of near-duplicate frame indices in the source streams corresponding to the replay frames.
Finally, we combine the outputs of our two main steps.
Based on the gap between the synchronized source frame indices of two consecutive frames in a logo-bounded video (3.5 s), we determine the split points of the replays in the logo-bounded videos.
The constraint here is that a gap cannot be too large while the corresponding replay duration is too short.
Our best configuration achieves the second-highest precision and F1 scores. The precision is 73.17%, while the average overlap between the predicted source streams for the replays and the ground truth is 63.89. This shows that our heuristic for finding the source stream works quite well on the correctly detected replays.
For the Jaccard index threshold of 0.75, our score decreases significantly due to some wrong split-point identifications, which reduce the length of our predicted replays. However, for the correctly predicted replays, we manage to find their true source-stream parts, resulting in an average overlap score of 70.31.