Chance detection in football broadcasts
1. Chance detection in football broadcasts
Feature extraction and classification in football
streams using vision and deep learning
Auke vanderSchaar – Stratagem Technologies – London Machine Learning Meetup – 11 December 2017
2. Chance detection - Stratagem Technologies 2
Stratagem Technologies
● Stratagem Technologies is a machine learning financial technology
company focused on sports betting
– Sports prediction as an alternative financial asset class
● Predictive modeling requires historical data
● Trading based on in-play predictions requires online (real-time) data
● Fortunately, sport broadcast videos are ubiquitous and a rich source
of both historical and real-time information at a reasonable cost
● Challenge: how to exploit this information to improve trading?
– Chance detection (Focus is on detection, not on anticipating chances!)
3. Chance detection - Stratagem Technologies 3
Why chance detection?
● Chance: shot (attempt) on goal
● Chances occur more frequently than goals
– They are therefore a more meaningful statistic and can signify momentum in a football game.
● Human analysts annotate chances in football matches in more than
20 leagues.
– A football match has approx. 20 chances
– Chances are divided into 6 categories, from “poor” to “superb”
– Goal conversion rate for each category is known (next slide)
– Annotations are used as input to prediction models for trading
– However, they are not precise enough for training a computer vision model
● Precision of tagging is approx. 1 second => very poor
4. Chance detection - Stratagem Technologies 4
Chance types
What is the conversion rate of each chance type to a goal?
● Known and exploited for trading
● E.g. for ‘superb’ (unmissable) chances such as clear open nets, the rate is ~0.8 (8 out of 10 superb chances result in a goal)
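As an illustration of how per-category conversion rates could feed a prediction, here is a minimal sketch that turns chance counts into an expected number of goals. The function name and the dictionary layout are assumptions made for this example; only the ~0.8 rate for ‘superb’ chances comes from the slide, and the other categories would need their own measured rates.

```python
def expected_goals(chance_counts, conversion_rates):
    """Sum count * conversion rate over all chance categories.

    Both arguments are hypothetical dicts keyed by category name.
    """
    return sum(count * conversion_rates[category]
               for category, count in chance_counts.items())


# Only the 'superb' rate (~0.8) is taken from the slide.
rates = {"superb": 0.8}
print(expected_goals({"superb": 5}, rates))
```

A category missing from `conversion_rates` raises a `KeyError`, which is deliberate here: an unknown rate should not silently count as zero.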
5. Chance detection - Stratagem Technologies 5
Detecting Chances – Challenges for Computer Vision
● Small Field of View (FOV), short scenes, replays and close-ups prevent building up a
consistent view over time and space.
● Unbalanced data
– A chance lasts approx. 1-2 seconds, so only about 1% of a game is part of a chance.
– One season of one league (approx. 300 games) has only 6000 chances but results in 450 hours (40M frames) of video.
● Difference between “a chance” and “not a chance” is subtle
– Touched by an attacking player or not makes the difference.
● Noisy labels
– Sometimes a replay is tagged
– Some chances are missed
– Chances close to each other are collapsed into one
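The imbalance figures can be checked with a few lines of arithmetic. This sketch assumes 25 frames per second and 1.5 hours of footage per game; neither value is stated on this slide, but together they reproduce the quoted 450 hours and roughly 40M frames.

```python
FPS = 25               # assumed broadcast frame rate
GAMES = 300            # one season of one league (from the slide)
HOURS_PER_GAME = 1.5   # assumed: 90 minutes of footage per game
CHANCES = 6000         # chances per season (from the slide)

total_hours = GAMES * HOURS_PER_GAME
total_frames = int(total_hours * 3600 * FPS)


def positive_fraction(chance_seconds):
    """Fraction of all frames that fall inside a chance of the given length."""
    return CHANCES * chance_seconds * FPS / total_frames


print(total_hours, total_frames)
print(positive_fraction(1.0), positive_fraction(2.0))
```

Under these assumptions the positive class covers roughly 0.4% to 0.7% of all frames, somewhat below the 1% quoted as a round figure.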
8. Chance detection - Stratagem Technologies 8
Datasets
The dataset grows by approx 300 matches (450 hours) per week!
– 600 fixtures accurately annotated, used for training and validation/testing (and
increasing). Images from one fixture can only be in one (training/test) set.
– 40 fixtures form a holdout set. These are never used for training or validation by any of the systems.
– 1100 fixtures from last season from four major leagues processed for downstream
evaluation
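The rule that images from one fixture may only appear in one set can be enforced by splitting on the fixture ID rather than on individual frames. The sketch below is one possible way to do that and is not the presenters' implementation; the hashing scheme, function names and the 20% test fraction are all assumptions.

```python
import hashlib


def assign_split(fixture_id, test_fraction=0.2):
    """Deterministically map a fixture ID to 'train' or 'test'."""
    digest = hashlib.sha256(str(fixture_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_frames(frames, test_fraction=0.2):
    """Split (fixture_id, frame_id) pairs so a fixture never spans both sets."""
    splits = {"train": [], "test": []}
    for fixture_id, frame_id in frames:
        split = assign_split(fixture_id, test_fraction)
        splits[split].append((fixture_id, frame_id))
    return splits


frames = [(fixture, frame) for fixture in range(50) for frame in range(3)]
splits = split_frames(frames)
print(len(splits["train"]), len(splits["test"]))
```

Because the assignment depends only on the fixture ID, newly added fixtures never move existing ones between sets, which suits a dataset that keeps growing.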
9. Chance detection - Stratagem Technologies 9
Unique challenges
● Not only is the data unbalanced, the difference between positive and negative samples is also small
– A limited number of positive samples
● The mapping of the ground truth (annotations) to video frames is not 1:1
– Complicates a data driven approach but rule based approaches are brittle
● Thousands of hours of video limit the amount of processing possible and rule out more sophisticated methods
– Any promising method must be evaluated on numerous videos.
● Only then will the impact on precision become clear
→ exploit the video!
● Raise the signal from the noise
10. Chance detection - Stratagem Technologies 10
What is the aim of the Computer Vision
systems?
Each system takes a video as input and produces as output a list of chances, each with the side of the field (attacking team) and the game time at which it occurred
3 alternatives: why?
● Football vision
– Object detection and conversion from camera view to bird's-eye view
– A high-level feature extractor that can be used as input to an ML algorithm trained to detect chances
● ConvNet feature extractor
– Off-the-shelf, state-of-the-art single-model deep neural network (CNN) with the classification layer removed.
– Train an ML algorithm using the features of the extractor as input
● ConvNet End to End
– Neural net trained end-to-end to do chance detection
11. Chance detection - Stratagem Technologies 11
System requirements
Processing (large number of) videos and training models is very
resource intensive
– Best fast method (best speed/quality trade off)
– A system should process approx 20 to 30 fixtures per day
Must generalize to all leagues
• Robust to different viewing conditions
13. Chance detection - Stratagem Technologies 13
Football vision
Performs object detection and converts the position of each object
(player) to a 2D field position
14. Chance detection - Stratagem Technologies 14
Vision – object detection
● R-CNN [1]
– Two stage detector
– Region proposals: more precise but slower
– Quoted: ~5 fps @ 600x400
● Football vision Python
– Detector for players based on ResNet50
– Detector for ball and goalpost corner based on VGG16
– 1 fps
● SSD [2]
– Single-stage detector
– Quoted: ~ 45 fps @ 300x300
– 600x600 @ 15 fps
● Ball is small
– Faster but less precise
● Football vision C++
– VGG16
– 15 fps
[1] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
[2] SSD: Single Shot MultiBox Detector
[3] Speed/accuracy trade-offs for modern convolutional object detectors
15. Chance detection - Stratagem Technologies 15
Vision - Homography
● Two views of the same 2D plane (e.g. a football field) from different viewpoints are related by a homographic transform.
● The transform matrix H can be calculated by finding at least four corresponding
points in the two views.
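To make the four-point statement concrete, here is a minimal pure-Python sketch that solves for H from exactly four correspondences by fixing the bottom-right entry of H to 1 and solving the resulting 8x8 linear system. The function names and the example coordinates (a trapezoid in the image mapped to a 105 x 68 rectangle) are assumptions for illustration. In practice a library routine such as OpenCV's `cv2.findHomography` would normally be used instead.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def find_homography(src, dst):
    """Estimate the 3x3 matrix H mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with H[2][2] fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2D point through H, including the perspective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical example: image-space trapezoid -> field rectangle in metres.
image_pts = [(100, 400), (540, 400), (420, 200), (220, 200)]
field_pts = [(0, 0), (105, 0), (105, 68), (0, 68)]
H = find_homography(image_pts, field_pts)
print(apply_homography(H, (320, 300)))
```

Fixing the last entry to 1 keeps the sketch short but fails when that entry is truly zero. With more than four points, a least-squares or RANSAC-based estimate is the usual choice.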
[1] Multiple View Geometry in Computer Vision, Hartley and Zisserman
[2] OpenCV tutorial
[3] Chess board image credit: 2D projective transformations (homographies), Christiano Gava
16. Chance detection - Stratagem Technologies 16
Vision – Homography – pitch line detection
● With a broadcast there is only one camera view (monocular)
– It follows the action by panning and zooming, and sometimes switches to a different viewpoint.
– Pitch lines can serve as reference points
– Use pitch line detection to find key markers in the camera view
– The position of the pitch lines is known in the 2D field (bird's-eye) view
● There exists no ground-truth dataset for pitch line detection
– This excludes an ML (data-driven) approach.
● Use rule-based pitch line detection
– Which can be used to generate data for training a neural net (experimental)
21. Chance detection - Stratagem Technologies 21
Vision – Homography – DL (experimental)
● Deep Image Homography [1]
– A neural net can learn the relative homographic parameters given two images related by
a homography
● Can it also learn when only one image is given together with the
homographic parameters?
– The transform parameters are produced by “football vision”.
[1] Deep Image Homography Estimation (arXiv)
22. Chance detection - Stratagem Technologies 22
Vision – Homography – DL - visualization
● What does the DNN use to find the homographic transform parameters?
– Visualize with Grad-CAM [2]
[2] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
24. Chance detection - Stratagem Technologies 24
Why CNN?
● Data driven
● Fast (faster than real-time, framerate > 25 fps)
● Development is also faster
● Training scales easily to a large number of fixtures
● Simple from a system point of view
● Can combine multiple detectors
25. Chance detection - Stratagem Technologies 25
Deep Learning - CNN feature extractor
● Take a state-of-the-art single model CNN architecture, with weights
trained on ImageNet (ILSVRC), remove the last layer and use it as a
feature extractor.
– InceptionV3 (GoogLeNet) @ 299x299
– Features learned for classification on ImageNet are useful for other related domains.
[1] ILSVRC: ImageNet Large Scale Visual Recognition Challenge
[2] Inception: Rethinking the Inception Architecture for Computer Vision
[3] Feature extraction: CNN Features off-the-shelf: an Astounding Baseline for Recognition
26. Chance detection - Stratagem Technologies 26
Deep Learning - CNN feature extractor – cont
– Feature extraction takes approx. 4 hours per football game
● Results in 500MB of compressed data per football match
– Once the features are extracted, training and classification are relatively fast.
● Facilitates experimentation
● Single-frame classifier
● Multi-frame classifier: combines multiple frames (order not relevant) per prediction.
● Sequence classifier (frame order relevant)
– The features of the full training set (400 fixtures) take up a lot of memory when loaded together.
● PCA reduces the accuracy
● Averaging features over frames seems to have little impact
– Aggregate features per second (25 frames)
● The large number of samples excludes many ML algorithms
– LDA performed best.
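The per-second aggregation can be sketched as a plain average over blocks of 25 consecutive frame feature vectors. This is an assumed reading of "aggregate features per second"; the function name and the choice to drop a trailing partial second are illustrative, not taken from the talk.

```python
def aggregate_per_second(frame_features, fps=25):
    """Average each block of `fps` consecutive feature vectors into one vector.

    frame_features is a list of equal-length lists of floats, one per frame.
    A trailing block shorter than `fps` frames is dropped.
    """
    aggregated = []
    for start in range(0, len(frame_features) - fps + 1, fps):
        block = frame_features[start:start + fps]
        dim = len(block[0])
        aggregated.append(
            [sum(vec[i] for vec in block) / fps for i in range(dim)]
        )
    return aggregated


# Two full seconds of toy 2-dimensional features plus three leftover frames.
features = [[1.0, 2.0]] * 25 + [[3.0, 4.0]] * 25 + [[9.0, 9.0]] * 3
print(aggregate_per_second(features))
```

Averaging cuts the number of samples by a factor of 25, which is one way to keep the full training set small enough to load at once.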
29. Chance detection - Stratagem Technologies 29
CNN end-to-end
● Use a state-of-the-art single-model CNN (VGG, ResNet, InceptionVx), remove the classification layer and add your own classifier layer on top
Allows choice in
– parameters (resolution, image style (flow), …)
– number of targets
● left/right detector
– finetuning
– Domain specific visualization
but often requires training from scratch
30. Chance detection - Stratagem Technologies 30
CNN end-to-end
● Required number of pictures for training
– 400 (fixtures) * 20 (chances) * 4-second window * 2 images/second * 2 (label 0/1) = ~50-200K
● Hyper-parameter search space is large
– Early pruning
● Training and evaluation take time, so early feedback is required to make progress and to avoid wasting a scarce resource
● Start with a small number of images
– All models are evaluated against a fixed set of images while training is ongoing.
– Only the top performers are allowed to the next stage (while optionally keeping the trained weights)
– Likewise, while training is ongoing, the top performers are evaluated against the ground truth (human analyst annotations).
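The staged search described above resembles successive-halving-style pruning: train every candidate on a small image budget, keep only the best, then repeat with a larger budget. The sketch below is an assumed formulation rather than the presenters' code; `train_and_score`, the stage sizes and the keep fraction are hypothetical placeholders.

```python
def staged_pruning(candidates, train_and_score, stage_sizes, keep_fraction=0.5):
    """Keep only the top-scoring configurations after each training stage.

    train_and_score(config, n_images, state) must return (score, new_state),
    where `state` carries trained weights from the previous stage, or None.
    """
    survivors = [(config, None) for config in candidates]
    for n_images in stage_sizes:
        scored = []
        for config, state in survivors:
            score, new_state = train_and_score(config, n_images, state)
            scored.append((score, config, new_state))
        scored.sort(key=lambda item: item[0], reverse=True)
        n_keep = max(1, int(len(scored) * keep_fraction))
        survivors = [(config, state) for _, config, state in scored[:n_keep]]
    return [config for config, _ in survivors]


def toy_train_and_score(config, n_images, state):
    """Stand-in for real training: score is closeness to a made-up best value."""
    return -abs(config["lr"] - 0.01), state


configs = [{"lr": lr} for lr in (0.1, 0.01, 0.001, 0.0001)]
print(staged_pruning(configs, toy_train_and_score, stage_sizes=[1000, 5000]))
```

Passing `state` through the stages corresponds to optionally keeping the trained weights of the survivors.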
32. Chance detection - Stratagem Technologies 32
Metric – Precision Recall F1-score
● Performance of minority class
● Precision: fraction of all detected positives (predicted chances) that are true positives
● Recall: fraction of all actual positives that are detected
● F1 score: harmonic mean of precision and recall
● Not averaged per fixture but calculated over all detections from all fixtures pooled together
● Example: a fixture with 20 chances; the chance detector detects 30 chances, of which 10 are correct
● Precision = 10/30 ≈ 0.33
● Recall = 10/20 = 0.5
● F1 = 2 * (1/3 * 1/2) / (1/3 + 1/2) = 0.4
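A minimal sketch of the pooled calculation, assuming each fixture is summarised as a hypothetical tuple of (actual chances, detected chances, correct detections). The counts are summed over all fixtures before the ratios are taken, instead of averaging per-fixture scores.

```python
def pooled_scores(fixtures):
    """Return (precision, recall, f1) over counts pooled from all fixtures.

    fixtures is a list of (n_actual, n_detected, n_correct) tuples.
    """
    n_actual = sum(f[0] for f in fixtures)
    n_detected = sum(f[1] for f in fixtures)
    n_correct = sum(f[2] for f in fixtures)
    precision = n_correct / n_detected if n_detected else 0.0
    recall = n_correct / n_actual if n_actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The single-fixture example: 20 chances, 30 detections, 10 of them correct.
print(pooled_scores([(20, 30, 10)]))
```

For the single-fixture example this gives precision 1/3, recall 1/2 and, using the unrounded precision, an F1 of 0.4.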
34. Chance detection - Stratagem Technologies 34
Results
● Noisy labels
– Train with the largest number of images, for the longest time, with a decaying learning-rate schedule and SGD + momentum
● Promising results are achieved with ResNet50 @ 600x400 using ball positions as an extra regression target
● Ball positions are generated by football vision
● Only 200 fixtures with ball positions are available
35. Chance detection - Stratagem Technologies 35
Discussion
● There is now a chance detector with high recall which detects chances or chance-like situations reasonably precisely in time.
– A chance is not a (long) sequence
– A chance is a very short event in which the attacking player purposely pushes the ball towards the goal
● With this detector we can thus create a dataset with only (known) chances and chance-like situations and refine it further
36. Chance detection - Stratagem Technologies 36
Conclusion
● Classical CV methods produce very general, high level, easy to
interpret features that can be used as input to many different types of
ML models
– Not a good chance detector!
– Very useful to generate labels which are used to improve the CNN
● The CNN feature extractor is flexible, facilitates experimentation and initially has the upper hand.
● CNN end-to-end results in the best (sharpest), fastest and, from a system POV, simplest classifier
● Chance detection results in a useful signal for trading