Supervised sequential pattern mining for identifying important patterns of play in rugby
1. Supervised sequential pattern mining for identifying
important patterns of play in rugby
Rory Bunker¹, Keisuke Fujii¹, Hiroyuki Hanada², Ichiro Takeuchi²,³
¹ Graduate School of Informatics (Sport Behavior Group, Takeda Lab), Nagoya University, Nagoya, Aichi, Japan
² RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
³ Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, Japan
8th MathSport International Conference, June 24-25, 2021 (online)
2. Standard performance analysis approach
● Match video is tagged in video analysis systems (e.g., SportsCode, Dartfish), either by
performance analysts or, at the amateur level, by coaches.
● Tagged video is then exported to event log (XML) files for a particular match.
● Event log files are converted into performance indicators (averages, ratios and frequencies,
e.g., average tackles by player, line breaks made per ball carry, penalties conceded in match)...
...but then the (potentially valuable) information contained in the order of events is lost.
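A minimal sketch of why frequency-based indicators lose order information. The event names and the two passages are hypothetical, chosen only for illustration: both passages produce identical event counts even though the events occur in a different order.

```python
from collections import Counter

# Two hypothetical passages of play containing the same events in a different order.
passage_a = ["lineout", "phase", "line_break", "phase"]
passage_b = ["phase", "phase", "lineout", "line_break"]

# Frequency-style indicators only see how often each event occurs.
counts_a = Counter(passage_a)
counts_b = Counter(passage_b)

print(counts_a == counts_b)    # True: the count-based indicators cannot tell them apart
print(passage_a == passage_b)  # False: the ordered sequences differ
```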
3. Approach in this study
● Convert match event logs into labeled passage of play event sequences
● Apply a supervised sequential pattern mining (SPM) method to identify patterns of play
(sub-sequences) that discriminate between different outcomes, e.g.,
scoring/not scoring, conceding/not conceding.
(We refer to “passages of play” and “sequences” interchangeably, and “patterns of play” and
“sub-sequences” interchangeably)
4. Research analyzing sequences in rugby
● Some prior studies have analyzed the duration and location of movements of play,
and how these relate to success (e.g., at the 1995 and 2003 RWC tournaments - van
Rooyen & Noakes, 2006, Intl. Journal of Perf. Analysis in Sport; Carter & Potter, 2001,
Notational Analysis of Sport III).
However, duration and location consider neither the actual events that occur nor the
order in which these events occur.
● There has been increasing interest recently in the analysis of sequences of play in
rugby using advanced analytical methods, e.g., using
○ Multivariate statistics: Coughlan et al. (2019, Intl. Journal of Perf. Analysis in
Sport) employed K-modes clustering to identify sequences leading to tries
being scored.
○ Deep learning: Watson et al. (2020, Journal of the Operational Research Society)
used convolutional/recurrent neural networks to predict outcomes of
sequences of play based on the order and locations of events.
○ Our study adds to this line of research but from a data mining/pattern mining
perspective.
5. Decomposing seasons into matches, sequences & events
[Diagram: a season is decomposed into matches, matches into passages of play (sequences),
and passages of play into events; patterns of play (sub-sequences) are found within passages of play]
[Rules to delimit (split) matches into passages of play need to be specified]
[Event log files are generally at the match level, so to perform any season-level analysis,
they must be combined in some way]
6. Supervised sequential pattern mining
● Invasion sports have many events/patterns that occur frequently and repeatedly.
● But coaches are more interested in events/patterns that are important (e.g., standard
passes vs. shots on target in soccer - Decroos, Van Haaren & Davis, 2018, SIGKDD).
● One solution to this is to employ supervised learning methods.
● Sequential pattern mining (SPM) involves discovering frequent sub-sequences as patterns
from databases that consist of ordered event sequences, which may or may not have
strict notions of time (Mabroukeh & Ezeife, 2010, ACM Computing Surveys).
● Supervised SPM is applied to labeled sequences, and can identify subsequences (patterns)
that discriminate between labels (outcomes).
E.g., pattern of play/sub-sequence [A, B, C, B, C] is identified as discriminating between the positive and
negative outcome in this case.
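A small sketch of what "containing a pattern" can mean for labeled sequences. It assumes the common SPM convention that a sub-sequence must preserve order but may have gaps between its items; the toy sequences and the helper name `contains_pattern` are hypothetical.

```python
def contains_pattern(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until the item is found,
    # so each pattern item must appear after the previous one.
    return all(item in remaining for item in pattern)


# Hypothetical labeled sequences: +1 = positive outcome, -1 = negative outcome.
labeled = [
    (["A", "B", "C", "B", "C", "D"], +1),
    (["A", "D", "B", "C", "B", "C"], +1),
    (["A", "C", "B", "D"], -1),
    (["B", "A", "C", "D"], -1),
]
pattern = ["A", "B", "C", "B", "C"]

for sequence, label in labeled:
    print(label, contains_pattern(sequence, pattern))
```

In this toy data the pattern appears in both positive sequences and in neither negative one, which is the kind of contrast a supervised method looks for.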
7. Safe Pattern Pruning
The SPP method (Nakagawa et al., 2016, SIGKDD; Sakuma et al., 2019, Advanced
Robotics) takes as input a set of labeled sequences (passages of play): $\{(g_i, y_i)\}_{i \in [n]}$,
where $g_i$ is the $i$th passage of play, $n$ is the number of passages of play in the dataset,
and $[n] = \{1, \ldots, n\}$.
Each passage of play is assigned a label $y_i \in \{+1, -1\}$.
SPP constructs a sparse linear combination
$$f(g_i) = \sum_{q_j \in Q} w_j \, I(q_j \sqsubseteq g_i),$$
where $I(q_j \sqsubseteq g_i) = 1$ if sub-sequence $q_j$ is contained in passage of play $g_i$, and 0 otherwise.
$Q$, the set of all possible sub-sequences, is in general very large, so the SPP method includes
some pruning mechanisms to reduce computational complexity (see Nakagawa et al., 2016 &
Sakuma et al., 2019 for details).
8. Optimization problem
Linear model parameters are estimated by solving the minimization problem:
$$\min_{w} \sum_{i \in [n]} \ell(y_i, w^\top x_i) + \lambda \|w\|_1$$
where $w$ is a vector of weights, $\lambda$ is a regularization parameter
that can be tuned by cross-validation, and $\ell$ is a loss function.
The magnitudes of the weights obtained from solving this problem give us an indication
of how discriminative each pattern is (w=0 means not discriminative).
The regularization means that many of the weights will be zero in the optimal solution.
9. Optimization problem (cont’d)
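where the feature vector is $x_i = (x_{i1}, \ldots, x_{i|Q|})^\top$ with $x_{ij} \in \{0, 1\}$.
In other words, the feature vector is a vector of binary variables: if pattern of play $q_j$
appears in a specific passage of play/sequence $g_i$, then $x_{ij}$ takes the value 1;
otherwise it takes the value 0.
In a two-class problem like ours, the squared-hinge loss function is used:
$$\ell(y_i, w^\top x_i) = \max(0, 1 - y_i w^\top x_i)^2$$
In which case the minimization problem becomes
$$\min_{w} \sum_{i \in [n]} \max(0, 1 - y_i w^\top x_i)^2 + \lambda \|w\|_1$$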
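The binary features and the squared-hinge objective can be sketched as follows. This is only an illustration under stated assumptions: the penalty is taken to be an L1 norm on the weights, no bias term is used, sub-sequences may contain gaps, and the candidate patterns are listed by hand. The actual SPP method searches the full pattern space with pruning, which is not implemented here. All names and data are hypothetical.

```python
def contains_pattern(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed; an assumption)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def feature_vector(sequence, patterns):
    """Binary features: x_ij = 1 if pattern q_j appears in sequence g_i, else 0."""
    return [1 if contains_pattern(sequence, q) else 0 for q in patterns]


def objective(weights, X, y, lam):
    """Squared-hinge loss summed over sequences plus an (assumed) L1 penalty."""
    loss = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(w * x for w, x in zip(weights, x_i))
        loss += max(0.0, 1.0 - margin) ** 2
    return loss + lam * sum(abs(w) for w in weights)


# Hypothetical candidate patterns and labeled passages of play.
patterns = [["line_break"], ["phase", "breakdown"]]
sequences = [["lineout", "phase", "line_break"],
             ["scrum", "phase", "breakdown", "phase"]]
y = [+1, -1]

X = [feature_vector(g, patterns) for g in sequences]
print(X)
print(objective([1.0, -1.0], X, y, lam=0.1))
print(objective([0.0, 0.0], X, y, lam=0.1))
```

With all weights at zero, every sequence incurs a loss of 1; weights that separate the two toy sequences leave only the penalty term.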
10. Data
● 490 sequences of play across all of the team’s matches
in the 2018 Japan Top League Season were obtained
from video tagged by the team’s performance analyst.
(the team itself isn’t named for confidentiality reasons)
● Each match sequence was made up of 24 unique event types
(12 for the team and 12 for the opposition teams).
● The match sequences were split into passages of play so
that season-level analysis could be conducted.
11. Approach flow
● Match event logs were split into
passage of play sequences based
on specified rules: start a new
sequence whenever a scrum (except
a scrum restart), lineout, kick
restart, try or kick at goal occurs.
● At this stage, the sequences were
unlabelled and appended to
obtain the sequences for all
passages of play across the
season.
● Then, the scoring events (try
scored, kick at goal attempted)
were extracted from the
sequences and attached as labels
(then removed from the
sequence).
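The splitting and labeling steps above might look roughly like the sketch below. It is a simplified reading, not the study's implementation: event names are hypothetical, only one team's perspective is modeled, restart events are assumed to open a new passage, and a scoring event is assumed to close the current passage and become its +1 label.

```python
# Hypothetical event names; the real event log vocabulary is not given here.
RESTART_EVENTS = {"scrum", "lineout", "kick_restart"}  # assumed to open a new passage
SCORING_EVENTS = {"try", "kick_at_goal"}               # assumed to close and label a passage


def split_and_label(match_events):
    """Split one match's ordered events into (sequence, label) passages of play."""
    passages = []
    current = []
    for event in match_events:
        if event in SCORING_EVENTS:
            # The scoring event becomes the label and is not kept in the sequence.
            if current:
                passages.append((current, +1))
            current = []
        elif event in RESTART_EVENTS:
            # A restart begins a new passage; the unfinished one is non-scoring.
            if current:
                passages.append((current, -1))
            current = [event]
        else:
            # Any other event (including a "scrum_restart") extends the passage.
            current.append(event)
    if current:
        passages.append((current, -1))
    return passages


match_log = ["kick_restart", "phase", "breakdown", "lineout",
             "phase", "line_break", "try", "kick_restart", "phase"]
for sequence, label in split_and_label(match_log):
    print(label, sequence)
```

Running the same function over every match and concatenating the results would give the season-level set of passages.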
12. Approach flow
We then had two datasets:
1) scoring: from the team's scoring perspective
(label = +1 if the team scored, -1 if the team didn't score)
2) conceding: from the opposition teams' scoring perspective,
i.e., the team's conceding perspective
(label = +1 if the opposition team scored, -1 if the opposition team didn't score)
SPP was applied to both, and patterns and corresponding weights were
generated.
14. Approach flow
● For a fair comparison with the
supervised SPP method, the
unsupervised SPM methods were
assumed to have knowledge of the
sequence label, so they were applied
only to the subsets of sequences in
which the team or opposition team
scored (scoring+1 and conceding+1,
used as unlabeled datasets).
● The 5 most discriminative (highest
weight) patterns obtained by SPP
(with support > 5) were compared
with the most frequent patterns
obtained by the unsupervised SPM
methods (PrefixSpan, GSP, Fast,
CM-SPADE and CM-SPAM)
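The selection of the most discriminative patterns can be sketched as a filter on support followed by a sort on weight. The function name, pattern names, weights and supports below are made up for illustration and are not results from the study.

```python
def top_patterns(weights, supports, k=5, min_support=5):
    """Return the k highest-weight patterns whose support exceeds min_support."""
    eligible = [(p, w) for p, w in weights.items() if supports[p] > min_support]
    return sorted(eligible, key=lambda item: item[1], reverse=True)[:k]


# Made-up weights and supports; support = number of sequences containing the pattern.
weights = {"pattern_a": 0.9, "pattern_b": 0.4, "pattern_c": 1.2, "pattern_d": 0.1}
supports = {"pattern_a": 12, "pattern_b": 30, "pattern_c": 3, "pattern_d": 8}

print(top_patterns(weights, supports, k=2))
```

Here `pattern_c` has the largest weight but is excluded because it appears in too few sequences.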
15. Results: unsupervised SPM methods
● The unsupervised methods only extracted patterns consisting of phases and/or
breakdowns, which are frequent and repetitive but uninformative for
coaches/analysts (the patterns were also short in general).
16. Results: supervised SPM method (SPP)
● SPP-obtained patterns were both more sophisticated (greater variety of events) and more relevant for
coaches/analysts than those from the unsupervised methods. They could be useful to
coaches/performance analysts for own- and opposition-team analysis, to identify opportunities, and to
devise tactical strategies.
● The odds ratio aids interpretation. For pattern S1, the odds ratio (OR) is exp(0.919) ≈ 2.51, meaning
that the team's odds of scoring are about 2.5 times higher when a line break occurs in a particular
sequence of play than when a line break is not made in that sequence of play.
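As a quick check of the arithmetic, assuming the S1 weight of 0.919 is read as a log odds ratio, the conversion is a single exponential:

```python
import math

weight_s1 = 0.919                 # weight reported for pattern S1
odds_ratio = math.exp(weight_s1)  # odds ratio = exp(weight)

print(round(odds_ratio, 2))       # roughly 2.5
```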
[Table: the five SPP-obtained patterns for the scoring dataset (S1-S5) and for the conceding dataset (C1-C5)]
17. Results: SPP-obtained pattern interpretation
● The insights would be practically useful to both the team as well as opposition teams that
are due to play the team.
○ For both the team and their opposition teams: line breaks were most associated
with scoring (S1,C1), and lineouts were found to be more associated with the creation
of scoring opportunities than scrums (S2,C3).
○ For opposition teams: create lineouts/prioritise them over scrums (C3), maintain
possession with repeated phase-breakdown play (C4,C5), shut down the team's ability
to regain their kicks in play (S3), and ensure touch is found on exit plays from kick
restarts made by the team (S5).
18. Conclusions & Future work
● The supervised SPM approach appears to be useful as an analytical
framework for performance analysis in sport.
● To the best of our knowledge, this study is the first to apply supervised SPM to
physical sport, to compare unsupervised SPM and supervised SPM in sport,
and to apply any form of SPM to analyzing sequences of play in rugby.
● Future work: applying the SPP method to more data in rugby and to other sports.
● Future work: considering the order of passage of play sequences within the match (as well as
the order of events within sequences).
Acknowledgements: MEXT KAKENHI (20H00601, 16H06538), JST CREST (JPMJCR1502), JST
PRESTO (JPMJPR20CA), RIKEN Center for Advanced Intelligence Project