Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application for E-Sport analytics.
1. Finding duplicate labels in behavioral data
An application for E-Sport analytics
Mehdi Kaytoue
2016, Ekaterinburg, Russia
7. A story of storks...
My hometown My teddy-bear Ekaterinburg
M. Kaytoue (INSA de Lyon, LIRIS) Video gaming and Digital signatures AIST 2016 2 / 59
3. Short bio
2011 – Ph. D. from Université de Lorraine, Nancy, France: Mining
numerical data with formal concept analysis, with Amedeo Napoli and
a strong collaboration with Sergei O. Kuznetsov.
2011 – Post-doc in Belo Horizonte (Brazil) with Wagner Meira Jr.
2012 – Assistant professor at INSA Lyon, in the data mining and
machine learning team led (at the time) by Jean-François Boulicaut (now
by Céline Robardet).
4. What I could have been talking about
Constrained pattern mining
A database, e.g. transaction database
A fixed pattern shape, e.g. itemsets
A search space of all possible patterns (generally a lattice)
Several constraints, e.g. min. frequency
Goal: complete, correct (and non-redundant) extraction of the patterns
satisfying the constraints
5. What I could have been talking about
Constrained pattern mining
Numerical data, sequential data, graph data, augmented graphs, ...
Family of constraints, bounds
Discriminant patterns
Formal and generic frameworks, e.g. Formal Concept Analysis
Generic algorithms and pattern domains that can be applied in many
application domains
6. Patterns in dynamic attributed graphs
• Triggering patterns: attribute variations
can impact the topology of the graph
< {a+, b+}, {c-},{deg+} >
7. Supervised descriptive
rule discovery
description —> class label(s)
Languages with different expressivity
Heuristic approaches (beam search)
Subgroup discovery:
stat. distribution of classes
Redescription mining:
Jaccard index between the supports
Pareto frontiers:
when several measures
9. Journalism
• Mr. Y says: unemployment
decreases!
• He is not wrong but…
Politicians are experts at
stating facts that are true in a
favorable context
• A context = a pattern!
the goal is to re-
contextualize the fact
automatically
[Figure: term of office of Mr. Y vs. term of office of Mr. X]
12. But today ...
League of Legends – NA LCS Summer Final
Madison Square Garden in New York, NY (19 August 2015)
13. Competitive gaming is rising drastically
Video game is a lucrative
industry
People enjoy watching others
play (streaming via
Twitch.tv)
E-sports: professional
cyberathletes with teams,
commentators, sponsors,
cash prizes, ... ; between
sport and pure marketing
G. Cheung and J. Huang.
Starcraft from the stands: understanding the game spectator.
In SIGCHI Conference on Human Factors in Computing Systems. ACM, 2011, pp. 763–772.
M. Kaytoue, A. Silva, L. Cerf, W. Meira Jr. and C. Raïssi.
Watch me playing, I am a professional: a first study on video game live streaming.
In WWW 2012 (Companion Volume), pages 1181–1188. ACM, 2012.
T. L. Taylor.
Raising the Stakes: E-Sports and the Professionalization of Computer Gaming.
MIT Press, 2012.
14. A lot of challenges
Millions of games played on a
daily basis
Security issues
Bugs, cheaters
Balance issues
Fun vs challenging agents
Profiling & prediction
Match preparation
Playground for AI research
Arthur von Eschen
Machine Learning and Data Mining in Call of Duty (invited industrial talk).
European Conference on Machine Learning and Knowledge Discovery in Databases,
ECML/PKDD, Nancy, France, Sept. 2014.
S. Ontanon, G. Synnaeve, A. Uriarte, F. Richoux, D. Churchill, and M. Preuss.
A survey of real-time strategy game AI research and competition in StarCraft.
IEEE Transactions on Computational Intelligence and AI in Games, vol. 5, no. 4, pp. 293–311, 2013.
15. StarCraft II: real time strategy game
Description
Two players are battling against each other on a map
Each chooses a faction (Zerg, Terran, Protoss: 6 different match-ups
are possible)
Goal: use units to gather resources, to create buildings that can
produce units ... establish a strategy (choose the right buildings and
army composition) to destroy your opponent.
16. Observation 1
Players and teams observe
game records of others
Complete game logs are
available
Global ranking as well (such
as ATP in tennis)
More and more players use
several [un]official accounts to
hide their games and avoid being
studied by others
http://leagueoflegends.wikia.com/wiki/Smurf
https://www.reddit.com/r/starcraft/comments/3gkfso/sc2_who_is_that_smurf/
17. Problem 1
[Figure: Player1 (Avatar1) and Player2 (Avatar2) in a match watched by viewers; Avatar3 of unknown owner]
Can we identify if two avatars belong to the same player?
We have huge amounts of behavioral data!
18. Observation 2 and problem 2
Esport has all elements of a sport (pro, amateurs, coach,
commentators, competition with high prizes, sponsors ...)
Studying the strategies of the players is a key problem
Can we automatically discover strategies from game traces?
Game editors need balanced games
Players need to discover frequent strategies of their opponents
Discovering patterns revealing strategies characteristic of a player, or of
a win/loss in general
19. Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
20. Behavioral data as replay files
The RTS game StarCraft 2:
to improve strategy execution,
players
assign control groups to
units and buildings,
bind them to keyboard
hotkeys (1, 2, ..., 9, 0),
use them intensively along
with the mouse
(see the 'moon APM demo' on
YouTube). Source: Yan et al., CHI 2015
Avatar Game trace Outcome
RorO s,s,hotkey4a,s,hotkey3a,s,hotkey3s, ... Lose
TAiLS Base,hotkey1a,s,hotkey1s,s,hotkey1s, ... Win
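As an illustration of how such traces can feed a classifier, here is a minimal Python sketch that turns a comma-separated trace into counts of action tokens over a fixed-length prefix. The tokenization, the `max_actions` cut-off and the function names are assumptions made for illustration, not the featurization used in the cited works; the two traces are shortened copies of the rows above.

```python
from collections import Counter

def action_features(trace, max_actions=20):
    """Count action tokens in the first `max_actions` actions of a trace.

    Assumes a comma-separated trace such as "s,s,hotkey4a,..." (hypothetical format).
    """
    tokens = [tok.strip() for tok in trace.split(",") if tok.strip()]
    return Counter(tokens[:max_actions])

def to_vector(features, vocabulary):
    # Fixed token order, so that vectors from different avatars are comparable
    return [features.get(tok, 0) for tok in vocabulary]

# Shortened traces in the format of the table above
trace_a = "s,s,hotkey4a,s,hotkey3a,s,hotkey3s"
trace_b = "Base,hotkey1a,s,hotkey1s,s,hotkey1s"
vocabulary = sorted(set(action_features(trace_a)) | set(action_features(trace_b)))
print(vocabulary)
print(to_vector(action_features(trace_a), vocabulary))
```

Each vector, labeled with its avatar, can then be given to any off-the-shelf classifier (the slides use j48, smo, nbayes and knn).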
21. Keyboard usage patterns
Hypothesis
A player cannot hide behavioural patterns when changing avatars
[Figure: dendrogram of a hierarchical clustering of 708 traces from 354
games; each color denotes a unique avatar]
22. Predictive models with high accuracy
[Figure: precision (0.5 to 1.0) vs. log(τ) (10^1 to 10^4), one panel per θ = 5, 10, 15, 20; classifiers j48, smo, nbayes, knn]
Hotkey usage hides unique patterns
The first 20 seconds of the game
are enough
20 games are enough
We found a similar result, but
deliberately on a dataset
without avatar aliases, since
precision drops drastically when aliases are present
Eddie Q. Yan, Jeff Huang, Gifford K. Cheung.
Masters of Control: Behavioral Patterns of
Simultaneous Unit Group Manipulation in StarCraft2.
In CHI 2015, Crossings, Seoul, Korea 37–11, 2015.
23. The duplicate label problem
24. Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
25. Notations
A prediction model ρ : T → L is learned
T a set of traces
L a set of trace labels (the avatars)
Tl the set of traces generated by avatar l ∈ L
The model is evaluated (e.g. cross-validation)
ρ(t) ∈ L returns the model prediction for the trace t ∈ T
Confusion matrix $\tilde{C}^{\rho} = [c_{i,j} / |T_{l_i}|]$ with
$c_{i,j} = |\{t \in T_{l_i} \text{ s.t. } \rho(t) = l_j\}|$
l1 l2 l3 l4 l5
l1 0.6 0.4 0 0 0
l2 0.4 0.55 0.05 0 0
l3 0 0 0.8 0.15 0.05
l4 0 0.05 0 0.7 0.25
l5 0 0 0 0.5 0.5
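The normalized confusion matrix can be computed from (true label, predicted label) pairs, for instance collected over the test folds of a cross-validation: each count c_i,j is divided by the number of traces of l_i. A minimal Python sketch; `normalized_confusion` and the predictions below are hypothetical.

```python
from collections import Counter

def normalized_confusion(pairs, labels):
    """Row-normalized confusion matrix: entry (i, j) is c_ij / |T_li|.

    pairs: one (true_label, predicted_label) pair per trace.
    """
    pairs = list(pairs)
    counts = Counter(pairs)                                  # c_ij
    row_sizes = Counter(actual for actual, _ in pairs)       # |T_li|
    return [
        [counts[(li, lj)] / row_sizes[li] if row_sizes[li] else 0.0 for lj in labels]
        for li in labels
    ]

# Hypothetical predictions: 5 traces of l1 and 4 traces of l2
pairs = [("l1", "l1")] * 3 + [("l1", "l2")] * 2 + [("l2", "l1")] * 1 + [("l2", "l2")] * 3
for row in normalized_confusion(pairs, ["l1", "l2"]):
    print(row)
```

Each row sums to 1, which is what makes the rows comparable as fuzzy sets later on.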
26. Objectives
Idea: two avatars of the same player should produce a high mutual confusion
l1 l2 l3 l4 l5
l1 0.6 0.4 0 0 0
l2 0.4 0.55 0.05 0 0
l3 0 0 0.8 0.15 0.05
l4 0 0.05 0 0.7 0.25
l5 0 0 0 0.5 0.5
We are searching for pairs of labels that concentrate the confusion
(arbitrary sets are left for later)
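Entries involved for a pair (li, lj): $\tilde{C}^{\rho}_{ij}$, $\tilde{C}^{\rho}_{ji}$, $\tilde{C}^{\rho}_{ii}$, $\tilde{C}^{\rho}_{jj}$, combined as
$\frac{\tilde{C}^{\rho}_{ij} + \tilde{C}^{\rho}_{ji} + \tilde{C}^{\rho}_{ii} + \tilde{C}^{\rho}_{jj}}{2}$
      li     lj
li    Ci,i   Ci,j
lj    Cj,i   Cj,j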
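To make the pair score concrete: since every row of the normalized matrix sums to 1, (C_ij + C_ji + C_ii + C_jj)/2 is the fraction of the mass of rows li and lj that stays inside the 2×2 block {li, lj}. The sketch below assumes this reading of the slide's formula and scores every pair of the example matrix; `pair_concentration` is a hypothetical name.

```python
from itertools import combinations

# Normalized confusion matrix of the example (rows and columns l1..l5)
C = [
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.4, 0.55, 0.05, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.15, 0.05],
    [0.0, 0.05, 0.0, 0.7, 0.25],
    [0.0, 0.0, 0.0, 0.5, 0.5],
]

def pair_concentration(C, i, j):
    # Share of the mass of rows i and j falling inside the block {i, j};
    # equals 1 when the two labels are only ever confused with each other
    return (C[i][j] + C[j][i] + C[i][i] + C[j][j]) / 2

ranked = sorted(combinations(range(len(C)), 2),
                key=lambda p: pair_concentration(C, *p), reverse=True)
for i, j in ranked[:3]:
    print(f"(l{i + 1}, l{j + 1}) -> {pair_concentration(C, i, j):.3f}")
```

On this matrix, (l1, l2) and (l4, l5) both reach 0.975, ahead of (l3, l4) at 0.825.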
27. Method (1/2): extract fuzzy concepts
Formal Concept Analysis (FCA) with a fuzzy set intersection
Each label (row) is considered as a fuzzy set
Labels and their (fuzzy) intersections form a semi-lattice
Closed sets are extracted and scored (monotone)
M. Kaytoue, V. Codocedo, A. Buzmakov, J. Baixeries, S.O. Kuznetsov, A. Napoli:
Pattern Structures and Concept Lattices for Data Mining and Knowledge Processing.
ECML/PKDD 2015, Nectar track
Example
l1 l2 l3 l4 l5
l1 0.6 0.4 0 0 0
l2 0.4 0.55 0.05 0 0
l3 0 0 0.8 0.15 0.05
l4 0 0.05 0 0.7 0.25
l5 0 0 0 0.5 0.5
$\delta(l_1) = \{l_1^{0.6}, l_2^{0.4}, l_3^{0}, l_4^{0}, l_5^{0}\}$
$\delta(l_2) = \{l_1^{0.4}, l_2^{0.55}, l_3^{0.05}, l_4^{0}, l_5^{0}\}$
$d = \delta(l_1) \sqcap \delta(l_2) = \{l_1^{0.4}, l_2^{0.4}, l_3^{0}, l_4^{0}, l_5^{0}\}$
support(d) = {l1, l2}
$s(d) = \sum_{j=1}^{|L|} d_j = 0.8$
The pair (l1, l2) is an avatar alias candidate
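The example can be reproduced by taking the element-wise minimum of the rows as fuzzy intersection (the intersection symbol was lost in the slide text; the minimum is an assumption that matches the values shown). The extent of d is the set of labels whose row is at least d everywhere, and d is closed when intersecting the rows of its extent gives d back. A minimal Python sketch with hypothetical helper names:

```python
C = [
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.4, 0.55, 0.05, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.15, 0.05],
    [0.0, 0.05, 0.0, 0.7, 0.25],
    [0.0, 0.0, 0.0, 0.5, 0.5],
]

def fuzzy_intersection(*rows):
    # Element-wise minimum of the fuzzy sets (one membership degree per label)
    return [min(values) for values in zip(*rows)]

def extent(d, C):
    # Indices of the labels whose row contains d, i.e. d <= row everywhere
    return [k for k, row in enumerate(C) if all(dj <= cj for dj, cj in zip(d, row))]

def is_closed(d, C):
    rows = [C[k] for k in extent(d, C)]
    return bool(rows) and fuzzy_intersection(*rows) == d

def score(d):
    return sum(d)

d = fuzzy_intersection(C[0], C[1])
print(d)                                          # [0.4, 0.4, 0.0, 0.0, 0.0]
print([f"l{k + 1}" for k in extent(d, C)])        # ['l1', 'l2']
print(round(score(d), 2))                         # 0.8
```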
28. Method (2/2): rank and filter pairs
Candidate pairs are scored
A cosine similarity is used, the higher the better
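$\text{cluster score}(a_i, a_j) = \cos(\langle \tilde{C}^{\rho}_{ii}, \tilde{C}^{\rho}_{ij} \rangle, \langle \tilde{C}^{\rho}_{jj}, \tilde{C}^{\rho}_{ji} \rangle)$
      li     lj
li    Ci,i   Ci,j
lj    Cj,i   Cj,j
Why? Consider
      ai   aj
ai    1    0
aj    1    0
$\cos(\langle 1, 0 \rangle, \langle 0, 1 \rangle) = 0$: aj is always predicted as ai while ai is never predicted as aj, so the confusion is one-sided and the score is 0.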
Candidates are ranked; the list is cut with a threshold if necessary
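A minimal Python sketch of the ranking step, assuming the score compares the vectors ⟨C_ii, C_ij⟩ and ⟨C_jj, C_ji⟩ as in the "Why?" example. The candidate pairs (l1, l2) and (l4, l5) and the 0.95 threshold are illustrative choices, not values from the slides.

```python
import math

C = [
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.4, 0.55, 0.05, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.15, 0.05],
    [0.0, 0.05, 0.0, 0.7, 0.25],
    [0.0, 0.0, 0.0, 0.5, 0.5],
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def cluster_score(C, i, j):
    # High when the two labels are confused with each other in similar proportions
    return cosine((C[i][i], C[i][j]), (C[j][j], C[j][i]))

def rank_candidates(C, candidates, threshold=None):
    # Sort candidate pairs by decreasing score, optionally cutting the list
    scored = sorted(((cluster_score(C, i, j), (i, j)) for i, j in candidates), reverse=True)
    return [(pair, s) for s, pair in scored if threshold is None or s >= threshold]

print(cosine((1, 0), (0, 1)))                       # 0.0, the "Why?" example
print(rank_candidates(C, [(3, 4), (0, 1)], threshold=0.95))
```

(l1, l2) scores about 0.999 and (l4, l5) about 0.90, so only the first pair survives the illustrative threshold.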
29. Experimental settings
Datasets
Collection 1 - 2014 World Championship Series: 955 one-versus-one
high level games and 171 unique players
Collection 2 - Spawning Tool Website crawl July 2014: 10,108
one-versus-one games and 3,805 players
[Figure: left, number of games played (log scale) and number of players, for Collection 1 and Collection 2; right, percentage of actions (Base, Selection, SingleMineral, Hotkeys) over time (secs)]
31. Building a ground truth and evaluating aliases retrieval
Idea: each class is split into several; can we retrieve them?
Parameters: γ = 0.2, θ = 20, λ = 0.9, τ = 90
Surrogates
Classifier F1 MAP Recall AUC Precision P@10
j48 0.468 0.824 0.805 0.904 0.33 1.0
naivebayes 0.226 0.740 0.390 0.915 0.16 0.8
smo 0.312 0.971 0.536 0.993 0.22 1.0
knn 0.567 0.822 0.976 0.882 0.4 0.9
Surrogates & URLS
Classifier F1 MAP Recall AUC Precision P@10
j48 0.588 0.907 0.606 0.866 0.57 1.0
naivebayes 0.443 0.857 0.457 0.864 0.43 1.0
smo 0.257 0.912 0.266 0.945 0.25 1.0
knn 0.670 0.937 0.691 0.874 0.65 1.0
Surrogates & URLS & Names
Classifier F1 MAP Recall AUC Precision P@10
j48 0.689 0.983 0.606 0.935 0.8 1.0
naivebayes 0.560 0.943 0.492 0.906 0.65 1.0
smo 0.258 0.949 0.227 0.960 0.3 1.0
knn 0.758 0.967 0.667 0.792 0.88 1.0
32. About false positives
Some FPs are not actually false (the
same unique id was hidden for the
experiments)
Some FPs with a high
score are actually the
avatars we are looking
for!
[Figure: ranking (0 to 20) vs. score (0.6 to 1.1), SMO top 20, γ=0.05, θ=5, λ=0.9; pairs marked SUG, URL, NAMES or FP, e.g. EGaLive - aLiveRC]
33. Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
34. Can we do better?
(bi)-cluster the confusion matrix
O. Cavadenti, V. Codocedo, J.-F. Boulicaut, and M. Kaytoue.
When cyberathletes conceal their game: Clustering confusion
matrices to identify avatar aliases.
In International Conference on Data Science and Advanced
Analytics (IEEE DSAA 2015).
1 2 3 4 5
1 10 8 0 0 0
2 7 8 1 0 0
3 0 0 5 3 1
4 0 1 0 12 6
5 0 0 0 5 8
The model is built on a false labeling!
Some labels may be hard to learn
Imbalanced distribution of the labels
Not enough samples for some labels
Virtual identities may be shared
35. General intuition
The problem of finding label duplicates
Given
a set of instances (game traces) T
each taking a label in L
Find a tolerance relation over L, that is, a set of subsets of L covering L,
possibly with non-empty intersections (more general than a partition).
Basically
The classes of a tolerance relation form an antichain of the lattice of label subsets $(2^L, \subseteq)$
{{l1, l2}, {l3}, {l4, l5}}
{{l1, l2, l3}, {l3, l4, l5}}
...
36. General idea
Build a binary classifier for all subsets of labels
[Figure: lattice of label subsets, from Ø to L]
For each B ⊂ L, we have
a model $\rho_B : T \rightarrow \{+, -\}$ with + = B and − = $\bar{B}$,
provided with its confusion matrix
Desiderata
A set B ⊂ L is valid iff it represents a set of duplicate labels
How to select these valid sets?
How to avoid building $2^{|L|}$ models?
37. F1-measure for each label set B
CρB (rows: actual, columns: predicted)
      +      −
+     α++    α+−
−     α−+    α−−
F1-measure
Given B ⊂ L and CρB:
$\varphi_B = \frac{2 \cdot \alpha_{++}}{(2 \cdot \alpha_{++}) + \alpha_{+-} + \alpha_{-+}}$
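The F1-measure above can be computed from the binary predictions of ρB. In this sketch, rows are actual classes and columns predicted ones, so α+− counts false negatives and α−+ false positives; the labels, the set B and the predictions are made up for illustration, and the function names are hypothetical.

```python
def alphas(actual, predicted):
    """Count (actual, predicted) sign pairs: alpha_{++}, alpha_{+-}, alpha_{-+}, alpha_{--}."""
    counts = {("+", "+"): 0, ("+", "-"): 0, ("-", "+"): 0, ("-", "-"): 0}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def f1_measure(counts):
    # phi_B = 2*a_pp / (2*a_pp + a_pm + a_mp); 0 when there is nothing to score
    a_pp, a_pm, a_mp = counts[("+", "+")], counts[("+", "-")], counts[("-", "+")]
    denom = 2 * a_pp + a_pm + a_mp
    return 2 * a_pp / denom if denom else 0.0

B = {"l1", "l2"}
true_labels = ["l1", "l2", "l1", "l2", "l3", "l4", "l5", "l3"]
predicted = ["+", "+", "+", "-", "+", "-", "-", "-"]   # hypothetical outputs of rho_B
actual = ["+" if label in B else "-" for label in true_labels]
print(f1_measure(alphas(actual, predicted)))          # 0.75
```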
38. First constraint
Given C, D ⊂ L and E = C ∪ D.
Greedy model improvement
E is valid if
ϕE ≥ max(ϕC , ϕD)
Example: with ϕC = 0.5 and ϕD = 0.4, E = C ∪ D is valid only if ϕE ≥ 0.5
39. Is it enough? (actually it is...)
Given C, D ⊂ L and how the corresponding models classified 10 instances
C
D
C and D are probably not duplicate labels
C D
C and D are probably duplicate labels
40. Constraint 2
For E ⊆ L, PE is composed of the instances classified as TP, FN, FP.
Instance coverage
E ⊆ L is valid if
max(|PC |, |PD|) ≤ |PE | ≤ |PC | + |PD| − µ(PC , PD) · θ
with µ a measure (min, max) and θ ∈ [0; 1].
Intuitively, if E is valid, we should have PE ≈ PC ∪ PD, with PC and PD sharing many
traces (a large PC ∩ PD).
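The two validity constraints translate directly into checks. This is a sketch under two assumptions: µ is applied to the sizes |PC| and |PD| (the slide only says "min, max"), and PC, PD, PE are given as Python sets of instance ids. The function names are hypothetical.

```python
def greedy_improvement(phi_e, phi_c, phi_d):
    # Constraint 1: merging C and D must not lower the best F1-measure
    return phi_e >= max(phi_c, phi_d)

def instance_coverage(p_c, p_d, p_e, theta, mu=min):
    # Constraint 2: max(|PC|, |PD|) <= |PE| <= |PC| + |PD| - mu(|PC|, |PD|) * theta
    low = max(len(p_c), len(p_d))
    high = len(p_c) + len(p_d) - mu(len(p_c), len(p_d)) * theta
    return low <= len(p_e) <= high

p_c, p_d = {1, 2, 3, 4}, {3, 4, 5, 6}
print(greedy_improvement(0.5, 0.5, 0.4))                  # True
print(instance_coverage(p_c, p_d, p_c | p_d, theta=0.5))  # True: 4 <= 6 <= 6
```

With θ = 0.5 and µ = min, the upper bound requires PC and PD to overlap on at least half of the smaller set, which is how disjoint coverage (C and D probably not duplicates) gets rejected.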
41. Algorithm
Generate all subsets, level-wise, bottom-up
For each subset B ⊂ L,
Learn model ρB
Validate (cross-validation)
Compute scores
Check constraints (remove from candidates otherwise)
Continue next level with current candidates
The result is given by the maximal elements (size-wise/inclusion-wise)
[Figure: level-wise traversal of the lattice of label subsets, from Ø to L]
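The level-wise search can be sketched as follows. `evaluate` is a hypothetical callback standing for "learn ρB and validate it", returning the F1-measure and the set PB. Candidates of size k+1 are obtained by merging two valid sets of size k that differ by one label, and a candidate is kept as soon as one such pair satisfies both constraints; these generation details, like µ being applied to set sizes, are assumptions rather than the authors' exact procedure.

```python
from itertools import combinations

def levelwise_duplicates(labels, evaluate, theta=0.5, mu=min):
    """Bottom-up search; returns the maximal valid label sets (inclusion-wise)."""
    cache = {}

    def ev(B):
        if B not in cache:          # each binary model is learned at most once
            cache[B] = evaluate(B)
        return cache[B]

    level = [frozenset([l]) for l in labels]
    valid = set(level)
    while level:
        candidates = set()
        for C, D in combinations(level, 2):
            E = C | D
            if len(E) != len(C) + 1 or E in candidates:
                continue            # only merge sets differing by one label
            (phi_c, p_c), (phi_d, p_d), (phi_e, p_e) = ev(C), ev(D), ev(E)
            greedy_ok = phi_e >= max(phi_c, phi_d)                    # constraint 1
            low = max(len(p_c), len(p_d))
            high = len(p_c) + len(p_d) - mu(len(p_c), len(p_d)) * theta
            coverage_ok = low <= len(p_e) <= high                     # constraint 2
            if greedy_ok and coverage_ok:
                candidates.add(E)
        valid |= candidates
        level = list(candidates)    # next level only extends surviving sets
    return [B for B in valid if not any(B < other for other in valid)]

# Hypothetical scores: labels a and b behave like aliases, c does not
table = {
    frozenset("a"): (0.5, {1, 2, 3}),
    frozenset("b"): (0.5, {2, 3, 4}),
    frozenset("c"): (0.9, {7, 8}),
    frozenset("ab"): (0.9, {1, 2, 3, 4}),
    frozenset("ac"): (0.5, {1, 2, 3, 7, 8}),
    frozenset("bc"): (0.5, {2, 3, 4, 7, 8}),
}
print(levelwise_duplicates("abc", lambda B: table[B]))
```

Pruning invalid sets at each level is what avoids building all $2^{|L|}$ models in practice; in the worst case the search still visits the whole lattice.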
42. Experimental settings
Datasets
Collection C1 - 2014 World Championship Series: 955 one-versus-one
high level games and 171 unique players
Collection C2 - Spawning Tool Website crawl July 2014: 10,108
one-versus-one games and 3,805 players
We need to build a ground truth from C1.
43. Ground truth
Imagine several traces/instances of A ∈ L.
A A A A A A
44. Ground truth
Imagine several traces/instances of A ∈ L.
A A A B B B
Balanced split 50% – 50%
45. Ground truth
Imagine several traces/instances of A ∈ L.
A A B B C C
Balanced split 33% – 33% – 33%
46. Ground truth
Imagine several traces/instances of A ∈ L.
A A B B B B
Imbalanced split 33% – 66%
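The ground-truth construction can be simulated by relabeling the traces of one label into surrogate labels with chosen proportions; a method for finding duplicate labels should then report the surrogates as duplicates. A minimal sketch; the surrogate naming scheme (`A#0`, `A#1`, ...), the seeded shuffle and the function name are assumptions.

```python
import random

def split_label(traces, label, proportions, seed=0):
    """Relabel the traces of `label` into len(proportions) surrogate labels.

    proportions: e.g. [0.5, 0.5], [1/3, 1/3, 1/3] or [1/3, 2/3].
    Returns (trace, surrogate_label) pairs.
    """
    traces = list(traces)
    random.Random(seed).shuffle(traces)          # avoid splitting by game order
    n = len(traces)
    sizes = [round(p * n) for p in proportions[:-1]]
    sizes.append(n - sum(sizes))                 # last surrogate takes the rest
    out, start = [], 0
    for k, size in enumerate(sizes):
        out.extend((t, f"{label}#{k}") for t in traces[start:start + size])
        start += size
    return out

traces = [f"t{k}" for k in range(6)]
print(split_label(traces, "A", [1 / 3, 2 / 3]))
```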
48. Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
49. Goal
Discovery of strategies
Automatically from a large set of games
Evaluate their capacity to win/lose
Framework
Sequential pattern mining
Discriminant pattern mining
Jian Pei et al.
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth.
In ICDE, 2001.
Guozhu Dong, Jinyan Li
Efficient Mining of Emerging Patterns: Discovering Trends and Differences.
In KDD, 1999.
50. Sequential pattern mining
id description
s1 a{abc}{ac}d{cf}
s2 {ad}c{bc}{ae}
s3 {ef}{ab}{df}cb
s4 eg{af}cbc
Example
Set of items: I = {a, b, c, d, e, f}
Sequence: s1 = ⟨a{abc}{ac}d{cf}⟩
Sub-sequence: ⟨abc⟩ ⊑ ⟨a{abc}{ac}d{cf}⟩
Frequent sub-sequence: ⟨cb⟩ ⊑ s2, s3, s4
⇒ |supportD(⟨cb⟩)| = |{s2, s3, s4}| = 3 ≥ minSupp = 2
Problem: extract the complete and correct collection of frequent
sequential patterns
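Subsequence containment and support on the example database can be checked with a few lines of Python. The parser for the slide notation and the function names are illustrative; a greedy earliest match is enough to decide whether a pattern is a subsequence.

```python
def parse(text):
    """Parse the slide notation, e.g. "a{abc}{ac}d{cf}", into a list of itemsets."""
    sequence, i = [], 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i)
            sequence.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            sequence.append(frozenset(text[i]))
            i += 1
    return sequence

def is_subsequence(pattern, sequence):
    """True if each itemset of `pattern` is included in a later itemset of `sequence`."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1            # the next itemset must match strictly later
    return True

D = {
    "s1": parse("a{abc}{ac}d{cf}"),
    "s2": parse("{ad}c{bc}{ae}"),
    "s3": parse("{ef}{ab}{df}cb"),
    "s4": parse("eg{af}cbc"),
}

def support(pattern, database):
    return [sid for sid, seq in database.items() if is_subsequence(pattern, seq)]

print(is_subsequence(parse("abc"), D["s1"]))  # True
print(support(parse("cb"), D))                # ['s2', 's3', 's4']
```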
51. Emerging pattern [Dong, Li - 1999]
id description class
s1 a{abc}{ac}d{cf } +
s2 {ad}cc{bbc}{ae} +
s3 {ef }{ab}{df }cb cb −
s4 eg{af }cbcbc −
Discriminating power
Each sequence is labeled (+ or −)
A pattern is emerging if it has a high support in one class and a low one
in the other
Growth rate: $gr(s, D_x) = \frac{|support(s, D_x)|}{|D_x|} \times \frac{|D_y|}{|support(s, D_y)|}$
$gr(\langle cb \rangle, D_-) = \frac{2}{2} \times \frac{2}{1} = 2$
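A direct transcription of the growth rate, checked on the ⟨cb⟩ example. The handling of a zero support in Dy (infinite growth rate, and 0 when the pattern occurs in neither class) follows the usual emerging-pattern convention and is an assumption here.

```python
def growth_rate(supp_x, size_x, supp_y, size_y):
    """gr(s, D_x) = (|support(s, D_x)| / |D_x|) * (|D_y| / |support(s, D_y)|)."""
    if supp_y == 0:
        return float("inf") if supp_x > 0 else 0.0
    return (supp_x / size_x) * (size_y / supp_y)

# <cb> occurs in 2 of the 2 negative sequences and in 1 of the 2 positive ones
print(growth_rate(supp_x=2, size_x=2, supp_y=1, size_y=2))  # 2.0
```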
P. K. Novak, N. Lavrač, and G. I. Webb.
Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining.
J. Mach. Learn. Res., 10:377–403, 2009.
52. How to encode game logs?
Case 1 :
Sequence Winner
(j1, a){(j1, b)(j1, c)(j2, c)}{(j2, a)(j1, d)}(j2, b) j1
(j3, a){(j3, b)(j3, c)(j3, d)}{(j1, b)(j1, c)}(j1, d) j3
but we wish to generalize to + and − classes only
Case 2 :
Player sequence class
j1 a{bc}d +
j2 c{ab} −
j1 a{bcd} −
j3 {bc}d +
⇒ but we need to take into account the action/reaction principle
Proposed encoding:
Sequence
(a, +){(b, +)(c, +)(c, −)}{(a, −)(d, +)}(b, −)
(a, +){(b, +)(c, +)(d, +)}{(b, −)(c, −)}(d, −)
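The proposed encoding can be sketched as a relabeling of each (player, action) item by the sign of that player's outcome. The input format (a list of itemsets of (player, action) pairs) and the function name are assumptions; the example reproduces the first encoded sequence above.

```python
def encode(game, winner):
    """Replace each (player, action) item by (action, '+') for the winner, (action, '-') otherwise."""
    return [[(action, "+" if player == winner else "-") for player, action in itemset]
            for itemset in game]

game_1 = [
    [("j1", "a")],
    [("j1", "b"), ("j1", "c"), ("j2", "c")],
    [("j2", "a"), ("j1", "d")],
    [("j2", "b")],
]
print(encode(game_1, winner="j1"))
```

Because both players' actions stay in the same sequence, a pattern can capture an action of the winner followed by a reaction of the loser, which the per-player encoding (Case 2) loses.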
53. Definitions
Items
Sequences take their symbols in I = A × R, where R = {+, −}.
Dual of an item, of a sequence
The dual of item i = (a, r) ∈ I is given by $\tilde{i} = (a, \bar{r}) \in I$, where $\{\bar{r}\} = R \setminus \{r\}$.
The dual of a sequence s, denoted ˜s, is obtained by replacing each item
(a, r) ∈ I with its dual $(a, \bar{r}) \in I$.
Example
s = {(a, −)(b, +)(c, −)}(e, +)
˜s = {(a, +)(b, −)(c, +)}(e, −)
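Computing the dual is a sign flip on every item. A minimal sketch, with sequences represented as lists of itemsets of (action, sign) pairs (an assumed representation), reproducing the example above:

```python
FLIP = {"+": "-", "-": "+"}

def dual_item(item):
    action, sign = item
    return (action, FLIP[sign])

def dual(sequence):
    # Same itemset structure, every sign inverted
    return [[dual_item(item) for item in itemset] for itemset in sequence]

s = [[("a", "-"), ("b", "+"), ("c", "-")], [("e", "+")]]
print(dual(s))  # [[('a', '+'), ('b', '-'), ('c', '+')], [('e', '-')]]
```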
54. Discriminating measure
The balance measure
Let s be a frequent sequential pattern,
$balance(s) = \frac{|support_D(s)|}{|support_D(s)| + |support_D(\tilde{s})|}$
Properties
balance(s) ∈ [0; 1]
balance(s) = 0.5 ⇒ balanced strategy
balance(s) = 1 or 0 ⇒ imbalanced strategy
balance(s) + balance(˜s) = 1
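A small sketch of the balance measure and its properties, given the two support counts. The 0.4/0.6 thresholds used to flag an imbalanced pattern are borrowed from the Bunker-Rush example later in the talk and are only one possible choice; the function names are hypothetical.

```python
def balance(supp_s, supp_dual):
    """balance(s) = |support(s)| / (|support(s)| + |support(dual(s))|)."""
    total = supp_s + supp_dual
    if total == 0:
        raise ValueError("balance is undefined when neither s nor its dual occurs")
    return supp_s / total

def is_imbalanced(b, low=0.4, high=0.6):
    # Flag patterns far from the balanced value 0.5
    return b >= high or b <= low

b = balance(3, 1)
print(b, balance(1, 3), b + balance(1, 3))   # 0.75 0.25 1.0
print(is_imbalanced(b))                      # True
```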
55. PrefixSpan [Pei et al., 2001]
Algorithm that enumerates frequent sequence prefixes
Input:
Sequence database (encoded game logs)
Minimal support (minSupp)
Output:
All frequent sequential patterns and only them
[Figure: prefix tree of frequent patterns, e.g. <i1>, <i1 i2>, <{i1 i6}>, <i1 i3>, <i4>, <i5>]
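To illustrate the prefix-projection idea, here is a deliberately simplified PrefixSpan for sequences of single items (no itemsets). This restriction is an assumption made for brevity: it is not sufficient for the encoded game logs, which contain itemsets. Names and the toy database are illustrative.

```python
from collections import defaultdict

def prefixspan(db, min_supp):
    """Simplified PrefixSpan for sequences of single items.

    db: list of sequences (lists of items); min_supp: absolute count.
    Returns (pattern, support) pairs for all frequent patterns.
    """
    results = []

    def mine(prefix, projected):
        # projected: (sequence index, start of suffix) for each sequence containing prefix
        supporters = defaultdict(set)
        for sid, start in projected:
            for item in db[sid][start:]:
                supporters[item].add(sid)
        for item in sorted(supporters):
            if len(supporters[item]) < min_supp:
                continue                     # infrequent: prune the whole subtree
            pattern = prefix + [item]
            results.append((pattern, len(supporters[item])))
            # Project each suffix right after the first occurrence of `item`
            new_projected = []
            for sid, start in projected:
                suffix = db[sid][start:]
                if item in suffix:
                    new_projected.append((sid, start + suffix.index(item) + 1))
            mine(pattern, new_projected)

    mine([], [(sid, 0) for sid in range(len(db))])
    return results

db = [list("abc"), list("acb"), list("ab")]
for pattern, supp in prefixspan(db, min_supp=2):
    print("".join(pattern), supp)
```

Each recursive call only scans the projected suffixes, which is what makes prefix projection cheaper than re-scanning the whole database for every candidate.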
56. Algorithms
Balance measure computation
As a post processing
Naively
For each frequent pattern, build its dual
Scan the base to get its support
Naive optimization
Using per-item lists of sequence ids and a table mapping each item to its dual
(in the example: i1 ↔ i4, i2 ↔ i6, i3 ↔ i5), dual supports are computed
incrementally along the prefix tree:
SupportDual(<i1>) = Seq(Dual(i1), i1) = {3, 7, 14}
SupportDual(<i1 i2>) = Intersect(SupportDual(<i1>), Seq(Dual(i2), i1)) = {3}
[Figure: prefix tree of frequent patterns, e.g. <i1>, <i1 i2>, <{i1 i6}>, <i1 i3>, <i4>, <i5>]
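The naive post-processing (build each frequent pattern's dual, then scan the database for its support) can be sketched as below, on the two encoded game logs of the encoding slide. The frequent patterns are given by hand instead of being produced by PrefixSpan, and the helper names are hypothetical.

```python
FLIP = {"+": "-", "-": "+"}

def dual(pattern):
    return [frozenset((action, FLIP[sign]) for action, sign in itemset) for itemset in pattern]

def contains(sequence, pattern):
    # Greedy earliest match of each itemset of `pattern`, in order
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(db, pattern):
    return sum(1 for sequence in db if contains(sequence, pattern))

def naive_balance(db, frequent_patterns):
    """Post-processing: one extra database scan per frequent pattern."""
    out = []
    for pattern in frequent_patterns:
        s, s_dual = support(db, pattern), support(db, dual(pattern))
        out.append((pattern, s / (s + s_dual)))   # s > 0 for a frequent pattern
    return out

def seq(*itemsets):
    return [frozenset(itemset) for itemset in itemsets]

db = [
    seq({("a", "+")}, {("b", "+"), ("c", "+"), ("c", "-")}, {("a", "-"), ("d", "+")}, {("b", "-")}),
    seq({("a", "+")}, {("b", "+"), ("c", "+"), ("d", "+")}, {("b", "-"), ("c", "-")}, {("d", "-")}),
]
patterns = [seq({("a", "+")}), seq({("b", "+")})]
for pattern, b in naive_balance(db, patterns):
    print(sorted(pattern[0]), round(b, 3))
```

The cost is one full scan per pattern, which is what the id-list optimization above avoids.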
57. Algorithm
Suppressing redundant patterns
s = {(a, −)(b, +)(c, −)}(e, +)
˜s = {(a, +)(b, −)(c, +)}(e, −)
As a post process
Double search in the prefix tree
[Figure: item-to-dual table (i1 ↔ i4, i2 ↔ i6, i3 ↔ i5) and prefix tree of frequent patterns: <i1>, <i1 i2>, <{i1 i6}>, <i1 i3>, <i4>, <i5>]
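Since balance(˜s) = 1 − balance(s), keeping one pattern of each pair {s, ˜s} loses no information. The sketch below removes this redundancy with a canonical representative and a dictionary, rather than the double search in the prefix tree described above; it is an alternative illustration, not the authors' procedure, and the names are hypothetical.

```python
FLIP = {"+": "-", "-": "+"}

def dual(pattern):
    return tuple(frozenset((a, FLIP[r]) for a, r in itemset) for itemset in pattern)

def canonical(pattern):
    # Deterministic representative of the pair {s, dual(s)}
    key = lambda p: [sorted(itemset) for itemset in p]
    return min(pattern, dual(pattern), key=key)

def remove_redundant(scored_patterns):
    """Keep the first pattern seen for each {s, dual(s)} pair."""
    kept = {}
    for pattern, bal in scored_patterns:
        kept.setdefault(canonical(pattern), (pattern, bal))
    return list(kept.values())

s = (frozenset({("a", "-"), ("b", "+"), ("c", "-")}), frozenset({("e", "+")}))
other = (frozenset({("a", "+")}),)
result = remove_redundant([(s, 0.4), (dual(s), 0.6), (other, 0.52)])
print(len(result))  # 2
```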
58. Algorithms
Actually, plenty of algorithm adaptations are possible for some
particular cases of datasets
We designed an efficient and generic algorithm
Extends PrefixSpan by considering two projected databases per node.
G. Bosc, M. Kaytoue, C. Raïssi, J.-F. Boulicaut, P. Tan.
Mining Balanced Sequential Patterns in RTS Games.
European Conference on Artificial Intelligence, ECAI 2014
G. Bosc, P. Tan, J.-F. Boulicaut, C. Raïssi and M. Kaytoue
A Pattern Mining Approach to Study Strategy Balance in RTS Games.
IEEE Transactions on Computational Intelligence and AI in Games (early access), 2015.
Another work applied to StarCraft II data
C. Low-Kam, C. Raïssi, M. Kaytoue, J. Pei
Mining Statistically Significant Sequential Patterns.
International Conference on Data Mining (ICDM) 2013.
59. Data collection
Scraping 371 267 replays
Filtering to keep 90 768 games, 30 678 different players
[Figure: three panels vs. time (min, 0 to 40): number of replays; APM (average, average ± standard deviation); percentage of actions by type (Build, Train, Select, Move, Click, Research, Upgrade, HotKey, Minimap, Other)]
60. Sequence dataset
Data Build
Item Seq. IS I/IS
PvP 1,160 6,668 11.5 2.0
PvT 3,655 18,754 19.0 2.6
PvZ 3,748 22,784 19.6 2.7
TvT 2,201 7,457 20.7 2.8
TvZ 4,492 23,637 22.5 2.8
ZvZ 1,689 9,554 14.2 2.2
Table: Encoding building construction during the first 10 minutes
62. Quantitative results
Symmetric axis: y = 0.5
Imperfect symmetry: if a sequence s is frequent,
it does not imply that ˜s is frequent too
Patterns with the highest support are the best-known strategies and are
balanced
63. Example of discovered patterns [Forge-Expand]
Protoss strategy in PvZ
Motivation: favor economy in early game while still being able to
defend
minSupp 5% - 591 patterns
s = {(Nexus, 5, +)}{(Gateway, 6, +)(PhotonCannon, 6, +)} -
balance(s) = 0.52
s = {(Nexus, 5, +)}{(PhotonCannon, 6, +)(Assimilator, 6, +)} -
balance(s) = 0.52
Build Order: Forge Expand
Time (sec)   Action
36 – 40      Pylon
96 – 106     Forge
132 – 145    Nexus
132 – 145    Pylon
144 – 158    Gateway
144 – 158    Photon Cannon
144 – 158    Assimilator x2
Source : http://www.teamliquid.net/
64. Example of discovered patterns [4 Gates]
Protoss strategy in PvP
Motivation: all-in, aggressive, early-game attack (sacrifices economy)
minSupp 5% - 3418 patterns
s = {(Gateway, 3, +, 1) (Assimilator, 3, +, 1)} {(Cyb.Core, 4, +, 1)}
{(Gateway, 7, +, 2) (Gateway, 7, +, 3) (Gateway, 7, +, 4)} -
balance(s) = 0.59
Build Order: 4 Gates
Time (sec)   Action
36 – 40      Pylon
72 – 79      Gateway
96 – 106     Assimilator
108 – 119    Pylon
132 – 145    Cybernetics Core
192 – 211    Warpgate
216 – 238    Gateway x3
240 – 264    Pylon
240 – 264    Assimilator
Source : http://www.teamliquid.net/
65. Imbalanced strategies
A hot topic for game editors
TvZ + minSupp = 1% : 17 990 patterns
“Bunker-Rush” detected and imbalanced
602 patterns contain a Bunker
20 patterns with balance(s) ≥ 0.6 or ≤ 0.4 when the bunker is done in
early game
s = {(Barracks, 1, S, 1)}, {(SpPool, 4, F, 1)}, {(Bunker, 6, S, 1),
(SpCrawler, 6, F, 1)} (balance(s) = 0.61)
This balance issue has actually been corrected (May 2012): a Zerg
counter unit has been slightly improved and the bunker build time is longer.
We divided the dataset into two parts and ran a comparative analysis:
frequent patterns with bunkers are more balanced.
The code is available and can be used for other tasks!
https://github.com/guillaume-bosc/BalanceSpan
(For example, mining (im)-balanced drafting in MOBA games).
66. Outline
1 Predictive models from behavioral data
2 Unscrambling confusion matrices to identify aliases
3 Enumerating the lattice of binary classifiers
4 Discovering strategies and balance issues
5 Conclusion
67. Conclusion
Take away facts
E-sport may not be a ’true’ sport, but its development is incredible
New challenges in video game design and analytics: the fun/difficulty
paradigm, to satisfy both standard players and pros
Game traces hide individual patterns
In StarCraft 2, via customizable keyboard usage
When avatar aliases are present, one needs to unscramble the confusion
matrix
To avoid biases, one can build the lattice of binary classifiers
Game traces hide strategies
Sequential pattern mining with a new measure, the balance measure,
can help discover such patterns
It can be applied in any zero-sum game scenario for descriptive
analytics
68. Thanks to my colleagues
at INSA/LIRIS: Guillaume Bosc, Jean-François Boulicaut,
Victor Codocedo, Quentin Labernia, Marc Plantevit, Céline Robardet
at MIT Media Lab / Game Lab: Philip Tan
at INRIA: Chedy Raïssi
and most importantly to you and the AIST organization team!