The document discusses using network analysis with Gephi software to analyze football team strategies. It analyzes graphs of passes between players for the Arsenal and West Ham teams. Statistical analysis of the graphs reports a higher average degree for West Ham, which the authors read as a more cohesive playing style, while Arsenal's graph has slightly more edges (110 versus 96). Key players for each team were identified using centrality metrics like closeness, betweenness, and PageRank. Visualizing the networks provided insights into each team's tactics and formation. The analysis demonstrated that network analysis can infer football strategies but has limitations due to incomplete data and variable real-world factors.
1. FOOTBALL NETWORK ANALYSIS
WITH GEPHI
TO DETERMINE A TEAM'S STRATEGY.
GROUP 2: ROBERT FERRO, SEAN JAMES, YOGESH
SHINDE, PRATIK DOSHI, MINGYANG CHEN AND
MICHEAL ABAHO
2. FORMALITIES
• Explain basic concepts in a readable form:
• Vertex – a player
• Edge – a relationship between 2 players.
• Objective: Analyse two opposing teams to see what tactics were used.
• Arsenal Vs West Ham : 0-2
3. WHAT HAS BEEN UNDERTAKEN
• Chose between 3 ideas:
• Football domain – most useful with real world application
• Coffee & sandwich habits – JCR, Coffeeshop
• Library habits
• Met at regular intervals; coordinated via social media and Dropbox
• Initial research to find sites/resources of interest
4. WHY FOOTBALL AS A SUITABLE DATASET
• Other useful applications for this data
• How was data collected?
• Why is football domain a good choice? - Different tactics
• Defensive play: more in than out.
• Good team cohesion: triangles
5. HOW WE SPLIT THE TEAM
• Gephi construction team
• Retrieve relevant data and format it ready for Gephi (.gml format).
• 4-4-2.com
• Gephi graph
• How current is the data (year of creation)?
• Statistical analysis & visualisation team
• Presentation creation
6. WHAT IS GEPHI?
• Other tools available:
• Mathematica
• MATLAB
• How is it used
• Visual demonstration
7. WHAT WAS ACHIEVED
• Our working methodology
• Statistical analysis techniques
• Effective visualisations
• Ability to infer strategies and relationships between data.
8. OUR METHODOLOGY
• Define objective
• Research and collection of data
• Use Gephi to draw graphs that visualise the collected data
• Then produce statistical reports, visualisations and analysis
• Conclusions and evaluations
9. STATISTICAL ANALYSIS
Metric                           Arsenal Team   West Ham Team
Total edges                      110            96
Average degree                   8.462          13.714
Network diameter                 2              3
Network radius                   1              2
Weakly connected components      1              1
Strongly connected components    1              1
10. • Vertex degree chart
Vertex degree chart of Arsenal Team:
Player                     Degree   In-degree   Out-degree
Theo Walcott               7        5           2
Petr Cech                  10       5           5
Alexis Sanchez             13       7           6
Olivier Giroud             16       10          6
Laurent Koscielny          17       9           8
Mathieu Debuchy            17       8           9
Francis Coquelin           18       9           9
Santiago Cazorla           19       8           11
Per Mertesacker            19       9           10
Alex Oxlade-Chamberlain    19       8           11
Mesut Ozil                 21       10          11
Nacho Monreal              21       11          10
Aaron Ramsey               23       11          12
Vertex degree chart of West Ham Team:
Player                     Degree   In-degree   Out-degree
Modibo Maiga               4        3           1
Matthew Jarvis             8        5           3
Kevin Nolan                8        4           4
Diafra Sakho               10       7           3
Mauro Zarate               13       7           6
Angelo Ogbonna             14       8           6
Winston Reid               15       7           8
Aaron Cresswell            15       8           7
Reece Oxford               15       6           9
Cheikhou Kouyaté           15       6           9
Adrian                     16       7           9
James Tomkins              18       7           11
Dimitri Payet              20       10          10
Mark Noble                 21       11          10
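The summary statistics can be cross-checked against the degree tables. The sketch below (Python, standard library only) re-enters the in-degree and out-degree columns from the tables above and recomputes the edge count and the average degree. The function and variable names are illustrative and are not part of the original Gephi workflow.

```python
# (in-degree, out-degree) per player, copied from the vertex degree tables.
arsenal = {
    "Theo Walcott": (5, 2), "Petr Cech": (5, 5), "Alexis Sanchez": (7, 6),
    "Olivier Giroud": (10, 6), "Laurent Koscielny": (9, 8),
    "Mathieu Debuchy": (8, 9), "Francis Coquelin": (9, 9),
    "Santiago Cazorla": (8, 11), "Per Mertesacker": (9, 10),
    "Alex Oxlade-Chamberlain": (8, 11), "Mesut Ozil": (10, 11),
    "Nacho Monreal": (11, 10), "Aaron Ramsey": (11, 12),
}
west_ham = {
    "Modibo Maiga": (3, 1), "Matthew Jarvis": (5, 3), "Kevin Nolan": (4, 4),
    "Diafra Sakho": (7, 3), "Mauro Zarate": (7, 6), "Angelo Ogbonna": (8, 6),
    "Winston Reid": (7, 8), "Aaron Cresswell": (8, 7), "Reece Oxford": (6, 9),
    "Cheikhou Kouyate": (6, 9), "Adrian": (7, 9), "James Tomkins": (7, 11),
    "Dimitri Payet": (10, 10), "Mark Noble": (11, 10),
}


def summarise(team):
    """Return (nodes, edges, edges per node, (in + out) degree per node)."""
    n = len(team)
    in_total = sum(i for i, _ in team.values())
    out_total = sum(o for _, o in team.values())
    # In a directed graph every edge adds one in-degree and one out-degree.
    assert in_total == out_total
    edges = in_total
    return n, edges, edges / n, (in_total + out_total) / n


for name, team in (("Arsenal", arsenal), ("West Ham", west_ham)):
    n, edges, per_node, total_per_node = summarise(team)
    print(f"{name}: {n} nodes, {edges} edges, "
          f"edges/node = {per_node:.3f}, (in+out)/node = {total_per_node:.3f}")
```

On these figures the reported 8.462 for Arsenal matches edges per node (110 / 13), while the reported 13.714 for West Ham matches in-degree plus out-degree per node (192 / 14), so the two reported averages may not be on the same scale and should be compared with care.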
11. • Degree distribution chart for Arsenal team
• Degree distribution chart for West Ham team
12. FEATURES OF THE GRAPHS
• Nature of the graph: directed
• In/out degrees
• Weighted
• Formation is reflected in the graph: a team playing 4-4-2 produces a graph that visually looks like a 4-4-2 formation
13. STATISTICAL ANALYSIS TECHNIQUES
• Vertex degree
• Number of nodes joined to that node (popularity)
• Greater percentage for West Ham because they passed more as a team.
• Isomorphic relationships
• Connectivity
14. Connectivity
• In graph theory, connectivity indicates whether every node in a network can be reached from every other node.
• If the graph is strongly connected (a directed path exists between every pair of players), we take this to represent a high possession value for the team.
• If the graph is only weakly connected (paths exist only when pass direction is ignored), we take this to represent a low possession value for the team.
• In our dataset, the graph analysis suggests the West Ham team has a slightly greater possession value than the Arsenal team.
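The difference between strong and weak connectivity can be checked with a breadth-first search. The sketch below uses only the Python standard library; the two small pass graphs are hypothetical examples, not the match data, and the function names are illustrative.

```python
from collections import deque


def all_nodes(adj):
    """Every player that appears as a passer or a receiver."""
    return set(adj) | {v for targets in adj.values() for v in targets}


def reachable(adj, start):
    """Set of nodes reachable from start by following edges in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_strongly_connected(adj):
    """True if every node can reach every other node along directed edges."""
    nodes = all_nodes(adj)
    if not nodes:
        return True
    start = next(iter(nodes))
    reverse = {n: set() for n in nodes}
    for u, targets in adj.items():
        for v in targets:
            reverse[v].add(u)
    # start reaches everyone, and everyone reaches start (via reversed edges).
    return reachable(adj, start) == nodes and reachable(reverse, start) == nodes


def is_weakly_connected(adj):
    """True if the graph is connected once edge directions are ignored."""
    nodes = all_nodes(adj)
    if not nodes:
        return True
    undirected = {n: set() for n in nodes}
    for u, targets in adj.items():
        for v in targets:
            undirected[u].add(v)
            undirected[v].add(u)
    return reachable(undirected, next(iter(nodes))) == nodes


# Hypothetical passes: the forward (FW) receives the ball but never passes it on.
one_way = {"GK": {"DF"}, "DF": {"MF"}, "MF": {"DF", "FW"}, "FW": set()}
# Hypothetical passes: every link is also played in the opposite direction.
two_way = {"GK": {"DF"}, "DF": {"GK", "MF"}, "MF": {"DF", "FW"}, "FW": {"MF"}}

print(is_weakly_connected(one_way), is_strongly_connected(one_way))
print(is_weakly_connected(two_way), is_strongly_connected(two_way))
```

A graph with a single strongly connected component, as reported for both teams, passes the stricter of the two checks.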
15. DIFFERENT TYPES OF CENTRALITY
• Betweenness centrality
• Closeness centrality
• Radius
• Diameter
• Pagerank algorithms
17. CLOSENESS CENTRALITY
• More central player -> higher closeness centrality.
• Easy access to any other player making them key to tactics
• Why is closeness good?
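As a rough illustration of closeness centrality, the sketch below computes it as the reciprocal of a player's average shortest path length (in hops) to the other players. It uses only the Python standard library, the pass graph is a hypothetical toy example, and Gephi's own normalisation may differ from this one.

```python
from collections import deque


def hop_distances(adj, source):
    """Shortest hop count from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(adj, node):
    """Reciprocal of the average hop distance from node to reachable nodes."""
    dist = hop_distances(adj, node)
    others = [d for n, d in dist.items() if n != node]
    if not others:
        return 0.0
    return len(others) / sum(others)


# Hypothetical toy team: everything is played through the midfielder (MF).
passes = {
    "DF": {"MF"},
    "MF": {"DF", "FW", "WG"},
    "FW": {"MF"},
    "WG": {"MF"},
}

for player in passes:
    print(player, round(closeness(passes, player), 3))
```

In this toy graph the midfielder reaches every team-mate in one pass (closeness 1.0), while the others need two passes to reach most team-mates (closeness 0.6).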
18. BETWEENNESS CENTRALITY
• Number of times a node acts as a bridge on shortest paths between other nodes
• Central players ideally should have high betweenness centrality
• E.g. midfielders: Ramsey, Ozil, Cazorla
• Players at the extreme ends of the network have low betweenness
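Betweenness can be sketched with Brandes' algorithm for unweighted directed graphs, which counts for each player how many shortest paths between other pairs of players pass through them. The code below is an illustrative standard-library implementation on a hypothetical toy graph. It ignores pass weights and does not normalise the scores, so its values would not necessarily match Gephi's output.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness centrality for a directed, unweighted graph."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {n: 0 for n in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {n: [] for n in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical toy team: all play between DF, FW and WG goes through MF.
passes = {
    "DF": {"MF"},
    "MF": {"DF", "FW", "WG"},
    "FW": {"MF"},
    "WG": {"MF"},
}

scores = betweenness(passes)
print(scores["MF"], scores["DF"])  # 6.0 0.0
```

The midfielder sits on the shortest path of all six ordered pairs of other players, whereas the players at the edges of the network bridge nothing.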
19. RADIUS / DIAMETER
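• Radius: the smallest eccentricity, i.e. the fewest hops in which the best-placed player can reach every other player
• Diameter: the largest eccentricity, i.e. the most hops needed between any two players along a shortest path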
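Both measures follow from each player's eccentricity, the largest number of hops needed to reach any other player. The sketch below is a standard-library illustration on a hypothetical toy graph, and it assumes the graph is strongly connected so that every player can reach every other.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest hop count from source to any reachable player (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def radius_and_diameter(adj):
    """Radius is the minimum eccentricity, diameter the maximum."""
    ecc = [eccentricity(adj, node) for node in adj]
    return min(ecc), max(ecc)


# Hypothetical toy team: the midfielder (MF) links everyone else.
passes = {
    "DF": {"MF"},
    "MF": {"DF", "FW", "WG"},
    "FW": {"MF"},
    "WG": {"MF"},
}

radius, diameter = radius_and_diameter(passes)
print(radius, diameter)  # 1 2
```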
20. PAGERANK ALGORITHM
• Adjacency matrix for Arsenal.
• Convert to probabilities using a ‘random surfer’ based model
• Summarises player importance:
• Ramsey is central to Arsenal.
21. PAGERANK CONT.
• Dead ends – pages with no out links
• How do we address this?
• Spider traps – groups of pages that have out links but never link to pages outside the group.
• Player who only gets passed the ball and is then tackled.
• Teleportation is a good compromise.
• A simplified PageRank formula
• v′ = βMv + (1 − β)e/n, where e is a vector of ones and n is the number of nodes
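The teleportation idea can be sketched as a simple power iteration. The code below is a minimal standard-library illustration, not the implementation Gephi uses: the pass counts are hypothetical, beta = 0.85 is an assumed (commonly used) value, and dead ends are handled by spreading their rank evenly over all other players.

```python
def pagerank(passes, beta=0.85, iterations=100):
    """Power iteration with teleportation.

    passes[j][i] is the number of passes from player j to player i.
    Each step follows a pass with probability beta and otherwise
    'teleports' to a player chosen uniformly at random.
    """
    nodes = sorted(set(passes) | {i for row in passes.values() for i in row})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Teleportation share: every player receives (1 - beta) / n.
        new = {p: (1.0 - beta) / n for p in nodes}
        for j in nodes:
            out = passes.get(j, {})
            total = sum(out.values())
            if total == 0:
                # Dead end: share this player's rank among all other players.
                others = [i for i in nodes if i != j]
                for i in others:
                    new[i] += beta * rank[j] / len(others)
            else:
                for i, count in out.items():
                    new[i] += beta * rank[j] * count / total
        rank = new
    return rank


# Hypothetical pass counts; SUB receives the ball but never passes (dead end).
passes = {
    "DF": {"MF": 9, "WG": 1},
    "MF": {"DF": 3, "WG": 3, "FW": 3},
    "WG": {"MF": 7, "FW": 2, "SUB": 1},
    "FW": {"MF": 5},
    "SUB": {},
}

ranks = pagerank(passes)
for player, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{player}: {value:.3f}")
```

In this toy example the midfielder ends up with the highest score because most passes flow to them, which is the sense in which PageRank is used here to summarise player importance.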
22. VISUALISATION TECHNIQUES
• Formation-based graph (shows subs and players' positions in a visual way)
• Large edge means important relationship.
• Communities (defensive, midfield and offensive)
• Considered other layout techniques, but they were not that useful (11 nodes is quite easy to visualise)
24. TABLES AND FEATURES OF IN/OUT
• Explain in and out degrees
• Specific players as examples :
• Monreal: good defender, hence has more in than out (also trusted).
• Machine learning could facilitate this sort of analysis.
25. BENEFITS/LIMITATIONS
Benefits:
1. Good analysis tool to visualize formation
2. Infer what sort of position a player is playing
3. Find out about certain players and their roles in the team
4. Discover the football team style of play and its tactics
5. Compare the network graphs of two teams to analyze the tactical differences between them
6. Also applicable to many other suitable domains
26. Limitations:
1. Difficult to retrieve and extract relevant data
2. The data from the graph is just theoretical
3. Every match is different. Many uncontrollable factors can also affect the final result
4. Players may change
5. ……..
27. ASSUMPTIONS
• From graph analysis: insufficient data; substitutes exist; 100% successful passing is impossible
• From circumstances: weather effects; home and away
• From players: sports status
• From coaches: change of tactics
28. LESSONS LEARNT AND CONCLUSIONS
• Retrieving data is difficult
• Gephi is a powerful network tool
• Visualisation is an important part of analysis
• Graphs provide a very interesting way to visualise sports team cohesion
29. TEAM CONTRIBUTION
• Rob – Design and implement Gephi graphs and PageRank algorithms to draw useful conclusions from them
• Pratik – Statistical analysis & visualisation approach, maintaining Dropbox
• Yogesh – Statistical analysis & visualisation approach, how to use Gephi, presentation speaker.
• Sean – Bring together presentation, review work, research key areas, provide insight into domain area and areas for future development.
• Chen – Key limitations and analysis, limitations (conclusion)
• Michael – Research into domain area, presentation speaker and detailed analysis of football games and domain.
Rob:
Objective: use Gephi and network analysis to find what similarities and differences can be inferred between the tactics employed by Arsenal and West Ham in their Premier League game, in which Arsenal surprisingly lost to West Ham 2-0.
Rob: [Talk around the key themes and ideas we had.]
Why we went for the football domain.
What was the source of the ideas: being realistic about data retrieval but also choosing an area of interest which would work.
The ideas were our own, from thinking about what could be an interesting research domain. We chose football because we wanted to use a domain which had a real-world application.
Rob and Michael : Read slide
Also explain defensive tactics, as well as lobbing and triangles (good team cohesion).
Rob & Michael: embellish upon the slides
Rob: focus on the fourfourtwo site; maybe have the URL ready to show?
Michael: a visual demonstration is important.
Yogesh:
Read off your paper, adding extra content where appropriate.
Yogesh Shinde
We first defined the objective and selected the match.
We then did the research and collected the match's ball possession details from websites like fourfourtwo.com.
Yogesh Shinde
The number of edges here reflects the passing links between players, and those links show how the players are connected to each other. On edge count, the Arsenal team (110) has slightly more passing links than West Ham (96).
The West Ham team appears to pass better than the Arsenal team: the reported average degree of the Arsenal team (8.462) is much lower than the reported average degree of the West Ham team (13.714).
The West Ham team appears to use long passes or looping passes, because the diameter and radius measured for the West Ham graph are larger than those for the Arsenal graph.
Yogesh Shinde
In graph theory, vertex degree means how many edges are connected to a node. In our dataset, a player's vertex degree represents how many passing links that player has with team-mates.
Yogesh Shinde
If a player has high in-degree and out-degree values, we can say that this player is probably playing in midfield and defining the team's style of play.
These midfield players are central to the team; that is, they carry more weight in defining the team's strategy or tactics.
Yogesh:
Once again explain and embellish on notes.
Yogesh: explain your section further in terms of graph theory.
In graph theory, connectivity indicates whether every node in a network can be reached from every other node.
If the graph is strongly connected (a directed path exists between every pair of players), we take this to represent a high possession value for the team.
If the graph is only weakly connected (paths exist only when pass direction is ignored), we take this to represent a low possession value for the team.
In our dataset, the graph analysis suggests the West Ham team has a slightly greater possession value than the Arsenal team.
Rob: add to what Yogesh says in terms of pagerank and things which he does not mention.
Using PageRank to size the nodes and colouring them by their respective communities, we can try to gain an insight into the different passing and game styles of Arsenal and West Ham.
What can we see?
As we expect, Arsenal appear to play the ‘beautiful game’. Their most important passers are their central midfielders, including Aaron Ramsey and Mesut Ozil. Using the average formation as the layout and looking at the communities found, we find that the communities correspond to the distance between players on the pitch, leading us to believe that Arsenal play with a short passing style. From our analysis we could also conclude that Arsenal play with a narrow style and don’t like to use a lot of width.
West Ham have many similarities to Arsenal when using PageRank and average formation as a layout. However, of particular interest is the community found containing the goalkeeper and the striker; from this we could conclude that West Ham are more prone to playing the ‘long ball game’, certainly more so than Arsenal. The distribution of the PageRank scores also differs somewhat: West Ham are less dependent on their central midfielder, and their fullbacks and central defenders see more of the passing of the ball. Perhaps this means we can conclude that West Ham are more prone to a defensive long ball game, whilst also likely to include more width by getting their fullbacks more involved than Arsenal do.
To be useful, this analysis must be combined with an expert analyst watching the game in question to confirm or rebut any claims inferred from the analysis.
Yogesh Shinde
Michael introduces:
Explain how Rob will talk about pagerank.
Micheal:
Analysis of independent nodes and their significance in the network
Essentially, centrality defines how influential or significant a node is within a network.
Typically, these are the basic facts that centrality will tell us:
The characteristics of an important node.
How many other nodes a particular node is connected to (i.e. the degree of connectivity).
How often a node appears in a particular path or trail.
How severe the consequences can be if an important node is eliminated.
Degree Centrality (degree of connectivity)
This metric tells us how many other nodes connect to a particular node, whether by an inward or an outward connection. Nodes with many connections have a high degree of connectivity.
Limitation
- Eliminating a node with a high degree centrality doesn’t necessarily lead to a network partition, so such nodes might not be as significant as the metric suggests.
E.g. a player like Debuchy (Arsenal’s right back) has a high degree centrality, but taking him out of the network doesn’t cause a massive network partition, because he fails to pass to three distinct players in the game, unlike the other players with a relatively similar degree centrality.
Because of this, the metric is often not used when determining a node’s significance.
Michael: Closeness centrality
Closeness centrality is the reciprocal of a node's average shortest path length to the other nodes. It is therefore inversely proportional to that average shortest path length.
The more centrally a player is positioned within the network, the higher their closeness centrality, which suggests that they can easily access any other player within the network, thus making them key in the team’s tactical set-up.
Conclusion.
The applicability of these metrics is very subjective in network analysis, and therefore they are used interchangeably depending on the context of a given study.
Micheal:
Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between any other pair of nodes.
Players that have a high probability of appearing along any shortest path have a high betweenness centrality measure.
Ideally, centrally positioned players would be expected to have a high betweenness centrality, i.e. midfielders such as Ramsey, Ozil, Coquelin and Cazorla. However, this analysis also identifies a left-back among the players with a high value of this metric, which signals that particular player's relevance within the network.
As we would expect, players at the extreme ends of the network have a low value of this metric and are thus not as relevant as the other players.
Rob:
What does this mean for our graphs?
So lobbing tactics etc.
Rob:
Adjacency matrix for Arsenal.
PageRank uses this matrix by converting it to the probabilities of where a ‘random surfer’ will end up after one step.
The value m_ij in row i and column j is n/k if page j has k arcs out and n of them go to page i.
Otherwise, m_ij = 0.
Note that the columns should always sum to 1.
We don’t have any loops because it is assumed that no player passes to themselves.
PageRank also has to overcome problems: it needs the web to be a strongly connected graph, which it is not.
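The construction of the transition matrix described above can be sketched as follows. The pass counts and player labels are hypothetical, the function name is illustrative, and the code uses only the Python standard library.

```python
def transition_matrix(players, passes):
    """Build M as a list of rows, where M[i][j] = n / k.

    k is the total number of passes made by player j and
    n is the number of those passes that went to player i.
    """
    index = {p: pos for pos, p in enumerate(players)}
    size = len(players)
    M = [[0.0] * size for _ in range(size)]
    for j, row in passes.items():
        k = sum(row.values())
        if k == 0:
            continue  # a player with no passes leaves an all-zero column
        for i, n in row.items():
            M[index[i]][index[j]] = n / k
    return M


# Hypothetical pass counts: passes[j][i] = passes from j to i (no self-passes).
players = ["DF", "MF", "FW"]
passes = {
    "DF": {"MF": 3, "FW": 1},
    "MF": {"DF": 2, "FW": 2},
    "FW": {"MF": 4},
}

M = transition_matrix(players, passes)
for row in M:
    print(row)

# Each column holds one player's passing probabilities, so it sums to 1.
column_sums = [sum(M[i][j] for i in range(len(players)))
               for j in range(len(players))]
print(column_sums)
```

The diagonal stays at zero because no player passes to themselves, and every column sums to 1 as long as the player made at least one pass.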
Rob again:
Dead ends – a page with no out links. Surfers reaching a dead end ‘disappear’, so after enough iterations of the algorithm no page that can reach a dead end can have any PageRank at all. To address this, we add outgoing links from each dead end to all other pages in the web, with an equal weighting of probability.
Spider traps – groups of pages that all have out links but never link to any pages outside the group. (Think of a community with only one in link and no out links.) If the surfer enters this community, he will spend every subsequent iteration of the algorithm in it. As a result, spider traps ‘suck’ all of the PageRank value into their pages. To overcome this, we introduce the notion of randomly ‘teleporting’ with a small probability at each iteration.
The PageRank formula uses a constant probability vector.
Rob and Yogesh:
Formation graph
In our dataset, the team's players are the vertices and the ball passes between players are the edges.
Players with a high weighting in the team are represented by large nodes. A midfield player who controls the team's game, and so has high centrality or more connections with other players, is shown with a large node.
A larger or thicker edge represents many passes made between the two players.
Communities (defensive, midfield and offensive)
In our dataset we define three communities: defensive, midfield and offensive.
Players in the defensive community, if they get the ball, only clear it or pass it to a safely positioned player.
Players in the midfield community play an important role in the game because they get the ball most of the time; they define the play of the game by passing the ball to offensive or defensive players.
Players in the offensive community play at the front of the team; if they get the ball, they pass to a player in a shooting position or take a shot themselves to create a goal chance.
This layout uses the average formation over the match to position the nodes (as it makes the most sense).
Nodes are coloured by community (lots of links between nodes in the same community, and few links between communities).
Nodes are sized by their PageRank value (indicating how ‘important’ a player is to the passing style of the team).
Yogesh or Rob:
Table and explain analysis conducted.
Chen:
Read from prepared notes to detail the limitations and benefits of our solution
Chen:
Explain the assumptions we have made
Rob concludes why this is really useful as well as what we have learned.
Did we manage to infer anything useful?
Mention contributions if he asks. Everyone worked hard though.