Using spectral clustering and hierarchical clustering techniques, the author analyzes match data from DOTA 2 to identify successful combinations of 5 playable heroes. DOTA 2 is a multiplayer online game where two teams of 5 heroes battle each other. Picking an effective team with complementary abilities contributes to success. The author samples over 2.7 million matches and constructs a weighted adjacency matrix to represent similarities between hero pairs based on their win rate playing together. Spectral clustering is applied to partition the graph into clusters of heroes expected to form strong teams. Hierarchical clustering is then used to identify specific combinations of 5 heroes within the clusters. The results are evaluated and the method is improved by exponentiating the adjacency matrix to emphasize larger weights. Overall, the document
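The summary does not include the author's code. The sketch below is a minimal, hedged illustration of the core spectral idea. It performs a two-way spectral bisection of a small weighted hero graph by approximating the Fiedler vector with power iteration and then splitting vertices by sign. The Fiedler vector is the eigenvector for the second-smallest eigenvalue of the unnormalized Laplacian L = D - W. The weight matrix W, the function names, and the choice of the unnormalized Laplacian are all illustrative assumptions. The paper's partition into more than two clusters, the hierarchical step, and the exponentiation of the adjacency matrix are not shown.

```python
import random

def fiedler_vector(W, iters=2000, seed=0):
    """Approximate eigenvector for the second-smallest eigenvalue of L = D - W.
    Runs power iteration on c*I - L and projects out the constant vector
    (L's eigenvector for eigenvalue 0) at every step."""
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2 * max(deg)  # Gershgorin: every eigenvalue of L lies in [0, 2 * max degree]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # y = (c*I - L) x, i.e. y_i = (c - d_i) x_i + sum_j W_ij x_j
        y = [(c - deg[i]) * x[i] + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove the component along the constant vector
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x

def spectral_bisection(W):
    """Assign each vertex to group 0 or 1 by the sign of its Fiedler-vector entry."""
    return [0 if v < 0 else 1 for v in fiedler_vector(W)]

# Hypothetical weights: heroes 0-2 and 3-5 win often together, with one weak cross link.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(spectral_bisection(W))
```

The sign split recovers the two tightly linked triples. Power iteration keeps the example free of dependencies; in practice you would normally call a library eigensolver instead.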
K-means clustering is an unsupervised machine learning algorithm that groups unlabeled data points into a specified number of clusters (k) based on their similarity. It works by assigning each data point to the cluster with the nearest mean (centroid). The algorithm alternates between assigning points to clusters and recomputing the centroids until the assignments stop changing. K-means clustering can be used for market segmentation, document classification, and choosing sites for facilities such as delivery centers, hospitals, or de-addiction centers based on crime or demand data.
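The assign-and-update loop described above is Lloyd's algorithm. Below is a minimal pure-Python sketch. It assumes Euclidean distance and initial centroids drawn at random from the input points; both are common choices, but the summary does not specify either.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as starting centroids
    for _ in range(iters):
        # Assignment step: each point joins the cluster whose centroid is nearest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # centroids unchanged, so assignments have stabilized
            break
        centroids = updated
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(centroids))
```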
This document summarizes the key findings of a research paper on the frequency of convergent games under best-response dynamics. The paper shows that:
1) The frequency of randomly generated games with a unique pure-strategy Nash equilibrium goes to zero as the number of players or strategies increases.
2) Convergent games with fewer pure-strategy Nash equilibria are more common than those with more equilibria.
3) For 2-player games with fewer than 10 strategies, games with a unique equilibrium are the most common, but games with multiple equilibria become more likely once there are more than 10 strategies.
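The summary does not define the dynamics precisely. The sketch below assumes one common variant for 2-player games. Players take turns switching to a best response, and a game counts as convergent from a given start if the process reaches a pure-strategy Nash equilibrium rather than cycling. Payoffs are drawn i.i.d. uniform, and all function names are hypothetical. The paper's exact definitions may differ.

```python
import random

def random_game(n, rng):
    """payoffs[i][a][b]: payoff to player i when row plays a and column plays b."""
    return [[[rng.random() for _ in range(n)] for _ in range(n)] for _ in range(2)]

def is_pure_nash(payoffs, a, b):
    n = len(payoffs[0])
    row_ok = payoffs[0][a][b] >= max(payoffs[0][x][b] for x in range(n))
    col_ok = payoffs[1][a][b] >= max(payoffs[1][a][y] for y in range(n))
    return row_ok and col_ok

def best_response_dynamics(payoffs, start=(0, 0)):
    """Alternating best responses; returns the equilibrium reached, or None on a cycle."""
    n = len(payoffs[0])
    a, b = start
    turn = 0
    seen = set()
    while not is_pure_nash(payoffs, a, b):
        if (a, b, turn) in seen:
            return None  # the state repeats, so the dynamics cycle forever
        seen.add((a, b, turn))
        if turn == 0:
            a = max(range(n), key=lambda x: payoffs[0][x][b])
        else:
            b = max(range(n), key=lambda y: payoffs[1][a][y])
        turn = 1 - turn
    return (a, b)

def convergence_frequency(n, trials=1000, seed=0):
    """Monte Carlo estimate of the share of random n x n games that converge from (0, 0)."""
    rng = random.Random(seed)
    hits = sum(best_response_dynamics(random_game(n, rng)) is not None for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, convergence_frequency(n))
```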
The document describes research into developing AI techniques to play Match-3 puzzle games like Candy Crush. It discusses:
1) Defining the problem as minimizing the number of moves needed to reach a target score in a Candy Crush-style game called GemGem.
2) Proposing three algorithms (a baseline greedy search, a heuristic greedy search, and a limited breadth-first search) to solve this problem.
3) Comparing the performance of these algorithms with human performance in GemGem, finding that the heuristic techniques provide significant improvements over human players.
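The sketch below is a hedged illustration of the baseline greedy idea, not the paper's GemGem implementation. It scores every adjacent swap by how many gems it would clear immediately and picks the best one. It assumes a stable board with no pre-existing matches, and it ignores gravity, refills, and cascades, which a real Match-3 engine would simulate. The function names are hypothetical.

```python
def count_matched(board):
    """Cells that belong to a horizontal or vertical run of at least 3 equal gems."""
    rows, cols = len(board), len(board[0])
    matched = set()
    for r in range(rows):
        for c in range(cols - 2):
            if board[r][c] == board[r][c + 1] == board[r][c + 2]:
                matched.update({(r, c), (r, c + 1), (r, c + 2)})
    for c in range(cols):
        for r in range(rows - 2):
            if board[r][c] == board[r + 1][c] == board[r + 2][c]:
                matched.update({(r, c), (r + 1, c), (r + 2, c)})
    return len(matched)

def swapped(board, a, b):
    """Copy of the board with the gems at positions a and b exchanged."""
    new = [row[:] for row in board]
    (r1, c1), (r2, c2) = a, b
    new[r1][c1], new[r2][c2] = new[r2][c2], new[r1][c1]
    return new

def greedy_move(board):
    """Baseline greedy: the adjacent swap that clears the most gems right now."""
    rows, cols = len(board), len(board[0])
    best, best_score = None, 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down cover every adjacent pair once
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    score = count_matched(swapped(board, (r, c), (r2, c2)))
                    if score > best_score:
                        best, best_score = ((r, c), (r2, c2)), score
    return best, best_score

board = [
    [1, 1, 2],
    [3, 2, 1],
    [2, 3, 3],
]
print(greedy_move(board))  # (((0, 2), (1, 2)), 3)
```

A heuristic or limited breadth-first variant would replace the one-step score with an evaluation that looks further ahead.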
Interplay between social influence and competitive strategical games in multi... (Kolja Kleineberg)
The document discusses the interplay between social influence and competitive strategic games on multiplex networks. It shows that an opinion dynamics model with pro-cooperation bias can transform a prisoner's dilemma game into a snowdrift game. Considering multiplex topology is important, as correlations between network layers can have an even bigger impact on cooperation than individual layer topologies alone. When similarity correlations are present between layers, cooperative clusters can form across both layers through self-organization.
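In the standard classification of symmetric 2x2 games, a prisoner's dilemma has its payoffs ordered T > R > P > S, while a snowdrift game has T > R > S > P. The sketch below is not the paper's opinion-dynamics model. It only illustrates how a bias toward cooperation can reorder the payoffs and turn a prisoner's dilemma into a snowdrift game, using a hypothetical fixed bonus b added to every payoff earned by cooperating.

```python
def classify_symmetric_2x2(R, S, T, P):
    """R: both cooperate, S: cooperate against a defector,
    T: defect against a cooperator, P: both defect."""
    if T > R > P > S:
        return "prisoner's dilemma"
    if T > R > S > P:
        return "snowdrift"
    return "other"

def with_cooperation_bonus(R, S, T, P, b):
    # Hypothetical bias: cooperating yields an extra b regardless of the opponent's move.
    return R + b, S + b, T, P

base = (3, 0, 5, 1)  # hypothetical payoffs (R, S, T, P)
print(classify_symmetric_2x2(*base))                                # prisoner's dilemma
print(classify_symmetric_2x2(*with_cooperation_bonus(*base, 1.5)))  # snowdrift
```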
The document discusses using machine learning models to predict point totals in NBA games in order to inform sports betting. It explores collaborative filtering, neural networks, and LSTMs for predicting the combined score of both teams. The best models achieved results similar to those of sportsbooks, as measured by the mean squared error between predicted and actual scores, and correctly predicted the outcome 51.5% of the time. Feature engineering included team performance statistics from previous games as well as player and opponent data.
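The summary does not say exactly how the 51.5% figure was computed. One common way to score a totals model is to report the mean squared error of the predicted totals and, separately, the share of games in which the prediction lands on the same side of the betting line as the actual total. The sketch below does both. The data are hypothetical, and skipping pushes (actual total equal to the line) is an assumed convention.

```python
def mean_squared_error(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

def over_under_accuracy(pred, actual, lines):
    """Share of games where the predicted total is on the same side of the line
    as the actual total; pushes (actual == line) are skipped."""
    correct = total = 0
    for p, a, line in zip(pred, actual, lines):
        if a == line:
            continue
        total += 1
        correct += (p > line) == (a > line)
    return correct / total if total else float("nan")

# Hypothetical totals for three games, for illustration only.
pred, actual, lines = [210, 220, 205], [215, 212, 204], [208, 215, 203]
print(mean_squared_error(pred, actual))          # 30.0
print(over_under_accuracy(pred, actual, lines))  # 0.6666666666666666
```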
This document describes a social dilemma between two individuals, Robert and Stuart, who must decide how much effort to contribute to a joint project. It presents a game-theoretic model of their strategic situation. In the Nash equilibrium both choose 1 unit of effort: acting in their own rational self-interest, they contribute less than in the socially optimal outcome, in which both choose the highest effort level. This demonstrates how independent actions in social dilemmas can produce suboptimal collective outcomes.
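The summary does not give the payoff function, so the sketch below uses a hypothetical one chosen to reproduce the described structure. Each player receives 0.75 per unit of total effort but bears the full cost of their own effort. Extra effort is therefore privately unprofitable (0.75 < 1) yet socially profitable (2 x 0.75 > 1). Brute-force enumeration then finds the pure Nash equilibria and the welfare-maximizing profile.

```python
from itertools import product

EFFORTS = [1, 2, 3, 4]  # hypothetical effort levels

def payoff(own, other):
    # Hypothetical payoff: 0.75 per unit of joint effort, minus 1 per unit of own effort.
    return 0.75 * (own + other) - own

def pure_nash_equilibria():
    """All effort pairs (robert, stuart) where neither can gain by deviating alone."""
    eqs = []
    for r, s in product(EFFORTS, EFFORTS):
        robert_ok = all(payoff(r, s) >= payoff(x, s) for x in EFFORTS)
        stuart_ok = all(payoff(s, r) >= payoff(y, r) for y in EFFORTS)
        if robert_ok and stuart_ok:
            eqs.append((r, s))
    return eqs

def social_optimum():
    """Effort pair maximizing the sum of both payoffs."""
    return max(product(EFFORTS, EFFORTS), key=lambda p: payoff(p[0], p[1]) + payoff(p[1], p[0]))

print(pure_nash_equilibria())  # [(1, 1)]
print(social_optimum())        # (4, 4)
```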
RATIONAL SECRET SHARING OVER AN ASYNCHRONOUS BROADCAST CHANNEL WITH INFORMATI... (IJNSA Journal)
We consider the problem of rational secret sharing introduced by Halpern and Teague [1], in which the players involved in secret sharing participate only if it is to their advantage. This can be characterized in the form of preferences: players prefer getting the secret to not getting it and, secondarily, prefer that as few other players as possible get the secret. Several positive results have already been published that efficiently solve the problem of rational secret sharing, but only a handful of papers have addressed the use of an asynchronous broadcast channel: [2] used cryptographic primitives, [3] used an interactive dealer, and [4] used an honest minority of players in order to handle an asynchronous broadcast channel.
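The abstract does not specify the underlying scheme. Rational secret sharing is typically studied on top of a threshold scheme such as Shamir's (t, n) scheme, shown below as background only. This is not any of the protocols [1] to [4]. The sketch covers only honest sharing and reconstruction. It does not model rational players who might withhold their shares, which is exactly the problem the abstract addresses. The prime modulus and the function names are assumptions.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def make_shares(secret, threshold, n, rng=random.SystemRandom()):
    """Evaluate a random degree-(threshold - 1) polynomial with f(0) = secret at x = 1..n."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME); needs at least `threshold` shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(secret=123456789, threshold=3, n=5)
print(reconstruct(shares[:3]))  # 123456789
print(reconstruct(shares[2:]))  # 123456789
```

Any three of the five shares recover the secret, while fewer than three reveal nothing about it. This is why, in the rational setting, each player is tempted to let others reveal their shares first. `pow(x, -1, p)` requires Python 3.8 or later.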
Lecture slides on Mechanism Design, which are entirely based on the following well-known survey article.
Jackson, M. O. (2014). Mechanism theory.
http://papers.ssrn.com/sol3/Papers.cfm?abstract_id=2542983
Below is a link to my course website:
https://sites.google.com/site/yosukeyasuda2/home/lecture/optimization15
Spatial patterns in evolutionary games on scale-free networks and multiplexes (Kolja Kleineberg)
The document discusses evolutionary games on scale-free networks and multiplexes. It finds that cooperation can be sustained in metric clusters that form on scale-free networks. These metric clusters shield cooperators from surrounding defectors similar to spatial selection. The survival of metric clusters is favored when the network is less heterogeneous, has a higher clustering coefficient, and the clusters are larger. Similar clusters are also found for different games played on correlated multiplex networks.
The document discusses a study that found cooperating with others activates reward centers in the brain. Researchers used brain imaging to study women playing a game where they could choose cooperation or not. Surprisingly, the women experienced the most pleasure when both chose cooperation over acting selfishly. The longer they cooperated, the stronger the brain's reward response became. This suggests humans are wired to experience joy from cooperation with others.
This document proposes modifications to Pawlak's conflict theory model based on graph theory. It suggests developing the conflict analysis system to predict how the opinions of neutral agents may change over time. The approach involves:
1) Creating matrices to represent direct conflicts, alliances, and neutral relationships between agents.
2) Computing higher power matrices through multiplication to represent indirect relationships over increasing path lengths.
3) Weighting the matrices based on path length and summing values to predict if neutral relationships may become conflicts or alliances based on direct and indirect influences.
4) Optionally performing logical OR operations on conflict matrices to identify any direct or indirect conflicts between agents.
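The steps above can be sketched with plain matrix arithmetic. This is a minimal sketch under stated assumptions. It assumes a single signed matrix (+1 alliance, -1 conflict, 0 neutral) rather than separate matrices, and weights of 1/k for paths of length k; both are hypothetical choices. Note that entries of A^k aggregate walks, which may revisit agents, not only simple paths.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def predict_neutral_pairs(A, max_len=3):
    """A[i][j] = +1 alliance, -1 conflict, 0 neutral (symmetric, zero diagonal).
    Sums A, A^2, ..., A^max_len with hypothetical weights 1/k and reads the sign
    of the total for each currently neutral pair."""
    n = len(A)
    power = [row[:] for row in A]
    score = [[0.0] * n for _ in range(n)]
    for k in range(1, max_len + 1):
        if k > 1:
            power = matmul(power, A)  # signed walks of length k
        for i in range(n):
            for j in range(n):
                score[i][j] += power[i][j] / k
    preds = {}
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] == 0:
                s = score[i][j]
                preds[(i, j)] = "alliance" if s > 0 else "conflict" if s < 0 else "neutral"
    return preds

def direct_or_indirect_conflicts(C, max_len=3):
    """C is a 0/1 conflict matrix. ORs its boolean powers, so entry (i, j) is True
    when some walk of at most max_len conflict edges links i and j."""
    n = len(C)
    power = [[bool(C[i][j]) for j in range(n)] for i in range(n)]
    reach = [row[:] for row in power]
    for _ in range(max_len - 1):
        power = [[any(power[i][k] and C[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        reach = [[reach[i][j] or power[i][j] for j in range(n)] for i in range(n)]
    return reach

A = [[0, -1, 0], [-1, 0, -1], [0, -1, 0]]  # agents 0 and 2 are both in conflict with 1
print(predict_neutral_pairs(A))  # {(0, 2): 'alliance'}
```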
The document discusses key concepts for quantifying and modeling social networks. It covers the following network properties:
1. Degree distribution - The distribution of the number of connections for each node. Real-world networks often have skewed degree distributions.
2. Path length and diameter - The average path length is the shortest-path distance between node pairs, averaged over all pairs; the diameter is the longest of these shortest-path distances. Real-world networks tend to have small path lengths.
3. Clustering coefficient - The likelihood that two neighbors of a node are also connected to each other, quantifying local clustering. Social networks exhibit high clustering.
4. Connected components - Maximal subsets of nodes that are all reachable from each other by paths; the size of the largest one is often reported. Real-world networks often have
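The four properties above can be computed for a small graph without any library. The sketch below is minimal and assumes an adjacency-set representation. Path statistics are taken over connected pairs only, a common convention for graphs with several components. Giving nodes of degree below 2 a clustering value of 0 is also a convention.

```python
from collections import Counter, deque

def degree_distribution(adj):
    """Number of nodes with each degree; adj maps node -> set of neighbours."""
    return Counter(len(nbrs) for nbrs in adj.values())

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def components(adj):
    """Connected components as sets of nodes."""
    seen, comps = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            comps.append(comp)
    return comps

def path_stats(adj):
    """Average shortest-path length and diameter over connected ordered pairs."""
    lengths = [d for v in adj for u, d in bfs_distances(adj, v).items() if u != v]
    return sum(lengths) / len(lengths), max(lengths)

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: {5}, 5: {4}}
print(degree_distribution(adj))
print(local_clustering(adj, 2))
print(path_stats(adj))
print(max(components(adj), key=len))
```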
Quantum persistent k cores for community detection (Colleen Farrelly)
PPT overview of a paper accepted for the 2019 Southeastern International Conference on Combinatorics, Graph Theory & Computing. Details a persistence approach to community detection and a new quantum, persistence-based algorithm built on the coloring problem.
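The summary gives no algorithmic detail. The classical (non-quantum) ingredient named in the title, k-core decomposition, can be sketched by repeatedly removing a node of minimum remaining degree. The k-cores are nested as k grows, which is plausibly the kind of filtration a persistence approach tracks. This sketch covers only that classical building block, as an assumption. It does not implement the paper's quantum or coloring-based algorithm, and the function names are hypothetical.

```python
def core_numbers(adj):
    """Core number of every node: repeatedly remove a node of minimum remaining degree."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # the current core level never decreases
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

def k_core(adj, k):
    """Nodes of the k-core: those with core number at least k."""
    return {v for v, c in core_numbers(adj).items() if c >= k}

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}  # a triangle with one pendant node
print(sorted(core_numbers(adj).items()))  # [(0, 2), (1, 2), (2, 2), (3, 1)]
for k in (1, 2):
    print(k, sorted(k_core(adj, k)))
```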
This document discusses generative adversarial networks (GANs) and their relationship to reinforcement learning. It begins with an introduction to GANs, explaining how they can generate images without explicitly defining a probability distribution by using an adversarial training process. The second half discusses how GANs relate to actor-critic models and inverse reinforcement learning. It explains that GAN training, in which a generator learns to fool a discriminator, resembles the way policies are trained in reinforcement learning.
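The summary does not reproduce the objective. For reference, the standard GAN formulation (Goodfellow et al., 2014) is the two-player minimax game below, where p_g is the distribution of G(z). For a fixed generator, the optimal discriminator is D*_G. In the actor-critic reading, D plays a role loosely analogous to the critic. Whether the slides use exactly this form is an assumption.

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```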
The document proposes using degree sequences and the degree sequence bound to provide an upper bound on query cardinality during query optimization. It defines degree sequences as the sorted list of value frequencies within each relation. It then proves that the degree sequence bound, which uses the degree sequences to construct a worst-case tensor representation of each relation, provides a tight upper bound on query output size that dominates previous bounds. The document also proposes a more efficient functional degree sequence bound that compresses the degree sequences into piecewise constant functions.
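The simplest case is a two-relation join on a single attribute. Here, the tightest bound that uses only the two degree sequences pairs the i-th largest degrees together, by the rearrangement inequality. The sketch below computes that bound and compares it with the exact join size. It covers only this special case. The paper's general bound for multi-way queries, its worst-case tensor construction, and the functional (piecewise constant) compression are not shown, and the function names are hypothetical.

```python
from collections import Counter

def degree_sequence(column):
    """Frequencies of each value in a join column, sorted in non-increasing order."""
    return sorted(Counter(column).values(), reverse=True)

def degree_sequence_bound(col_r, col_s):
    """Upper bound on |R JOIN S| (one join attribute) from the degree sequences alone.
    zip stops at the shorter sequence because unmatched values contribute nothing."""
    return sum(a * b for a, b in zip(degree_sequence(col_r), degree_sequence(col_s)))

def exact_join_size(col_r, col_s):
    cr, cs = Counter(col_r), Counter(col_s)
    return sum(cnt * cs[v] for v, cnt in cr.items())

R_b = ["x", "x", "x", "y"]  # hypothetical join-column values of R
S_b = ["y", "y", "x"]       # hypothetical join-column values of S
print(degree_sequence(R_b), degree_sequence(S_b))  # [3, 1] [2, 1]
print(degree_sequence_bound(R_b, S_b))             # 7
print(exact_join_size(R_b, S_b))                   # 5
```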
We propose a game where two players take turns assigning precincts to districts. In a simplified setting where districts have no geographic constraints, both players have a strategy that allows them to win a number of districts proportional to their number of voters. For the game on real maps (with geographic constraints), we are developing a player based on neural networks and reinforcement learning that aims to learn how to play this game optimally through self-play (inspired by AlphaZero). As in other simulation-based gerrymandering research, the difficulty in this approach is the size of the problem. In fact, we show that the problem of deciding whether there exists a 'fair map' in the set of 'legal maps' (for appropriate simple definitions of 'legal' and 'fair') is NP-complete.
ON THE DISTRIBUTION OF THE MAXIMAL CLIQUE SIZE FOR THE VERTICES IN ... (csandit)
The high-level contributions of this paper are as follows: We modify an existing branch-and-bound based exact algorithm (for maximum clique size of an entire graph) to determine the maximal clique size that the individual vertices in the graph are part of. We then run this algorithm on six real-world network graphs (ranging from random networks to scale-free networks) and analyze the distribution of the maximal clique size of the vertices in these graphs. We observe five of the six real-world network graphs to exhibit a Poisson-style distribution for the maximal clique size of the vertices. We analyze the correlation between the maximal clique size and the clustering coefficient of the vertices, and find these two metrics to be poorly correlated for the real-world network graphs. Finally, we analyze the Assortativity index of the vertices of the real-world network graphs and observe the graphs to exhibit positive assortativity with respect to maximal clique size and negative assortativity with respect to node degree; nevertheless, we observe the Assortativity index of the real-world network graphs with respect to both the maximal clique size and node degree to increase with decrease in the spectral radius ratio for node degree, indicating a positive correlation between the maximal clique size and node degree.
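The paper adapts a branch-and-bound exact algorithm. As a swapped-in alternative suitable only for small graphs, the sketch below enumerates all maximal cliques with the Bron-Kerbosch algorithm (with pivoting). It records, for each vertex, the size of the largest clique containing it. The names are hypothetical, and the worst-case running time is exponential.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a set.
    adj maps each vertex to the set of its neighbours."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def max_clique_size_per_vertex(adj):
    """Size of the largest clique that each vertex belongs to."""
    best = {v: 1 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            best[v] = max(best[v], len(clique))
    return best

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(max_clique_size_per_vertex(adj).items()))  # [(0, 3), (1, 3), (2, 3), (3, 2), (4, 2)]
```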
Towards controlling evolutionary dynamics through network geometry: some very... (Kolja Kleineberg)
The document discusses how network geometry can control evolutionary dynamics through the formation of cooperating clusters. It presents examples showing how the placement of initial cooperators in metric space clusters versus randomly can influence whether cooperation emerges in evolutionary games and navigation processes on networks. The author suggests that network geometry may allow active control of evolutionary dynamics by strategically placing control agents based on the underlying geometry.
A NEW GENERALIZATION OF EDGE OVERLAP TO WEIGHTED NETWORKS (gerogepatton)
Estimating the strength of an edge in a network has long been an important problem. In the context of social networks, it allows one to estimate the strength of the relationship between users. The best-known method to compute edge strength is the Neighbourhood Overlap, which computes the ratio of common neighbours to all neighbours of the edge's two endpoint nodes. This method was initially proposed for unweighted networks and later extended to weighted ones. These two versions of the method are not mathematically equivalent: an unweighted network is commonly treated as a weighted one with all edge weights equal to one, yet applying the two existing versions of Neighbourhood Overlap to such a network produces completely different values. In this paper, we tackle this problem and propose a new generalization of Neighbourhood Overlap that works equally for unweighted and weighted networks. Experiments performed on networks with various parameters showed that our measure performs similarly to the existing measures.
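The unweighted Neighbourhood Overlap described above can be sketched in a few lines. It is computed as the number of nodes adjacent to both endpoints divided by the number of nodes adjacent to at least one of them, with the two endpoints themselves excluded. This shows only the unweighted form as commonly defined. The paper's new weighted generalization is not reproduced, and returning 0.0 when the endpoints have no other neighbours is an assumed convention.

```python
def neighbourhood_overlap(adj, u, v):
    """Unweighted Neighbourhood Overlap of edge (u, v): neighbours shared by u and v,
    divided by nodes adjacent to u or v, with u and v themselves excluded."""
    common = (adj[u] & adj[v]) - {u, v}
    either = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(either) if either else 0.0

adj = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
print(neighbourhood_overlap(adj, 0, 1))  # 0.3333333333333333
```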
Given a graph G=(V,E), two subsets S_1 and S_2 of the vertex set V are homometric if their distance multisets are equal. The homometric number h(G) of a graph G is the largest integer k such that there exist two disjoint homometric subsets of cardinality k. We find lower bounds for the homometric number of the Mycielskian of a graph and of the join and the lexicographic product of two graphs. We also obtain the homometric number of the double graph of a graph, of the Cartesian product of any graph with K_2, and of the complete bipartite graph. We also introduce a new concept called the weak homometric number and find the weak homometric number of some graphs.
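The definitions above can be checked by brute force on small connected graphs. The sketch below uses hypothetical helper names and runs in exponential time, so it is only for tiny examples. It compares distance multisets with `Counter` and finds h(G) by trying subset sizes from the largest possible (half the vertices, since the subsets are disjoint) downward.

```python
from collections import Counter, deque
from itertools import combinations

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def distance_multiset(adj, subset):
    """Multiset of pairwise distances within subset (graph assumed connected)."""
    nodes = list(subset)
    dist = {v: bfs_distances(adj, v) for v in nodes}
    return Counter(dist[a][b] for a, b in combinations(nodes, 2))

def are_homometric(adj, s1, s2):
    return distance_multiset(adj, s1) == distance_multiset(adj, s2)

def homometric_number(adj):
    """Largest k with two disjoint homometric k-subsets (brute force)."""
    nodes = list(adj)
    for k in range(len(nodes) // 2, 0, -1):
        for s1 in combinations(nodes, k):
            rest = [v for v in nodes if v not in s1]
            m1 = distance_multiset(adj, s1)
            if any(distance_multiset(adj, s2) == m1 for s2 in combinations(rest, k)):
                return k
    return 0

path6 = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}  # path 0-1-2-3-4-5
print(are_homometric(path6, {0, 1, 3}, {2, 4, 5}))  # True
print(homometric_number(path6))                     # 3
```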
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor... (Daniel Katz)
This document provides a summary of Stanley Milgram's small world experiment and discussion of complex network models. It discusses how Milgram found that the average path length between individuals in society is around 6 degrees of separation. Later work by Watts and Strogatz showed that networks with a small amount of randomness can display both clustering and small world properties. Degree distributions and other network measures like clustering coefficients and connected components are discussed. Preferential attachment models that generate power law degree distributions are presented.
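The sketch below is a minimal preferential attachment model in the Barabasi-Albert style. Each new node attaches to m distinct existing nodes chosen with probability proportional to their degree. This is implemented by sampling uniformly from a list in which every node appears once per incident edge. The starting complete graph on m + 1 nodes and the function name are illustrative assumptions, and the lecture's exact model may differ.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Edge list of a graph grown by degree-proportional attachment."""
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    endpoints = [v for e in edges for v in e]  # node v appears deg(v) times
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw duplicates so the m targets are distinct
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(1000, 2)
degrees = Counter(v for e in edges for v in e)
print(len(edges), max(degrees.values()))
```

Because well-connected nodes appear more often in `endpoints`, they keep attracting new links, and this is what produces the heavy-tailed degree distribution.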
Social networks are not new, even though websites like Facebook and Twitter might make you believe they are; and trust me, I'm not talking about Myspace! Social networks are extremely interesting models of human behavior, whose study dates back to the early twentieth century. However, because of those websites, data scientists have access to much more data than the anthropologists who studied the networks of tribes!
Because networks take a relationship-centered view of the world, the data structures that we will analyze model real world behaviors and community. Through a suite of algorithms derived from mathematical Graph theory we are able to compute and predict behavior of individuals and communities through these types of analyses. Clearly this has a number of practical applications from recommendation to law enforcement to election prediction, and more.
GDC2019 - SEED - Towards Deep Generative Models in Game Development (Electronic Arts / DICE)
Deep learning is becoming ubiquitous in Machine Learning (ML) research, and it's also finding its place in industry-related applications. Specifically, deep generative models have proven incredibly useful at generating and remixing realistic content from scratch, making themselves a very appealing technology in the field of AI-enhanced content authoring. As part of this year's Machine Learning Tutorial at the Game Developers Conference 2019 (GDC), Jorge Del Val from SEED will cover in an accessible manner the fundamentals of deep generative modeling, including some common algorithms and architectures. He will also discuss applications to game development and explore some recent advances in the field.
Attendees will gain a basic understanding of the fundamentals of generative models and how to implement them. Attendees will also grasp potential applications in the field of game development to inspire their work and companies. This talk does not require a mathematical or machine learning background, although previous knowledge of either is beneficial.
The document discusses classification algorithms in machine learning. It introduces classification problems using the Iris flower dataset as an example, which contains measurements of Iris flowers used to classify them into three species. It then discusses two classic classification algorithms: logistic regression and Gaussian discriminant analysis. Logistic regression passes a linear combination of the features through a sigmoid function to produce class probabilities, while Gaussian discriminant analysis assumes that the features within each class follow a Gaussian distribution. The document also demonstrates an application of these algorithms to classifying handwritten digits.
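The Iris and handwritten-digit data are not reproduced here. The code below is a hedged sketch of the logistic regression idea only. It trains a binary classifier by batch gradient descent on the mean log loss, using the gradient (sigmoid(w.x + b) - y) x. The toy data, learning rate, and epoch count are hypothetical, and Gaussian discriminant analysis is not shown.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log loss; returns weights and bias."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Class 1 if the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]  # hypothetical 2-feature points
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1, 1, 1]
```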
Profit Maximization via Energy Consumption Reduction in Cooperative Resource... (Matteo Sereno)
The document discusses using cooperative game theory to maximize profit for mobile network providers through energy consumption reduction. It proposes a framework where network providers form coalitions and one provider's network remains active while others switch off their networks, reducing overall energy costs. The value of each coalition is calculated based on revenue, costs, and coalition formation costs. Stable and fair coalitions are desirable. Simulation results show the approach can significantly increase individual network providers' profits compared to acting alone.
The document discusses using linear regression to model Australian Football League (AFL) match attendance data. It summarizes:
- A linear regression model is created with combined team membership as the predictor of MCG attendance, finding an r-squared value of 0.88, meaning membership accounts for 88% of the variance in attendance.
- Adding an indicator for whether the away team is interstate improves the model fit, raising r-squared to 0.92.
- The t-values indicate both membership and the interstate indicator significantly contribute to predicting attendance.
- The models find a positive relationship between membership and attendance, and a negative relationship between interstate status and attendance.
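The attendance data are not included in the summary. The sketch below therefore fits the one-predictor model on made-up numbers, using the closed-form least-squares estimates, and computes r-squared as 1 - SS_res / SS_tot. Adding the interstate indicator would require multiple regression (for example, solving the normal equations), which is not shown.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

membership = [60, 75, 90, 110, 130]  # hypothetical combined membership (thousands)
attendance = [45, 52, 61, 70, 83]    # hypothetical MCG attendance (thousands)
a, b = fit_simple_ols(membership, attendance)
print(a, b, r_squared(membership, attendance, a, b))
```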
Similar to Using Spectral Clustering to find Successful Hero Combinations in DOTA 2
Social networks are not new, even though websites like Facebook and Twitter might make you want to believe they are; and trust me- I’m not talking about Myspace! Social networks are extremely interesting models for human behavior, whose study dates back to the early twentieth century. However, because of those websites, data scientists have access to much more data than the anthropologists who studied the networks of tribes!
Because networks take a relationship-centered view of the world, the data structures that we will analyze model real world behaviors and community. Through a suite of algorithms derived from mathematical Graph theory we are able to compute and predict behavior of individuals and communities through these types of analyses. Clearly this has a number of practical applications from recommendation to law enforcement to election prediction, and more.
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentElectronic Arts / DICE
Deep learning is becoming ubiquitous in Machine Learning (ML) research, and it's also finding its place in industry-related applications. Specifically, deep generative models have proven incredibly useful at generating and remixing realistic content from scratch, making themselves a very appealing technology in the field of AI-enhanced content authoring. As part of this year's Machine Learning Tutorial at the Game Developers Conference 2019 (GDC), Jorge Del Val from SEED will cover in an accessible manner the fundamentals of deep generative modeling, including some common algorithms and architectures. He will also discuss applications to game development and explore some recent advances in the field.
The attendee will gain basic understanding of the fundamentals of generative models and how to implement them. Also, attendees will grasp potential applications in the field of game development to inspire their work and companies. This talk does not require a mathematical or machine learning background, although previous knowledge on either of those is beneficial.
The document discusses classification algorithms in machine learning. It introduces classification problems using the Iris flower dataset as an example, which contains measurements of Iris flowers to classify them into three species. It then discusses two classic classification algorithms - logistic regression and Gaussian discriminant analysis. Logistic regression uses a sigmoid function to generate predictions, while Gaussian discriminant analysis assumes a Gaussian distribution of the data. The document also demonstrates an application of these algorithms to classify handwritten digits.
Profit Maximization via Energy Consumption Reduction in Cooperative Resource...Matteo Sereno
The document discusses using cooperative game theory to maximize profit for mobile network providers through energy consumption reduction. It proposes a framework where network providers form coalitions and one provider's network remains active while others switch off their networks, reducing overall energy costs. The value of each coalition is calculated based on revenue, costs, and coalition formation costs. Stable and fair coalitions are desirable. Simulation results show the approach can significantly increase individual network providers' profits compared to acting alone.
The document discusses using linear regression to model Australian Football League (AFL) match attendance data. It summarizes:
- A linear regression model is created with combined team membership as the predictor of MCG attendance, finding an r-squared value of 0.88, meaning membership accounts for 88% of the variance in attendance.
- Adding an indicator for whether the away team is interstate improves the model fit, raising r-squared to 0.92.
- The t-values indicate both membership and the interstate indicator significantly contribute to predicting attendance.
- The models find a positive relationship between membership and attendance, and a negative relationship between interstate status and attendance.
Similar to Using Spectral Clustering to find Successful Hero Combinations in DOTA 2 (20)
Using Spectral Clustering to Find Successful Hero Combinations in DOTA 2
Roderick Cox
Faculty of Mathematical Studies, Southampton
November 2015
Community detection is a branch of mathematics that is becoming increasingly relevant as the processing power of computing equipment grows and measurable datasets become larger. Communities within large datasets are often indiscernible without the use of community detection algorithms. This report sets out to demonstrate and explain the use of spectral clustering with hierarchical clustering methods, on data from an extensively played multiplayer online game, DOTA 2. The aim of using this technique is to partition the data in order to identify strong clusters of 5 playable heroes in the game who have a high overall success rate, due to the advantage gained from their complementary abilities.
1. Introduction
Community detection is a relatively new but increasingly relevant branch of mathematics. In a world where enormous datasets can be compiled, analyzing and responding to these datasets effectively is an essential task for scientists, businesses and politicians. The overused but nonetheless significant example of social media is a very relevant case where we can see large datasets being compiled and mathematically analysed to identify communities. Similarly, in science, protein groups, hierarchical communities of species and interactions between cells are all poignant examples of how community detection algorithms can be used to gain vital knowledge of communities in the real world.
This report applies community detection techniques to Valve’s “DOTA 2”, a multiplayer online game in which 2 teams of 5 choose from a pool of 110 unique playable characters (named in the game and in this report as “heroes”) and compete against one another, often professionally, where large sums of money are at stake. Choosing an effective team whose abilities complement each other is a strategy contributing strongly towards a team’s chance of success; this is as much a community detection problem as the aforementioned examples. The communities that we wish to find are clusters of 5 heroes whose complementary abilities make them a successful team.
This report sets out to use the mathematical techniques of spectral clustering together with the k-means algorithm to determine clusters of heroes who should be played together. From these pools of heroes, through inspection of the results from hierarchical clustering, strong combinations of 5 heroes can be identified. This report will explain the intuition behind these methods, as well as including a worked example with a subset of the sampled data. It will explain the method used to sample the data and how it corresponds to a community detection problem. It will assess the results of the method before improving upon them by adjusting the dataset accordingly. Finally, it will explain the results in context and discuss alternative strategies for solving this community detection problem.
Figure 1: Unweighted graph to show social connections in Zachary’s Karate Club. This famous problem arose when disagreements between members caused the club to split. Community detection techniques would allow mathematicians to use the graph to predict how the members would partition. Vertices with high degrees are larger and darker, representing members with more social links to other members. Edge list source: Santa Fe Institute.
2. Preliminary Terminology and Equations
A graph is a pair $G = (V, E)$, where $V$ is the set of vertices (or nodes) $\{v_1, \dots, v_n\}$ and $E$ is the set of edges connecting the vertices $v_1, \dots, v_n$. The weight $w_{ij}$ of an edge connecting vertices $v_i, v_j$ represents the similarity between those vertices. If $w_{ij} = 0$, the vertices $v_i, v_j$ are not connected.
The degree $d_i$ of a vertex $v_i \in V$ is the sum of the weights of all the edges connected to $v_i$. Therefore
$$d_i = \sum_{j=1}^{n} w_{ij} \qquad [1]$$
The weighted adjacency matrix of a graph is the matrix
$$W = (w_{ij})_{i,j=1,\dots,n} \qquad [2]$$
The degree matrix $D$ is the diagonal matrix with the degrees $d_1, \dots, d_n$ on its diagonal:
$$D = \begin{pmatrix} d_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_n \end{pmatrix} \qquad [3]$$
The cut of a subset $A \subset V$ of a graph $G$ is defined as
$$\operatorname{cut}(A) = \sum_{i \in A,\, j \notin A} w_{ij} \qquad [4]$$
The conductance $\Phi(C)$ of a subgraph $C$ of a graph $G$ measures the connectivity of a group to the whole network relative to the group’s density, and is defined as
$$\Phi(C) = \frac{c(C, G_C)}{\min(k_C, k_{G_C})} \qquad [5]$$
where $c(C, G_C)$ is the cut size of $C$, and $k_C, k_{G_C}$ are the total degrees of $C$ and of the rest of the graph, $G_C$.
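A subset $A \subset V$ of a graph is connected if any two vertices in $A$ can be joined by a path such that all intermediate points also lie in $A$.
The nonempty sets $A_1, \dots, A_k$ form a partition of the graph if $A_i \cap A_j = \emptyset$ for $i \neq j$ and $A_1 \cup \dots \cup A_k = V$.
For a subset $A \subset V$, the size and the volume of $A$ are
$$|A| = \text{the number of vertices in } A \qquad [6]$$
$$\operatorname{vol}(A) = \sum_{i \in A} d_i \qquad [7]$$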
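The definitions [1], [3], [4], [6] and [7] translate directly into code. The sketch below is a minimal illustration, assuming the weighted adjacency matrix is stored as a Python list of lists; vertices are indexed from 0 rather than 1, and the 4-vertex matrix is an invented toy example rather than DOTA 2 data.

```python
def degree(W, i):
    """d_i: sum of the weights of all edges at vertex i, equation [1]."""
    return sum(W[i])


def degree_matrix(W):
    """D: diagonal matrix with d_1, ..., d_n on its diagonal, equation [3]."""
    n = len(W)
    return [[degree(W, i) if i == j else 0.0 for j in range(n)] for i in range(n)]


def cut(W, A):
    """cut(A): total weight of edges with one end in A and the other outside A, equation [4]."""
    return sum(W[i][j] for i in A for j in range(len(W)) if j not in A)


def vol(W, A):
    """vol(A): sum of the degrees of the vertices in A, equation [7]."""
    return sum(degree(W, i) for i in A)


# Invented toy graph: strongly linked pairs {0, 1} and {2, 3}, joined by one weak edge 0-2
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
]
A = {0, 1}
print("cut(A) =", cut(W, A), "|A| =", len(A), "vol(A) =", vol(W, A))
```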
3. Context: DOTA 2 and Community Detection
3.1: Brief Overview of DOTA 2
DOTA 2 (Defence of the Ancients 2) is a competitive multiplayer RPG (Role Playing Game) developed by Valve, and the sequel to a popular modification of Blizzard’s “Warcraft III”. The game is played casually by millions worldwide, but is also known for its professional scene. The prize pool as of today [19/11/15]
stands at $18,429,613, where the 1st place team of 5 stands to win $6,634,661.
Two teams of 5 players are pitted against one another, where each player takes control of 1 of 110 playable characters, named in the game and in this report as “heroes”. The two teams begin the game on opposite sides of a large arena, inside their respective bases. The aim of the game is to destroy the opponent’s “Ancient”, a heavily defended structure within each team’s base.
Progress can be made in the game by gaining experience and gold (the game’s currency) by killing the opponent’s heroes, who, after being killed, respawn in their own base after a period of time. Similar progress is made by destroying enemy structures protecting the opponent’s Ancient. The game follows a “rich get richer” dynamic: gaining experience causes individual heroes to “level up”, resulting in stronger heroes with more abilities and versatility, who are in turn harder to kill and more useful to their team.
The intricacies of the gameplay are unimportant here. Essentially, the game consists of 5 unique heroes with differing abilities and strengths competing against another 5. Picking a strong hero combination is a factor which contributes to a team’s success.
3.2: Interpretations of strong hero combination
Simply choosing heroes which average a high ratio of kills to deaths per game is not an effective strategy. Many heroes, often named “carry heroes”, rely on others, sometimes deemed “support heroes”, to set up kills for them or to push against enemy structures effectively. Choosing heroes with the highest individual success rate is similarly not ideal, as even their individual success could be improved upon or worsened by the heroes with which they co-operate. Some heroes are known to work well together, where their abilities complement each other and yield greater success. Chaos_Knight and Wisp, for instance, were deemed “best friends” by officials in an international match due to the frequency with which they are found together in the arena.
3.3: Correspondence to Community Detection
The aim of the project detailed by this report is to find a community of 5 heroes which would be deemed a strong, complementary team. We can define the notion of similarity in this context as the success rate between two heroes, i.e. the rate of success when two heroes play on the same team. In this way the problem becomes a community detection problem: vertices represent heroes, and the weight $w_{ij}$ of each edge represents the similarity between them.
We therefore have, for heroes $h_i, h_j$,
$$s_{ij} = \frac{r}{n} \qquad [8]$$
where $r$ is the number of successful matches where $h_i$ plays with $h_j$, and $n$ is the number of matches where $h_i$ and $h_j$ play together.
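Equation [8] amounts to counting, for every pair of heroes, their shared matches and shared wins. The sketch below assumes a hypothetical record format in which each match contributes two records, one per team, each holding that team’s heroes and whether it won; this is not the format used by Dotabuff.com or by the report, and the three records are invented.

```python
from collections import defaultdict
from itertools import combinations


def pair_win_rates(matches):
    """Return {(h_i, h_j): r / n} for every pair of heroes that shared a team, equation [8].

    `matches` is an iterable of (team, won) records: `team` lists the heroes on one
    side and `won` says whether that side won. This record format is hypothetical.
    """
    wins = defaultdict(int)    # r: shared matches that the pair won
    played = defaultdict(int)  # n: matches with h_i and h_j on the same team
    for team, won in matches:
        for pair in combinations(sorted(team), 2):  # sorted, so each pair has one key
            played[pair] += 1
            if won:
                wins[pair] += 1
    return {pair: wins[pair] / played[pair] for pair in played}


# Three invented team records, for illustration only
matches = [
    (["omniknight", "drow_ranger", "lina", "tusk", "chaos_knight"], True),
    (["omniknight", "drow_ranger", "storm_spirit", "lion", "axe"], False),
    (["omniknight", "drow_ranger", "tusk", "pudge", "tiny"], True),
]
s = pair_win_rates(matches)
print(s[("drow_ranger", "omniknight")])  # 2 wins from 3 shared matches
```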
For 110 playable heroes, a $110 \times 110$ adjacency matrix $W$ can be formed. Spectral clustering, together with a suitable algorithm such as k-means, can then be applied to partition the graph into clusters with strong similarities.
4. Sampling Process to Identify Optimal Groups
4.1: Source
Match information was sampled from the site Dotabuff.com, where results from DOTA 2 matches are automatically uploaded and updated.
4.2: Data Sampled
Match information from the last 2,764,964 matches of players in the “high skill” category was analyzed. Only the “high skill” category was considered, as more experienced players will better understand the need to utilize strong hero combinations; therefore the results are more indicative of the strength of the combinations, rather than solely of the players’ skill.
For all heroes $h_i, h_j$, $i, j \in \{1, \dots, 110\}$, the ratio of matches won to matches played by $h_i$ and $h_j$ on the same team was constructed.
The mean number of matches played together by a pair of heroes $h_i, h_j$ was found to be 461.211676.
4.3: Conversion to an adjacency matrix
The weight $w_{ij}$ in row $i$ and column $j$ of the weighted adjacency matrix $W$ represents the success ratio $s_{ij}$ between $h_i$ and $h_j$.
For hero combinations with a similarity $s_{ij} < 0.45$, the connection is severed and treated as $w_{ij} = 0$ in the weighted adjacency matrix, as is standard practice when introducing algorithms designed for sparse matrices.
The corresponding undirected weighted graph $G = (V, E)$ is then formed from this matrix.
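Assembling $W$ with the 0.45 cut-off can be sketched as follows. The hero ordering, the dictionary of similarities and the treatment of never-observed pairs as unconnected are assumptions made for illustration; the two weights involving omniknight are taken from the worked example in Section 6, while the storm_spirit/tusk value is invented so that it falls below the threshold.

```python
def adjacency_matrix(heroes, s, threshold=0.45):
    """Build the weighted adjacency matrix W from pairwise similarities s_ij.

    `heroes` fixes the row/column order and `s` maps (h_i, h_j) pairs to s_ij.
    Edges with s_ij < threshold are severed (w_ij = 0); pairs missing from `s`
    are also treated as unconnected, which is an assumption of this sketch.
    """
    index = {h: k for k, h in enumerate(heroes)}
    n = len(heroes)
    W = [[0.0] * n for _ in range(n)]
    for (a, b), s_ab in s.items():
        if a in index and b in index and s_ab >= threshold:
            i, j = index[a], index[b]
            W[i][j] = W[j][i] = s_ab  # undirected graph, so W is symmetric
    return W


heroes = ["storm_spirit", "tusk", "omniknight"]
s = {
    ("omniknight", "storm_spirit"): 0.478144,
    ("omniknight", "tusk"): 0.658879,
    ("storm_spirit", "tusk"): 0.43,  # invented value below 0.45, to show the cut-off
}
W = adjacency_matrix(heroes, s)
for row in W:
    print(row)
```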
4.4: Exponentiation of adjacency matrix
As previously mentioned, clustering algorithms are generally more effective when applied to sparse graphs. This includes graphs with fewer connections, but in the case of weighted graphs, sparse can also refer to large differences between the weights. In order to further emphasise larger results, the values in the weighted adjacency matrix can be exponentiated, such that each weight $w_{ij}$ is replaced by $w^*_{ij}$, where
$$w^*_{ij} = \begin{cases} e^{\alpha w_{ij}} & \text{for } w_{ij} \ge 0.45 \\ 0 & \text{for } w_{ij} < 0.45 \end{cases} \qquad [9]$$
and $\alpha$ is a constant chosen such that the graph becomes sufficiently sparse.
At this point it is appropriate to note that in section 8.2, where the final results are re-evaluated, they are re-evaluated using an exponentiated version of the weighted adjacency matrix with $\alpha = 3.5$, which makes the matrix sufficiently sparse. This is visualized by the sections of the heatmaps in Figures 3 and 4.
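The effect of [9] is easiest to see on a ratio of two weights: with $\alpha = 3.5$, the ratio between an edge of weight 0.7 and one of weight 0.5 grows from $0.7/0.5 = 1.4$ to $e^{3.5(0.7 - 0.5)} = e^{0.7} \approx 2.01$. A minimal sketch, using an invented 3-vertex matrix:

```python
import math


def exponentiate(W, alpha=3.5, threshold=0.45):
    """Equation [9]: replace w_ij by exp(alpha * w_ij) if w_ij >= threshold, else by 0."""
    return [[math.exp(alpha * w) if w >= threshold else 0.0 for w in row] for row in W]


# Invented 3-vertex matrix: one strong (0.7) and one moderate (0.5) edge at vertex 0
W = [
    [0.0, 0.7, 0.5],
    [0.7, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
W_star = exponentiate(W)
# Ratio of the two edge weights before and after exponentiation
print(W[0][1] / W[0][2], W_star[0][1] / W_star[0][2])
```

Zero entries (including the diagonal) stay at zero, because they fall below the threshold.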
5: Community Detection Method
Figure 2: The similarity graph $G = (V, E)$ corresponding to the similarity matrix. Darker edges correspond to greater weights in connections. Larger nodes and node labels correspond to greater vertex degrees. Communities are clearly indiscernible; however, vertices such as “omniknight”, “bounty_hunter” or “night_stalker” might be expected to appear in optimal clusters due to their strong connections to other vertices.
Figure 3: Sample of heatmap corresponding to the original adjacency matrix. Darker cells indicate higher weights.
Figure 4: Sample of heatmap corresponding to the exponentiated matrix. The larger weights represented in the original heatmap (Figure 3) are now darker, indicating that they now contribute a higher relative weight.
5.1: Summary of Spectral Clustering
This summary will follow von Luxburg’s “A Tutorial on Spectral Clustering” and will be consistent with the definitions stated there. Whilst there is no universally agreed-upon definition of a community, the approach of spectral clustering is to maximise the total in-cluster weight and minimise the total between-cluster weight. The process begins with spectral embedding, which transforms the data from an adjacency matrix into a set of points in space, represented by the elements of eigenvectors. This advantageous step prepares the data for a suitable clustering algorithm, since in the embedded space the cluster structure is typically much easier to detect. A great advantage of spectral clustering is its versatility, since the user can choose any appropriate method following spectral embedding. This report uses k-means and hierarchical clustering in conjunction, which proves to be appropriate and informative for the scale of this dataset.
The general steps taken to achieve this are as follows:
• Construct $W$, the weighted adjacency matrix of a graph $G$
• Compute the Laplacian $L$ of $W$, normalising if necessary
• Compute the first $k$ eigenvectors of $L$
• Define a matrix $U \in \mathbb{R}^{n \times k}$ whose columns are the eigenvectors $u_1, \dots, u_k$ of $L$
• Use k-means or another suitable clustering algorithm to cluster the points $(y_i)_{i=1,\dots,n}$, where $y_i \in \mathbb{R}^k$ is the $i$-th row of $U$, into clusters $C_1, \dots, C_k$
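The first four steps above can be sketched with NumPy (assumed to be available). This is a minimal illustration, not the report’s own code: it uses the random-walk normalised Laplacian $L_{rw} = D^{-1}L$ chosen in Section 5.2.2, obtaining its eigenvectors through the symmetric matrix $D^{-1/2} L D^{-1/2}$ so that NumPy’s symmetric solver can be used. The 6-vertex matrix is an invented toy example.

```python
import numpy as np


def spectral_embedding(W: np.ndarray, k: int) -> np.ndarray:
    """Return U (n x k), whose columns are the first k eigenvectors of L_rw = D^-1 L.

    Assumes W is symmetric and every vertex has positive degree.
    """
    d = W.sum(axis=1)                    # degrees d_i, equation [1]
    L = np.diag(d) - W                   # unnormalised Laplacian L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric matrix D^-1/2 L D^-1/2 lets us use numpy's symmetric eigensolver
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, V = np.linalg.eigh(L_sym)         # eigenvalues are returned in ascending order
    # If L_sym v = lambda v, then u = D^-1/2 v satisfies L u = lambda D u, i.e. L_rw u = lambda u
    return d_inv_sqrt[:, None] * V[:, :k]


# Invented toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by two weak edges
W = np.array([
    [0.0, 0.9, 0.8, 0.05, 0.0, 0.0],
    [0.9, 0.0, 0.85, 0.0, 0.0, 0.0],
    [0.8, 0.85, 0.0, 0.0, 0.0, 0.05],
    [0.05, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.0, 0.0, 0.9, 0.0, 0.85],
    [0.0, 0.0, 0.05, 0.8, 0.85, 0.0],
])
U = spectral_embedding(W, k=2)
# Row i of U is the point y_i; the sign of the second column separates the triangles
print(np.round(U, 3))
```

The rows of `U` can then be handed to k-means or any other clustering routine. If SciPy is available, `scipy.linalg.eigh(L, np.diag(d))` solves the generalized problem $Lu = \lambda D u$ directly; the symmetric route above keeps the sketch to NumPy alone.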
5.2: Explanation of the Method
5.2.1: A min-cut problem
This explanation will focus on giving intuitive reasoning behind spectral clustering methods from the perspective of a min-cut problem. For full proofs and propositions the reader should refer to von Luxburg’s tutorial.
We define $W(A, B)$ as the sum of the weights between two subsets $A, B \subset V$, where $V$ is the set of vertices $v_1, \dots, v_n$:
$$W(A, B) = \sum_{i \in A,\, j \in B} w_{ij} \qquad [10]$$
Our aim is to partition the graph such that the sum of the weights between subsets is minimised. This translates into choosing the partition $A_1, \dots, A_k$ such that $\operatorname{cut}(A_1, \dots, A_k)$ is minimised, where
$$\operatorname{cut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, \bar{A}_i) \qquad [11]$$
and $\bar{A}_i$ is the complement of $A_i$.
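In order to avoid the degenerate case, in which a single node is simply cut off from the rest of the graph, extensions of the concept of conductance [5] are introduced.
We define, for the unnormalised case,
$$\operatorname{RatioCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\operatorname{cut}(A_i, \bar{A}_i)}{|A_i|} \qquad [12]$$
where $\operatorname{cut}(A_i, \bar{A}_i) = W(A_i, \bar{A}_i)$ is the total weight of the edges leaving $A_i$ [4], [10], and $|A_i|$ is the number of vertices in $A_i$ [6];
and, for the normalised case,
$$\operatorname{Ncut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\operatorname{cut}(A_i, \bar{A}_i)}{\operatorname{vol}(A_i)} \qquad [13]$$
where $\operatorname{vol}(A_i)$ is the sum of the degrees of all vertices in $A_i$ [7].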
We now see that, in order to minimise $\operatorname{RatioCut}(A_1, \dots, A_k)$, the sets $A_i$ must not contain too few vertices; similarly, to minimise $\operatorname{Ncut}(A_1, \dots, A_k)$, the sum of the degrees of the vertices in each $A_i$ must be high.
To see this in context, consider the following subset of vertices from the DOTA 2 graph (Figure 5).
We note that $\operatorname{RatioCut}(A_1, A_2) = 1.505361$, where $A_1$ = {storm_spirit} and $A_2$ = {drow_ranger, tusk, lina, chaos_knight, omniknight},
while $\operatorname{RatioCut}(A_1, A_2) = 1.37791$, where $A_1$ = {storm_spirit, drow_ranger, omniknight} and $A_2$ = {tusk, lina, chaos_knight}.
Therefore, when solving the minimum cut problem, the degenerate case is no longer the preferable cut.
Figure 5: Sample vertex set showing 2 cuts: $c_1$ = cut({storm_spirit}) = 1.505361, the degenerate case, and $c_2$ = cut({storm_spirit, drow_ranger, omniknight}) = 4.13373, which partitions the graph into 2 subsets.
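The effect of the size penalties in [12] and [13] can be checked on a small invented graph (not the DOTA 2 subset of Figure 5), in which one vertex hangs off the rest by a single weak edge. The sketch reads $\operatorname{cut}(A_i, \bar{A}_i)$ as $W(A_i, \bar{A}_i)$; some references include an extra factor of 1/2, which rescales both objectives without changing which partition minimises them.

```python
def w_between(W, A, B):
    """W(A, B): sum of w_ij over i in A and j in B, equation [10]."""
    return sum(W[i][j] for i in A for j in B)


def ratio_cut(W, parts):
    """Sum over the parts A_i of W(A_i, complement of A_i) / |A_i|."""
    n = len(W)
    return sum(w_between(W, A, [j for j in range(n) if j not in A]) / len(A) for A in parts)


def ncut(W, parts):
    """Sum over the parts A_i of W(A_i, complement of A_i) / vol(A_i)."""
    n = len(W)
    total = 0.0
    for A in parts:
        vol_A = sum(sum(W[i]) for i in A)  # vol(A_i), equation [7]
        total += w_between(W, A, [j for j in range(n) if j not in A]) / vol_A
    return total


# Invented graph: pairs {0, 1} and {2, 3}, plus vertex 4 hanging off vertex 0 by a weak edge
W = [
    [0.0, 1.0, 0.15, 0.0, 0.25],
    [1.0, 0.0, 0.0, 0.15, 0.0],
    [0.15, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.15, 1.0, 0.0, 0.0],
    [0.25, 0.0, 0.0, 0.0, 0.0],
]
degenerate = [{4}, {0, 1, 2, 3}]
balanced = [{0, 1, 4}, {2, 3}]
for name, parts in [("degenerate", degenerate), ("balanced", balanced)]:
    A = parts[0]
    plain_cut = w_between(W, A, [j for j in range(len(W)) if j not in A])
    print(name, plain_cut, ratio_cut(W, parts), ncut(W, parts))
```

Here the plain cut is smallest for the degenerate partition (0.25 against 0.3), while both RatioCut (0.3125 against 0.25) and Ncut prefer the balanced partition.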
5.2.2: The Graph Laplacian
The method of spectral clustering has at its core the graph Laplacian $L$, defined as
$$L = D - W \qquad [14]$$
where $W$ is the weighted adjacency matrix [2] and $D$ is the degree matrix [3].
The normalized graph Laplacian $L_{rw}$, related to a random walk, is defined as
$$L_{rw} = D^{-1} L \qquad [15]$$
where $D$ is the degree matrix [3] and $L$ is the graph Laplacian [14].
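Both Laplacians in [14] and [15] can be formed from $W$ in a few lines. In the sketch below the 3-vertex matrix uses the omniknight, chaos_knight and tusk weights from the worked example in Section 6; the check at the end relies on the fact that every row of $L = D - W$ sums to zero.

```python
def laplacians(W):
    """Return (L, L_rw) with L = D - W (equation [14]) and L_rw = D^-1 L (equation [15]).

    Assumes every vertex has positive degree, so D is invertible.
    """
    n = len(W)
    d = [sum(row) for row in W]
    L = [[(d[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    # Multiplying by D^-1 on the left divides row i by d_i
    L_rw = [[L[i][j] / d[i] for j in range(n)] for i in range(n)]
    return L, L_rw


# omniknight, chaos_knight and tusk, with the pairwise weights from the worked example
W = [
    [0.0, 0.657658, 0.658879],
    [0.657658, 0.0, 0.57907],
    [0.658879, 0.57907, 0.0],
]
L, L_rw = laplacians(W)
# Every row sums to 0, so the constant vector is an eigenvector of L_rw with eigenvalue 0
print([round(sum(row), 12) for row in L_rw])
```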
The effect of the normalized graph Laplacian is to convert the minimization problem from a minimization of RatioCut to a minimization of Ncut.
Following von Luxburg’s tutorial, we see that the normalized graph Laplacian is preferable to the unnormalised version for two main reasons.
Firstly, the Ncut minimization method implements 2 clustering objectives: not only finding partitions which minimize the similarity between dissimilar clusters, but also maximizing in-cluster similarities. This is particularly relevant in context, as this project aims to identify an optimal community partitioned from a graph, which emphasizes the maximization of that community’s in-cluster similarities. The alternative of using an unnormalized RatioCut method may be more balanced in favour of other clusters by only considering their dissimilarity. This may result in finding larger numbers of strong clusters rather than 1 or a small number of exceptionally strong clusters.
Secondly, unnormalised graph Laplacians are not preferable due to issues of convergence. Essentially, when more and more data points are added, such that $n \to \infty$, both the eigenvalues and the eigenvectors of the normalized graph Laplacian converge to those of some limit operator $U$. In contrast, the eigenvalues and eigenvectors of the unnormalized graph Laplacian can give unreliable results, even for small sample sizes, due to problems associated with spectral convergence. For further details, see von Luxburg’s tutorial.
We will therefore use a normalized graph Laplacian of the form $L_{rw} = D^{-1} L$. There is also the option of using the symmetric normalized graph Laplacian $L_{sym} = D^{-1/2} L D^{-1/2}$. However, as von Luxburg points out, this has no computational advantages and may have undesired effects in the eigendecomposition, since its eigenvectors are additionally multiplied by $D^{1/2}$.
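It can be shown that, by defining the indicator vectors
$$h_{ij} = \begin{cases} \dfrac{1}{\sqrt{\operatorname{vol}(A_j)}} & \text{if } v_i \in A_j \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \dots, n;\ j = 1, \dots, k) \qquad [16]$$
the solution of the relaxed Ncut problem can be expressed as the first $k$ generalized eigenvectors of $Lu = \lambda D u$, which are exactly the first $k$ eigenvectors of $L_{rw}$.
Therefore we can (approximately) solve the Ncut spectral clustering problem by computing the unnormalized graph Laplacian and the degree matrix of a given adjacency matrix, and then the first $k$ eigenvectors of the normalised graph Laplacian $L_{rw} = D^{-1} L$.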
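The link between the indicator vectors [16] and the eigenvalue problem can be written out in a few lines. The derivation below follows the standard argument in von Luxburg’s tutorial, reading $\operatorname{cut}(A_j, \bar{A}_j)$ in [13] as $W(A_j, \bar{A}_j)$ from [10]:

```latex
% Quadratic form of L = D - W for any vector f
f^\top L f = \frac{1}{2} \sum_{i,l=1}^{n} w_{il} \, (f_i - f_l)^2
% For f = h_j from [16], (h_{ij} - h_{lj})^2 = 1/vol(A_j) exactly when one (and only one)
% of v_i, v_l lies in A_j, so
h_j^\top L h_j = \frac{W(A_j, \bar{A}_j)}{\operatorname{vol}(A_j)},
\qquad
h_j^\top D h_j = \sum_{v_i \in A_j} \frac{d_i}{\operatorname{vol}(A_j)} = 1
% Stacking the h_j as columns of H = (h_{ij}) \in \mathbb{R}^{n \times k}
% (disjoint supports, so H^\top D H = I):
\operatorname{Ncut}(A_1, \dots, A_k) = \sum_{j=1}^{k} h_j^\top L h_j = \operatorname{Tr}(H^\top L H),
\qquad H^\top D H = I
% Allowing arbitrary real H, the minimiser of Tr(H^T L H) subject to H^T D H = I
% is given by the first k generalized eigenvectors of L u = \lambda D u.
```

Dropping the requirement that $H$ take the indicator form [16] is what makes the problem tractable, and also why the result is only an approximation to the true Ncut minimum.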
5.2.3: Eigendecomposition
Computing the eigenvalues $\lambda_1, \dots, \lambda_n$ and corresponding eigenvectors $u_1, \dots, u_n$ of the normalized graph Laplacian is a problem that can be solved with mathematical software such as R or Matlab. Difficulty arises in choosing $k$, the number of eigenvectors to keep.
A general rule is to use the eigengap heuristic: $k$ should be chosen such that $\lambda_1, \dots, \lambda_k$ are small relative to $\lambda_{k+1}$. A standard method is to construct a histogram of the eigenvalues and look for a jump in their size. Intuitively, the eigenvalues and their multiplicity represent the connectedness of a graph. For graphs with $k$ distinct, unconnected components, $\lambda_1 = \lambda_2 = \dots = \lambda_k = 0$ and $\lambda_{k+1} > 0$. Where $k$ components have very few connections between them, $\lambda_1 \approx \lambda_2 \approx \dots \approx \lambda_k \approx 0$, with $\lambda_{k+1}$ noticeably larger than $\lambda_k$.
Identifying and choosing a suitable $k$ then allows for the construction of a matrix $U \in \mathbb{R}^{n \times k}$ whose columns are the $k$ eigenvectors $u_1, \dots, u_k$ of $L_{rw}$, upon which to perform a clustering algorithm.
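The eigengap heuristic can be sketched as picking the $k$ with the largest jump $\lambda_{k+1} - \lambda_k$. This is one simple reading of the heuristic rather than a prescribed procedure, and the eigenvalue lists below are invented.

```python
def eigengap_k(eigenvalues, k_max=None):
    """Eigengap heuristic: the k (1-indexed) maximising lambda_{k+1} - lambda_k.

    Only k = 1, ..., k_max are considered; by default every consecutive gap is used.
    """
    lam = sorted(eigenvalues)
    if k_max is None:
        k_max = len(lam) - 1
    # gaps[k - 1] = lambda_{k+1} - lambda_k, using 1-indexed eigenvalues as in the text
    gaps = [lam[k] - lam[k - 1] for k in range(1, k_max + 1)]
    return 1 + max(range(len(gaps)), key=gaps.__getitem__)


# Invented spectrum of a graph with three weakly connected components
print(eigengap_k([0.0, 0.01, 0.02, 0.9, 0.95, 1.1]))  # -> 3
```

Passing `k_max` restricts the search to small cluster counts, which matters when a large gap appears only among the larger eigenvalues.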
5.2.4: The k-means step
While k-means is the most common algorithm following spectral embedding, any suitable clustering algorithm is equally permissible at this stage. Very simply, k-means attempts to configure clusters of data through an iterative process of centralisation. The process follows these steps:
1) Choose the number of clusters.
2) Assign initial cluster centres (preferably far from each other, to increase the efficiency of the process).
3) Attribute each data point to its closest cluster centre. After spectral embedding, the data points are the rows $y_1, \dots, y_n$ of $U$, and closeness is usually measured by Euclidean distance in $\mathbb{R}^k$.
4) Re-centre each cluster by taking its new centre to be the mean of all data points currently attributed to it.
5) Is the clustering a final solution? If yes, give the solution; if no, repeat steps (3) and (4).
This simple algorithm will effectively partition data, provided that the eigenvalue decomposition of the normalised Laplacian matrix yielded a pronounced eigengap, implying well-defined clusters. As stated in von Luxburg’s tutorial, “in ambiguous cases it also returns ambiguous results”.
K-means gives no guarantee of splitting clusters into desired sizes. Where a cluster of a certain size is required, further knowledge of the relationships between vertices may be needed.
Figure 6: First assignment of cluster centres
Figure 7: Connection of nodes followed by re-centring of clusters
Figure 8: Final solution: steps 3 and 4, when repeated, converge to this result
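Steps 1 to 5 correspond to Lloyd’s algorithm. The sketch below is a minimal pure-Python version, assuming Euclidean distance and a farthest-point choice of initial centres; the 2-D points are invented stand-ins for the rows $y_i$ of $U$.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length tuples, following steps 1-5 above.

    Distance is Euclidean; initial centres are picked greedily to be far apart
    (farthest-point seeding), in the spirit of step 2.
    """
    # Step 2: start from the first point, then repeatedly add the point farthest from all centres
    centres = [tuple(points[0])]
    while len(centres) < k:
        centres.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centres))))
    labels = []
    for _ in range(max_iter):
        # Step 3: attribute each point to its closest centre
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Step 4: move each centre to the mean of the points attributed to it
        new_centres = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centres.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centres.append(centres[c])  # an empty cluster keeps its old centre
        # Step 5: stop once the centres no longer move
        if new_centres == centres:
            break
        centres = new_centres
    return labels


# Invented 2-D points standing in for rows y_i of U: two well-separated groups
points = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (1.0, 0.9), (0.9, 1.0), (0.95, 0.95)]
print(kmeans(points, k=2))  # -> [0, 0, 0, 1, 1, 1]
```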
5.2.5: Additional Hierarchical Clustering Method
In order to gain further knowledge of the clustering of vertices and their relationships, hierarchical clustering may also be used.
Hierarchical clustering follows a similar process to k-means with regard to the notion of minimising distances between vertices, with the key element of compiling vertices into a group once they are joined. The process follows these steps:
1) Start with an initial set of data points, where each data point is assigned to be a group made up only of itself.
2) Connect the two closest groups, where the distance is the weight of the edge of minimum weight needed to connect the 2 groups. Once connected, these merge into a single group.
3) Repeat (2) until all data is merged into a single group.
4) Give the solution.
Each level of the group size is a hierarchical segmentation, with the final resulting cluster representing the top of the hierarchy.
Due to the ease with which the vertex relationships and hierarchy can be visualised, this provides greater user insight into the relationships between vertices.
Figure 9: A simple representation of a cluster dendrogram, the result of hierarchical clustering. Of interest are the relationships between nodes on several levels, which arise as a result of connecting groups (step 2). Loosely defined clusters (indicated by colour) and subclusters may also be obtained.
6: Worked Example with Small Vertex Set
Consider the vertex set of Figure 5. The weighted adjacency matrix $W$ for the subset of vertices $\{v_1, \dots, v_6\} \subset V$ is given by
              drow_ranger   storm_spirit  lina          omniknight    chaos_knight  tusk
drow_ranger   0             0.512931      0.564972      0.71028       0.560976      0.620758
storm_spirit  0.512931      0             0             0.478144      0.514286      0
lina          0.564972      0             0             0.556401      0.522917      0.509949
omniknight    0.71028       0.478144      0.556401      0             0.657658      0.658879
chaos_knight  0.560976      0.514286      0.522917      0.657658      0             0.57907
tusk          0.620758      0             0.509949      0.658879      0.57907       0
Table 1: Adjacency matrix of the vertex set in Figure 5
with corresponding degree matrix $D$
              drow_ranger   storm_spirit  lina          omniknight    chaos_knight  tusk
drow_ranger   2.969917      0             0             0             0             0
storm_spirit  0             1.505361      0             0             0             0
lina          0             0             2.154239      0             0             0
omniknight    0             0             0             3.061362      0             0
chaos_knight  0             0             0             0             2.834907      0
tusk          0             0             0             0             0             2.368656
Table 2: Degree matrix of the vertex set in Figure 5
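The worked example can be reproduced in a few lines: summing the rows of Table 1 gives the diagonal of Table 2, and the agglomerative steps of Section 5.2.5 can be run on the same six heroes. Turning the win-rate weights into distances is an assumption of this sketch ($1 - w_{ij}$, with unconnected pairs at distance 1), as is the single-linkage merging rule; this is not necessarily how the report’s dendrograms were produced.

```python
# Table 1: worked-example adjacency matrix (rows and columns in this order)
heroes = ["drow_ranger", "storm_spirit", "lina", "omniknight", "chaos_knight", "tusk"]
W = [
    [0.0, 0.512931, 0.564972, 0.71028, 0.560976, 0.620758],
    [0.512931, 0.0, 0.0, 0.478144, 0.514286, 0.0],
    [0.564972, 0.0, 0.0, 0.556401, 0.522917, 0.509949],
    [0.71028, 0.478144, 0.556401, 0.0, 0.657658, 0.658879],
    [0.560976, 0.514286, 0.522917, 0.657658, 0.0, 0.57907],
    [0.620758, 0.0, 0.509949, 0.658879, 0.57907, 0.0],
]
degrees = [sum(row) for row in W]  # should reproduce the diagonal of Table 2


def single_linkage(W, names):
    """Agglomerative clustering following steps 1-4 of Section 5.2.5.

    Assumption: the distance between heroes i and j is 1 - w_ij, so a higher win
    rate means "closer", and unconnected pairs (w_ij = 0) sit at distance 1.
    Two groups are as close as their closest pair of members (single linkage).
    Returns the merge history as (group_a, group_b, distance) triples.
    """
    def group_distance(g, h):
        return min(1.0 - W[i][j] for i in g for j in h)

    groups = [[i] for i in range(len(W))]  # step 1: every point is its own group
    merges = []
    while len(groups) > 1:  # step 3: repeat until one group remains
        # step 2: find and merge the two closest groups
        a, b = min(
            ((x, y) for x in range(len(groups)) for y in range(x + 1, len(groups))),
            key=lambda xy: group_distance(groups[xy[0]], groups[xy[1]]),
        )
        merges.append(([names[i] for i in groups[a]], [names[j] for j in groups[b]],
                       group_distance(groups[a], groups[b])))
        groups[a] = groups[a] + groups[b]
        del groups[b]  # b > a, so index a is unaffected
    return merges


print([round(d, 6) for d in degrees])
for left, right, dist in single_linkage(W, heroes):  # step 4: the merge history
    print(f"{dist:.6f}  {left} + {right}")
```

Read from the first merge to the last, the history is the dendrogram from its leaves to its root.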
Tables 7,8,9: Pools generated by performing k-means
7: Application of Techniques onto DOTA 2 Data Set
Here we apply the methods we have discussed to the full dataset, with the aim of finding strong groups from within pools of compatible heroes. Following spectral embedding, we obtain the following scatterplot of eigenvalues.
The scatterplot immediately implies choosing $k = 2$, which would partition the graph into 2 large clusters. This information is not particularly informative when attempting to find groups of 5 from small pools. Aside from $k = 2$ and the trivial $k = 110$, the lack of an obvious eigengap and the graph’s own unclear community structure imply that there is no obvious choice of $k$ for the first $k$ eigenvectors.
This leaves us with a degree of choice. A potential choice would be to set $k = 22$, in the hope of splitting the 110 vertices into clusters of around 5 (since $110/22 = 5$). Although $\lambda_{22} = 0.953124$ and $\lambda_{23} = 0.9562849$ show a very small eigengap between $\lambda_k$ and $\lambda_{k+1}$, this gap is similar in size to those between all other feasible eigenvalues.
However, since we are looking to create pools, and to allow some room for cluster sizes to vary and still be considered, we set $k$ to a slightly smaller value. $k = 20$ was found to generate clusters large enough that the strongest heroes were in pools large enough to be considered, but small enough to easily extract clusters of 5 by referencing the cluster dendrogram. After performing k-means on the eigenvector matrix of $u_1, \dots, u_k$, the following clusters were obtained.
1 2 3 4 5 6 7
bane rubick axe juggernaut
slardar
mirana lion
zuus nyx_assassin bloodseeker riki
lich
shadow_shaman queenofpain
warlock keeper_of_the_light puck luna
venomancer
skeleton_king spectre
beastmaster centaur lone_druid clinkz
dazzle
pugna slark
viper skywrath_mage magnataur huskar bounty_hunter dark_seer troll_warlord
treant shredder doom_bringer spirit_breaker ogre_magi bristleback
abaddon lycan undying disruptor
phoenix tusk winter_wyvern
8 9 10 11 12 13 14
earthshaker storm_spirit windrunner phantom_assassin sand_king Antimage razor
nevermore naga_siren kunkka life_stealer tidehunter Morphling sven
rattletrap alchemist batrider tinker Faceless_void death_prophet
gyrocopter legion_commander ancient_apparition shadow_demon
silencer ember_spirit meepo visage
15 16 17 18 19 20
Furion Lina enigma drow_ranger phantom_lancer crystal_maiden
Enchantress Sniper dragon_knight vengefulspirit leshrac pudge
Wisp invoker weaver necrolyte broodmother tiny
Elder_Titan obsidian_destroyer chen omniknight jakiro witch_doctor
Techies medusa brewmaster night_stalker chaos_knight templar_assassin
oracle ursa terrorblade earth_spirit
Figure 10: Scatterplot of eigenvalues from the normalised graph Laplacian (x axis: k; y axis: eigenvalue)
Figure 11: Corresponding cluster dendrogram obtained from hierarchical clustering. The proximity of heroes to each other relates to the strength of their relationship. For cluster 3, group 18 is shown; blue arrows indicate heroes that appear in the final combination, and red arrows indicate heroes rejected due to low proximity to the other heroes.
From these pools,strongclustersof 5heroescan be extractedbyinspectingthe corresponding
clusterdendogram,obtainedbyperforminghierarchical clusteringuponthe dataset. Some strong
groupsgeneratedinclude:
8: Analysisof Results
8.1: Assessmentand Criticismof Results
The Currentresultspresentgroupsof respectable strength,notablycluster3,withan average
weightof 0.6623263 per connection.Howeverthe currentresults canbe improvedupontofindan
evenstrongergroupof 5. Clustersmaybe limiteddue tothe spectral clusteringalgorithmfocusing
on findingclustersthatare strongrelative tothe overall strengthof the heroes theycontain,
creatingmore balancedpools.Thisisnotideal forthe purposesof thisreport,where verystrong
groupsare lookingtobe prioritised,regardlessof the effectuponweakergroups.
Table 10: Cluster 1 from group 20
crystal_maiden, pudge, witch_doctor, earth_spirit, tiny
Overall sum of weights = 5.794858 (average 0.5794858 per connection)

Table 11: Cluster 2 from group 1
bane, beastmaster, viper, treant, abaddon
Overall sum of weights = 5.907897 (average 0.5907897 per connection)

Table 12: Cluster 3 from group 18
ursa, vengefulspirit, necrolyte, omniknight, night_stalker
Overall sum of weights = 6.6583247 (average 0.6583247 per connection)
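The "overall sum of weights" in each table is the total over every pair of heroes in the group. Five heroes form 5 × 4 / 2 = 10 pairs, so each average per connection is the sum divided by 10. For Table 10, 5.794858 / 10 = 0.5794858.

A minimal sketch, assuming the adjacency weights are held in a Python dictionary keyed by unordered hero pairs. This storage layout and the function name are hypothetical, not the report's R objects.

```python
from itertools import combinations

def group_strength(group, weight):
    """Return (sum of pairwise weights, average weight per connection).

    group: list of hero names (5 in this report).
    weight: dict mapping frozenset({a, b}) to the pair's weight.
    """
    pairs = list(combinations(group, 2))          # 5 heroes -> 10 unordered pairs
    total = sum(weight[frozenset(p)] for p in pairs)
    return total, total / len(pairs)
```

With five heroes, the second value is always the first divided by 10, matching the relationship between the two figures in every table.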
Table 16: Cluster 4 from pool 12
Overall sum of weights = 5.882743 (average 0.5882743 per connection)

Table 17: Cluster 5 from pool 16
Overall sum of weights = 7.007814 (average 0.7007814 per connection)
8.2: Re-Evaluation using Exponentiated Adjacency Matrix
To address the problem of finding higher-strength groups, an exponentiated version of the original adjacency matrix is used (see Section 4.4) in order to prioritise higher-weight connections. Applying the same spectral clustering method to the exponentiated adjacency matrix generates the pools shown in Tables 13, 14 and 15.
Within these pools, the same cluster dendrogram approach used for the original adjacency matrix finds some combinations of reasonable strength, such as cluster 4, whose average weight per connection is 0.5882743. However, the main benefit of using the exponentiated data is shown by cluster 5, whose average weight per connection is 0.7007814. With the exponentiated data, spectral clustering now finds a cluster of higher strength than cluster 3 from the original adjacency matrix.
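The exact transform is defined in Section 4.4. The sketch below assumes it is an elementwise power with a hypothetical exponent p. An elementwise power keeps zero weights at zero and widens the gap between strong and weak pairs: with p = 3, a ratio of 0.7 to 0.5 (1.4) becomes 0.343 / 0.125 = 2.744.

The sketch then runs normalised spectral clustering in the Ng, Jordan and Weiss style, as summarised by von Luxburg (2007):
1. Embed each hero using the eigenvectors for the k smallest eigenvalues of L_sym.
2. Normalise each hero's row.
3. Group the rows with a basic k-means.

This is an illustrative numpy stand-in, not the report's own R code, and all function names are hypothetical.

```python
import numpy as np

def exponentiate(W, p=3.0):
    """Raise every weight to the power p (assumed form of the transform)."""
    return np.power(np.asarray(W, dtype=float), p)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with one random start; returns a label per row of X."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centre (squared Euclidean distance).
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points; keep it if the cluster is empty.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

def spectral_clustering(W, k, seed=0):
    """Normalised spectral clustering in the Ng-Jordan-Weiss style."""
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))           # assumes every degree is > 0
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                      # eigenvalues in ascending order
    U = vecs[:, :k]                                      # eigenvectors of the k smallest
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # normalise each hero's row
    return kmeans(U, k, seed=seed)
```

Larger values of p put more emphasis on heavy edges, and p = 1 recovers the original matrix. The k-means step uses a single random initialisation; practical implementations usually restart several times and keep the best result.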
1 2 3 4 5 6
sand_king lina nyx_assassin axe phantom_assassin lion
tinker leshrac keeper_of_the_light bloodseeker life_stealer queenofpain
obsidian_destroyer weaver wisp puck batrider spectre
visage invoker medusa shadow_demon ancient_apparition slark
skywrath_mage elder_titan lone_druid meepo Troll_Warlord
magnataur Bristleback
shredder
7 8 9 10 11 12 13
earthshaker juggernaut vengefulspirit windrunner nevermore crystal_maiden tidehunter
pudge riki slardar kunkka zuus mirana enchantress
tiny skeleton_king lich alchemist pugna shadow_shaman broodmother
rattletrap luna necrolyte rubick dark_seer warlock oracle
huskar dazzle centaur gyrocopter beastmaster phoenix
lycan omniknight legion_commander silencer viper
night_stalker ember_spirit disruptor clinkz
bounty_hunter abaddon ogre_magi
ursa
spirit_breaker
tusk
14 15 16 17 18 19 20
bane razor witch_doctor antimage drow_ranger storm_spirit enigma
templar_assassin sven venomancer morphling phantom_lancer naga_siren dragon_knight
doom_bringer sniper jakiro faceless_void chaos_knight furion
treant death_prophet undying techies chen
terrorblade earth_spirit brewmaster
winter_wyvern
Tables 13, 14, 15: Pools generated by performing k-means (exponentiated data)

Heroes in Table 16 (cluster 4, pool 12): crystal_maiden, viper, clinkz, ogre_magi, mirana
Heroes in Table 17 (cluster 5, pool 16): witch_doctor, venomancer, jakiro, undying, earth_spirit
8.3: Interpreting the Results in Context
It is noteworthy that all the groups presented in this report contain a roughly equal number of support heroes and carry heroes. This suggests that support heroes are necessary in the game despite their generally lower kill-to-death ratio. The exception to this rule is the final group, cluster 5, given in Table 17. This group is almost entirely made up of support heroes: every hero aside from Earth Spirit is classed as a support. The success of this group may well be explained by the potential every hero in it has for enormous area-of-effect damage. All five heroes are classed as “nukers”, a trait that is not especially useful individually but is very effective in team fights. Witch Doctor contributes the highest weight in the group. This is likely due to his ability to paralyse large numbers of opponents. When this ability is coupled with the other members’ large area-of-effect abilities, which would be likely to damage the entire paralysed opposing team, this particular group of 5 would be very strong in large-scale team fights. An experienced team would use this to its advantage, which may explain why these heroes in combination have been shown to be so successful.
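"Contributes the highest weight" can be read as each hero's weighted degree inside the group: the sum of the weights on its four connections to the other members. That reading is an assumption, as is the pair-keyed dictionary layout below. Under it, the contributions add up to twice the group's overall sum of weights, because every connection is counted once for each of its two heroes. The function names are hypothetical, and the test weights are made up for illustration.

```python
def hero_contributions(group, weight):
    """Each hero's summed weight to the other members (weighted degree in the group).

    weight: dict mapping frozenset({a, b}) to the pair's weight (assumed layout).
    """
    return {
        hero: sum(weight[frozenset((hero, other))] for other in group if other != hero)
        for hero in group
    }

def strongest_contributor(group, weight):
    """Hero with the largest weighted degree inside the group."""
    contributions = hero_contributions(group, weight)
    return max(contributions, key=contributions.get)
```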
8.4: Possible Alternative Methods
Since this report aims to find rough communities, which leaves open the possibility that communities of heroes overlap, an overlapping technique such as fuzzy partitioning may also have been a viable option. However, pool and cluster sizes would be more difficult to specify. This could produce pools too large to find hero combinations within by inspection, or pools with too much overlap to reach any informative conclusion about the data. In this case, the combination of spectral embedding with k-means and hierarchical clustering was an effective method. For a much larger data set, where using a cluster dendrogram to specify groups would no longer be a reasonable process, fuzzy partitioning may be more appropriate.
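Fuzzy c-means (Bezdek 1981) is one standard fuzzy partitioning method. The numpy sketch below illustrates it and is not part of the report's method. Each point receives a membership between 0 and 1 in every cluster, with memberships summing to 1, so a hero sitting between two pools can belong partly to both.

The sketch assumes:
- the input X is, for example, rows of a spectral embedding of the heroes;
- the fuzzifier m = 2, a common default rather than a value taken from the report.

The function name is hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means; returns (memberships U of shape (n, c), centres)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # each row of memberships sums to 1
    for _ in range(n_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]    # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                   # avoid division by zero at a centre
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centres
```

A hero whose largest membership is well below 1 is exactly the kind of overlapping case that makes pool sizes hard to fix in advance.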
References:
von Luxburg, U., A Tutorial on Spectral Clustering, March 2007
Fortunato, S., Community Detection in Graphs, January 2010
Blei, D. M., Hierarchical Clustering, February 2008
MacQueen, J. B., in Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, edited by L. M. Le Cam and J. Neyman (University of California Press, Berkeley, USA), volume 1, pp. 281–297, 1967
Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithms (Kluwer Academic Publishers, Norwell, USA) 1981
Đỗ Ngọc Tuấn, YouTube series on Spectral Clustering
Golub, G. H., and C. F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, USA) 1989
Other
Prize pool information sourced on 19/11/15 from https://www.dota2.com/international/compendium/
Match information sourced from Dotabuff.com
Cluster visualisation created with: Visualising K-Means Clustering by Karanveer Mohan, http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
Notable R libraries/functions used: “d3heatmap”, “kernlab”, “igraph”, “stats”, “hclust”, “solve()”, “plot()”
Figures 1, 2 created with Gephi 0.8.2
Source of Zachary Karate Club edge list: Santa Fe Institute, http://tuvalu.santafe.edu/~aaronc/randomgraphs