The Clustering Coefficient: A Literature Review and
Formula Extension
Bachelor's Thesis
in the degree program
International Information Systems Management
at the Faculty of Information Systems
and Applied Computer Sciences
of the Otto-Friedrich-Universität Bamberg
Author:
Nicholas Michael Rubino
Reviewer:
Prof. Dr. Kai Fischbach
TABLE OF CONTENTS
1. INTRODUCTION
  1.1. PROBLEM STATEMENT
  1.2. RESEARCH QUESTIONS & HYPOTHESES
  1.3. STRUCTURE OF THIS PAPER
  1.4. PURPOSE OF THIS PAPER
2. LITERATURE REVIEW
  2.1. PRELIMINARIES
    Network Theory
    Triadic Relations
      Binary Networks: Triadic Closure
      Directed Networks: Triadic Transitivity
      Weighted Networks: Triadic Value
      Signed Networks: Triadic Balance
  2.2. CLUSTERING COEFFICIENT
    Clustering Classification
    Binary Networks
    Weighted Networks
    Weighted and Directed Networks
    Signed Networks
    Research Trends
    Limitations
3. FORMULA EXTENSION
  3.1. BASIS
  3.2. MICRO-COMPARABILITY
    Formula: Weighted or Directed Networks
    Example
  3.3. MACRO-COMPARABILITY
    Formula: Weighted and/or Directed Networks
    Example
    Formula: Strongly Balanced Weighted and/or Directed Networks
    Example
    Formula: Weakly Balanced Weighted and/or Directed Networks
    Example
  3.4. SUMMARY
4. CLUSTERING COMPARISON
  4.1. DATA SET
  4.2. COMPARISON IN UNWEIGHTED NETWORKS
  4.3. COMPARISON IN WEIGHTED NETWORKS
5. CONCLUSION
  5.1. LIMITATIONS
  5.2. FUTURE IMPLICATIONS
  5.3. SUMMARY
6. PUBLICATION BIBLIOGRAPHY
7. INDEX
  7.1. LIST OF TABLES
  7.2. LIST OF FIGURES
8. AFFIDAVIT
1. INTRODUCTION
The formation of clusters within a social network is nothing new. In fact, many see it as one of the earliest
features for measuring a network’s morphology, with approaches dating back to the mid-20th century (Barnes 1969).
In the current stream of research, however, the assessment of graph clustering has become particularly popular.
Based on the clustering coefficient $C$ developed by Watts and Strogatz (1998) as well as the one by Newman et
al. (2001), various extensions have been developed to depict the clustering ratio more precisely. While these
extensions have improved the understanding of the ways individuals are connected in various networks, their
development has evolved in a chaotic manner (Fortunato 2010). Several approaches to assessing the clustering of
graphs can be found in recent literature; a single approach, however, has yet to be adopted.
This paper considers a possible rationale for this occurrence based on three main limitations of prior research,
which are explained in the following section.
1.1. PROBLEM STATEMENT
The aforementioned clustering coefficients, which started the trend in network literature of adapting said formulae,
were originally intended for binary networks. This means that this assessment of graph clustering merely acknowledges
structural properties of the graph, i.e. whether a relationship between the network’s members exists or not, as
seen in the clustering coefficient of Newman et al. (2001). This equation gives the tendency of three members of a
network to be connected by three relationships as opposed to being connected by only two. The latter implies that
a connection between two of the members doesn’t exist.
This ratio, pertaining to the structural properties of a network, can be seen as the ideal foundation for the
developments that follow. As time passed, relational properties were added to the mix, e.g. by attributing weights
or directedness to the relationships in the network. While great advancements have been made, this topic is still
in its research infancy, since each of the respective developments comes with limitations.
The various clustering coefficients found in prior literature build on earlier findings to deliver a more precise
ratio of triadic closure over triadic connectedness. This ratio expresses the tendency that “the friend of my friend is
my friend”. However, the advancements found in prior literature and the respective shortcomings of each formula
are distributed unevenly. That is, different equations offer different solutions to different problems,
but do not always incorporate the solutions from other findings. Hence, each clustering coefficient found in
prior research carries limitations that have already been overcome elsewhere in prior research. By combining selected
clustering coefficient formulae, the known limitations can be set aside and a synergy of advancements can be
achieved.
While the lack of combination of the coefficients represents the first limitation summarized in this paper, the lack
of micro-comparability is the second. Micro-comparability acknowledges that a clustering coefficient doesn’t
always allow the same graph to be compared under different relational properties. The issue arises because prior
literature assesses a completely connected graph with a clustering coefficient value of $C = 1$, resulting in a
meaningless assessment of the cluster formation in completely connected graphs and a distorted assessment of
triads with a large relational variance. By adapting the formula to take relational variance into consideration,
the results can provide micro-comparability within the clustered triads.
Furthermore, the issue of macro-comparability arises as well. This means that the clustering coefficient is
unable to compare different social networks regardless of their relational and structural properties. While great
advancements have been made to incorporate the relational properties of weightedness and directedness, current
literature has yet to provide a clustering coefficient that can be used in binary, weighted, directed and/or signed
networks. To overcome the limitation of macro-comparability, the formula can be adjusted so that it can be used
in all types of networks regardless of their relational or structural properties.
In summary, due to the chaotic development of the clustering coefficient, it is vital to collect, organize and compare
the clustering coefficients of prior literature in order to capture said development, determine research trends, and
address limitations. The discovered limitations should then be resolved in the form of a formula adjustment.
1.2. RESEARCH QUESTIONS & HYPOTHESES
By reviewing these aspects within the scope of this paper, an answer to this problem statement is to be found, in
particular with the aim to answer the following research questions:
Q1: How has the clustering coefficient developed over time?
Q2: Can the formula be improved to incorporate micro- and macro-comparability?
Q3: How would such a formula compare to the formulae found in research literature?
The presented paper sets out to answer Q1 by reviewing prior research on the clustering coefficient. Furthermore,
this paper proposes a formula extension as an answer to Q2, which can hypothetically do the following:
H1: The proposed formula distinguishes between clusters in terms of relational variance, thus offering micro-
comparability.
H2: The proposed formula can be used in binary, weighted, directed, and/or signed networks, thus offering
macro-comparability.
Within the scope of this paper, the proposed formula is compared to those found along this research stream, in
order to provide an answer to Q3. Because the proposed formula should exclude or reduce aspects of the formulae
found in previous literature, the following is foreseeable:
H3: The newly developed clustering coefficient should yield a smaller value than those previously found in
research literature.
With regard to this paper’s research intent, the following section aims to present an outline of the paper at hand.
1.3. STRUCTURE OF THIS PAPER
Due to very recent developments of the clustering coefficient, a literature review of prior research is appropriate
and is presented in the first part of this paper. There, preliminary knowledge on network theory is noted briefly
and insight into triadic relations is given. Subsequently, a holistic review of prior research regarding the
clustering coefficient is presented, in which the term clustering is clarified, its development is documented,
research trends are identified and its limitations are addressed. The second part of the paper consists of the formula
extension. The gathered research on the clustering coefficient is critically evaluated and modified according to the
reviewed limitations, thereby proposing a new clustering coefficient that overcomes the limitations of micro- and
macro-comparability. In the third part of the paper, the new formula is compared to a selection of earlier
clustering coefficients. Finally, a section is reserved for topics of discussion, such as known limitations and future
implications.
1.4. PURPOSE OF THIS PAPER
By critically evaluating prior research in regard to the adaption of the clustering coefficient formula, it is desired
to provide a general understanding of said formula as well as an overview of its most recent development. By
doing so, this paper contributes to social network research as well as to the research of network theory in general.
Moreover, it is desired to extend current research by comparing these latest developments.
Prior research assigns a fully connected network a clustering coefficient of $C = 1$, resulting in a meaningless
analysis of the clustering within smaller networks, e.g. smaller companies, where everyone knows everyone, or
in the necessity to manipulate data by performing cut-off measures. The proposed formula intends to free the
equation from this limitation by differentiating between clusters with differently weighted edges and equally
weighted ones, thereby offering micro-comparability. This paper, therefore, offers a new perspective on
assessing cluster formation, which can spark interest in fellow social network researchers to address this
old concept in a new light.
Additionally, based on the research at hand, a new approach to the clustering coefficient is presented, which ideally
can act as a framework for measuring the clustering of a network regardless of its characteristics, i.e. weighted
networks, directed networks, and networks that include positive and negative weights. By doing so, it is plausible
that new ideas on older and current theories will emerge, thus expanding the research in social network analysis.
Because the clustering coefficient differs based on the network at hand, i.e. different equations are used for
assessing the formation of clusters in different networks, this paper intends to provide the means for comparability
across different types of networks, i.e. macro-comparability. Overall, the proposed formula is, to the best of this
paper’s knowledge, the first of its kind to deliver clustering results in binary, weighted, directed, and signed
networks.
Moreover, it is also desired for social network researchers to further the presented review by excluding or
minimizing the limitations within this paper, as well as by evaluating other network measurement formulae in
regard to their fit in real-world environments, or even developing this formula within a more extensive empirical
study - possibly giving better insight towards cluster formations in specific environments or even towards other
mathematical analysis aspects along the lines of this study.
2. LITERATURE REVIEW
Since the clustering coefficient has evolved rapidly within the past several years, a review of prior research
literature is necessary in order to understand its origin and the changes already implemented. In order to do so, it
is vital to understand the basic notations and understandings used in network theory. Moreover, insight towards
relevant triadic relations is given. This preliminary understanding is presented in the following and can be used to
identify terms and variables used throughout the paper. Once this preliminary section is handled, the review of
prior research in specific regard to the clustering coefficient is presented, whereby its definition is clarified and
its development is documented. In addition, the limitations within this literature review are addressed.
2.1. PRELIMINARIES
NETWORK THEORY
A graph $G$ consists of a set of nodes (vertices, points, or actors) $N = \{n_1, n_2, \dots, n_n\}$, a set of links
(edges, lines, or ties) $L = \{l_1, l_2, \dots, l_n\}$, and a set of values (weights) $W = \{w_1, w_2, \dots, w_n\}$.
The weights attributed to the edges of graph $G$ can also be portrayed as a weight matrix $W$, in which $w_{ij}$
describes the weight of the edge between $n_i$ and $n_j$ (Boccaletti et al. 2006). The adjacency matrix $A$ is the
weight matrix for binary networks, where only values of $a = 0$ or $a = 1$ are permitted. A graph can generally
embody four different structures: undirected and unweighted (binary), undirected and weighted, unweighted and
directed, as well as weighted and directed. Weighted graphs have edges weighted with any numeric value. Directed
graphs have an asymmetrical weight matrix. Here, the order of the subscripts of the weight is important: it
describes the direction in which the weight applies, e.g. the weight $w_{ij}$ describes the weighted edge going from
node $n_i$ to node $n_j$. Typically in network research the prerequisite that $i \neq j \neq k$ is given, which is
also adopted in this paper. Signed graphs can include both positive and negative weights. If two nodes are connected,
they are neighbors or adjacent, with $k_i$ being the number of neighbors that node $n_i$ has, also known as the node
degree (Kivelä et al. 2014). While the definition of relational and structural characteristics of a graph differs
throughout literature, this paper defines them as follows. Structural properties of a graph focus on the existence or
non-existence of an edge connecting two nodes. Relational properties focus on the relational characteristics of this
edge, e.g. the weightedness and directedness. With this preliminary knowledge on network structures and relations
therein, the specific relation of triads is explained in the following section.
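The notation above can be made concrete with a minimal Python sketch. The four-node weight matrix and the helper names (`adjacency`, `degrees`, `is_directed`, `is_signed`) are hypothetical and chosen only for illustration; they are not taken from the reviewed literature.

```python
# Hypothetical undirected, weighted 4-node graph; W[i][j] is the weight w_ij.
W = [
    [0, 2, 0, 1],
    [2, 0, 3, 0],
    [0, 3, 0, 0],
    [1, 0, 0, 0],
]


def adjacency(weights):
    """Binary adjacency matrix A: a_ij = 1 wherever w_ij is non-zero."""
    return [[1 if w != 0 else 0 for w in row] for row in weights]


def degrees(weights):
    """Node degree k_i, read here as the number of non-zero entries in row i."""
    return [sum(row) for row in adjacency(weights)]


def is_directed(weights):
    """A graph is treated as directed when its weight matrix is asymmetric."""
    n = len(weights)
    return any(weights[i][j] != weights[j][i] for i in range(n) for j in range(n))


def is_signed(weights):
    """A graph is treated as signed when at least one weight is negative."""
    return any(w < 0 for row in weights for w in row)


if __name__ == "__main__":
    print(adjacency(W))
    print(degrees(W))
    print(is_directed(W), is_signed(W))
```

For a directed graph, the row-wise count in `degrees` would correspond to out-going links only; that reading is an assumption of this sketch.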
TRIADIC RELATIONS
BINARY NETWORKS: TRIADIC CLOSURE
Triads or triples illustrate the relationships between a set of three nodes. A triad is connected or open if the three
nodes are connected by two edges with weights higher than $w = 0$. If this is the case, then the nodes are neighbors
or adjacent. The triad is completely connected or closed if a triangle is formed, i.e. three edges connect the three
nodes (Kivelä et al. 2014). In binary networks this closure is sufficient for assessing the formation of a cluster,
as the existence of the relevant edges between the nodes suffices. However, when adding the relational property
of directedness, the transitivity needs to be assessed before determining whether the triad is closed.
DIRECTED NETWORKS: TRIADIC TRANSITIVITY
Holland and Leinhardt (1971) propose transitivity to be the key structural concept in the analysis
of sociometric data. A closed triad as defined by Wasserman (1994) is transitive if, whenever $l_{ij}$ and $l_{jk}$
are present, then so is $l_{ik}$. In physical terms this portrays a link chain from $n_i$ to $n_k$ through $n_j$ and
connects $n_i$ and $n_k$ with a non-vacuous link from the perspective of $n_i$. A non-vacuous connection is an
out-going link from a focal actor’s perspective (Wasserman 1994). In any given set of three nodes there are six
triadic relations. For example, the open triadic relations among the three nodes $n_1$, $n_2$, and $n_3$ consist of
the following six:
$l_{12} l_{23}$, $l_{13} l_{32}$, $l_{21} l_{13}$, $l_{23} l_{31}$, $l_{31} l_{12}$, $l_{32} l_{21}$,
which depict the first condition of Wasserman’s transitivity definition. The second condition involves
adding the third edge, connected in a non-vacuous manner with the focal actor being the starting node $n$,
displayed as such:
$l_{12} l_{23} l_{13}$, $l_{13} l_{32} l_{12}$, $l_{21} l_{13} l_{23}$, $l_{23} l_{31} l_{21}$, $l_{31} l_{12} l_{32}$, $l_{32} l_{21} l_{31}$.
While the transitivity definition helps assess triadic closure in directed networks, the question of how to determine
the weight value is still open. The various assessment measures are given in the following section.
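The enumeration of the six open triadic relations and their transitive closure can be sketched in Python as follows. This is only an illustration of the definition quoted above; the function names and the example link set are hypothetical.

```python
from itertools import permutations


def open_triads(nodes):
    """All ordered two-paths (l_ij, l_jk) over every ordering (i, j, k) of the nodes."""
    return [((i, j), (j, k)) for i, j, k in permutations(nodes, 3)]


def transitive_triads(links, nodes):
    """For each two-path present in `links`, report whether l_ik closes it."""
    result = []
    for (i, j), (_, k) in open_triads(nodes):
        if (i, j) in links and (j, k) in links:
            result.append(((i, j, k), (i, k) in links))
    return result


if __name__ == "__main__":
    # Hypothetical directed links among the nodes 1, 2 and 3.
    links = {(1, 2), (2, 3), (1, 3)}
    print(open_triads([1, 2, 3]))
    print(transitive_triads(links, [1, 2, 3]))
```

With three nodes, `open_triads` yields six pairs in the same order as the list in the text, since `permutations` emits orderings lexicographically.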
WEIGHTED NETWORKS: TRIADIC VALUE
When implementing these six triadic relations in clustering formulae, the corresponding weight $w$ is taken from
the weight matrix, resulting in a value of $w = 0$ if two nodes are not connected. However, the weights from the
weight matrix alone don’t suffice to assess the value of the entire triad. As Opsahl and Panzarasa (2009) point out,
there are four ways of assessing the triadic value: the arithmetic mean, the geometric mean, the maximum value
and the minimum value. While the arithmetic mean is simple to use, it is prone to sensitivity issues as it is not
robust against differences in weights, especially in extreme settings. The maximum and minimum values are likewise
insensitive, as the maximum disregards lower weights and the minimum disregards higher weights.
The geometric mean overcomes these issues of sensitivity (Opsahl, Panzarasa 2009).
The four methods are given in Table 1 with examples to show their deviations from one another. However, when
regarding both positive and negative weights, the formulation of balance is key to determining the triadic closure.
This occurrence is presented in the following section.
Table 1: Triadic Value Assessment (Opsahl, Panzarasa 2009)
Method             a) weights (2, 2)      b) weights (1, 3)
Maximum Value      max(2, 2) = 2          max(1, 3) = 3
Minimum Value      min(2, 2) = 2          min(1, 3) = 1
Arithmetic Mean    (2 + 2) / 2 = 2        (1 + 3) / 2 = 2
Geometric Mean     √(2 · 2) = 2           √(1 · 3) ≈ 1.73
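The four assessments in Table 1 can be reproduced with a short Python sketch. The function name `triadic_values` is hypothetical, and the geometric mean as written here assumes non-negative weights.

```python
import math


def triadic_values(w1, w2):
    """Value of a triad from its two edge weights under the four methods of Table 1."""
    return {
        "maximum": max(w1, w2),
        "minimum": min(w1, w2),
        "arithmetic_mean": (w1 + w2) / 2,
        # Assumes w1 * w2 >= 0; negative products are outside this sketch.
        "geometric_mean": math.sqrt(w1 * w2),
    }


if __name__ == "__main__":
    for weights in [(2, 2), (1, 3)]:
        print(weights, triadic_values(*weights))
```

For equal weights all four methods agree; for the weights (1, 3) only the geometric mean falls below the arithmetic mean while still reflecting both weights.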
SIGNED NETWORKS: TRIADIC BALANCE
Since the introduction of the Theory of Structural Balance by Heider (1946), further implications of including
positive and negative weights on triadic relations have been researched thoroughly. Davis (1967) determines that
in order for a local network to be clusterable, it must also be balanced. The formulation of balance is based on the
arithmetic sign of all three weights of a triad. A closed triad with a single negative weight, and thus two positive
weights, is unbalanced and therefore not clusterable. On the contrary, a triad consisting of three positive weights
or of one positive and two negative weights is balanced. Specifically, these occurrences depict a strong
formulation of balance. A weak formulation of balance is also possible, if the given triad embodies negative
weights for all three of its edges (Davis 1967). A visual representation of this is available in Table 2.
Table 2: Clustering in Terms of the Formulation of Balance (Szell et al. 2010)
Strong Formulation of Balance    Balanced    Balanced    Unbalanced    Unbalanced
Weak Formulation of Balance      Balanced    Balanced    Balanced      Unbalanced
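The two formulations of balance described above depend only on how many of the three edge weights of a closed triad are negative. A minimal Python sketch of that rule follows; the function name `balance` is hypothetical, and the input is assumed to be a closed triad with three non-zero weights.

```python
def balance(weights):
    """Return (strongly_balanced, weakly_balanced) for a closed triad.

    `weights` holds the three non-zero edge weights of the triad.
    Strong formulation: zero or two negative weights are balanced.
    Weak formulation: additionally, three negative weights are balanced.
    """
    if len(weights) != 3 or any(w == 0 for w in weights):
        raise ValueError("expected three non-zero weights of a closed triad")
    negatives = sum(1 for w in weights if w < 0)
    strong = negatives in (0, 2)
    weak = negatives != 1
    return strong, weak


if __name__ == "__main__":
    for triad in [(1, 1, 1), (1, -1, -1), (-1, -1, -1), (1, 1, -1)]:
        print(triad, balance(triad))
```

Only the triad with a single negative weight is unbalanced under both formulations, which matches the description in the text.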
2.2. CLUSTERING COEFFICIENT
With this preliminary understanding at hand, the following section aims to depict a holistic development of the
clustering coefficient in prior research, ranging from the initial proposal, up to recent approaches of developing
the formula to analyze networks with different structural and relational properties. Thereby, the literature review
of the clustering coefficient is categorized into the networks these were developed for. Subsequently, the research
trends are provided, and the limitations of the more recent developments are addressed. However, given the
aforementioned chaotic development of this stream of research, a classification of the types of clustering is
necessary first.
CLUSTERING CLASSIFICATION
As previously mentioned, the assessment of cluster formation has developed in a chaotic manner, which
has led to an unclear collection of clustering definitions. Kivelä et al. (2014) distinguish three ways of assessing
the formation of clusters. Firstly, one can use the node degrees to obtain the ratio of existing adjacencies to all
possible adjacencies in a graph. In this regard, the term clustering is synonymous with the density or neighborhood
of a node. Secondly, one can use walks and paths to assess the clustering formation of a graph. This assessment is
often used to identify communities, which are also synonymous with clusters and depict dense regions of a network
(Boccaletti et al. 2006). Community detection aims to group nodes in modules based on a graph’s topology.
Fortunato (2010) determines four traditional methods for assessing this type of clustering: hierarchical methods,
partitional and graph partitional methods as well as spectral methods. Other methods such as grid-based and
constraint-based clustering are available as well, among many others (Berkhin 2006). Lastly, clustering can
be determined at a triadic level by evaluating the relations of a set of three nodes (Kivelä et al. 2014). In this
respect, the fundamental formula, the clustering coefficient (Newman et al. 2001), measures transitivity,
which is also often seen as a synonym for clustering (Latora, Marchiori 2003), and gives the ratio of closed triads
or triples over merely connected ones. In physical terms, clustering based on triadic closure gives the tendency that
“the friend of my friend is my friend”.
This paper generalizes the types of clustering further by defining two types of cluster
assessment: a macroscopic and a microscopic approach. On the one hand, the clustering of a network can be
assessed using a macroscopic approach. Here, clusters are dense regions of a network, for example cliques
or communities. Within these it is not necessary for each member of the dense cluster to be connected with every
other member. The fact that the group is highly dense justifies the term clustering. As already pointed out,
clustering based on community detection is abundantly present within prior research, and many variations of this
assessment exist as well. On the other hand, microscopic clustering is possible. This acknowledges a cluster as a
group of three members that are completely connected. Generally, this can be assessed in terms of triadic closure. A
similar distinction of the two types of clustering is presented by Girvan and Newman (2002). They acknowledge
that the often synonymous terms are misleading and therefore refrain from using the term clustering to describe
the detection of communities. Since the paper at hand shares this view, the triadic route of assessing clusters is
chosen.
In terms of the clustering coefficient there are two main methods available for assessing the tendency of clustered
nodes. On the one hand, the local clustering coefficient is based on the local density of an ego’s network and
provides a result for the clustering tendency from a local actor’s perspective, e.g. by assessing all closed triads
over connected triads in which the node $n_i$ is involved. The sum of all local clustering coefficients can then be
averaged over all nodes to globalize its result across the entire network. The second measure is a straightforward
global measure. Here, the global clustering coefficient assesses all closed triads from each node’s perspective and
divides them by all open ones. The first measure is prone to sensitivity issues, since each local clustering
coefficient is equally weighted regardless of its node degree or general connectedness in the network (Opsahl,
Panzarasa 2009). Both globalized local and global clustering coefficients are used abundantly in prior
research, as can be seen in the following section, where the development of the formula is illustrated.
BINARY NETWORKS
Computing the clustering tendency in binary networks has proven to be the simplest case. Because relational properties
aren’t acknowledged, the following prerequisite is given: $a_{ij} = a_{ji} = w_{ij} = w_{ji}$. The term clustering
coefficient was first introduced by Watts and Strogatz (1998) in their attempt to compare random networks to those in
the real world. The clustering coefficient $C$ is defined as the average of $C_i$ over all nodes, where $N$ is the
number of nodes in the network and $C_i$ is the ratio of the actual number of edges ($L_i$) among the neighbors of
$n_i$ over the maximum possible number of such edges, given by $k_i(k_i - 1)/2$. Since the denominator of this
fraction is never smaller than the numerator, $C$ lies between the values of $C = 0$ and $C = 1$. The
clustering coefficient by Watts and Strogatz (1998) can therefore be read as such:
$$C_{\text{Watts-Strogatz}} = \frac{1}{N} \sum_{i} \frac{L_i}{k_i (k_i - 1)/2} \qquad [1]$$
While this clustering coefficient doesn’t explicitly use triadic relations to determine the formation of clusters, its introduction started the movement towards the development of the clustering coefficient and is therefore noteworthy. However, comparable findings were made almost half a century prior to this introduction. The proposal given by Watts and Strogatz (1998) is very similar to the findings of Kephart (1950), in which the law of family interactions is proposed to be the ratio of actual relationships to potential ones. Watts and Strogatz (1998) develop this by taking a focal node’s perspective, which is then globalized over the entire network.
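As a minimal sketch of equation [ 1 ], the averaged local coefficient can be computed on a small binary graph. The toy graph, the function names and the convention that nodes with fewer than two neighbors contribute $C_i = 0$ are assumptions made for illustration only; $L_i$ is read here as the number of edges among the neighbors of $n_i$.

```python
from itertools import combinations


def local_clustering(adj, i):
    """C_i = L_i / (k_i * (k_i - 1) / 2) for node i of an undirected binary graph."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no pair of neighbours, so C_i = 0
    # L_i: number of edges that actually exist among the neighbours of i
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def watts_strogatz(adj):
    """Mean of the local coefficients over all N nodes (equation [ 1 ])."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(watts_strogatz(toy))  # (1/3 + 1 + 1 + 0) / 4 = 7/12
```

Node A has three neighbors of which only one pair is linked, so its local value is 1/3, while B and C each sit in a fully closed neighborhood.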
Building on this milestone, Newman et al. (2001) adjust the definition of the clustering coefficient in their study on random graphs and apply it to real-world networks, specifically collaboration networks and the World Wide Web. They present their definition as equivalent to the original clustering coefficient, merely reversing the approach by taking the ratio of the means instead of the mean of the ratios. This coefficient is defined as follows (Newman et al. 2001):
[ 2 ]
$$C_{\text{Newman et al.}} = \frac{3 N_{\Delta}}{N_{\wedge}}$$
In general terms, it reads as three times the number of all triangles ($N_{\Delta}$) divided by the number of all connected triples ($N_{\wedge}$). The factor 3 in the numerator accounts for each triangle representing three closed triads. As pointed out by Schank and Wagner (2005) as well as by Latora and Marchiori (2003), this formula and the one from Watts and Strogatz (1998) differ. In fact, Latora and Marchiori (2003) argue that the formula proposed by Watts and Strogatz (1998) approximates a different measure, namely the efficiency. The two also extend the model, which is explained in the review of the clustering coefficient in weighted networks. A further limitation of the Watts-Strogatz formula (1998) is that it is based on the sum of all local clustering coefficients, which is then globalized over the entire network. As mentioned in section 2.2.1 of this paper, such an approach is prone to sensitivity issues.
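That the ratio of the means (equation [ 2 ]) and the mean of the ratios (equation [ 1 ]) generally differ can be checked with a short sketch. The toy graph and all function names are hypothetical, and nodes with fewer than two neighbors are assumed to contribute a local value of 0.

```python
from itertools import combinations


def triple_counts(adj):
    """Return (closed, connected) triple counts, summed over all centre nodes."""
    closed = connected = 0
    for centre, neighbours in adj.items():
        for u, v in combinations(neighbours, 2):
            connected += 1      # u - centre - v is a connected triple
            if v in adj[u]:
                closed += 1     # the triple is closed by the edge u - v
    return closed, connected


def newman_transitivity(adj):
    """Equation [ 2 ]: `closed` equals 3 * N_triangle, `connected` equals N_wedge."""
    closed, connected = triple_counts(adj)
    return closed / connected if connected else 0.0


def mean_local(adj):
    """Equation [ 1 ]: mean of the per-node ratios."""
    total = 0.0
    for centre, neighbours in adj.items():
        pairs = list(combinations(neighbours, 2))
        if pairs:
            total += sum(1 for u, v in pairs if v in adj[u]) / len(pairs)
    return total / len(adj)


# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(newman_transitivity(toy))  # 3 closed triples / 5 connected triples
print(mean_local(toy))           # (1/3 + 1 + 1 + 0) / 4
```

On this graph the global measure gives 3/5 while the globalized local measure gives 7/12, because the latter weights the low-degree nodes B, C and D as heavily as the hub A.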
While the presented clustering coefficients are intended for assessing the clustering in binary networks, the literature shows that they are also applied to networks that carry weights. The study of scientific collaboration by Newman (2001) is a strong candidate for implementing weights, as the relation between two scientists can be seen as stronger the more papers they have co-authored and weaker the fewer. Newman (2001), however, assesses the formation of clusters based on a binary graph and thus only acknowledges the existence of links between the scientists and disregards relational attributes. Such a manipulation or symmetrization of the data is common in early social network research, since the mathematical foundations for the clustering assessment were not yet able to include relational properties. As an answer to this problem, the formula is extended to include weightedness, which is presented in the following section.
WEIGHTED NETWORKS
When weighted edges are implemented in the clustering coefficient formula, the prerequisites from binary networks no longer hold. Instead, the following is given: $a_{ij} = a_{ji} \neq w_{ij} = w_{ji}$. The weighted networks described in this section are also undirected; therefore, the order of the variables’ subscripts is irrelevant. Furthermore, prior research shows two separate lines of development of the clustering coefficient formula. While early on many extensions and adjustments are made to the local clustering coefficient of Watts and Strogatz (1998), most recent developments further the global measure introduced by Newman et al. (2001). The following section first reviews the development of the local clustering coefficient. Subsequently, developments of the global clustering coefficient follow.
As already mentioned Latora and Marchiori (2003) define the original clustering coefficient model from Watts
and Strogatz (1998) to assess the efficiency of a network rather than its clustering. Specifically, this measures how
well information spreads throughout a network. In addition to providing a new definition of this formula, the two
researchers expand said formula to incorporate weights. This expansion is read as follows.
[ 3 ]
$$C_{\text{Latora Marchiori}} = \frac{\frac{1}{N(N-1)} \sum_{i,j} \frac{1}{d_{ij}}}{\frac{1}{N(N-1)} \sum_{i,j} \frac{1}{w_{ij}}}$$
The numerator measures the average efficiency between two nodes, in which the efficiency is taken to be inversely proportional to the shortest path distance $d_{ij}$ between two nodes $n_i$ and $n_j$. The variable $d_{ij}$ gives the smallest summed weight required to connect the two nodes. The denominator is introduced in order to normalize the efficiency between $C = 0$ and $C = 1$. It measures the ideal average efficiency, in which $w_{ij}$ is equal to $d_{ij}$ if a direct link between nodes $n_i$ and $n_j$ is formed. Since this formula is based on the Watts-Strogatz model (1998), it deliberately does not yield a clustering result and therefore refrains from using triadic closure (Latora, Marchiori 2003). Regardless of this, this paper deems the findings of Latora and Marchiori (2003) noteworthy, because they provide insight into the importance of including weightedness in the original clustering coefficient formula.
On a further note, Grindrod (2002)¹ adapts the formula as well, in order to quantify the clustering tendency in even larger networks, where the exact number of links can only be estimated. In this ensemble approach, the number of connected triads in the denominator above is replaced by the probability that node $n_i$ is connected to both $n_j$ and $n_k$, i.e. the product $p_{ij} p_{ik}$. The numerator then expands this by including the probability $p_{jk}$ that nodes $n_j$ and $n_k$ are connected. Each probability $p$ lies between $p = 0$ and $p = 1$ (Grindrod 2002). The formula reads as follows:
[ 4 ]
$$C_{\text{Grindrod}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} p_{ij} p_{ik} p_{jk}}{\sum_{j,k} p_{ij} p_{ik}} \right)$$
¹ Grindrod (2002) merely proposes a local clustering coefficient in his article. For the sake of comparability, the global clustering coefficient based on this formula is given by globalizing the local clustering coefficients over all nodes $N$, as seen in Barrat et al. (2004). The same goes for the clustering coefficients of Onnela et al. (2005), Zhang and Horvath (2005), and Holme et al. (2007).
This development utilizes the approach of globalizing local clustering coefficients, as seen in Watts, Strogatz
(1998), but bases its factors on triadic closure, as seen in Newman et al. (2001). In his paper, Grindrod (2002)
further develops the formula to assess the probability values.
In their analysis of an airline transportation network as well as of a social network of scientific collaboration,
Barrat et al. (2004) introduce a local clustering coefficient based on triadic closure that implements weights as
relational properties. The formula is read as follows.
[ 5 ]
$$C_{\text{Barrat et al.}} = \frac{1}{N} \sum_{i} \left( \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{(w_{ij} + w_{ik})}{2} a_{ij} a_{ik} a_{jk} \right)$$
The factor $s_i(k_i - 1)$ is the node strength times the node degree minus one and is used to normalize the clustering result between $C = 0$ and $C = 1$. It is comparable to the denominator of the Watts-Strogatz formula (1998). The difference lies in the variable $s_i$, the node strength, which is the summed weight of all edges connected to node $n_i$. The second factor is the average of the two weights on the edges that meet at the focal actor $n_i$. However, it only contributes if a triangle is formed. This gives the local clustering coefficient, which is then averaged over all nodes $N$ to give the clustering coefficient for the entire network. This formula marks a further development, as it utilizes one of the aforementioned measures for assessing the triadic value, namely the arithmetic mean (Barrat et al. 2004).
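Equation [ 5 ] can be sketched for an undirected weighted graph stored as a dictionary of dictionaries. The helper `undirected`, the two example graphs and the convention that nodes with fewer than two neighbors contribute 0 are assumptions for illustration, not part of Barrat et al. (2004).

```python
def undirected(edges):
    """Build a symmetric dict-of-dicts weight map from {(i, j): weight}."""
    w = {}
    for (i, j), value in edges.items():
        w.setdefault(i, {})[j] = value
        w.setdefault(j, {})[i] = value
    return w


def barrat(w):
    """Equation [ 5 ]: globalized local coefficient with arithmetic-mean triad value."""
    total = 0.0
    for i, nbrs in w.items():
        k_i = len(nbrs)             # node degree
        s_i = sum(nbrs.values())    # node strength
        if k_i < 2:
            continue                # assumed convention: C_i = 0
        acc = 0.0
        for j in nbrs:
            for k in nbrs:
                # a_ij = a_ik = 1 by construction; a_jk = 1 only if j and k are linked
                if j != k and k in w[j]:
                    acc += (nbrs[j] + nbrs[k]) / 2
        total += acc / (s_i * (k_i - 1))
    return total / len(w)


# Hypothetical examples: a triangle with unequal weights, and the same
# triangle with a pendant node D attached to A.
triangle = undirected({("A", "B"): 1, ("B", "C"): 3, ("A", "C"): 2})
with_pendant = undirected(
    {("A", "B"): 1, ("B", "C"): 3, ("A", "C"): 2, ("A", "D"): 4}
)

print(barrat(triangle))      # every local value is 1
print(barrat(with_pendant))  # (3/14 + 1 + 1 + 0) / 4 = 31/56
```

In the triangle every neighbor pair of every node is closed, so the weighted sum in the numerator equals $s_i(k_i - 1)$ and each local value is 1 whatever the weights are.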
Onnela et al. (2005) critique the clustering coefficient given by Barrat et al. (2004), on account of a disregard
towards the weighted value of the third connecting edge. They, therefore, expand the formula to incorporate the
value of said edge and apply the proposed formula to the undirected financial network of traded stocks. Their
proposal reads as such.
[ 6 ]
$$C_{\text{Onnela et al.}} = \frac{1}{N} \sum_{i} \left( \frac{1}{k_i (k_i - 1)} \sum_{j,k} \left( \frac{w_{ij}}{\max(w)} \frac{w_{ik}}{\max(w)} \frac{w_{jk}}{\max(w)} \right)^{1/3} \right)$$
This coefficient is read similarly to the one proposed by Barrat et al. (2004). Here, however, the triadic value is
assessed by using the geometric mean as opposed to the arithmetic mean used in Barrat et al. (2004). In addition,
the weights are scaled by the largest weight and the node strength is replaced by the node degree. Furthermore, it
is no longer necessary to regard the adjacency values, since including the weighted value of the connecting edge
enables the formula to differentiate between closed and connected triples.
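A minimal sketch of equation [ 6 ] shows the effect of the geometric mean and of scaling by the largest weight. The two example graphs and the handling of nodes with fewer than two neighbors (they contribute 0) are assumptions for illustration.

```python
def onnela(w):
    """Equation [ 6 ] on an undirected weighted graph given as dict of dicts."""
    w_max = max(value for nbrs in w.values() for value in nbrs.values())
    total = 0.0
    for i, nbrs in w.items():
        k_i = len(nbrs)
        if k_i < 2:
            continue  # assumed convention: C_i = 0
        acc = 0.0
        for j in nbrs:
            for k in nbrs:
                # a missing edge j-k has weight 0 and therefore adds nothing
                if j != k and k in w[j]:
                    scaled = (nbrs[j] / w_max) * (nbrs[k] / w_max) * (w[j][k] / w_max)
                    acc += scaled ** (1 / 3)   # geometric mean of the three weights
        total += acc / (k_i * (k_i - 1))
    return total / len(w)


# Hypothetical triangles: unequal weights (1, 2, 3) and equal weights (2, 2, 2).
unequal = {"A": {"B": 1, "C": 2}, "B": {"A": 1, "C": 3}, "C": {"A": 2, "B": 3}}
equal = {"A": {"B": 2, "C": 2}, "B": {"A": 2, "C": 2}, "C": {"A": 2, "B": 2}}

print(onnela(unequal))  # (1/3 * 2/3 * 3/3) ** (1/3), roughly 0.606
print(onnela(equal))    # 1.0
```

Because each weight is divided by the largest weight in the graph, a triangle only reaches the value 1 in this sketch when all three of its edges carry that largest weight.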
In their paper on biological networks, Zhang and Horvath (2005) provide a different approach towards assessing the clustering in weighted networks. They generalize the ratio of the number of direct connections among the neighbors of a node $n_i$ to the maximum possible number of such connections, which reads as follows.
[ 7 ]
$$C_{\text{Zhang Horvath}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} \frac{w_{ij}}{\max(w)} \frac{w_{jk}}{\max(w)} \frac{w_{ki}}{\max(w)}}{\left( \sum_{j} \frac{w_{ij}}{\max(w)} \right)^{2} - \sum_{j} \left( \frac{w_{ij}}{\max(w)} \right)^{2}} \right) = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} \frac{w_{ij}}{\max(w)} \frac{w_{jk}}{\max(w)} \frac{w_{ki}}{\max(w)}}{\sum_{j,k} \frac{w_{ij}}{\max(w)} \frac{w_{ik}}{\max(w)}} \right)$$
While Zhang and Horvath (2005) originally use an adjacency function to derive the weighted values, the paper of Saramäki et al. (2007) shows that the formula can also use weights from the weight matrix. Similar to Onnela et al. (2005), the weights are scaled by the maximum weight in the graph. The denominators are based on the maximum weights, ensuring a result between $C = 0$ and $C = 1$ (Saramäki et al. 2007). The equation from Zhang and Horvath (2005) also no longer relies on the node degree $k_i$; instead, the weighted values are used in the denominator. Kalna and Higham (2007) provide further evidence in their paper that the two forms of the local coefficient by Zhang and Horvath (2005) are equal to one another.
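The equality of the two denominators in equation [ 7 ] can be checked numerically. The sketch below computes both forms for each node of a small example; the graph, the function names and the skipping of nodes with a zero denominator are illustrative assumptions.

```python
def local_terms(w, i, w_max):
    """Return (numerator, denominator form 1, denominator form 2) for node i."""
    nbrs = w[i]
    scaled = {j: value / w_max for j, value in nbrs.items()}
    # sum over ordered pairs j != k of w_ij * w_jk * w_ki (all scaled by max(w))
    numerator = sum(
        scaled[j] * (w[j].get(k, 0.0) / w_max) * scaled[k]
        for j in nbrs for k in nbrs if j != k
    )
    # form 1: (sum_j w_ij)^2 - sum_j w_ij^2
    den_1 = sum(scaled.values()) ** 2 - sum(v * v for v in scaled.values())
    # form 2: sum over ordered pairs j != k of w_ij * w_ik
    den_2 = sum(scaled[j] * scaled[k] for j in nbrs for k in nbrs if j != k)
    return numerator, den_1, den_2


def zhang_horvath(w):
    """Equation [ 7 ], second form, averaged over all nodes."""
    w_max = max(value for nbrs in w.values() for value in nbrs.values())
    total = 0.0
    for i in w:
        numerator, _, den_2 = local_terms(w, i, w_max)
        if den_2 > 0:               # assumed convention: skip isolated or pendant nodes
            total += numerator / den_2
    return total / len(w)


# Hypothetical triangle with weights w_AB = 1, w_AC = 2, w_BC = 3.
triangle = {"A": {"B": 1, "C": 2}, "B": {"A": 1, "C": 3}, "C": {"A": 2, "B": 3}}

for node in triangle:
    print(node, local_terms(triangle, node, 3))
print(zhang_horvath(triangle))  # (1 + 2/3 + 1/3) / 3 = 2/3
```

Expanding the square $(\sum_j \hat{w}_{ij})^2$ yields the diagonal terms $\sum_j \hat{w}_{ij}^2$ plus all mixed terms $\hat{w}_{ij}\hat{w}_{ik}$ with $j \neq k$, which is why the two printed denominators agree.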
The version of the local clustering coefficient provided by Holme et al. (2007) is used to assess the clustering of students at a Korean university. Their formula aims to meet the following requirements: the coefficient yields a value between $C = 0$ and $C = 1$; the weight $w = 0$ represents the lack of a connection; the contribution of a given triad to the clustering result should be proportional to the weights of each of its edges; and the results should be identical to those of the Watts and Strogatz formula (1998) if the weights are replaced with adjacencies. The maximum value in their formula is used as an answer to their third requirement. Specifically, this maximum value represents a matrix in which the maximum $w_{ij}$ is located at all positions (Holme et al. 2007). The formula is given below.
[ 8 ]
$$C_{\text{Holme}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} w_{ij} w_{ik} w_{jk}}{\max_{ij}(w_{ij}) \sum_{j,k} w_{ij} w_{ik}} \right)$$
In more recent developments of the clustering coefficient, the global measure has become the standard, as opposed to the globalizing of local clustering coefficients. Engø-Monsen and Canright (2011) propose a global clustering coefficient formula closely based on that of Newman et al. (2001); here, however, the geometric mean is used to determine the triadic value. Their proposal reads as follows.
[ 9 ]
$$C_{\text{Engø-Monsen Canright}} = \frac{\sum_{i,j,k} \sqrt[3]{w_{ij} w_{ik} w_{jk}}}{\sum_{i,j,k} \sqrt{w_{ij} w_{ik}}}$$
Phan et al. (2013) extend this formula and apply it to 1000 Bernoulli random networks. The extension acknowledges that the third connecting edge plays a relative rather than an absolute role in the clustering assessment. The equation therefore relates the weighted strength of the third connecting edge to that of the other two. The denominator consists of the weighted value assessment of all open triads plus that of all closed triads. This normalizes the clustering coefficient between $C = 0$ and $C = 1$ and is given as follows (Phan et al. 2013).
[ 10 ]
$$C_{\text{Phan et al.}} = \frac{\sum_{i,j,k} \sqrt{\sqrt{w_{ij} w_{ik}} \, \sqrt[3]{w_{ij} w_{ik} w_{jk}}}}{\sum_{i,j,k}^{C} \sqrt{\sqrt{w_{ij} w_{ik}} \, \sqrt[3]{w_{ij} w_{ik} w_{jk}}} + \sum_{i,j,k}^{O} \sqrt{w_{ij} w_{ik}}}$$
While the presented developments of the formula offer great approaches towards assessing the clustering tendency
in weighted networks, they disregard the relational property of directedness. In the following section proposed
measures are presented to overcome this limitation.
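Before turning to directed networks, the normalization in equation [ 10 ] can be sketched as follows. The reading of the superscripts $C$ and $O$ as sums over closed and open triads follows the description of the denominator given with the equation; the example graphs and names are hypothetical.

```python
from itertools import permutations


def phan(w):
    """Equation [ 10 ] on an undirected weighted graph given as dict of dicts."""
    closed_sum = open_sum = 0.0
    for i, nbrs in w.items():
        for j, k in permutations(nbrs, 2):         # ordered neighbour pairs of i
            pair = (nbrs[j] * nbrs[k]) ** 0.5       # sqrt(w_ij * w_ik)
            w_jk = w[j].get(k, 0.0)
            if w_jk > 0:                            # closed triad
                triple = (nbrs[j] * nbrs[k] * w_jk) ** (1 / 3)
                closed_sum += (pair * triple) ** 0.5
            else:                                   # open triad
                open_sum += pair
    denominator = closed_sum + open_sum
    return closed_sum / denominator if denominator else 0.0


# Hypothetical examples: a triangle with unequal weights, and the same
# triangle with a pendant node D attached to A.
triangle = {"A": {"B": 1, "C": 2}, "B": {"A": 1, "C": 3}, "C": {"A": 2, "B": 3}}
with_pendant = {
    "A": {"B": 1, "C": 2, "D": 4},
    "B": {"A": 1, "C": 3},
    "C": {"A": 2, "B": 3},
    "D": {"A": 4},
}

print(phan(triangle))      # no open triads, so numerator equals denominator
print(phan(with_pendant))  # strictly between 0 and 1
```

Since the closed-triad term appears in both numerator and denominator, a graph without open triads always evaluates to 1 in this sketch, independent of how its weights are distributed.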
WEIGHTED AND DIRECTED NETWORKS
Clustering in directed networks is based on the prerequisite that the tie from node $n_i$ to node $n_j$ isn’t necessarily equal to the tie from node $n_j$ to node $n_i$, and thus $a_{ij} \neq a_{ji} \neq w_{ij} \neq w_{ji}$.
Fagiolo (2007) remarks that when examining the triad formation, one can pay special attention to the role a focal actor plays, and notes four possible patterns. The focal node $n_1$ a) can be involved in a cycle (“cyc”), e.g. $l_{12} l_{23} l_{31}$, b) can play the role of a middleman (“mid”), e.g. $l_{21} l_{13} l_{23}$, where node $n_2$ can reach node $n_3$ either directly or through the focal node $n_1$, c) can be classified as “in”, e.g. $l_{21} l_{31} l_{23}$, where node $n_1$ holds two incoming edges, and d) can be classified as “out”, e.g. $l_{12} l_{13} l_{23}$, where node $n_1$ holds two outgoing edges. Fagiolo (2007) proposes one clustering coefficient for each of the four patterns and then combines them by defining the clustering coefficient to be the total of all actual triadic relations of the four patterns, divided by all possible ones. By replacing the adjacency values with the values from the weight matrix, the following globalized local clustering coefficient for weighted and directed networks is proposed.
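[ 11 ]
$$C_{\text{Fagiolo}} = \frac{1}{N} \sum_{i} \frac{\frac{1}{2} \sum_{j,k} \left( w_{ij}^{1/3} + w_{ji}^{1/3} \right) \left( w_{ik}^{1/3} + w_{ki}^{1/3} \right) \left( w_{jk}^{1/3} + w_{kj}^{1/3} \right)}{2 \left( d_i^{tot} \left( d_i^{tot} - 1 \right) - 2 d_i^{\leftrightarrow} \right)}$$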
The numerator entails the geometric mean of the weighted incoming and outgoing edges between a node and two others and of the two weighted, directed edges between these. The denominator is similar to that of Watts and Strogatz (1998), where the node degree $k_i$ is replaced with the total node degree $d_i^{tot}$, consisting of the weighted sum of the in- and out-degrees of a node. Thereby, bilateral degrees $d_i^{\leftrightarrow}$ that were already counted in the first part of the denominator are subtracted. The local measure can then be globalized over the entire network. In an empirical application, the weights, similar to Holme et al. (2007), Onnela et al. (2005) as well as Zhang and Horvath (2005), are rescaled by a maximum weight value. Squartini et al. (2011) reuse the formula given by Fagiolo (2007) and replace the respective weight with a differently rescaled weight to remove trends in their specific example of the International Trade Network. Tabak et al. (2014) expand the four clustering coefficients of the patterns introduced by Fagiolo (2007) by attributing weights prior to combining them in one single formula.
Opsahl and Panzarasa (2009) use the cycle, middleman, in and out approach as well to classify the various triads, and apply their approach to a wide range of networks, such as acquaintance and relationship networks, neural networks, organizational networks, networks of political support and networks of interaction through messages. Contrary to Fagiolo (2007), they base their formula closely on triadic transitivity as seen in Holland and Leinhardt (1971) and Wasserman (1994). This is relevant, since triads in the form of a cycle aren’t seen as transitive and are therefore not counted as clustered. Furthermore, triads that only contain incoming or only outgoing edges from a focal actor’s perspective are also not seen as transitive. A closed-form formula is not given. Instead, a framework for assessing the coefficient is provided in their paper. This paper summarizes this assessment as follows.
[ 12 ]
$$C_{\text{Opsahl Panzarasa}} = \frac{\sum_{i,j,k} w_{ij} w_{ik} a_{jk}}{\sum_{i,j,k} w_{ij} w_{jk}}$$
Because the formula sums over all $j$ as well as over all $k$, which are treated as variable subscripts rather than node descriptions, a further distinction between triadic relations as seen in Opsahl and Panzarasa (2009) is unnecessary. While this formula is the first global clustering measure that can be applied to weighted and directed networks, it fails to consider the weighted value of the third connecting edge in the numerator, as seen in Phan et al. (2013).
Despite this shortcoming, the formula offers a further advancement. Notice that this summarized version of the clustering coefficient from Opsahl and Panzarasa (2009) differs from the summarized version offered in the paper of Phan et al. (2013). The discrepancy lies within the subscripts of the denominator. Researchers prior to and even after the findings of Opsahl and Panzarasa (2009) assess the denominator of the formula either with node degrees or with an abstracted version of $w_{ij} w_{ik}$. While their findings for the most part go unnoticed, Opsahl and Panzarasa (2009) specify the denominator precisely and accordingly provide a formula for the intended tendency of “the friend of my friend is my friend”. The summarized version provided by Phan et al. (2013) doesn’t include this consideration. Table 3 shows the extent of this discrepancy.
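Table 3: Opsahl, Panzarasa’s (2009) Clustering Coefficient Denominator Differences

Graphical and written representation of the tendency: “Is my friend the friend of my friend?”
Weights in denominator of the clustering coefficient: $w_{ij} w_{ik}$
Triad census for weighted directed networks:

| Numerator | Values | Denominator | Values |
|---|---|---|---|
| $w_{AB} w_{AC} a_{BC}$ | 2, 3, 1 | $w_{AB} w_{AC}$ | 2, 3 |
| $w_{AC} w_{AB} a_{CB}$ | 3, 2, 0 | $w_{AC} w_{AB}$ | 3, 2 |
| $w_{BA} w_{BC} a_{AC}$ | 0, 1, 1 | $w_{BA} w_{BC}$ | 0, 1 |
| $w_{BC} w_{BA} a_{CA}$ | 1, 0, 0 | $w_{BC} w_{BA}$ | 1, 0 |
| $w_{CA} w_{CB} a_{AB}$ | 0, 0, 1 | $w_{CA} w_{CB}$ | 0, 0 |
| $w_{CB} w_{CA} a_{BA}$ | 0, 0, 0 | $w_{CB} w_{CA}$ | 0, 0 |

$C_{\text{Opsahl Panzarasa}}$ according to Phan et al. (2013): $C = \frac{(2 \cdot 3 \cdot 1)}{(2 \cdot 3) + (3 \cdot 2)} = 0.5$

Graphical and written representation of the tendency: “Is the friend of my friend my friend?”
Weights in denominator of the clustering coefficient: $w_{ij} w_{jk}$
Triad census for weighted directed networks:

| Numerator | Values | Denominator | Values |
|---|---|---|---|
| $w_{AB} w_{BC} a_{AC}$ | 2, 1, 1 | $w_{AB} w_{BC}$ | 2, 1 |
| $w_{AC} w_{CB} a_{AB}$ | 3, 0, 1 | $w_{AC} w_{CB}$ | 3, 0 |
| $w_{BA} w_{AC} a_{BC}$ | 0, 3, 1 | $w_{BA} w_{AC}$ | 0, 3 |
| $w_{BC} w_{CA} a_{BA}$ | 1, 0, 0 | $w_{BC} w_{CA}$ | 1, 0 |
| $w_{CA} w_{AB} a_{CB}$ | 0, 2, 0 | $w_{CA} w_{AB}$ | 0, 2 |
| $w_{CB} w_{BA} a_{CA}$ | 0, 0, 0 | $w_{CB} w_{BA}$ | 0, 0 |

$C_{\text{Opsahl Panzarasa}}$ according to this paper: $C = \frac{(2 \cdot 1 \cdot 1)}{(2 \cdot 1)} = 1$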
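The two results in Table 3 can be reproduced with a short sketch that sums the listed terms over all ordered node triples. The directed weights $w_{AB} = 2$, $w_{AC} = 3$ and $w_{BC} = 1$ are read off the table; the function and variant names are hypothetical, and each variant uses the numerator terms listed in its own half of the table.

```python
from itertools import permutations

# Directed weights taken from Table 3; every edge not listed has weight 0.
WEIGHTS = {("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 1}
NODES = "ABC"


def weight(i, j):
    return WEIGHTS.get((i, j), 0)


def adjacency(i, j):
    return 1 if weight(i, j) > 0 else 0


def coefficient(variant):
    """Sum numerator and denominator terms over all ordered triples (i, j, k)."""
    numerator = denominator = 0
    for i, j, k in permutations(NODES, 3):
        if variant == "ij_ik":      # "Is my friend the friend of my friend?"
            numerator += weight(i, j) * weight(i, k) * adjacency(j, k)
            denominator += weight(i, j) * weight(i, k)
        elif variant == "ij_jk":    # "Is the friend of my friend my friend?"
            numerator += weight(i, j) * weight(j, k) * adjacency(i, k)
            denominator += weight(i, j) * weight(j, k)
        else:
            raise ValueError(f"unknown variant: {variant}")
    return numerator / denominator


print(coefficient("ij_ik"))  # 6 / 12
print(coefficient("ij_jk"))  # 2 / 2
```

With the denominator $w_{ij} w_{ik}$ the out-star at $A$ is counted twice, once per ordering of $B$ and $C$, although only one ordering is closed; with $w_{ij} w_{jk}$ only the single two-step path from $A$ via $B$ to $C$ enters the denominator.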
Equation [ 12 ] concludes the collection of proposed clustering coefficient formulae found in prior literature. Since
this paper also deems the formulation of structural balance as an equal measure of equating triadic closure, selected
studies on signed networks that intend to deliver clustering results in said networks are presented in the following
section.
SIGNED NETWORKS
Assessing the clustering tendency in signed networks has become increasingly important, since weighted data with both positive as well as negative connotations is accessible. In social network analysis, signed networks are, for example, networks of friends and enemies, or partners and competitors. Beyond mere social networks, network mash-ups, such as networks of products and customers, are also interesting grounds for assessing the clustering with positive and negative weights, given the provision of an actor’s likes or dislikes of certain products. The aforementioned Theory of Structural Balance is used to assess the clustering in signed binary networks, in which the clusterability of a triad is dependent on its three arithmetic signs (Heider 1946).
Kunegis et al. (2009) provide insight into clustering in signed, directed networks in their analysis of the Slashdot Zoo, a feature of the technology news site Slashdot where users can mark other users as a friend or foe. Thereby, the researchers assume that the product of the signs of two directed edges gives the sign of the third directed edge. This approach is identical to the Theory of Structural Balance for strongly balanced graphs, and is directly applied to directed networks.
Assessing the clustering coefficient in signed networks is common in prior literature. However, most studies don’t state their exact methodology for doing so. Furthermore, prior studies often simplify their collected data in order to apply certain mathematical analysis measures. For example, Szell et al. (2010) exclude weighted edges in their analysis of a further friend-enemy network, even though the strength of the interaction between each player is given. The paper of Szell and Thurner (2012) provides a weighted clustering coefficient in the form of private messages as an extra and separate result to compare the clustering of friend and of enemy networks, providing insight that the interaction within positive networks is larger than that within negative ones. The assessment of the clustering of friend and enemy networks itself is thereby measured as an unweighted network.
In sum, formula improvements for including signed values in the clustering coefficient are scarce in prior research.
However, assessing clustering results in such networks is very common. With this in mind, the research trends
assessed from this literature review are given in the following section.
RESEARCH TRENDS
In the development of the clustering coefficient various trends are given. Firstly, the formulae tend to stray away
from using node-degree in the denominator and instead focus on a triadic approach. Moreover, the sensitivity
issues in regard to globalized local coefficients are resolved, as the most recent formulae use global measures. In
addition, the triadic value is no longer assessed with rescaled maximum values, instead the geometric mean is
used. Lastly, acknowledging weightedness and directedness is becoming increasingly important in the assessment.
Given these advancements, the development of the clustering coefficient is still within its early stage of research,
as many limitations are given. The following summarizes these.
LIMITATIONS
This paper summarizes three main limitations in regard to the clustering coefficients of prior research, namely a
lack of clustering coefficient combinations, a lack of comparability in the form of micro-comparability, and a lack
of comparability in the form of macro-comparability. These are presented in the following.
First of all, the different coefficients entail various advancements but also shortcomings that are distributed unevenly among the findings. For example, in the stream of weighted clustering coefficients, the third connecting edge is taken into consideration in the formula’s numerator; specifically, this edge can also be regarded relative to the two other edges (Phan et al. 2013). The research stream of weighted, directed networks lacks this acknowledgement. However, here the denominator is assessed correctly (Opsahl, Panzarasa 2009), which is not found in the research stream of merely weighted networks. A combination of both of these advancements without their shortcomings has yet to be provided in current literature. Therefore, further development of the formula is necessary in order to benefit from prior literature’s findings as well as to eliminate the shortcomings thereof.
The issue of micro-comparability illustrates that recent formula developments of the clustering coefficient don’t always allow the same graph to be compared under different relational properties. This is mostly due to the fact that the developed clustering coefficients that implement relational properties result in a value of $C = 1$ for a completely connected graph. This value of $C = 1$ indicates that a graph has reached the highest form of clustering possible. In regard to binary networks, this result is justifiable for a completely connected graph, because only the existence of ties is taken into consideration. Clustering coefficients with relational attributes should, however, distinguish between clusters with equally weighted edges and clusters with differently weighted edges. This approach is comparable to the original tendency developed for binary networks, i.e. “the friend of my friend is my friend”, where each edge is identical to one another, weighted with the value $w = 1$. Prior research, however, often equates this tendency with the tendency that “the best friend of my friend is my acquaintance”. The latter varies from the originally proposed tendency, since the edges are not equally weighted; by definition, it therefore shouldn’t reach the value of $C = 1$ and thus not the highest form of clustering possible. Because this paper acknowledges a sufficient difference between the two tendencies mentioned above, there is room for improving the clustering coefficient formula, namely by overcoming this limitation of micro-comparability. Table 4 depicts the limitation in an extreme setting. Notice how network c) resembles network d) the most, yet their generalized clustering results according to prior literature are polar opposites.
Table 4: Micro-Comparability of Clustered Triads

| Network | a) | b) | c) | d) | e) |
|---|---|---|---|---|---|
| CC According to Prior Literature | C = 1 | C = 1 | C = 1 | C = 0 | C = 1 |
| CC According to this Paper | C = 1 | C = 0.4629 | C = 0.2203 | C = 0 | C = 1 |
On a further note, large-scale networks have gained in popularity within the past years of network research, since large amounts of data can be acquired easily and used to conduct real-world analyses as opposed to depicting mere generalizations of or approaches to real-world problems. The acquired data not only gives insight into whether individuals are connected or not, but also into the relational manner of the connection by attributing weights, both positive and negative, as well as directedness. This calls for macro-comparability of the clustering coefficient, i.e. the ability to compare different social networks regardless of their relational and structural properties. While the most recent formulae are able to compare networks regardless of their weightedness or directedness, prior research has yet to acknowledge negative and positive weights in its formulae. Clustering in terms of the formulation of balance (Davis 1967) is becoming increasingly relevant, since data regarding both likes and dislikes of an individual is easily acquirable. The development of a single formula that can be used in all types of networks can remove the need for multiple versions of the clustering coefficient and thus offer macro-comparability of the clustering formation across all types of networks.
3. FORMULA EXTENSION
With the knowledge gained from prior research towards the development of the clustering coefficient, the
limitations acknowledged in the previous section, will now be addressed in this new approach for analyzing the
formation of clustering. Thereby, this paper differentiates between equally weighted clusters and clusters with
relational variance, offering micro-comparability. Furthermore, the assessment is geared towards the formation of
clusters in networks indifferent to their relational and structural properties, thereby offering macro-comparability.
3.1. BASIS
The fundamental basis of the proposed formula extension is derived from the global clustering coefficient developed by Newman et al. (2001), where the numerator embodies three times the number of all closed triangles and the denominator all connected triples, open or closed. By focusing on each triadic relation rather than the triangles, the value of the corresponding weights can determine whether a triangle is formed or not. Each triadic relation is assessed individually, which removes the factor three from the numerator. The triadic value is assessed by taking the geometric mean of the weights of the triadic relation. This paper then incorporates the approach of Phan et al. (2013), in which the third connecting edge is explicitly acknowledged as relative. Unlike the approach of Phan et al. (2013), this paper purposely disregards the third connecting edge in the denominator, which was only introduced to normalize the equation and ensure $C = 1$ for completely connected graphs. In the original clustering coefficient, the numerator, like that of the proposed formula, acknowledges the occurrence of closed triples, whereas the denominator purposely neglects the closing edge. By doing so, an accurate ratio of closed triads to connected triads can be computed. In the proposed formula, the variable $\bar{w}$ represents the value assessment of a closed triad with the third connecting edge seen as relative and the variable $\bar{v}$ the value assessment of all triads, resulting in the clustering coefficient being the ratio of the total sums of each. Thereby, the denominator is based on that of Opsahl and Panzarasa (2009), which is strongly based on the Transitivity Theory of Wasserman (1994). The basis for the formula extension can, therefore, be read as follows.
[ 13 ]
$$C = \frac{\sum_{i,j,k} \bar{w}}{\sum_{i,j,k} \bar{v}}$$
3.2. MICRO-COMPARABILITY
FORMULA: WEIGHTED OR DIRECTED NETWORKS
The aforementioned limitation of micro-comparability arises in weighted networks. It revolves around the fact that the following statements are to an extent seen as equally clustered: “the friend of my friend is my friend” versus “the best friend of my friend is my acquaintance”. Owing to the preliminary work of the presented researchers, the adjustment of the formula is merely a tweak. For weighted networks, we can thus expand $\bar{w}$ to depict the triadic relation relative to the weight of the third connecting edge. Correspondingly, the variable $\bar{v}$ is the triadic value assessment of any and all triads without respect to their connecting edge. Thereby, the subscripts of the weights are aligned to the Transitivity Theory (Wasserman 1994) and only allow for transitive triads. Ergo, this formula can be implemented in directed networks as well.
[ 14 ]
$$C = \frac{\sum_{i,j,k} \sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}}{\sum_{i,j,k} \sqrt{w_{ij} w_{jk}}}$$
EXAMPLE
To show the impact of this development in weighted networks, the following example is used (refer to Figure 1). While the clustering coefficients found in previous literature, for example as seen in Phan et al. (2013), treat the two networks below as equally clustered, this paper proposes the contrary. According to this paper, the clustering coefficient of network a) equals $C = 0.98$. For comparison, the clustering coefficient of the cluster in network b) is $C = 1$. Both are high; however, the coefficient now allows for comparisons, as seen in Table 5.
As seen in Table 4, when subjected to extreme settings an equal result of $C = 1$ can be misleading. Take the example of network b) in that table and imagine that nodes $B$ and $C$ work closely together and their weighted edge, measured through e-mail transfer, is very large, e.g. $w_{BC} = 10000$. If node $A$ were to send out an e-mail broadcast, including nodes $B$ and $C$ as recipients, previous literature would assign this digraph a clustering coefficient value of $C = 1$, even though $A$ might barely know the other two. While the clustering coefficient for binary networks addresses the mere existence of three edges per triple, for weighted networks the mere existence of edges shouldn’t suffice. Instead, as in binary networks, these edges should be equal to form an ideal cluster. With the newly proposed formula, the originally intended tendency “the friend of my friend is my friend” is kept. In addition, however, this equation also differentiates its results from the following statement: “the best friend of my friend is my acquaintance”. The limitation of micro-comparability is therefore resolved, which fulfills the first hypothesis by differentiating between equally weighted clusters and clusters with relational variance.
Figure 1: Weighted, Undirected Networks
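Table 5: Triad Census of Graphs in Figure 1
| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt{w_{ij} w_{jk}}$ | $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 1 | 3 | 2 | 1.732050808 | 1.817120593 | 1.774075869 |
| A | C | B | 2 | 3 | 1 | 2.449489743 | 1.817120593 | 2.109743646 |
| B | A | C | 1 | 2 | 3 | 1.414213562 | 1.817120593 | 1.603058510 |
| B | C | A | 3 | 2 | 1 | 2.449489743 | 1.817120593 | 2.109743646 |
| C | A | B | 2 | 1 | 3 | 1.414213562 | 1.817120593 | 1.603058510 |
| C | B | A | 3 | 1 | 2 | 1.732050808 | 1.817120593 | 1.774075869 |
Network a): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.9805431$

| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt{w_{ij} w_{jk}}$ | $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 2 | 2 | 2 | 2 | 2 | 2 |
| A | C | B | 2 | 2 | 2 | 2 | 2 | 2 |
| B | A | C | 2 | 2 | 2 | 2 | 2 | 2 |
| B | C | A | 2 | 2 | 2 | 2 | 2 | 2 |
| C | A | B | 2 | 2 | 2 | 2 | 2 | 2 |
| C | B | A | 2 | 2 | 2 | 2 | 2 | 2 |
Network b): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 1.0000000$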
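The census in Table 5 can be reproduced with a short script. The following is a minimal sketch, assuming that equation [ 14 ] is summed over all ordered triples of the three nodes, as the table rows suggest. The function name `clustering_eq14` and the dictionary representation of the edges are illustrative choices and not part of the original work.

```python
from itertools import permutations
from math import sqrt


def clustering_eq14(weights, nodes):
    """Equation [14]: sum of w_bar over sum of v_bar across ordered triples (i, j, k)."""

    def w(a, b):
        # Undirected lookup: each edge is stored once, under either orientation.
        return weights.get((a, b), weights.get((b, a), 0.0))

    numerator = 0.0
    denominator = 0.0
    for i, j, k in permutations(nodes, 3):
        v_bar = sqrt(w(i, j) * w(j, k))                          # open triple i-j-k
        closed = (w(i, j) * w(j, k) * w(i, k)) ** (1.0 / 3.0)    # all three edges
        numerator += sqrt(v_bar * closed)                        # w_bar
        denominator += v_bar
    return numerator / denominator if denominator else 0.0


# Edge weights read off Table 5 (hypothetical variable names).
network_a = {("A", "B"): 1, ("B", "C"): 3, ("A", "C"): 2}
network_b = {("A", "B"): 2, ("B", "C"): 2, ("A", "C"): 2}

print(round(clustering_eq14(network_a, "ABC"), 7))
print(round(clustering_eq14(network_b, "ABC"), 7))
```

Under these assumptions the two printed values should agree with the results for network a) and network b) in Table 5 up to rounding.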
3.3. MACRO-COMPARABILITY
The issue of macro-comparability suggests that the developed formula should be able to be implemented in all
types of networks indifferent to their relational properties. The proposed formula above can only be used in
weighted or directed networks. When implementing weights and directions into this equation things become
problematic. Imagine the following two graphs presented in Figure 2.
While both of these are clustered, with the closed transitive triad being $BAC$, the results rendered
seem erroneous (refer to Table 6). The top table refers to graph a) and results in an expected clustering coefficient
of $C \approx 0.976$. Graph b), however, yields a clustering coefficient of $C \approx 1.034$. The reason is that
the geometric mean of all three edges is higher than the geometric mean of the examined open triad. This paper,
however, insists on a valid concept. Therefore, the co-ordinate systems of the C-curves are taken into account.
Figure 2: Weighted, Directed Networks
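Table 6: Triad Census of Graphs in Figure 2
| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt{w_{ij} w_{jk}}$ | $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | 3 | 3 | 0 | 0 | 0 |
| A | C | B | 3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | 4 | 3 | 3 | 3.464101615 | 3.301927249 | 3.382042507 |
| B | C | A | 3 | 0 | 4 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | 4 | 0 | 0 | 0 | 0 |
Network a): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.9763116$

| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt{w_{ij} w_{jk}}$ | $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | 3 | 3 | 0 | 0 | 0 |
| A | C | B | 3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | 2 | 3 | 3 | 2.449489743 | 2.620741394 | 2.533669111 |
| B | C | A | 3 | 0 | 2 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | 2 | 0 | 0 | 0 | 0 |
Network b): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 1.0343661$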
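As a worked check of the two results in Table 6, the single closed triad $BAC$ can be evaluated by hand with equation [ 14 ]. The numbers below are taken from the table and rounded to four decimals; all other triples contribute 0 to both sums.

```latex
\begin{align*}
\text{Graph a):}\quad \bar{v} &= \sqrt{4 \cdot 3} \approx 3.4641, &
\sqrt[3]{4 \cdot 3 \cdot 3} &\approx 3.3019, \\
\bar{w} &= \sqrt{3.4641 \cdot 3.3019} \approx 3.3820, &
C &= \bar{w} / \bar{v} \approx 0.9763 \\[4pt]
\text{Graph b):}\quad \bar{v} &= \sqrt{2 \cdot 3} \approx 2.4495, &
\sqrt[3]{2 \cdot 3 \cdot 3} &\approx 2.6207, \\
\bar{w} &= \sqrt{2.4495 \cdot 2.6207} \approx 2.5337, &
C &= \bar{w} / \bar{v} \approx 1.0344
\end{align*}
```

In graph b) the cube root exceeds $\bar{v}$, which is exactly the situation in which the ratio leaves the interval between 0 and 1.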
Figure 3 shows the co-ordinate systems of the following C-curves. The top graph depicts a generalized clustering
coefficient curve based on prior research, with results varying between $C = 0$ and $C = 1$. The second illustrates
the clustering coefficient curve derived from the formula presented in the previous section. $C = 1$ is given for a
completely connected graph, in terms of transitivity, with equally weighted edges. For a local triad, if the
geometric mean of all three weighted edges is larger than that of the original two evaluated edges, then the
clustering coefficient is $C > 1$, and vice versa. Such a result is not sought. Instead, a graph as
seen in the last co-ordinate system is desired, where $C = 1$ is given when the geometric mean of the three weighted
edges is equal to that of the original two evaluated edges. If the geometric mean of all three is smaller, the curve
rises up to the point where the conditions for $C = 1$ are met and falls thereafter.
Figure 3: Co-Ordinate Systems of C-Curves
FORMULA: WEIGHTED AND/OR DIRECTED NETWORKS
With regard to the problem statement explained in the previous section the following clustering coefficient
formula is derived.
[ 15 ]
\[
C = \frac{\sum_{i,j,k} \bar{w}}{\sum_{i,j,k} \sqrt{w_{ij} w_{jk}}}
\]
The variable $\bar{w}$ is thereby read as follows.
For $\sqrt{w_{ij} w_{jk}} \geq \sqrt[3]{w_{ij} w_{jk} w_{ik}} \geq 0$:
\[
\bar{w} = \sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}
\]
For $0 < \sqrt{w_{ij} w_{jk}} < \sqrt[3]{w_{ij} w_{jk} w_{ik}}$:
\[
\bar{w} = \left( \sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}} \right)^{-1} \cdot \left( \sqrt{w_{ij} w_{jk}} \right)^{2}
\]
This paper acknowledges an ideal cluster as a triad with equally weighted edges, corresponding to the original
clustering coefficient for binary networks. The two formulae in the case differentiation above cover the
possible occurrences of the closed triadic relations at hand, with $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ being either
smaller than or equal to, or larger than, the value of the triadic relation without respect to its connecting edge.
For the first case of the case differentiation, the same concept as in section 3.2.1 is used. For the second case,
the reciprocal (cross fracture, $cf$) of the clustering coefficient is taken into account. It covers the occurrence
in which the weighted value assessment of all three edges is larger than that of the evaluated triad without
respect to its connecting edge, while the latter is larger than
zero. An appropriate measurement is then derived for the variable $\bar{w}$ in this case. The mathematical assessment
of the variable $\bar{w}$ for the second case is given as follows. The variables are simplified, with $a$ as the assessed
value of connected triads and $b$ as the assessed value of closed, transitive triads.
For $\sqrt{w_{ij} w_{jk}} = a \geq \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b \geq 0$:
• $C = \frac{\bar{w}}{\bar{v}} = \frac{\sqrt{a \cdot b}}{a}$ with $\bar{w} = \sqrt{a \cdot b}$ and $\bar{v} = a$
For $0 < \sqrt{w_{ij} w_{jk}} = a < \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b$:
• $C_{cf} = \left( \frac{\bar{w}}{\bar{v}} \right)^{-1} = \frac{\bar{v}}{\bar{w}}$ with $\bar{w} = \sqrt{a \cdot b}$ and $\bar{v} = a$
• $C_{cf} = \frac{a}{\sqrt{a \cdot b}} \cdot \frac{a}{a} = \frac{a^{2} \cdot (\sqrt{a \cdot b})^{-1}}{a}$
• $C_{cf} = \frac{\bar{w}}{\bar{v}}$ with $\bar{w} = a^{2} \cdot (\sqrt{a \cdot b})^{-1}$ and $\bar{v} = a$
Clustering Coefficient C:
• $C = \frac{\bar{w}}{\bar{v}}$
  with $\bar{w} = \sqrt{a \cdot b}$ for $\sqrt{w_{ij} w_{jk}} = a \geq \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b \geq 0$
  with $\bar{w} = a^{2} \cdot (\sqrt{a \cdot b})^{-1}$ for $0 < \sqrt{w_{ij} w_{jk}} = a < \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b$
With the proposed formula the desired C-co-ordinate system is achieved. Furthermore, the formula offers macro-
comparability in weighted and/or directed networks and can be used in binary networks, as well.
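The case differentiation of equation [ 15 ] translates directly into code. The sketch below is illustrative only: the names `w_bar` and `clustering_eq15`, and the representation of a directed graph as a dictionary of arcs, are assumptions made for this example. The arc weights are read off the rows of the triad census for the two weighted, directed example networks (B to A, A to C, B to C).

```python
from itertools import permutations
from math import sqrt


def w_bar(a, b):
    """Numerator term of equation [15].

    a = sqrt(w_ij * w_jk), b = cbrt(w_ij * w_jk * w_ik).
    """
    if a <= 0 or b <= 0:
        return 0.0                  # no connected triple, or the triple is not closed
    if a >= b:
        return sqrt(a * b)          # first case
    return a * a / sqrt(a * b)      # second case: a^2 * (sqrt(a * b))^-1


def clustering_eq15(arcs, nodes):
    """Sum of w_bar over sum of v_bar across all ordered triples (i, j, k)."""
    numerator = 0.0
    denominator = 0.0
    for i, j, k in permutations(nodes, 3):
        w_ij = arcs.get((i, j), 0.0)
        w_jk = arcs.get((j, k), 0.0)
        w_ik = arcs.get((i, k), 0.0)
        a = sqrt(w_ij * w_jk)
        b = (w_ij * w_jk * w_ik) ** (1.0 / 3.0)
        numerator += w_bar(a, b)
        denominator += a
    return numerator / denominator if denominator else 0.0


# Hypothetical encodings of the two example networks: (source, target) -> weight.
directed_a = {("B", "A"): 4, ("A", "C"): 3, ("B", "C"): 3}
directed_b = {("B", "A"): 2, ("A", "C"): 3, ("B", "C"): 3}

print(round(clustering_eq15(directed_a, "ABC"), 7))
print(round(clustering_eq15(directed_b, "ABC"), 7))
```

With the second case in place, network b) no longer exceeds 1: its closed triad contributes $a^{2} / \sqrt{a \cdot b}$ instead of $\sqrt{a \cdot b}$, so both results stay between 0 and 1.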
EXAMPLE
For comparative reasons, the same example used in Figure 2 is used here as well (refer to Figure 4). The given
formula now allows for micro-comparability in a macro-comparable setting of weighted and/or directed
networks, which is shown in Table 7. Contrary to Table 6, the clustering coefficient results vary between $C = 0$
and $C = 1$. The value $C = 1$ is given for a fully connected graph with equally weighted edges, comparable to the
original clustering ratio in binary networks from Newman et al. (2001).
Figure 4: Weighted, Directed Networks
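Table 7: Triad Census of Graphs in Figure 4
| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt{w_{ij} w_{jk}}$ | $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | 3 | 3 | 0 | 0 | 0 |
| A | C | B | 3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | 4 | 3 | 3 | 3.464101615 | 3.301927249 | 3.382042507 |
| B | C | A | 3 | 0 | 4 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | 4 | 0 | 0 | 0 | 0 |
Network a): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.9763116$

| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt{w_{ij} w_{jk}}$ | $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | 3 | 3 | 0 | 0 | 0 |
| A | C | B | 3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | 2 | 3 | 3 | 2.449489743 | 2.620741394 | 2.368107175 |
| B | C | A | 3 | 0 | 2 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | 2 | 0 | 0 | 0 | 0 |
Network b): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.9667757$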
FORMULA: STRONGLY BALANCED WEIGHTED AND/OR DIRECTED NETWORKS
As already mentioned, prior literature's inclusion of negative and positive weights in the clustering coefficient
is scarce at best, even though a mathematical solution for disregarding unbalanced triads in
the assessment of cluster formation is easily formulated. Under the strong formulation of balance, triads with
three positive edges count as clusters, as do triads with two negative edges and one positive edge. All
other triads are not acknowledged as clusters. With small adjustments the current formula can account for these
limitations, as seen below.
[ 16 ]
\[
C = \frac{\sum_{i,j,k} \bar{w}}{\sum_{i,j,k} \sqrt[4]{(w_{ij} w_{jk})^{2}}}
\]
The denominator $\bar{v}$ is constructed so that the arithmetic sign is not a decisive factor. The variable $\bar{w}$ reads as
follows:
For $\sqrt[4]{(w_{ij} w_{jk})^{2}} \geq \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}} \geq 0$:
\[
\bar{w} = \sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}}}
\]
For $0 < \sqrt[4]{(w_{ij} w_{jk})^{2}} < \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}}$:
\[
\bar{w} = \left( \sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}}} \right)^{-1} \cdot \left( \sqrt[4]{(w_{ij} w_{jk})^{2}} \right)^{2}
\]
In order to exclude triads that are unbalanced under the strong formulation of balance from being counted as clusters,
the following term was added to the second factor of $\bar{w}$. If $w_{ij} w_{jk} w_{ik}$ is made up of three positive edges or of two
negative edges and one positive edge, then the numerator results in $2 \cdot w_{ij} w_{jk} w_{ik}$. This is then divided by 2,
resulting in $w_{ij} w_{jk} w_{ik}$, equal to the expression used before implementing the aspect of balance. If, however,
$w_{ij} w_{jk} w_{ik}$ has three negative edges or one negative and two positive edges, the numerator is 0, since these
triads are not considered to be clusters. In order to account for negative weights in the rest of the formula,
the paper squares the products and raises the root to the fourth degree, with no impact on the results.
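The sign test described above can be sketched in a few lines. The function names are hypothetical. One assumption is made explicit in the code: when the sign-gated factor is 0, $\bar{w}$ is set to 0 directly, as the triad census for the signed example reports, because the second case of equation [ 16 ] would otherwise divide by zero.

```python
from math import sqrt


def strong_balance_term(w_ij, w_jk, w_ik):
    """Cube root of (P + sqrt(P^2)) / 2, where P is the product of the three weights."""
    product = w_ij * w_jk * w_ik
    # Equals P when P > 0 (three positive, or two negative and one positive edge),
    # and 0 otherwise. max() guards against a tiny negative rounding residue.
    gated = max((product + sqrt(product ** 2)) / 2.0, 0.0)
    return gated ** (1.0 / 3.0)


def w_bar_strong(w_ij, w_jk, w_ik):
    """Numerator term of equation [16] for one ordered triple."""
    a = ((w_ij * w_jk) ** 2) ** 0.25                  # v_bar, independent of sign
    b = strong_balance_term(w_ij, w_jk, w_ik)
    if a <= 0 or b <= 0:
        # Assumption: open or unbalanced triads contribute 0 (avoids division by zero).
        return 0.0
    c = ((w_ij * w_jk * w_ik) ** 2) ** (1.0 / 6.0)    # sixth root used in the case test
    if a >= c:
        return sqrt(a * b)                            # first case
    return a * a / sqrt(a * b)                        # second case


print(w_bar_strong(4, -3, -3))   # balanced: two negative edges, one positive edge
print(w_bar_strong(-2, 3, 3))    # unbalanced: one negative edge
```

Squaring before taking the even root is what keeps `a` and `c` real for negative weights, which mirrors the reasoning in the paragraph above.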
EXAMPLE
The following example includes clusters balanced according to the strong formulation of balance. Network a) is
seen as balanced since the product of the edges is positive. The contrary holds for network b). As seen in
Table 8, the clustering coefficient of network a) is the same as that of network a) from Figure 4. Despite its
transitive triadic closure, network b) gives a clustering coefficient of $C = 0$, because the network is not strongly
balanced.
Figure 5: Signed, Directed, Weighted Networks
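Table 8: Triad Census of Graphs in Figure 5
| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt[4]{(w_{ij} w_{jk})^{2}}$ | $\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | -3 | -3 | 0 | 0 | 0 |
| A | C | B | -3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | 4 | -3 | -3 | 3.464101615 | 3.301927249 | 3.382042507 |
| B | C | A | -3 | 0 | 4 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | 4 | 0 | 0 | 0 | 0 |
Network a): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.9763116$

| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt[4]{(w_{ij} w_{jk})^{2}}$ | $\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | 3 | 3 | 0 | 0 | 0 |
| A | C | B | 3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | -2 | 3 | 3 | 2.449489743 | 0 | 0 |
| B | C | A | 3 | 0 | -2 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | -2 | 0 | 0 | 0 | 0 |
Network b): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.0000000$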
FORMULA: WEAKLY BALANCED WEIGHTED AND/OR DIRECTED NETWORKS
Similar to the clustering coefficient in the strongly balanced networks the adjustment is a mere tweak to allow for
the occurrence of three negative edges in a cluster. The proposed solution is as follows.
[ 17 ]
\[
C = \frac{\sum_{i,j,k} \bar{w}}{\sum_{i,j,k} \sqrt[4]{(w_{ij} w_{jk})^{2}}}
\]
The denominator remains the same. The variable $\bar{w}$ is read as follows.
For $\sqrt[4]{(w_{ij} w_{jk})^{2}} \geq \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}} \geq 0$:
\[
\bar{w} = \sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \left( \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}} - \sqrt[3]{\frac{\left(w_{ij} - \sqrt{(w_{ij})^{2}}\right) \cdot \left(w_{jk} - \sqrt{(w_{jk})^{2}}\right) \cdot \left(w_{ik} - \sqrt{(w_{ik})^{2}}\right)}{8}} \right)}
\]
For $0 < \sqrt[4]{(w_{ij} w_{jk})^{2}} < \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}}$:
\[
\bar{w} = \left( \sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \left( \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}} - \sqrt[3]{\frac{\left(w_{ij} - \sqrt{(w_{ij})^{2}}\right) \cdot \left(w_{jk} - \sqrt{(w_{jk})^{2}}\right) \cdot \left(w_{ik} - \sqrt{(w_{ik})^{2}}\right)}{8}} \right)} \right)^{-1} \cdot \left( \sqrt[4]{(w_{ij} w_{jk})^{2}} \right)^{2}
\]
To the right of the previously discussed adaptation, an expansion is added. With it, the equation tests whether all
edges are negative. If they are, each bracket of the expansion results in $-2|w|$. The numerator of this
expansion then reads $(-2|w_{ij}|) \cdot (-2|w_{jk}|) \cdot (-2|w_{ik}|)$. This is divided by 8, which leaves the
cube root of $-|w_{ij} w_{jk} w_{ik}|$. This negative value is then subtracted from 0; the minuend is 0 because
triads with three negative edges are excluded there. Subtracting the negative value leaves the geometric mean
of the absolute edge weights, $\sqrt[3]{|w_{ij} w_{jk} w_{ik}|}$. If any one of the edges is positive, the subtrahend
results in 0. The minuend then simply checks whether only one negative edge exists in the triad (which also results
in 0) or whether one or three positive edges appear. If so, the same steps apply as formulated for the clustering
coefficient in strongly balanced networks.
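The interplay of minuend and subtrahend can be checked numerically. The following sketch evaluates only the bracketed difference of equation [ 17 ]. The helper `signed_cbrt` is an implementation detail assumed here, since Python's `**` operator returns a complex number for a negative base with a fractional exponent.

```python
from math import sqrt


def signed_cbrt(x):
    """Real cube root that keeps the sign of x."""
    return -((-x) ** (1.0 / 3.0)) if x < 0 else x ** (1.0 / 3.0)


def weak_balance_term(w_ij, w_jk, w_ik):
    """Bracketed difference in equation [17] for one ordered triple."""
    product = w_ij * w_jk * w_ik
    # Minuend: cube root of P if P > 0, otherwise 0 (same gate as in equation [16]).
    minuend = signed_cbrt((product + sqrt(product ** 2)) / 2.0)
    # Subtrahend: each factor w - sqrt(w^2) is 0 for w >= 0 and 2w for w < 0,
    # so the product is non-zero only if all three edges are negative.
    subtrahend = signed_cbrt(
        (w_ij - sqrt(w_ij ** 2))
        * (w_jk - sqrt(w_jk ** 2))
        * (w_ik - sqrt(w_ik ** 2))
        / 8.0
    )
    return minuend - subtrahend


print(weak_balance_term(-4, -3, -3))   # three negative edges: weakly balanced
print(weak_balance_term(4, -3, -3))    # two negative edges: balanced
print(weak_balance_term(-2, 3, 3))     # one negative edge: not balanced
```

For the first two triads the term evaluates to the cube root of 36, and for the third it evaluates to 0, in line with the behaviour described above.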
EXAMPLE
In the following example network a) is weakly balanced and therefore clusterable. In comparison the result is
equal to the other networks a) with three positive edges (Figure 4) and one positive edge (Figure 5). Network b)
in the following example is not balanced and therefore not a cluster.
Figure 6: Signed, Directed, Weighted Networks
Table 9: Triad Census of Graphs in Figure 6
| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt[4]{(w_{ij} w_{jk})^{2}}$ | $\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}} - \sqrt[3]{\frac{(w_{ij} - \sqrt{(w_{ij})^{2}}) \cdot (w_{jk} - \sqrt{(w_{jk})^{2}}) \cdot (w_{ik} - \sqrt{(w_{ik})^{2}})}{8}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | -3 | -3 | 0 | 0 | 0 |
| A | C | B | -3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | -4 | -3 | -3 | 3.464101615 | 3.301927249 | 3.382042 |
| B | C | A | -3 | 0 | -4 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | -4 | 0 | 0 | 0 | 0 |
Network a): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.9763116$

| $i$ | $j$ | $k$ | $w_{ij}$ | $w_{jk}$ | $w_{ik}$ | $\bar{v} = \sqrt[4]{(w_{ij} w_{jk})^{2}}$ | $\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}} - \sqrt[3]{\frac{(w_{ij} - \sqrt{(w_{ij})^{2}}) \cdot (w_{jk} - \sqrt{(w_{jk})^{2}}) \cdot (w_{ik} - \sqrt{(w_{ik})^{2}})}{8}}$ | $\bar{w}$ |
|---|---|---|---|---|---|---|---|---|
| A | B | C | 0 | 3 | 3 | 0 | 0 | 0 |
| A | C | B | 3 | 0 | 0 | 0 | 0 | 0 |
| B | A | C | -2 | 3 | 3 | 2.449489743 | 0 | 0 |
| B | C | A | 3 | 0 | -2 | 0 | 0 | 0 |
| C | A | B | 0 | 0 | 0 | 0 | 0 | 0 |
| C | B | A | 0 | -2 | 0 | 0 | 0 | 0 |
Network b): $C = \sum_{i,j,k} \bar{w} / \sum_{i,j,k} \bar{v} = 0.0000000$
Equation [ 17 ] concludes the formula extension proposed by this paper. With this expansion, clustering results
can be computed in all types of networks, i.e. binary, weighted, directed and/or signed networks. This alleviates
prior literature's limitation of macro-comparability and thereby supports the second hypothesis of this paper.
3.4. SUMMARY
By alleviating the shortcomings found in prior research and combining their findings, the proposed clustering
coefficient is considered to be improved. The aspect of relational variance among triads is taken into
consideration, which offers micro-comparability within its results. Furthermore, the formula extension acknowledges
the different types of networks in which the formula can be implemented. This acknowledgement delivers
macro-comparability in its result and can therefore be used to assess the formation of clusters in terms of triadic closure
in binary, directed, weighted, and/or signed networks. While the toy examples above support the theoretical
concept, an analysis of a real-world network supports the findings in a more practical context. Therefore, the
following section is dedicated to applying this formula in a real-world network and comparing its results to the
results of formulae from prior research.
4. CLUSTERING COMPARISON
4.1. DATA SET
The data set used to assess the comparisons of the clustering coefficient formulae is from the well-known Zachary
Karate Club network (Zachary 1977). The network is binary and consists of 34 nodes and 78 symmetrical edges.
The data set remains unchanged for the comparison in binary networks and is adjusted accordingly to allow for
comparisons in directed and weighted networks. Figure 7 shows the structure of the network.
Figure 7: Zachary Karate Club Graph (Zachary 1977)
4.2. COMPARISON IN UNWEIGHTED NETWORKS
Table 10: Zachary Karate Club Adjacency Matrix (Zachary 1977)
The matrix above is the adjacency matrix of the Zachary Karate Club data set (Zachary 1977). It is the
data set used in the following comparison of an undirected, unweighted network. The clustering coefficient
rendered by this paper equals $C_{\text{Rubino}} = 0.22277$. The comparable coefficient for binary networks from
Newman et al. (2001) provides the same result of $C_{\text{Newman et al.}} = 0.22277$. This is
expected, since the newly proposed formula, simplified for binary networks, is merely the weighted values of
triangles over the weighted values of open triads. In the following, the matrix for a comparison in directed
unweighted networks is rendered.
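The reduction to the binary case can be made explicit. The following short derivation is a sketch under the assumption that every weight is either 0 or 1 and that equation [ 15 ] is summed over ordered triples. In that setting the first case of the case differentiation always applies, because $\sqrt{w_{ij} w_{jk}}$ can never be smaller than $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$.

```latex
\begin{align*}
w_{ij}, w_{jk}, w_{ik} \in \{0, 1\} \;&\Rightarrow\;
\bar{v} = \sqrt{w_{ij} w_{jk}} = w_{ij} w_{jk}, \qquad
\sqrt[3]{w_{ij} w_{jk} w_{ik}} = w_{ij} w_{jk} w_{ik}, \\
\bar{w} &= \sqrt{\bar{v} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}} = w_{ij} w_{jk} w_{ik}, \\
C &= \frac{\sum_{i,j,k} w_{ij} w_{jk} w_{ik}}{\sum_{i,j,k} w_{ij} w_{jk}}
   = \frac{\text{number of closed ordered triples}}{\text{number of connected ordered triples}}.
\end{align*}
```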
Table 11: Unweighted, Directed Network Based on the Zachary Karate Club
The matrix above is the same as in Table 10, with the exception that the matrix is no longer symmetric. Here, the
values below the diagonal were removed or left alone at random. By doing so, the structure of the graph remains
the same, in that all nodes that were originally connected to one another still are. The property of directedness can
now, however, be assessed. The clustering coefficient based on this paper's proposal is $C_{\text{Rubino}} = 0.19914$. The
comparable formula for directed networks is given by Opsahl and Panzarasa (2009). Their result
$C_{\text{Opsahl, Panzarasa}} = 0.19914$ is identical to the result found in this paper, given the lack of relational variance among
the triads. Because the network is unweighted, all edges are seen as identical to one another. The impact
of the formula proposed in this paper first becomes recognizable when weights are attributed to the edges. This is
provided in the following section.
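The construction of the directed matrix can be sketched as follows. The source only states that values below the diagonal were removed or left alone at random, so the removal probability (`keep_probability`), the seed, and the function name `randomly_direct` are assumptions made for this illustration.

```python
import random


def randomly_direct(adjacency, keep_probability=0.5, seed=0):
    """Copy a symmetric 0/1 matrix and drop each below-diagonal entry at random."""
    rng = random.Random(seed)
    n = len(adjacency)
    directed = [row[:] for row in adjacency]    # leave the input matrix untouched
    for i in range(n):
        for j in range(i):                      # entries below the diagonal
            if directed[i][j] and rng.random() >= keep_probability:
                # The mirrored entry above the diagonal stays,
                # so the two nodes remain connected in one direction.
                directed[i][j] = 0
    return directed


# Toy symmetric matrix (hypothetical data, not the karate club matrix).
symmetric = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
for row in randomly_direct(symmetric, seed=1):
    print(row)
```

Because only one of the two mirrored entries can be removed, every pair of nodes that was connected in the symmetric matrix is still connected afterwards.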
4.3. COMPARISON IN WEIGHTED NETWORKS
Table 12: Weighted, Undirected Network Based on the Zachary Karate Club
The above matrix is based on the matrix of the Zachary Karate Club (Zachary 1977). Here, however, the
adjacencies are multiplied by a random factor between 1 and 10, resulting in the assessed weights. The paper
assesses the weighted matrix with a clustering coefficient of $C_{\text{Rubino}} = 0.21565$. The clustering coefficient by
Opsahl and Panzarasa (2009) provides a result of $C_{\text{Opsahl, Panzarasa}} = 0.25861$.
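The weighting step can be sketched in the same way. The source does not say whether the random factor is an integer or a real number, nor which seed was used. The sketch below assumes integer factors drawn uniformly from 1 to 10 and keeps the matrix symmetric, as required for an undirected network. The name `randomly_weight` is illustrative.

```python
import random


def randomly_weight(adjacency, low=1, high=10, seed=0):
    """Multiply each edge of a symmetric 0/1 matrix by a random integer factor."""
    rng = random.Random(seed)
    n = len(adjacency)
    weighted = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):               # visit each undirected edge once
            if adjacency[i][j]:
                factor = rng.randint(low, high)  # assumption: integer, bounds inclusive
                weighted[i][j] = weighted[j][i] = adjacency[i][j] * factor
    return weighted


# Toy symmetric matrix (hypothetical data, not the karate club matrix).
symmetric = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
for row in randomly_weight(symmetric, seed=1):
    print(row)
```

Assigning the same factor to both mirrored entries preserves symmetry. The zero pattern of the original adjacency matrix is unchanged, so the structure of the graph stays the same.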
In comparison to the unweighted, undirected version of the matrix, the clustering results according to this paper
decrease, since the weighted version offers a case of relational variance among the triads. This deems the clustered
triads non-ideal, since the three weights of a given triangle are not equal, as they are in the
unweighted matrix, in which each weight is valued as $w = 1$. Hence, the assessment in the weighted, undirected
network delivers smaller results. On the contrary, the results according to Opsahl and Panzarasa (2009) increase
when implementing weights.
Because the aspect of directedness is not yet added, it is of interest to also compare the results based on the formula
by Phan et al. (2013). This returns a result of $C_{\text{Phan et al.}} = 0.23537$, which is also higher than the clustering result
according to this paper. Because the proposed formula moderates the result by taking the aspect of relational
variance into consideration, the coefficient is smaller than the results from formulae in prior literature.
Table 13: Weighted, Directed Network Based on the Zachary Karate Club
The weight matrix above not only considers the aspect of differently weighted edges, but also incorporates the
aspect of directedness added in the previous section of this paper. This paper determines the clustering coefficient
of the weighted, directed graph to be $C_{\text{Rubino}} = 0.18714$. In comparison to the clustering coefficient result according
to Opsahl and Panzarasa (2009), namely $C_{\text{Opsahl, Panzarasa}} = 0.22569$, the result is once again smaller.
In comparison to the unweighted, directed version of the matrix, the clustering results according to this paper
decrease, since the weighted version offers a case of relational variance among the triads. This deems the clustered
triads non-ideal, since the three weights of a given triangle are not equal, as they are in the merely
directed matrix, where each weight is assessed as $w = 1$. Hence, the assessment in the weighted, directed network
also delivers smaller results. On the contrary, the results according to Opsahl and Panzarasa (2009) increase when
implementing weights, distorting their interpretation.
As foreseen, due to the moderation of the formula for the case of relational variance, the proposal offered in this
paper yields a smaller value than those found in prior literature, thereby supporting this paper's third hypothesis.
5. CONCLUSION
5.1. LIMITATIONS
While improvements of the clustering coefficient are made, this paper is still subjected to certain limitations. First
of all, the data set used is relatively small with only 34 nodes. In a more extensive study the formula could be
tested more thoroughly. Furthermore, the data sets for directed and weighted networks are only based on the real
network of the Zachary Karate Club. The relational properties within were randomized. The formula can be
validated more efficiently in the analysis of real-world weighted and directed networks.
Secondly, including signed networks in social network analyses is not common. The data collection process on a
personal level, e.g. in the form of questionnaires or surveys, often shies away from assessing negative relations.
However, while signed networks are not common today, advancements in technology and the automatic
assessment of negative relations point to a future need for them. Only time will tell whether this prognosis
is valid.
5.2. FUTURE IMPLICATIONS
The proposed formula offered combines the advancements found in prior research of the clustering coefficient
and alleviates the equations of their shortcomings. However, the consideration of micro-comparability and macro-
comparability embodies the true advancement in this paper in regard to future implications.
Addressing the limitation of micro-comparability alleviates the need for using cut-off measures to eliminate
insignificant values. By doing so, comparisons between networks can be assessed more precisely and all gathered
data can be implemented. The insignificant cut-off values that would have been disregarded in the past now
provide an accordingly insignificant increase or decrease in the clustering coefficient result. This not only saves
time in the network analysis but also provides a more qualitative result. For example, two graphs, completely
identical with the exception of one insignificant edge, can be compared, resulting in an almost identical yet still
comparable result. Furthermore, the aspect of micro-comparability allows researchers to directly assess the
clustering in completely connected graphs.
In regard to the consideration of macro-comparability in the formula, future researchers can gather all data needed
without worrying about having the necessary mathematical measures to analyze said data. Moreover, the analysis
can be conducted in all types of graphs regardless of the relational properties, thereby offering comparability
between them. Especially given the assessment of negative edges in online social networking sites, it is
foreseeable that this formula will be used in such environments, thereby depicting the clustering of these networks
more precisely. For example, the number of dislikes and likes on YouTube comments could be used to assess the
clustering of YouTubers in this directed, weighted, and signed network.
5.3. SUMMARY
Along the lines of this paper, prior literature of the clustering coefficient is reviewed. Thereby, the limitations in
the form of a lack of combinations between the coefficients as well as a lack of micro- and macro-comparability
are uncovered. These are addressed and a formula expansion is proposed to overcome said limitations. The
formula is tested on the real-world network of the Zachary Karate Club (Zachary 1977) and the results show a
more precise clustering assessment than the clustering coefficients discovered in prior literature. The proposed
findings imply alleviating the formula of cut-off measures, assessing the clustering formation in completely
connected networks, and assessing the clustering in all types of networks regardless of their structural and
relational characteristics.
6. PUBLICATION BIBLIOGRAPHY
Barnes, J. A. (1969): Networks and political process. In Social networks in urban situations, pp. 51–76.
Barrat, A.; Barthelemy, M.; Pastor-Satorras, R.; Vespignani, A. (2004): The architecture of complex weighted networks. In Proceedings of the National Academy of Sciences of the United States of America 101 (11), pp. 3747–3752.
Berkhin, P. (2006): A Survey of Clustering Data Mining Techniques. In Grouping multidimensional data, pp. 25–71.
Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D. (2006): Complex networks. Structure and dynamics. In Physics Reports 424 (4-5), pp. 175–308.
Davis, J. A. (1967): Clustering and structural balance in graphs. In Human Relations 20, pp. 181–187.
Engø-Monsen, K.; Canright, G. (2011): Weighted Clustering Coefficients. Telenor. Oslo (Telenor Report R5).
Fagiolo, G. (2007): Clustering in complex directed networks. In Physical Review E 76 (2), p. 26107.
Fortunato, S. (2010): Community detection in graphs. In Physics Reports 486 (3-5), pp. 75–174.
Girvan, M.; Newman, M. E. J. (2002): Community structure in social and biological networks. In Proceedings of the National Academy of Sciences of the United States of America 99 (12), pp. 7821–7826.
Grindrod, P. (2002): Range-dependent random graphs and their application to modeling large small-world proteome datasets. In Physical Review E 66 (6), p. 66702.
Heider, F. (1946): Attitudes and cognitive organization. In Journal of Psychology 21, pp. 107–112.
Holland, P. W.; Leinhardt, S. (1971): Transitivity in structural models of small groups. In Comparative Group Studies, pp. 107–124.
Holme, P.; Park, S. M.; Kim, B. J.; Edling, C. R. (2007): Korean university life in a network perspective. Dynamics of a large affiliation network. In Physica A: Statistical Mechanics and its Applications 373, pp. 821–830.
Kalna, G.; Higham, D. J. (2007): A clustering coefficient for weighted networks, with application to gene expression data. In Ai Communications 20 (4), pp. 263–271.
Kephart, W. M. (1950): A Quantitative Analysis of Intragroup Relationships. In American Journal of Sociology 55 (6), pp. 544–549.
Kivelä, M.; Arenas, A.; Barthelemy, M.; Gleeson, J. P.; Moreno, Y.; Porter, M. A. (2014): Multilayer networks. In Journal of Complex Networks 2 (3), pp. 203–271.
Kunegis, J.; Lommatzsch, A.; Bauckhage, C. (2009): The Slashdot Zoo. Mining a Social Network with Negative Edges. In Proceedings of the 18th international conference on World wide web ACM, pp. 741–750.
Latora, V.; Marchiori, M. (2003): Economic small-world behavior in weighted networks. In The European Physical Journal B - Condensed Matter 32 (2), pp. 249–263.
Newman, M. E. J. (2001): The structure of scientific collaboration networks. In Proceedings of the National Academy of Sciences 98 (2), pp. 404–409.
Newman, M. E. J.; Strogatz, S. H.; Watts, D. J. (2001): Random graphs with arbitrary degree distributions and their applications. In Physical Review E 64 (2), p. 26118.
Onnela, J. P.; Saramäki, J.; Kertész, J.; Kaski, K. (2005): Intensity and coherence of motifs in weighted complex networks. In Physical Review E 71 (6), p. 65103.
Opsahl, T.; Panzarasa, P. (2009): Clustering in weighted networks. In Social Networks 31 (2), pp. 155–163.
Phan, B.; Engø-Monsen, K.; Fjeldstad, Ø. D. (2013): Considering clustering measures. Third ties, means, and triplets. In Social Networks 35 (3), pp. 300–308.
Saramäki, J.; Kivelä, M.; Onnela, J. P.; Kaski, K.; Kertész, J. (2007): Generalizations of the clustering coefficient to weighted complex networks. In Physical Review E 75 (2), p. 27105.
Schank, T.; Wagner, D. (2005): Approximating Clustering Coefficient and Transitivity. In Journal of Graph Algorithms and Applications 9 (2), pp. 265–275.
Squartini, T.; Fagiolo, G.; Garlaschelli, D. (2011): Randomizing world trade. II. A weighted network analysis. In Physical Review E 84 (4), p. 46118.
Szell, M.; Lambiotte, R.; Thurner, S. (2010): Multirelational organization of large-scale social networks in an online world. In Proceedings of the National Academy of Sciences of the United States of America 107 (31), pp. 13636–13641.
Szell, M.; Thurner, S. (2012): Social dynamics in a large-scale online game. In Advances in Complex Systems 15 (6), p. 1250064.
Tabak, B. M.; Takami, M.; Rocha, J. M. C.; Cajueiro, D. O. (2014): Directed clustering coefficient as a measure of systemic risk in complex banking networks. In Physica A: Statistical Mechanics and its Applications 394, pp. 211–216.
Wasserman, S. (1994): Social network analysis. Methods and applications. In Cambridge university press 8, pp. 165–243.
Watts, D. J.; Strogatz, S. H. (1998): Collective dynamics of 'small-world' networks. In Nature 393 (6684), pp. 440–442.
Zachary, W. W. (1977): An information flow model for conflict and fission in small groups. In Journal of Anthropological Research 33, pp. 452–473.
Zhang, B.; Horvath, S. (2005): A general framework for weighted gene co-expression network analysis. In Statistical applications in genetics and molecular biology 4 (1), Article 17.
7. INDEX
7.1. LIST OF TABLES
Table 1: Triadic Value Assessment (Opsahl, Panzarasa 2009)
Table 2: Clustering in Terms of the Formulation of Balance (Szell et al. 2010)
Table 3: Opsahl, Panzarasa's (2009) Clustering Coefficient Denominator Differences
Table 4: Micro-Comparability of Clustered Triads
Table 5: Triad Census of Graphs in Figure 1
Table 6: Triad Census of Graphs in Figure 2
Table 7: Triad Census of Graphs in Figure 4
Table 8: Triad Census of Graphs in Figure 5
Table 9: Triad Census of Graphs in Figure 6
Table 10: Zachary Karate Club Adjacency Matrix (Zachary 1977)
Table 11: Unweighted, Directed Network Based on the Zachary Karate Club
Table 12: Weighted, Undirected Network Based on the Zachary Karate Club
Table 13: Weighted, Directed Network Based on the Zachary Karate Club
7.2. LIST OF FIGURES
Figure 1: Weighted, Undirected Networks
Figure 2: Weighted, Directed Networks
Figure 3: Co-Ordinate Systems of C-Curves
Figure 4: Weighted, Directed Networks
Figure 5: Signed, Directed, Weighted Networks
Figure 6: Signed, Directed, Weighted Networks
Figure 7: Zachary Karate Club Graph (Zachary 1977)
8. AFFIDAVIT
Statutory Declaration:
I hereby declare, pursuant to § 17 (2) APO, that I have written the above bachelor thesis independently
and have used no sources or aids other than those indicated.
________________ ________________________________
(Date) (Signature)

Rubino, Nicholas

1. INTRODUCTION

The formation of clusters within a social network is not a new observation. In fact, many see it as one of the earliest features for measuring a network’s morphology, with approaches dating back to the mid-20th century (Barnes 1969). In the current stream of research, however, the assessment of graph clustering has become particularly popular. Based on the clustering coefficient $C$ developed by Watts and Strogatz (1998) as well as the one by Newman et al. (2001), various extensions have been developed to depict the clustering ratio more precisely. While these extensions have improved the understanding of the ways individuals are connected in various networks, their development has evolved in a chaotic manner (Fortunato 2010). Several approaches to assessing the clustering of graphs can be found in recent literature; a single approach, however, has yet to be adopted. This paper considers a possible rationale for this based on three main limitations of prior research, which are explained in the following section.

1.1. PROBLEM STATEMENT

The aforementioned clustering coefficients, which prompted the wave of adaptations in network literature, were originally intended for binary networks. This means that the assessment of graph clustering acknowledges only the structural properties of the graph, i.e. whether a relationship between the network’s members exists or does not exist, as seen in the clustering coefficient of Newman et al. (2001). This equation gives the tendency of three members of a network to be connected by three different relationships as opposed to being connected by only two relationships. The latter implies that a connection between two of the members does not exist. This ratio, pertaining to the structural properties of a network, can be seen as the ideal foundation for the developments that follow. As time passed, relational properties were added to the mix, e.g. by attributing weights or directedness to the relationships in the network. While great advancements have been made, this topic is still in its research infancy, since each of the respective developments comes with many limitations.

The various clustering coefficients presented in prior literature build on earlier findings to deliver a more precise ratio of triadic closure over triadic connectedness. This ratio delivers the tendency that “the friend of my friend is my friend”. However, the advancements found in prior literature and the respective shortcomings of each formula are distributed unevenly. Different equations offer different solutions to different problems, but do not always incorporate the solutions given by other findings. Hence, each clustering coefficient found in prior research retains limitations that have already been overcome elsewhere in prior research. By combining selected clustering coefficient formulae, the known limitations can be set aside and a synergy of advancements can be achieved.

While the lack of combination of the coefficients represents the first limitation summarized in this paper, the lack of micro-comparability is the second. Micro-comparability in terms of the clustering coefficient acknowledges that a clustering coefficient does not always allow the same graph to be compared under different relational properties. The issue arises because prior literature assesses a completely connected graph with a clustering coefficient value of $C = 1$, resulting in a meaningless assessment of the cluster formation in completely connected graphs and a distorted assessment of triads with a large relational variance. By adapting the formula to take relational variance into consideration, the results can provide micro-comparability within the clustered triads. Furthermore, the issue of macro-comparability is given, as well.
This means that the clustering coefficient is unable to compare different social networks irrespective of their relational and structural properties. While great advancements have been made to incorporate the relational properties of weightedness and directedness, current literature has yet to provide a clustering coefficient that can be used in binary, weighted, directed and/or signed networks. To overcome the limitation of macro-comparability, the formula can be adjusted so that it can be used in all types of networks regardless of their relational or structural properties.

In summary, due to the chaotic development of the clustering coefficient, it is vital to collect, organize and compare the clustering coefficients of prior literature in order to capture said development, determine research trends, and address limitations. The discovered limitations should then be resolved in the form of a formula adjustment.

1.2. RESEARCH QUESTIONS & HYPOTHESES

By reviewing these aspects within the scope of this paper, an answer to this problem statement is to be found, in particular with the aim of answering the following research questions:

Q1: How has the clustering coefficient developed over time?
Q2: Can the formula be improved to incorporate micro- and macro-comparability?
Q3: How would such a formula compare to the formulae found in research literature?

This paper sets out to answer Q1 by reviewing prior research on the clustering coefficient. Furthermore, this paper proposes a formula extension as an answer to Q2, for which the following hypotheses are stated:

H1: The proposed formula distinguishes between clusters in terms of relational variance, thus offering micro-comparability.
H2: The proposed formula can be used in binary, weighted, directed, and/or signed networks, thus offering macro-comparability.

Within the scope of this paper, the proposed formula is compared to those found along this research stream in order to provide an answer to Q3. Because the proposed formula should exclude or reduce aspects of the formulae found in previous literature, the following is foreseeable:

H3: The newly developed clustering coefficient should yield a smaller value than those previously found in research literature.

With regard to this paper’s research intent, the following section presents an outline of the paper at hand.

1.3. STRUCTURE OF THIS PAPER

Due to very recent developments of the clustering coefficient, a literature review of prior research is appropriate and is presented in the first part of this paper. Preliminary knowledge on network theory is noted briefly and insight into triadic relations is given. Subsequently, a holistic review of prior research regarding the clustering coefficient is presented, in which the term clustering is clarified, its development is documented, research trends are rendered and its limitations are addressed. The second part of the paper consists of the formula extension. The gathered research on the clustering coefficient is critically evaluated and modified according to the reviewed limitations, thereby proposing a new clustering coefficient that overcomes the limitations of micro- and macro-comparability. In the third part of the paper, the new formula is compared, to a limited extent, to clustering coefficients of the past. Finally, a section is reserved for topics of discussion, such as known limitations and future implications.

1.4. PURPOSE OF THIS PAPER

By critically evaluating prior research with regard to the adaptation of the clustering coefficient formula, this paper aims to provide a general understanding of said formula as well as an overview of its most recent development. By doing so, it contributes to social network research as well as to the research of network theory in general. Moreover, it is desired to extend current research by comparing these latest developments.
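The structural ratio described in Section 1.1, i.e. three nodes joined by three relationships relative to three nodes joined by at least two, can be illustrated with a minimal sketch. It assumes an undirected binary graph stored as a dictionary of neighbour sets; the function names and the example graphs are hypothetical and are not taken from the reviewed literature.

```python
from itertools import combinations


def global_clustering(adj):
    """Share of connected triples that are closed (undirected, binary graph).

    adj maps each node to the set of its neighbours. Every pair of
    neighbours of a centre node forms one connected triple; the triple is
    closed if the two neighbours are themselves connected.
    """
    closed = 0
    triples = 0
    for centre, neighbours in adj.items():
        for a, b in combinations(sorted(neighbours), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def complete_graph(n):
    """Hypothetical helper: every node is connected to every other node."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


# Illustrative graphs (assumed data, not from the thesis).
path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle_with_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

print(global_clustering(complete_graph(5)))   # every triple is closed
print(global_clustering(path))                # the single triple is open
print(global_clustering(triangle_with_tail))  # 3 closed out of 5 triples
```

Because each triangle is counted once per centre node, the numerator equals three times the number of triangles. In a completely connected graph every triple is closed, so the ratio is 1 regardless of any weights the edges might carry.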
Prior research assigns a fully connected network a clustering coefficient of $C = 1$, which results either in a meaningless analysis of the clustering within smaller networks, e.g. smaller companies where everyone knows everyone, or in the necessity to manipulate data by performing cut-off measures. The proposed formula intends to free the equation from this limitation by differentiating between clusters with differently weighted edges and equally weighted ones, thereby offering micro-comparability. This paper, therefore, offers a new perspective on assessing cluster formation, which can spark interest among fellow social network researchers to address this old concept in a new light.

Additionally, the research at hand presents a new approach to the clustering coefficient, which ideally can act as a framework for measuring the clustering of a network regardless of its characteristics, i.e. weighted networks, directed networks, and networks that include positive and negative weights. By doing so, it is plausible that new ideas on older and current theories will emerge, thus expanding the research in social network analysis. Because the clustering coefficient differs based on the network at hand, i.e. different equations are used for assessing the formation of clusters in different networks, this paper intends to provide the means for comparability across different types of networks, i.e. macro-comparability. Overall, the proposed formula is, to the best of this paper’s knowledge, the first of its kind to deliver clustering results in binary, weighted, directed, and signed networks.
Moreover, it is also desired that social network researchers further the presented review by excluding or minimizing the limitations within this paper, by evaluating other network measurement formulae with regard to their fit in real-world environments, or even by developing this formula within a more extensive empirical study. Such work could give better insight into cluster formations in specific environments or into other aspects of mathematical analysis along the lines of this study.

2. LITERATURE REVIEW

Since the clustering coefficient has evolved rapidly within the past several years, a review of prior research literature is necessary in order to understand its origin and the changes already implemented. To do so, it is vital to understand the basic notation and concepts used in network theory. Moreover, insight into relevant triadic relations is given. This preliminary understanding is presented in the following and can be used to identify terms and variables used throughout the paper. After this preliminary section, the review of prior research with specific regard to the clustering coefficient is presented, whereby its definition is clarified and its development is documented. In addition, the limitations within this literature review are addressed.

2.1. PRELIMINARIES

NETWORK THEORY

A graph $G$ consists of a set of $N = \{n_1, n_2, \ldots, n_n\}$ nodes (vertices, points, or actors), a set of $L = \{l_1, l_2, \ldots, l_n\}$ links (edges, lines, or ties), and a set of $W = \{w_1, w_2, \ldots, w_n\}$ values (weights). The weights attributed to each of the edges of graph $G$ can also be arranged in the weight matrix $W$, in which the entry $w_{ij}$ describes the weight of the edge between $n_i$ and $n_j$ (Boccaletti et al. 2006). The adjacency matrix $A$ is the weight matrix for binary networks, where only values of $a = 0$ or $a = 1$ are permitted.
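The relation between the weight matrix $W$ and the adjacency matrix $A$ can be made concrete with a short sketch. The three-node matrix below is assumed example data, and the function name is hypothetical; the sketch only illustrates that $A$ records the existence of an edge while $W$ additionally records its weight.

```python
# Hypothetical weight matrix W of a three-node graph (assumed example data).
# W[i][j] holds the weight w_ij of the edge between node i and node j;
# a value of 0 means that no edge exists.
W = [
    [0, 2, 0],
    [2, 0, 3],
    [0, 3, 0],
]


def adjacency_from_weights(weights):
    """Reduce a weight matrix to a binary adjacency matrix.

    Every non-zero weight becomes 1 (an edge exists) and every zero
    stays 0 (no edge), so only structural information is kept.
    """
    return [[1 if w != 0 else 0 for w in row] for row in weights]


A = adjacency_from_weights(W)

for row in A:
    print(row)
print(W[1][2])  # weight of the edge between the second and third node
```

Indices in the sketch start at 0, as is usual in Python, whereas the text numbers nodes from 1.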
A graph can generally embody four different structures: undirected and unweighted (binary), undirected and weighted, unweighted and directed, as well as weighted and directed. Weighted graphs have edges weighted with any numeric value. Directed graphs have an asymmetrical weight matrix. Here, the order of the subscripts of the weight is important, as it describes the direction to which the weight applies, e.g. the weight $w_{ij}$ describes the weighted edge going from node $n_i$ to node $n_j$. Typically in network research the prerequisite that $i \neq j \neq k$ is given, which is also adopted in this paper. Signed graphs can include both positive and negative weights. If two nodes are connected, they are neighbors or adjacent, with $k_i$ being the number of neighbors that node $n_i$ has, also known as the node degree (Kivelä et al. 2014).

While the definition of relational and structural characteristics of a graph differs throughout literature, this paper defines them as follows. Structural properties of a graph focus on the existence or non-existence of an edge connecting two nodes. Relational properties focus on the relational characteristics of this edge, e.g. its weightedness and directedness. With this preliminary knowledge on network structures and the relations therein, the specific relation of triads is explained in the following section.

TRIADIC RELATIONS

BINARY NETWORKS: TRIADIC CLOSURE

Triads or triples illustrate the relationships between a set of three nodes. A triad is connected or open if the three nodes are connected by two edges with weights $w > 0$. If this is the case, then the nodes are neighbors or adjacent. The triad is completely connected or closed if a triangle is formed, i.e. three edges connect the three nodes (Kivelä et al. 2014). In binary networks this closure is sufficient for assessing the formation of a cluster, as the existence of the relevant edges between the nodes suffices. However, when adding the relational property of directedness, the transitivity needs to be assessed before determining whether the triad is closed.

DIRECTED NETWORKS: TRIADIC TRANSITIVITY

Holland and Leinhardt (1971) propose transitivity to be the key structural concept in the analysis of sociometric data. A closed triad as defined by Wasserman (1994) is transitive if, whenever $l_{ij}$ and $l_{jk}$ are present, then so is $l_{ik}$. In physical terms this portrays a link chain from $n_i$ to $n_k$ through $n_j$ and connects $n_i$ and $n_k$ with a non-vacuous link from the perspective of $n_i$.
A non-vacuous connection is an out-going link from a focal actor’s perspective (Wasserman 1994). In any given set of three nodes there are six triadic relations. For example, the open triadic relations among the three nodes 𝑛1, 𝑛2, and 𝑛3 consist of the following six: 𝑙12 𝑙23 βˆ’ 𝑙13 𝑙32 βˆ’ 𝑙21 𝑙13 βˆ’ 𝑙23 𝑙31 βˆ’ 𝑙31 𝑙12 βˆ’ 𝑙32 𝑙21 , which depict the first condition of Wasserman’s transitivity definition. The second condition involves implementing the third edge connected in a non-vacuous manner with the focal actor being the starting node 𝑛, displayed as such: 𝑙12 𝑙23 𝑙13 βˆ’ 𝑙13 𝑙32 𝑙12 βˆ’ 𝑙21 𝑙13 𝑙23 βˆ’ 𝑙23 𝑙31 𝑙21 βˆ’ 𝑙31 𝑙12 𝑙32 βˆ’ 𝑙32 𝑙21 𝑙31. While the transitivity definition helps assess triadic closure in directed networks, determining the weight value is still open. The various assessment measures are given in the following section. WEIGHTED NETWORKS: TRIADIC VALUE When implementing these six triadic relations in clustering formulae, the corresponding weight 𝑀 is taken out of the weight matrix, resulting in a value of 𝑀 = 0 if two nodes are not connected. However, the weights from the weight matrix alone don’t suffice to assess the value of the entire triad. As Opsahl and Panzarasa (2009) point out, there are four ways of assessing the triadic value: the arithmetic mean, the geometric mean, the maximum value and the minimum value. While the arithmetic mean is simple to use, it is prone to sensitivity issues as it is not robust against differences in weights, especially in extreme settings. The maximum and minimum value are also prone to insensitivity, as lower weights in the maximum value and higher weights in the minimum value are regarded to less of an extent. The geometric mean overcomes these issues of sensitivity (Opsahl, Panzarasa 2009). The four methods are given in Table 1 with examples to show their deviations from one another. However, when
regarding both positive and negative weights, the formulation of balance is key to determining the triadic closure. This is presented in the following section.

Table 1: Triadic Value Assessment (Opsahl, Panzarasa 2009)
Method             Example a)               Example b)
Maximum Value      $\max(2, 2) = 2$         $\max(1, 3) = 3$
Minimum Value      $\min(2, 2) = 2$         $\min(1, 3) = 1$
Arithmetic Mean    $(2 + 2)/2 = 2$          $(1 + 3)/2 = 2$
Geometric Mean     $\sqrt{2 \cdot 2} = 2$   $\sqrt{1 \cdot 3} = 1.73$

SIGNED NETWORKS: TRIADIC BALANCE
Since the introduction of the Theory of Structural Balance by Heider (1946), the implications of positive and negative weights for triadic relations have been researched thoroughly. Davis (1967) determines that in order for a local network to be clusterable, it must also be balanced. The formulation of balance is based on the arithmetic sign of all three weights of a triad. A closed triad with a single negative weight, and thus two positive weights, is unbalanced and therefore not clusterable. In contrast, a triad consisting of three positive weights or of one positive and two negative weights is balanced. Specifically, these cases depict a strong formulation of balance. Under a weak formulation of balance, a triad with negative weights on all three of its edges is considered balanced as well (Davis 1967). An overview is given in Table 2.

Table 2: Clustering in Terms of the Formulation of Balance (Szell et al. 2010)
Strong Formulation of Balance    Balanced    Balanced    Unbalanced    Unbalanced
Weak Formulation of Balance      Balanced    Balanced    Balanced      Unbalanced

2.2. CLUSTERING COEFFICIENT
With this preliminary understanding at hand, the following section aims to depict the development of the clustering coefficient in prior research as a whole, ranging from the initial proposal up to recent approaches that extend the formula to networks with different structural and relational properties.
Thereby, the literature review of the clustering coefficient is categorized by the type of network each formula was developed for. Subsequently, the research trends are provided, and the limitations of the more recent developments are addressed. However, given the aforementioned chaotic development of this stream of research, a classification of the types of clustering is necessary first.

CLUSTERING CLASSIFICATION
As previously mentioned, the assessment of cluster formation has developed in a chaotic manner, which has led to an unclear collection of clustering definitions. Kivelä et al. (2014) distinguish the formation of clusters
three-fold. Firstly, one can use the node degrees to obtain the ratio of existing adjacencies to all possible adjacencies in a graph. In this regard, the term clustering is synonymous with the density or neighborhood of a node. Secondly, one can use walks and paths to assess the clustering formation of a graph. This assessment is often used to identify communities, a term that is also used synonymously with clusters and that denotes dense regions of a network (Boccaletti et al. 2006). Community detection aims to group nodes in modules based on a graph's topology. Fortunato (2010) identifies four traditional methods for assessing this type of clustering: hierarchical, partitional, graph-partitional and spectral methods. Other methods such as grid-based and constraint-based clustering are available as well, among many others (Berkhin 2006). Lastly, clustering can be determined at a triadic level by evaluating the relations of a set of three nodes (Kivelä et al. 2014). In this respect, the fundamental formula, the clustering coefficient (Newman et al. 2001), measures transitivity, which is also often seen as a synonym for clustering (Latora, Marchiori 2003), and gives the ratio of closed triads or triples to merely connected ones. In physical terms, clustering based on triadic closure gives the tendency that “the friend of my friend is my friend”.

This paper generalizes the types of clustering further by defining two types of cluster assessment: a macroscopic and a microscopic approach. On the one hand, the clustering of a network can be assessed using a macroscopic approach. Here, clusters are dense regions of a network, for example cliques or communities. Within these it is not necessary for each member of the dense cluster to be connected with every other member. The fact that the group is highly dense justifies the term clustering.
As already pointed out, clustering based on community detection is abundantly present within prior research, and many variations of this assessment are given as well. On the other hand, microscopic clustering is possible. This acknowledges a cluster as a group of three members that are completely connected. Generally, this can be assessed in terms of triadic closure. A similar distinction between the two types of clustering is presented by Girvan and Newman (2002). They acknowledge that the often synonymous terms are misleading and therefore refrain from using the term clustering to describe the detection of communities. Since the paper at hand shares this view, the triadic route of assessing clusters is chosen.

In terms of the clustering coefficient there are two main methods available for assessing the tendency of nodes to cluster. On the one hand, the local clustering coefficient is based on the local density of an ego's network and provides a result for the clustering tendency from a local actor's perspective, e.g. by assessing all closed triads over all connected triads in which the node $n_i$ is involved. The sum of all local clustering coefficients can then be averaged over all nodes to globalize its result across the entire network. The second measure is a straightforward global measure. Here, the global clustering coefficient assesses all closed triads from each node's perspective and divides them by all connected ones. The first measure is prone to sensitivity issues, since each local clustering coefficient is equally weighted regardless of its node degree or general connectedness in the network (Opsahl, Panzarasa 2009). Both globalized local and global clustering coefficients are abundantly found in prior research, as can be seen in the following section, where the development of the formula is illustrated.

BINARY NETWORKS
Assessing the clustering tendency in binary networks has proven to be the simplest.
Because relational properties are not acknowledged, the following prerequisite is given: $a_{ij} = a_{ji} = w_{ij} = w_{ji}$. The term clustering coefficient was first introduced by Watts and Strogatz (1998) in their attempt to compare random networks to those in the real world. The clustering coefficient $C$ is defined as the average of $C_i$ over all $N$, where $N$ is the number of nodes in the network and $C_i$ is the ratio of the actual number of edges ($L_i$) among the neighbors of $n_i$ to the maximum possible number of such edges, $k_i(k_i - 1)/2$. Given the fraction form of the entire equation, with
the denominator never smaller than the numerator, $C$ lies between $C = 0$ and $C = 1$. The clustering coefficient by Watts and Strogatz (1998) can therefore be read as follows:

[ 1 ] $C_{\text{Watts Strogatz}} = \frac{1}{N} \sum_{i} \frac{L_i}{k_i(k_i - 1)/2}$

While this clustering coefficient does not use triadic relations to determine the formation of clusters, its introduction started the movement towards the development of the clustering coefficient and is therefore noteworthy. However, comparable findings were made almost half a century prior to this introduction. The proposal given by Watts and Strogatz (1998) is very similar to the findings of Kephart (1950), in which the law of family interactions is proposed to be the ratio of actual relationships to potential ones. Watts and Strogatz (1998) develop this by using a focal node's perspective, which is then globalized over the entire network. Building on this milestone, Newman et al. (2001) adjust the definition of the clustering coefficient in their study on random graphs and apply it to real-world networks, specifically collaboration networks and the World Wide Web. They claim their definition to be equivalent to the original clustering coefficient, merely reversing the approach by taking the ratio of the means instead of the mean of the ratios. This coefficient is defined as follows (Newman et al. 2001):

[ 2 ] $C_{\text{Newman et al.}} = \frac{3 N_{\triangle}}{N_{\wedge}}$

In general terms, it is read as three times the number of all triangles ($N_{\triangle}$) divided by the number of all connected triples ($N_{\wedge}$). The factor 3 in the numerator is present on account of each triangle representing three closed triads. As pointed out by Schank and Wagner (2005) as well as by Latora and Marchiori (2003), this formula and the one from Watts and Strogatz (1998) differ.
In fact, Latora and Marchiori (2003) show that the formula proposed by Watts and Strogatz (1998) approximates a different measure, namely the efficiency. The two also extend the model, which is explained in the review of the clustering coefficient in weighted networks. A further limitation of the Watts-Strogatz formula (1998) is the fact that it is based on the sum of all local clustering coefficients, which is then globalized over the entire network. As mentioned in section 2.2.1 of this paper, such an approach is prone to sensitivity issues.

While the presented clustering coefficients are intended for assessing the clustering in binary networks, the literature shows that they can also be applied to weighted networks. The study of scientific collaboration by Newman (2001) is a natural candidate for implementing weights, as the relation between two scientists can be seen as stronger the more papers they have co-authored and weaker the fewer. Newman (2001), however, assesses the formation of clusters based on a binary graph and thus only acknowledges the existence of links between the scientists and disregards relational attributes. Such a manipulation or symmetrization of the data is common in early social network research, since the mathematical foundations for the clustering assessment were not yet able to include relational properties. As an answer to this problem, the formula is extended to include weightedness, which is presented in the following section.
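The difference between Equations [ 1 ] and [ 2 ] can be illustrated on a small binary graph. The following is a minimal sketch rather than code from the reviewed papers: the function names, the adjacency-set representation and the example graph (a triangle with two pendant nodes attached to one corner) are assumptions made for illustration, as is the convention of assigning $C_i = 0$ to nodes with fewer than two neighbors.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Summand of Eq. [1]: edges among the neighbors of i over k_i(k_i - 1)/2."""
    neighbors = sorted(adj[i])
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: undefined local ratios are counted as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def watts_strogatz(adj):
    """Eq. [1]: mean of the local ratios over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def newman_transitivity(adj):
    """Eq. [2]: closed triples over connected triples."""
    closed = 0   # each triangle is counted once per corner, i.e. 3 * N_triangle
    triples = 0  # connected triples, counted once per center node
    for i in adj:
        for u, v in combinations(sorted(adj[i]), 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical example: triangle 1-2-3, with pendant nodes 4 and 5 attached to node 1.
graph = {1: {2, 3, 4, 5}, 2: {1, 3}, 3: {1, 2}, 4: {1}, 5: {1}}

print(watts_strogatz(graph))       # mean of the ratios
print(newman_transitivity(graph))  # ratio of the totals
```

On this graph the mean of the local ratios is 13/30 (about 0.433), whereas the ratio of closed to connected triples is 3/8 = 0.375. The two low-degree corners of the triangle each contribute a local ratio of 1 with the same weight as the high-degree node, which is the sensitivity issue described above.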
WEIGHTED NETWORKS
When implementing weighted edges into the clustering coefficient formula, the prerequisites from binary networks no longer hold. Instead the following is given: $a_{ij} = a_{ji} \neq w_{ij} = w_{ji}$. The weighted networks described in this section are also undirected, therefore the order of the variables' subscripts is irrelevant. Furthermore, prior research shows a split in the development of the clustering coefficient formula. While early on many extensions and adjustments are made to the local clustering coefficient of Watts and Strogatz (1998), most recent developments further the global measure introduced by Newman et al. (2001). The following section first reviews the development of the local clustering coefficient. Subsequently, developments of the global clustering coefficient follow.

As already mentioned, Latora and Marchiori (2003) find that the original clustering coefficient model from Watts and Strogatz (1998) assesses the efficiency of a network rather than its clustering. Specifically, it measures how well information spreads throughout a network. In addition to providing a new definition of this formula, the two researchers expand it to incorporate weights. This expansion reads as follows.

[ 3 ] $C_{\text{Latora Marchiori}} = \dfrac{\frac{1}{N(N-1)} \sum_{i,j} \frac{1}{d_{ij}}}{\frac{1}{N(N-1)} \sum_{i,j} \frac{1}{w_{ij}}}$

The numerator measures the average efficiency between two nodes, in which the efficiency is seen as inversely proportional to the shortest path distance $d_{ij}$ between two nodes $n_i$ and $n_j$. The variable $d_{ij}$ gives the smallest summed weight required to connect the two nodes. The denominator is introduced in order to normalize the efficiency between $C = 0$ and $C = 1$. It measures the ideal average efficiency, in which $w_{ij}$ is equal to $d_{ij}$ if a direct link between nodes $n_i$ and $n_j$ is formed.
Since this formula is based on the Watts-Strogatz model (1998), it purposely does not yield a clustering result and therefore refrains from using triadic closure (Latora, Marchiori 2003). Regardless of this, this paper deems the findings of Latora and Marchiori (2003) noteworthy, because they provide insight into the importance of including weightedness in the original clustering coefficient formula.

On a further note, Grindrod (2002)¹ adapts the formula as well, in order to assess the clustering tendency in even larger networks, where the exact number of links can be given by an estimate. In this ensemble approach, the number of connected triads in the denominator is replaced by the probability $p$ that node $n_i$ is connected to $n_j$ and is connected to $n_k$. The numerator then expands this by including the probability $p$ that nodes $n_j$ and $n_k$ are connected. Thereby, the probability $p$ lies between $p = 0$ and $p = 1$ (Grindrod 2002). The formula reads as follows:

[ 4 ] $C_{\text{Grindrod}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} p_{ij} p_{ik} p_{jk}}{\sum_{j,k} p_{ij} p_{ik}} \right)$

¹ Grindrod (2002) merely proposes a local clustering coefficient in his article. For the sake of comparability, the global clustering coefficient based on this formula is given by globalizing the local clustering coefficients over all nodes $N$, as seen in Barrat et al. (2004). The same goes for the clustering coefficients of Onnela et al. (2005), Zhang and Horvath (2005), and Holme et al. (2007).
This development utilizes the approach of globalizing local clustering coefficients, as seen in Watts and Strogatz (1998), but bases its factors on triadic closure, as seen in Newman et al. (2001). In his paper, Grindrod (2002) further develops the formula to assess the probability values.

In their analysis of an airline transportation network as well as of a social network of scientific collaboration, Barrat et al. (2004) introduce a local clustering coefficient based on triadic closure that implements weights as relational properties. The formula reads as follows.

[ 5 ] $C_{\text{Barrat et al.}} = \frac{1}{N} \sum_{i} \left( \frac{1}{s_i(k_i - 1)} \sum_{j,k} \frac{(w_{ij} + w_{ik})}{2} a_{ij} a_{ik} a_{jk} \right)$

The factor $s_i(k_i - 1)$ combines the edge weights with the maximum possible number of triples and is used to normalize the clustering result between $C = 0$ and $C = 1$. This is comparable to the denominator of the Watts-Strogatz formula (1998). The difference lies in the variable $s_i$, which embodies the node strength, i.e. the summed weight of all edges connected to node $n_i$. The second factor accounts for the average of the two edge weights that meet at the focal actor $n_i$. However, it only contributes if a triangle is formed. This gives the local clustering coefficient, which is then averaged over all nodes $N$ to give the clustering coefficient for the entire network. This formula marks a further development, as it utilizes one of the aforementioned triadic value assessment measures, namely the arithmetic mean (Barrat et al. 2004).

Onnela et al. (2005) critique the clustering coefficient given by Barrat et al. (2004) for disregarding the weight of the third connecting edge. They therefore expand the formula to incorporate the value of said edge and apply the proposed formula to an undirected financial network of traded stocks. Their proposal reads as follows.
[ 6 ] $C_{\text{Onnela et al.}} = \frac{1}{N} \sum_{i} \left( \frac{1}{k_i(k_i - 1)} \sum_{j,k} \left( \frac{w_{ij}}{\max(w)} \frac{w_{ik}}{\max(w)} \frac{w_{jk}}{\max(w)} \right)^{1/3} \right)$

This coefficient is read similarly to the one proposed by Barrat et al. (2004). Here, however, the triadic value is assessed by using the geometric mean as opposed to the arithmetic mean used in Barrat et al. (2004). In addition, the weights are scaled by the largest weight and the node strength is replaced by the node degree. Furthermore, it is no longer necessary to include the adjacency values, since including the weight of the connecting edge enables the formula to differentiate between closed and connected triples.

In their paper on biological networks, Zhang and Horvath (2005) provide a different approach towards assessing the clustering in weighted networks. They generalize the ratio of the total number of direct connections a node $n_i$ has to its maximum number of possible connections, which reads as follows.

[ 7 ] $C_{\text{Zhang Horvath}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} \left( \frac{w_{ij}}{\max(w)} \frac{w_{jk}}{\max(w)} \frac{w_{ki}}{\max(w)} \right)}{\left( \sum_{j} \frac{w_{ij}}{\max(w)} \right)^{2} - \sum_{j} \left( \frac{w_{ij}}{\max(w)} \right)^{2}} \right) = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} \left( \frac{w_{ij}}{\max(w)} \frac{w_{jk}}{\max(w)} \frac{w_{ki}}{\max(w)} \right)}{\sum_{j,k} \left( \frac{w_{ij}}{\max(w)} \frac{w_{ik}}{\max(w)} \right)} \right)$
While Zhang and Horvath (2005) originally use an adjacency function to derive the weighted values, the paper of Saramäki et al. (2007) shows that the formula can use weights from the weight matrix as well. Similar to Onnela et al. (2005), the weights are scaled by the maximum weight in the graph. The denominators are based on the maximum weights, ensuring a result between $C = 0$ and $C = 1$ (Saramäki et al. 2007). The equation from Zhang and Horvath (2005) also no longer relies on the node degree $k_i$; instead the weighted values are used in the denominator. Kalna and Higham (2007) provide further evidence in their paper that the two forms of the local coefficient proposed by Zhang and Horvath (2005) are equal to one another.

The version of the local clustering coefficient provided by Holme et al. (2007) is used to assess the clustering of students at a Korean university. Their formula aims to meet the following requirements: the coefficient yields a value between $C = 0$ and $C = 1$; the weight $w = 0$ represents the lack of a connection; the contribution of a given triad to the clustering result should be proportional to the weights of each of its edges; and the results of the Watts-Strogatz formula (1998) should be identical to the results of their formula if the weights are replaced with adjacencies. The maximum value in their formula is used as an answer to the third requirement. Specifically, this maximum value represents a matrix in which the maximum $w_{ij}$ is located at all positions (Holme et al. 2007). The formula is given below.

[ 8 ] $C_{\text{Holme}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} w_{ij} w_{ik} w_{jk}}{\max_{ij}(w_{ij}) \sum_{j,k} w_{ij} w_{ik}} \right)$

In more recent developments of the clustering coefficient, the global measure, as opposed to the globalizing of local clustering coefficients, has become the standard.
Engø-Monsen and Canright (2011) propose a global clustering coefficient formula closely based on that of Newman et al. (2001); here, however, the geometric mean is used to determine the triadic value. Their proposal reads as follows.

[ 9 ] $C_{\text{Engø-Monsen Canright}} = \frac{\sum_{i,j,k} \sqrt[3]{w_{ij} w_{ik} w_{jk}}}{\sum_{i,j,k} \sqrt{w_{ij} w_{ik}}}$

Phan et al. (2013) extend this formula and apply it to 1000 Bernoulli random networks. The extension acknowledges that the third connecting edge plays a relative role in the clustering assessment. The equation therefore relates the weighted strength of the third connecting edge to that of the other two. The denominator consists of the weighted value assessment of all open triads (sum marked $O$) plus that of all closed triads (sum marked $C$). This approach is chosen to normalize the clustering coefficient between $C = 0$ and $C = 1$, and is given as follows (Phan et al. 2013).

[ 10 ] $C_{\text{Phan et al.}} = \frac{\sum_{i,j,k} \sqrt{\sqrt{w_{ij} w_{ik}} \, \sqrt[3]{w_{ij} w_{ik} w_{jk}}}}{\sum_{i,j,k}^{C} \sqrt{\sqrt{w_{ij} w_{ik}} \, \sqrt[3]{w_{ij} w_{ik} w_{jk}}} + \sum_{i,j,k}^{O} \sqrt{w_{ij} w_{ik}}}$

While the presented developments of the formula offer valuable approaches towards assessing the clustering tendency in weighted networks, they disregard the relational property of directedness. In the following section, measures proposed to overcome this limitation are presented.
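The critique that Onnela et al. (2005) raise against Equation [ 5 ] can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the function names, the dict-of-dicts weight matrix and the example triangle are hypothetical, the inner sums run over ordered pairs of distinct neighbors $j, k$, and nodes with fewer than two neighbors are assigned a local value of 0. Only the local summands for one focal node are computed.

```python
def neighbors(w, i):
    """Nodes joined to i by an edge with positive weight."""
    return [j for j, weight in w[i].items() if weight > 0]


def barrat_local(w, i):
    """Summand of Eq. [5]: arithmetic mean of the two edges meeting at node i."""
    nbrs = neighbors(w, i)
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: undefined local values are counted as 0
    s = sum(w[i][j] for j in nbrs)  # node strength s_i
    total = 0.0
    for j in nbrs:
        for h in nbrs:
            # a_jh = 1 only if the third edge exists; its weight is ignored
            if j != h and w[j].get(h, 0) > 0:
                total += (w[i][j] + w[i][h]) / 2
    return total / (s * (k - 1))


def onnela_local(w, i):
    """Summand of Eq. [6]: geometric mean of all three weights, scaled by max(w)."""
    w_max = max(weight for row in w.values() for weight in row.values())
    nbrs = neighbors(w, i)
    k = len(nbrs)
    if k < 2:
        return 0.0
    total = 0.0
    for j in nbrs:
        for h in nbrs:
            if j != h:
                product = (w[i][j] / w_max) * (w[i][h] / w_max) * (w[j].get(h, 0) / w_max)
                total += product ** (1 / 3)
    return total / (k * (k - 1))


def triangle(w_bc):
    """Hypothetical undirected triangle A-B-C with w_AB = w_AC = 2 and a variable w_BC."""
    return {
        "A": {"B": 2.0, "C": 2.0},
        "B": {"A": 2.0, "C": w_bc},
        "C": {"A": 2.0, "B": w_bc},
    }


for w_bc in (2.0, 0.25):
    graph = triangle(w_bc)
    print(w_bc, barrat_local(graph, "A"), onnela_local(graph, "A"))
```

For the focal node A, the Barrat et al. value stays at 1 when the third edge is weakened from 2 to 0.25, whereas the Onnela et al. value drops from 1 to about 0.5, because only the latter includes the weight $w_{jk}$ of the third connecting edge.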
WEIGHTED AND DIRECTED NETWORKS
Clustering in directed networks is based on the prerequisite that the tie from node $n_i$ to node $n_j$ is not necessarily equal to the tie from node $n_j$ to node $n_i$, and thus $a_{ij} \neq a_{ji} \neq w_{ij} \neq w_{ji}$. Fagiolo (2007) remarks that when examining the triad formation, one can pay special attention to the role a focal actor plays, and notes four possible patterns. The focal node $n_1$ a) can be involved in a cycle (“cyc”), e.g. $l_{12} l_{23} l_{31}$, b) can play the role of a middleman (“mid”), e.g. $l_{21} l_{13} l_{23}$, where node $n_2$ can reach node $n_3$ either directly or through the focal node $n_1$, c) can be classified as “in”, e.g. $l_{21} l_{31} l_{23}$, where node $n_1$ holds two incoming edges, and d) can be classified as “out”, e.g. $l_{12} l_{13} l_{23}$, where node $n_1$ holds two outgoing edges. Fagiolo (2007) proposes one clustering coefficient for each of the four patterns and then combines them by defining the clustering coefficient to be the total of all actual triadic relations of each of the four patterns, divided by all possible ones. By replacing the adjacency values with the values from the weight matrix, the following globalized local clustering coefficient for weighted and directed networks is proposed.

[ 11 ] $C_{\text{Fagiolo}} = \frac{1}{N} \sum_{i} \frac{\frac{1}{2} \sum_{j,k} \left( w_{ij}^{1/3} + w_{ji}^{1/3} \right) \left( w_{ik}^{1/3} + w_{ki}^{1/3} \right) \left( w_{jk}^{1/3} + w_{kj}^{1/3} \right)}{2 \left( d_i^{tot} (d_i^{tot} - 1) - 2 d_i^{\leftrightarrow} \right)}$

The numerator entails the geometric mean of the weighted in-degrees and out-degrees of a node to two others and the two weighted, directed edges between these. The denominator is similar to that of Watts and Strogatz (1998), where the node degree $k_i$ is replaced with the total node degree $d_i^{tot}$, consisting of the weighted sum of the in- and out-degrees of a node. Thereby, bilateral degrees $d_i^{\leftrightarrow}$ that were already recognized in the first part of the denominator are subtracted. The local measure can then be globalized over the entire network.
After an empirical application, the weights are rescaled by a maximum weight value, similar to Holme et al. (2007), Onnela et al. (2005) as well as Zhang and Horvath (2005). Squartini et al. (2011) reuse the formula given by Fagiolo (2007) and replace the respective weight with a differently rescaled weight to wash away trends in their specific example of the International Trade Network. Tabak et al. (2014) expand the four clustering coefficients of the patterns introduced by Fagiolo (2007) by attributing weights prior to combining them in one single formula.

Opsahl and Panzarasa (2009) use the cycle, middleman, in and out approach as well to classify the various triads, and apply their approach to a vast range of networks, such as acquaintance and relationship networks, neural networks, organizational networks, networks of political support and networks of interaction through messages. Contrary to Fagiolo (2007), they base their formula closely on triadic transitivity as seen in Holland and Leinhardt (1971) and Wasserman (1994). This is relevant, since triads in the form of a cycle are not seen as transitive and therefore not as clustered. Furthermore, triads that only contain in- or out-degrees from a focal actor's perspective are also not seen as transitive. A straightforward formula is not given. Instead, a framework is provided in their paper to assess the formula. This paper summarizes this assessment as follows.

[ 12 ] $C_{\text{Opsahl Panzarasa}} = \frac{\sum_{i,j,k} w_{ij} w_{ik} a_{jk}}{\sum_{i,j,k} w_{ij} w_{jk}}$

Because the formula sums over all $j$ as well as over all $k$, which are considered variable subscripts rather than node descriptions, a further distinction between triadic relations as seen in Opsahl and Panzarasa (2009) is unnecessary. While this formula is the first global clustering measure that can be applied to weighted and
directed networks, it does not consider the weighted value of the third connecting edge in the numerator, as seen in Phan et al. (2013). Despite this shortcoming, the formula provides a further advancement. Notice that this summarized version of the clustering coefficient from Opsahl and Panzarasa (2009) differs from the summarized version offered in the paper of Phan et al. (2013). The discrepancy lies within the subscripts of the denominator. Researchers prior to and even after the findings of Opsahl and Panzarasa (2009) assess the denominator of the formula either with node degrees or with an abstracted version of $w_{ij} w_{ik}$. While their findings for the most part go unnoticed, Opsahl and Panzarasa (2009) achieve a precise treatment of the denominator and accordingly provide a formula for the intended tendency of “the friend of my friend is my friend”. The summarized version provided by Phan et al. (2013) does not include this consideration. Table 3 shows the extent of this discrepancy.

Table 3: Opsahl, Panzarasa's (2009) Clustering Coefficient Denominator Differences

Graphical and written representation of the tendency
  Column 1: Is my friend the friend of my friend?
  Column 2: Is the friend of my friend my friend?

Weights in denominator of the clustering coefficient
  Column 1: $w_{ij} w_{ik}$
  Column 2: $w_{ij} w_{jk}$

Triad census for weighted directed networks, column 1
  Numerator:    $w_{AB} w_{AC} a_{BC}$ = 2, 3, 1    $w_{AC} w_{AB} a_{CB}$ = 3, 2, 0    $w_{BA} w_{BC} a_{AC}$ = 0, 1, 1
                $w_{BC} w_{BA} a_{CA}$ = 1, 0, 0    $w_{CA} w_{CB} a_{AB}$ = 0, 0, 1    $w_{CB} w_{CA} a_{BA}$ = 0, 0, 0
  Denominator:  $w_{AB} w_{AC}$ = 2, 3    $w_{AC} w_{AB}$ = 3, 2    $w_{BA} w_{BC}$ = 0, 1
                $w_{BC} w_{BA}$ = 1, 0    $w_{CA} w_{CB}$ = 0, 0    $w_{CB} w_{CA}$ = 0, 0

Triad census for weighted directed networks, column 2
  Numerator:    $w_{AB} w_{BC} a_{AC}$ = 2, 1, 1    $w_{AC} w_{CB} a_{AB}$ = 3, 0, 1    $w_{BA} w_{AC} a_{BC}$ = 0, 3, 1
                $w_{BC} w_{CA} a_{BA}$ = 1, 0, 0    $w_{CA} w_{AB} a_{CB}$ = 0, 2, 0    $w_{CB} w_{BA} a_{CA}$ = 0, 0, 0
  Denominator:  $w_{AB} w_{BC}$ = 2, 1    $w_{AC} w_{CB}$ = 3, 0    $w_{BA} w_{AC}$ = 0, 3
                $w_{BC} w_{CA}$ = 1, 0    $w_{CA} w_{AB}$ = 0, 2    $w_{CB} w_{BA}$ = 0, 0

$C_{\text{Opsahl Panzarasa}}$ according to Phan et al. (2013): $C = \frac{2 \cdot 3 \cdot 1}{(2 \cdot 3) + (3 \cdot 2)} = 0.5$
$C_{\text{Opsahl Panzarasa}}$ according to this paper: $C = \frac{2 \cdot 1 \cdot 1}{2 \cdot 1} = 1$

Equation [ 12 ] concludes the collection of proposed clustering coefficient formulae found in prior literature. Since this paper also deems the formulation of structural balance an equal measure of assessing triadic closure, selected studies on signed networks that intend to deliver clustering results in such networks are presented in the following section.

SIGNED NETWORKS
Assessing the clustering tendency in signed networks has become increasingly important, since weighted data with both positive and negative connotations is accessible. In regard to social network analyses, signed networks are for example networks of friends and enemies, or partners and competitors. Beyond mere social networks, network mash-ups, such as networks of products and customers, are also interesting grounds for assessing the clustering with positive and negative weights, given the provision of an actor's likes or dislikes of certain
products. The aforementioned Theory of Structural Balance is used to assess the clustering in signed binary networks, in which the clusterability of a triad depends on its three arithmetic signs (Heider 1946). Kunegis et al. (2009) provide insight into clustering in signed, directed networks in their analysis of the Slashdot Zoo, a feature of the technology news site Slashdot in which users can mark other users as friend or foe. Thereby, the researchers acknowledge that the product of the signs of two directed edges gives the sign of the third directed edge. This approach is identical to the Theory of Structural Balance for strongly balanced graphs, and is directly applied to directed networks.

Assessing the clustering coefficient in signed networks is common in prior literature. However, most studies do not state their exact methodology for doing so. Furthermore, prior studies often simplify their collected data in order to apply certain mathematical analysis measures. For example, Szell et al. (2010) exclude weighted edges in their analysis of a further friend-enemy network, even though the strength of the interaction between each pair of players is given. The paper of Szell and Thurner (2012) provides a weighted clustering coefficient in the form of private messages as an extra and separate result to compare the clustering of friend and enemy networks, providing the insight that the interaction within positive networks is larger than that within negative ones. The clustering of the friend and enemy networks themselves is thereby measured on an unweighted network. In sum, formula improvements for including signed values in the clustering coefficient are scarce in prior research. However, assessing clustering results in such networks is very common. With this in mind, the research trends assessed from this literature review are given in the following section.

RESEARCH TRENDS
Various trends can be identified in the development of the clustering coefficient.
Firstly, the formulae tend to move away from using the node degree in the denominator and instead focus on a triadic approach. Moreover, the sensitivity issues of globalized local coefficients are resolved, as the most recent formulae use global measures. In addition, the triadic value is no longer assessed with rescaled maximum values; instead the geometric mean is used. Lastly, acknowledging weightedness and directedness is becoming increasingly important in the assessment. Despite these advancements, the development of the clustering coefficient is still in an early stage of research, as many limitations remain. The following section summarizes these.

LIMITATIONS
This paper summarizes three main limitations of the clustering coefficients of prior research, namely a lack of clustering coefficient combinations, a lack of comparability in the form of micro-comparability, and a lack of comparability in the form of macro-comparability. These are presented in the following.

First of all, the different coefficients entail various advancements but also shortcomings that are distributed unevenly among the findings. For example, in the stream of weighted clustering coefficients the third connecting edge is taken into consideration in the formula's numerator; specifically, this edge can also be regarded relative to the two other edges (Phan et al. 2013). The research stream of weighted, directed networks lacks this acknowledgement. However, here the denominator is assessed correctly (Opsahl, Panzarasa 2009), which is not found in the research stream of merely weighted networks. A combination of both of these advancements without their shortcomings has yet to be provided in current literature. Therefore, further development of the formula is necessary in order to benefit from the findings of prior literature as well as to eliminate its shortcomings.
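The denominator discrepancy shown in Table 3 can be reproduced numerically. The following is a minimal sketch, assuming the three directed edges of the table's example (A to B with weight 2, A to C with weight 3, B to C with weight 1, all other weights 0). The helper names are hypothetical, and the two variants follow the rows of Table 3 rather than any published implementation.

```python
from itertools import permutations

# Directed example weights read off Table 3; absent edges have weight 0.
WEIGHTS = {("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 1}
NODES = ("A", "B", "C")


def weight(u, v):
    return WEIGHTS.get((u, v), 0)


def adjacent(u, v):
    return 1 if weight(u, v) > 0 else 0


# All ordered triples (i, j, k) of distinct nodes: the six triadic relations.
triples = list(permutations(NODES, 3))

# Variant summarized by Phan et al. (2013): two edges leaving the focal node i.
num_out = sum(weight(i, j) * weight(i, k) * adjacent(j, k) for i, j, k in triples)
den_out = sum(weight(i, j) * weight(i, k) for i, j, k in triples)

# Variant summarized in this paper (Table 3, column 2): a chain i -> j -> k.
num_chain = sum(weight(i, j) * weight(j, k) * adjacent(i, k) for i, j, k in triples)
den_chain = sum(weight(i, j) * weight(j, k) for i, j, k in triples)

c_phan_summary = num_out / den_out
c_this_paper = num_chain / den_chain

print(c_phan_summary, c_this_paper)
```

The first variant yields 6 / 12 = 0.5 and the second 2 / 2 = 1, matching the two results in Table 3. Only the chain-based denominator counts exactly the paths along which “the friend of my friend” can be reached.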
The issue of micro-comparability illustrates that recent formula developments of the clustering coefficient do not always allow the same graph to be compared under different relational properties. This is mostly due to the fact that the developed clustering coefficients that implement relational properties result in a value of $C = 1$ for a
completely connected graph. This value of $C = 1$ indicates that a graph has reached the highest form of clustering possible. In regard to binary networks, this result is justifiable for a completely connected graph, because only the existence of ties is taken into consideration. Clustering coefficients with relational attributes should, however, distinguish between clusters with equally weighted edges and clusters with differently weighted edges. This approach is comparable to the original tendency developed for binary networks, i.e. “the friend of my friend is my friend”, where all edges are identical to one another, weighted with the value $w = 1$. Prior research, however, often equates this tendency with the tendency that “the best friend of my friend is my acquaintance”. This varies from the originally proposed tendency, since the edges are not equally weighted; such a triad should therefore by definition not reach the value of $C = 1$ and thus not the highest form of clustering possible. Because this paper acknowledges a sufficient difference between the two tendencies mentioned above, there is room for improvement of the clustering coefficient formula, namely by overcoming this limitation of micro-comparability. Table 4 depicts the limitation in an extreme setting. Notice how network c) resembles network d) the most, yet their generalized clustering results according to prior literature are polar opposites.

Table 4: Micro-Comparability of Clustered Triads
Network                             a)       b)            c)            d)       e)
CC According to Prior Literature    C = 1    C = 1         C = 1         C = 0    C = 1
CC According to this Paper          C = 1    C = 0.4629    C = 0.2203    C = 0    C = 1

On a further note, large-scale networks have gained in popularity within the past years of network research, since large amounts of data can be acquired easily and used to conduct real-world analyses as opposed to depicting mere generalizations of or approaches to real-world problems.
The acquired data gives insight not only into whether individuals are connected, but also into the relational manner of the connection by attributing weights, both positive and negative, as well as directedness. This calls for macro-comparability of the clustering coefficient, i.e. the ability to compare different social networks regardless of their relational and structural properties. While most recent formulations are able to compare networks regardless of their weightedness or directedness, prior research has yet to acknowledge negative and positive weights in its formulae. Clustering in terms of the formulation of balance (Davis 1967) is becoming increasingly relevant, since data regarding both the likes and dislikes of an individual is easily acquirable. The development of a single formula that can be used in all types of networks can remove the need for multiple versions of the clustering coefficient and thus offer macro-comparability of cluster formation across all types of networks.

3. FORMULA EXTENSION

With the knowledge gained from prior research on the development of the clustering coefficient, the limitations acknowledged in the previous section will now be addressed in a new approach for analyzing the formation of clustering. This paper differentiates between equally weighted clusters and clusters with relational variance, offering micro-comparability. Furthermore, the assessment is geared towards the formation of clusters in networks regardless of their relational and structural properties, thereby offering macro-comparability.
3.1. BASIS

The fundamental basis of the proposed formula extension is derived from the global clustering coefficient developed by Newman et al. (2001), where the numerator embodies three times the number of all closed triangles and the denominator the number of all connected triples, whether open or closed. By focusing on each triadic relation rather than on the triangles, the value of the corresponding weights can determine whether a triangle is formed or not. Because each triadic relation is assessed individually, the factor of three in the numerator is no longer needed. The triadic value is assessed by taking the geometric mean of the weights of the triadic relation. This paper then incorporates the approach of Phan et al. (2013), in which the third connecting edge is explicitly acknowledged as relative. Unlike Phan et al. (2013), this paper purposely disregards the third connecting edge in the denominator, where it was only introduced to normalize the equation and ensure $C = 1$ for completely connected graphs. In the original clustering coefficient, the numerator, like that of the proposed formula, acknowledges the occurrence of closed triples, whereas the denominator purposely neglects these. In this way, an accurate ratio of closed triads to connected triads can be formed. In the proposed formula, the variable $\bar{w}$ represents the value assessment of a closed triad with the third connecting edge seen as relative, and the variable $\bar{v}$ the value assessment of all triads; the clustering coefficient is the ratio of their total sums. The denominator is based on that of Opsahl and Panzarasa (2009), which is strongly based on the transitivity theory of Wasserman (1994). The basis for the formula extension can, therefore, be read as follows.

[ 13 ]
$$C = \frac{\sum \bar{w}_{i,j,k}}{\sum \bar{v}_{i,j,k}}$$

3.2. MICRO-COMPARABILITY

FORMULA: WEIGHTED OR DIRECTED NETWORKS

The aforementioned limitation of micro-comparability arises in weighted networks.
This revolves around the fact that the following statements are, to an extent, seen as equally clustered: “the friend of my friend is my friend” versus “the best friend of my friend is my acquaintance”. Thanks to the preliminary work of the researchers presented above, the adjustment of the formula is merely a tweak. For weighted networks, $\bar{w}$ can thus be expanded to depict the triadic relation in relation to the relative weight of the third connecting edge. Accordingly, the variable $\bar{v}$ is the triadic value assessment of any and all triads without respect to the connecting edge. The subscripts of the weights follow the transitivity theory (Wasserman 1994) and only allow for transitive triads. Therefore, this formula can be implemented in directed networks as well.

[ 14 ]
$$C = \frac{\sum_{i,j,k} \sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}}{\sum_{i,j,k} \sqrt{w_{ij} w_{jk}}}$$

EXAMPLE

To show the impact of this development in weighted networks, the following example is used (Refer to Figure 1). While the clustering coefficients found in previous literature, for example in Phan et al. (2013), treat the two networks below as equally clustered, this paper proposes the contrary. According to this paper, the clustering coefficient of network a) is $C = 0.98$, whereas that of network b) is $C = 1$. Both are high; however, the coefficient now allows for comparisons, as seen in Table 5.
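The calculation behind Equation [ 14 ] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `clustering_eq14` and `undirected` and the dictionary representation of the weights are hypothetical, weights are assumed to be non-negative, and the edge weights are those listed for Figure 1 in the triad census of Table 5.

```python
from itertools import permutations
from math import sqrt


def clustering_eq14(weights):
    """Eq. [14] for a mapping {(i, j): w_ij} with non-negative weights."""
    nodes = sorted({n for edge in weights for n in edge})
    num = 0.0  # sum of w-bar over all ordered triads (i, j, k)
    den = 0.0  # sum of v-bar over all ordered triads (i, j, k)
    for i, j, k in permutations(nodes, 3):
        w_ij = weights.get((i, j), 0.0)
        w_jk = weights.get((j, k), 0.0)
        w_ik = weights.get((i, k), 0.0)
        v_bar = sqrt(w_ij * w_jk)                      # connected triad
        closed = (w_ij * w_jk * w_ik) ** (1.0 / 3.0)   # closed triad
        num += sqrt(v_bar * closed)
        den += v_bar
    return num / den if den else 0.0


def undirected(edges):
    """Hypothetical helper: store every undirected edge in both directions."""
    out = {}
    for (i, j), w in edges.items():
        out[(i, j)] = w
        out[(j, i)] = w
    return out


network_a = undirected({("A", "B"): 1, ("B", "C"): 3, ("A", "C"): 2})
network_b = undirected({("A", "B"): 2, ("B", "C"): 2, ("A", "C"): 2})

print(round(clustering_eq14(network_a), 4))  # 0.9805
print(round(clustering_eq14(network_b), 4))  # 1.0
```

Iterating over ordered triples mirrors the six rows per network in the triad census, so each undirected triangle is assessed once per ordering.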
As seen in Table 4, when subjected to extreme settings an equal result of $C = 1$ can be misleading. Take the example of network b) in that table and imagine that nodes $B$ and $C$ work closely together and that their weighted edge, measured through e-mail transfer, is very large, e.g. $w_{BC} = 10000$. If node $A$ were to send out an e-mail broadcast including nodes $B$ and $C$ as recipients, previous literature would assign this digraph a clustering coefficient of $C = 1$, even though $A$ might barely know the other two. While the clustering coefficient for binary networks addresses the mere existence of three edges per triple, the mere existence of edges shouldn’t suffice for weighted networks. Instead, as in binary networks, these edges should be equal to form an ideal cluster. With the newly proposed formula, the originally intended ratio “the friend of my friend is my friend” is kept. In addition, the equation differentiates its results from the following statement: “the best friend of my friend is my acquaintance”. The limitation of micro-comparability is therefore resolved, which fulfills the first hypothesis by differentiating between equally weighted clusters and clusters with relational variance.
Figure 1: Weighted, Undirected Networks

Table 5: Triad Census of Graphs in Figure 1
Columns: (1) $\bar{v} = \sqrt{w_{ij} w_{jk}}$; (2) $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$; result $C = \sum \bar{w}_{i,j,k} / \sum \bar{v}_{i,j,k}$.

Network a)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    1         3         2         1.732050808  1.817120593  1.774075869
A    C    B    2         3         1         2.449489743  1.817120593  2.109743646
B    A    C    1         2         3         1.414213562  1.817120593  1.603058510
B    C    A    3         2         1         2.449489743  1.817120593  2.109743646
C    A    B    2         1         3         1.414213562  1.817120593  1.603058510
C    B    A    3         1         2         1.732050808  1.817120593  1.774075869
Network a): C = 0.9805431

Network b)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    2         2         2         2            2            2
A    C    B    2         2         2         2            2            2
B    A    C    2         2         2         2            2            2
B    C    A    2         2         2         2            2            2
C    A    B    2         2         2         2            2            2
C    B    A    2         2         2         2            2            2
Network b): C = 1.0000000

3.3. MACRO-COMPARABILITY

The issue of macro-comparability suggests that the developed formula should be applicable to all types of networks regardless of their relational properties. The formula proposed above can only be used in weighted or directed networks. When weights and directions are implemented in this equation at the same time, things become problematic. Consider the two graphs presented in Figure 2.
While both of these are clustered, with the clustered transitive triad being $BAC$, the results seem erroneous (Refer to Table 6). The top table refers to graph a) and results in an expected clustering coefficient of $C \approx 0.976$. Graph b), however, yields a clustering coefficient of $C \approx 1.034$. The reason is that the geometric mean of all three edges is higher than the geometric mean of the two edges of the examined open triad. This paper, however, insists on a valid concept. Therefore, the co-ordinate systems of the C-curves are taken into account.

Figure 2: Weighted, Directed Networks

Table 6: Triad Census of Graphs in Figure 2
Columns: (1) $\bar{v} = \sqrt{w_{ij} w_{jk}}$; (2) $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$; result $C = \sum \bar{w}_{i,j,k} / \sum \bar{v}_{i,j,k}$.

Network a)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         3         3         0            0            0
A    C    B    3         0         0         0            0            0
B    A    C    4         3         3         3.464101615  3.301927249  3.382042507
B    C    A    3         0         4         0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         4         0         0            0            0
Network a): C = 0.9763116

Network b)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         3         3         0            0            0
A    C    B    3         0         0         0            0            0
B    A    C    2         3         3         2.449489743  2.620741394  2.533669111
B    C    A    3         0         2         0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         2         0         0            0            0
Network b): C = 1.0343661

Figure 3 shows the co-ordinate systems of the following C-curves. The top graph depicts a generalized clustering coefficient curve based on prior research, with results varying between $C = 0$ and $C = 1$. The second illustrates the clustering coefficient curve derived from the formula presented in the previous section. $C = 1$ is given for a completely connected graph, in terms of transitivity, with equally weighted edges. For a local triad, if the geometric mean of all three weighted edges is larger than that of the original two evaluated edges, then the clustering coefficient is $C > 1$, and vice versa. Such a result is not desired.
Instead, a graph as seen in the last co-ordinate system is desired, where $C = 1$ is given when the geometric mean of the three weighted edges is equal to that of the original two evaluated edges. As long as the geometric mean of all three is smaller, the curve rises up to the point where the condition for $C = 1$ is met, and it falls thereafter.
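The overshoot of Equation [ 14 ] can be made explicit for a single closed triad with positive weights. The short derivation below is a worked illustration, with $a$ and $b$ used as shorthand for the two geometric means and the numbers taken from the row $BAC$ of the triad census for Figure 2.

```latex
\begin{align*}
a &= \sqrt{w_{ij} w_{jk}}, \qquad b = \sqrt[3]{w_{ij} w_{jk} w_{ik}} \\
C &= \frac{\sqrt{a \cdot b}}{a} = \sqrt{\frac{b}{a}} \\
C > 1 &\iff b > a \iff w_{ik} > \sqrt{w_{ij} w_{jk}} \\
\text{Figure 2 a):}\quad & \sqrt{4 \cdot 3} \approx 3.464 > 3
  \;\Rightarrow\; C = \sqrt{3.301927 / 3.464102} \approx 0.9763 \\
\text{Figure 2 b):}\quad & \sqrt{2 \cdot 3} \approx 2.449 < 3
  \;\Rightarrow\; C = \sqrt{2.620741 / 2.449490} \approx 1.0344
\end{align*}
```

In other words, the coefficient exceeds 1 exactly when the closing edge is heavier than the geometric mean of the two edges it closes.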
Figure 3: Co-Ordinate Systems of C-Curves (three panels, each marking $C = 1$; the annotations compare $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ with $\sqrt{w_{ij} w_{jk}}$ using $=$, $<$, $>$ and $\neq$)

FORMULA: WEIGHTED AND/OR DIRECTED NETWORKS

With regard to the problem explained in the previous section, the following clustering coefficient formula is derived.

[ 15 ]
$$C = \frac{\sum_{i,j,k} \bar{w}_{i,j,k}}{\sum_{i,j,k} \sqrt{w_{ij} w_{jk}}}$$

The variable $\bar{w}$ is read as follows.

For $\sqrt{w_{ij} w_{jk}} \geq \sqrt[3]{w_{ij} w_{jk} w_{ik}} \geq 0$:
$$\bar{w} = \sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}$$

For $0 < \sqrt{w_{ij} w_{jk}} < \sqrt[3]{w_{ij} w_{jk} w_{ik}}$:
$$\bar{w} = \left(\sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}\right)^{-1} \cdot \left(\sqrt{w_{ij} w_{jk}}\right)^{2}$$

This paper acknowledges an ideal cluster as a triad with equally weighted edges, corresponding to the original clustering coefficient for binary networks. The two formulae in the case distinction above cover the possible occurrences of the closed triadic relation at hand, with $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ being either smaller than or equal to, or larger than, the triadic relation without respect to the relative connecting edge. For the first case presented in the case distinction, the same concept from section 3.2.1 is used. For the second case, the cross fracture ($cf$) of the clustering coefficient is taken into account; it covers the occurrence that the weighted value assessment of all three edges is larger than that of the evaluated triad without respect to its connecting edge, which in turn is larger than zero. An appropriate measurement is then formed for the variable $\bar{w}$ in this case. The mathematical assessment of the variable $\bar{w}$ for the second case is given as follows. The variables are simplified, with $a$ as the assessed value of connected triads and $b$ as the assessed value of closed, transitive triads.

For $\sqrt{w_{ij} w_{jk}} = a \geq \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b \geq 0$:
• $C = \frac{\bar{w}}{\bar{v}} = \frac{\sqrt{a \cdot b}}{a}$ with $\bar{w} = \sqrt{a \cdot b}$ and $\bar{v} = a$

For $0 < \sqrt{w_{ij} w_{jk}} = a < \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b$:
• $C_{cf} = \left(\frac{\bar{w}}{\bar{v}}\right)^{-1} = \frac{\bar{v}}{\bar{w}}$ with $\bar{w} = \sqrt{a \cdot b}$ and $\bar{v} = a$
• $C_{cf} = \frac{a}{\sqrt{a \cdot b}} \cdot \frac{a}{a} = \frac{a^{2} \cdot (\sqrt{a \cdot b})^{-1}}{a}$
• $C_{cf} = \frac{\bar{w}}{\bar{v}}$ with $\bar{w} = a^{2} \cdot (\sqrt{a \cdot b})^{-1}$ and $\bar{v} = a$

Clustering Coefficient C:
• $C = \frac{\bar{w}}{\bar{v}}$
  with $\bar{w} = \sqrt{a \cdot b}$ for $\sqrt{w_{ij} w_{jk}} = a \geq \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b \geq 0$
  with $\bar{w} = a^{2} \cdot (\sqrt{a \cdot b})^{-1}$ for $0 < \sqrt{w_{ij} w_{jk}} = a < \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b$

With the proposed formula the desired C-co-ordinate system is achieved. Furthermore, the formula offers macro-comparability in weighted and/or directed networks and can be used in binary networks as well.

EXAMPLE

For comparative reasons, the same example used in Figure 2 is used here as well (Refer to Figure 4). The given formula now allows for micro-comparability in a macro-comparable setting of weighted and/or directed networks, which is shown in Table 7. Contrary to Table 6, the clustering coefficient results vary between $C = 0$ and $C = 1$. The value $C = 1$ is given for a fully connected graph with equally weighted edges, comparable to the original clustering ratio in binary networks from Newman et al. (2001).

Figure 4: Weighted, Directed Networks
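The case distinction of Equation [ 15 ] can be sketched as follows. This is an illustrative sketch rather than the author's code: the names `w_bar_eq15` and `clustering_eq15` and the dictionary of directed edge weights are assumptions, weights are taken to be non-negative, and the two networks use the directed weights listed in the triad census for Figure 4 (B→A, A→C, B→C).

```python
from itertools import permutations
from math import sqrt


def w_bar_eq15(w_ij, w_jk, w_ik):
    """Case distinction of Eq. [15] for one ordered triad, non-negative weights."""
    a = sqrt(w_ij * w_jk)                     # v-bar: connected triad
    b = (w_ij * w_jk * w_ik) ** (1.0 / 3.0)   # closed, transitive triad
    if a == 0.0 or b == 0.0:
        return 0.0                            # missing or open triad
    if a >= b:
        return sqrt(a * b)                    # first case
    return a * a / sqrt(a * b)                # second case (cross fracture)


def clustering_eq15(weights):
    """Eq. [15] for a mapping {(i, j): w_ij} of directed edge weights."""
    nodes = sorted({n for edge in weights for n in edge})
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        w_ij = weights.get((i, j), 0.0)
        w_jk = weights.get((j, k), 0.0)
        w_ik = weights.get((i, k), 0.0)
        num += w_bar_eq15(w_ij, w_jk, w_ik)
        den += sqrt(w_ij * w_jk)
    return num / den if den else 0.0


network_a = {("B", "A"): 4, ("A", "C"): 3, ("B", "C"): 3}
network_b = {("B", "A"): 2, ("A", "C"): 3, ("B", "C"): 3}

print(round(clustering_eq15(network_a), 4))  # 0.9763
print(round(clustering_eq15(network_b), 4))  # 0.9668
```

For a single closed triad the two branches reduce to $\sqrt{b/a}$ and $\sqrt{a/b}$ respectively, which is why the result can no longer exceed 1.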
Table 7: Triad Census of Graphs in Figure 4
Columns: (1) $\bar{v} = \sqrt{w_{ij} w_{jk}}$; (2) $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$; result $C = \sum \bar{w}_{i,j,k} / \sum \bar{v}_{i,j,k}$.

Network a)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         3         3         0            0            0
A    C    B    3         0         0         0            0            0
B    A    C    4         3         3         3.464101615  3.301927249  3.382042507
B    C    A    3         0         4         0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         4         0         0            0            0
Network a): C = 0.9763116

Network b)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         3         3         0            0            0
A    C    B    3         0         0         0            0            0
B    A    C    2         3         3         2.449489743  2.620741394  2.368107175
B    C    A    3         0         2         0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         2         0         0            0            0
Network b): C = 0.9667757

FORMULA: STRONGLY BALANCED WEIGHTED AND/OR DIRECTED NETWORKS

As already mentioned, the prior literature’s inclusion of negative and positive weights in the clustering coefficient is scarce at best, even though a mathematical solution for excluding unbalanced triads from the assessment of cluster formation is easily formulated. As already mentioned, the strong formulation of balance regards triads with three positive edges as clusters, as well as triads with two negative edges and one positive edge. All other triads aren’t acknowledged as clusters. With small adjustments the current formula can account for this, as seen below.

[ 16 ]
$$C = \frac{\sum_{i,j,k} \bar{w}_{i,j,k}}{\sum_{i,j,k} \sqrt[4]{(w_{ij} w_{jk})^{2}}}$$

The denominator $\bar{v}$ is constructed so that the arithmetic sign is not a decisive factor. The variable $\bar{w}$ reads as follows:

For $\sqrt[4]{(w_{ij} w_{jk})^{2}} \geq \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}} \geq 0$:
$$\bar{w} = \sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}}}$$

For $0 < \sqrt[4]{(w_{ij} w_{jk})^{2}} < \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}}$:
$$\bar{w} = \left(\sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}}}\right)^{-1} \cdot \left(\sqrt[4]{(w_{ij} w_{jk})^{2}}\right)^{2}$$
In order to exclude triads that are unbalanced in terms of the strong formulation of balance from being counted as clusters, the second factor of $\bar{w}$ was extended as follows. If $w_{ij} w_{jk} w_{ik}$ is made up of three positive edges or of two negative edges and one positive edge, then the numerator under the cube root results in $2 \cdot w_{ij} w_{jk} w_{ik}$. This is then divided by 2, resulting in $w_{ij} w_{jk} w_{ik}$, equal to the expression used before implementing the aspect of balance. If, however, $w_{ij} w_{jk} w_{ik}$ has three negative edges or one negative and two positive edges, the numerator equals 0, since these triads are not considered to be clusters. In order to account for negative weights in the rest of the formula, the products are squared and the root is raised to the fourth degree, with no impact on the results.

EXAMPLE

The following example includes clusters balanced according to the strong formulation of balance. Network a) is seen as balanced since the product of the edges is positive. The contrary holds for network b). As seen in Table 8, the clustering coefficient of network a) is the same as that of network a) from Figure 4. Despite its transitive triadic closure, network b) gives a clustering coefficient of $C = 0$, because the network is not strongly balanced.
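The sign handling of Equation [ 16 ] can be illustrated with a small Python sketch. The names `w_bar_strong` and `clustering_strong` are hypothetical, and `abs(p)` is used in place of $\sqrt{p^{2}}$. One point is an assumption of this sketch rather than something stated in the text: for an unbalanced triad the balance term is 0, so the second case of the formula would divide by zero if evaluated literally; the sketch returns $\bar{w} = 0$ in that situation, which matches the result reported for network b). The edge weights are the directed weights listed for Figure 5 in Table 8.

```python
from itertools import permutations
from math import sqrt


def w_bar_strong(w_ij, w_jk, w_ik):
    """Numerator term of Eq. [16] for one ordered triad (i, j, k)."""
    a = sqrt(abs(w_ij * w_jk))                 # fourth root of (w_ij * w_jk)^2
    p = w_ij * w_jk * w_ik
    b = ((p + abs(p)) / 2.0) ** (1.0 / 3.0)    # cube root of p if p > 0, else 0
    if a == 0.0 or b == 0.0:
        return 0.0   # assumption: open or unbalanced triads contribute nothing
    if a >= abs(p) ** (1.0 / 3.0):             # sixth root of p^2
        return sqrt(a * b)
    return a * a / sqrt(a * b)


def clustering_strong(weights):
    """Eq. [16] for a mapping {(i, j): w_ij} of signed, directed weights."""
    nodes = sorted({n for edge in weights for n in edge})
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        w_ij = weights.get((i, j), 0.0)
        w_jk = weights.get((j, k), 0.0)
        w_ik = weights.get((i, k), 0.0)
        num += w_bar_strong(w_ij, w_jk, w_ik)
        den += sqrt(abs(w_ij * w_jk))
    return num / den if den else 0.0


network_a = {("B", "A"): 4, ("A", "C"): -3, ("B", "C"): -3}   # two negative edges
network_b = {("B", "A"): -2, ("A", "C"): 3, ("B", "C"): 3}    # one negative edge

print(round(clustering_strong(network_a), 4))  # 0.9763
print(round(clustering_strong(network_b), 4))  # 0.0
```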
Figure 5: Signed, Directed, Weighted Networks

Table 8: Triad Census of Graphs in Figure 5
Columns: (1) $\bar{v} = \sqrt[4]{(w_{ij} w_{jk})^{2}}$; (2) $\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}}$; result $C = \sum \bar{w}_{i,j,k} / \sum \bar{v}_{i,j,k}$.

Network a)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         -3        -3        0            0            0
A    C    B    -3        0         0         0            0            0
B    A    C    4         -3        -3        3.464101615  3.301927249  3.382042507
B    C    A    -3        0         4         0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         4         0         0            0            0
Network a): C = 0.9763116

Network b)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         3         3         0            0            0
A    C    B    3         0         0         0            0            0
B    A    C    -2        3         3         2.449489743  0            0
B    C    A    3         0         -2        0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         -2        0         0            0            0
Network b): C = 0.0000000

FORMULA: WEAKLY BALANCED WEIGHTED AND/OR DIRECTED NETWORKS

Similar to the clustering coefficient for strongly balanced networks, the adjustment is a mere tweak to allow for the occurrence of three negative edges in a cluster. The proposed solution is as follows.
[ 17 ]
$$C = \frac{\sum_{i,j,k} \bar{w}_{i,j,k}}{\sum_{i,j,k} \sqrt[4]{(w_{ij} w_{jk})^{2}}}$$

The denominator remains the same. The variable $\bar{w}$ is read as follows.

For $\sqrt[4]{(w_{ij} w_{jk})^{2}} \geq \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}} \geq 0$:
$$\bar{w} = \sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \left(\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}} - \sqrt[3]{\frac{\left(w_{ij} - \sqrt{(w_{ij})^{2}}\right) \cdot \left(w_{jk} - \sqrt{(w_{jk})^{2}}\right) \cdot \left(w_{ik} - \sqrt{(w_{ik})^{2}}\right)}{8}}\right)}$$

For $0 < \sqrt[4]{(w_{ij} w_{jk})^{2}} < \sqrt[6]{(w_{ij} w_{jk} w_{ik})^{2}}$:
$$\bar{w} = \left(\sqrt{\sqrt[4]{(w_{ij} w_{jk})^{2}} \cdot \left(\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}} - \sqrt[3]{\frac{\left(w_{ij} - \sqrt{(w_{ij})^{2}}\right) \cdot \left(w_{jk} - \sqrt{(w_{jk})^{2}}\right) \cdot \left(w_{ik} - \sqrt{(w_{ik})^{2}}\right)}{8}}\right)}\right)^{-1} \cdot \left(\sqrt[4]{(w_{ij} w_{jk})^{2}}\right)^{2}$$

To the right of the last discussed adaptation, an expansion is added. With it, the equation tests whether all edges are negative. If they are, each bracket of the expansion results in $-2|w|$. The numerator of this expansion then reads $(-2|w_{ij}|) \cdot (-2|w_{jk}|) \cdot (-2|w_{ik}|)$. This is then divided by 8, and the cube root leaves the negative of the geometric mean of the three absolute weights. This negative value is then subtracted from the value 0; the minuend is 0 because triads with three negative edges were excluded there. By subtracting the negative value, the geometric mean of $|w_{ij} w_{jk} w_{ik}|$ remains. If at least one of these edges is positive, this portion results in 0. The minuend then simply checks whether only one negative edge exists in the triad (which would also result in 0) or whether one or three positive edges appear. If so, the same steps apply that were formulated in the clustering coefficient for strongly balanced networks.

EXAMPLE

In the following example, network a) is weakly balanced and therefore clusterable. Its result is equal to that of the other networks a) with three positive edges (Figure 4) and one positive edge (Figure 5).
Network b) in the following example is not balanced and therefore not a cluster.

Figure 6: Signed, Directed, Weighted Networks
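The additional all-negative term of Equation [ 17 ] can be sketched in the same way. The helper names `cbrt`, `w_bar_weak` and `clustering_weak` are hypothetical, `abs(x)` stands in for $\sqrt{x^{2}}$, and, as an assumption of this sketch, a triad whose combined term is 0 is given $\bar{w} = 0$ instead of evaluating the second case literally (which would divide by zero). The weights are the directed weights listed for Figure 6 in the triad census.

```python
from itertools import permutations
from math import copysign, sqrt


def cbrt(x):
    """Real cube root that also accepts negative arguments."""
    return copysign(abs(x) ** (1.0 / 3.0), x)


def w_bar_weak(w_ij, w_jk, w_ik):
    """Numerator term of Eq. [17] for one ordered triad (i, j, k)."""
    a = sqrt(abs(w_ij * w_jk))                       # sign-free v-bar
    p = w_ij * w_jk * w_ik
    minuend = cbrt((p + abs(p)) / 2.0)               # non-zero only if p > 0
    subtrahend = cbrt((w_ij - abs(w_ij)) * (w_jk - abs(w_jk))
                      * (w_ik - abs(w_ik)) / 8.0)    # non-zero only if all three < 0
    b = minuend - subtrahend
    if a == 0.0 or b == 0.0:
        return 0.0   # assumption: open or unbalanced triads contribute nothing
    if a >= cbrt(abs(p)):                            # sixth root of p^2
        return sqrt(a * b)
    return a * a / sqrt(a * b)


def clustering_weak(weights):
    """Eq. [17] for a mapping {(i, j): w_ij} of signed, directed weights."""
    nodes = sorted({n for edge in weights for n in edge})
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        w_ij = weights.get((i, j), 0.0)
        w_jk = weights.get((j, k), 0.0)
        w_ik = weights.get((i, k), 0.0)
        num += w_bar_weak(w_ij, w_jk, w_ik)
        den += sqrt(abs(w_ij * w_jk))
    return num / den if den else 0.0


network_a = {("B", "A"): -4, ("A", "C"): -3, ("B", "C"): -3}  # three negative edges
network_b = {("B", "A"): -2, ("A", "C"): 3, ("B", "C"): 3}    # one negative edge

print(round(clustering_weak(network_a), 4))  # 0.9763
print(round(clustering_weak(network_b), 4))  # 0.0
```

With two negative edges or three positive edges the subtrahend vanishes, so the sketch then behaves like the strongly balanced case.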
Table 9: Triad Census of Graphs in Figure 6
Columns: (1) $\bar{v} = \sqrt[4]{(w_{ij} w_{jk})^{2}}$; (2) $\sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^{2}}}{2}} - \sqrt[3]{\frac{\left(w_{ij} - \sqrt{(w_{ij})^{2}}\right) \cdot \left(w_{jk} - \sqrt{(w_{jk})^{2}}\right) \cdot \left(w_{ik} - \sqrt{(w_{ik})^{2}}\right)}{8}}$; result $C = \sum \bar{w}_{i,j,k} / \sum \bar{v}_{i,j,k}$.

Network a)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         -3        -3        0            0            0
A    C    B    -3        0         0         0            0            0
B    A    C    -4        -3        -3        3.464101615  3.301927249  3.382042
B    C    A    -3        0         -4        0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         -4        0         0            0            0
Network a): C = 0.9763116

Network b)
$i$  $j$  $k$  $w_{ij}$  $w_{jk}$  $w_{ik}$  (1)          (2)          $\bar{w}$
A    B    C    0         3         3         0            0            0
A    C    B    3         0         0         0            0            0
B    A    C    -2        3         3         2.449489743  0            0
B    C    A    3         0         -2        0            0            0
C    A    B    0         0         0         0            0            0
C    B    A    0         -2        0         0            0            0
Network b): C = 0.0000000

Equation [ 17 ] concludes the formula extension proposed by this paper. With this expansion, clustering results can be obtained in all types of networks, i.e. binary, weighted, directed and/or signed networks. This alleviates prior literature’s limitation of macro-comparability and thereby supports the second hypothesis of this paper.

3.4. SUMMARY

By alleviating the shortcomings found in prior research and combining its findings, the proposed clustering coefficient is considered to be improved. The aspect of relational variance among triads is taken into consideration, which offers micro-comparability within its results. Furthermore, the formula extension acknowledges the different types of networks in which the formula can be implemented. This delivers macro-comparability in its results, so the formula can be used to assess the formation of clusters in terms of triadic closure in binary, directed, weighted, and/or signed networks.
While the toy examples above support the theoretical concept, an analysis of a real-world network supports the findings in a more practical context. Therefore, the following section applies this formula to a real-world network and compares its results to those of formulae from prior research.

4. CLUSTERING COMPARISON

4.1. DATA SET

The data set used to compare the clustering coefficient formulae is the well-known Zachary Karate Club network (Zachary 1977). The network is binary and consists of 34 nodes and 78 symmetrical edges. The data set remains unchanged for the comparison in binary networks and is adjusted accordingly to allow for comparisons in directed and weighted networks. Figure 7 shows the structure of the network.
Figure 7: Zachary Karate Club Graph (Zachary 1977)

4.2. COMPARISON IN UNWEIGHTED NETWORKS

Table 10: Zachary Karate Club Adjacency Matrix (Zachary 1977)

The matrix above is the adjacency matrix of the Zachary Karate Club data set (Zachary 1977). It is the data set used in the following comparison for an undirected, unweighted network. The clustering coefficient rendered by this paper equals $C_{\text{Rubino}} = 0.22277$. The comparable coefficient for binary networks from Newman et al. (2001) provides the same result of $C_{\text{Newman et al.}} = 0.22277$. This is expected, since the newly proposed formula, simplified for binary networks, is merely the weighted values of triangles over the weighted values of open triads. In the following, the matrix for a comparison in directed, unweighted networks is rendered.
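The reduction to binary networks can be sketched as follows: with all weights equal to 1, every closed transitive triad contributes $\bar{w} = 1$ and every connected triad contributes $\bar{v} = 1$, so the coefficient becomes a simple count ratio. The function name `binary_clustering` and the four-node toy matrix are illustrative assumptions; the sketch does not use the Zachary data and is not meant to reproduce the value reported above.

```python
from itertools import permutations


def binary_clustering(adj):
    """Closed transitive triads over connected triads for a 0/1 adjacency matrix."""
    n = len(adj)
    closed = connected = 0
    for i, j, k in permutations(range(n), 3):
        if adj[i][j] and adj[j][k]:      # path i -> j -> k exists
            connected += 1
            if adj[i][k]:                # closing edge i -> k exists
                closed += 1
    return closed / connected if connected else 0.0


# Toy graph: a triangle (0, 1, 2) with one pendant node 3 attached to node 2.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

print(binary_clustering(toy))  # 0.6
```

For this undirected toy graph the value agrees with three times the number of triangles divided by the number of connected triples (3 · 1 / 5).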
Table 11: Unweighted, Directed Network Based on the Zachary Karate Club

The matrix above is the same as in Table 10, except that it is no longer symmetric. Here, each value below the diagonal was, at random, either removed or left in place. As a result, the structure of the graph remains the same: all nodes that were originally connected to one another still are. The property of directedness can now, however, be assessed. The clustering coefficient based on this paper’s proposal is $C_{\text{Rubino}} = 0.19914$. The comparable formula for directed networks is given by Opsahl and Panzarasa (2009). Their result $C_{\text{Opsahl, Panzarasa}} = 0.19914$ is identical to the result found in this paper, given the lack of relational variance among the triads. Because the network is unweighted, all edges are seen as identical to one another. The impact of the formula proposed in this paper only becomes recognizable when weights are attributed to the edges. This is shown in the following section.

4.3. COMPARISON IN WEIGHTED NETWORKS

Table 12: Weighted, Undirected Network Based on the Zachary Karate Club
The above matrix is based on the matrix of the Zachary Karate Club (Zachary 1977). Here, however, the adjacencies are multiplied by a random factor between 1 and 10, resulting in the assessed weights. This paper assesses the weighted matrix with a clustering coefficient of $C_{\text{Rubino}} = 0.21565$. The clustering coefficient by Opsahl and Panzarasa (2009) provides a result of $C_{\text{Opsahl, Panzarasa}} = 0.25861$. In comparison to the unweighted, undirected version of the matrix, the clustering results according to this paper decrease, since the weighted version offers a case of relational variance among the triads. This deems the clustered triads non-ideal, since the three weights of a given triangle are not equal, unlike in the unweighted matrix, in which each weight is valued as $w = 1$. Hence, the assessment in the weighted, undirected network delivers smaller results. On the contrary, the results according to Opsahl and Panzarasa (2009) increase when implementing weights. Because the aspect of directedness is not yet added, it is of interest to also compare the results based on the formula by Phan et al. (2013). This returns a result of $C_{\text{Phan et al.}} = 0.23537$, which is also higher than the clustering result according to this paper. Because the proposed formula moderates the result by taking the aspect of relational variance into consideration, the coefficient is smaller than the results from formulae in prior literature.

Table 13: Weighted, Directed Network Based on the Zachary Karate Club

The weight matrix above not only considers the aspect of differently weighted edges, but also incorporates the aspect of directedness added in the previous section of this paper. This paper determines the clustering coefficient of the weighted, directed graph to be $C_{\text{Rubino}} = 0.18714$.
In comparison to the clustering coefficient result according to Opsahl and Panzarasa (2009), namely $C_{\text{Opsahl, Panzarasa}} = 0.22569$, the result is once again smaller. In comparison to the unweighted, directed version of the matrix, the clustering results according to this paper decrease, since the weighted version offers a case of relational variance among the triads. This deems the clustered triads non-ideal, since the three weights of a given triangle are not equal, unlike in the merely directed matrix, where each weight is assessed as $w = 1$. Hence, the assessment in the weighted, directed network also delivers smaller results. On the contrary, the results according to Opsahl and Panzarasa (2009) increase when implementing weights, distorting their interpretation. As foreseen, due to the moderation of the formula for the case of relational variance, the proposal offered in this paper yields a smaller value than those found in prior literature, thereby supporting this paper’s third hypothesis.
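The two randomization steps used to derive the directed and weighted variants of the data set can be sketched as follows. Several details are assumptions of this sketch and are not specified in the text: the 50 % removal probability, integer factors drawn uniformly from 1 to 10, the fixed seed, and the function names `randomly_direct` and `randomly_weight`. The toy matrix merely stands in for the 34-node adjacency matrix.

```python
import random


def randomly_direct(adj, rng):
    """Remove each entry below the diagonal at random; the upper triangle is kept."""
    out = [row[:] for row in adj]
    for i in range(len(adj)):
        for j in range(i):
            if out[i][j] and rng.random() < 0.5:   # assumed removal probability
                out[i][j] = 0
    return out


def randomly_weight(adj, rng, symmetric=True):
    """Multiply each adjacency by a random integer factor between 1 and 10."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if not adj[i][j]:
                continue
            if symmetric and j < i and adj[j][i]:
                out[i][j] = out[j][i]              # reuse the weight drawn for (j, i)
            else:
                out[i][j] = adj[i][j] * rng.randint(1, 10)
    return out


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
directed = randomly_direct(toy, rng)                # analogous to Table 11
weighted = randomly_weight(toy, rng)                # analogous to Table 12
weighted_directed = randomly_weight(directed, rng, symmetric=False)  # analogous to Table 13
```

Because the upper triangle is never touched, every originally connected pair of nodes stays connected by at least one directed edge, as described in Section 4.2.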
5. CONCLUSION

5.1. LIMITATIONS

While improvements of the clustering coefficient are made, this paper is still subject to certain limitations. First of all, the data set used is relatively small, with only 34 nodes. In a more extensive study the formula could be tested more thoroughly. Furthermore, the data sets for directed and weighted networks are only based on the real network of the Zachary Karate Club; the relational properties within them were randomized. The formula could be validated more effectively in an analysis of real-world weighted and directed networks. Secondly, including signed networks in social network analyses isn’t common. The data collection process on a personal level, e.g. in the form of questionnaires or surveys, often shies away from assessing negative relations. However, while signed networks aren’t common today, advancements in technology and the automatic assessment of negative relations point to a future need for them. Only time will tell whether this prognosis is valid.

5.2. FUTURE IMPLICATIONS

The proposed formula combines the advancements found in prior research on the clustering coefficient and relieves the equations of their shortcomings. However, the consideration of micro-comparability and macro-comparability embodies the true advancement of this paper with regard to future implications. Addressing the limitation of micro-comparability removes the need for cut-off measures to eliminate insignificant values. As a result, comparisons between networks can be assessed more precisely and all gathered data can be used. The insignificant values that would have been cut off in the past now provide a correspondingly insignificant increase or decrease in the clustering coefficient result. This not only saves time in the network analysis but also provides a higher-quality result.
For example, two graphs that are completely identical with the exception of one insignificant edge can be compared, yielding almost identical yet still comparable results. Furthermore, the aspect of micro-comparability allows researchers to directly assess the clustering in completely connected graphs. With regard to macro-comparability, future researchers can gather all the data needed without worrying about having the necessary mathematical measures to analyze it. Moreover, the analysis can be conducted in all types of graphs regardless of their relational properties, thereby offering comparability between them. Especially given the assessment of negative edges in online social networking sites, it is foreseeable that this formula will be used in such environments, thereby depicting the clustering of these networks more precisely. For example, the number of dislikes and likes of YouTube comments could be used to assess the clustering of YouTubers in this directed, weighted, and signed network.

5.3. SUMMARY

In the course of this paper, prior literature on the clustering coefficient is reviewed. In doing so, limitations in the form of a lack of combinations between the coefficients as well as a lack of micro- and macro-comparability are uncovered. These are addressed and a formula expansion is proposed to overcome said limitations. The formula is tested on the real-world network of the Zachary Karate Club (Zachary 1977), and the results show a more precise clustering assessment than the clustering coefficients found in prior literature. The proposed findings imply relieving the formula of cut-off measures, assessing the clustering formation in completely connected networks, and assessing the clustering in all types of networks regardless of their structural and relational characteristics.
6. PUBLICATION BIBLIOGRAPHY

Barnes, J. A. (1969): Networks and political process. In Social networks in urban situations, pp. 51–76.
Barrat, A.; Barthelemy, M.; Pastor-Satorras, R.; Vespignani, A. (2004): The architecture of complex weighted networks. In Proceedings of the National Academy of Sciences of the United States of America 101 (11), pp. 3747–3752.
Berkhin, P. (2006): A Survey of Clustering Data Mining Techniques. In Grouping multidimensional data, pp. 25–71.
Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D. (2006): Complex networks. Structure and dynamics. In Physics Reports 424 (4-5), pp. 175–308.
Davis, J. A. (1967): Clustering and structural balance in graphs. In Human Relations 20, pp. 181–187.
Engø-Monsen, K.; Canright, G. (2011): Weighted Clustering Coefficients. Telenor. Oslo (Telenor Report R5).
Fagiolo, G. (2007): Clustering in complex directed networks. In Physical Review E 76 (2), p. 26107.
Fortunato, S. (2010): Community detection in graphs. In Physics Reports 486 (3-5), pp. 75–174.
Girvan, M.; Newman, M. E. J. (2002): Community structure in social and biological networks. In Proceedings of the National Academy of Sciences of the United States of America 99 (12), pp. 7821–7826.
Grindrod, P. (2002): Range-dependent random graphs and their application to modeling large small-world proteome datasets. In Physical Review E 66 (6), p. 66702.
Heider, F. (1946): Attitudes and cognitive organization. In Journal of Psychology 21, pp. 107–112.
Holland, P. W.; Leinhardt, S. (1971): Transitivity in structural models of small groups. In Comparative Group Studies, pp. 107–124.
Holme, P.; Park, S. M.; Kim, B. J.; Edling, C. R. (2007): Korean university life in a network perspective. Dynamics of a large affiliation network. In Physica A: Statistical Mechanics and its Applications 373, pp. 821–830.
Kalna, G.; Higham, D. J. (2007): A clustering coefficient for weighted networks, with application to gene expression data. In AI Communications 20 (4), pp. 263–271.
Kephart, W. M. (1950): A Quantitative Analysis of Intragroup Relationships. In American Journal of Sociology 55 (6), pp. 544–549.
Kivelä, M.; Arenas, A.; Barthelemy, M.; Gleeson, J. P.; Moreno, Y.; Porter, M. A. (2014): Multilayer networks. In Journal of Complex Networks 2 (3), pp. 203–271.
Kunegis, J.; Lommatzsch, A.; Bauckhage, C. (2009): The Slashdot Zoo. Mining a Social Network with Negative Edges. In Proceedings of the 18th international conference on World Wide Web, ACM, pp. 741–750.
Latora, V.; Marchiori, M. (2003): Economic small-world behavior in weighted networks. In The European Physical Journal B - Condensed Matter 32 (2), pp. 249–263.
Newman, M. E. J. (2001): The structure of scientific collaboration networks. In Proceedings of the National Academy of Sciences 98 (2), pp. 404–409.
Newman, M. E. J.; Strogatz, S. H.; Watts, D. J. (2001): Random graphs with arbitrary degree distributions and their applications. In Physical Review E 64 (2), p. 26118.
Onnela, J. P.; Saramäki, J.; Kertész, J.; Kaski, K. (2005): Intensity and coherence of motifs in weighted complex networks. In Physical Review E 71 (6), p. 65103.
Opsahl, T.; Panzarasa, P. (2009): Clustering in weighted networks. In Social Networks 31 (2), pp. 155–163.
Phan, B.; Engø-Monsen, K.; Fjeldstad, Ø. D. (2013): Considering clustering measures. Third ties, means, and triplets. In Social Networks 35 (3), pp. 300–308.
Saramäki, J.; Kivelä, M.; Onnela, J. P.; Kaski, K.; Kertész, J. (2007): Generalizations of the clustering coefficient to weighted complex networks. In Physical Review E 75 (2), p. 27105.
Schank, T.; Wagner, D. (2005): Approximating Clustering Coefficient and Transitivity. In Journal of Graph Algorithms and Applications 9 (2), pp. 265–275.
Squartini, T.; Fagiolo, G.; Garlaschelli, D. (2011): Randomizing world trade. II. A weighted network analysis. In Physical Review E 84 (4), p. 46118.
Szell, M.; Lambiotte, R.; Thurner, S. (2010): Multirelational organization of large-scale social networks in an online world. In Proceedings of the National Academy of Sciences of the United States of America 107 (31), pp. 13636–13641.
Szell, M.; Thurner, S. (2012): Social dynamics in a large-scale online game. In Advances in Complex Systems 15 (6), p. 1250064.
Tabak, B. M.; Takami, M.; Rocha, J. M. C.; Cajueiro, D. O. (2014): Directed clustering coefficient as a measure of systemic risk in complex banking networks. In Physica A: Statistical Mechanics and its Applications 394, pp. 211–216.
Wasserman, S. (1994): Social network analysis. Methods and applications. In Cambridge university press 8, pp. 165–243.
Watts, D. J.; Strogatz, S. H. (1998): Collective dynamics of 'small-world' networks. In Nature 393 (6684), pp. 440–442.
Zachary, W. W. (1977): An information flow model for conflict and fission in small groups. In Journal of Anthropological Research 33, pp. 452–473.
Zhang, B.; Horvath, S. (2005): A general framework for weighted gene co-expression network analysis. In Statistical applications in genetics and molecular biology 4 (1), Article 17.
7. INDEX

7.1. LIST OF TABLES

Table 1: Triadic Value Assessment (Opsahl, Panzarasa 2009)
Table 2: Clustering in Terms of the Formulation of Balance (Szell et al. 2010)
Table 3: Opsahl, Panzarasa's (2009) Clustering Coefficient Denominator Differences
Table 4: Micro-Comparability of Clustered Triads
Table 5: Triad Census of Graphs in Figure 1
Table 6: Triad Census of Graphs in Figure 2
Table 7: Triad Census of Graphs in Figure 4
Table 8: Triad Census of Graphs in Figure 5
Table 9: Triad Census of Graphs in Figure 6
Table 10: Zachary Karate Club Adjacency Matrix (Zachary 1977)
Table 11: Unweighted, Directed Network Based on the Zachary Karate Club
Table 12: Weighted, Undirected Network Based on the Zachary Karate Club
Table 13: Weighted, Directed Network Based on the Zachary Karate Club

7.2. LIST OF FIGURES

Figure 1: Weighted, Undirected Networks
Figure 2: Weighted, Directed Networks
Figure 3: Co-Ordinate Systems of C-Curves
Figure 4: Weighted, Directed Networks
Figure 5: Signed, Directed, Weighted Networks
Figure 6: Signed, Directed, Weighted Networks
Figure 7: Zachary Karate Club Graph (Zachary 1977)
8. AFFIDAVIT

Statutory declaration: I hereby declare, pursuant to § 17 (2) APO, that I wrote the above bachelor's thesis independently and used no sources or aids other than those indicated.

________________ ________________________________
(Date) (Signature)