Rubino, Nicholas
The Clustering Coefficient: A Literature Review and Formula Extension
Bachelor's Thesis
in the degree program
International Information Systems Management
at the Faculty of Information Systems and Applied Computer Sciences
of the Otto-Friedrich-Universität Bamberg
Author:
Nicholas Michael Rubino
Reviewer:
Prof. Dr. Kai Fischbach
TABLE OF CONTENTS
1. INTRODUCTION
   1.1. PROBLEM STATEMENT
   1.2. RESEARCH QUESTIONS & HYPOTHESES
   1.3. STRUCTURE OF THIS PAPER
   1.4. PURPOSE OF THIS PAPER
2. LITERATURE REVIEW
   2.1. PRELIMINARIES
      Network Theory
      Triadic Relations
         Binary Networks: Triadic Closure
         Directed Networks: Triadic Transitivity
         Weighted Networks: Triadic Value
         Signed Networks: Triadic Balance
   2.2. CLUSTERING COEFFICIENT
      Clustering Classification
      Binary Networks
      Weighted Networks
      Weighted and Directed Networks
      Signed Networks
      Research Trends
      Limitations
3. FORMULA EXTENSION
   3.1. BASIS
   3.2. MICRO-COMPARABILITY
      Formula: Weighted or Directed Networks
      Example
   3.3. MACRO-COMPARABILITY
      Formula: Weighted and/or Directed Networks
      Example
      Formula: Strongly Balanced Weighted and/or Directed Networks
      Example
      Formula: Weakly Balanced Weighted and/or Directed Networks
      Example
   3.4. SUMMARY
4. CLUSTERING COMPARISON
   4.1. DATA SET
   4.2. COMPARISON IN UNWEIGHTED NETWORKS
   4.3. COMPARISON IN WEIGHTED NETWORKS
5. CONCLUSION
   5.1. LIMITATIONS
   5.2. FUTURE IMPLICATIONS
   5.3. SUMMARY
6. PUBLICATION BIBLIOGRAPHY
7. INDEX
   7.1. LIST OF TABLES
   7.2. LIST OF FIGURES
8. AFFIDAVIT
1. INTRODUCTION
The formation of clusters within a social network is not a new topic. In fact, many regard it as one of the earliest
features used to measure a network's morphology, with approaches dating back to the mid-20th century (Barnes 1969).
However, in the current stream of research the assessment of graph clustering has become particularly popular.
Based on the clustering coefficient $C$ developed by Watts and Strogatz (1998) as well as the one by Newman et
al. (2001), various extensions have been developed to depict the clustering ratio more precisely. While these
extensions have improved the understanding of the ways individuals are connected in various networks, their
development has evolved in a chaotic manner (Fortunato 2010). Several approaches to assessing the clustering of
graphs can be found in recent literature; however, no single approach has yet been adopted.
This paper considers a possible rationale for this based on three main limitations of prior research,
which are explained in the following section.
1.1. PROBLEM STATEMENT
The aforementioned clustering coefficients, which prompted the wave of adaptations of these formulae in network
literature, were originally intended for binary networks. This means that this assessment of graph clustering merely
acknowledges structural properties of the graph, i.e. whether a relationship between the network's members exists
or not, as seen in the clustering coefficient of Newman et al. (2001). This equation gives the tendency of three
members of a network to be connected by three relationships as opposed to only two relationships, the latter
implying that a connection between two of the members does not exist.
This ratio, pertaining to the structural properties of a network, can be seen as the foundation for the
developments that followed. Over time, relational properties were added, e.g. by attributing weights or
directedness to the relationships in the network. While great advancements have been made, this topic is still
in its research infancy, since each of these developments comes with its own limitations.
The various clustering coefficients presented in prior literature build on earlier findings to deliver a more precise
ratio of triadic closure over triadic connectedness. This ratio expresses the tendency that "the friend of my friend is
my friend". However, the advancements found in prior literature and the shortcomings of each formula are
unevenly distributed: different equations offer different solutions to different problems, but do not always
incorporate the solutions found in other studies. Hence, each clustering coefficient found in prior research retains
limitations that other studies have already overcome. By combining selected clustering coefficient formulae, the
known limitations can be set aside and a synergy of advancements can be achieved.
While the lack of combination of the coefficients represents the first limitation summarized in this paper, the lack
of micro-comparability is the second. Micro-comparability refers to the ability of a clustering coefficient to
compare the same graph under different relational properties, which existing coefficients do not always allow.
The issue arises because prior literature assigns a completely connected graph a clustering coefficient value of
$C = 1$, resulting in a meaningless assessment of cluster formation in completely connected graphs and a distorted
assessment of triads with a large relational variance. By adapting the formula to take relational variance into
consideration, the results can provide micro-comparability within the clustered triads.
Furthermore, there is the issue of macro-comparability: the clustering coefficient is unable to compare different
social networks irrespective of their relational and structural properties. While great advancements have been
made to incorporate the relational properties of weightedness and directedness, current literature has yet to
provide a clustering coefficient that can be used in binary, weighted, directed and/or signed
networks. To overcome the limitation of macro-comparability, the formula can be adjusted so that it can be used
in all types of networks regardless of their relational or structural properties.
In summary, due to the chaotic development of the clustering coefficient, it is vital to collect, organize and compare
the clustering coefficients of prior literature in order to capture this development, determine research trends, and
address limitations. The discovered limitations should then be resolved in the form of a formula adjustment.
1.2. RESEARCH QUESTIONS & HYPOTHESES
By reviewing these aspects within the scope of this paper, the problem statement is to be addressed, in particular
by answering the following research questions:
Q1: How has the clustering coefficient developed over time?
Q2: Can the formula be improved to incorporate micro- and macro-comparability?
Q3: How would such a formula compare to the formulae found in research literature?
The presented paper sets out to answer Q1 by reviewing prior research on the clustering coefficient. Furthermore,
this paper proposes a formula extension as an answer to Q2, which can hypothetically do the following:
H1: The proposed formula distinguishes between clusters in terms of relational variance, thus offering micro-
comparability.
H2: The proposed formula can be used in binary, weighted, directed, and/or signed networks, thus offering
macro-comparability.
Within the scope of this paper, the proposed formula is compared to those found along this research stream in
order to answer Q3. Because the proposed formula should exclude or diminish aspects of the formulae found in
previous literature, the following is foreseeable:
H3: The newly developed clustering coefficient should yield a smaller value than those previously found in
research literature.
With regard to this paper's research intent, the following section presents an outline of the paper at hand.
1.3. STRUCTURE OF THIS PAPER
Due to the very recent developments of the clustering coefficient, a literature review of prior research is appropriate
and is presented in the first part of this paper. There, preliminary knowledge of network theory is noted briefly
and insight into triadic relations is given. Subsequently, a holistic review of prior research regarding the
clustering coefficient is presented, in which the term clustering is clarified, its development is documented,
research trends are outlined and its limitations are addressed. The second part of the paper consists of the formula
extension. The gathered research on the clustering coefficient is critically evaluated and modified according to the
reviewed limitations, thereby proposing a new clustering coefficient that overcomes the limitations of micro- and
macro-comparability. In the third part of the paper, the new formula is compared, to a limited extent, with clustering
coefficients of the past. Finally, a section is reserved for topics of discussion, such as known limitations and future
implications.
1.4. PURPOSE OF THIS PAPER
By critically evaluating prior research regarding the adaptation of the clustering coefficient formula, this paper aims
to provide a general understanding of the formula as well as an overview of its most recent development. In
doing so, this paper contributes to social network research as well as to the research of network theory in general.
Moreover, it aims to extend current research by comparing these latest developments.
Prior research assigns a fully connected network a clustering coefficient of $C = 1$, resulting either in a meaningless
analysis of the clustering within smaller networks, e.g. smaller companies where everyone knows everyone, or
in the necessity to manipulate data by performing cut-off measures. The proposed formula intends to free the
equation from this limitation, thus differentiating between clusters with differently weighted edges and equally
weighted ones, thereby offering micro-comparability. This paper therefore offers a new perspective on
assessing cluster formation, which may spark interest among fellow social network researchers to address this
old concept in a new light.
Additionally, the research at hand presents a new approach to the clustering coefficient, which ideally can act as a
framework for measuring the clustering of a network irrespective of its characteristics, i.e. in weighted and
directed networks as well as in networks that include positive and negative weights. By doing so, it is plausible
that new ideas on older and current theories will emerge, thus expanding research in social network analysis.
Because the clustering coefficient differs based on the network at hand, i.e. different equations are used for
assessing the formation of clusters in different networks, this paper intends to provide the means for comparability
across different types of networks, i.e. macro-comparability. Overall, to the best of this paper's knowledge, the
proposed formula is the first of its kind to deliver clustering results in binary, weighted, directed, and signed
networks.
Moreover, it is hoped that social network researchers will extend the presented review by excluding or
minimizing the limitations of this paper, by evaluating other network measurement formulae with
regard to their fit in real-world environments, or by developing this formula within a more extensive empirical
study. Such work could give better insight into cluster formation in specific environments or into other
mathematical analysis aspects along the lines of this study.
2. LITERATURE REVIEW
Since the clustering coefficient has evolved rapidly within the past several years, a review of prior research
literature is necessary in order to understand its origin and the changes already implemented. To do so, it
is vital to understand the basic notation and concepts used in network theory. Moreover, insight into
relevant triadic relations is given. This preliminary understanding is presented in the following and can be used to
identify terms and variables used throughout the paper. Once this preliminary section is handled, the review of
prior research specifically regarding the clustering coefficient is presented, whereby its definition is clarified and
its development is documented. In addition, the limitations of this literature review are addressed.
2.1. PRELIMINARIES
NETWORK THEORY
A graph $G$ consists of a set of nodes (vertices, points, or actors) $N = \{n_1, n_2, \dots, n_N\}$, a set of links
(edges, lines, or ties) $L = \{l_1, l_2, \dots, l_M\}$, and a set of values (weights) $W = \{w_1, w_2, \dots, w_M\}$.
The weights attributed to the edges of graph $G$ can also be represented as the weight matrix $W$,
where $w_{ij}$ describes the weight of the edge between $n_i$ and $n_j$ (Boccaletti et al. 2006). The adjacency
matrix $A$ is the weight matrix for binary networks, in which only the values $a_{ij} = 0$ or $a_{ij} = 1$ are permitted.
A graph can generally embody four different structures: undirected and unweighted (binary), undirected and
weighted, directed and unweighted, as well as directed and weighted. Weighted graphs have edges weighted with
any numeric value. Directed graphs have an asymmetrical weight matrix, so the order of the subscripts of a weight
matters: it describes the direction in which the weight applies, e.g. $w_{ij}$ describes the weighted edge going from
node $n_i$ to node $n_j$. Typically in network research the prerequisite $i \neq j \neq k$ is given, which is also adopted
in this paper. Signed graphs can include both positive and negative weights. If two nodes are connected, they are
neighbors or adjacent, with $k_i$ being the number of neighbors that node $n_i$ has, also known as the node degree
(Kivelä et al. 2014). While the definition of relational and structural characteristics of a graph differs throughout
literature, this paper defines them as follows. Structural properties of a graph focus on the existence or non-existence
of an edge connecting two nodes. Relational properties focus on the characteristics of this edge, e.g. its
weightedness and directedness. With this preliminary knowledge of network structures and the relations therein, the
specific relation of triads is explained in the following section.
TRIADIC RELATIONS
BINARY NETWORKS: TRIADIC CLOSURE
Triads or triples illustrate the relationships between a set of three nodes. A triad is connected or open if the three
nodes are connected by two edges with weights $w > 0$; nodes joined by such an edge are neighbors
or adjacent. The triad is completely connected or closed if a triangle is formed, i.e. three edges connect the three
nodes (Kivelä et al. 2014). In binary networks this closure suffices for assessing the formation of a cluster,
as only the existence of the relevant edges between the nodes matters. However, when adding the relational property
of directedness, transitivity needs to be assessed before determining whether the triad is closed.
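As a minimal sketch of these definitions for an undirected binary graph, the function below counts how many of the three possible edges of a triad exist in the adjacency matrix. The name `triad_status`, the 0-based node indices and the example graph are assumptions made for illustration.

```python
def triad_status(A: list[list[int]], i: int, j: int, k: int) -> str:
    """Classify the triad (n_i, n_j, n_k) of an undirected binary graph."""
    # Count how many of the three possible edges between the nodes exist
    edges = sum(1 for u, v in ((i, j), (j, k), (i, k)) if A[u][v] > 0)
    if edges == 3:
        return "closed"       # triangle: three edges connect the three nodes
    if edges == 2:
        return "open"         # connected triad: two edges, one missing
    return "not connected"    # fewer than two edges


# Hypothetical graph: triangle n_0, n_1, n_2 plus an extra edge n_2 -- n_3
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(triad_status(A, 0, 1, 2))
print(triad_status(A, 0, 2, 3))
```

The first call describes the triangle and therefore a closed triad; the second triad has edges $n_0 - n_2$ and $n_2 - n_3$ only, so it is open.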
DIRECTED NETWORKS: TRIADIC TRANSITIVITY
The researchers Holland and Leinhardt (1971) propose transitivity to be the key structural concept in the analysis
of sociometric data. According to Wasserman (1994), a triad is transitive if, whenever $a_{ij}$ and $a_{jk}$ are present,
$a_{ik}$ is present as well. Intuitively, this describes a chain of links from $n_i$ to $n_k$ through $n_j$, where $n_i$ and
$n_k$ are additionally connected by a non-vacuous link from the perspective of $n_i$. A non-vacuous connection is an
outgoing link from a focal actor's perspective (Wasserman 1994). In any given set of three nodes there are six triadic
relations. For example, the open triadic relations among the three nodes $n_1$, $n_2$, and $n_3$ consist of the following six:
$$a_{12}a_{23},\quad a_{13}a_{32},\quad a_{21}a_{13},\quad a_{23}a_{31},\quad a_{31}a_{12},\quad a_{32}a_{21},$$
which depict the first condition of Wasserman's transitivity definition. The second condition involves adding
the third edge, connected in a non-vacuous manner with the focal actor being the starting node $n_i$,
displayed as follows:
$$a_{12}a_{23}a_{13},\quad a_{13}a_{32}a_{12},\quad a_{21}a_{13}a_{23},\quad a_{23}a_{31}a_{21},\quad a_{31}a_{12}a_{32},\quad a_{32}a_{21}a_{31}.$$
While the transitivity definition helps assess triadic closure in directed networks, the question of how to determine
the value of a weighted triad remains open. The various assessment measures are given in the following section.
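The two transitivity conditions can be sketched in Python for a directed binary adjacency matrix. The sketch below iterates over every ordered choice of three distinct nodes, which yields exactly the six relations per triad listed above, counts the 2-paths $a_{ij}a_{jk}$ (first condition) and how many of them are closed by $a_{ik}$ (second condition). The function name `transitivity_counts` and the 0-based indices (node $n_1$ becomes index 0) are assumptions for illustration.

```python
from itertools import permutations


def transitivity_counts(A: list[list[int]]) -> tuple[int, int]:
    """Count 2-paths i->j->k and how many are closed by i->k (directed, binary)."""
    paths = closed = 0
    # Every ordered triple of distinct nodes; for one triad these are the six
    # relations a_ij a_jk enumerated in the text.
    for i, j, k in permutations(range(len(A)), 3):
        if A[i][j] and A[j][k]:   # first condition: a_ij and a_jk present
            paths += 1
            if A[i][k]:           # second condition: a_ik closes the triad
                closed += 1
    return paths, closed


transitive = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # 0->1, 0->2, 1->2
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]       # 0->1->2->0
print(transitivity_counts(transitive))
print(transitivity_counts(cycle))
```

The first graph contains a single 2-path, which is closed, so the triad is transitive. The directed cycle contains three 2-paths but none of them is closed by a non-vacuous link, so it is not transitive even though every pair of nodes is connected.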
WEIGHTED NETWORKS: TRIADIC VALUE
When implementing these six triadic relations in clustering formulae, the corresponding weight $w$ is taken from
the weight matrix, resulting in a value of $w = 0$ if two nodes are not connected. However, the weights from the
weight matrix alone do not suffice to assess the value of the entire triad. As Opsahl and Panzarasa (2009) point out,
there are four ways of assessing the triadic value: the arithmetic mean, the geometric mean, the maximum value
and the minimum value. While the arithmetic mean is simple to use, it is not robust against differences in weights,
especially in extreme settings; for instance, the weight pairs (2, 2) and (1, 3) receive the same value. The maximum
and minimum values are prone to insensitivity as well, since the maximum disregards the lower weight and the
minimum disregards the higher weight. The geometric mean overcomes these issues of sensitivity (Opsahl,
Panzarasa 2009). The four methods are given in Table 1 with examples that show how they deviate from one
another. However, when regarding both positive and negative weights, the formulation of balance is key to
determining triadic closure. This is presented in the following section.
Table 1: Triadic Value Assessment (Opsahl, Panzarasa 2009)
Method          | Example a)              | Example b)
Maximum value   | $\max(2, 2) = 2$        | $\max(1, 3) = 3$
Minimum value   | $\min(2, 2) = 2$        | $\min(1, 3) = 1$
Arithmetic mean | $(2 + 2)/2 = 2$         | $(1 + 3)/2 = 2$
Geometric mean  | $\sqrt{2 \cdot 2} = 2$  | $\sqrt{1 \cdot 3} \approx 1.73$
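The four methods in Table 1 can be reproduced with a few lines of Python, treating the triadic value as a function of the two edge weights of a connected triplet, as in the table's examples. The function name `triplet_values` is hypothetical, and the geometric mean assumes non-negative weights, since the square root of a negative product is undefined.

```python
import math


def triplet_values(w1: float, w2: float) -> dict[str, float]:
    """Value of a triplet formed by two edges with weights w1 and w2 (Table 1)."""
    return {
        "maximum": max(w1, w2),
        "minimum": min(w1, w2),
        "arithmetic_mean": (w1 + w2) / 2,
        # Assumes w1, w2 >= 0; negative products have no real square root
        "geometric_mean": math.sqrt(w1 * w2),
    }


for pair in [(2, 2), (1, 3)]:
    print(pair, {m: round(v, 2) for m, v in triplet_values(*pair).items()})
```

Only the maximum, the minimum and the geometric mean separate example a) from example b); the arithmetic mean assigns both pairs the value 2.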
SIGNED NETWORKS: TRIADIC BALANCE
Since the introduction of the Theory of Structural Balance by Heider (1946), further implications of including
positive and negative weights in triadic relations have been researched thoroughly. Davis (1967) determines that
for a local network to be clusterable, it must also be balanced. The formulation of balance is based on the
arithmetic signs of all three weights of a triad. A closed triad with a single negative weight, and thus two positive
weights, is unbalanced and therefore not clusterable. In contrast, a triad consisting of three positive weights
or of one positive and two negative weights is balanced. These cases define the strong formulation of balance.
Under a weak formulation of balance, a triad with negative weights on all three of its edges is also considered
balanced (Davis 1967). A visual representation of this is available in Table 2.
Table 2: Clustering in Terms of the Formulation of Balance (Szell et al. 2010)
Edge signs of the closed triad | (+, +, +) | (+, −, −) | (−, −, −)  | (+, +, −)
Strong formulation of balance  | Balanced  | Balanced  | Unbalanced | Unbalanced
Weak formulation of balance    | Balanced  | Balanced  | Balanced   | Unbalanced
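Because balance depends only on the signs of the three weights, it can be sketched as a count of negative edges. The function below is a hypothetical illustration of the strong and weak formulations described above; it assumes a closed triad, i.e. all three weights are non-zero.

```python
def is_balanced(w_ij: float, w_jk: float, w_ik: float, weak: bool = False) -> bool:
    """Balance of a closed signed triad, judged only by the signs of its weights."""
    # Assumption: the triad is closed, so none of the three weights is zero
    negatives = sum(1 for w in (w_ij, w_jk, w_ik) if w < 0)
    if weak:
        # Weak formulation: only a single negative edge makes the triad unbalanced
        return negatives != 1
    # Strong formulation: zero or two negative edges are balanced
    return negatives in (0, 2)


patterns = [(1, 1, 1), (1, -1, -1), (-1, -1, -1), (1, 1, -1)]
print([is_balanced(*p) for p in patterns])
print([is_balanced(*p, weak=True) for p in patterns])
```

The two printed lists follow the two rows of Table 2: the strong formulation marks the last two sign patterns as unbalanced, whereas the weak formulation accepts the all-negative triad as well.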
2.2. CLUSTERING COEFFICIENT
With this preliminary understanding at hand, the following section depicts a holistic overview of the development
of the clustering coefficient in prior research, ranging from the initial proposal to recent approaches that adapt
the formula to networks with different structural and relational properties. The literature review of the clustering
coefficient is categorized by the types of networks the coefficients were developed for. Subsequently, research
trends are outlined and the limitations of the more recent developments are addressed. However, given the
aforementioned chaotic development of this stream of research, a classification of the types of clustering is
necessary first.
CLUSTERING CLASSIFICATION
As previously mentioned, the assessment of cluster formation has developed in a chaotic manner, which has led to
an unclear collection of clustering definitions. Kivelä et al. (2014) distinguish three types of cluster formation.
Firstly, one can use the node degrees to give the ratio of existing adjacencies to all possible adjacencies in a graph.
In this regard, the term clustering is synonymous with the density or neighborhood of a node. Secondly, one can
use walks and paths to assess the clustering formation of a graph. This assessment is often used to identify
communities, a term that is also used synonymously with clusters and depicts dense regions of a network
(Boccaletti et al. 2006). Community detection aims to group nodes into modules based on a graph's topology.
Fortunato (2010) identifies four traditional methods for assessing this type of clustering: hierarchical methods,
partitional methods, graph partitioning methods and spectral methods. Other methods, such as grid-based and
constraint-based clustering, are available as well, among many others (Berkhin 2006). Lastly, clustering can
be determined at a triadic level by evaluating the relations within a set of three nodes (Kivelä et al. 2014). In this
respect, the fundamental key formula, the clustering coefficient (Newman et al. 2001), measures transitivity,
which is also often seen as a synonym for clustering (Latora, Marchiori 2003), and gives the ratio of closed triads
or triples to merely connected ones. Intuitively, clustering based on triadic closure gives the tendency that
"the friend of my friend is my friend".
This paper generalizes the types of clustering further by defining two types of cluster
assessment: a macroscopic and a microscopic approach. On the one hand, the clustering of a network can be
assessed using a macroscopic approach. This entails that clusters are dense regions of a network, for example cliques
or communities. Within these, it is not necessary for each member of the dense cluster to be connected with every
other member; the high density of the group alone justifies the term clustering. As already pointed out, clustering
based on community detection is abundant in prior research, and many variations of this assessment exist. On
the other hand, microscopic clustering is possible. This regards a cluster as a group of three members that are
completely connected, which can generally be assessed in terms of triadic closure. A similar distinction between
the two types of clustering is presented by Girvan and Newman (2002). They acknowledge that the often synonymous
use of the terms is misleading and therefore refrain from using the term clustering to describe the detection of
communities. Since the paper at hand shares this view, the triadic route of assessing clusters is chosen.
In terms of the clustering coefficient, there are two main methods available for assessing the tendency of nodes to
cluster. On the one hand, the local clustering coefficient is based on the local density of an ego's network and
provides a result for the clustering tendency from a local actor's perspective, e.g. by assessing all closed triads
over connected triads in which the node $n_i$ is involved. The local clustering coefficients can then be averaged
over all nodes to globalize the result across the entire network. The second measure is a straightforward
global measure: the global clustering coefficient counts all closed triads from each node's perspective and
divides them by all connected triads. The first measure is prone to sensitivity issues, since each local clustering
coefficient is weighted equally regardless of the node's degree or general connectedness in the network (Opsahl,
Panzarasa 2009). Both globalized local and global clustering coefficients are abundant in prior research, as can be
seen in the following section, where the development of the formula is illustrated.
BINARY NETWORKS
Equating the clustering tendency in binary networks has proven to be the simplest case. Because relational properties
are not acknowledged, the prerequisite $a_{ij} = a_{ji} = w_{ij} = w_{ji}$ holds. The term clustering coefficient
was first introduced by Watts and Strogatz (1998) in their attempt to compare random networks to real-world
networks. The clustering coefficient $C$ is defined as the average of the local coefficients $C_i$ over all $N$ nodes in the
network, where $C_i$ is the ratio of the number of edges $L_i$ that actually exist between the $k_i$ neighbors of $n_i$ to the
maximum possible number of such edges, $k_i(k_i - 1)/2$. Since the numerator of each ratio can never exceed its
denominator, $C$ lies between $C = 0$ and $C = 1$. The clustering coefficient by Watts and Strogatz (1998) can
therefore be written as follows:
[1]
$$C_{\text{Watts Strogatz}} = \frac{1}{N} \sum_{i=1}^{N} \frac{L_i}{k_i(k_i - 1)/2}$$
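Formula [1] can be sketched in Python for an undirected binary graph given as an edge list. The function names `build_adjacency` and `watts_strogatz_clustering` are hypothetical. Formula [1] leaves $C_i$ undefined for nodes with fewer than two neighbors (the denominator is zero), and this sketch assumes $C_i = 0$ for them; nodes that appear in no edge are not counted in $N$.

```python
from itertools import combinations


def build_adjacency(edges: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Neighbour sets of an undirected binary graph given as an edge list."""
    adj: dict[int, set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def watts_strogatz_clustering(adj: dict[int, set[int]]) -> float:
    """Formula [1]: mean of C_i = L_i / (k_i(k_i - 1)/2) over all N nodes."""
    total = 0.0
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            # Assumption: C_i = 0 when k_i < 2, because the denominator is 0
            continue
        # L_i: edges that actually exist between the neighbours of this node
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


# Hypothetical graph: triangle n_0, n_1, n_2 with a pendant edge n_2 -- n_3
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3)])
print(watts_strogatz_clustering(adj))
```

For this graph $C_0 = C_1 = 1$, $C_2 = 1/3$ and $C_3 = 0$, so the average is $7/12 \approx 0.583$. Every node contributes equally to this mean regardless of its degree, which is the sensitivity issue of globalized local coefficients noted above.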
While this clustering coefficient does not use triadic relations to determine the formation of clusters, its introduction
started the movement towards the development of the clustering coefficient and is therefore noteworthy. However,
comparable findings were made almost half a century before this introduction. The proposal by Watts and
Strogatz (1998) is very similar to the findings of Kephart (1950), in which the law of family interactions is
proposed to be the ratio of actual relationships to potential ones. Watts and Strogatz (1998) develop this by
taking a focal node's perspective, which is then globalized over the entire network.
Building on this milestone, Newman et al. (2001) adjust the definition of the clustering coefficient in their study on random graphs and apply it to real-world networks, specifically collaboration networks and the World Wide Web. Their measure is claimed to be equivalent to the original clustering coefficient and merely reverses its approach by taking the ratio of the means instead of the mean of the ratios. This coefficient is defined as follows (Newman et al. 2001):
[ 2 ]
$$C_{\text{Newman et al.}} = \frac{3 N_{\Delta}}{N_3}$$
In general terms, it reads as three times the number of all triangles ($N_{\Delta}$) divided by the number of all connected triples ($N_3$). The factor 3 in the numerator is present because each triangle represents three closed triads. As pointed out by Schank and Wagner (2005) as well as by Latora and Marchiori (2003), this formula and the one by Watts and Strogatz (1998) differ. In fact, Latora and Marchiori (2003) interpret the latter, proposed by Watts and Strogatz (1998), as an approximation of a different measure, namely the efficiency. The two also extend the model, which is explained in the review of the clustering coefficient in weighted networks. A further limitation of the Watts-Strogatz formula (1998) is that it is based on the sum of all local clustering coefficients, which is then globalized over the entire network. As mentioned in section 2.2.1 of this paper, such an approach is prone to sensitivity issues.
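The difference between the mean of the ratios (equation [ 1 ]) and the ratio of the means (equation [ 2 ]) can be checked on a small graph. The following Python sketch is illustrative only. The adjacency-set representation, the function names and the convention that nodes with fewer than two neighbors contribute $C_i = 0$ are assumptions of this sketch, not part of the cited papers.

```python
from itertools import combinations


def local_watts_strogatz(adj, i):
    """Local coefficient E_i / (k_i (k_i - 1) / 2); adj maps node -> set of neighbours."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # assumption: nodes with fewer than two neighbours contribute 0
    e_i = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])  # edges among neighbours
    return e_i / (k * (k - 1) / 2)


def c_watts_strogatz(adj):
    """Equation [ 1 ]: mean of the local ratios over all N nodes."""
    return sum(local_watts_strogatz(adj, i) for i in adj) / len(adj)


def c_newman(adj):
    """Equation [ 2 ]: 3 * number of triangles / number of connected triples."""
    closed = 0   # closed triples; each triangle is seen from its 3 nodes, i.e. 3 * N_triangles
    triples = 0  # connected triples centred on each node
    for i in adj:
        for u, v in combinations(adj[i], 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples


# Triangle A-B-C plus a pendant node D attached to A
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(round(c_watts_strogatz(adj), 4))  # 0.5833
print(round(c_newman(adj), 4))          # 0.6
```

On this graph the two measures give 0.5833 and 0.6, so they are not interchangeable. The low-degree nodes pull the averaged local value down, which illustrates the sensitivity issue of globalized local coefficients.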
While the presented clustering coefficients are intended for assessing clustering in binary networks, the literature shows that they can also be applied to weighted networks. The study of scientific collaboration by Newman (2001) is a natural candidate for implementing weights, as the relation between two scientists can be seen as stronger the more papers they have co-authored and weaker the fewer they have. Newman (2001), however, assesses the formation of clusters on a binary graph and thus only acknowledges the existence of links between the scientists, disregarding relational attributes. Such a manipulation or symmetrization of the data is common in early social network research, since the mathematical foundations for assessing clustering were not yet able to include relational properties. In response to this problem, the formula is extended to include weightedness, which is presented in the following section.
WEIGHTED NETWORKS
When weighted edges are implemented in the clustering coefficient formula, the prerequisites from binary networks no longer hold. Instead, $a_{ij} = a_{ji} \neq w_{ij} = w_{ji}$. The weighted networks described in this section are also undirected; therefore, the order of the variables' subscripts is irrelevant. Furthermore, prior research shows that the formula developed along two separate lines. While early extensions and adjustments mostly concern the local clustering coefficient of Watts and Strogatz (1998), most recent developments build on the global measure introduced by Newman et al. (2001). The following section first reviews the development of the local clustering coefficient; subsequently, developments of the global clustering coefficient follow.
As already mentioned, Latora and Marchiori (2003) interpret the original clustering coefficient model of Watts and Strogatz (1998) as assessing the efficiency of a network rather than its clustering, i.e. how well information spreads throughout a network. In addition to providing this new interpretation, the two researchers expand the formula to incorporate weights. The expansion reads as follows.
[ 3 ]
$$C_{\text{Latora-Marchiori}} = \frac{\frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}}}{\frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{w_{ij}}}$$
The numerator measures the average efficiency between two nodes, where the efficiency of a pair is inversely proportional to the shortest path distance $d_{ij}$ between nodes $n_i$ and $n_j$. The variable $d_{ij}$ gives the smallest summed weight required to connect the two nodes. The denominator is introduced to normalize the efficiency between $C = 0$ and $C = 1$; it measures the ideal average efficiency, in which $w_{ij}$ equals $d_{ij}$ if a direct link between nodes $n_i$ and $n_j$ is formed. Since this formula is based on the Watts-Strogatz model (1998), it deliberately does not produce a clustering result and therefore refrains from using triadic closure (Latora, Marchiori 2003). Regardless, this paper deems the findings of Latora and Marchiori (2003) noteworthy, as they provide insight into the importance of including weightedness in the original clustering coefficient formula.
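The ratio in equation [ 3 ] can be sketched in Python as follows. Shortest summed-weight distances $d_{ij}$ are computed with the Floyd-Warshall algorithm. The input `ideal`, which supplies a direct-link weight $w_{ij}$ for every pair of nodes, is a hypothetical stand-in for the ideal network and must be provided by the user.

```python
import math
from itertools import permutations


def shortest_paths(nodes, edges):
    """Floyd-Warshall on an undirected graph; d[i, j] = smallest summed weight."""
    d = {(i, j): (0.0 if i == j else math.inf) for i in nodes for j in nodes}
    for (i, j), w in edges.items():
        d[i, j] = d[j, i] = min(d[i, j], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d


def c_latora_marchiori(nodes, edges, ideal):
    """Equation [ 3 ]: average efficiency divided by the ideal average efficiency."""
    n = len(nodes)
    d = shortest_paths(nodes, edges)
    pairs = list(permutations(nodes, 2))  # all ordered pairs with i != j
    actual = sum(1.0 / d[p] for p in pairs) / (n * (n - 1))  # 1 / inf == 0 for unreachable pairs
    best = sum(1.0 / ideal[frozenset(p)] for p in pairs) / (n * (n - 1))
    return actual / best


nodes = ["A", "B", "C"]
edges = {("A", "B"): 1.0, ("B", "C"): 1.0}  # path A - B - C
ideal = {frozenset(p): 1.0 for p in [("A", "B"), ("B", "C"), ("A", "C")]}
print(round(c_latora_marchiori(nodes, edges, ideal), 4))  # 0.8333
```

The missing link between A and C doubles their path distance and halves their efficiency, which lowers the ratio below 1 without any reference to triads.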
On a further note, Grindrod (2002)¹ adapts the formula as well, in order to assess the clustering tendency in even larger networks, where only an estimate of the exact number of links is available. In this ensemble approach, the number of connected triads in the denominator above is replaced by the probabilities $p_{ij}$ and $p_{ik}$ that node $n_i$ is connected to $n_j$ and to $n_k$. The numerator extends this by including the probability $p_{jk}$ that nodes $n_j$ and $n_k$ are connected. Each probability lies between $p = 0$ and $p = 1$ (Grindrod 2002). The formula reads as follows:
[ 4 ]
$$C_{\text{Grindrod}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} p_{ij}\, p_{ik}\, p_{jk}}{\sum_{j,k} p_{ij}\, p_{ik}} \right)$$
¹ Grindrod (2002) only proposes a local clustering coefficient in his article. For the sake of comparability, the global clustering coefficient based on this formula is obtained by globalizing the local clustering coefficients over all nodes $N$, as seen in Barrat et al. (2004). The same applies to the clustering coefficients of Onnela et al. (2005), Zhang and Horvath (2005), and Holme et al. (2007).
This development adopts the approach of globalizing local clustering coefficients, as seen in Watts and Strogatz (1998), but bases its factors on triadic closure, as seen in Newman et al. (2001). In his paper, Grindrod (2002) further develops the formula to assess the probability values.
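Equation [ 4 ] can be sketched in Python as follows. The sketch assumes a symmetric probability matrix with a zero diagonal, counts only triads of three distinct nodes, and uses $C_i = 0$ where the denominator vanishes. These are conventions chosen for the sketch, not taken from Grindrod (2002).

```python
def c_grindrod(p):
    """Equation [ 4 ]; p[i][j] is the probability that nodes i and j are linked (p[i][i] = 0)."""
    n = len(p)
    local = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:  # only triads of three distinct nodes
                    continue
                num += p[i][j] * p[i][k] * p[j][k]  # expected closed triads at i
                den += p[i][j] * p[i][k]            # expected connected triads at i
        local.append(num / den if den > 0 else 0.0)  # assumption: 0 if undefined
    return sum(local) / n  # globalized over all N nodes (see footnote 1)


uniform = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
binary = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]  # triangle + pendant node
print(c_grindrod(uniform))           # 0.5
print(round(c_grindrod(binary), 4))  # 0.5833
```

With a 0/1 matrix the expected counts become actual counts, and the result coincides with the Watts-Strogatz value of the same graph.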
In their analysis of an airline transportation network and of a social network of scientific collaboration, Barrat et al. (2004) introduce a local clustering coefficient based on triadic closure that implements weights as relational properties. The formula reads as follows.
[ 5 ]
$$C_{\text{Barrat et al.}} = \frac{1}{N} \sum_{i} \left( \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij}\, a_{ik}\, a_{jk} \right)$$
The factor $s_i (k_i - 1)$ normalizes the clustering result between $C = 0$ and $C = 1$ and is comparable to the denominator of the Watts-Strogatz formula (1998). The variable $s_i$ denotes the node strength, i.e. the summed weight of all edges connected to node $n_i$, and $k_i$ its degree. The second factor accounts for the average of the two edge weights connected to the focal actor $n_i$; however, it only counts if a triangle is formed, i.e. if $a_{ij} a_{ik} a_{jk} = 1$. This gives the local clustering coefficient, which is then averaged over all nodes $N$ to give the clustering coefficient for the entire network. This formula marks a further development, as it uses one of the aforementioned triadic value assessment measures, namely the arithmetic mean (Barrat et al. 2004).
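Equation [ 5 ] can be sketched in Python as follows. The sketch assumes a symmetric weight matrix in which 0 means "no edge" and the diagonal is zero. It also assigns $C_i = 0$ to nodes with fewer than two neighbors, which is a convention of the sketch.

```python
def c_barrat(w):
    """Equation [ 5 ] for a symmetric weight matrix w (0 = no edge, zero diagonal)."""
    n = len(w)
    local = []
    for i in range(n):
        k_i = sum(1 for j in range(n) if w[i][j] > 0)  # degree
        s_i = sum(w[i])                                # strength: summed edge weights
        if k_i < 2:
            local.append(0.0)  # assumption: undefined local values count as 0
            continue
        total = 0.0
        for j in range(n):
            for k in range(n):
                # a_ij * a_ik * a_jk = 1 only for closed triads
                if j != k and w[i][j] > 0 and w[i][k] > 0 and w[j][k] > 0:
                    total += (w[i][j] + w[i][k]) / 2  # arithmetic mean of i's two edges
        local.append(total / (s_i * (k_i - 1)))
    return sum(local) / n


triangle = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(c_barrat(triangle))  # 1.0
```

The third edge enters only through the adjacency $a_{jk}$. A fully connected triangle therefore yields $C = 1$ however unequal its weights are, which is the disregard criticized in the next paragraph.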
Onnela et al. (2005) criticize the clustering coefficient of Barrat et al. (2004) for disregarding the weight of the third connecting edge. They therefore expand the formula to incorporate the weight of this edge and apply it to an undirected financial network of traded stocks. Their proposal reads as follows.
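[ 6 ]
$$C_{\text{Onnela et al.}} = \frac{1}{N} \sum_{i} \left( \frac{1}{k_i (k_i - 1)} \sum_{j,k} \left( \frac{w_{ij}}{\max(w)} \cdot \frac{w_{jk}}{\max(w)} \cdot \frac{w_{ik}}{\max(w)} \right)^{1/3} \right)$$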
This coefficient reads similarly to the one proposed by Barrat et al. (2004). Here, however, the triadic value is assessed with the geometric mean instead of the arithmetic mean. In addition, the weights are scaled by the largest weight in the network, and the node strength is replaced by the node degree. Furthermore, the adjacency values are no longer needed, since including the weight of the connecting edge enables the formula to differentiate between closed and connected triples.
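Equation [ 6 ] can be sketched in Python as follows, under the same matrix assumptions as the Barrat sketch (symmetric weights, zero diagonal, $C_i = 0$ for nodes of degree below two). Because a missing edge makes the product zero, no separate adjacency test is needed.

```python
def c_onnela(w):
    """Equation [ 6 ]: geometric mean of max-scaled weights, normalised by k_i (k_i - 1)."""
    n = len(w)
    w_max = max(max(row) for row in w)
    local = []
    for i in range(n):
        k_i = sum(1 for j in range(n) if w[i][j] > 0)
        if k_i < 2:
            local.append(0.0)  # assumption: undefined local values count as 0
            continue
        total = 0.0
        for j in range(n):
            for k in range(n):
                if j != k:
                    prod = (w[i][j] / w_max) * (w[j][k] / w_max) * (w[i][k] / w_max)
                    total += prod ** (1 / 3)  # a missing edge makes the product, and the term, 0
        local.append(total / (k_i * (k_i - 1)))
    return sum(local) / n


triangle = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(round(c_onnela(triangle), 4))  # 0.6057
```

Unlike the Barrat sketch, the same triangle now scores below 1, namely $\sqrt[3]{6}/3$, because every edge is compared with the largest weight of 3.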
In their paper on biological networks, Zhang and Horvath (2005) provide a different approach to assessing clustering in weighted networks. They generalize the ratio of the actual to the maximum possible number of direct connections around a node $n_i$, which reads as follows.
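[ 7 ]
$$C_{\text{Zhang-Horvath}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} \frac{w_{ij}}{\max(w)} \cdot \frac{w_{jk}}{\max(w)} \cdot \frac{w_{ik}}{\max(w)}}{\left( \sum_{j} \frac{w_{ij}}{\max(w)} \right)^2 - \sum_{j} \left( \frac{w_{ij}}{\max(w)} \right)^2} \right) = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} \frac{w_{ij}}{\max(w)} \cdot \frac{w_{jk}}{\max(w)} \cdot \frac{w_{ik}}{\max(w)}}{\sum_{j \neq k} \frac{w_{ij}}{\max(w)} \cdot \frac{w_{ik}}{\max(w)}} \right)$$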
While Zhang and Horvath (2005) originally use an adjacency function to derive the weighted values, the paper of Saramäki et al. (2007) shows that the formula can use weights from the weight matrix as well. Similar to Onnela et al. (2005), the weights are scaled by the maximum weight in the graph. The denominators are based on these scaled weights, ensuring a result between $C = 0$ and $C = 1$ (Saramäki et al. 2007). The equation of Zhang and Horvath (2005) also no longer relies on the node degree $k_i$; instead, the weighted values are used in the denominator. Kalna and Higham (2007) provide further evidence that the two local coefficients proposed by Zhang and Horvath (2005) are equal to one another.
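The equality of the two forms in equation [ 7 ] rests on the identity $(\sum_j x_j)^2 - \sum_j x_j^2 = \sum_{j \neq k} x_j x_k$. The following Python sketch computes both local forms for one node so they can be compared. The weight matrix is a made-up example with a zero diagonal, and the function name is hypothetical.

```python
def zhang_horvath_local(w, i):
    """Both forms of equation [ 7 ] for node i; returns (first_form, second_form)."""
    n = len(w)
    w_max = max(max(row) for row in w)
    x = [w[i][j] / w_max for j in range(n)]  # scaled weights of i's edges (x[i] = 0)
    num = sum(x[j] * (w[j][k] / w_max) * x[k]
              for j in range(n) for k in range(n) if j != k)
    den_first = sum(x) ** 2 - sum(v * v for v in x)  # (sum)^2 minus sum of squares
    den_second = sum(x[j] * x[k]
                     for j in range(n) for k in range(n) if j != k)
    return num / den_first, num / den_second


w = [[0, 1, 2, 4], [1, 0, 3, 0], [2, 3, 0, 0], [4, 0, 0, 0]]
first, second = zhang_horvath_local(w, 0)
print(first, second)
```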
Holme et al. (2007) use their version of the local clustering coefficient to assess the clustering of students at a Korean university. Their formula aims to meet the following requirements: the coefficient yields a value between $C = 0$ and $C = 1$; the weight $w = 0$ represents the absence of a connection; the contribution of a triad to the clustering result should be proportional to the weights of its edges; and the formula should reproduce the results of Watts and Strogatz (1998) if the weights are replaced with adjacencies. The maximum value in their formula answers the third requirement. Specifically, this maximum value represents a matrix in which the maximum $w_{ij}$ occupies every position (Holme et al. 2007). The formula is given below.
[ 8 ]
$$C_{\text{Holme}} = \frac{1}{N} \sum_{i} \left( \frac{\sum_{j,k} w_{ij}\, w_{jk}\, w_{ik}}{\max_{i,j}(w_{ij}) \sum_{j,k} w_{ij}\, w_{ik}} \right)$$
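The fourth requirement of Holme et al. (2007) can be checked numerically. The following Python sketch of equation [ 8 ] assumes a symmetric weight matrix with a zero diagonal and takes $C_i = 0$ where the denominator vanishes, which is a convention of the sketch. It returns the Watts-Strogatz value for a binary matrix.

```python
def c_holme(w):
    """Equation [ 8 ] for a symmetric weight matrix w (0 = no edge, zero diagonal)."""
    n = len(w)
    w_max = max(max(row) for row in w)  # max_ij(w_ij)
    local = []
    for i in range(n):
        pairs = [(j, k) for j in range(n) for k in range(n) if j != k]
        num = sum(w[i][j] * w[j][k] * w[i][k] for j, k in pairs)
        den = w_max * sum(w[i][j] * w[i][k] for j, k in pairs)
        local.append(num / den if den > 0 else 0.0)  # assumption: 0 if undefined
    return sum(local) / n


binary = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]  # triangle + pendant node
triangle = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(round(c_holme(binary), 4))    # 0.5833, the Watts-Strogatz value
print(round(c_holme(triangle), 4))  # 0.6667
```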
In more recent developments of the clustering coefficient, the global measure, as opposed to globalized local clustering coefficients, has become the standard. Engø-Monsen and Canright (2011) propose a global clustering coefficient closely based on that of Newman et al. (2001); here, however, the geometric mean is used to determine the triadic value. Their proposal reads as follows.
[ 9 ]
$$C_{\text{Engø-Monsen Canright}} = \frac{\sum_{i,j,k} \sqrt[3]{w_{ij}\, w_{jk}\, w_{ik}}}{\sum_{i,j,k} \sqrt{w_{ij}\, w_{jk}}}$$
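Equation [ 9 ] can be sketched in Python by summing over all ordered triads of distinct nodes. The matrix representation and the function name are assumptions of the sketch. For a binary matrix, the numerator counts $6 N_{\Delta}$ and the denominator $2 N_3$, so the result coincides with equation [ 2 ].

```python
def c_engo_monsen_canright(w):
    """Equation [ 9 ]: cube-root triad values over square-root values of connected pairs."""
    n = len(w)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue
                num += (w[i][j] * w[j][k] * w[i][k]) ** (1 / 3)  # geometric mean, 3 edges
                den += (w[i][j] * w[j][k]) ** 0.5               # geometric mean, 2 edges
    return num / den


binary = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]  # triangle + pendant node
print(round(c_engo_monsen_canright(binary), 4))  # 0.6, the value of Newman et al.
```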
Phan et al. (2013) extend this formula and apply it to 1000 Bernoulli random networks. The extension acknowledges that the third connecting edge plays a relative role in the clustering assessment: the equation relates the weight of the third connecting edge to that of the other two. The denominator consists of the weighted value assessment of all open triads plus that of all closed triads. This normalizes the clustering coefficient between $C = 0$ and $C = 1$; the formula is given as follows (Phan et al. 2013).
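[ 10 ]
$$C_{\text{Phan et al.}} = \frac{\sum_{i,j,k} \sqrt{\sqrt{w_{ij}\, w_{jk}} \cdot \sqrt[3]{w_{ij}\, w_{jk}\, w_{ik}}}}{\sum_{(i,j,k) \in \text{closed}} \sqrt{\sqrt{w_{ij}\, w_{jk}} \cdot \sqrt[3]{w_{ij}\, w_{jk}\, w_{ik}}} + \sum_{(i,j,k) \in \text{open}} \sqrt{w_{ij}\, w_{jk}}}$$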
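The normalization described above can be sketched in Python. This is a sketch under an explicit assumption. It takes the value of a closed triad to be $\sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}$, the same triad value used in Table 5 of this paper, and the value of an open triad to be $\sqrt{w_{ij} w_{jk}}$. It is not taken from the code or exact notation of Phan et al. (2013).

```python
import math


def closed_value(w_ij, w_jk, w_ik):
    """Assumed closed-triad value: sqrt(sqrt(w_ij w_jk) * cbrt(w_ij w_jk w_ik))."""
    return math.sqrt(math.sqrt(w_ij * w_jk) * (w_ij * w_jk * w_ik) ** (1 / 3))


def c_phan(w):
    """Equation [ 10 ] as read in this sketch: closed / (closed + open)."""
    n = len(w)
    closed = open_ = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3 or w[i][j] * w[j][k] == 0:
                    continue  # only connected triads i - j - k enter the sums
                if w[i][k] > 0:
                    closed += closed_value(w[i][j], w[j][k], w[i][k])
                else:
                    open_ += math.sqrt(w[i][j] * w[j][k])
    return closed / (closed + open_)


triangle = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
binary = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
print(c_phan(triangle))          # 1.0
print(round(c_phan(binary), 4))  # 0.6
```

Because the closed-triad value appears in both numerator and denominator, any completely connected graph scores exactly $C = 1$, whatever its weights.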
While these developments of the formula offer valuable approaches to assessing the clustering tendency in weighted networks, they disregard the relational property of directedness. The following section presents measures proposed to overcome this limitation.
WEIGHTED AND DIRECTED NETWORKS
Clustering in directed networks is based on the prerequisite that the tie from node $n_i$ to node $n_j$ isn't necessarily equal to the tie from node $n_j$ to node $n_i$, and thus $a_{ij} \neq a_{ji} \neq w_{ij} \neq w_{ji}$.
Fagiolo (2007) remarks that, when examining triad formation, one can pay special attention to the role a focal actor plays, and notes four possible patterns. The focal node $n_1$ a) can be involved in a cycle ("cyc"), e.g. $a_{12} a_{23} a_{31}$; b) can act as a middleman ("mid"), e.g. $a_{21} a_{13} a_{23}$, where node $n_2$ can reach node $n_3$ either directly or through the focal node $n_1$; c) can be classified as "in", e.g. $a_{21} a_{31} a_{23}$, where node $n_1$ holds two incoming edges; and d) can be classified as "out", e.g. $a_{12} a_{13} a_{23}$, where node $n_1$ holds two outgoing edges. Fagiolo (2007) proposes a clustering coefficient for each of the four patterns and then combines them by defining the clustering coefficient as the total of all actual triadic relations of the four patterns divided by all possible ones. By replacing the adjacency values with the values from the weight matrix, the following globalized local clustering coefficient for weighted and directed networks is proposed.
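[ 11 ]
$$C_{\text{Fagiolo}} = \frac{1}{N} \sum_{i} \frac{\frac{1}{2} \sum_{j,k} \left( w_{ij}^{1/3} + w_{ji}^{1/3} \right) \left( w_{ik}^{1/3} + w_{ki}^{1/3} \right) \left( w_{jk}^{1/3} + w_{kj}^{1/3} \right)}{d_i^{tot} \left( d_i^{tot} - 1 \right) - 2 d_i^{\leftrightarrow}}$$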
The numerator combines the cube roots of the weights of the incoming and outgoing edges between a node and two others with those of the two directed edges between these two nodes. The denominator is similar to that of Watts and Strogatz (1998), with the node degree $k_i$ replaced by the total node degree $d_i^{tot}$, i.e. the sum of the in- and out-degrees of a node. Bilateral (reciprocated) edges $d_i^{\leftrightarrow}$, which would otherwise be counted twice in the first part of the denominator, are subtracted. The local measure can then be globalized over the entire network. For the empirical application, the weights are rescaled by a maximum weight value, similar to Holme et al. (2007), Onnela et al. (2005) and Zhang and Horvath (2005). Squartini et al. (2011) reuse the formula of Fagiolo (2007) and replace the respective weight with a differently rescaled weight to wash away trends in their specific example of the International Trade Network. Tabak et al. (2014) extend the four pattern-specific clustering coefficients introduced by Fagiolo (2007) by attributing weights before combining them in a single formula.
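Equation [ 11 ] can be sketched in Python as follows. The sketch makes three assumptions. First, the directed weight matrix has a zero diagonal. Second, the weights are rescaled by their maximum, as described above. Third, the denominator uses binary in-, out- and bilateral degree counts. Two binary checks follow: a fully reciprocated triangle gives $C = 1$, and a pure 3-cycle gives $C = 0.5$.

```python
def c_fagiolo(w):
    """Equation [ 11 ]; w[i][j] is the weight of the edge i -> j (zero diagonal)."""
    n = len(w)
    w_max = max(max(row) for row in w)
    r = [[(w[i][j] / w_max) ** (1 / 3) for j in range(n)] for i in range(n)]  # rescaled cube roots
    local = []
    for i in range(n):
        d_out = sum(1 for j in range(n) if w[i][j] > 0)
        d_in = sum(1 for j in range(n) if w[j][i] > 0)
        d_bil = sum(1 for j in range(n) if w[i][j] > 0 and w[j][i] > 0)  # reciprocated ties
        d_tot = d_in + d_out
        den = d_tot * (d_tot - 1) - 2 * d_bil
        num = 0.5 * sum((r[i][j] + r[j][i]) * (r[i][k] + r[k][i]) * (r[j][k] + r[k][j])
                        for j in range(n) for k in range(n))
        local.append(num / den if den > 0 else 0.0)  # assumption: 0 if undefined
    return sum(local) / n


complete = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # every tie reciprocated
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # 1 -> 2 -> 3 -> 1
print(c_fagiolo(complete))  # 1.0
print(c_fagiolo(cycle))     # 0.5
```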
Opsahl and Panzarasa (2009) also use the cycle, middleman, in and out patterns to classify the various triads, and apply their approach to a wide range of networks, such as acquaintance and relationship networks, neural networks, organizational networks, networks of political support, and networks of interaction through messages. Contrary to Fagiolo (2007), they base their formula closely on triadic transitivity as seen in Holland and Leinhardt (1971) and Wasserman (1994). This is relevant, since triads in the form of a cycle aren't considered transitive and therefore not clustered. Furthermore, triads that only contain in- or out-degrees from a focal actor's perspective are also not considered transitive. A straightforward formula is not given; instead, their paper provides a framework for assessing the formula. This paper summarizes this assessment as follows.
[ 12 ]
$$C_{\text{Opsahl Panzarasa}} = \frac{\sum_{i,j,k} w_{ij}\, w_{jk}\, a_{ik}}{\sum_{i,j,k} w_{ij}\, w_{jk}}$$
Because the formula sums over all $j$ as well as all $k$, which are treated as variable subscripts rather than as fixed nodes, a further distinction between the triadic patterns described by Opsahl and Panzarasa (2009) is unnecessary. While this formula is the first global clustering measure that can be applied to weighted and directed networks, it does not consider the weight of the third connecting edge in the numerator, as is done in Phan et al. (2013).
Despite this shortcoming, the formula represents a further advancement. Note that this summarized version of the clustering coefficient of Opsahl and Panzarasa (2009) differs from the summarized version offered in the paper of Phan et al. (2013). The discrepancy lies in the subscripts of the denominator. Researchers before and even after Opsahl and Panzarasa (2009) assess the denominator of the formula either with node degrees or with an abstracted version of $w_{ij} w_{ik}$. While their findings have largely gone unnoticed, Opsahl and Panzarasa (2009) specify the denominator precisely and accordingly provide a formula for the intended tendency "the friend of my friend is my friend". The summarized version provided by Phan et al. (2013) doesn't include this consideration. Table 3 shows the extent of this discrepancy.
Table 3: Opsahl, Panzarasa's (2009) Clustering Coefficient Denominator Differences
Written representation of the tendency: (1) Is my friend the friend of my friend? (2) Is the friend of my friend my friend?
Weights in denominator of the clustering coefficient: (1) $w_{ij} w_{ik}$ (2) $w_{ij} w_{jk}$
Triad census for weighted directed networks, tendency (1):
Numerator
$w_{AB}\, w_{AC}\, a_{BC}$: 2, 3, 1
$w_{AC}\, w_{AB}\, a_{CB}$: 3, 2, 0
$w_{BA}\, w_{BC}\, a_{AC}$: 0, 1, 1
$w_{BC}\, w_{BA}\, a_{CA}$: 1, 0, 0
$w_{CA}\, w_{CB}\, a_{AB}$: 0, 0, 1
$w_{CB}\, w_{CA}\, a_{BA}$: 0, 0, 0
Denominator
$w_{AB}\, w_{AC}$: 2, 3
$w_{AC}\, w_{AB}$: 3, 2
$w_{BA}\, w_{BC}$: 0, 1
$w_{BC}\, w_{BA}$: 1, 0
$w_{CA}\, w_{CB}$: 0, 0
$w_{CB}\, w_{CA}$: 0, 0
Triad census for weighted directed networks, tendency (2):
Numerator
$w_{AB}\, w_{BC}\, a_{AC}$: 2, 1, 1
$w_{AC}\, w_{CB}\, a_{AB}$: 3, 0, 1
$w_{BA}\, w_{AC}\, a_{BC}$: 0, 3, 1
$w_{BC}\, w_{CA}\, a_{BA}$: 1, 0, 0
$w_{CA}\, w_{AB}\, a_{CB}$: 0, 2, 0
$w_{CB}\, w_{BA}\, a_{CA}$: 0, 0, 0
Denominator
$w_{AB}\, w_{BC}$: 2, 1
$w_{AC}\, w_{CB}$: 3, 0
$w_{BA}\, w_{AC}$: 0, 3
$w_{BC}\, w_{CA}$: 1, 0
$w_{CA}\, w_{AB}$: 0, 2
$w_{CB}\, w_{BA}$: 0, 0
$C_{\text{Opsahl Panzarasa}}$ according to Phan et al. (2013):
$$C = \frac{2 \cdot 3 \cdot 1}{(2 \cdot 3) + (3 \cdot 2)} = 0.5$$
$C_{\text{Opsahl Panzarasa}}$ according to this paper:
$$C = \frac{2 \cdot 1 \cdot 1}{2 \cdot 1} = 1$$
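The two readings of the denominator in Table 3 can be reproduced with the following Python sketch. The directed weights are those implied by the census ($w_{AB} = 2$, $w_{AC} = 3$, $w_{BC} = 1$, all other directed weights 0). The function names are hypothetical labels for the two readings.

```python
from itertools import permutations

W = {("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 1}  # directed weights behind Table 3


def w(i, j):
    return W.get((i, j), 0)


def a(i, j):
    return 1 if w(i, j) > 0 else 0


def c_two_out_edges(nodes):
    """Reading (1): pairs w_ij w_ik, closed by a_jk."""
    num = sum(w(i, j) * w(i, k) * a(j, k) for i, j, k in permutations(nodes, 3))
    den = sum(w(i, j) * w(i, k) for i, j, k in permutations(nodes, 3))
    return num / den


def c_path(nodes):
    """Reading (2): paths w_ij w_jk (i -> j -> k), closed by a_ik."""
    num = sum(w(i, j) * w(j, k) * a(i, k) for i, j, k in permutations(nodes, 3))
    den = sum(w(i, j) * w(j, k) for i, j, k in permutations(nodes, 3))
    return num / den


print(c_two_out_edges("ABC"), c_path("ABC"))  # 0.5 1.0
```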
Equation [ 12 ] concludes the collection of clustering coefficient formulae proposed in prior literature. Since this paper also regards the formulation of structural balance as an equivalent way of assessing triadic closure, selected studies on signed networks that aim to deliver clustering results for such networks are presented in the following section.
SIGNED NETWORKS
Assessing the clustering tendency in signed networks has become increasingly important, since weighted data with both positive and negative connotations is now accessible. In social network analysis, examples of signed networks are networks of friends and enemies, or of partners and competitors. Beyond mere social networks, network mash-ups, such as networks of products and customers, are also interesting grounds for assessing clustering with positive and negative weights, given that an actor's likes or dislikes of certain products are available. The aforementioned Theory of Structural Balance is used to assess clustering in signed binary networks, in which the clusterability of a triad depends on the signs of its three edges (Heider 1946).
Kunegis et al. (2009) provide clustering insight into signed, directed networks in their analysis of the Slashdot Zoo, a technology news website where users can mark other users as friend or foe. The researchers assume that the product of the signs of two directed edges gives the sign of the third directed edge. This approach is identical to the Theory of Structural Balance for strongly balanced graphs and is applied directly to directed networks.
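Under the strong balance rule described above, a triad is balanced when the product of its three signs is positive. The sign of a missing edge is therefore predicted by the product of the two known signs. The following Python sketch applies this rule to undirected signed triangles only. The dictionary encoding and the helper names are assumptions for illustration and do not reproduce the directed analysis of Kunegis et al. (2009).

```python
from itertools import combinations


def is_balanced(s1, s2, s3):
    """Strong balance: a triad is balanced when the product of its three signs is positive."""
    return s1 * s2 * s3 > 0


def predicted_sign(s_ij, s_jk):
    """Sign of the closing edge implied by balance: the product of the two known signs."""
    return s_ij * s_jk


def balanced_share(signs, nodes):
    """Share of closed triangles that are balanced; signs maps frozenset({u, v}) -> +1 / -1."""
    triangles = [t for t in combinations(nodes, 3)
                 if all(frozenset(p) in signs for p in combinations(t, 2))]
    good = sum(1 for t in triangles
               if is_balanced(*(signs[frozenset(p)] for p in combinations(t, 2))))
    return good / len(triangles) if triangles else 0.0


signs = {frozenset("AB"): 1, frozenset("BC"): -1, frozenset("AC"): -1,
         frozenset("AD"): 1, frozenset("BD"): -1}
print(predicted_sign(-1, -1))        # 1: "the enemy of my enemy is my friend"
print(balanced_share(signs, "ABCD"))  # 0.5: triangle ABC is balanced, ABD is not
```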
Assessing the clustering coefficient in signed networks is common in prior literature. However, most studies don't state their exact methodology. Furthermore, many studies simplify their collected data in order to apply certain mathematical analysis measures. For example, Szell et al. (2010) exclude edge weights in their analysis of another friend-enemy network, even though the strength of the interaction between each pair of players is given. The paper of Szell and Thurner (2012) provides a weighted clustering coefficient, based on private messages, as an extra and separate result to compare the clustering of friend and enemy networks, showing that the interaction within positive networks is larger than within negative ones. The clustering of the friend and enemy networks themselves is, however, measured on unweighted networks.
In sum, formula improvements for including signed values in the clustering coefficient are scarce in prior research, although assessing clustering in such networks is very common. With this in mind, the research trends identified in this literature review are given in the following section.
RESEARCH TRENDS
Several trends can be identified in the development of the clustering coefficient. Firstly, the formulae move away from using the node degree in the denominator and instead focus on a triadic approach. Moreover, the sensitivity issues of globalized local coefficients are resolved, as the most recent formulae use global measures. In addition, the triadic value is no longer assessed with weights rescaled by a maximum value; instead, the geometric mean is used. Lastly, acknowledging weightedness and directedness is becoming increasingly important. Despite these advancements, the development of the clustering coefficient is still at an early stage of research, as many limitations remain. The following section summarizes these.
LIMITATIONS
This paper identifies three main limitations of the clustering coefficients in prior research, namely a lack of combinations of clustering coefficients, a lack of comparability in the form of micro-comparability, and a lack of comparability in the form of macro-comparability. These are presented in the following.
First of all, the different coefficients entail various advancements but also shortcomings, which are distributed unevenly among the findings. For example, in the stream of weighted clustering coefficients, the third connecting edge is considered in the formula's numerator and can even be regarded relative to the two other edges (Phan et al. 2013). The research stream of weighted, directed networks lacks this consideration. However, there the denominator is assessed correctly (Opsahl, Panzarasa 2009), which is not the case in the research stream of merely weighted networks. A combination of both advancements without their shortcomings has yet to be provided in the literature. Therefore, further development of the formula is necessary in order to benefit from the findings of prior literature and to eliminate their shortcomings.
The issue of micro-comparability illustrates that recent developments of the clustering coefficient don't always allow comparing the same graph with different relational properties. This is mostly because the developed clustering coefficients that implement relational properties yield $C = 1$ for a completely connected graph. This value of $C = 1$ indicates that a graph has reached the highest form of clustering possible. For binary networks, this result is justifiable for a completely connected graph, because only the existence of ties is taken into consideration. Clustering coefficients with relational attributes should, however, distinguish between clusters with equally weighted edges and clusters with differently weighted edges. This approach corresponds to the original tendency developed for binary networks, i.e. "the friend of my friend is my friend", where all edges are identical, each weighted with the value $w = 1$. Prior research, however, often equates this ratio with the ratio that "the best friend of my friend is my acquaintance". The latter differs from the originally proposed tendency, since its edges are not equally weighted; it should therefore by definition not reach the value $C = 1$, i.e. not the highest form of clustering possible. Because this paper acknowledges a substantial difference between the two ratios, there is room for improving the clustering coefficient formula, namely by overcoming this limitation of micro-comparability. Table 4 depicts the limitation in an extreme setting. Notice how network c) resembles network d) the most, yet their clustering results according to prior literature are polar opposites.
Table 4: Micro-Comparability of Clustered Triads
Network: a) | b) | c) | d) | e)
CC according to prior literature: C = 1 | C = 1 | C = 1 | C = 0 | C = 1
CC according to this paper: C = 1 | C = 0.4629 | C = 0.2203 | C = 0 | C = 1
On a further note, large-scale networks have gained popularity in network research over the past years, since large amounts of data can be acquired easily and used to conduct real-world analyses instead of depicting mere generalizations of or approaches to real-world problems. The acquired data not only reveals whether individuals are connected, but also the relational manner of the connection, through weights (both positive and negative) as well as directedness. This calls for macro-comparability of the clustering coefficient, i.e. the ability to compare different social networks regardless of their relational and structural properties. While the most recent formulae can compare networks regardless of their weightedness or directedness, prior research has yet to incorporate negative and positive weights in the formula. Clustering in terms of the formulation of balance (Davis 1967) is becoming increasingly relevant, since data on both the likes and dislikes of an individual is easily acquirable. A single formula that can be used in all types of networks would remove the need for multiple versions of the clustering coefficient and thus offer macro-comparability of cluster formation across all types of networks.
3. FORMULA EXTENSION
With the knowledge gained from prior research on the development of the clustering coefficient, the limitations identified in the previous section are now addressed in a new approach to analyzing the formation of clusters. This paper differentiates between equally weighted clusters and clusters with relational variance, offering micro-comparability. Furthermore, the assessment targets the formation of clusters in networks regardless of their relational and structural properties, thereby offering macro-comparability.
3.1. BASIS
The fundamental basis of the proposed formula extension is the global clustering coefficient developed by Newman et al. (2001), in which the numerator embodies three times the number of all closed triangles and the denominator all connected (open or closed) triples. By focusing on each triadic relation rather than on triangles, the values of the corresponding weights can determine whether a triangle is formed. Each triadic relation is assessed individually, which removes the factor three from the numerator. The triadic value is assessed by taking the geometric mean of the weights of the triadic relation. This paper then incorporates the approach of Phan et al. (2013), in which the third connecting edge is explicitly treated as relative. Unlike Phan et al. (2013), this paper deliberately disregards the third connecting edge in the denominator, where it was only introduced to normalize the equation and ensure $C = 1$ for completely connected graphs. Looking at the original clustering coefficient, one notices that its numerator, like the proposed formula, acknowledges the occurrence of closed triples, whereas its denominator deliberately neglects them. This yields an accurate ratio of closed triads to connected triads. In the proposed formula, the variable $\hat{w}$ represents the value of a closed triad with the third connecting edge treated as relative, and the variable $\hat{v}$ the value of any triad, so that the clustering coefficient is the ratio of the total sums of each. The denominator is based on that of Opsahl and Panzarasa (2009), which in turn is strongly based on the Transitivity Theory of Wasserman (1994). The basis for the formula extension therefore reads as follows.
[ 13 ]
$$C = \frac{\sum_{i,j,k} \hat{w}}{\sum_{i,j,k} \hat{v}}$$
3.2. MICRO-COMPARABILITY
FORMULA: WEIGHTED OR DIRECTED NETWORKS
The aforementioned limitation of micro-comparability arises in weighted networks. It concerns the fact that the following statements are, to an extent, regarded as equally clustered: "the friend of my friend is my friend" versus "the best friend of my friend is my acquaintance". Thanks to the preliminary work of the presented researchers, the adjustment of the formula is merely a tweak. For weighted networks, $\hat{w}$ is expanded to express the triadic relation in relation to the relative weight of the third connecting edge. Accordingly, the variable $\hat{v}$ is the value of any triad regardless of its connecting edge. The subscripts of the weights are aligned with the Transitivity Theory (Wasserman 1994) and only allow for transitive triads. Hence, this formula can be applied to directed networks as well.
[ 14 ]
$$C = \frac{\sum_{i,j,k} \sqrt{\sqrt{w_{ij}\, w_{jk}} \cdot \sqrt[3]{w_{ij}\, w_{jk}\, w_{ik}}}}{\sum_{i,j,k} \sqrt{w_{ij}\, w_{jk}}}$$
EXAMPLE
To show the impact of this development in weighted networks, the following example is used (refer to Figure 1). While the clustering coefficients found in previous literature, for example in Phan et al. (2013), treat the two networks below as equally clustered, this paper proposes the contrary. According to this paper, the clustering coefficient of network a) equals $C = 0.98$, whereas that of network b) is $C = 1$. Both are high; however, the coefficient now allows for comparisons, as seen in Table 5.
As seen in Table 4, an equal result of $C = 1$ can be misleading in extreme settings. Take network b) in that table and imagine that nodes $B$ and $C$ work closely together and that their edge weight, measured through e-mail traffic, is very large, e.g. $w_{BC} = 10000$. If node $A$ were to send out an e-mail broadcast including nodes $B$ and $C$ as recipients, previous literature would assign this digraph a clustering coefficient of $C = 1$, even though $A$ might barely know the other two. While the clustering coefficient for binary networks addresses the mere existence of three edges per triple, for weighted networks the mere existence of edges should not suffice. Instead, as in binary networks, these edges should be equal to form an ideal cluster. The newly proposed formula keeps the originally intended ratio "the friend of my friend is my friend", but additionally differentiates its results from the statement "the best friend of my friend is my acquaintance". The limitation of micro-comparability is therefore resolved, which fulfills the first hypothesis by differentiating between equally weighted clusters and clusters with relational variance.
Figure 1: Weighted, Undirected Networks
Table 5: Triad Census of Graphs in Figure 1
Columns: $i$ $j$ $k$ | $w_{ij}$ $w_{jk}$ $w_{ik}$ | $\hat{v} = \sqrt{w_{ij}\, w_{jk}}$ | $\sqrt[3]{w_{ij}\, w_{jk}\, w_{ik}}$ | $\hat{w} = \sqrt{\hat{v} \cdot \sqrt[3]{w_{ij}\, w_{jk}\, w_{ik}}}$
Network a)
A B C | 1 3 2 | 1.732050808 | 1.817120593 | 1.774075869
A C B | 2 3 1 | 2.449489743 | 1.817120593 | 2.109743646
B A C | 1 2 3 | 1.414213562 | 1.817120593 | 1.603058510
B C A | 3 2 1 | 2.449489743 | 1.817120593 | 2.109743646
C A B | 2 1 3 | 1.414213562 | 1.817120593 | 1.603058510
C B A | 3 1 2 | 1.732050808 | 1.817120593 | 1.774075869
Network a): $C = \frac{\sum_{i,j,k} \hat{w}}{\sum_{i,j,k} \hat{v}} = 0.9805431$
Network b)
A B C | 2 2 2 | 2 | 2 | 2
A C B | 2 2 2 | 2 | 2 | 2
B A C | 2 2 2 | 2 | 2 | 2
B C A | 2 2 2 | 2 | 2 | 2
C A B | 2 2 2 | 2 | 2 | 2
C B A | 2 2 2 | 2 | 2 | 2
Network b): $C = \frac{\sum_{i,j,k} \hat{w}}{\sum_{i,j,k} \hat{v}} = 1.0000000$
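To make the census in Table 5 reproducible, the following Python sketch evaluates equation [ 14 ] over all ordered triads $(i, j, k)$ of distinct nodes. The dictionary representation of the weights and the helper names are assumptions of this sketch; undirected edges are simply stored in both directions.

```python
import math
from itertools import permutations


def c_proposed(w, nodes):
    """Equation [ 14 ]; w maps (i, j) -> weight of the edge i -> j (missing = 0)."""
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        w_ij, w_jk, w_ik = w.get((i, j), 0), w.get((j, k), 0), w.get((i, k), 0)
        v_hat = math.sqrt(w_ij * w_jk)           # value of the path i -> j -> k
        triad = (w_ij * w_jk * w_ik) ** (1 / 3)  # geometric mean of all three edges
        num += math.sqrt(v_hat * triad)          # w-hat: third edge treated as relative
        den += v_hat
    return num / den


def undirected(edges):
    """Store every undirected edge in both directions."""
    return {**edges, **{(j, i): x for (i, j), x in edges.items()}}


net_a = undirected({("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 3})
net_b = undirected({("A", "B"): 2, ("A", "C"): 2, ("B", "C"): 2})
print(round(c_proposed(net_a, "ABC"), 7))  # 0.9805431
print(round(c_proposed(net_b, "ABC"), 7))  # 1.0
```

The denominator ignores the third edge, so only the equally weighted triangle reaches $C = 1$. In the normalization of equation [ 10 ], by contrast, the same closed-triad value appears in both sums, which would score both networks as 1.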
3.3. MACRO-COMPARABILITY
The issue of macro-comparability suggests that the developed formula should be applicable to all types of networks, regardless of their relational properties. The formula proposed above can only be used in weighted or directed networks; once weights and directions are combined in this equation, problems arise. Consider the two graphs presented in Figure 2.
While both of these graphs are clustered, with $BAC$ as the clustered transitive triad, the results seem erroneous (see Table 6). The upper table refers to graph a) and yields an expected clustering coefficient of $C = 0.97$. Graph b), however, yields a clustering coefficient of $C = 1.03$. This is because the geometric mean of all three edges ($\sqrt[3]{2 \cdot 3 \cdot 3} \approx 2.621$) is higher than the geometric mean of the two edges of the examined open triad ($\sqrt{2 \cdot 3} \approx 2.449$). This paper, however, insists on a valid concept in which $C$ does not exceed 1. Therefore, the co-ordinate systems of the C-curves are taken into account.
Figure 2: Weighted, Directed Networks
Table 6: Triad Census of Graphs in Figure 2
Columns: $\hat{v} = \sqrt{w_{ij} w_{jk}}$, $b = \sqrt[3]{w_{ij} w_{jk} w_{ik}}$, $\hat{w} = \sqrt{\hat{v} \cdot b}$; $C = \sum_{i,j,k} \hat{w} \,/\, \sum_{i,j,k} \hat{v}$.

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     3     3      0             0             0
A  C  B   3     0     0      0             0             0
B  A  C   4     3     3      3.464101615   3.301927249   3.382042507
B  C  A   3     0     4      0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     4     0      0             0             0
Network a): C = 0.9763116

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     3     3      0             0             0
A  C  B   3     0     0      0             0             0
B  A  C   2     3     3      2.449489743   2.620741394   2.533669111
B  C  A   3     0     2      0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     2     0      0             0             0
Network b): C = 1.0343661
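Feeding the arcs read off Table 6 into the same kind of sketch reproduces the problem numerically: network b) exceeds 1. As before, this is an illustrative sketch with a hypothetical helper name, not the author's implementation; an arc from $i$ to $j$ is stored as `w[(i, j)]`, and $\hat{w} = \sqrt{\hat{v} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}$ is assumed as in Table 6.

```python
from itertools import permutations
from math import sqrt


def ratio_tables_5_6(w):
    """w_hat = sqrt(v_hat * cube root of w_ij*w_jk*w_ik), C = sum(w_hat) / sum(v_hat)."""
    nodes = sorted({n for pair in w for n in pair})
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        wij, wjk, wik = w.get((i, j), 0), w.get((j, k), 0), w.get((i, k), 0)
        v_hat = sqrt(wij * wjk)
        num += sqrt(v_hat * (wij * wjk * wik) ** (1 / 3))
        den += v_hat
    return num / den


# Arcs of Figure 2 as listed in Table 6 (only B->A, A->C and B->C are non-zero).
net_a = {("B", "A"): 4, ("A", "C"): 3, ("B", "C"): 3}
net_b = {("B", "A"): 2, ("A", "C"): 3, ("B", "C"): 3}
print(round(ratio_tables_5_6(net_a), 4))  # 0.9763
print(round(ratio_tables_5_6(net_b), 4))  # 1.0344, i.e. C > 1
```

Only the triple $(B, A, C)$ has a non-zero two-path, so each network's $C$ is simply $\sqrt{b / \hat{v}}$ for that triple, which exceeds 1 as soon as $b > \hat{v}$.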
Figure 3 shows the co-ordinate systems of the following C-curves. The top graph depicts a generalized clustering coefficient curve based on prior research, with results varying between $C = 0$ and $C = 1$. The second illustrates the clustering coefficient curve derived from the formula presented in the previous section. $C = 1$ is given for a completely connected graph, in terms of transitivity, with equally weighted edges. For a local triad, if the geometric mean of all three weighted edges is larger than that of the two originally evaluated edges, the clustering coefficient is $C > 1$, and $C < 1$ in the opposite case. Such a result is not desired. Instead, a curve as seen in the last co-ordinate system is sought, where $C = 1$ is given when the geometric mean of the three weighted edges equals that of the two originally evaluated edges. While the geometric mean of all three edges is smaller, the curve rises up to the point where the condition for $C = 1$ is met, and it falls thereafter.
Figure 3: Co-Ordinate Systems of C-Curves
FORMULA: WEIGHTED AND/OR DIRECTED NETWORKS
To address the problem described above, the following clustering coefficient formula is derived.

[ 15 ]
$$C = \frac{\sum_{i,j,k} \hat{w}}{\sum_{i,j,k} \sqrt{w_{ij} w_{jk}}}$$

The variable $\hat{w}$ is read as follows.

For $\sqrt{w_{ij} w_{jk}} \ge \sqrt[3]{w_{ij} w_{jk} w_{ik}} \ge 0$:
$$\hat{w} = \sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}$$

For $0 < \sqrt{w_{ij} w_{jk}} < \sqrt[3]{w_{ij} w_{jk} w_{ik}}$:
$$\hat{w} = \left(\sqrt{\sqrt{w_{ij} w_{jk}} \cdot \sqrt[3]{w_{ij} w_{jk} w_{ik}}}\right)^{-1} \cdot \left(\sqrt{w_{ij} w_{jk}}\right)^{2}$$
This paper acknowledges an ideal cluster as a triad with equally weighted edges, corresponding to the original clustering coefficient for binary networks. The two formulae in the case differentiation above depict the possible occurrences of the closed triadic relation at hand, with $\sqrt[3]{w_{ij} w_{jk} w_{ik}}$ being either smaller than or equal to, or larger than, the value $\sqrt{w_{ij} w_{jk}}$ of the triadic relation without its connecting edge $w_{ik}$. For the first case of the case differentiation, the same concept as in section 3.2.1 is used. For the second case, the reciprocal of the clustering ratio (denoted $C_{cf}$ below) is taken into account. This case covers the occurrence that the weighted value assessment of all three edges is larger than that of the evaluated triad without its connecting edge, which in turn is larger than zero. An appropriate measurement is then derived for the variable $\hat{w}$ in this case. The mathematical assessment of $\hat{w}$ for the second case is given as follows. The variables are simplified with $a = \sqrt{w_{ij} w_{jk}}$ as the assessed value of connected triads and $b = \sqrt[3]{w_{ij} w_{jk} w_{ik}}$ as the assessed value of closed, transitive triads.
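For $\sqrt{w_{ij} w_{jk}} = a \ge \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b \ge 0$:
$$C = \frac{\hat{w}}{\hat{v}} = \frac{\sqrt{a \cdot b}}{a} \quad \text{with } \hat{w} = \sqrt{a \cdot b} \text{ and } \hat{v} = a$$

For $0 < \sqrt{w_{ij} w_{jk}} = a < \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b$:
$$C_{cf} = \left(\frac{\hat{w}}{\hat{v}}\right)^{-1} = \frac{\hat{v}}{\hat{w}} \quad \text{with } \hat{w} = \sqrt{a \cdot b} \text{ and } \hat{v} = a$$
$$C_{cf} = \frac{a}{\sqrt{a \cdot b}} \cdot \frac{a}{a} = \frac{a^2 \cdot \left(\sqrt{a \cdot b}\right)^{-1}}{a}$$
$$C_{cf} = \frac{\hat{w}}{\hat{v}} \quad \text{with } \hat{w} = a^2 \cdot \left(\sqrt{a \cdot b}\right)^{-1} \text{ and } \hat{v} = a$$

Clustering Coefficient C:
$$C = \frac{\hat{w}}{\hat{v}} \quad \text{with } \hat{v} = a \text{ and } \hat{w} = \begin{cases} \sqrt{a \cdot b} & \text{for } \sqrt{w_{ij} w_{jk}} = a \ge \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b \ge 0 \\ a^2 \cdot \left(\sqrt{a \cdot b}\right)^{-1} & \text{for } 0 < \sqrt{w_{ij} w_{jk}} = a < \sqrt[3]{w_{ij} w_{jk} w_{ik}} = b \end{cases}$$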
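Looking at a single triad in isolation shows why both cases stay within $[0, 1]$. This is a short derivation from the definitions of $a$ and $b$ above, assuming $a > 0$; note that the global $C$ is a ratio of sums, not an average of these per-triad values:

$$\frac{\hat{w}}{\hat{v}} = \frac{\sqrt{a b}}{a} = \sqrt{\frac{b}{a}} \le 1 \quad \text{for } a \ge b \ge 0, \qquad \frac{\hat{w}}{\hat{v}} = \frac{a^2 \left(\sqrt{a b}\right)^{-1}}{a} = \frac{a}{\sqrt{a b}} = \sqrt{\frac{a}{b}} < 1 \quad \text{for } 0 < a < b$$

Both cases therefore reduce to $\sqrt{\min(a, b) / \max(a, b)}$, which equals 1 exactly when $a = b$ and decreases on either side, as in the last co-ordinate system of Figure 3. For network b) of Figure 2, $a = \sqrt{6}$ and $b = \sqrt[3]{18}$, so its single closed triad gives $\sqrt{a / b} \approx 0.9668$, in line with Table 7.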
With the proposed formula the desired C-co-ordinate system is achieved. Furthermore, the formula offers macro-
comparability in weighted and/or directed networks and can be used in binary networks, as well.
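A minimal Python sketch of Eq. [ 15 ] follows. It assumes non-negative weights stored per ordered pair `(i, j)`; the function names are hypothetical. Applied to the arcs of Figure 2, which reappear in Figure 4, it keeps network a) at its earlier value and brings network b) below 1.

```python
from itertools import permutations
from math import sqrt


def w_hat_eq15(wij, wjk, wik):
    """Numerator term of Eq. [15] for one ordered triple with non-negative weights."""
    a = sqrt(wij * wjk)                 # geometric mean of the two-path i-j-k
    b = (wij * wjk * wik) ** (1 / 3)    # geometric mean of all three edges
    if a >= b:                          # first case: a >= b >= 0
        return sqrt(a * b)
    # Second case: 0 < a < b (a == 0 would force b == 0, i.e. the first case).
    return a ** 2 / sqrt(a * b)


def clustering_eq15(w):
    """C = sum of w_hat over sum of sqrt(w_ij * w_jk), over all ordered triples."""
    nodes = sorted({n for pair in w for n in pair})
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        wij, wjk, wik = w.get((i, j), 0), w.get((j, k), 0), w.get((i, k), 0)
        num += w_hat_eq15(wij, wjk, wik)
        den += sqrt(wij * wjk)
    return num / den


net_a = {("B", "A"): 4, ("A", "C"): 3, ("B", "C"): 3}
net_b = {("B", "A"): 2, ("A", "C"): 3, ("B", "C"): 3}
print(round(clustering_eq15(net_a), 4))  # 0.9763
print(round(clustering_eq15(net_b), 4))  # 0.9668
```

The branch mirrors the case differentiation directly; both branches agree at $a = b$, so small floating-point differences at the boundary do not change the result.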
EXAMPLE
For comparison, the example from Figure 2 is reused here (see Figure 4). The given formula now allows for micro-comparability in a macro-comparable setting of weighted and/or directed networks, as shown in Table 7. Contrary to Table 6, the clustering coefficient results now range between $C = 0$ and $C = 1$. The value $C = 1$ is given for a fully connected graph with equally weighted edges, comparable to the original clustering ratio in binary networks from Newman et al. (2001).
Figure 4: Weighted, Directed Networks
Table 7: Triad Census of Graphs in Figure 4
Columns: $\hat{v} = \sqrt{w_{ij} w_{jk}}$, $b = \sqrt[3]{w_{ij} w_{jk} w_{ik}}$, $\hat{w}$ according to Eq. [ 15 ]; $C = \sum_{i,j,k} \hat{w} \,/\, \sum_{i,j,k} \hat{v}$.

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     3     3      0             0             0
A  C  B   3     0     0      0             0             0
B  A  C   4     3     3      3.464101615   3.301927249   3.382042507
B  C  A   3     0     4      0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     4     0      0             0             0
Network a): C = 0.9763116

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     3     3      0             0             0
A  C  B   3     0     0      0             0             0
B  A  C   2     3     3      2.449489743   2.620741394   2.368107175
B  C  A   3     0     2      0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     2     0      0             0             0
Network b): C = 0.9667757
FORMULA: STRONGLY BALANCED WEIGHTED AND/OR DIRECTED NETWORKS
As already mentioned, prior literature's inclusion of negative and positive weights in the clustering coefficient is scarce at best, even though a mathematical solution for excluding unbalanced triads from the assessment of cluster formation is easily formulated. Under the strong formulation of balance, triads with three positive edges, as well as triads with two negative edges and one positive edge, count as clusters. All other triads are not acknowledged as clusters. With small adjustments, the current formula can account for these limitations, as seen below.
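[ 16 ]
$$C = \frac{\sum_{i,j,k} \hat{w}}{\sum_{i,j,k} \sqrt[4]{(w_{ij} w_{jk})^2}}$$

The denominator $\hat{v}$ is constructed so that the arithmetic sign is not a decisive factor. The variable $\hat{w}$ reads as follows:

For $\sqrt[4]{(w_{ij} w_{jk})^2} \ge \sqrt[6]{(w_{ij} w_{jk} w_{ik})^2} \ge 0$:
$$\hat{w} = \sqrt{\sqrt[4]{(w_{ij} w_{jk})^2} \cdot \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^2}}{2}}}$$

For $0 < \sqrt[4]{(w_{ij} w_{jk})^2} < \sqrt[6]{(w_{ij} w_{jk} w_{ik})^2}$:
$$\hat{w} = \left(\sqrt{\sqrt[4]{(w_{ij} w_{jk})^2} \cdot \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^2}}{2}}}\right)^{-1} \cdot \left(\sqrt[4]{(w_{ij} w_{jk})^2}\right)^{2}$$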
In order to exclude triads that are unbalanced in terms of the strong formulation of balance from being counted as clusters, the second factor of $\hat{w}$ was modified as shown above. Since $\sqrt{(w_{ij} w_{jk} w_{ik})^2} = |w_{ij} w_{jk} w_{ik}|$, the numerator results in $2 \cdot w_{ij} w_{jk} w_{ik}$ if the triad is made up of three positive edges, or of two negative edges and one positive edge. This is then divided by 2, resulting in $w_{ij} w_{jk} w_{ik}$, the same term as in the equation before the aspect of balance was implemented. If, however, the triad has three negative edges, or one negative and two positive edges, the numerator is 0, since these triads are not considered to be clusters. In order to account for negative weights in the rest of the formula, the products are squared and the root indices are doubled (the square root becomes a fourth root, the cube root a sixth root), which has no impact on the results for positive weights.
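Eq. [ 16 ] can be sketched in Python as below. Three points are assumptions of this sketch rather than statements of the formula: weights are stored per ordered pair `(i, j)`; roots of squares such as $\sqrt[4]{x^2}$ are computed via `abs()`, which is mathematically identical but avoids floating-point residue; and when an unbalanced triad falls into the second case, where the formula as written would divide by zero, the sketch returns 0, matching the 0 reported for network b) in Table 8. Function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def w_hat_eq16(wij, wjk, wik):
    """Numerator term of Eq. [16] (strong formulation of balance)."""
    p2, p3 = wij * wjk, wij * wjk * wik
    a = sqrt(abs(p2))                     # fourth root of (w_ij w_jk)^2
    b_abs = abs(p3) ** (1 / 3)            # sixth root of (w_ij w_jk w_ik)^2, for the case test
    b = ((p3 + abs(p3)) / 2) ** (1 / 3)   # cube root of p3 if p3 > 0 (balanced), else 0
    if a >= b_abs:
        return sqrt(a * b)
    if b == 0:
        # Assumption: unbalanced triad in the second case (formula undefined) -> 0.
        return 0.0
    return a ** 2 / sqrt(a * b)


def clustering_eq16(w):
    """C = sum of w_hat over sum of the fourth root of (w_ij w_jk)^2."""
    nodes = sorted({n for pair in w for n in pair})
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        wij, wjk, wik = w.get((i, j), 0), w.get((j, k), 0), w.get((i, k), 0)
        num += w_hat_eq16(wij, wjk, wik)
        den += sqrt(abs(wij * wjk))
    return num / den


net_a = {("B", "A"): 4, ("A", "C"): -3, ("B", "C"): -3}   # Table 8, network a)
net_b = {("B", "A"): -2, ("A", "C"): 3, ("B", "C"): 3}    # Table 8, network b)
print(round(clustering_eq16(net_a), 4))  # 0.9763
print(clustering_eq16(net_b))            # 0.0
```

Under this strong formulation, a triad with three negative edges also yields 0, which is exactly what the weak formulation below changes.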
EXAMPLE
The following example includes triads assessed under the strong formulation of balance. Network a) is balanced, since the product of its edges is positive; the contrary holds for network b). As seen in Table 8, the clustering coefficient of network a) equals that of network a) in Figure 4. Despite its transitive triadic closure, network b) yields a clustering coefficient of $C = 0$, because the network is not strongly balanced.
Figure 5: Signed, Directed, Weighted Networks
Table 8: Triad Census of Graphs in Figure 5
Columns: $\hat{v} = \sqrt[4]{(w_{ij} w_{jk})^2}$, $b = \sqrt[3]{\left(w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^2}\right) / 2}$, $\hat{w}$ according to Eq. [ 16 ]; $C = \sum_{i,j,k} \hat{w} \,/\, \sum_{i,j,k} \hat{v}$.

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     -3    -3     0             0             0
A  C  B   -3    0     0      0             0             0
B  A  C   4     -3    -3     3.464101615   3.301927249   3.382042507
B  C  A   -3    0     4      0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     4     0      0             0             0
Network a): C = 0.9763116

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     3     3      0             0             0
A  C  B   3     0     0      0             0             0
B  A  C   -2    3     3      2.449489743   0             0
B  C  A   3     0     -2     0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     -2    0      0             0             0
Network b): C = 0.0000000
FORMULA: WEAKLY BALANCED WEIGHTED AND/OR DIRECTED NETWORKS
Similar to the clustering coefficient for strongly balanced networks, the adjustment is a mere tweak that allows three negative edges to form a cluster. The proposed solution is as follows.
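[ 17 ]
$$C = \frac{\sum_{i,j,k} \hat{w}}{\sum_{i,j,k} \sqrt[4]{(w_{ij} w_{jk})^2}}$$

The denominator remains the same. The variable $\hat{w}$ is read as follows.

For $\sqrt[4]{(w_{ij} w_{jk})^2} \ge \sqrt[6]{(w_{ij} w_{jk} w_{ik})^2} \ge 0$:
$$\hat{w} = \sqrt{\sqrt[4]{(w_{ij} w_{jk})^2} \cdot \left( \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^2}}{2}} - \sqrt[3]{\frac{\left(w_{ij} - \sqrt{w_{ij}^2}\right) \cdot \left(w_{jk} - \sqrt{w_{jk}^2}\right) \cdot \left(w_{ik} - \sqrt{w_{ik}^2}\right)}{8}} \right)}$$

For $0 < \sqrt[4]{(w_{ij} w_{jk})^2} < \sqrt[6]{(w_{ij} w_{jk} w_{ik})^2}$:
$$\hat{w} = \left( \sqrt{\sqrt[4]{(w_{ij} w_{jk})^2} \cdot \left( \sqrt[3]{\frac{w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^2}}{2}} - \sqrt[3]{\frac{\left(w_{ij} - \sqrt{w_{ij}^2}\right) \cdot \left(w_{jk} - \sqrt{w_{jk}^2}\right) \cdot \left(w_{ik} - \sqrt{w_{ik}^2}\right)}{8}} \right)} \right)^{-1} \cdot \left(\sqrt[4]{(w_{ij} w_{jk})^2}\right)^{2}$$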
An expansion is appended to the right of the previously discussed adaptation (the subtracted cube root). It tests whether all edges are negative. If they are, each bracketed factor evaluates to $w - \sqrt{w^2} = 2w = -2|w|$. The numerator of this expansion then reads $(-2|w_{ij}|) \cdot (-2|w_{jk}|) \cdot (-2|w_{ik}|) = -8\,|w_{ij} w_{jk} w_{ik}|$. After division by 8 and taking the cube root, one is left with the negative geometric mean $-\sqrt[3]{|w_{ij} w_{jk} w_{ik}|}$. This negative value is then subtracted from the minuend, which is 0 because the strong-balance term excludes three negative edges. Subtracting the negative value leaves the positive geometric mean of $w_{ij} w_{jk} w_{ik}$. If any of the edges is positive, its bracketed factor is 0 and so is the whole expansion. The minuend then simply checks whether only one negative edge exists in the triad (which also results in 0) or whether one or three positive edges appear. If so, the same steps apply as formulated in the clustering coefficient for strongly balanced networks.
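The weak-balance expansion can be sketched in Python as follows. As in the strong-balance case, this is an illustration under stated assumptions, not the author's code: weights are stored per ordered pair, $\sqrt{w^2}$ is written as `abs(w)`, a second-case triad whose balance term is 0 is treated as contributing 0, and all names are hypothetical. A real-valued cube root is needed because Python's `x ** (1/3)` returns a complex number for negative `x`.

```python
from itertools import permutations
from math import sqrt


def real_cbrt(x):
    """Real cube root; x ** (1/3) would return a complex number for negative x."""
    return x ** (1 / 3) if x >= 0 else -((-x) ** (1 / 3))


def neg_part(w):
    """w - sqrt(w^2): equals 2w for a negative edge and 0 otherwise."""
    return w - abs(w)


def w_hat_eq17(wij, wjk, wik):
    """Numerator term of Eq. [17] (weak formulation of balance)."""
    p2, p3 = wij * wjk, wij * wjk * wik
    a = sqrt(abs(p2))                                   # fourth root of (w_ij w_jk)^2
    b_abs = abs(p3) ** (1 / 3)                          # sixth root of (w_ij w_jk w_ik)^2
    strong = ((p3 + abs(p3)) / 2) ** (1 / 3)            # strong-balance term from Eq. [16]
    all_neg = real_cbrt(neg_part(wij) * neg_part(wjk) * neg_part(wik) / 8)
    b = strong - all_neg                                # all_neg < 0 only if all three edges are negative
    if a >= b_abs:
        return sqrt(a * b)
    if b == 0:
        return 0.0  # assumption, as in the Eq. [16] sketch
    return a ** 2 / sqrt(a * b)


def clustering_eq17(w):
    nodes = sorted({n for pair in w for n in pair})
    num = den = 0.0
    for i, j, k in permutations(nodes, 3):
        wij, wjk, wik = w.get((i, j), 0), w.get((j, k), 0), w.get((i, k), 0)
        num += w_hat_eq17(wij, wjk, wik)
        den += sqrt(abs(wij * wjk))
    return num / den


net_a = {("B", "A"): -4, ("A", "C"): -3, ("B", "C"): -3}   # Table 9, network a)
net_b = {("B", "A"): -2, ("A", "C"): 3, ("B", "C"): 3}     # Table 9, network b)
print(round(clustering_eq17(net_a), 4))  # 0.9763
print(clustering_eq17(net_b))            # 0.0
```

Flipping all three edges of network a) to positive gives the same value, which is the property the weak formulation is meant to have.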
EXAMPLE
In the following example, network a) is weakly balanced and therefore clusterable. Its result equals those of the networks a) with three positive edges (Figure 4) and with one positive edge (Figure 5). Network b) in this example is not balanced and therefore not a cluster.
Figure 6: Signed, Directed, Weighted Networks
Table 9: Triad Census of Graphs in Figure 6
Columns: $\hat{v} = \sqrt[4]{(w_{ij} w_{jk})^2}$, $b = \sqrt[3]{\left(w_{ij} w_{jk} w_{ik} + \sqrt{(w_{ij} w_{jk} w_{ik})^2}\right) / 2} - \sqrt[3]{\left(w_{ij} - \sqrt{w_{ij}^2}\right)\left(w_{jk} - \sqrt{w_{jk}^2}\right)\left(w_{ik} - \sqrt{w_{ik}^2}\right) / 8}$, $\hat{w}$ according to Eq. [ 17 ]; $C = \sum_{i,j,k} \hat{w} \,/\, \sum_{i,j,k} \hat{v}$.

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     -3    -3     0             0             0
A  C  B   -3    0     0      0             0             0
B  A  C   -4    -3    -3     3.464101615   3.301927249   3.382042
B  C  A   -3    0     -4     0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     -4    0      0             0             0
Network a): C = 0.9763116

i  j  k   w_ij  w_jk  w_ik   v̂             b             ŵ
A  B  C   0     3     3      0             0             0
A  C  B   3     0     0      0             0             0
B  A  C   -2    3     3      2.449489743   0             0
B  C  A   3     0     -2     0             0             0
C  A  B   0     0     0      0             0             0
C  B  A   0     -2    0      0             0             0
Network b): C = 0.0000000
Equation [ 17 ] concludes the formula extension proposed by this paper. With this extension, clustering results can be computed in all types of networks, i.e. binary, weighted, directed and/or signed networks. This alleviates prior literature's limitation of macro-comparability and thereby supports the second hypothesis of this paper.
3.4. SUMMARY
By addressing the shortcomings found in prior research and combining their findings, the proposed clustering coefficient is considered an improvement. The aspect of relational variance among triads is taken into consideration, which offers micro-comparability within its results. Furthermore, the formula extension acknowledges the different types of networks in which the formula can be implemented. This acknowledgement delivers macro-comparability in its results, so the formula can be used to assess the formation of clusters in terms of triadic closure in binary, directed, weighted, and/or signed networks. While the toy examples above support the theoretical concept, an analysis of a real-world network supports the findings in a more practical context. Therefore, the following section is dedicated to applying this formula to a real-world network and comparing its results to those of formulae from prior research.
4. CLUSTERING COMPARISON
4.1. DATA SET
The data set used to compare the clustering coefficient formulae is the well-known Zachary Karate Club network (Zachary 1977). The network is binary and consists of 34 nodes and 78 symmetric (undirected) edges. The data set remains unchanged for the comparison in binary networks and is adjusted accordingly for the comparisons in directed and weighted networks. Figure 7 shows the structure of the network.
Figure 7: Zachary Karate Club Graph (Zachary 1977)
4.2. COMPARISON IN UNWEIGHTED NETWORKS
Table 10: Zachary Karate Club Adjacency Matrix (Zachary 1977)
The matrix above is the adjacency matrix of the Zachary Karate Club data set (Zachary 1977). It is the data set used in the following comparison for an undirected, unweighted network. The clustering coefficient rendered by this paper's formula equals $C_{\text{Rubino}} = 0.22277$. The comparable coefficient for binary networks from Newman et al. (2001) provides the same result, $C_{\text{Newman et al.}} = 0.22277$ (Newman et al. 2001). This is expected, since the newly proposed formula, simplified for binary networks, is merely the weighted values of triangles over the weighted values of open triads (two-paths). In the following, the matrix for a comparison in directed, unweighted networks is rendered.
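The binary reduction can be checked directly against Eq. [ 15 ]; the following is a worked check added for illustration. With $w \in \{0, 1\}$, every ordered triple with $w_{ij} = w_{jk} = 1$ has $\hat{v} = \sqrt{w_{ij} w_{jk}} = 1 \ge \sqrt[3]{w_{ik}}$ and therefore falls into the first case:

$$\hat{w} = \sqrt{1 \cdot \sqrt[3]{w_{ik}}} = \begin{cases} 1 & \text{if } w_{ik} = 1 \\ 0 & \text{if } w_{ik} = 0 \end{cases}$$

Every triple with $w_{ij} = 0$ or $w_{jk} = 0$ contributes 0 to both sums, hence

$$C = \frac{\#\{(i,j,k) : w_{ij} = w_{jk} = w_{ik} = 1\}}{\#\{(i,j,k) : w_{ij} = w_{jk} = 1\}}$$

For an undirected graph this equals $3 \times (\text{number of triangles}) / (\text{number of connected triples})$, since each triangle contributes six ordered closed triples and each connected triple contributes two ordered ones.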
Table 11: Unweighted, Directed Network Based on the Zachary Karate Club
The matrix above is the same as in Table 10, except that it is no longer symmetric: each entry below the diagonal was either removed or kept at random. By doing so, the structure of the graph remains the same, as all nodes that were originally connected to one another still are. The property of directedness can now, however, be assessed. The clustering coefficient based on this paper's proposal is $C_{\text{Rubino}} = 0.19914$. The comparable formula for directed networks is given by Opsahl and Panzarasa (2009). Their result, $C_{\text{Opsahl Panzarasa}} = 0.19914$, is identical to the result found in this paper, given the lack of relational variance among the triads: because the network is unweighted, all edges are identical to one another. The impact of the formula proposed in this paper only becomes recognizable once weights are attributed to the edges. This is provided in the following section.
4.3. COMPARISON IN WEIGHTED NETWORKS
Table 12: Weighted, Undirected Network Based on the Zachary Karate Club
The above matrix is based on the matrix of the Zachary Karate Club (Zachary 1977). Here, however, the adjacencies are multiplied by a random factor between 1 and 10, resulting in the assessed weights. This paper's formula assesses the weighted matrix with a clustering coefficient of $C_{\text{Rubino}} = 0.21565$. The clustering coefficient by Opsahl and Panzarasa (2009) provides a result of $C_{\text{Opsahl Panzarasa}} = 0.25861$.
Compared to the unweighted, undirected version of the matrix, the clustering results according to this paper decrease, since the weighted version introduces relational variance among the triads. This marks the clustered triads as non-ideal, since the three edges of a given triangle are not equally weighted, unlike in the unweighted matrix, in which each weight is valued as $w = 1$. Hence, the assessment in the weighted, undirected network delivers smaller results. In contrast, the results according to Opsahl and Panzarasa (2009) increase when weights are implemented.
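The direction of this effect can be illustrated on a single closed triangle. The sketch below contrasts Eq. [ 15 ] with a generalized Opsahl and Panzarasa ratio that, as an assumption, uses the arithmetic mean of the two triplet edges as the triplet value (one of the variants they discuss; which variant produced the values above is not restated here). Because every triplet of a triangle is closed, that ratio is 1 regardless of the weights, whereas Eq. [ 15 ] only reaches 1 for equal weights. The toy weights and function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def triples(w):
    """Yield (w_ij, w_jk, w_ik) for every ordered triple of distinct nodes."""
    nodes = sorted({n for pair in w for n in pair})
    for i, j, k in permutations(nodes, 3):
        yield w.get((i, j), 0), w.get((j, k), 0), w.get((i, k), 0)


def opsahl_panzarasa_am(w):
    """Closed-triplet value over all-triplet value, triplet value = arithmetic mean."""
    closed = total = 0.0
    for wij, wjk, wik in triples(w):
        if wij > 0 and wjk > 0:
            value = (wij + wjk) / 2
            total += value
            if wik > 0:
                closed += value
    return closed / total


def clustering_eq15(w):
    """Eq. [15] for non-negative weights."""
    num = den = 0.0
    for wij, wjk, wik in triples(w):
        a, b = sqrt(wij * wjk), (wij * wjk * wik) ** (1 / 3)
        num += sqrt(a * b) if a >= b else a ** 2 / sqrt(a * b)
        den += a
    return num / den


def undirected(edges):
    w = {}
    for (u, v), x in edges.items():
        w[(u, v)] = w[(v, u)] = x
    return w


equal = undirected({("A", "B"): 5, ("B", "C"): 5, ("A", "C"): 5})
unequal = undirected({("A", "B"): 1, ("B", "C"): 3, ("A", "C"): 2})
for name, net in (("equal", equal), ("unequal", unequal)):
    # Opsahl-Panzarasa stays at 1.0; Eq. [15] drops below 1 for unequal weights.
    print(name, round(opsahl_panzarasa_am(net), 4), round(clustering_eq15(net), 4))
```

This single-triangle toy case does not reproduce the Zachary results; it only shows why relational variance lowers the proposed coefficient while leaving a closed-versus-all triplet ratio untouched.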
Because the aspect of directedness is not yet added, it is also of interest to compare the results with the formula by Phan et al. (2013). This returns a result of $C_{\text{Phan et al.}} = 0.23537$, which is also higher than the clustering result according to this paper. Because the proposed formula moderates the result by taking relational variance into consideration, the coefficient is smaller than the results from formulae in prior literature.
Table 13: Weighted, Directed Network Based on the Zachary Karate Club
The weight matrix above not only considers differently weighted edges, but also incorporates the directedness introduced in the previous section of this paper. This paper determines the clustering coefficient of the weighted, directed graph to be $C_{\text{Rubino}} = 0.18714$. Compared to the clustering coefficient according to Opsahl and Panzarasa (2009), namely $C_{\text{Opsahl Panzarasa}} = 0.22569$, the result is once again smaller.
Compared to the unweighted, directed version of the matrix, the clustering results according to this paper decrease, since the weighted version introduces relational variance among the triads. This marks the clustered triads as non-ideal, since the three edges of a given triangle are not equally weighted, unlike in the merely directed matrix, where each weight is assessed as $w = 1$. Hence, the assessment in the weighted, directed network also delivers smaller results. In contrast, the results according to Opsahl and Panzarasa (2009) increase when weights are implemented, distorting their interpretation.
As foreseen, due to the moderation of the formula for the case of relational variance, the proposal offered in this paper yields a smaller value than those found in prior literature, thereby supporting this paper's third hypothesis.
5. CONCLUSION
5.1. LIMITATIONS
While improvements to the clustering coefficient are made, this paper is still subject to certain limitations. First of all, the data set used is relatively small, with only 34 nodes; a more extensive study could test the formula more thoroughly. Furthermore, the data sets for directed and weighted networks are only based on the real network of the Zachary Karate Club, and the relational properties within them were randomized. The formula could be validated more convincingly by analyzing real-world weighted and directed networks.
Secondly, including signed networks in social network analyses isn't common. The data collection process on a personal level, e.g. in the form of questionnaires or surveys, often shies away from assessing negative relations. However, while signed networks aren't common today, advancements in technology and the automatic assessment of negative relations suggest a future need for them. Only time will tell whether this prognosis is valid.
5.2. FUTURE IMPLICATIONS
The proposed formula combines the advancements found in prior research on the clustering coefficient and frees those equations of their shortcomings. However, the consideration of micro-comparability and macro-comparability embodies the true advancement of this paper with regard to future implications.
Addressing the limitation of micro-comparability removes the need for cut-off measures to eliminate insignificant values. By doing so, comparisons between networks can be assessed more precisely and all gathered data can be used. Insignificant values that would have been cut off in the past now cause a correspondingly insignificant increase or decrease in the clustering coefficient result. This not only saves time in the network analysis but also provides a higher-quality result. For example, two graphs that are completely identical except for one insignificant edge can be compared, resulting in almost identical yet still distinguishable results. Furthermore, micro-comparability allows researchers to directly assess the clustering in completely connected graphs.
With regard to macro-comparability, future researchers can gather all the data they need without worrying about having the necessary mathematical measures to analyze it. Moreover, the analysis can be conducted in all types of graphs regardless of their relational properties, thereby offering comparability between them. Especially given the assessment of negative edges on online social networking sites, it is foreseeable that this formula will be used in such environments, thereby depicting the clustering of these networks more precisely. For example, the number of likes and dislikes of YouTube comments could be used to assess the clustering of YouTubers in this directed, weighted, and signed network.
5.3. SUMMARY
In this paper, prior literature on the clustering coefficient is reviewed. Thereby, limitations in the form of a lack of combinations between the coefficients, as well as a lack of micro- and macro-comparability, are uncovered. These are addressed, and a formula extension is proposed to overcome said limitations. The formula is tested on the real-world network of the Zachary Karate Club (Zachary 1977), and the results show a more precise clustering assessment than the clustering coefficients found in prior literature. The findings imply that cut-off measures can be dropped, that cluster formation can be assessed in completely connected networks, and that clustering can be assessed in all types of networks regardless of their structural and relational characteristics.
6. PUBLICATION BIBLIOGRAPHY
Barnes, J. A. (1969): Networks and political process. In Social networks in urban situations, pp. 51–76.
Barrat, A.; Barthelemy, M.; Pastor-Satorras, R.; Vespignani, A. (2004): The architecture of complex weighted networks. In Proceedings of the National Academy of Sciences of the United States of America 101 (11), pp. 3747–3752.
Berkhin, P. (2006): A Survey of Clustering Data Mining Techniques. In Grouping multidimensional data, pp. 25–71.
Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D. (2006): Complex networks. Structure and dynamics. In Physics Reports 424 (4-5), pp. 175–308.
Davis, J. A. (1967): Clustering and structural balance in graphs. In Human Relations 20, pp. 181–187.
Engø-Monsen, K.; Canright, G. (2011): Weighted Clustering Coefficients. Telenor. Oslo (Telenor Report R5).
Fagiolo, G. (2007): Clustering in complex directed networks. In Physical Review E 76 (2), p. 26107.
Fortunato, S. (2010): Community detection in graphs. In Physics Reports 486 (3-5), pp. 75–174.
Girvan, M.; Newman, M. E. J. (2002): Community structure in social and biological networks. In Proceedings of the National Academy of Sciences of the United States of America 99 (12), pp. 7821–7826.
Grindrod, P. (2002): Range-dependent random graphs and their application to modeling large small-world proteome datasets. In Physical Review E 66 (6), p. 66702.
Heider, F. (1946): Attitudes and cognitive organization. In Journal of Psychology 21, pp. 107–112.
Holland, P. W.; Leinhardt, S. (1971): Transitivity in structural models of small groups. In Comparative Group Studies, pp. 107–124.
Holme, P.; Park, S. M.; Kim, B. J.; Edling, C. R. (2007): Korean university life in a network perspective. Dynamics of a large affiliation network. In Physica A: Statistical Mechanics and its Applications 373, pp. 821–830.
Kalna, G.; Higham, D. J. (2007): A clustering coefficient for weighted networks, with application to gene expression data. In AI Communications 20 (4), pp. 263–271.
Kephart, W. M. (1950): A Quantitative Analysis of Intragroup Relationships. In American Journal of Sociology 55 (6), pp. 544–549.
Kivelä, M.; Arenas, A.; Barthelemy, M.; Gleeson, J. P.; Moreno, Y.; Porter, M. A. (2014): Multilayer networks. In Journal of Complex Networks 2 (3), pp. 203–271.
Kunegis, J.; Lommatzsch, A.; Bauckhage, C. (2009): The Slashdot Zoo. Mining a Social Network with Negative Edges. In Proceedings of the 18th International Conference on World Wide Web (ACM), pp. 741–750.
Latora, V.; Marchiori, M. (2003): Economic small-world behavior in weighted networks. In The European Physical Journal B - Condensed Matter 32 (2), pp. 249–263.
Newman, M. E. J. (2001): The structure of scientific collaboration networks. In Proceedings of the National Academy of Sciences 98 (2), pp. 404–409.
7. INDEX
7.1. LIST OF TABLES
Table 1: Triadic Value Assessment (Opsahl, Panzarasa 2009)
Table 2: Clustering in Terms of the Formulation of Balance (Szell et al. 2010)
Table 3: Opsahl, Panzarasa's (2009) Clustering Coefficient Denominator Differences
Table 4: Micro-Comparability of Clustered Triads
Table 5: Triad Census of Graphs in Figure 1
Table 6: Triad Census of Graphs in Figure 2
Table 7: Triad Census of Graphs in Figure 4
Table 8: Triad Census of Graphs in Figure 5
Table 9: Triad Census of Graphs in Figure 6
Table 10: Zachary Karate Club Adjacency Matrix (Zachary 1977)
Table 11: Unweighted, Directed Network Based on the Zachary Karate Club
Table 12: Weighted, Undirected Network Based on the Zachary Karate Club
Table 13: Weighted, Directed Network Based on the Zachary Karate Club
7.2. LIST OF FIGURES
Figure 1: Weighted, Undirected Networks
Figure 2: Weighted, Directed Networks
Figure 3: Co-Ordinate Systems of C-Curves
Figure 4: Weighted, Directed Networks
Figure 5: Signed, Directed, Weighted Networks
Figure 6: Signed, Directed, Weighted Networks
Figure 7: Zachary Karate Club Graph (Zachary 1977)
8. AFFIDAVIT
Statutory Declaration:
I hereby declare, pursuant to § 17 (2) APO, that I have written the preceding bachelor's thesis independently and have used no sources or aids other than those indicated.
________________ ________________________________
(Date) (Signature)