Network Properties
Lecture 05
Lecturer: Dr. Reem Essameldin Ebrahim
Introduction to Social Networks
Based on CS224W: Analysis of Networks, Mining and Learning with Graphs, Stanford University
Copyright © Dr. Reem Essameldin 2023-2024
In this Lecture
Topics to be covered are:
Quantifying Networks
Key Network Properties
Social Networks Modeling
Quantifying Social Structure
You have learned the basic mechanics behind network analysis. Without a firm understanding of those foundations, you cannot build the more advanced concepts, and their associated measures, that network analysts use to understand the social world.
Given a graph, we have two questions in hand:
1. What are the properties of the graph?
2. Once we know them, how could we generate artificial graphs that mimic real graphs?
We generate artificial graphs to get a real sense of which processes might be producing the networks we see in real life (e.g., what is a good generative model for how friendships form?).
Q: How do we quantify and model a network?
Quantifying Social Structure
There are some fundamental measurements that we can use to quantify the
structure of the networks.
Key Network Properties
Degree distribution: 𝑝(𝑘).
Path length: ℎ.
Clustering coefficient: 𝐶.
Connected components: 𝑠.
Some of these characteristics are shared across different types of networks.
Degree Distribution
The degree distribution P(k) is essentially a normalized histogram that tells how many nodes have a given degree.
The Mathematical Definition
Degree distribution P(k): the probability that a randomly chosen node has degree k. Since P(k) is a probability, it must be normalized:
$$\sum_{k=1}^{\infty} p_k = 1$$
For a network with N nodes, the degree distribution is the normalized histogram:
$$p_k = \frac{N_k}{N}$$
where $N_k$ is the number of nodes with degree k.
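The normalized histogram $p_k = N_k / N$ can be computed directly from an edge list. The following is a minimal Python sketch, assuming an undirected simple graph given as a list of (u, v) pairs plus a list of all nodes; the function name `degree_distribution` and the 4-node example graph are hypothetical, chosen only for illustration. Isolated nodes, if any, would appear under k = 0.

```python
from collections import Counter

def degree_distribution(edges, nodes):
    """Return {k: p_k} with p_k = N_k / N for an undirected graph.

    edges: (u, v) pairs, each undirected link listed once.
    nodes: all node ids, so isolated nodes are counted with degree 0.
    """
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        # Each undirected link adds 1 to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())  # N_k: number of nodes with degree k
    return {k: n_k / n for k, n_k in sorted(counts.items())}

# Hypothetical example: triangle 1-2-3 plus node 4 attached to node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
p = degree_distribution(edges, nodes=[1, 2, 3, 4])
print(p)                # {1: 0.25, 2: 0.5, 3: 0.25}
print(sum(p.values()))  # 1.0
```

Because each value is a fraction of N, the values sum to 1, as the normalization condition requires.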
Degree Distribution
For the histogram, the x-axis shows the degree and the y-axis shows either the count or the proportion of nodes having that degree. In the second case, we normalize the y-axis so that the bar heights sum to 1; the result is a distribution giving the fraction of nodes with each degree. Otherwise, the bars simply show the counts, as in the figure.
For the given graph, to build the histogram we count how many nodes have degree one (k = 1) and plot that bar, then repeat for every other degree.
One property of real-world networks is that they have what is called a skewed degree distribution: most nodes have a small degree, while a few nodes (hubs) have a very large degree.
Test Yourself
For the given graphs, find the degree distribution and the corresponding histograms.
[Figure: graphs (a) and (b)]
Solution:
a) N = 4, so P(1) = 1/4, P(2) = 2/4 = 1/2, P(3) = 1/4, P(4) = 0.
b) Every node has the same degree k = 2, so P(2) = 1 and P(k) = 0 for every other k.
Paths in a Graph
Paths tell us how many edges separate different pairs of nodes.
The Mathematical Definition
A path is a sequence of nodes in which each node is linked to the next one. A path between nodes $i_0$ and $i_n$ is an ordered list of n links:
$$P_n = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \ldots, (i_{n-1}, i_n)\}$$
Note that:
• A path can intersect itself and pass through the same edge multiple times, e.g., ACBDCDEG.
• In a directed graph, a path can only follow the direction of the arrows.
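The definition can be checked mechanically: a node sequence is a path if every consecutive pair is a link, and in a directed graph only links in the arrow's direction count. This is a minimal Python sketch; the function names `is_path` and `path_links` and the three-edge example graph are hypothetical and are not the graph behind the ACBDCDEG example.

```python
def is_path(seq, edges, directed=False):
    """True if each node in seq is linked to the next one.

    Repeated nodes and edges are allowed, matching the definition above.
    """
    links = set(edges)
    if not directed:
        # An undirected link (u, v) can be traversed in both directions.
        links |= {(v, u) for u, v in edges}
    return all((a, b) in links for a, b in zip(seq, seq[1:]))

def path_links(seq):
    """Ordered list of the n links (i0, i1), ..., (i_{n-1}, i_n)."""
    return list(zip(seq, seq[1:]))

# Hypothetical graph with links A-B, B-C, C-D.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
print(is_path("ABCB", edges))                 # True: revisits B
print(is_path("ABCB", edges, directed=True))  # False: C -> B goes against the arrow
print(path_links("ABC"))                      # [('A', 'B'), ('B', 'C')]
```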
The Shortest Path
We are usually not interested in arbitrary paths but in the shortest path (h): the path with the fewest hops (edges) from one node to the other. We define the distance between a pair of nodes as the length of the shortest path between them.
Undirected graph: We have to traverse 2 edges to get from B to D; this is the minimum number of edges needed to go from B to D. We could also go through A, but that path is longer. If the graph is disconnected, there is no shortest path between X and A, because there is no connection for us to traverse.
Directed graph: The idea is the same, but the path must follow the edge directions. Thus, distances are symmetric in undirected graphs but not necessarily in directed graphs, e.g., $h_{B,D} = 2$ but $h_{D,B} = \infty$ because we cannot traverse an edge in the opposite direction.
Q: What is the distance of a node to itself?
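In an unweighted graph, shortest-path distances from one node to all others can be found with breadth-first search (BFS), because BFS reaches nodes in order of increasing hop count. A minimal Python sketch, assuming the graph is an adjacency dictionary (out-neighbors for a directed graph) that lists every node as a key; the function name and the directed chain A to B to C to D are hypothetical, not the graph in the figure.

```python
from collections import deque
import math

def bfs_distances(adj, source):
    """Hop distance from source to every node via breadth-first search.

    adj maps each node to its neighbors (out-neighbors if directed).
    Unreachable nodes get math.inf, i.e. h = infinity.
    """
    dist = {v: math.inf for v in adj}
    dist[source] = 0  # a node is at distance 0 from itself
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == math.inf:  # first visit happens along a shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# Hypothetical directed chain A -> B -> C -> D.
adj = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(bfs_distances(adj, "B"))       # {'A': inf, 'B': 0, 'C': 1, 'D': 2}
print(bfs_distances(adj, "D")["B"])  # inf: the edges cannot be traversed backwards
```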
Network Diameter
The network diameter is the largest distance in the network, i.e., the longest shortest path that exists in the graph. This is the graph-theoretic definition, but real networks may be disconnected, in which case the diameter is infinite. In practice, we therefore usually characterize a network by its average shortest path length instead.
Average Shortest Path
Here we go over all pairs of nodes and ask what the average shortest path length between them is. It can be computed as:
$$\bar{h} = \frac{1}{2E_{max}} \sum_{i,\, j \neq i} h_{ij}$$
where $h_{ij}$ is the distance (shortest path length) from node i to node j, and the sum runs over all pairs ij with i ≠ j. The factor $1/(2E_{max})$ is the normalization: $E_{max}$ is the maximum number of edges, i.e., the total number of node pairs in the network,
$$E_{max} = \frac{n(n-1)}{2}$$
The sum counts every pair twice (once as $h_{ij}$ and once as $h_{ji}$), so we divide by $2E_{max} = n(n-1)$. In practice, the average is often computed only over connected pairs of nodes, ignoring infinite-length paths.
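Running BFS from every node yields all $h_{ij}$, from which both the diameter and $\bar{h}$ follow. This minimal Python sketch assumes an unweighted graph stored as an adjacency dictionary with at least two nodes and at least one connected pair; `path_stats` and its `connected_only` flag are hypothetical names. With `connected_only=False` it applies the formula literally, so a disconnected graph gives infinity; with `connected_only=True` it averages only over pairs joined by a path.

```python
from collections import deque
import math

def bfs_distances(adj, source):
    """Hop distances from source; unreachable nodes get math.inf."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == math.inf:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def path_stats(adj, connected_only=False):
    """Return (average shortest path length, diameter) over pairs i != j."""
    n = len(adj)
    total, pairs, diameter = 0, 0, 0
    for i in adj:
        for j, h_ij in bfs_distances(adj, i).items():
            if j == i:
                continue
            if connected_only and h_ij == math.inf:
                continue  # skip pairs with no connecting path
            total += h_ij
            pairs += 1
            diameter = max(diameter, h_ij)  # longest shortest path seen so far
    if connected_only:
        return total / pairs, diameter
    e_max = n * (n - 1) / 2  # number of node pairs
    return total / (2 * e_max), diameter

# Hypothetical undirected path graph 1-2-3-4.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(path_stats(adj))  # (1.666..., 3)
```

Adding an isolated node (e.g. `{**adj, 5: []}`) makes both values infinite unless `connected_only=True` is passed.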
Clustering coefficient: C
This quantity has a real application in social network analysis. We define it by asking whether edges cluster in the network: do edges appear more densely in certain parts of the network, i.e., do social communities exist in the network?
The Mathematical Definition
We quantify this mathematically by asking what proportion of a node's neighbors are connected among themselves. For a node i with degree $k_i$, the local clustering coefficient is defined as:
$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$
where $e_i$ is the number of links between the $k_i$ neighbors of node i. The $k_i$ neighbors can form at most $k_i(k_i - 1)/2$ links among themselves, so $C_i$ is the fraction of those possible links that actually exist. $C_i = 0$ if none of the neighbors of node i link to each other, and $C_i = 1$ if the neighbors of node i form a complete graph (i.e., they all link to each other). Hence:
$$C_i \in [0, 1]$$
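The formula translates directly into code: count the links among a node's neighbors and divide by the number of possible links. A minimal Python sketch, assuming an undirected simple graph; the helper names `build_adj` and `local_clustering` and the example graph are hypothetical. The function returns `None` when $k_i < 2$, because the denominator is then 0 and the formula is undefined.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency dictionary: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); None if k_i < 2 (undefined)."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return None
    # e_i: number of linked pairs among the k_i neighbors of i
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))

# Hypothetical graph: node i with neighbors a, b, c, d; links a-b and b-c among them.
adj = build_adj([("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"),
                 ("a", "b"), ("b", "c")])
print(local_clustering(adj, "i"))  # 0.333...: 2 of the 6 possible neighbor links exist
print(local_clustering(adj, "a"))  # 1.0: a's neighbors i and b are linked
```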
Clustering coefficient: C
$C_i$ is the probability that two neighbors of a node link to each other. So for every node we ask: what fraction of your friends are also friends with each other? In social networks this is known as triadic closure: if two people have a friend in common, they are likely to become friends as well.
What we see in real data is that social networks have a high clustering coefficient: people tend to group into connected, dense communities with many friendships among their members.
This is how we define the clustering coefficient of a single node. To quantify the whole network, we compute the average over all nodes i.
Average clustering coefficient:
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i$$
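Averaging $C_i$ over the network requires a choice for nodes with degree below 2, whose $C_i$ is undefined; the examples below note that they are either defined as zero or ignored. This minimal Python sketch supports both conventions; the function names, the `low_degree` parameter, and the example graph are hypothetical. Note that the two conventions give different averages for the same graph.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency dictionary: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); None if k_i < 2 (undefined)."""
    k = len(adj[i])
    if k < 2:
        return None
    e = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * e / (k * (k - 1))

def average_clustering(adj, low_degree="zero"):
    """C = (1/N) * sum of C_i.

    low_degree="zero": nodes with k_i < 2 count as C_i = 0.
    low_degree="ignore": such nodes are left out, so N shrinks.
    """
    values = []
    for i in adj:
        c = local_clustering(adj, i)
        if c is None:
            if low_degree == "ignore":
                continue
            c = 0.0
        values.append(c)
    return sum(values) / len(values)

# Hypothetical graph: node d has degree 1, so its C_i is undefined.
adj = build_adj([("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"),
                 ("a", "b"), ("b", "c")])
print(average_clustering(adj))                       # about 0.6 (5 nodes)
print(average_clustering(adj, low_degree="ignore"))  # about 0.75 (4 nodes)
```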
Examples
What portion of i's neighbors are connected? $C_i = \frac{2 e_i}{k_i (k_i - 1)}$
[Figure: example graphs (a) and (b)]
Q: If a node has a clustering coefficient of 0, must it be a bridge? (Hint: consider a node on a cycle.)
Q: What about A, G, and F (degree-1 nodes), whose neighbors cannot form any links? For them the formula is undefined, so we either define their clustering coefficient as zero or ignore them.
Connectivity
Connectivity is measured by the size of the largest connected component: the largest set of nodes in which any two vertices can be joined by a path (the largest component is also called the giant component).
How to find connected components:
• Start from a random node and perform Breadth-First Search (BFS).
• Label the nodes BFS visited.
• If all nodes are visited, the network is connected.
• Otherwise, find an unvisited node and repeat BFS.
Note that: the BFS algorithm searches a graph data structure for a node that meets a set of criteria. It starts at the root (the starting node) and visits all nodes at the current depth level before moving on to the nodes at the next depth level.
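The BFS labelling procedure for connected components can be sketched in Python as follows, assuming an undirected graph stored as an adjacency dictionary that lists every node, including isolated ones, as a key. The function name and example graph are hypothetical, and nodes are taken in dictionary order rather than at random, which does not change the resulting components.

```python
from collections import deque

def connected_components(adj):
    """Find connected components of an undirected graph by repeated BFS."""
    label = {}       # node -> component id (doubles as the "visited" mark)
    components = []
    for start in adj:
        if start in label:
            continue  # already reached by an earlier BFS
        comp_id = len(components)
        label[start] = comp_id
        members = [start]
        queue = deque([start])
        while queue:  # BFS: visit the current depth level before the next one
            u = queue.popleft()
            for w in adj[u]:
                if w not in label:
                    label[w] = comp_id
                    members.append(w)
                    queue.append(w)
        components.append(members)
    return components

# Hypothetical graph with components {1, 2, 3}, {4, 5} and {6}.
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
comps = connected_components(adj)
print(comps)                        # [[1, 2, 3], [4, 5], [6]]
print(max(len(c) for c in comps))   # 3: size of the giant component
print(len(comps) == 1)              # False: the network is not connected
```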
Social Networks Modeling
Social networks can be represented and measured using two basic mathematical models:
The random graph model
Various random graph models have been proposed for social networks, such as:
• Erdős–Rényi model
• Small-world model (SWM)
• Preferential attachment model
• Forest-fire model
The scale-free graph model
A network or graph is called scale-free if its degree distribution follows a power law, at least asymptotically. Examples of scale-free network models are:
• Barabási–Albert model (BAM)
• Bianconi–Barabási model (BBM)
