Observing Social Phenomena with #OccupyWallStNY Retweet Data
During the Statistical Methods for Network Data course, my colleague Matteo Stori and I had the chance to analyze a network dataset of retweets of the well-known hashtag #OccupyWallStNY. Through this analysis, we gained insights into social phenomena and observed recurring patterns within the network.
Our goal was to uncover the underlying dynamics and understand the impact of this social movement on the digital landscape.
During the analysis, we computed various network measures, including centrality and clustering coefficients, and applied community detection algorithms. Together these gave a broad view of the network structure and shed light on how information and influence flowed within the hashtag's retweet ecosystem.
We identified influential users and information flow pathways within the network. This information can help us better understand the diffusion of ideas and the formation of collective actions in the digital space.
1. Twitter – Occupy Wall Street NY
Network Data Analysis
Fabrizio Lanubile (5107565)
Matteo Stori (5114117)
Università Cattolica del Sacro Cuore (Milan), Statistical Methods for Network Data - a.y. 2022/23
2. Occupy WSNY - about
The Occupy Wall Street movement emerged in
September 2011 as a grassroots protest against
economic inequality and corporate influence in the
United States.
Starting in New York City's Zuccotti Park, the
movement quickly spread nationwide, drawing attention
to issues such as wealth disparity, political corruption,
and the influence of big banks.
Protesters, inspired by the slogan "We are the 99%,"
demanded a fairer economic system and
accountability from the financial sector.
Occupy Wall Street sparked a global conversation
about income inequality and became a symbol of
discontent with the status quo, even though its impact
on policy change was limited.
5. Network Description
We then create a graph object called "net" using the
"graph.data.frame" function:
• The graph is constructed from "y", which represents the
edges of the network.
• The parameter "directed = T" indicates that the graph is
directed, meaning there is a direction associated with each
edge.
• We retrieve the vertices (nodes) of the graph using the
function "V(net)" and assign them to a variable.
• After setting the label attribute of each vertex to be the same
as its name and computing the degree of each vertex (the
number of edges incident to a vertex), we plot the graph.
• To get a better representation of the network and to
explore every relationship of each vertex, we draw an
interactive plot in which vertices are sized and colored
according to their degree.
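The steps above can be sketched in a few lines. The slides name R functions ("graph.data.frame", "V(net)"); the version below is a plain Python illustration instead, and the edge list is a hypothetical stand-in for "y", not the actual retweet data.

```python
from collections import defaultdict

# Hypothetical directed edge list standing in for "y":
# each pair is (source, target).
edges = [("ann", "bob"), ("cat", "bob"), ("bob", "dan"), ("dan", "ann")]

def build_directed_graph(edge_list):
    """Return the sorted vertex list and the out-neighbours of each vertex."""
    out_neighbours = defaultdict(list)
    vertices = set()
    for source, target in edge_list:
        out_neighbours[source].append(target)
        vertices.update((source, target))
    return sorted(vertices), out_neighbours

def total_degree(vertices, edge_list):
    """Degree of a vertex: number of edges incident to it (incoming plus outgoing)."""
    degree = {v: 0 for v in vertices}
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    return degree

vertices, out_neighbours = build_directed_graph(edges)
labels = {v: v for v in vertices}      # label of each vertex = its name
degree = total_degree(vertices, edges)
print(degree)
```

The degree dictionary is what a plot would then map to node size or color.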
6. Measures of Importance: closeness
For a given node, it takes the sum of the inverses of the lengths of the
shortest paths to every other node.
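A minimal Python sketch of this definition, assuming an unweighted directed graph stored as a dictionary of out-neighbours (the toy graph is hypothetical). The sum-of-inverses form is often called harmonic closeness; the classical variant instead takes the inverse of the sum of the distances, so which one a given library function returns should be checked.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def harmonic_closeness(adjacency, node):
    """Sum of 1/d over all other reachable nodes; unreachable nodes contribute 0."""
    dist = shortest_path_lengths(adjacency, node)
    return sum(1.0 / d for target, d in dist.items() if target != node)

# Hypothetical toy graph: a -> b -> c
adjacency = {"a": ["b"], "b": ["c"], "c": []}
scores = {n: harmonic_closeness(adjacency, n) for n in adjacency}
print(scores)
```

Here "a" reaches "b" in one step and "c" in two, so its score is 1 + 1/2.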
7. Measures of Importance: betweenness
It counts how many shortest paths between other nodes pass through each node.
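As an illustration, the sketch below computes betweenness with Brandes' accumulation scheme on an unweighted directed graph. It assumes every node appears as a key in the adjacency dictionary, and the toy graph is hypothetical. When several shortest paths tie between a pair of nodes, each one counts fractionally.

```python
from collections import deque

def betweenness(adjacency):
    """Node betweenness for an unweighted directed graph (Brandes' algorithm)."""
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Forward pass: BFS from s, counting shortest paths (sigma)
        # and recording the predecessors of each node on those paths.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adjacency[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Backward pass: push path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical toy graph: a -> b -> c -> d
adjacency = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
scores = betweenness(adjacency)
print(scores)
```

In this chain, "b" lies on the paths a to c and a to d, and "c" lies on a to d and b to d, so both score 2 while the endpoints score 0.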
8. Measures of Importance: pagerank
It scores a node as more important if it is connected
to other nodes that are themselves important.
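The recursive idea can be made concrete with a power-iteration sketch in Python. The damping factor of 0.85, the fixed number of iterations, and the toy graph are all assumptions made for illustration; the slides do not state which settings were used.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """PageRank by power iteration on a directed graph given as out-neighbour lists.

    Assumes every node is a key of `adjacency` and lists contain no duplicates.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Every node passes its rank, split equally, to the nodes it points to.
        for u in nodes:
            for v in adjacency[u]:
                new_rank[v] += damping * rank[u] / len(adjacency[u])
        rank = new_rank
    return rank

# Hypothetical toy graph: a, b and d all point to c, and c points back to a.
adjacency = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(adjacency)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Node "a" has a single incoming edge, like no other low-degree node here, yet it ranks well above "b" and "d" because that edge comes from the highly ranked "c".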
9. Measures of Importance: edge betweenness
Highlights important relationships; it counts how many shortest
paths go through a given edge.
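A rough Python sketch of the same count for edges, using the edge variant of Brandes' accumulation. The graph is a hypothetical example of two triangles joined by one bridge, stored with both directions of every edge so that it behaves as undirected; scores are therefore reported per direction.

```python
from collections import deque

def edge_betweenness(adjacency):
    """Number of shortest paths through each directed edge (ties counted fractionally)."""
    nodes = list(adjacency)
    score = {(u, v): 0.0 for u in nodes for v in adjacency[u]}
    for s in nodes:
        # BFS from s: count shortest paths and remember predecessors.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adjacency[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, crediting each edge on the way.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                score[(u, w)] += share
                delta[u] += share
    return score

# Hypothetical graph: triangles a-b-c and d-e-f joined by the bridge c-d.
adjacency = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = edge_betweenness(adjacency)
print(max(scores, key=scores.get), max(scores.values()))
```

Every path from one triangle to the other must cross the bridge, which is why it receives the highest score.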
10. Measures of Importance: network diameter
It is the length of the longest shortest path between any two vertices in
the graph.
It represents the maximum distance between any pair of nodes.
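For illustration, the diameter of a small unweighted directed graph can be found by running a breadth-first search from every node and keeping the largest distance seen. This Python sketch assumes that unreachable pairs are simply ignored, which is a choice rather than part of the definition; the toy graph is hypothetical.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adjacency):
    """Longest finite shortest-path length over all ordered pairs of nodes."""
    return max(
        max(shortest_path_lengths(adjacency, s).values()) for s in adjacency
    )

# Hypothetical toy graph: a -> b -> c -> d, plus the shortcut a -> c.
adjacency = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}
net_diameter = diameter(adjacency)
print(net_diameter)
```

The shortcut from "a" to "c" means no pair is more than two hops apart, even though the longest chain of edges has three.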
11. Clustering edge betweenness
It is used to identify densely interconnected groups of vertices in a network.
It quantifies the importance of edges in connecting different clusters by
computing the number of shortest paths passing through each edge.
12. Degree Distributions
• They are fundamental as they are the simplest vertex-level description of a network.
• We have a heterogeneous distribution, characterized by a heavy tail:
1. Many nodes with small degrees
2. Some nodes with medium degrees
3. Few nodes with large degrees
14. Degree Distributions estimates
• Does the data follow a power-law distribution? Plot the data on a log-log scale
• If the degrees follow a power law, we expect to observe a straight line on the
log-log scale, which can be fitted with a linear model
• Few nodes have a very high degree
• Majority of the nodes have relatively low degrees
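The log-log check can be sketched as an ordinary least-squares fit of log frequency on log degree. The degree sequence below is synthetic and constructed to follow a power law with exponent 2, so it is not the retweet data; a simple fit like this is only a rough diagnostic, not a rigorous power-law test.

```python
import math
from collections import Counter

def loglog_fit(degrees):
    """Least-squares line through (log degree, log frequency); returns (slope, intercept)."""
    counts = Counter(d for d in degrees if d > 0)   # zero degrees have no logarithm
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic degree sequence: the number of nodes with degree k is 64 * k**-2.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope, intercept = loglog_fit(degrees)
print(round(slope, 3), round(math.exp(intercept), 3))
```

A slope close to a negative constant with points near the fitted line is what the straight-line expectation refers to; strong curvature would argue against a power law.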
21. Network sampling: estimating the average degree
• Estimate properties of large
networks by examining a subset of
nodes or edges
• Sampling helps infer the average
number of connections per node
• Ensure that the sampling process is
representative to obtain accurate
estimates.
• Various sampling techniques, such
as random, snowball, or stratified
sampling, can be employed to
capture the network's structure
effectively.
Repeat this many times…
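A minimal sketch of the repeated-sampling idea, assuming simple random sampling of nodes without replacement. The degree table is a hypothetical heavy-tailed example (many small degrees, a few large ones), and the sample size, number of repetitions, and seed are arbitrary illustrative choices.

```python
import random
import statistics

def sample_average_degrees(degree, sample_size, repetitions, seed=0):
    """Draw `repetitions` random node samples and return each sample's mean degree."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    nodes = list(degree)
    estimates = []
    for _ in range(repetitions):
        sample = rng.sample(nodes, sample_size)   # nodes drawn without replacement
        estimates.append(sum(degree[v] for v in sample) / sample_size)
    return estimates

# Hypothetical network of 100 nodes: 90 with degree 1, 9 with degree 10, 1 with degree 100.
degree = {f"n{i}": d for i, d in enumerate([1] * 90 + [10] * 9 + [100])}
true_mean = sum(degree.values()) / len(degree)

estimates = sample_average_degrees(degree, sample_size=20, repetitions=2000)
print(true_mean, statistics.mean(estimates), statistics.stdev(estimates))
```

Individual samples vary widely because a single high-degree node dominates the total, while the mean over many repetitions settles near the true average; this spread is the reason for repeating the sampling.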