Hashtag Conversations, Eventgraphs, and User Ego Neighborhoods: Extracting Social Network Data from Twitter

Shalin Hai-Jew
Kansas State University
2014 National Extension Technology Conference
May 2014

Presentation Overview
• This presentation introduces methods for extracting and analyzing social network
data from Twitter, using NodeXL, for hashtag conversations (and emergent events),
event graphs, search networks, and user ego neighborhoods. It includes direct
demonstrations and discussion of how to analyze social network graphs. The
extracted information may be extended with human- and/or machine-based
sentiment analysis.

Self-Intros
• Do you use Twitter? If so, how?
• Who do you follow on Twitter, and why?
• Have you analyzed your own social networks on Twitter? What’s the
company you keep (online)?
• Have you ever created a hashtag for a formal conference event?
Were you able to gain some insights about what your participants
were experiencing during the conference?
• What would you like to learn in this session?
* My goal for you is to learn capability (what is fairly easily possible), not
method… Method is for another day, another time.

Twitter Social Networking and Microblogging
Social Media Platform
• 140-character text-based Tweets
• Images (Twitpics) and videos (Vine)
• Accounts as humans, ‘bots (collecting and re-tweeting information,
sensor networks), and cyborgs (humans and ‘bots co-Tweeting)
• Created in 2006 and based out of San Francisco, California
• 500 million registered users in 2012
• 340 million Tweets a day as the “SMS of the Internet”
• Has attracted a range of public, private, and governmental
organizations; groups (religious, political, advocacy, and others);
individuals
• Has an application programming interface (API) which enables some
limited access to its public data

Electronic Social Network Analysis
• Extraction of social network data from social media platforms
(through their APIs): social networking sites, email systems, wikis,
blogs, microblogging sites, web networks, and others
• Node-link, vertex-edge, entity-relationship
• A form of structure mining with implications for
• Organizational analysis
• Entity (node) analysis
• Social ties
• Understandings of social structure and power
• Diffusion of innovation, information, culture, attitudes, and other
transmissible resources
• Electronic event analysis

Some Basics of E-SNA (cont.)
• Core-periphery dynamic and influence (and power) / “primary” and
“secondary” membership in the network
• Knowledge and influence
• Collection of resources
• Clustering
• Motif censuses, network structures, network topologies, geodesic
distance, connectivity
• Bridging
• Network structure, network topology
• Thick ties / tight coupling in electronic social spaces
• Thin ties / loose coupling in electronic social spaces
• Homophily vs. heterophily
• The company you keep

Some Basics of E-SNA (cont.)
Global Social Network Structures
• Betweenness centrality (how often a node lies on the shortest paths between other nodes)
• Closeness centrality (closeness of a node to all other nodes in the network graph)
• Eigenvector centrality (connectedness to important neighbors)
• Clustering coefficient (the amount of clustering in a network)
Local Social Network Structures
• Degree centrality (in-degree and out-degree)
• Clustering coefficient (embeddedness)
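
A few of the measures above can be computed by hand on a tiny graph. The following is a minimal sketch, assuming a hypothetical set of directed "mentions" edges between made-up accounts. The closeness normalization shown is one common choice and may not match the formula a given tool (such as NodeXL) applies.

```python
from collections import deque

# Hypothetical directed "mentions" edges: (source account, target account).
EDGES = [
    ("ann", "bob"), ("bob", "ann"), ("cat", "ann"),
    ("dan", "ann"), ("cat", "bob"), ("eve", "dan"),
]

def build_adjacency(edges):
    """Return (out_neighbors, in_neighbors, undirected_neighbors) dicts of sets."""
    out_n, in_n, und = {}, {}, {}
    for src, dst in edges:
        for table in (out_n, in_n, und):
            table.setdefault(src, set())
            table.setdefault(dst, set())
        out_n[src].add(dst)
        in_n[dst].add(src)
        und[src].add(dst)
        und[dst].add(src)
    return out_n, in_n, und

def closeness(und, start):
    """(Number of reachable nodes) / (sum of shortest-path distances), undirected."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in und[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def clustering_coefficient(und, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    nbrs = sorted(und[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in und[nbrs[i]])
    return links / (k * (k - 1) / 2)

out_n, in_n, und = build_adjacency(EDGES)
in_degree = {n: len(in_n[n]) for n in und}
out_degree = {n: len(out_n[n]) for n in und}

for name in sorted(und):
    print(name, in_degree[name], out_degree[name],
          round(closeness(und, name), 3), round(clustering_coefficient(und, name), 3))
```

In this toy graph, "ann" has the highest in-degree (most mentioned), while "bob" sits in a fully closed triangle and so has the highest clustering coefficient.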

Units of Analysis
• Entity: Node or vertex
• Relationships: Links, edges
• Dyads, triads, … motifs (different relational structures)
• Clusters and sub-clusters (groups or meta-nodes)
• Islands
• Pendants (one node, one link); whiskers (one link, multiple nodes)
• Isolates
• Ego neighborhoods
• Social network
• Multiple social networks
• “Big data” universes
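
Several of these units can be picked out mechanically from an edge list. The sketch below assumes a hypothetical undirected network of seven made-up accounts and uses a simple degree rule: an isolate has no links and a pendant has exactly one. Under that rule, both ends of a stand-alone dyad also count as pendants.

```python
# Hypothetical accounts and undirected ties between them.
ACCOUNTS = ["a", "b", "c", "d", "e", "f", "g"]
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]

def neighbors(accounts, edges):
    """Map each account to the set of accounts it is tied to."""
    nbrs = {acct: set() for acct in accounts}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs

def connected_components(nbrs):
    """Return the separate "islands" of the network as a list of sets."""
    seen, components = set(), []
    for start in nbrs:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(nbrs[node] - members)
        seen |= members
        components.append(members)
    return components

nbrs = neighbors(ACCOUNTS, EDGES)
isolates = sorted(n for n, adj in nbrs.items() if not adj)       # no links at all
pendants = sorted(n for n, adj in nbrs.items() if len(adj) == 1)  # exactly one link
components = connected_components(nbrs)

print("isolates:", isolates)
print("pendants:", pendants)
print("components:", [sorted(c) for c in components])
```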

Why Learn about Electronic Social Networks?
• Understand respective roles in the community
• Identify informally influential individuals who are otherwise hidden
• Monitor what messages are moving through the network to
understand public sentiment and understandings
• Plan diffusion of prosocial information and actions; head off negative
diffusions in a social network
• Wire new networks for social and individual resilience (such as
in health, emotion, economics, and other areas)
• Rewire social networks for different objectives and aims; optimize
social groups based on what is known about people’s socializing and
preferences

E-SNA on Twitter…
• Hashtag conversations (#)
• Event graphs (unfolding formal and informal events by hashtags and
key words)
• Search networks
• Understanding user (account) social networks
• Ego neighborhoods on Twitter (direct alters)
• Clusters and sub-clusters; islands; pendants; isolates
• Motif censuses
• Egos

Questions so Far?
• What do you think about (electronic) social network analysis (and
structure mining)? Do you think that the assumptions are valid?
Why or why not?
• What do you think about electronic social network analysis?

Hashtag Conversations
• Narrow-casting (to a distinct small group) and broad-casting
(communicating broadly to any who care to follow)
• Identifying the messages shared
• Sentiments
• Semantics
• Main conversationalists
• Calls to action
• Identifying the networks of accounts in connection to each other
around this discussion
• Observing the interactions between accounts (nodes or vertices)
around the particular discussion
• Identifying the “mayor of your hashtag” (using Dr. Marc A. Smith’s
phrasing) or the influential discussants and their important (central,
widely followed, re-tweeted) messaging

Eventgraphs
• Mapped networks of interactions based around a physical, virtual,
or other event
• Formal, informal, or semi-formal
• Planned or unplanned events
• Conferences with disambiguated or original hashtags; may include online or augmented
reality games to increase participation (planned)
• Accidents, mass health events, or unusual “spectacle” occurrences (unplanned)
• Micro (local or distributed) or mass (locationally clustered or distributed)
• Trending microblogging messaging over time (exponential messaging
to peaks or multiple peaks and gradual diminishment or steep drop-off)
• Multimedial with microblogged text, images, and video; interactive;
dynamic
• Identification of the main geographical locations of the discussants

Search (Social) Networks (Online)
• Identification of
• particular topics in discussion (the less ambiguous the term, the better;
otherwise, the tools will track a broad range of terms with various word senses)
• discussants (social media platform accounts)
• main messaging of the discussants (Tweet or microblogging streams)
• main physical locations of the discussants (based on noisy geo information)

User Social Networks
• Node / vertex / entity / agent analysis
• Link / edge / arc / tie / relationship analysis
• Identification of the alters in the ego neighborhood
• Analysis of transitivity among the alters in the ego neighborhood
• Capture of a 2-degree social network on Twitter
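
The ideas of alters, a 2-degree capture, and ties among alters can be illustrated on a toy network. This is a minimal sketch, assuming hypothetical undirected ties around an account called "ego". The share of alter pairs that are tied to each other is used here as a simple stand-in for transitivity among the alters.

```python
from itertools import combinations

# Hypothetical undirected ties between accounts.
TIES = [
    ("ego", "a"), ("ego", "b"), ("ego", "c"),
    ("a", "b"), ("c", "x"), ("x", "y"),
]

def adjacency(ties):
    """Map each account to the set of accounts it is tied to."""
    adj = {}
    for u, v in ties:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def within_degrees(adj, ego, degrees):
    """Accounts reachable from ego in at most `degrees` steps (excluding ego)."""
    frontier, reached = {ego}, {ego}
    for _ in range(degrees):
        frontier = {n for node in frontier for n in adj[node]} - reached
        reached |= frontier
    return reached - {ego}

def alter_tie_density(adj, ego):
    """Fraction of pairs of ego's direct alters that are tied to each other."""
    alters = sorted(adj[ego])
    pairs = list(combinations(alters, 2))
    if not pairs:
        return 0.0
    tied = sum(1 for u, v in pairs if v in adj[u])
    return tied / len(pairs)

adj = adjacency(TIES)
alters = within_degrees(adj, "ego", 1)       # the ego neighborhood (direct alters)
two_degree = within_degrees(adj, "ego", 2)   # alters plus the alters' alters
density = alter_tie_density(adj, "ego")

print(sorted(alters), sorted(two_degree), round(density, 3))
```

Account "y" is three steps away, so it falls outside the 2-degree capture.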

Motif Censuses
• Understanding of the global nature of the network
• The power structures within the network
• The clusters, sub-clusters, islands, pendants, and isolates
• The social individuals and entities within the network
• The transmissibles moving through the network
• Static (vs. dynamic) information captures

The Data Extraction and Network
Visualization Tool: NodeXL
Network Overview, Discovery and Exploration for Excel

Network Overview, Discovery and Exploration
for Excel (NodeXL)
• NodeXL
• Free and open-source code
• Data scraping from social media platforms through their respective APIs (of
publicly available information only)
• Add-on to Excel (formerly known as NetMap)
• Available on the Microsoft CodePlex platform
• Requires Windows (or Parallels on a Mac)
• Sponsored by the Social Media Research Foundation
• NodeXL Graph Gallery for shared graphs and datasets

Types of Data Extractions from Twitter
NodeXL (relations, structure, select
contents)
• #hashtag
• Search
• Twitter “List Network”
• Twitter User Network
NCapture of NVivo (semantics,
message contents)
• Twitter User Tweets
• Twitter List Tweets

Input Parameters
• Size of the crawl
• Degree of the crawl
• Image capture
• Tweet capture
• Direction (followed by / following / both)
• Edge definition: Followed / following; replies-to; mentions
• Tweet column
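
To make the edge definitions concrete, the sketch below derives "replies-to" and "mentions" edges from tweet text. It is an illustration under stated assumptions, not how NodeXL itself works: the tweets, account names, and hashtag are hypothetical, a tweet that begins with an @name is treated as a reply to that account, and a tweet with no @name becomes a self-loop.

```python
import re

MENTION = re.compile(r"@(\w+)")

# Hypothetical captured rows: (author, tweet text).
TWEETS = [
    ("ann", "@bob great session at #netc14"),
    ("bob", "Thanks to @ann and @cat for the demo #netc14"),
    ("cat", "Slides are posted #netc14"),
]

def edges_from_tweets(tweets):
    """Return (source, target, relationship) edges; plain tweets become self-loops."""
    edges = []
    for author, text in tweets:
        names = [m.lower() for m in MENTION.findall(text)]
        if not names:
            # Modeling choice in this sketch: keep the author in the graph.
            edges.append((author, author, "tweet"))
            continue
        for position, name in enumerate(names):
            # Assumption: a leading @name marks a reply to that account.
            is_reply = position == 0 and text.lstrip().startswith("@")
            edges.append((author, name, "replies-to" if is_reply else "mentions"))
    return edges

edges = edges_from_tweets(TWEETS)
for edge in edges:
    print(edge)
```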

Data Processing: Graph Metrics
• Degree, in-degree, out-degree
• Betweenness and closeness centralities
• Eigenvector centrality
• Vertex clustering coefficient
• Vertex PageRank
• Edge reciprocation
• Words and word pairs
• Twitter search network top items
• …and others
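
Two of the less familiar metrics in this list, edge reciprocation and word pairs, are simple to compute. This is a minimal sketch on hypothetical edges and tweet texts. The reciprocation ratio here is the share of directed edges that have a matching edge in the opposite direction, which may differ in detail from the tool's own definition.

```python
from collections import Counter

# Hypothetical directed edges and tweet texts.
EDGES = [("ann", "bob"), ("bob", "ann"), ("cat", "ann"), ("dan", "cat")]
TWEETS = [
    "social network analysis",
    "network analysis with nodexl",
    "social network graphs",
]

def reciprocated_edge_ratio(edges):
    """Share of distinct directed edges that are returned in the other direction."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for src, dst in edge_set if (dst, src) in edge_set)
    return mutual / len(edge_set)

def word_and_pair_counts(texts):
    """Count single words and adjacent word pairs across all texts."""
    words, pairs = Counter(), Counter()
    for text in texts:
        tokens = text.lower().split()
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    return words, pairs

ratio = reciprocated_edge_ratio(EDGES)
words, pairs = word_and_pair_counts(TWEETS)

print("reciprocated edge ratio:", ratio)
print("top words:", words.most_common(3))
print("top word pairs:", pairs.most_common(2))
```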

Data Processing: Grouping
• Group by vertex attribute
• Group by connected component
• Group by cluster
• Group by motif

Data Visualization
• Type of layout algorithm applied to the data
• Autofill
• Labeling of vertices
• Labeling of edges
• Graph pane
• Graph options
• Zoom
• Scale

Dynamic Filtering
• Adjust parameters (with the sliders) to limit what is visualized
• Change up the time zones to analyze what is being communicated, and by
whom, at which time (UTC / Coordinated Universal Time)
• Capture broadly and then focus in using dynamic filtering
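
The "capture broadly, then focus in" approach amounts to filtering an already captured edge list. The sketch below assumes hypothetical edges that each carry the UTC timestamp of the underlying tweet. The dates and accounts are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical edges with a UTC timestamp for the underlying tweet.
EDGES = [
    ("ann", "bob", datetime(2014, 5, 20, 14, 5, tzinfo=timezone.utc)),
    ("bob", "cat", datetime(2014, 5, 20, 15, 30, tzinfo=timezone.utc)),
    ("cat", "dan", datetime(2014, 5, 20, 21, 45, tzinfo=timezone.utc)),
]

def filter_by_window(edges, start, end):
    """Keep edges whose timestamp falls in the half-open window [start, end)."""
    return [(src, dst, when) for src, dst, when in edges if start <= when < end]

window_start = datetime(2014, 5, 20, 14, 0, tzinfo=timezone.utc)
window_end = datetime(2014, 5, 20, 16, 0, tzinfo=timezone.utc)

in_window = filter_by_window(EDGES, window_start, window_end)
active_accounts = sorted({name for src, dst, _ in in_window for name in (src, dst)})

print(len(in_window), "edges in window; active accounts:", active_accounts)
```

Keeping every timestamp in UTC avoids ambiguity when discussants are spread across time zones.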

Data Analysis
• Use both the dataset and the visualizations (they complement
each other and are both necessary for full understanding)
• Capture the Tweets column and import that into a text analysis
software program

Limits -> Controlling for Input Parameters for
the Data Extraction
• Social media platform (Twitter and its data processing rate limits), even
with an account for “whitelisting” (and the time-of-day of the data
extraction through its data-streaming API)
• NodeXL (up to about 300,000 records)
• Computational power of researcher machine
• Computer memory of researcher machine
• No early indicator of size of data crawl or the acquirability of the
electronic social network
• Costly (computational and time expense) non-captures at system limits

Addendum
• May apply Boolean operators in the query (and query multiple
terms simultaneously)
• May use macros
• May re-crawl using original parameters of a data extraction
• May automate data extractions

Some Sample
Graph Visualizations
From NodeXL Extractions from Twitter
Note: Other details have been excluded because these visualizations
are incomplete without the graph metrics and other complementary
data…and it would be misrepresentational to explain the contexts of
the data crawl behind the social network graphs incompletely. All of
these graphs may be found in fuller detail and some with downloadable
data sets on the NodeXL Graph Gallery. At the graph gallery, put “SHJ”
in the Search bar at the top right.

Grid

Circle Layout (Ring Lattice Graph)

Harel-Koren Fast Multiscale with Vertex
Labels

Random Layout Algorithm, Images at the
Vertices

Sugiyama Layout of Groups, Force-Based
Overall Network Layout

Harel-Koren Fast Multiscale

Horizontal Sine Wave

Motif, Harel-Koren Fast Multiscale

Fruchterman-Reingold Layout, Partitioned

3D Fruchterman-Reingold Force-Based Graph

Circle Layout / Ring Lattice Graph at Group
Level, Force-Based Layout at Network Level

Fruchterman-Reingold Layout, Imagery for
Vertices

Random Layout of Groups, Force-Based
Layout of Network with Combined Edges

Harel-Koren Fast Multiscale Layout at Cluster
Level, Force-Based Layout at Network Level

Motifs Extraction (Census), Sugiyama Layout
at Network Level

Harel-Koren Fast Multiscale for Groups,
Force-Based Layout at Network Level

Clustering by Clauset-Newman-Moore, Network
Layout with Harel-Koren Fast Multiscale

Motifs at Group Level, Spiral at Network Level

Random at Group Level, Packed Rectangles
for Network

Harel-Koren Fast Multiscale for Clusters,
Treemap Layout for Network

Horizontal Sine Wave Layout (on beta)

Sugiyama, Stacked Rectangles

Fruchterman-Reingold

Fruchterman-Reingold

Motif, Fruchterman-Reingold, on Grid

Grid, Imagery on Vertices

Multi-Sequence Mixed Visualization

And…

NodeXL Graph Server
• Continuous crawl based on a certain term or account for over a
month
• Academic purposes only
• Must be requested through Dr. Marc A. Smith (Connected Action Consulting
Group @ marc@connectedaction.net)
• Not retroactive crawls (a limitation of Twitter)

NodeXL Beta Layouts
• Treemap
• Packed rectangles
• Force directed

Mixing Up Datasets
Twitter Data Grants
• Feb. 2014
• Twitter Engineering Blog
Other Sources
• Content-sharing sites (with
public APIs)
• YouTube
• Flickr
• Social networking sites (with
public APIs)
• Facebook
• LinkedIn
• Email Networks
• Web networks
• Wiki networks

Semantic (Meaning) Analysis of a
Tweet Stream
Using NCapture (add-in to Google Chrome and MS Internet Explorer browsers) and
NVivo (a qualitative and mixed methods data analysis tool)

(Partial) Twitter Feed Capture using NCapture
of NVivo 10

Word Cloud based on Word Frequency Count
from Twitter Feed (Gist)

Geolocation (Lat / Long) Data of Active Twitter
User Accounts on a Tweet Stream / Feed

Word Similarity Analysis

Word Frequency Treemap
(classical content analysis)

Word Search Word Tree (and Stemming)

Manual Analysis…through Coding,
Categorizing, and Evaluation
• Data reduction
• Summary
• Matrix analysis
• Coding and analysis
Topic | Pro (sentiment) | Con (sentiment)

Human-Machine Analysis
• Network Text Analysis Theory (language modeled as networks of
words and relations)
• Semantic network
• Nodes: concepts or ideas, ideational kernels
• Links: statements, relationships (strength of relationship, directionality such
as agreement / disagreement or positive / negative, type of relation,
sentiment)
• Network: semantic map, union of all statements
• May be a one-mode network (all nodes of a type)
• Concepts
• May be a multi-modal network (based on ontological coding with
various mixes of node types)
• Persons, places, concepts, sentiments, locations, and others
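
The semantic network described above (concepts as nodes, statements as links, the network as the union of all statements) can be sketched as a one-mode co-occurrence network. The statements and concept terms below are hypothetical, and linking every pair of concepts that share a statement is a simplifying assumption, not the procedure of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical statements, already reduced to their concept terms.
STATEMENTS = [
    ["drought", "wheat", "yield"],
    ["drought", "irrigation"],
    ["wheat", "yield", "price"],
]

def semantic_network(statements):
    """Nodes are concepts; a link's weight is the number of statements two concepts share."""
    links = Counter()
    for concepts in statements:
        for pair in combinations(sorted(set(concepts)), 2):
            links[pair] += 1
    nodes = sorted({concept for pair in links for concept in pair})
    return nodes, links

nodes, links = semantic_network(STATEMENTS)

print("nodes:", nodes)
for (left, right), weight in sorted(links.items()):
    print(left, "--", right, "weight", weight)
```

A multi-modal version would attach a type (person, place, concept, sentiment) to each node before linking.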

Human-Machine Analysis (cont.)
• Meta-network analysis based on a text corpus / merged text
corpuses
• Drawn from unstructured natural language text data
• Identification of users (account holders on Twitter) and their
interrelationships with others based on messaging and re-Tweeting and
following / not following
• May use Carnegie Mellon University’s freeware text-mining tool
AutoMap 3.0.10.18 on Windows (by Center for Computational
Analysis of Social and Organizational Systems, CASOS) (2001 –
present)
• Graph visualizations in 2D and 3D made in ORA-NetScenes (CASOS)

• AutoMap…requires data pre-processing (setting parameters)
• Requires text corpuses as .txt files (transcoding from .doc, .docx, .HTML, or
other)
• May combine multiple text sets (through merging); can then query on the
whole set or on the individual text sets
• May create “stop words” (or “delete”) lists to de-noise data (with “stop
words” like relative pronouns, personal pronouns, articles, conjunctions, and
other words with less semantic meaning, etc.)
• May use universal or domain-specific “thesauruses” to define, filter, and
hone the meta-network extractions
• Enables the defining of sentiment
• Requires testing of a sample set and meta network visualization to ensure
appropriateness of the data refinements
• Involves the design of meta-networks and ontologies from the text corpuses
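
The pre-processing steps above (a delete list to de-noise the text, then a thesaurus to merge variants into one concept) can be illustrated in a few lines. This is a minimal sketch, not AutoMap's implementation: the delete list, thesaurus entries, and sample texts are hypothetical, and applying the delete list before the thesaurus is an assumption about ordering.

```python
import re
from collections import Counter

# Hypothetical delete list and thesaurus; a real project would curate both.
DELETE_LIST = {"the", "a", "an", "and", "of", "to", "who", "we", "is"}
THESAURUS = {
    "tweets": "tweet",
    "tweeting": "tweet",
    "k-state": "kansas_state_university",
}

def preprocess(text, delete_list, thesaurus):
    """Lowercase, tokenize, drop delete-list words, then map variants to one concept."""
    tokens = re.findall(r"[a-z0-9#@_'-]+", text.lower())
    kept = [tok for tok in tokens if tok not in delete_list]
    return [thesaurus.get(tok, tok) for tok in kept]

def concept_frequencies(texts, delete_list, thesaurus):
    """Word frequency count over the cleaned concepts."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text, delete_list, thesaurus))
    return counts

SAMPLE = [
    "We are tweeting the conference",
    "Tweets of K-State and the tweet stream",
]
counts = concept_frequencies(SAMPLE, DELETE_LIST, THESAURUS)
print(counts.most_common())
```

Running a small sample like this and inspecting the counts is one way to check that the refinements behave as intended before processing a full corpus.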

• …requires data processing and data visualization
• May run the textual data processing
• Includes a web scraper for the main social media platforms in its ScriptRunner
feature
• …requires data post-processing
• Includes accessing AutoMap data from ORA-NetScenes to create network
visualizations
• Includes data “mining” for meaning / sense-making (identification of
patterns)
• Includes data visualization analysis
• Note: The work may require re-running this cycle multiple times for
different data queries.

Sampler: Wordle™ Word Cloud to Create an
Emergent Thesaurus

Sampler: Excerpt from a Year’s Worth of a
Blog’s Text Corpus

Sampler: @kstate_pres Tweets Visualization

Demos?
• Would you like to see how to set up a simple data crawl from Twitter
using NodeXL? (Note: Twitter rate limiting may mean that a
completed data extraction may not be achieved, but you can at least
see what a basic setup may look like.)
• Any questions?

Conclusion and Contact
• Dr. Shalin Hai-Jew
• Instructional Designer
• Information Technology Assistance Center
• Kansas State University
• 212 Hale Library
• 785-532-5262
• shalin@k-state.edu
• Thanks to Dr. Marc A. Smith, sociologist and Chief Social Scientist for
Connected Action, for generously presenting a webinar at K-State to
our faculty and staff. Also, Tony Capone, NodeXL developer, made
the NodeXL beta available to me and has been very gracious and
encouraging.
