1. Hashtag Conversations,
Eventgraphs,
and User Ego Neighborhoods:
Extracting Social Network Data
from Twitter
Shalin Hai-Jew
Kansas State University
2014 National Extension Technology Conference
May 2014
2. Presentation Overview
• This presentation introduces methods for extracting and analyzing
social network data from Twitter (using NodeXL) for hashtag
conversations (and emergent events), event graphs, search networks,
and user ego neighborhoods. It includes direct demonstrations and
discussions of how to analyze social network graphs. This information
may be extended with human- and / or machine-based sentiment analysis.
Hashtag Conversations, Eventgraphs, and User Ego
Neighborhoods: Extracting Social Network Data from Twitter
2
3. Self-Intros
• Do you use Twitter? If so, how?
• Who do you follow on Twitter, and why?
• Have you analyzed your own social networks on Twitter? What’s the
company you keep (online)?
• Have you ever created a hashtag for a formal conference event?
Were you able to gain some insights about what your participants
were experiencing during the conference?
• What would you like to learn in this session?
* My goal is for you to learn capability (what is fairly easily
possible), not method. Method is for another day, another time.
4. Twitter Social Networking and Microblogging
Social Media Platform
• 140-character text-based Tweets
• Images (Twitpics) and videos (Vine)
• Accounts run by humans, ‘bots (collecting and re-tweeting information,
sensor networks), and cyborgs (humans and ‘bots co-Tweeting)
• Created in 2006 and based out of San Francisco, California
• 500 million registered users in 2012
• 340 million Tweets a day as the “SMS of the Internet”
• Has attracted a range of public, private, and governmental
organizations; groups (religious, political, advocacy, and others);
and individuals
• Has an application programming interface (API) that enables some
limited access to its public data
5. Electronic Social Network Analysis
• Extraction of social network data from social media platforms
(through their APIs): social networking sites, email systems, wikis,
blogs, microblogging sites, web networks, and others
• Node-link, vertex-edge, entity-relationship
• A form of structure mining with implications for
• Organizational analysis
• Entity (node) analysis
• Social ties
• Understandings of social structure and power
• Diffusion of innovation, information, culture, attitudes, and other
transmissible resources
• Electronic event analysis
7. Some Basics of E-SNA (cont.)
• Core-periphery dynamic and influence (and power) / “primary” and
“secondary” membership in the network
• Knowledge and influence
• Collection of resources
• Clustering
• Motif censuses, network structures, network topologies, geodesic
distance, connectivity
• Bridging
• Network structure, network topology
• Thick ties / tight coupling in electronic social spaces
• Thin ties / loose coupling in electronic social spaces
• Homophily vs. heterophily
• The company you keep
8. Some Basics of E-SNA (cont.)
Global Social Network Structures
• Betweenness centrality (shortest-path betweenness: how often a node
lies on the shortest paths between other pairs of nodes)
• Closeness centrality (how close a node is, in geodesic distance, to
all other nodes in the network graph)
• Eigenvector centrality (connectedness to important, well-connected
neighbors)
• Clustering coefficient (the amount of clustering in a network)
Local Social Network Structures
• Degree centrality (in-degree and out-degree)
• Clustering coefficient (embeddedness)
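To make the local measures concrete, here is a minimal sketch in Python (standard library only) that computes in-degree and out-degree, a vertex clustering coefficient, and a simple closeness score on a small, hypothetical "follows" graph. The account names, the function names, and the closeness variant (reachable nodes divided by the sum of geodesic distances) are assumptions for illustration; NodeXL computes its own versions of these metrics, and its normalization may differ.

```python
from collections import defaultdict, deque

# Hypothetical directed "follows" edges (source follows target); illustrative only.
EDGES = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "cai"),
    ("cai", "ben"), ("dee", "ana"), ("dee", "cai"),
]

def degrees(edges):
    """Return {vertex: (in_degree, out_degree)} for a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return {v: (indeg[v], outdeg[v]) for v in set(indeg) | set(outdeg)}

def undirected_neighbors(edges):
    """Collapse directed edges into an undirected adjacency map (self-loops dropped)."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    return nbrs

def clustering_coefficient(nbrs, v):
    """Share of the pairs of v's neighbors that are themselves linked."""
    k = len(nbrs[v])
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs[v] for b in nbrs[v] if a < b and b in nbrs[a])
    return links / (k * (k - 1) / 2)

def closeness(nbrs, v):
    """(Number of reachable nodes) / (sum of BFS geodesic distances from v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in nbrs[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

if __name__ == "__main__":
    nbrs = undirected_neighbors(EDGES)
    print(degrees(EDGES)["dee"])                # (0, 2): dee follows two, nobody follows dee
    print(clustering_coefficient(nbrs, "ben"))  # 1.0: ben's two neighbors are linked
    print(closeness(nbrs, "dee"))               # 0.75
```

Collapsing the directed follows into undirected ties before computing clustering and closeness is a design choice for this sketch; directed variants of both measures also exist.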
9. Units of Analysis
• Entity: Node or vertex
• Relationships: Links, edges
• Dyads, triads, … motifs (different relational structures)
• Clusters and sub-clusters (groups or meta-nodes)
• Islands
• Pendants (a node attached by a single link); whiskers (a group of
nodes attached to the rest of the network by a single link)
• Isolates
• Ego neighborhoods
• Social network
• Multiple social networks
• “Big data” universes
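Several of these units of analysis can be read directly off an adjacency map. The sketch below, assuming an undirected graph and hypothetical single-letter account names, lists isolates (no links), pendants (exactly one link), and islands (connected components, largest first). It is an illustration of the definitions, not NodeXL's grouping code.

```python
def build_adjacency(vertices, edges):
    """Undirected adjacency map that keeps vertices with no edges (isolates)."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def isolates(adj):
    """Vertices with no links at all."""
    return sorted(v for v, nbrs in adj.items() if not nbrs)

def pendants(adj):
    """Vertices attached to the graph by exactly one link."""
    return sorted(v for v, nbrs in adj.items() if len(nbrs) == 1)

def components(adj):
    """Connected components ("islands"), largest first, via iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)

# Hypothetical accounts: a triangle with a pendant (d), a separate dyad, and an isolate.
ADJ = build_adjacency("abcdefg", [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")])

if __name__ == "__main__":
    print(isolates(ADJ))    # ['g']
    print(pendants(ADJ))    # ['d', 'e', 'f']
    print(components(ADJ))  # three islands, sizes 4, 2, 1
```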
10. Why Learn about Electronic Social Networks?
• Understand respective roles in the community
• Identify informally influential individuals who are otherwise hidden
• Monitor what messages are moving through the network to gauge
public sentiment and understanding
• Plan diffusion of prosocial information and actions; head off negative
diffusions in a social network
• Wire new networks for social and individual resilience (such as in
health, emotional, economic, and other domains)
• Rewire social networks for different objectives and aims; optimize
social groups based on what is known about people’s socializing and
preferences
11. E-SNA on Twitter
• Hashtag conversations (#)
• Event graphs (unfolding formal and informal events by hashtags and
key words)
• Search networks
• Understanding user (account) social networks
• Ego neighborhoods on Twitter (direct alters)
• Clusters and sub-clusters; islands; pendants; isolates
• Motif censuses
• Egos
12. Questions so Far?
• What do you think about (electronic) social network analysis (and
structure mining)? Do you think that the assumptions are valid?
Why or why not?
• What do you think about electronic social network analysis?
13. Hashtag Conversations
• Narrow-casting (to a distinct small group) and broad-casting
(communicating broadly to any who care to follow)
• Identifying the messages shared
• Sentiments
• Semantics
• Main conversationalists
• Calls to action
• Identifying the networks of accounts in connection to each other
around this discussion
• Observing the interactions between accounts (nodes or vertices)
around the particular discussion
• Identifying the “mayor of your hashtag” (using Dr. Marc A. Smith’s
phrasing) or the influential discussants and their important (central,
widely followed, re-tweeted) messaging
14. Eventgraphs
• Mapped networks of interactions around a physical, virtual, or other
event
• Formal, informal, or semi-formal
• Planned or unplanned events
• Conferences with disambiguated or original hashtags; may include online or augmented
reality games to increase participation (planned)
• Accidents, mass health events, or unusual “spectacle” occurrences (unplanned)
• Micro (local or distributed) or mass (locationally clustered or distributed)
• Trending microblogged messaging over time (exponential growth to one
or multiple peaks, followed by gradual diminishment or a steep drop-off)
• Multimedial with microblogged text, images, and video; interactive;
dynamic
• Identification of the main geographical locations of the discussants
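The rise-and-fall pattern of an eventgraph can be checked by binning Tweet timestamps by hour and looking for local peaks. This is a minimal sketch, assuming timestamps are ISO 8601 strings that carry a UTC offset; the sample times are hypothetical, and "a peak is an hour busier than both neighboring hours" is a deliberately simple rule (plateaus are not counted as peaks).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def hourly_volume(timestamps):
    """Tweets per UTC hour, with empty hours filled in as zero.

    Assumes ISO 8601 strings with a UTC offset, e.g. "2014-05-20T09:05:00+00:00".
    """
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts).astimezone(timezone.utc)
        counts[dt.replace(minute=0, second=0, microsecond=0)] += 1
    if not counts:
        return []
    hour, last = min(counts), max(counts)
    series = []
    while hour <= last:
        series.append((hour, counts[hour]))
        hour += timedelta(hours=1)
    return series

def local_peaks(series):
    """Hours whose volume is higher than both neighboring hours."""
    peaks = []
    for i, (hour, n) in enumerate(series):
        left = series[i - 1][1] if i > 0 else 0
        right = series[i + 1][1] if i + 1 < len(series) else 0
        if n > left and n > right:
            peaks.append((hour, n))
    return peaks

# Hypothetical Tweet times around an event.
TIMESTAMPS = [
    "2014-05-20T09:05:00+00:00", "2014-05-20T09:40:00+00:00",
    "2014-05-20T10:10:00+00:00", "2014-05-20T10:20:00+00:00",
    "2014-05-20T10:50:00+00:00",
    "2014-05-20T06:30:00-05:00",  # 11:30 UTC
    "2014-05-20T13:15:00+00:00", "2014-05-20T13:45:00+00:00",
]

if __name__ == "__main__":
    for hour, n in local_peaks(hourly_volume(TIMESTAMPS)):
        print(hour.isoformat(), n)  # peaks at 10:00 UTC (3) and 13:00 UTC (2)
```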
15. Search (Social) Networks (Online)
• Identification of
• particular topics in discussion (the less ambiguous the term, the
better; otherwise, the tools will track a broad range of terms with
various word senses)
• discussants (social media platform
accounts)
• main messaging of the discussants
(Tweet or microblogging streams)
• main physical locations of the discussants
(based on noisy geo information)
16. User Social Networks
• Node / vertex / entity / agent analysis
• Link / edge / arc / tie / relationship analysis
• Identification of the alters in the ego neighborhood
• Analysis of transitivity among the alters in the ego neighborhood
• Capture of a 2-degree social network on Twitter
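An ego neighborhood is a breadth-first expansion from one account, and transitivity among alters can be summarized as the share of possible alter-to-alter ties that actually exist. The sketch below assumes a symmetric (undirected) adjacency map with hypothetical names. With radius=1 it keeps the alters and the ties among them, roughly what NodeXL offers as a 1.5-level network; radius=2 approximates a 2-degree crawl.

```python
def ego_network(adj, ego, radius=1):
    """Vertices within `radius` hops of `ego`, plus every edge among them.

    With radius=1 this keeps the alters and the ties between alters.
    """
    members, frontier = {ego}, {ego}
    for _ in range(radius):
        frontier = {w for v in frontier for w in adj.get(v, ())} - members
        members |= frontier
    edges = {frozenset((a, b)) for a in members for b in adj.get(a, ()) if b in members}
    return members, edges

def alter_density(adj, ego):
    """Share of possible alter-to-alter ties that exist in ego's 1-hop neighborhood."""
    alters = set(adj.get(ego, ())) - {ego}
    k = len(alters)
    if k < 2:
        return 0.0
    ties = sum(1 for a in alters for b in adj.get(a, ()) if b in alters) / 2
    return ties / (k * (k - 1) / 2)

# Hypothetical symmetric (undirected) adjacency map.
ADJ = {
    "ego": {"a", "b", "c"}, "a": {"ego", "b", "x"}, "b": {"ego", "a"},
    "c": {"ego"}, "x": {"a", "y"}, "y": {"x"},
}

if __name__ == "__main__":
    members, edges = ego_network(ADJ, "ego", radius=1)
    print(sorted(members), len(edges))  # ['a', 'b', 'c', 'ego'] 4
    print(alter_density(ADJ, "ego"))    # one of three possible alter ties exists
```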
17. Motif Censuses
• Understanding of the global nature of the network
• The power structures within the network
• The clusters, sub-clusters, islands, pendants, and isolates
• The social individuals and entities within the network
• The transmissibles moving through the network
• Static (vs. dynamic information captures)
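A motif census counts small recurring structures. As a simplified illustration (not NodeXL's motif grouping, which works with its own motif types), the sketch below classifies every three-account subset of an undirected graph by how many ties it contains, then derives a global clustering value (transitivity) from the closed and open triads. The brute-force loop over all three-node subsets is only practical for small graphs.

```python
from collections import Counter
from itertools import combinations

TRIAD_TYPES = {0: "empty", 1: "one edge", 2: "open triad", 3: "closed triad"}

def triad_census(vertices, edges):
    """Count every 3-vertex subset by how many undirected edges it contains.

    Brute force over all C(n, 3) subsets, so only suitable for small graphs.
    """
    linked = {frozenset(e) for e in edges if e[0] != e[1]}
    census = Counter()
    for trio in combinations(sorted(vertices), 3):
        n = sum(frozenset(pair) in linked for pair in combinations(trio, 2))
        census[TRIAD_TYPES[n]] += 1
    return census

def transitivity(census):
    """Global clustering: closed connected triples / all connected triples.

    A closed triad contains three connected triples; an open triad contains one.
    """
    closed, open_ = census["closed triad"], census["open triad"]
    return 3 * closed / (3 * closed + open_) if closed or open_ else 0.0

# Hypothetical graph: a triangle a-b-c plus a tie c-d.
CENSUS = triad_census("abcd", [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

if __name__ == "__main__":
    print(dict(CENSUS))          # 1 closed triad, 2 open triads, 1 one-edge triad
    print(transitivity(CENSUS))  # 0.6
```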
18. The Data Extraction and Network
Visualization Tool: NodeXL
Network Overview, Discovery and Exploration for Excel
19. Network Overview, Discovery and Exploration
for Excel (NodeXL)
• NodeXL
• Free and open-source code
• Data scraping from social media
platforms through their respective APIs (of
publicly available information only)
• Add-on to Excel (formerly known as
NetMap)
• Available on the Microsoft CodePlex
platform
• Requires Windows (or Parallels on a Mac)
• Sponsored by the Social Media
Research Foundation
• NodeXL Graph Gallery for shared
graphs and datasets
20. Types of Data Extractions from Twitter
NodeXL (relations, structure, select
contents)
• #hashtag
• Search
• Twitter “List Network”
• Twitter User Network
NCapture of NVivo (semantics,
message contents)
• Twitter User Tweets
• Twitter List Tweets
21. Input Parameters
• Size of the crawl
• Degree of the crawl
• Image capture
• Tweet capture
• Direction (followed by / following /
both)
• Edge definition: Followed /
following; replies-to; mentions
• Tweet column
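The "replies-to" and "mentions" edge definitions can be illustrated by pulling @handles out of Tweet text. This is a minimal sketch under a loudly stated assumption: a Tweet whose text begins with an @handle is treated as a reply to that account, and every other @handle is a mention. NodeXL's importer and Twitter's own reply metadata may classify Tweets differently; the sample Tweets are hypothetical.

```python
import re

# A handle: 1-15 word characters after an "@" that is not preceded by a word
# character, so e-mail addresses such as cai@example.org are not matched.
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")

def tweet_edges(tweets):
    """Turn (author, text) pairs into (source, target, relationship) edges.

    Illustrative assumption: a Tweet whose text begins with an @handle is a
    reply to that account; every other @handle in the text is a mention.
    """
    edges = []
    for author, text in tweets:
        leading = MENTION.match(text.lstrip())
        for i, handle in enumerate(MENTION.findall(text)):
            kind = "replies-to" if i == 0 and leading else "mentions"
            edges.append((author.lower(), handle.lower(), kind))
    return edges

# Hypothetical Tweets.
TWEETS = [
    ("Ana", "@ben great talk! #sna cc @Cai"),
    ("Ben", "Slides from @ana are up"),
    ("Cai", "Questions? Email cai@example.org"),
]

if __name__ == "__main__":
    for edge in tweet_edges(TWEETS):
        print(edge)
```

Lowercasing handles merges "@Cai" and "@cai" into one vertex, since Twitter handles are not case-sensitive.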
22. Data Processing: Graph Metrics
• Degree, in-degree, out-degree
• Betweenness and closeness
centralities
• Eigenvector centrality
• Vertex clustering coefficient
• Vertex PageRank
• Edge reciprocation
• Words and word pairs
• Twitter search network top items
• …and others
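NodeXL calculates these graph metrics for you; the sketch below only shows what three of them mean. It implements Brandes' algorithm for shortest-path betweenness on an unweighted, undirected graph, a power-iteration PageRank (damping 0.85, with rank from accounts that have no out-links spread evenly), and the share of reciprocated directed edges. All names and sample graphs are hypothetical, and NodeXL's normalizations may differ from these.

```python
from collections import deque

def betweenness(adj):
    """Brandes' shortest-path betweenness for an unweighted, undirected graph.

    `adj` must be symmetric and list every vertex as a key.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)   # geodesic distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2 for v, score in bc.items()}  # each pair was counted twice

def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank; accounts with no out-links spread rank evenly."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for v, targets in out_links.items():
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank

def reciprocated_share(edges):
    """Share of distinct directed edges whose reverse edge also exists."""
    directed = {(a, b) for a, b in edges if a != b}
    mutual = sum(1 for a, b in directed if (b, a) in directed)
    return mutual / len(directed) if directed else 0.0

if __name__ == "__main__":
    star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
    print(betweenness(star)["hub"])  # 3.0: hub sits on all three leaf-to-leaf paths
    pr = pagerank({"a": ["hub"], "b": ["hub"], "c": ["hub"]})
    print(max(pr, key=pr.get))       # hub
```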
23. Data Processing: Grouping
• Group by vertex attribute
• Group by connected component
• Group by cluster
• Group by motif
24. Data Visualization
• Type of layout algorithm applied to the data
• Autofill
• Labeling of vertices
• Labeling of edges
• Graph pane
• Graph options
• Zoom
• Scale
25. Dynamic Filtering
• Adjust parameters
(with the sliders) to
limit what is visualized
• Change the time zones
to analyze what is being
communicated, by whom,
and at which time
(UTC / Coordinated
Universal Time)
• Capture broadly, and
then focus in using
dynamic filtering
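Dynamic filtering is done with sliders inside NodeXL, but the underlying idea, "capture broadly, then narrow down," can be sketched outside it. The two hypothetical helpers below keep only Tweets inside a UTC time window (converting each timestamp's local offset to UTC first) and mimic a slider by keeping vertices whose metric falls in a range. The "created" key and the metrics layout are assumptions for illustration.

```python
from datetime import datetime, timezone

def in_utc_window(tweets, start, end):
    """Keep Tweets whose timestamp falls in [start, end), compared in UTC.

    Each Tweet is a dict with a hypothetical "created" key holding an ISO 8601
    string with a UTC offset; start and end must be timezone-aware datetimes.
    """
    lo, hi = start.astimezone(timezone.utc), end.astimezone(timezone.utc)
    return [t for t in tweets
            if lo <= datetime.fromisoformat(t["created"]).astimezone(timezone.utc) < hi]

def slider_filter(metrics, column, low, high):
    """Mimic a dynamic-filter slider: vertices whose metric lies in [low, high]."""
    return sorted(v for v, row in metrics.items() if low <= row[column] <= high)

# Hypothetical Tweets posted from different time zones.
TWEETS = [
    {"author": "ana", "created": "2014-05-20T14:05:00-05:00"},  # 19:05 UTC
    {"author": "ben", "created": "2014-05-20T19:59:00+00:00"},  # 19:59 UTC
    {"author": "cai", "created": "2014-05-20T20:00:00+00:00"},  # 20:00 UTC, excluded
    {"author": "dee", "created": "2014-05-20T12:00:00-07:00"},  # 19:00 UTC
]
START = datetime(2014, 5, 20, 19, tzinfo=timezone.utc)
END = datetime(2014, 5, 20, 20, tzinfo=timezone.utc)

if __name__ == "__main__":
    print([t["author"] for t in in_utc_window(TWEETS, START, END)])  # ['ana', 'ben', 'dee']
```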
26. Data Analysis
• Use both the dataset and the visualizations (they complement each
other, and both are necessary for full understanding)
• Capture the Tweets column and import that into a text analysis
software program
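Before (or alongside) a dedicated text analysis program, the exported Tweet column can be summarized with word and word-pair counts, similar in spirit to NodeXL's "words and word pairs" metrics. This is a minimal sketch with a small, hypothetical stop-word list; it drops URLs, keeps #hashtags and @handles intact, and counts adjacent pairs after stop words are removed, which is one choice among several.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "at", "for", "rt"}

def tokens(text):
    """Lowercase words, keeping #hashtags and @handles intact; drops URLs."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [w for w in re.findall(r"[#@]?\w+", text) if w not in STOP_WORDS]

def word_and_pair_counts(tweets):
    """Count words and adjacent word pairs across a column of Tweet texts."""
    words, pairs = Counter(), Counter()
    for text in tweets:
        toks = tokens(text)
        words.update(toks)
        pairs.update(zip(toks, toks[1:]))
    return words, pairs

# Hypothetical Tweet column.
TWEETS = [
    "RT @ana: Social network analysis at #sna",
    "Network analysis of the #sna hashtag http://t.co/x",
]

if __name__ == "__main__":
    words, pairs = word_and_pair_counts(TWEETS)
    print(words.most_common(3))
    print(pairs.most_common(2))
```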
27. Limits -> Controlling for Input Parameters for
the Data Extraction
• Social media platform (Twitter
and its data processing rate
limits), even with an account for
“whitelisting” (and the time of
day of the data extraction
through its data-streaming API)
• NodeXL (up to about 300,000
records or so)
• Computational power of the
researcher’s machine
• Computer memory of the
researcher’s machine
• No early indicator of the size
of a data crawl or of whether
the electronic social network
can be acquired
• Costly (computational and
time expense) non-captures
at system limits
28. Addendum
• May apply Boolean operators in the query (and query multiple
terms simultaneously)
• May use macros
• May re-crawl using original parameters of a data extraction
• May automate data extractions
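Boolean logic can also be applied after a broad crawl, as a local post-filter on captured Tweets. The sketch below is not Twitter's query syntax or parser; it is a hypothetical helper that ANDs the terms in one list, ORs the terms in another, and excludes any term in a third. The sample Tweets and terms are made up for illustration.

```python
import re

def terms(text):
    """Lowercased words, keeping a leading # or @ so hashtags and handles stay distinct."""
    return set(re.findall(r"[#@]?\w+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """AND across all_of, OR across any_of, NOT for every term in none_of."""
    found = terms(text)
    return (all(t.lower() in found for t in all_of)
            and (not any_of or any(t.lower() in found for t in any_of))
            and not any(t.lower() in found for t in none_of))

# Hypothetical captured Tweets.
TWEETS = [
    "Flooding reported near #Manhattan KS",
    "Manhattan NY traffic is heavy",
    "#mhk flooding update from the county",
]

if __name__ == "__main__":
    hits = [t for t in TWEETS
            if matches(t, all_of=["flooding"], any_of=["#manhattan", "#mhk"], none_of=["ny"])]
    print(hits)  # the first and third Tweets
```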
29. Some Sample
Graph Visualizations
From NodeXL Extractions from Twitter
Note: Other details have been excluded because these visualizations
are incomplete without the graph metrics and other complementary
data, and it would be misrepresentational to explain the contexts of
the data crawl behind the social network graphs incompletely. All of
these graphs may be found in fuller detail, some with downloadable
data sets, on the NodeXL Graph Gallery. At the graph gallery, put “SHJ”
in the Search bar at the top right.
31. Circle Layout (Ring Lattice Graph)
32. Harel-Koren Fast Multiscale with Vertex
Labels
33. Random Layout Algorithm, Images at the
Vertices
34. Sugiyama Layout of Groups, Force-Based
Overall Network Layout
41. 3D Fruchterman-Reingold Force-Based Graph
42. Circle Layout / Ring Lattice Graph at Group
Level, Force-Based Layout at Network Level
45. Fruchterman-Reingold Layout, Imagery for
Vertices
46. Random Layout of Groups, Force-Based
Layout of Network with Combined Edges
47. Harel-Koren Fast Multiscale Layout at Cluster
Level, Force-Based Layout at Network Level
48. Motifs Extraction (Census), Sugiyama Layout
at Network Level
49. Harel-Koren Fast Multiscale for Groups,
Force-Based Layout at Network Level
50. Clustering by Clauset-Newman-Moore, Network
Layout with Harel-Koren Fast Multiscale
51. Motifs at Group Level, Spiral at Network Level
52. Random at Group Level, Packed Rectangles
for Network
53. Harel-Koren Fast Multiscale for Clusters,
Treemap Layout for Network
54. Horizontal Sine Wave Layout (in beta)
61. Motif, Fruchterman-Reingold, on Grid
62. Grid, Imagery on Vertices
65. NodeXL Graph Server
• Continuous crawl based on a certain term or account for over a
month
• Academic purposes only
• Must be requested through Dr. Marc A. Smith (Connected Action Consulting
Group, marc@connectedaction.net)
• No retroactive crawls (a limitation of Twitter)
67. Mixing Up Datasets
Twitter Data Grants
• Feb. 2014
• Twitter Engineering Blog
Other Sources
• Content-sharing sites (with
public APIs)
• YouTube
• Flickr
• Social networking sites (with
public APIs)
• Facebook
• LinkedIn
• Email Networks
• Web networks
• Wiki networks
68. Semantic (Meaning) Analysis of a
Tweet Stream
Using NCapture (add-in to Google Chrome and MS Internet Explorer browsers) and
NVivo (a qualitative and mixed methods data analysis tool)
69. (Partial) Twitter Feed Capture using NCapture
of NVivo 10
70. Word Cloud based on Word Frequency Count
from Twitter Feed (Gist)
71. Geolocation (Lat / Long) Data of Active Twitter
User Accounts on a Tweet Stream / Feed
72. Word Similarity Analysis
73. Word Frequency Treemap
(classical content analysis)
74. Word Search Word Tree (and Stemming)
75. Manual Analysis…through Coding,
Categorizing, and Evaluation
• Data reduction
• Summary
• Matrix analysis
• Coding and analysis
Sample coding matrix columns: Topic | Pro (sentiment) | Con (sentiment)
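Once a human coder has tagged Tweets by topic and sentiment, the matrix step is a cross-tabulation. The sketch below assumes codes are recorded as (topic, sentiment) pairs with only "pro" and "con" sentiments; the topics and codes are hypothetical examples, not data from the presentation.

```python
from collections import Counter

SENTIMENTS = ("pro", "con")

def coding_matrix(coded_segments):
    """Cross-tabulate hand-coded (topic, sentiment) pairs into a topic-by-sentiment table."""
    cells = Counter(coded_segments)
    topics = sorted({topic for topic, _ in coded_segments})
    return {topic: {s: cells[(topic, s)] for s in SENTIMENTS} for topic in topics}

def format_matrix(matrix):
    """Render the table as tab-separated plain-text rows: Topic, Pro, Con."""
    rows = ["Topic\tPro\tCon"]
    for topic, counts in matrix.items():
        rows.append(f"{topic}\t{counts['pro']}\t{counts['con']}")
    return "\n".join(rows)

# Hypothetical codes a human coder might assign to individual Tweets.
CODES = [("tuition", "con"), ("tuition", "con"), ("athletics", "pro"), ("tuition", "pro")]

if __name__ == "__main__":
    print(format_matrix(coding_matrix(CODES)))
```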
76. Human-Machine Analysis
• Network Text Analysis Theory (language modeled as networks of
words and relations)
• Semantic network
• Nodes: concepts or ideas, ideational kernels
• Links: statements, relationships (strength of relationship; directionality, such
as agreement / disagreement or positive / negative; type of relation;
sentiment)
• Network: semantic map, union of all statements
• May be a one-mode network (all nodes of a type)
• Concepts
• May be a multi-modal network (based on ontological coding with
various mixes of node types)
• Persons, places, concepts, sentiments, locations, and others
77. Human-Machine Analysis (cont.)
• Meta-network analysis based on a text corpus / merged text
corpuses
• Drawn from unstructured natural language text data
• Identification of users (account holders on Twitter) and their
interrelationships with others based on messaging and re-Tweeting and
following / not following
• May use Carnegie Mellon University’s freeware text-mining tool
AutoMap 3.0.10.18 on Windows (by Center for Computational
Analysis of Social and Organizational Systems, CASOS) (2001 –
present)
• Graph visualizations in 2D and 3D made in ORA-NetScenes (CASOS)
78. Human-Machine Analysis (cont.)
• AutoMap…requires data pre-processing (setting parameters)
• Requires text corpuses as .txt files (transcoding from .doc, .docx, .HTML, or
other formats)
• May combine multiple text sets (through merging); can then query on the
whole set or on the individual text sets
• May create “stop words” (or “delete”) lists to de-noise data by removing
words with less semantic meaning, such as relative pronouns, personal
pronouns, articles, and conjunctions
• May use universal or domain-specific “thesauruses” to define, filter, and
hone the meta-network extractions
• Enables the defining of sentiment
• Requires testing of a sample set and meta network visualization to ensure
appropriateness of the data refinements
• Involves the design of meta-networks and ontologies from the text corpuses
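The pre-processing ideas above (delete lists, thesauri that map several words onto one concept, and a semantic network built from the cleaned text) can be sketched in a few lines. This is written in the spirit of AutoMap's workflow but is not AutoMap's algorithm: the delete list, the thesaurus entries, and the rule "link each concept to the next two concepts" are all hypothetical choices for illustration.

```python
import re
from collections import Counter

# Hypothetical delete list and generalization thesaurus, illustrative only.
DELETE_LIST = {"the", "a", "an", "and", "of", "to", "is", "we", "our", "it"}
THESAURUS = {"students": "learner", "learners": "learner",
             "faculty": "instructor", "professors": "instructor"}

def concepts(text):
    """Lowercase, drop delete-list words, and map synonyms onto one concept."""
    words = re.findall(r"[a-z']+", text.lower())
    return [THESAURUS.get(w, w) for w in words if w not in DELETE_LIST]

def cooccurrence_network(texts, window=2):
    """Undirected concept links, weighted by co-occurrence within the next `window` concepts."""
    edges = Counter()
    for text in texts:
        cs = concepts(text)
        for i, a in enumerate(cs):
            for b in cs[i + 1:i + 1 + window]:
                if a != b:
                    edges[tuple(sorted((a, b)))] += 1
    return edges

NETWORK = cooccurrence_network(["Faculty support students",
                                "Professors mentor learners and students"])

if __name__ == "__main__":
    for (a, b), weight in NETWORK.most_common():
        print(a, "--", b, weight)
```

Testing a small sample like this before processing the full corpus mirrors the slide's advice to check that the data refinements are appropriate.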
79. Human-Machine Analysis (cont.)
• …requires data processing and data visualization
• May run the textual data processing
• Includes a web scraper for the main social media platforms in its ScriptRunner
feature
• …requires data post-processing
• Includes accessing AutoMap data from ORA-NetScenes to create network
visualizations
• Includes data “mining” for meaning / sense-making (identification of
patterns)
• Includes data visualization analysis
• Note: The work may require re-running this cycle multiple times for
different data queries.
80. Sampler: Wordle™ Word Cloud to Create an
Emergent Thesaurus
81. Sampler: Excerpt from a Year’s Worth of a
Blog’s Text Corpus
82. Sampler: @kstate_pres Tweets Visualization
83. Demos?
• Would you like to see how to set up a simple data crawl from Twitter
using NodeXL? (Note: Twitter rate limiting may prevent a data
extraction from completing, but you can at least see what a basic
setup looks like.)
• Any questions?
84. Conclusion and Contact
• Dr. Shalin Hai-Jew
• Instructional Designer
• Information Technology Assistance Center
• Kansas State University
• 212 Hale Library
• 785-532-5262
• shalin@k-state.edu
• Thanks to Dr. Marc A. Smith, sociologist and Chief Social Scientist for
Connected Action, for generously presenting a webinar at K-State to
our faculty and staff. Also, Tony Capone, NodeXL developer, made
the NodeXL beta available to me and has been very gracious and
encouraging.