Tcat

SMAC LAB, LSU

Sep 28, 2018
SMAC Talks
TCAT
Instructor: Dr. Ke (Jenny) Jiang

TCAT
Twitter Capture and Analysis Toolkit (DMI-TCAT) - By Keywords
Tweet Statistics and Activity Metrics 1: Tweet Stats (.csv)
(overall, /min, /hour, /day, /week, /month, /year, custom…)
# of tweets, tweets with links, tweets with hashtags, tweets with
mentions, retweets, replies
Get a feel for the overall characteristics of your data set
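
As a rough illustration of what these counts mean, the sketch below derives them from raw tweet text. It is not TCAT's own code: the function name `tweet_stats` and the text-based rules (a retweet starts with "RT @", a reply starts with "@") are assumptions made for this example.

```python
import re

def tweet_stats(texts):
    """Count tweets by feature, using simple text-based heuristics."""
    stats = {"tweets": 0, "with_links": 0, "with_hashtags": 0,
             "with_mentions": 0, "retweets": 0, "replies": 0}
    for text in texts:
        stats["tweets"] += 1
        if re.search(r"https?://\S+", text):
            stats["with_links"] += 1
        if re.search(r"#\w+", text):
            stats["with_hashtags"] += 1
        if re.search(r"@\w+", text):
            stats["with_mentions"] += 1
        # Assumption: retweets and replies are recognised by their prefix.
        if text.startswith("RT @"):
            stats["retweets"] += 1
        elif text.startswith("@"):
            stats["replies"] += 1
    return stats

# Made-up sample tweets, for illustration only.
sample = [
    "RT @lsu: Go Tigers #LSU",
    "@smac thanks for the talk",
    "New post https://example.org #tcat",
    "plain tweet",
]
print(tweet_stats(sample))
```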

TCAT
Tweet Statistics and Activity Metrics 2: User Stats Overall (.csv)
Contains the min, max, average, Q1, median, Q3, and trimmed mean for:
number of tweets per user, URLs per user, number of followers, number of
friends, number of tweets

Get a better feel for the users in your data set
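
The summary measures listed above can be sketched with Python's standard library. The 10% trimming fraction and the "inclusive" quartile method are assumptions for this example; the slides do not say which ones TCAT uses, so exact values may differ from the export.

```python
import statistics

def summarize(values, trim=0.1):
    """Return min, max, average, quartiles and a trimmed mean."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Assumption: drop the lowest and highest `trim` share of values.
    k = int(len(data) * trim)
    trimmed = data[k:len(data) - k] if k else data
    return {
        "min": data[0],
        "max": data[-1],
        "average": statistics.mean(data),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "trimmed_mean": statistics.mean(trimmed),
    }

# Made-up follower counts with one very large account.
followers = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
print(summarize(followers))
```

In this made-up sample the single large account pulls the average far above the median, while the trimmed mean stays close to it, which is why the export reports several measures side by side.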

TCAT
Tweet Statistics and Activity Metrics 3: User Stats Individual (.csv)
Lists users and their number of tweets, number of followers, number of
friends, how many times they are listed, their UTC time offset, whether
the user has a verified account and how many times they appear in the
data set.

TCAT
Tweet Statistics and Activity Metrics 4: Hashtag frequency (.csv)
find out which hashtags are most often associated with your subject.

TCAT
Tweet Statistics and Activity Metrics 5: Hashtag-user activity (.csv)
Lists hashtags, the number of tweets with that hashtag, the number of
distinct users tweeting with that hashtag, the number of distinct
mentions tweeted together with the hashtag, and the total number of
mentions tweeted together with the hashtag.
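
The four columns described above can be sketched as follows. This is an illustration only: the function name, the (user, text) input shape and the lower-casing of hashtags and mentions are assumptions, not TCAT's implementation.

```python
import re
from collections import defaultdict

def hashtag_user_activity(tweets):
    """tweets: iterable of (from_user_name, text) pairs."""
    rows = defaultdict(lambda: {"tweets": 0, "users": set(), "mentions": []})
    for user, text in tweets:
        mentions = [m.lower() for m in re.findall(r"@(\w+)", text)]
        # Count each hashtag once per tweet, ignoring case.
        for tag in {h.lower() for h in re.findall(r"#(\w+)", text)}:
            row = rows[tag]
            row["tweets"] += 1
            row["users"].add(user)
            row["mentions"].extend(mentions)
    return {
        tag: {
            "tweets": r["tweets"],
            "distinct_users": len(r["users"]),
            "distinct_mentions": len(set(r["mentions"])),
            "total_mentions": len(r["mentions"]),
        }
        for tag, r in rows.items()
    }

# Made-up tweets, for illustration only.
tweets = [
    ("ann", "#LSU rocks @mike"),
    ("bob", "#lsu game @mike @tiger"),
    ("ann", "#lsu again"),
]
print(hashtag_user_activity(tweets))
```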

TCAT
Tweet Statistics and Activity Metrics 6: Twitter client (source) frequency (.csv)
Lists the frequency of tweet software sources per interval.

TCAT
Tweet Statistics and Activity Metrics 7:
Twitter client (source) stats (individual) (.csv)
Lists sources and their number of tweets, retweets, hashtags, URLs and mentions.

TCAT
User visibility (mention frequency) (.csv)
Lists usernames and the number of times they were mentioned by others.
find out which users are "influentials"

TCAT
User activity (tweet frequency) (.csv)
Lists usernames and the number of tweets posted.
find the most active tweeters
see if the dataset is dominated by certain twitterati.

TCAT
User activity + visibility (tweet+mention frequency) (.csv)
Lists usernames with the number of tweets they posted and the number of
times they were mentioned.
see whether the users mentioned are also those who tweet a lot

TCAT
URL frequency (.csv)
Contains the frequencies of tweeted URLs.
find out which contents (articles, videos, etc.) are referenced most often

TCAT
Host name frequency (.csv)
Contains the frequencies of tweeted domain names.
find out which sources (media, platforms, etc.) are referenced most often
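
A host name count of this kind can be reproduced from a list of URLs with the standard library. This is a minimal sketch, not TCAT's code: the function name is hypothetical, and it assumes the URLs have already been expanded from shortened links.

```python
from collections import Counter
from urllib.parse import urlparse

def host_frequency(urls):
    """Count how often each host name appears in a list of URLs."""
    hosts = (urlparse(u).hostname for u in urls)
    # Skip entries for which no host name could be parsed.
    return Counter(h for h in hosts if h)

# Made-up URLs, for illustration only.
urls = [
    "https://www.nytimes.com/a",
    "https://www.nytimes.com/b",
    "http://youtu.be/xyz",
]
for host, count in host_frequency(urls).most_common():
    print(host, count)
```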

TCAT
Identical tweet frequency (.csv)
Contains tweets and the number of times they have been (re)tweeted identically
get a grasp of the most "popular" content

TCAT
Word frequency (.csv)
Contains words and the number of times they have been used
get a grasp of the most used language

TCAT
Media frequency (.csv)
Contains media URLs and the number of times they have been used
get a grasp of the most popular media

TCAT
Export table with potential gaps in your data (.csv)
Exports a spreadsheet with all known data gaps in your current query, during which
TCAT was not running or capturing data for this bin
Gain insight into possible missing data due to outages

TCAT
Tweet exports 1:
Random set of tweets from selection (.csv)
Contains 1000 randomly selected tweets and information about them (user, date
created, from_user_name, retweet_count, favorite_count, lang, to_user_name,
in_reply_to_status_id, quoted_status_id, source, location, lat, lng, from_user_id,
from_user_realname, from_user_verified, from_user_description, from_user_url,
from_user_profile_image_url, from_user_timezone, from_user_tweetcount,
from_user_followercount, from_user_friendcount, from_user_favourites_count,
from_user_listed, from_user_created_at)
a random subset of tweets is a representative sample that can be manually
classified and coded much more easily than the full set
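
The idea of drawing a fixed-size random subset for manual coding can be sketched as below. The function name and the fixed seed are assumptions for this example; the seed only makes the draw repeatable so that several coders can work on the same subset.

```python
import random

def random_subset(tweet_ids, k=1000, seed=42):
    """Draw up to k tweet ids uniformly at random, without replacement."""
    rng = random.Random(seed)
    ids = list(tweet_ids)
    # If the selection holds fewer than k tweets, return all of them.
    return rng.sample(ids, min(k, len(ids)))

# Made-up tweet ids, for illustration only.
subset = random_subset(range(5000))
print(len(subset))
```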

TCAT
Tweet exports 2:
Export all tweets from selection (.csv)
Contains all tweets and information about them (user, date created, ...)
spend time with your data

TCAT
Tweet exports 3:
List each individual retweet (.csv)
Lists all retweets (and all the tweets' metadata, like follower_count)
chronologically (tweets starting with "RT @").
This script is slow. Small datasets only!

TCAT
Tweet exports 4:
Only tweets with lat/lon (.csv)
Contains only geo-located tweets
Geo location is different from the self-reported location

TCAT
Tweet exports 5-6:
Export tweet ids (.csv), Export hashtag table (tweet id, hashtag)
For co-hashtag network

TCAT
Tweet exports 7:
Export mentions table (tweet id, user from id, user from name, user to
id, user to name, mention, mention type)
Contains tweet ids from your selection, with mentions and the mention type.
Mention network

TCAT
Tweet exports 8:
Export URLs table (tweet id, url, expanded url, followed url)
Contains tweet ids from your selection and URLs.

TCAT
Networks 1: All network exports come as .gexf or .gdf files which you can
open in Gephi or similar

Social graph by mentions
Produces a directed graph based on interactions between users. If a user
mentions another one, a directed link is created. The more often a user
mentions another, the stronger the link ("link weight"). The "count" value
contains the number of tweets for each user in the specified period.
analyze patterns in communication, find "hubs" and "communities",
categorize user accounts.
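
The construction described above (one directed link per mention, weighted by how often it occurs, plus a tweet count per user) can be sketched like this. It is an illustration under assumptions, not TCAT's export code: the function name and the (user, text) input shape are hypothetical, and writing the result to .gexf or .gdf is left out.

```python
import re
from collections import Counter

def mention_graph(tweets):
    """tweets: iterable of (from_user_name, text) pairs.

    Returns (edges, counts): edges maps (source, target) to the link
    weight, counts maps each tweeting user to their number of tweets.
    """
    edges = Counter()
    counts = Counter()
    for user, text in tweets:
        counts[user] += 1
        for target in re.findall(r"@(\w+)", text):
            edges[(user, target)] += 1
    return edges, counts

# Made-up tweets, for illustration only.
tweets = [
    ("ann", "@bob hi"),
    ("ann", "thanks @bob and @cat"),
    ("bob", "@ann sure"),
]
edges, counts = mention_graph(tweets)
for (source, target), weight in edges.items():
    print(source, "->", target, weight)
```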

TCAT

Social graph by in_reply_to_status_id
Produces a directed graph based on interactions between users. If a tweet
was written in reply to another one, a directed link is created.
analyze patterns in communication, find "hubs" and "communities",
categorize user accounts.

TCAT

Co-hashtag graph
Produces an undirected graph based on co-word analysis of hashtags. If
two hashtags appear in the same tweet, they are linked. The more often they
appear together, the stronger the link ("link weight").
explore the relations between hashtags, find and analyze sub-issues,
distinguish between different types of hashtags (event related, etc.).
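
A minimal sketch of the co-hashtag linking rule: every pair of hashtags in the same tweet gets an undirected link, and repeated pairs raise the link weight. The function name and the lower-casing of hashtags are assumptions for this example.

```python
import re
from collections import Counter
from itertools import combinations

def co_hashtag_edges(texts):
    """Return a Counter mapping (hashtag_a, hashtag_b) to link weight."""
    edges = Counter()
    for text in texts:
        # Sort the distinct tags so each undirected pair has one key.
        tags = sorted({t.lower() for t in re.findall(r"#(\w+)", text)})
        edges.update(combinations(tags, 2))
    return edges

# Made-up tweets, for illustration only.
texts = ["#a #b", "#B #a #c", "#c"]
for (tag_a, tag_b), weight in co_hashtag_edges(texts).items():
    print(tag_a, tag_b, weight)
```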

TCAT

Bipartite hashtag-mention graph
Produces a bipartite graph based on co-occurrence of hashtags and
@mentions. If an @mention co-occurs in a tweet with a certain hashtag,
there will be a link between that @mention and the hashtag. The more often
they appear together, the stronger the link ("link weight").
explore the relational activity between mentioned users and hashtags,
find and analyze which users are considered experts around which
topics.
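
The bipartite graphs in this part of the deck all follow the same pattern: two kinds of nodes, with a weighted link whenever one of each kind appears in the same tweet. A sketch for the hashtag-mention case, with a hypothetical function name and "#" and "@" prefixes used only to keep the two node types apart:

```python
import re
from collections import Counter
from itertools import product

def hashtag_mention_edges(texts):
    """Return a Counter mapping (hashtag, mention) to link weight."""
    edges = Counter()
    for text in texts:
        tags = {"#" + t.lower() for t in re.findall(r"#(\w+)", text)}
        mentions = {"@" + m.lower() for m in re.findall(r"@(\w+)", text)}
        # Link every hashtag in the tweet to every mention in the tweet.
        edges.update(product(tags, mentions))
    return edges

# Made-up tweets, for illustration only.
texts = ["#flu ask @cdc", "#flu #health @CDC @who"]
for (tag, mention), weight in hashtag_mention_edges(texts).items():
    print(tag, mention, weight)
```

Swapping the second pattern for the tweet's client, URL or host name gives the other bipartite variants.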

TCAT

Bipartite hashtag-source graph
Produces a bipartite graph based on co-occurrence of hashtags and
"sources" (the client a tweet was sent from is its source). If a hashtag is
tweeted from a particular client, there will be a link between that client and
the hashtag. The more often they appear together, the stronger the link ("link
weight").
explore the relations between clients and hashtags, find and analyze
which clients are related to which topics.

TCAT

Bipartite user-source graph
Produces a bipartite graph based on co-occurrence of users and
"sources" (the client a tweet was sent from is its source). If a user tweets
from a particular client, there will be a link between that client and the user.
The more often they appear together, the stronger the link ("link weight").
explore the relations between clients and users, find and analyze which
users use which clients.

TCAT

Bipartite domain-source graph
Produces a bipartite graph based on co-occurrence of (URL-)domains and
"sources" (the client a tweet was sent from is its source). If a domain is
tweeted from a particular client, there will be a link between that client and
the domain. The more often they appear together, the stronger the link ("link
weight").
explore the relations between domains and sources, find and analyze
which domains are related to which sources.

TCAT

Bipartite URL-user graph
Produces a bipartite graph based on co-occurrence of URLs and users. If a
user wrote a tweet with a certain URL, there will be a link between that user
and the URL. The more often they appear together, the stronger the link
("link weight").
explore the relations between users and URLs, find and analyze which
users group around which URLs.

TCAT

Bipartite hashtag-URL graph
Creates a .csv file that contains URLs and the number of times they have
co-occurred with a particular hashtag.
Creates a .gexf file (open in Gephi) that contains a bipartite graph
based on co-occurrence of URLs and hashtags. If a URL co-occurs with a
certain hashtag, there will be a link between that URL and the hashtag. The
more often they appear together, the stronger the link ("link weight").
get a grasp of how URLs are qualified

TCAT

Bipartite hashtag-host (domain) graph
Creates a .csv file that contains hosts and the number of times they have
co-occurred with a particular hashtag.
Creates a .gexf file (open in Gephi) that contains a bipartite graph
based on co-occurrence of hosts and hashtags. If a host co-occurs with a
certain hashtag, there will be a link between that host and the hashtag. The
more often they appear together, the stronger the link ("link weight").
get a grasp of how hosts are qualified

TCAT
Experimental 1:

Cascade
User accounts are distributed vertically; tweets - shown as dots - are spread
out horizontally over time. Lines indicate retweets.
visually explore temporal structures and retweet patterns.

TCAT
Experimental 2:

The Sankey Maker
Produces an alluvial diagram. Alluvial diagrams are a type of flow diagram
originally developed to represent changes in network structure over time.
plot the relation between various fields such as from_user_lang,
hashtags or Twitter client

TCAT
Experimental 3:

Associational profile (hashtags)
Produces an associational profile as well as a time-encoded co-hashtag
network.
explore shifts in hashtag associations

TCAT
Please visit this TCAT installation at these URLs:

http://18.223.107.254/analysis/

TCAT standard login (for analysis only):

Username: tcat
Password: FTHnX73cFuUVp7KyVzGZLxdkLPSEp7KCMc
