Dr. W. Scott Sanders
Office: SK 308A ● Email: scottsanders@louisville.edu
Social Media
Data Collection & Analysis
The Nature of Social Media Data
• Value of Social Media Data
• Types of Social Media Data
• Representativeness of Social Media Data
How to Get Social Media Data
• Database Dumps & Public Datasets
• Application Programming Interfaces (APIs)
• Web Scraping
Analyzing Social Media Data
• Text Analysis and Topic Modeling
• Social Network Analysis & Survival Analyses
Overview: What We’re Talking about Today
Our goal today is to discuss the pipeline of social media data analysis and to understand what is possible!!! We will only get in the weeds if you have questions!
The Nature of Social
Media Data
Data is the primary asset of social media companies, as their business model is based on data-driven micro-targeting of advertisements.
When companies share data, they are trying to negotiate two conflicting goals:
1) To allow 3rd-party developers to add value to the platform by creating new functionality.
2) To control access to their data to a) maintain their competitive advantage and b) guard users’ privacy.
Value of Social Media Data
Cambridge Analytica exploited liberal permissions within the Facebook API to collect large amounts of data prior to 2014 using the app “This is Your Digital Life.”
Facebook only restricted API access once it
became clear that Cambridge Analytica could
replicate substantial portions of their graph.
The data was used to segment potential audiences
and to microtarget political advertisements with
the goal of influencing the 2016 election in favor
of Trump.
Example: Cambridge Analytica
• Network – Friendship networks (e.g. Facebook), interest networks (e.g. Pinterest), semantic networks (e.g. relationships between texts).
• Text/Written Data – Facebook posts, tweets, blog posts, Instagram captions, etc.
• Behavioral Records – timestamps (e.g. account creation, posting times), counts of activity (e.g. logins, retweets, etc.), purchasing history, reputation data.
• Geographic Information – geo-location data embedded in tweets or pictures, text-based mentions of landmarks, self-reported locations.
• Images – Instagram posts, reddit memes, Facebook profile pictures, Tinder pictures, etc.
Types of Data On Social Media
A huge amount of research has been done on Twitter data due to its liberal data-sharing policies.
As such, Twitter is in some ways social media’s model organism.
All social platforms have a unique set of technological affordances (e.g. message length, private backchannels, etc.).
Pitfalls of Social Media: Representativeness of Platform
Demographic variables such as age, race,
and education differ across platforms.
Within a platform it is possible for different
communities of users to use the platform
differently.
Some accounts such as organizational or
bot accounts may violate the assumption of
an individual user.
Pitfalls of Social Media: Representativeness of Population
You may not have access to a representative
sample (i.e. platforms restrict data access).
Platforms may sample data in a manner
opaque to the researchers.
Social datasets decay over time.
Pitfalls of Social Media: Representativeness of Data
How to Get
Social Media Data
Database Dumps, Public Data Sets, & Buying Data
Database dumps are copies of collected data provided en masse by companies or other researchers.
Sometimes these data sets are
made public for research or
analysis.
You can also buy data from
companies with a business
relationship with specific social
media platforms (e.g. Gnip &
Twitter).
Database Dumps, Public Data Sets, & Buying Data
Benefits
Super easy!!!
You may get a *lot* of data.
Some websites (e.g. Wikipedia, reddit)
will provide dump files.
Drawbacks
Low odds of just being given data.
Low control over the types of data you
will get.
Official APIs – JSON & XML
A web API (Application Programming Interface) allows for the exchange of data via a (relatively) stable interface.
It returns formatted data that is easy to parse with a computer.
Official APIs – JSON & XML
Benefits
Even when the internal workings of a site change, the API often stays the same.
Easy to access with simple
programs.
Drawbacks
Limited types of data are available.
May not answer the questions you
want answered.
Platform | Package | Link
Reddit | PRAW | https://github.com/praw-dev/praw
eBay | eBay SDK | https://github.com/timotheus/ebaysdk-python
Twitter | Python Twitter | https://github.com/bear/python-twitter
Twitter | Tweepy | https://github.com/tweepy/tweepy
Facebook | N/A (script) | https://github.com/dfreelon/fb_scrape_public
Relevant Python Packages: Interacting w/ APIs
Install Python packages from the command prompt using: “pip install PACKAGE_NAME”
• Create a Developer Account & Application
• Get Tokens/Secret Keys
• Write Code to Make the API Call
• Parse the Response & Store the Data
Using an API
You will need to create a developer account on the social media platform.
Next, you will need to create an application (i.e. an app) for data collection.
You may be asked for a redirect URI – leave this blank since you do not need other users to authorize the app.
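Once the app exists, the call itself is usually just a URL plus an authorization header. A hypothetical sketch of that structure (the base URL, endpoint, and token names are placeholders, not any real platform’s API):

```python
import urllib.parse

API_BASE = "https://api.example.com/1.1"  # placeholder base URL
ACCESS_TOKEN = "YOUR-ACCESS-TOKEN"        # issued with your developer app

def build_request(endpoint: str, params: dict) -> tuple[str, dict]:
    """Return the full URL and headers for a bearer-token API call."""
    url = f"{API_BASE}/{endpoint}?{urllib.parse.urlencode(params)}"
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    return url, headers

url, headers = build_request("search", {"q": "ark encounter", "count": 100})
```

Packages like Tweepy or PRAW wrap exactly this kind of plumbing for you.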
Using an API
You will be provided with a set of
tokens/keys so that your script can
authenticate itself.
Best practice is to store keys in a separate file from your main script so that you can share code without compromising your account.
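One common pattern is a small JSON credentials file that is excluded from version control (the file name and key names below are illustrative):

```python
import json
import pathlib
import tempfile

# Normally you create this file by hand once and add it to .gitignore;
# here we write it programmatically just so the sketch is self-contained.
key_file = pathlib.Path(tempfile.mkdtemp()) / "keys.json"
key_file.write_text(json.dumps(
    {"consumer_key": "XXXX", "consumer_secret": "YYYY"}))

# Your main script then loads the keys instead of hard-coding them:
keys = json.loads(key_file.read_text())
consumer_key = keys["consumer_key"]
```

The shared script now contains no secrets; collaborators supply their own keys.json.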
Using an API
Most web APIs are rate limited. Do back-of-the-napkin calculations to see how many calls you can make in a given timeframe.
Remember that calls take time – you may be able to do more than you think.
You can often make bulk requests, which lowers the total number of calls you need to make.
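The napkin math is simple multiplication. A sketch with illustrative numbers (not any platform’s real limits):

```python
# Back-of-the-napkin rate-limit math; all numbers are made up.
calls_per_window = 180   # allowed calls per rate-limit window
window_minutes = 15      # length of the window
items_per_call = 100     # bulk request size

# Windows per hour times calls per window times items per call:
items_per_hour = (60 // window_minutes) * calls_per_window * items_per_call
print(items_per_hour)    # 72000 items/hour under these assumptions
```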
Using an API
Data is typically returned as JSON or XML.
Use Try/Except statements as some data
fields may not always have a value.
Data has a nested, hierarchical structure as
the same type of entity may appear
multiple times.
You should probably use a relational
database.
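A minimal sketch of the Try/Except pattern on a toy JSON response (the field names imitate what an API might return but are made up):

```python
import json

# A toy response: the second post has no "geo" field.
raw = ('{"posts": [{"id": 1, "text": "hi", "geo": {"lat": 38.2}},'
       ' {"id": 2, "text": "yo"}]}')

rows = []
for post in json.loads(raw)["posts"]:
    try:
        lat = post["geo"]["lat"]   # not every post carries a location
    except KeyError:
        lat = None                 # fall back gracefully on missing fields
    rows.append((post["id"], post["text"], lat))

print(rows)  # [(1, 'hi', 38.2), (2, 'yo', None)]
```

Flattened tuples like these drop cleanly into the tables of a relational database.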
Web Scraping
Web scraping collects the
HTML source code for a
page and then searches for
particular bits of
information in the source
based upon user-defined
criteria.
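A minimal sketch of the idea using Python’s built-in html.parser (Beautiful Soup, listed in the package table below, is usually far more convenient); the page content and class name are invented for illustration:

```python
from html.parser import HTMLParser

# Toy page; in practice you would download this with requests or urllib.
PAGE = ('<html><body><span class="username">alice</span>'
        '<span class="username">bob</span></body></html>')

class UserScraper(HTMLParser):
    """Collect the text of every <span class="username"> element."""
    def __init__(self):
        super().__init__()
        self.in_user = False
        self.users = []
    def handle_starttag(self, tag, attrs):
        self.in_user = (tag == "span" and ("class", "username") in attrs)
    def handle_endtag(self, tag):
        self.in_user = False
    def handle_data(self, data):
        if self.in_user:
            self.users.append(data)

scraper = UserScraper()
scraper.feed(PAGE)
print(scraper.users)  # ['alice', 'bob']
```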
Web Scraping
Benefits
Can (theoretically) access any
information displayed on page.
Drawbacks
Often hard to program – lots of
junk to clean out of the HTML.
Slow – You will almost certainly
be rate limited while accessing
information.
** Website changes will break your script. **
Package | Purpose | Link
Requests | HTTP requests | https://pypi.org/project/requests/
urllib | HTTP requests | Included in the Python 3 standard library.
Beautiful Soup | HTML/XML parser | https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Scrapy | Web crawling | https://scrapy.org/
Relevant Python Packages: Web Scraping
Install Python packages from the command prompt using: “pip install PACKAGE_NAME”
Use browser tools to inspect a page’s HTML rather than “View Page Source”.
There could be a hidden API if the data you want is not in the HTML. It is possible in some cases to reverse engineer these.
There may be inline JSON – Ctrl-F “JSON” to check whether it is present.
Self-limit your request rate.
Hints for Web Scraping
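Self-limiting can be as simple as sleeping between requests. A sketch (the URLs are placeholders and the fetch itself is stubbed out; the delay you choose should depend on the site):

```python
import time

urls = ["https://example.com/page/%d" % i for i in range(3)]
delay_seconds = 0.1   # illustrative; real scrapers often wait seconds, not tenths

fetched = []
for url in urls:
    # A real scraper would call requests.get(url) here; we just record the URL.
    fetched.append(url)
    time.sleep(delay_seconds)   # pause between requests to stay polite
```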
How Can We Analyze
Social Media Data?
Relevant Python Packages: General Analysis
Package | Purpose
Pandas | R-like dataframes; data cleaning
NumPy | Multi-dimensional arrays; matrices
SciPy | Scientific computing (e.g. linear algebra, interpolation)
scikit-learn | Machine learning algorithms
The easiest way to install all of the above is to install Anaconda, a Python data science platform: https://www.anaconda.com
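As a taste of what Pandas’ R-like dataframes look like in practice, a tiny cleaning-and-aggregation sketch (the column names and values are made up):

```python
import pandas as pd

# Toy post data with a missing score.
df = pd.DataFrame({
    "subreddit": ["atheism", "politics", "atheism"],
    "score": [10, None, 4],
})

df["score"] = df["score"].fillna(0)               # simple cleaning step
per_sub = df.groupby("subreddit")["score"].sum()  # split-apply-combine
print(per_sub["atheism"])  # 14.0
```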
What’s a topic model?
At its simplest, topic models take a body of documents (i.e. a corpus) and look for terms that co-occur frequently.
Groups of terms that co-
occur represent a topic.
Every document is
presumed to be a mixture
of topics and, thus,
receives a score for
every topic.
[Example figure: stolen for demonstration purposes, so not related to the project in any way, shape, or form.]
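The co-occurrence intuition can be sketched in a few lines of pure Python. This is a toy counter on a made-up three-document corpus, not a real topic model like the LDA implementations discussed below:

```python
from collections import Counter
from itertools import combinations

# Tiny invented corpus; real topic models need thousands of documents.
corpus = [
    "ark park tax money",
    "tax money park funding",
    "noah ark flood story",
]

pair_counts = Counter()
for doc in corpus:
    terms = sorted(set(doc.split()))          # unique terms per document
    pair_counts.update(combinations(terms, 2))  # count co-occurring pairs

# Frequent pairs such as ('park', 'tax') and ('money', 'tax') hint at a
# shared "park funding" topic across the first two documents.
print(pair_counts[("park", "tax")])  # 2
```

LDA goes much further, estimating per-document topic mixtures rather than raw pair counts, but this is the underlying signal it works from.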
Content Analysis
• Must be coded by humans.
• Prone to human error.
Different results with different
coders.
• Works best with small
amounts of textual data due
to human labor.
• Top down pre-determined
coding scheme.
Topic Models
• Can feasibly be done only by
a computer.
• Static mathematical model
that is always replicable.
• Works best with large
amounts of textual data due
to the sampling process.
• Bottom up inductive
categorization.
Content Analysis vs. Topic Models
BOTH HAVE THEIR PLACE !!!
Relevant Python Packages: Text & Topic Modeling
Package | Purpose | Link
NLTK | Natural language processing; stemming/lemmatization; tokenization | https://www.nltk.org/
Gensim | LDA (preferred) | https://radimrehurek.com/gensim/
lda | LDA (not preferred) | https://pypi.org/project/lda/
Install Python packages from command prompt using: “pip install PACKAGE_NAME”
Agenda Setting on Reddit
Scott, Don’t forget I was
also on Noah’s Ark!
Check out the dinosaur
enclosure.
The Ark Encounter is a biblical theme park which received tax subsidies from the state. As such, it was heavily covered in the press.
Agenda setting theories hold that
the press doesn’t tell people what
to think, but rather what to think
about.
So do the topics discussed in
online forums mirror those found
in the national news media?
Topic | Label (Tentative) | Terms
topic_0 | Nye Debate | debat, nye, peopl, creationist, scienc, creation, scientist,
topic_1 | Science Denial | flood, chang, evid, stori, evolut, believ, scienc, earth, ha
topic_2 | Religious Teaching | religion, peopl, christian, religi, children, teach, kid, learn
topic_3 | Ark Park | noah, encount, million, dinosaur, feet, project, creation,
topic_4 | Theme Park | park, theme, theme_park, build, money, peopl, educ, re
topic_5 | Flood Story | water, day, look, anim, time, hate, peopl, food, bit, level,
topic_6 | Discr. Hiring | tax, incent, park, religi, hire, tax_incent, tourism, project,
topic_7 | Park Funding | money, museum, million, fund, creation, project, creatio
topic_8 | Belief | peopl, believ, christian, mean, faith, understand, belief,
topic_9 | The Ark | anim, build, boat, noah, speci, flood, float, built, ship, wo
topic_10 | Sep. Church & State | church, school, religi, religion, help, public, guy, trip, pro
topic_11 | Tax Break | tax, break, govern, tax_break, busi, money, pay, religi, p
topic_12 | Belief 2 | god, bibl, law, believ, histori, univers, earth, creat, huma
We can use topic models to look
at change over time or
differences between groups.
What are subreddits talking about:
Politics cares about separation of church
and state.
Christianity focuses on beliefs.
Atheism is into discussing the actual ark.
Do subreddits talk about the same thing?
Network data represents when entities are linked in some fashion. The links can represent communication, interaction, trade, nomination, etc.
Social media typically represents
two types of networks:
Social Graphs – indicates a
relationship of some sort.
Interest Graphs – indicates
shared interests.
Network Data Analysis
Network data can help us
find distinct groups who
may or may not interact.
It can help us see who’s
important (or not!) in
groups.
It can help us find
brokers (with power over
information transfer)
between groups.
Network Data Analysis
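The "who's important" question can be sketched without any libraries. A toy network with invented names, scored by degree centrality (networkx, listed below, computes this and far richer metrics):

```python
# Toy interaction network as an adjacency dict; names are made up.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice", "erin"},   # dave brokers between the two clusters
    "erin": {"dave"},
}

# Degree centrality: the fraction of other nodes each node is tied to.
n = len(graph)
centrality = {node: len(neigh) / (n - 1) for node, neigh in graph.items()}

most_central = max(centrality, key=centrality.get)
print(most_central)  # alice
```

Brokerage (dave's position here) is better captured by betweenness centrality, which networkx also provides.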
Relevant Python Packages: Network Analysis
Package | Purpose | Link
networkx | Build networks; calculate network metrics | https://networkx.github.io/
python-louvain | Community detection | https://github.com/taynaud/python-louvain
Gephi | Network visualization GUI (not Python) | https://gephi.org/
Install Python packages from command prompt using: “pip install PACKAGE_NAME”
Brand Networks
Calling a single Twitter account or Facebook page a “community” is not seeing the forest for the trees.
[Bar chart: counts of verified vs. unverified Twitter accounts (x-axis 0–200) for 14 brands: Microsoft, Sony, Google, IBM, MTV, Disney, Samsung, Nike, Cisco, Oracle, Intel, Ford, GE, Adobe.]
Brand Networks
Calling a single Twitter account or Facebook page a “community” is not seeing the forest for the trees.
Brands have multiple
accounts each representing a
potential point of contact.
The interconnections among
accounts are meaningful and
beneficial to brands.
Survival analysis is underused in the
analysis of social media data.
Many actions on social media have timestamps that allow us to know exactly when they occur.
It can be used for A/B testing or
modeling behavioral persistence.
Survival Analysis
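As a sketch of the underlying idea, here is a minimal Kaplan-Meier survival estimate computed by hand on made-up "days until account abandoned" data (one account is censored, i.e. still active when observation ended; a real analysis would use a dedicated package rather than this toy):

```python
# Made-up observation data: how long each account was watched, and
# whether abandonment was actually observed (0 = censored).
durations = [2, 3, 3, 5, 8]   # days observed
observed  = [1, 1, 0, 1, 1]   # 1 = abandoned, 0 = still active

survival = {}
s = 1.0
for t in sorted(set(durations)):
    # Events at time t, among accounts still "at risk" at time t.
    events = sum(1 for d, e in zip(durations, observed) if d == t and e)
    at_risk = sum(1 for d in durations if d >= t)
    s *= 1 - events / at_risk   # Kaplan-Meier product-limit step
    survival[t] = s

print(survival[3])  # ~0.6: estimated 60% of accounts survive past day 3
```

Timestamped social media actions give exactly the durations this needs; the Cox model below extends the same logic to covariates.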
Timestamps may be in epoch time, which is the number of seconds elapsed since January 1, 1970 (UTC).
It is common to use UTC time in data.
The “time” and “datetime” modules in Python’s standard library have many tools for managing timestamps. You can create a datetime object and perform arithmetic operations directly on the object itself.
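A small sketch of that arithmetic using only the standard library (the timestamps are made up):

```python
from datetime import datetime, timezone

# Epoch timestamps count seconds since January 1, 1970 (UTC).
created_at = datetime.fromtimestamp(0, tz=timezone.utc)        # the epoch itself
last_post = datetime.fromtimestamp(86_400, tz=timezone.utc)    # 86,400 s = 1 day

lifetime = last_post - created_at   # subtraction yields a timedelta directly
print(lifetime.days)                # 1
```

Differences like `lifetime` are exactly the durations a survival model consumes.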
Account Abandonment w/ Extended Cox Model
Extended Cox Regression of Time to Account Abandonment (Model 3).
Variables | b | e^b | 95% CI (e^b)
Verified Accounts | .61*** | 1.85 | 1.39, 2.45
Num. of Followers | .02 | 1.02 | .85, 1.22
Post Volume | -7.89*** | .00 | .000, .003
Post Consistency | 4.17*** | 64.78 | 8.00, 524.80
In-Degree of Addressivity | -.48* | .62 | .43, .90
Out-Degree of Retweet | -.55** | .58 | .40, .84
Time × Num. of Statuses | .92*** | 2.51 | 1.88, 3.35
Time × Consistency | -.67*** | .51 | .37, .69
Model χ² | 544.54
Change in -2LL | 25.90***
-2 Log Likelihood | 4,030.51
