Background: With more than 70% of Americans seeking health information online, social media are becoming major sources of health information and related discussion. Forty-three percent of social media users turn to social media for a direct response to a problem. However, these data are difficult to capture and analyze in aggregate. Numerous free and paid tools are available, and each uses different sources and processes, which makes data validation challenging. Given the rapid rise of e-cigarette use in the US, this study seeks to understand the reliability and ease of use of two tools for analyzing e-cigarette tweets.

Methods: This study examines Twitter mentions pulled from two industry-standard tools (GNIP and Radian6) using the keywords “e-cigarettes OR vaping” OR “e-cigarettes health” OR “vaping health”. 500 mentions were collected from each tool over a 30-second period (12:57pm EST on August 7, 2015), for a total of 1,000 mentions. Seven measures were used in this analysis: tools were compared on Cost, Feasibility, and Ease of Use; mentions were compared on Relevance to the topic of e-cigarettes, Poster (individual/organization), Context (tweet content analysis), and Valence (positive/negative).

Results: Within 30 seconds, the two tools together captured 1,000 tweets about e-cigarettes. GNIP offered more flexible pricing than Radian6; however, Radian6 offered greater ease of use and feasibility. Preliminary findings indicate that of the 1,000 tweets, ~40% of the content was the same across both data sets; less than 10% was off topic (Relevance); more than 30% of tweets came from organizations and more than half from individual users (Poster); approximately a quarter of these were sales-oriented (Context); and the majority of the tweets referred to e-cigarettes and vaping positively.

Conclusions: While tweets related to e-cigarettes were captured by both tools, less than half of the content was consistent across tools. Each tool had advantages for analyzing social media conversations; however, future work is needed to understand data validity.
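The abstract reports that ~40% of the content was the same across the two data sets but does not say how overlap was measured. The sketch below assumes, hypothetically, that each tool's export can be reduced to a set of tweet IDs, and reports the shared fraction of each sample alongside the Jaccard index. The function name and the IDs are illustrative toy values, not study data.

```python
# Hypothetical sketch: quantify overlap between two tools' tweet samples.
# Assumption: each export is reduced to a set of tweet IDs; the study does
# not state how "same content" was determined.


def overlap_stats(ids_a: set[str], ids_b: set[str]) -> dict[str, float]:
    shared = ids_a & ids_b
    union = ids_a | ids_b
    return {
        "shared": len(shared),
        # Fraction of each tool's own sample that also appears in the other.
        "share_of_a": len(shared) / len(ids_a) if ids_a else 0.0,
        "share_of_b": len(shared) / len(ids_b) if ids_b else 0.0,
        # Jaccard index: shared items over all distinct items.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy IDs, not study data.
    gnip = {"t1", "t2", "t3", "t4", "t5"}
    radian6 = {"t1", "t2", "t6", "t7", "t8"}
    print(overlap_stats(gnip, radian6))
```

When both samples are the same size, as in this study (500 each), the shared fraction is identical from either side; the Jaccard index is stricter because it divides by the union (8 distinct IDs in the toy example rather than 5).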
A Tale of Two Tools: Reliability and Feasibility of Examining Twitter Mentions about E-Cigarettes from Two Social Media Tools
1. A Tale of Two Tools: Reliability and Feasibility of Examining Twitter Mentions
Presentation at the Society of Behavioral Medicine 2016
Amelia Burke-Garcia, MA
Cassandra Stanton, PhD
Nicole Soufi
Westat Center for Digital Strategy & Research
April 1, 2016
2. “47.6% of current cigarette smokers & 55.4% of recent former cigarette smokers have tried an e-cigarette.”
~CDC, 2015
4. “Various terms are used to refer to e-cigarettes, e.g., ‘e-hookahs’ and ‘vaporizers’.”
~New York Times, 2014
5. “The failure to equate vaping products generally with e-cigarettes underscores how successful the tobacco industry has been in reinventing a popular ‘smoking’ trend.”
~Gostin & Glasner, 2014
7. “89% of all Americans are online.”
~International Telecommunication Union (ITU), United Nations Population Division, Internet & Mobile Association of India (IAMAI), World Bank, 2016
8. “With hundreds of millions of people spending countless hours on social media to share, communicate, connect, interact, and create user-generated data at an unprecedented rate, social media has become one unique source of big data.”
~Zafarani, Abbasi, & Liu, 2014
11. “Social media data is noisy, free-format, of varying length, and multimedia.”
~Zafarani, Abbasi, & Liu, 2014
12. More Issues
• There is a lack of documentation about how the data are identified and sampled (Morstatter et al., 2013; Valkanas et al., 2014).
• Twitter’s free sample provides less representative data (Morstatter et al., 2013; Valkanas et al., 2014).
– This may also hold true for samples drawn from other data mining tools.
• Data come with costs for access, storage & analysis (Morstatter et al., 2013; Valkanas et al., 2014).
13. Research Question
How does Twitter coverage of e-cigarette-related conversations differ by data source (e.g., Radian6 vs. GNIP)?
14. Methods
• Compared tweets from two tools:
– Twitter’s GNIP “Firehose” service
– Salesforce’s Radian6 tool
• Keywords included:
– “e-cigarettes OR vaping” OR “e-cigarettes health” OR “vaping health”
• A total of 1000 mentions were collected
– 500 mentions were collected from each tool over a 30-second period (12:57pm EST on August 7, 2015)
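The keyword query can be read as a Boolean rule. The sketch below assumes one reading: the first quoted clause is split into its two alternatives, each remaining clause is a set of terms that must all appear (AND), and the clauses are combined with OR. It uses exact, case-insensitive token matching; GNIP and Radian6 each apply their own matching rules, which this does not reproduce. The names `QUERY`, `tokens`, and `matches` are hypothetical.

```python
import re

# Assumed reading of the study's keyword query, in disjunctive normal form:
# a tweet matches if ALL terms of ANY inner list appear in it.
QUERY: list[list[str]] = [
    ["e-cigarettes"],
    ["vaping"],
    ["e-cigarettes", "health"],
    ["vaping", "health"],
]

# Word tokens that may contain internal hyphens, so "e-cigarettes" stays whole.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; punctuation such as '#' and '?' is dropped."""
    return set(TOKEN.findall(text.lower()))


def matches(text: str, query: list[list[str]] = QUERY) -> bool:
    """True if every term of at least one clause occurs as a token."""
    words = tokens(text)
    return any(all(term in words for term in clause) for clause in query)


if __name__ == "__main__":
    sample = [
        "New flavors in stock for your vaping needs!",
        "Are e-cigarettes safer? Health experts weigh in",
        "Vape shop opening downtown",
    ]
    for tweet in sample:
        print(matches(tweet), tweet)
```

Under this reading, the two "health" clauses are subsumed by the single-term clauses, so they do not change which tweets match. The third example also shows how exact matching can undercount: "vape" is not the token "vaping".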
15. Methods
• Six measures were proposed for this analysis:
– Tools
• Cost, Feasibility & Ease of Use
– Themes
• Poster (individual/organization)
• Context (12 themes, collapsed into 9)
• Valence (positive/negative)
– Interrater reliability was 94%
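The slide reports 94% interrater reliability without naming the statistic. Simple percent agreement is one common choice; Cohen's kappa additionally corrects for the agreement two coders would reach by chance. The sketch below computes both, assuming each coder assigns exactly one label per tweet; the function names are hypothetical and the Valence labels are toy values, not study data.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two coders assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both coders pick the same label
    # if each labels independently at their own observed rates.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        # Both coders used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    coder1 = ["positive", "positive", "negative", "positive", "negative"]
    coder2 = ["positive", "negative", "negative", "positive", "negative"]
    print(f"agreement = {percent_agreement(coder1, coder2):.2f}")
    print(f"kappa     = {cohens_kappa(coder1, coder2):.3f}")
```

Whenever agreement is imperfect, kappa falls below percent agreement, and the gap widens when one label dominates (for example, mostly positive Valence), which is why reports often state which statistic was used.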
17. Tool Comparison
• Cost
– Radian6: Tiered pricing; cost based on number of mentions
– GNIP: Tiered pricing based on sources and amount of content
• Ease of Use
– Radian6: Offers a visual dashboard; easy to pull content and analyze it
– GNIP: Requires storage capacity to store data, programming knowledge to access the data, and computing power to analyze the data
• Feasibility
– Radian6: ?? / ??
– GNIP: ?? / ??
22. Feasibility
• Across most measures, these tools delivered similar results.
– Specifically, both demonstrated the overwhelming presence of marketing content and individual conversations about e-cigarettes.
• A key difference was the amount of sales and marketing content that GNIP pulled.
• Based on this analysis, either tool may be a viable option for researchers seeking to analyze Twitter data.
– Radian6 may be a better option from a cost and ease-of-use standpoint.
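The slide notes that GNIP pulled more sales and marketing content than Radian6 but does not report a significance test. One way to check whether such a difference exceeds sampling noise is a pooled two-proportion z-test. The sketch below assumes each tool's coded Context labels are available as lists; the helper names are hypothetical, and the counts in the example are invented, not study results.

```python
import math
from collections import Counter


def category_count(codes: list[str], category: str) -> int:
    """Number of tweets coded with the given Context category."""
    return Counter(codes)[category]


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are entirely inside or outside the category.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Invented codes: 150 of 500 GNIP tweets and 100 of 500 Radian6 tweets coded "sales".
    gnip_codes = ["sales"] * 150 + ["other"] * 350
    radian6_codes = ["sales"] * 100 + ["other"] * 400
    z, p = two_proportion_z(
        category_count(gnip_codes, "sales"), len(gnip_codes),
        category_count(radian6_codes, "sales"), len(radian6_codes),
    )
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The pooled standard error assumes the two samples are independent; because roughly 40% of tweets appeared in both tools' samples, a paired or overlap-aware analysis would be more defensible for data like these.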
23. Conclusions
• Researchers seeking to understand social media conversations have a number of options for data mining.
• Given the similarity in content collected across both tools, cost and ease of use should be primary considerations when selecting a data mining tool.
– GNIP offers high-quality data (and is well referenced in the literature) but requires resources to work with its data.
– Radian6 provides an alternative when resources and computing power are limited.
24. Conclusions
• In terms of content, results demonstrated a gap in conversations around the health consequences of vaping.
• Moreover, this study revealed that industry and marketers are using this medium far more than the public health community.
~500 e-cigarette marketing tweets in 30 seconds.
25. Future Directions
• Analyze these data in greater detail, e.g., which flavors and brands are mentioned.
• Compare data collected using other tools.
• Examine other forms of tobacco use (e.g., hookah, cigars, snus).
• Further examine characteristics of the posters.