The document describes 3 methods for analyzing fabrication on Twitter:
1. A lightweight supervised classifier that classifies tweets as real or fake based on the user's account history and behavior rates.
2. A dynamic lightweight topic modeler that uses incremental Pareto NMF to determine topics in a corpus and the percentage of real and fake tweets per topic.
3. A corpus summarizer that extracts the most important tweet for each subtopic, scoring each tweet by combining random-forest word importances with its TF-IDF weights.
1. BotBoosted
Explore and Understand
the Amount of Fabrication
of a Topic in Twitter
Brian Balagot
brian.balagot@gmail.com
https://www.linkedin.com/in/briancbalagot
https://github.com/brityboy
2-10. To explore and understand the amount of fabrication on Twitter…
A tweet is fabricated if it was not made by a bona fide human or genuine news source
Search: lose weight instantly
#1: Lightweight Supervised Classifier
• Tweet labels: REAL, REAL, REAL, REAL, FAKE, REAL
#2: Dynamic Lightweight Topic Modeler
• “think invent try lose weight”: 100% real
• “learn shed pounds burn belly fat smoothie”: 50% real
#3: Corpus Summarizer
• Tweet labels: REAL, FAKE, REAL
11-17. 2: Dynamic Lightweight Topic Modeler
The GOAL: Heuristically determine the topic count of a corpus, on the fly
Tweet Corpus (labeled) → Tweet Tokenizer → Normalized Tokens → TFIDF Vectorizer → TFIDF Matrix → Incremental “Pareto” NMF → Topic Count, Document “Soft Clustering”
This is different from regular NMF (we have to specify the number of topics to extract when using NMF)
Incremental “Pareto” NMF outputs: Rich Content (the “MEAT”), Unexplained Topics (“noise”)
Increment = 2
• 4 in the body, 2 in the tail
• 6 in the body, 2 in the tail
• 6 in the body, 4 in the tail
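The slides do not spell out the stopping rule, so the following is only a sketch of one plausible reading of the body/tail counts above. It assumes that the "body" is the smallest set of largest topics covering 80% of the tweets, that the topic count grows by the increment each round, and that the search stops once extra topics only enlarge the tail. The names `pareto_split`, `choose_topic_count`, `assign` and the 80% threshold are hypothetical; `assign(k)` stands in for fitting NMF with `k` topics and returning each tweet's dominant topic.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def pareto_split(labels: Sequence[int], n_topics: int,
                 body_share: float = 0.8) -> Tuple[int, int]:
    """Count 'body' topics (largest topics covering body_share of tweets)
    and 'tail' topics (everything else, including empty topics)."""
    sizes = sorted(Counter(labels).values(), reverse=True)
    total = sum(sizes)
    covered, body = 0, 0
    for size in sizes:
        if covered >= body_share * total:
            break
        covered += size
        body += 1
    return body, n_topics - body


def choose_topic_count(assign: Callable[[int], Sequence[int]],
                       start: int = 6, increment: int = 2,
                       max_topics: int = 50) -> Tuple[int, int, int]:
    """Grow the topic count by `increment` until the body stops growing.

    `assign(k)` is an assumed callback: fit a k-topic model and return
    the dominant topic index of every tweet.
    Returns (topic_count, body, tail) for the last accepted fit.
    """
    k = start
    body, tail = pareto_split(assign(k), k)
    while k + increment <= max_topics:
        nxt = k + increment
        next_body, next_tail = pareto_split(assign(nxt), nxt)
        if next_body <= body:  # new topics only landed in the tail
            break
        k, body, tail = nxt, next_body, next_tail
    return k, body, tail
```

With topic sizes that reproduce the slide's sequence (4/2, then 6/2, then 6/4), this sketch stops at the second fit. Whether the reported topic count should be the total or only the body is not stated in the slides.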
18-19. BotBoosted
% Share of Real/Fake Tweets per Topic for “Lose Weight Now”
On Average, 34% of Tweets are Fake and 66% are Real
We can see which conversations are boosted by bots
[Figure: % Share of Tweets per topic, REAL vs FAKE]
Topics:
• fast, weight, lose, tips, smoothie
• fast, lose, weight, diet, days
• ways, lose, fast, weight, fat
• tips, fast, weight, lose, scientifically
• brian, flatt, lose, fast, ways
Example tweets:
• #Workout #Exercise Smoothie To Lose Weight https://...
• 6 Bedtime Habits To lose Weight Fast https://...
• #Zumba #dance to Lose #Weight by Brian Flatt #lose_weight_fast https://...
• Natural Ways to Lose #Weight Fast by Brian Flatt | #lose_weight https://...
Labels shown: REAL, FAKE, REAL, FAKE
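The per-topic shares in this chart come down to counting labels within each topic. As a minimal sketch, assuming each tweet has already been given one topic (for example its dominant topic from the soft clustering) and one REAL/FAKE label from the classifier, the percentages could be computed as below. The function name and the input shape are illustrative, not taken from the project.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def fake_share_by_topic(
    rows: Iterable[Tuple[Hashable, str]]
) -> Dict[Hashable, float]:
    """rows holds (topic, label) pairs with label 'REAL' or 'FAKE'.

    Returns the percentage of FAKE tweets for every topic seen.
    """
    counts = defaultdict(Counter)
    for topic, label in rows:
        counts[topic][label] += 1
    return {
        topic: 100.0 * c["FAKE"] / (c["REAL"] + c["FAKE"])
        for topic, c in counts.items()
    }


# Toy data mirroring the earlier slide: one topic 100% real, one 50% real.
rows = [
    ("think invent try lose weight", "REAL"),
    ("think invent try lose weight", "REAL"),
    ("learn shed pounds burn belly fat smoothie", "REAL"),
    ("learn shed pounds burn belly fat smoothie", "FAKE"),
]
shares = fake_share_by_topic(rows)
```

Topics with a high fake share are the conversations that appear to be boosted by bots.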
20. Limitations & Future Work
• Twitter is very aggressive at suspending spammers (77% suspended within 1 day of their 1st tweet), so BotBoosted detects Twitter’s false negatives
• Twitter’s FREE API is not a statistically representative sample (try the Garden Hose API)
• Strengthen the model by continuously training it with Twitter’s false negatives
• Benchmark Incremental “Pareto” NMF vs HDP and LDA to further improve the heuristic algorithm
• Improve corpus summarization with graph theory via tweet “centrality” based on word co-occurrence
21. BotBoosted
Explore and Understand
the Amount of Fabrication
of a Topic in Twitter
Brian Balagot
brian.balagot@gmail.com
https://www.linkedin.com/in/briancbalagot
https://github.com/brityboy
Thank You!
Major References:
• Aritter. (2016). Twitter NLP. GitHub repository. https://github.com/aritter/twitter_nlp
• Azab, A., Idrees, A., Mahmoud, M., Hefny, H. (Nov 1, 2016). Fake Account Detection in Twitter Based on Minimum Weighted Feature set. World Academy of Science, Engineering and Technology International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol:10, No:1, 2016.
• Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M. (2015). Fame for Sale: efficient detection of fake Twitter followers. Retrieved from Cornell University Library. (arXiv:1509.04098)
• Joseph, K., Landwehr, P., Carley, K. (2014). Two 1%s don’t make a whole: Comparing simultaneous samples from Twitter’s Streaming API.
• Karambelkar, B. (2015, Jan 5). How to use Twitter’s Search REST API most effectively.
• Kharde, V., Sonawane, S. (2016). Sentiment Analysis of Twitter Data: A Survey of Techniques. International Journal of Computer Applications, Volume 139, No 11, April 2016.
• Kontaxis, G., Polakis, I., Ioannidis, S., Markatos, E. (2011). Detecting Social Network Profile Cloning. Retrieved from SysSec Consortium. (n.d.).
• Mori, T., Kikuchi, M., Yoshida, K. (2001). Term Weighting Method based on Information Gain Ratio for Summarizing Documents retrieved by IR systems.
• Thomas, K., Grier, C., Paxson, V., Song, D. (2011). Suspended Accounts in Retrospect: An Analysis of Twitter Spam.
22. 1: Lightweight Supervised Classifier
The GOAL: Classify a user, based on their most recent tweet, as real or fake
Features:
• Account History: followers_count, friends_count, total_tweets, total_likes
• Behavior Rate: likes_day, tweets_day, friends_day, likes_friends_day
• Relative_Volume: likes_friends, tweets_friends
Models:
• Account History Random Forest and Behavior Rate Random Forest (10-fold CV accuracy: 97%, 95%)
• Final Random Forest on Account History, Behavior Rate, History Pred %, Rate Pred % (10-fold CV accuracy: 98%)
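The slide names the behavior-rate and relative-volume features but not their formulas. The sketch below shows one way they might be derived from the account-history counts. It assumes each rate is a count divided by the account age in days, and each relative volume is a count divided by friends_count. The `account_age_days` field, the `Account` class and the zero-division guards are illustrative assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Account:
    followers_count: int
    friends_count: int
    total_tweets: int
    total_likes: int
    account_age_days: float  # hypothetical field, not named on the slide


def behavior_features(a: Account) -> Dict[str, float]:
    """Derive rate and relative-volume features (assumed formulas)."""
    days = max(a.account_age_days, 1.0)  # guard for brand-new accounts
    friends = max(a.friends_count, 1)    # guard for accounts with no friends
    likes_friends = a.total_likes / friends
    tweets_friends = a.total_tweets / friends
    return {
        "likes_day": a.total_likes / days,
        "tweets_day": a.total_tweets / days,
        "friends_day": a.friends_count / days,
        "likes_friends": likes_friends,
        "tweets_friends": tweets_friends,
        "likes_friends_day": likes_friends / days,
    }


example = behavior_features(Account(100, 50, 1000, 200, 100.0))
```

In the pipeline on the slide, features like these feed the Behavior Rate Random Forest, whose predicted probability is then passed with the Account History prediction to the final Random Forest.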
23. 3: Corpus Summarizer
The GOAL: extract the most important tweet that captures the subtopic
Tweet Corpus (labeled) → Tweet Tokenizer → Normalized Tokens → TFIDF Vectorizer → TFIDF Matrix → Incremental “Pareto” NMF → Topic Count, Document “Soft Clustering”
Word Importance = Term Frequency × Inverse Document Frequency
Feature/Word Importance: Random Forest (Tweets as Bag of Words → Topic Label)
Tweet Importance = Feature/Word Importance × TFIDF Matrix
Summary tweet: Highest Total Word Importance
A word is important if it is helpful in “bucketing” tweets into topics
A tweet is important if it is made up of important words
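The scoring described above can be sketched as a weighted sum. Each tweet's importance is its TF-IDF weights multiplied by the matching word importances, and the highest-scoring tweet in each topic becomes that topic's summary. This is only an illustration: the word importances are passed in as a plain dictionary (in the slides they come from a random forest trained to predict topic labels), and the function names and sample numbers are made up.

```python
from typing import Dict, Hashable, List, Sequence


def tweet_importance(tfidf_row: Dict[str, float],
                     word_importance: Dict[str, float]) -> float:
    """Sum of TF-IDF weight times word importance over the tweet's terms."""
    return sum(weight * word_importance.get(term, 0.0)
               for term, weight in tfidf_row.items())


def summarize(tweets: Sequence[str],
              tfidf_rows: Sequence[Dict[str, float]],
              topics: Sequence[Hashable],
              word_importance: Dict[str, float]) -> Dict[Hashable, str]:
    """Return the highest-scoring tweet for every topic."""
    best = {}
    for text, row, topic in zip(tweets, tfidf_rows, topics):
        score = tweet_importance(row, word_importance)
        if topic not in best or score > best[topic][0]:
            best[topic] = (score, text)
    return {topic: text for topic, (_, text) in best.items()}


tweets: List[str] = [
    "smoothie to lose weight",
    "lose weight",
    "bedtime habits",
    "habits to lose weight fast",
]
tfidf_rows = [
    {"smoothie": 0.8, "lose": 0.3, "weight": 0.3},
    {"lose": 0.7, "weight": 0.7},
    {"bedtime": 0.9, "habits": 0.4},
    {"habits": 0.5, "lose": 0.4, "weight": 0.4, "fast": 0.6},
]
topics = [0, 0, 1, 1]
importance = {"smoothie": 0.5, "weight": 0.3, "lose": 0.2,
              "habits": 0.4, "bedtime": 0.1}
summary = summarize(tweets, tfidf_rows, topics, importance)
```

Words missing from the importance table contribute nothing, so a tweet made up of unimportant words scores low even when its TF-IDF weights are high.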