A deleted tweet is a tweet that has been removed from Twitter by its author. The presentation analyzes deleted tweets from Breitbart News and finds that most resulted from the automatic deletion of tweets more than a week old by Breitbart affiliates. It also notes that pushing controversial tweets to web archives retains them longer than personal archives do, and that top-level Twitter pages are better than individual tweet pages for finding the deleted tweets of popular accounts.
Twinder: A Search Engine for Twitter Streams Ke Tao
Twinder is a search engine for Twitter streams that analyzes 13 features to determine the relevance and interestingness of tweets for a given topic. It demonstrates scalability using MapReduce and cloud computing. Analysis shows that models using semantics and topic-sensitive features outperform those without. Contextual user features have little impact, and feature importance depends on topic characteristics. Overall, Twinder achieves over 35% precision and 45% recall using all features.
Replaying Archived Twitter: When your bird is broken, will it bring you down? Kritika Garg
This document summarizes a study on analyzing labeled tweets from former US President Donald Trump's suspended @realDonaldTrump Twitter account that are archived in web archives. The study found that:
- The majority (93%) of archived mementos were of the old Twitter UI, which did not fully capture labels like "fact-check" and only gradually added the "Violated Twitter Rules" label starting in late August 2020.
- Nearly all (97%) of mementos for the 476 labeled tweets studied did not display the tweets' labels.
- Around half (49%) of mementos using the new Twitter UI were temporally out of order, with the page timestamp preceding the timestamp of the embedded tweet JSON.
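The temporal-order check in the last finding above can be sketched in a few lines of Python. The timestamp formats are assumptions for illustration (14-digit Wayback-style archival timestamps and ISO 8601 tweet times), not taken from the study itself:

```python
from datetime import datetime, timezone

def is_temporal_violation(memento_ts: str, tweet_created_at: str) -> bool:
    """True if the memento's archival timestamp precedes the creation time
    of the tweet embedded in it -- a page that could never have existed live.

    memento_ts: 14-digit Wayback-style timestamp, e.g. "20200901123456"
    tweet_created_at: ISO 8601 string, e.g. "2020-09-02T10:00:00+00:00"
    """
    page_time = datetime.strptime(memento_ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    tweet_time = datetime.fromisoformat(tweet_created_at)
    return page_time < tweet_time

# A page claiming to be from Sept 1 that embeds a tweet created Sept 2 is a violation:
print(is_temporal_violation("20200901123456", "2020-09-02T10:00:00+00:00"))  # True
```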
It’s getting easier to find marketers and brands who have at least one solid case study on how Facebook has resulted in some type of uptick against their marketing KPIs. Yet, in 2012, it seems that Twitter has been all but kicked to the back of the room as a tool we use to talk with our friends and share pictures of kittens. In a recent study, Edison Research projects that while only 10% of Americans regularly interact with Twitter, some 89% of Americans ages 12 and up are at least familiar with the service. Another study, conducted by Trendrr in March of this year, found that Twitter dominates about 85% of all social media activity surrounding broadcast TV. In this session, Nate Riggs will show you why, as a social network and marketing tool, Twitter will win. He’ll demonstrate why this micro-blogging marketplace is worthy of the investment of your time and dollars, and share strategic tips, insights and tactical tools that will help you take advantage of the serendipity of Twitter.
Swap2010 Twitter mining using semantic web technologies and linked data Selver Softic
This document proposes an architecture for mining microblogs using semantic technologies. It discusses acquiring tweets from services like Twitter, triplifying the data into RDF triples, and interlinking the tweets with concepts in the Linked Open Data cloud. This semantic representation of microblog data could then enable more advanced analysis and applications over the unified, structured social media data.
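The triplification and interlinking steps described above can be sketched in Python. The SIOC vocabulary and the specific property names used here are illustrative assumptions, not necessarily the ontology the paper adopts:

```python
def triplify_tweet(tweet_uri, text, author_uri, topic_uri):
    """Render one tweet as N-Triples lines. SIOC is a common vocabulary for
    social media posts, but the exact properties are assumptions here."""
    SIOC = "http://rdfs.org/sioc/ns#"
    RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    return [
        f"<{tweet_uri}> <{RDF_TYPE}> <{SIOC}Post> .",
        f'<{tweet_uri}> <{SIOC}content> "{text}" .',
        f"<{tweet_uri}> <{SIOC}has_creator> <{author_uri}> .",
        # Interlinking step: tie the tweet to a concept in the Linked Open Data cloud.
        f"<{tweet_uri}> <{SIOC}topic> <{topic_uri}> .",
    ]

triples = triplify_tweet(
    "https://twitter.com/alice/status/123",
    "Watching the opening ceremony",
    "https://twitter.com/alice",
    "http://dbpedia.org/resource/Olympic_Games",
)
print(len(triples))  # 4
```

In practice a library such as rdflib would manage the graph and serialization; plain strings are used here only to keep the sketch dependency-free.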
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
National Forum on Ethics and Archiving the Web
2018-03-23, #eaw18, @phonedude_mln
Challenges in Replaying Archived Twitter Pages Kritika Garg
Historians and researchers rely on web archives to preserve social media content that no longer exists on the live web. However, what we see on the live web and how it is replayed in the archive are not always the same. In this study, we document and analyze the problems in archiving Twitter after Twitter switched to a new user interface (UI) in June 2020. Most web archives were unable to archive the new UI, resulting in archived Twitter pages displaying Twitter’s “Something went wrong” error. The challenges in archiving the new UI forced web archives to continue using the old UI. However, features such as Twitter labels were part of the new UI, so web archives capturing Twitter’s old UI would be missing these labels. To analyze the potential loss of information in web archival data due to this change, we used the personal Twitter account of the 45th President of the United States, @realDonaldTrump, which was suspended by Twitter on January 8, 2021. Trump’s account was heavily labeled by Twitter for spreading misinformation; however, we discovered that there is no evidence in web archives to prove that some of his tweets ever had a label assigned to them. We also studied the possibility of temporal violations in archived versions of the new UI, which may result in the replay of pages that never existed on the live web. We also discovered that when some tweets with embedded media are replayed, portions of the rewritten t.co URL, which are meant to be hidden from the end user, are exposed in the replayed page. Our goal is to educate researchers who may use web archives and caution them when drawing conclusions based on archived Twitter pages.
Web Archives for Verifying Attribution in Twitter Screenshots Tarannum Zaki
Users rarely think about verifying screenshots of social media posts before sharing them on social media. This eventually leads to the spread of misinformation and disinformation. We are developing an automated tool to estimate the probability that a screenshot of a social media post is fake. In many cases, web archives can be used to validate the attribution of such screenshots.
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
A glimpse into what social media is all about and how researchers around the world are using it. Social media is not mere hype, nor just a platform for leveraging word-of-mouth practices, as is the common perception of it in Pakistan: it is much more than that, and that is what this talk presented.
MINING OPINIONS ABOUT TRAFFIC STATUS USING TWITTER MESSAGES IAEME Publication
The document describes a study that aimed to develop a system for mining opinions from tweets about traffic status. Over 5,000 traffic-related tweets were collected and preprocessed by removing stop words and punctuation. The tweets were manually labeled as expressing either positive (p) or negative (n) sentiment. Various classifiers were trained on 80% of the labeled data and validated. The top 7 performing classifiers were combined into an ensemble model, which achieved an F-measure of 87.15% when classifying the remaining 20% of tweets, indicating the system is effective at mining traffic sentiment from tweets.
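The preprocessing and ensemble steps described above can be sketched with stdlib Python. The stop-word list is a toy one and the seven classifier votes are hypothetical; the paper's actual classifiers are not reproduced here:

```python
import string
from collections import Counter

STOP_WORDS = {"the", "is", "a", "at", "on"}  # toy list; real systems use a fuller one

def preprocess(tweet):
    """Lowercase, strip punctuation, and drop stop words, mirroring the
    preprocessing described in the summary."""
    cleaned = tweet.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOP_WORDS]

def ensemble_label(votes):
    """Combine per-classifier labels ('p'/'n') by majority vote. With an odd
    number of voters, like the paper's top 7 classifiers, ties cannot occur."""
    return Counter(votes).most_common(1)[0][0]

print(preprocess("Traffic is heavy at the bridge!"))  # ['traffic', 'heavy', 'bridge']
print(ensemble_label(["p", "n", "p", "p", "n", "p", "n"]))  # p
```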
DataKind SG sharing on our first DataDive with Humanitarian Organization for Migration Economics (HOME) and Earth Hour.
Know of other non-profits we can help? Reach out to singapore@datakind.org or drop me a note =)
This document discusses using Twitter and Python for open-source intelligence (OSINT) gathering. It provides an overview of Twitter concepts and the Twitter API. It also demonstrates how to use the Python library Tweepy to access Twitter data and analyze tweets. Specific analyses demonstrated include visualizing hashtags, retweets, replies and interactions over time. The goal is to gather intelligence on individuals, groups, topics and markets from public Twitter data.
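A minimal offline sketch of the kind of analysis described above. In practice the tweet texts would come from the Twitter API via Tweepy; here they are plain strings so the example runs without credentials:

```python
import re
from collections import Counter

def count_entities(tweets):
    """Tally hashtags and @-mentions across a list of tweet texts --
    the raw material for the hashtag and interaction visualizations
    mentioned in the talk."""
    hashtags, mentions = Counter(), Counter()
    for text in tweets:
        hashtags.update(h.lower() for h in re.findall(r"#\w+", text))
        mentions.update(m.lower() for m in re.findall(r"@\w+", text))
    return hashtags, mentions

tweets = [
    "Great thread on #OSINT by @example_analyst",
    "#osint tooling keeps improving #python",
]
tags, ats = count_entities(tweets)
print(tags.most_common(1))  # [('#osint', 2)]
```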
This document discusses congressional deleted tweets and tools for tracking them. It provides links to deleted tweets from members of Congress as well as tools like Politwoops and the Social Feed Manager that archive deleted tweets from public officials. It also notes that web archives can include deleted social media posts and highlights the challenge of capturing all deleted content from a single service.
Uncertainty in replaying archived Twitter pages Michael Nelson
Michael L. Nelson
@phonedude_mln
with: Sawood Alam, Kritika Garg, Himarsha Jayanetti,
Shawn M. Jones, Nauman Siddique, Michele C. Weigle
@WebSciDL
Ethics and Archiving the Web: How to ethically collect and use web archives
2021-03-30
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re... Farida Vis
Keynote delivered at the SRA Social Media in Social Research conference, London, 24 June, 2013. The presentation highlights some thoughts on sampling, tools, data, ethics and user requirements for Twitter analytics, including an overview of a series of recent tools.
From Chirps to Whistles - Discovering Event-specific Informative Content from... Debanjan Mahata
Twitter has brought a paradigm shift in the way we produce and curate information about real-life events. Huge volumes of user-generated tweets related to events are produced on Twitter. Not all of them are useful and informative. A sizable number of tweets are spam or colloquial personal status updates, which do not provide any useful information about an event. Thus, it is necessary to identify, rank and segregate event-specific informative content from the tweet streams. In this paper, we develop a novel generic framework based on the principle of mutual reinforcement for identifying event-specific informative content from Twitter. Mutually reinforcing relationships between tweets, hashtags, text units, URLs and users are defined and represented using TwitterEventInfoGraph. An algorithm, TwitterEventInfoRank, is proposed that simultaneously ranks tweets, hashtags, text units, URLs and the users producing them in terms of event-specific informativeness, leveraging the semantics of the relationships between them as represented by TwitterEventInfoGraph. Experiments and observations are reported on approximately four million tweets collected for five real-life events and evaluated against popular baseline techniques, showing significant improvement in performance.
Social Cards Probably Provide For Better Understanding Of Web Archive Collect... Shawn Jones
Presented at ACM CIKM 2019. Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search engine results and social media links are represented as surrogates, small easily digestible summaries of the underlying page. Search engines and social media have a different focus, and hence produce different surrogates than web archives. Search engine surrogates help a user answer the question "Will this link meet my information need?" Social media surrogates help a user decide "Should I click on this?" Our use case is subtly different. We hypothesize that groups of surrogates together are useful for summarizing a collection. We want to help users answer the question of "What does the underlying collection contain?" But which surrogate should we use? With Mechanical Turk participants, we evaluate six different surrogate types against each other. We find that the type of surrogate does not influence the time to complete the task we presented the participants. Of particular interest are social cards, surrogates typically found on social media, and browser thumbnails, screen captures of web pages rendered in a browser. At p=0.0569, and p=0.0770, respectively, we find that social cards and social cards paired side-by-side with browser thumbnails probably provide better collection understanding than the surrogates currently used by the popular Archive-It web archiving platform. We measure user interactions with each surrogate and find that users interact with social cards less than other types. The results of this study have implications for our web archive summarization work, live web curation platforms, social media, and more.
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
This document summarizes a presentation on how web archives could be weaponized to alter trustworthy content or obfuscate the provenance of untrustworthy content. As web archiving and audio/video synthesis techniques advance, it will become easier to generate misleading evidence from archived web pages and media. However, web archives are currently unreliable witnesses due to issues like zombies, temporal violations, and potential attacks. Relying on a single archive is not sufficient to authenticate web content.
This handout by Sona Patel accompanies slides -- Using social media for reporting -- presented by Julie Patel Liss and Seth Liss at Fresno NewsTrain. Sona Patel is director of community at The New York Times. For more information about the News Leaders Association's NewsTrain, please see https://www.newsleaders.org/newstrain.
The document discusses social media security challenges related to cognition, cross-platform issues, and push algorithms. It covers topics like abuse targeting internal or external victims, security issues on social media, and the life cycle and influence of social media posts. Detection of multiple accounts and geolocation identification on social media are also summarized.
1. GRAMPA, WHAT'S A DELETED TWEET?
Mohammed Nauman Siddique
Web Archiving Forensics (CS 895)
Spring, 2019
Web Science and Digital Libraries Group
Old Dominion University
Norfolk, Virginia, USA
@WebSciDL
2. Presidential tweets are now government records
@m_nsiddique, @WebSciDL 2
Source: https://web.archive.org/web/20170121171210/http:/twitter.com/realDonaldTrump/status/822853741040771072
News Article: https://theconversation.com/donald-trumps-tweets-are-now-presidential-records-71973
3. 11% of social media resources are lost in their first year
Source: SalahEldeen H.M., Nelson M.L. (2012) Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost? TPDL 2012. Springer, Berlin, Heidelberg
Blog Link: http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
4. Politwoops: Tracks deleted tweets by public officials
Source: https://projects.propublica.org/politwoops/
5. The best way to find a typo is to hit send
Source: https://projects.propublica.org/politwoops/tweet/1056626382548156416
6. Fixing typos only introduces more typos
Source: https://twitter.com/RepDannyDavis/status/1056627582148530177
7. Unretweeted after a year!!!
Source: https://projects.propublica.org/politwoops/tweet/910352940749254657
9. Politwoops resumes after 6 months
Tweet is deleted
Source: https://blog.twitter.com/official/en_us/a/2015/holding-public-officials-accountable-with-twitter-and-politwoops.html
10. Flight handle is gone
Source: https://twitter.com/Flight/status/656882929923059713
11. No worries, web archives come to the rescue
Source: https://web.archive.org/web/20160205000405/https://twitter.com/Flight/status/656882929923059713
12. Web archives include social media too
Source: https://web.archive.org/web/20180929210711/https:/twitter.com/RepDannyDavis
13. Nauman, you are not archived
Source: https://web.archive.org/web/*/https://twitter.com/m_nsiddique
14. @BreitbartNews is well archived
Source: https://web.archive.org/web/*/https://twitter.com/BreitbartNews
15. @realDonaldTrump is very heavily archived
Source: https://web.archive.org/web/*/https://twitter.com/realDonaldTrump
16. Archival captures of top-level pages contain approximately 20 tweets
Source: https://web.archive.org/web/20190202074656/https:/twitter.com/realDonaldTrump
17. Tweet ID URLs capture just a single tweet
Source: https://web.archive.org/web/20190202054351/https://twitter.com/realdonaldtrump/status/1091427927475085312
18. Not enough to take screenshots
Source: https://twitter.com/CasMudde/status/960546130684768256
News Article: https://www.huffingtonpost.com/entry/breitbart-anti-muslim-tweet_us_5a78b426e4b0164659c70e15
21. How did we find the deleted tweets?
• Used the Twitter API to fetch the most recent 3,200 tweets
• Tweets spanned from Oct 22, 2017 to Feb 18, 2018
• Used MemGator, a Memento aggregator service, to fetch mementos
22. Code to fetch recent tweets using python-twitter
import twitter

api = twitter.Api(consumer_key='xxxxxx',
                  consumer_secret='xxxxxx',
                  access_token_key='xxxxxx',
                  access_token_secret='xxxxxx',
                  sleep_on_rate_limit=True)

twitter_response = api.GetUserTimeline(screen_name=screen_name,
                                       count=200, include_rts=True)
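The call above returns at most 200 tweets per request, while slide 21 mentions fetching the most recent 3,200. A minimal sketch of paging backwards with `max_id` to cover the full window (the helper function and loop are ours, not from the slides; `api` is assumed to be configured as above):

```python
# Hypothetical sketch: repeatedly call GetUserTimeline, stepping max_id
# past the oldest tweet seen, until the ~3,200-tweet timeline limit.
def fetch_recent_tweets(api, screen_name, limit=3200):
    tweets, max_id = [], None
    while len(tweets) < limit:
        batch = api.GetUserTimeline(screen_name=screen_name, count=200,
                                    include_rts=True, max_id=max_id)
        if not batch:
            break  # timeline exhausted
        tweets.extend(batch)
        max_id = batch[-1].id - 1  # step past the oldest tweet fetched
    return tweets[:limit]
```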
23. Run MemGator locally
$ memgator --contimeout=10s --agent=XXXXXX server
MemGator 1.0-rc7
[MemGator ASCII art banner]
TimeMap  : http://localhost:1208/timemap/{FORMAT}/{URI-R}
TimeGate : http://localhost:1208/timegate/{URI-R} [Accept-Datetime]
Memento  : http://localhost:1208/memento[/{FORMAT}|proxy]/{DATETIME}/{URI-R}
# FORMAT => link|json|cdxj
# DATETIME => YYYY[MM[DD[hh[mm[ss]]]]]
# Accept-Datetime => Header in RFC1123 format
Source: https://github.com/oduwsdl/MemGator
26. Play with TimeMap and TimeGate
Source: http://memgator.cs.odu.edu/api.html
27. Code to fetch TimeMap for any Twitter handle
import requests

url = "http://localhost:1208/timemap/"
data_format = "cdxj"
command = url + data_format + "/http://twitter.com/<screen-name>"
response = requests.get(command)
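The response above is a TimeMap in CDXJ format. A sketch of pulling (timestamp, URI-M) pairs out of it, assuming MemGator-style CDXJ where metadata lines begin with '!' or '@' and memento lines begin with a 14-digit datetime followed by a JSON object carrying a "uri" key (this field layout is our assumption, not stated on the slides):

```python
import json

def parse_cdxj_timemap(text):
    """Return (14-digit datetime, URI-M) pairs from a CDXJ TimeMap."""
    mementos = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "!@":
            continue  # skip metadata lines
        key, _, payload = line.partition(" ")
        if key.isdigit() and payload.startswith("{"):
            mementos.append((key, json.loads(payload)["uri"]))
    return mementos
```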
28. Code to parse tweet-related information
import bs4

soup = bs4.BeautifulSoup(open(<HTML representation of Memento>), "html.parser")
match_tweet_div_tag = soup.select('div.js-stream-tweet')
for tag in match_tweet_div_tag:
    if tag.has_attr("data-tweet-id"):
        # Get tweet id
        ...
        # Parse tweets
        match_timeline_tweets = tag.select('p.js-tweet-text.tweet-text')
        ...
        # Parse tweet timestamps
        match_tweet_timestamp = tag.find("span", {"class": "js-short-timestamp"})
        ...
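Once tweet IDs have been parsed out of the mementos, a deleted tweet is simply an ID seen in the archives but absent from the live timeline, i.e., a set difference. A minimal sketch (the function name is ours, not from the slides):

```python
def find_deleted_tweets(archived_ids, live_ids):
    """Tweet IDs present in web-archive mementos but missing from the live account."""
    return sorted(set(archived_ids) - set(live_ids))
```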
29. Analysis of Breitbart News Deleted Tweets
• Of the 22 deleted tweets, 20 were cases where Breitbart News had retweeted someone's tweet and the original tweet was later deleted.
• Of those 20 tweets, 18 were from two affiliates of Breitbart News, @NolteNC and @carney, so we looked at both accounts to determine the reason for their deleted tweets.
33. Analysis of @carney and @NolteNC
• Mementos fetched between Nov 3, 2017 and Feb 17, 2018
• Low number of mementos for @carney
• @NolteNC had 169 live tweets and 3,569 deleted tweets
• Fetched live tweets via the Twitter API for both accounts for over two weeks
34. Tweets older than a week are deleted on Tuesdays and Saturdays
35. Tweets older than a week are deleted on Wednesdays and Saturdays
36. Deletion Behavior
• With thousands of deleted tweets, it seemed unlikely that the tweets were being deleted manually.
• We have every reason to believe that @carney and @NolteNC deleted tweets automatically using some tweet-deletion service.
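The weekly deletion pattern described on slides 34-36 can be surfaced by bucketing observed deletion times by weekday; sharp spikes on fixed days suggest an automated service rather than manual deletion. A small illustrative sketch (ours, with hypothetical input):

```python
from collections import Counter

def deletion_weekdays(deletion_datetimes):
    """Count observed deletions per weekday to expose periodic bulk deletions."""
    return Counter(dt.strftime("%A") for dt in deletion_datetimes)
```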
37. Take Away
• It is not enough to take screenshots of controversial tweets; we also need to push them to web archives, which offer longer retention than our personal archives.
• For finding deleted tweets, web archives work well for popular accounts, but this approach might not work for less popular accounts.
• For finding deleted tweets, top-level pages work better than individual tweet ID URLs.
• Most deletions for Breitbart News come from the automatic deletion of tweets by some of its correspondents.
38. You can read more on the blog
http://ws-dl.blogspot.com/2018/04/2018-04-23-grampa-whats-deleted-tweet.html