Users rarely verify screenshots of social media posts before sharing them, which contributes to the spread of misinformation and disinformation. We are developing an automated tool to estimate the probability that a screenshot of a social media post is fake. In many cases, web archives can be used to validate the attribution of such screenshots.
Web Archives for Verifying Attribution in Twitter Screenshots
1. Modeling Simulation & Visualization Student Capstone Conference 2024
Web Archives for Verifying Attribution in Twitter Screenshots
Track: AI and Autonomous Systems
Authors: Tarannum Zaki, Michael L. Nelson, and Michele C. Weigle
Presented by Tarannum Zaki
Department of Computer Science
Old Dominion University, Norfolk, Virginia
April 11, 2024
2. Screenshots are commonly used to annotate the social media of others
Tarannum Zaki MSVSCC 2024 Web Archives for Verifying Attribution in Twitter Screenshots @tarannum_zaki @WebSciDL
https://twitter.com/BetteMidler/status/1541472225341198338
https://twitter.com/MahyarTousi/status/1534307163073658881
https://twitter.com/urbanachievr/status/1505944201208516612
3. Why screenshots?
To use as evidence of deleted posts
https://web.archive.org/web/20220525125749/https://twitter.com/DanielDefense/status/1526237750277681154
Controversial posts may be deleted.
https://twitter.com/ashtonpittman/status/1530243294868930560
https://twitter.com/DanielDefense/status/1526237750277681154
Other reasons: to deny cross-platform engagement, to aggregate posts, to mark up content, etc.
4. Did they really post that?
Screenshots can also be used for humor, satire, and disinformation
https://twitter.com/Shayan86/status/1515753937139388418
https://twitter.com/paulthacker11/status/1495436489492090881
https://twitter.com/elonmusk/status/1544051155562598401
5. Creating fake tweets using Tweetgen
https://www.tweetgen.com/
https://www.tweetgen.com/create/tweet.html
6. Motivation
➢ Fake tweets can be responsible for misinformation/disinformation spread.
➢ Fake tweets are easy to create using online tools.
➢ There are no tools currently available to evaluate the authenticity of attribution of screenshots.
7. Aim
To develop a tool that automatically estimates the probability that a screenshot of a social media post is fake, using the services of web archives.
8. To search for a tweet in the Wayback Machine, you must first know its URL
https://web.archive.org/web/20220323185843/https://twitter.com/annaturley/status/1506706947239817224
URL of the tweet:
https://twitter.com/annaturley/status/1506706947239817224
https://web.archive.org/
9. But the URL of a tweet is not present in most screenshots
https://twitter.com/AaronBastani/status/1507391218854117377
@annaturley
March 23, 2022
March 25, 2022
https://twitter.com/TWITTER_HANDLE/status/TWEET_ID
https://web.archive.org/web/20220323185843/https://twitter.com/annaturley/status/1506706947239817224
The tweet ID encodes the timestamp of when the tweet was created.
Construction of a tweet URL
- Use the Twitter handle and approximate a time window based on the timestamp.
- Construct the URL for the tweet.
- Search for the tweet in the Wayback Machine using the URL.
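As a minimal sketch of the URL pattern on this slide (https://twitter.com/TWITTER_HANDLE/status/TWEET_ID), the two helpers below build a live tweet URL and the matching Wayback Machine URL. The function names are hypothetical and chosen only for illustration; the handle, tweet ID, and archive timestamp in the usage lines are the ones shown on the slides.

```python
def tweet_url(handle: str, tweet_id: str) -> str:
    """Build the live-web URL of a tweet from a handle and a tweet ID."""
    # Accept the handle with or without the leading "@".
    return f"https://twitter.com/{handle.lstrip('@')}/status/{tweet_id}"


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine URL for a 14-digit YYYYMMDDhhmmss timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


live = tweet_url("@annaturley", "1506706947239817224")
archived = wayback_url(live, "20220323185843")
print(live)
print(archived)
```

In practice the tweet ID is the unknown part, which is why the later slides search by handle prefix and time window instead of by exact URL.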
10. Process to verify whether content of a screenshot exists in the Wayback Machine
11. Creating a dataset of screenshots collected from Twitter
Fields:
- Shared post’s URL
- Original post’s URL
- Category
- Reason
- Content category
- Structural features
- Post type
- Social media
- Search strategy
- Annotated images
- Screenshot
- Remarks
- Screenshot images shared on Twitter.
- 200 examples
- Examples include both real and fake screenshots
https://ws-dl.blogspot.com/2022/12/2022-12-12-disinformation-spread-on.html
https://twitter.com/rvawonk/status/1503227687917305863
https://twitter.com/RealCandaceO/status/1501576352587292673
Category: Real
Reason: Found in the live web
Content category: Politics
Post Type: Tweet
Structural features: Single author, single post
Search strategy: Searched on Twitter interface
Social media: Twitter
Original post’s URL
Shared post’s URL
Screenshot
12. OCRing screenshots: Single tweet images
OCR
Optical Character Recognition (OCR) extracts the information in a digital image as text.
Example screenshot image OCR extracted output
Twitter Handle
Timestamp
Tweet Text
Zaki, T., Nelson, M.L., and Weigle, M.C. (2023, Jun 14). Extracting Information from Twitter Screenshots. Tech Report arXiv:2306.08236. https://doi.org/10.48550/arXiv.2306.08236
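To illustrate how the three fields named on this slide (Twitter handle, timestamp, tweet text) might be pulled out of raw OCR output, here is a small sketch. It is not the extraction method from the cited tech report. It assumes the OCR text contains the handle first and a timestamp of the form "4:41 PM · Feb 19, 2022" last, with the tweet text in between. The regular expressions, the function name, and the sample text are all hypothetical.

```python
import re

# Assumption: a handle is "@" followed by up to 15 word characters.
HANDLE_RE = re.compile(r"@(\w{1,15})")
# Assumption: the screenshot shows a timestamp like "4:41 PM · Feb 19, 2022".
TIMESTAMP_RE = re.compile(
    r"\d{1,2}:\d{2}\s*[AP]M\s*·\s*[A-Z][a-z]{2}\s+\d{1,2},\s*\d{4}"
)


def parse_ocr_output(ocr_text: str) -> dict:
    """Split OCR output into handle, timestamp, and tweet text (best effort)."""
    handle = HANDLE_RE.search(ocr_text)
    stamp = TIMESTAMP_RE.search(ocr_text)
    text = None
    if handle and stamp and handle.end() <= stamp.start():
        # Treat everything between the handle and the timestamp as tweet text,
        # collapsing line breaks introduced by the OCR step.
        text = " ".join(ocr_text[handle.end():stamp.start()].split())
    return {
        "handle": handle.group(1) if handle else None,
        "timestamp": stamp.group(0) if stamp else None,
        "text": text,
    }


# Made-up OCR output, for illustration only.
sample = (
    "Jane Doe\n@example_user\nThis is a sample tweet\n"
    "for illustration.\n4:41 PM · Feb 19, 2022"
)
fields = parse_ocr_output(sample)
print(fields)
```

Real OCR output is noisier than this sample, so a production parser would need to tolerate misread characters and alternative timestamp layouts.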
13. Computing a time window based on the screenshot timestamp
The maximum difference between two time zones on Earth is 26 hours.
Example screenshot image OCR extracted output
Twitter handle and computed timestamps
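A time window of this kind can be sketched as follows. Because the time zone in which the screenshot was taken is unknown, the sketch widens the displayed timestamp by a margin in both directions. The symmetric 26-hour default, the timestamp format string, and the function name are assumptions for illustration; the exact margin the authors apply is not stated on this slide, so it is left as a parameter.

```python
from datetime import datetime, timedelta


def time_window(ocr_timestamp: str, margin_hours: int = 26) -> tuple:
    """Return (start, end) as 14-digit YYYYMMDDhhmmss strings.

    Assumption: the screenshot timestamp looks like "4:41 PM · Feb 19, 2022".
    """
    shown = datetime.strptime(ocr_timestamp, "%I:%M %p · %b %d, %Y")
    margin = timedelta(hours=margin_hours)
    fmt = "%Y%m%d%H%M%S"
    return (shown - margin).strftime(fmt), (shown + margin).strftime(fmt)


start, end = time_window("4:41 PM · Feb 19, 2022")
print(start, end)
```

The 14-digit output format matches the timestamp format used in Wayback Machine URLs, so the two values can be passed directly to an archive query.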
14. Using CDX API to retrieve archived tweets within the time window
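# Prefix of all status URLs for the handle extracted from the screenshot
urir = "https://twitter.com/" + "randyhillier" + "/status"
# Prefix match, restricted to captures archived inside the time window
params = "&matchType=prefix&from=" + "20220218154100" + "&to=" + "20220220174100"
request = "http://web.archive.org/cdx/search/cdx?url=" + urir + params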
CDX API prefix search process
Twitter handle and computed timestamps
Output: Retrieved archived tweets within the timeframe (cropped).
https://archive.org/help/wayback_api.php
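The prefix search on this slide can be wrapped in two small helpers, sketched below: one builds the CDX request URL, the other parses the plain-text response. The helper names are hypothetical. The parser assumes the CDX server's default space-separated field order (urlkey, timestamp, original, mimetype, statuscode, digest, length). The sample response line reuses an archived URL from the slides, but its digest and length values are placeholders, not real data.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"
# Assumption: default CDX output columns, in this order.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length")


def build_cdx_request(handle: str, start: str, end: str) -> str:
    """Build a prefix-search request for all archived tweets of one handle."""
    params = {
        "url": f"https://twitter.com/{handle}/status",
        "matchType": "prefix",
        "from": start,
        "to": end,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_lines(body: str) -> list:
    """Turn the plain-text CDX response into a list of dicts, one per capture."""
    rows = []
    for line in body.splitlines():
        parts = line.split()
        if len(parts) == len(CDX_FIELDS):  # skip blank or malformed lines
            rows.append(dict(zip(CDX_FIELDS, parts)))
    return rows


request = build_cdx_request("randyhillier", "20220218154100", "20220220174100")

# Placeholder response line: digest and length are illustrative only.
sample_body = (
    "com,twitter)/randyhillier/status/1006984708109099008 20220218163926 "
    "https://twitter.com/randyhillier/status/1006984708109099008 "
    "text/html 200 EXAMPLEDIGEST 12345\n"
)
captures = parse_cdx_lines(sample_body)
print(request)
print(captures[0]["timestamp"], captures[0]["original"])
```

The request URL could then be fetched with any HTTP client; the network call is left out so the sketch runs offline.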
15. Extracting tweet IDs and determining tweet creation timestamp using TweetedAt
https://web.archive.org/web/20220218163926/https://twitter.com/randyhillier/status/1006984708109099008
https://ws-dl.blogspot.com/2019/08/2019-08-03-tweetedat-finding-tweet.html
Each tweet ID encodes its creation timestamp
An archived tweet’s URL
https://oduwsdl.github.io/tweetedat/#1006984708109099008
Mapping between all the tweet IDs and tweet creation timestamps:
Tweet ID               Tweet Creation Date
1006984708109099008    20180613194037
…………                   …………..
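For Snowflake-style tweet IDs, the creation time can be recovered with a bit shift: the bits above the lowest 22 hold the number of milliseconds since Twitter's custom epoch (1288834974657 ms after the Unix epoch). The sketch below applies that to the tweet ID from this slide. It is a simplified stand-in for TweetedAt and covers only Snowflake IDs, not older pre-Snowflake IDs. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Twitter's Snowflake epoch, in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657


def tweet_creation_stamp(tweet_id: int) -> str:
    """Return the creation time (UTC) of a Snowflake tweet ID as YYYYMMDDhhmmss."""
    # The lowest 22 bits hold worker and sequence numbers; drop them.
    created_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    created = datetime.fromtimestamp(created_ms // 1000, tz=timezone.utc)
    return created.strftime("%Y%m%d%H%M%S")


print(tweet_creation_stamp(1006984708109099008))
```

For the ID above, this yields the same creation timestamp as the mapping on the slide.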
16. Determining the final set of archived tweets by filtering the tweet creation timestamps within the time window
https://web.archive.org/web/20220218163926/https://twitter.com/randyhillier/status/1006984708109099008
An archived tweet’s URL
Timestamp when the tweet was archived
Tweet ID encoding the tweet creation timestamp: 20180613194037
The archived timestamp of the tweet falls within the timeframe, but the tweet creation timestamp does not fall within the timeframe.
So, such archived tweets can be filtered out.
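The filtering step can be sketched as below: extract the tweet ID from each archived URL, decode its creation time, and keep only the tweets created inside the window. The function names are hypothetical, the decoding assumes Snowflake-style IDs, and the two URLs and the window bounds are the ones that appear on the slides.

```python
import re
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch in Unix milliseconds


def creation_stamp(tweet_id: int) -> str:
    """Decode a Snowflake tweet ID into a YYYYMMDDhhmmss string (UTC)."""
    created_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    created = datetime.fromtimestamp(created_ms // 1000, tz=timezone.utc)
    return created.strftime("%Y%m%d%H%M%S")


def filter_by_creation(archived_urls: list, start: str, end: str) -> list:
    """Keep archived tweet URLs whose tweet was created within [start, end]."""
    kept = []
    for url in archived_urls:
        match = re.search(r"/status/(\d+)", url)
        # 14-digit timestamps of equal length compare correctly as strings.
        if match and start <= creation_stamp(int(match.group(1))) <= end:
            kept.append(url)
    return kept


urls = [
    "https://web.archive.org/web/20220218163926/https://twitter.com/randyhillier/status/1006984708109099008",
    "https://web.archive.org/web/20220220024223/https://twitter.com/randyhillier/status/1495226962058649603",
]
final_set = filter_by_creation(urls, "20220218154100", "20220220174100")
print(final_set)
```

The first URL is dropped because its tweet was created in 2018, even though the capture itself was made inside the window.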
17. Extracting tweet text from archived tweets using BeautifulSoup and Selenium
https://web.archive.org/web/20220220024223/https://twitter.com/randyhillier/status/1495226962058649603
TweetTextSize TweetTextSize--jumbo js-tweet-text tweet-text
An archived tweet’s URL
Extracted text from archived tweet
HTML tag containing the tweet text
https://www.selenium.dev/
https://pypi.org/project/beautifulsoup4/
Selenium automates web scraping and BeautifulSoup parses text from HTML.
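The sketch below shows the idea of locating the element whose class list includes tweet-text and collecting its text. To stay runnable without extra packages, it uses Python's built-in html.parser instead of the Selenium and BeautifulSoup combination named on this slide, so it is a stand-in for the technique rather than the authors' code. The parser class, the function name, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class TweetTextParser(HTMLParser):
    """Collect the text of the first element whose class includes tweet-text."""

    def __init__(self, target_class: str = "tweet-text"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # nesting depth inside the target element
        self.done = False    # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_tweet_text(html: str) -> str:
    parser = TweetTextParser()
    parser.feed(html)
    return " ".join("".join(parser.chunks).split())


# Made-up HTML using the class names shown on the slide.
sample_html = (
    '<div><p class="TweetTextSize TweetTextSize--jumbo js-tweet-text tweet-text">'
    'Hello <a href="#">@world</a>!</p><p>footer</p></div>'
)
print(extract_tweet_text(sample_html))
```

Archived pages that render the tweet with JavaScript would still need a browser automation step before any HTML parsing, which is the role Selenium plays in the slide's pipeline.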
18. Computing text similarity score between tweet text from screenshot and archived tweets using Python’s difflib library
https://docs.python.org/3/library/difflib.html
Example screenshot image Extracted text from archived tweet Extracted tweet text from screenshot
match_score(Archived_Tweet_Text, Screenshot_Tweet_Text) = 81.40%
The text similarity score is computed from the longest contiguous matching subsequences shared by the two texts.
Archived tweet text     Screenshot tweet text    match_score
Archived_Tweet_Text1    Screenshot_Tweet_Text    81.40%
Archived_Tweet_Text2    Screenshot_Tweet_Text    30.78%
Archived_Tweet_Text3    Screenshot_Tweet_Text    5.67%
……………..
A match score of 81.40% is evidence that the alleged author actually posted the tweet shown in the screenshot.
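A similarity score of this kind can be sketched with difflib's SequenceMatcher, whose ratio() returns a value between 0 and 1. The slides name the difflib library but not the exact call, so the use of SequenceMatcher here is an assumption, as are the function names and the sample strings. The default threshold of 60% is the best-performing value reported in the evaluation slide.

```python
from difflib import SequenceMatcher


def match_score(archived_text: str, screenshot_text: str) -> float:
    """Similarity between two texts as a percentage (0 to 100)."""
    return 100 * SequenceMatcher(None, archived_text, screenshot_text).ratio()


def best_match(screenshot_text: str, archived_texts: list, threshold: float = 60.0):
    """Return (score, text) for the closest archived tweet.

    The text is None when no candidate reaches the threshold.
    """
    best_score, best_text = 0.0, None
    for candidate in archived_texts:
        score = match_score(candidate, screenshot_text)
        if score > best_score:
            best_score, best_text = score, candidate
    if best_score < threshold:
        return best_score, None
    return best_score, best_text


# Made-up texts, for illustration only.
score, text = best_match("hello world", ["hello world!", "goodbye"])
print(round(score, 2), text)
```

Normalizing case and whitespace before comparison would likely make the score more robust to OCR noise, at the cost of hiding small genuine differences.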
19. A threshold of 60% produced the highest F1 (0.69)
Threshold Value    Precision    Recall    F1 Score
90%                1.00         0.42      0.59
80%                1.00         0.49      0.66
70%                1.00         0.51      0.67
60%                1.00         0.53      0.69
Evaluated on 108 single-tweet images from the collected dataset.
Table: Performance of the overlap between the tweet text from the screenshot and the archived tweets.
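As a quick check of how the F1 column relates to the other two, F1 is the harmonic mean of precision and recall. The sketch below recomputes it for the 60% and 90% rows. The function name is hypothetical. Since the table's precision and recall are themselves rounded, recomputing F1 from them can differ from the reported value in the last digit for some rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(1.00, 0.53), 2))  # 60% threshold row
print(round(f1_score(1.00, 0.42), 2))  # 90% threshold row
```

With precision fixed at 1.00 across all thresholds, the differences in F1 come entirely from recall, which rises as the threshold is lowered.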
20. Summary
➢ Screenshots are an easy way to share content on social media.
➢ Since screenshots can be easily faked, detecting fabricated posts is a critical task.
➢ Services of web archives could be useful to verify attribution of a screenshot by finding an archived version of the screenshot content.
➢ Our research aims to help mitigate the spread of misinformation and disinformation on social media.