Historians and researchers rely on web archives to preserve social media content that no longer exists on the live web. However, what we see on the live web and how it is replayed in the archive are not always the same. In this study, we document and analyze the problems in archiving Twitter after Twitter switched to a new user interface (UI) in June 2020. Most web archives were unable to archive the new UI, resulting in archived Twitter pages displaying Twitter’s “Something went wrong” error. The challenges in archiving the new UI forced web archives to continue using the old UI. However, features such as Twitter labels were part of the new UI, so web archives archiving Twitter’s old UI would be missing these labels. To analyze the potential loss of information in web archival data due to this change, we used the personal Twitter account of the 45th President of the United States, @realDonaldTrump, which was suspended by Twitter on January 8, 2021. Trump’s account was heavily labeled by Twitter for spreading misinformation; however, we discovered that there is no evidence in web archives to prove that some of his tweets ever had a label assigned to them. We also studied the possibility of temporal violations in archived versions of the new UI, which may result in the replay of pages that never existed on the live web. We also discovered that when some tweets with embedded media are replayed, portions of the rewritten t.co URL, which is meant to be hidden from the end-user, are exposed in the replayed page. Our goal is to educate researchers who may use web archives and caution them when drawing conclusions based on archived Twitter pages.
Challenges in Replaying Archived Twitter Pages
1. Challenges in Replaying Archived Twitter Pages
Published in Joint Conference on Digital Libraries (JCDL) 2021
Kritika Garg
Web Science & Digital Libraries Research Group
Department of Computer Science, Old Dominion University
@Kritika_garg @WebSciDL @oducs
Committee Members:
Michael L. Nelson (Advisor), Michele C. Weigle,
Sampath Jayarathna, Jian Wu, Vikas Ganjigunte Ashok
2. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 2
https://doi.org/10.1109/JCDL52503.2021.00028
In 2020, Twitter changed its user interface.
We examined the challenges web archives faced in preserving Twitter after the change.
The observations and results provided in this work are accurate as of the time of this study in 2021. Things may have changed since Twitter's ownership changed in 2022.
https://www.bbc.com/news/technology-63402338
3. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 3
Tweets and accounts on the live web may become unavailable
https://twitter.com/AOC/status/1364623055658635268 https://twitter.com/realDonaldTrump/
4. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 4
Archives allow us to access pages that no longer exist on the live web
URI-R: https://twitter.com/AOC/status/1364623055658635268
URI-M: https://web.archive.org/web/20210224170823/https://twitter.com/AOC/status/1364623055658635268
Memento-Datetime: 20210224170823 (the datetime when the memento was captured)
The archive banner provides details of the capture. For example, this capture is from February 24, 2021.
Web archives rehost the captured page (memento).
All the embeds and outlinked pages are also served from the web archive.
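The relationship between URI-M, URI-R, and Memento-Datetime can be illustrated with a minimal Python sketch. It assumes a Wayback-style URI-M of the form `.../web/<14-digit timestamp>/<URI-R>`; the helper name `split_uri_m` is hypothetical and is not part of the study's tooling.

```python
from datetime import datetime, timezone

def split_uri_m(uri_m: str):
    """Split a Wayback-style URI-M into (Memento-Datetime, URI-R).

    Assumes the layout <archive>/web/<YYYYMMDDhhmmss>/<URI-R>.
    """
    marker = "/web/"
    start = uri_m.index(marker) + len(marker)
    # Everything up to the next "/" is the 14-digit capture timestamp.
    timestamp, uri_r = uri_m[start:].split("/", 1)
    # Wayback timestamps are expressed in UTC.
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, uri_r

uri_m = ("https://web.archive.org/web/20210224170823/"
         "https://twitter.com/AOC/status/1364623055658635268")
captured, uri_r = split_uri_m(uri_m)
print(captured.isoformat(), uri_r)
```

Other archives may use different URI-M layouts, so this parsing should be treated as specific to the Wayback-style pattern shown on the slide.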
5. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 5
Archives allow us to replay the past web of suspended accounts
2009: https://web.archive.org/web/20090702030955/https://twitter.com/realDonaldTrump
2013: https://web.archive.org/web/20130608234757/https://twitter.com/realDonaldTrump
2017: https://web.archive.org/web/20170702084625/https://twitter.com/realDonaldTrump
2020: https://web.archive.org/web/20230407025620/https://twitter.com/realDonaldTrump
Mementos (archived
pages) allow us to
replay the earlier pages
of suspended or
deleted Twitter
accounts from when
they were present on
the live web.
6. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 6
Live web keeps changing, web archives must adjust to keep up
2009: https://web.archive.org/web/20090702030955/https://twitter.com/realDonaldTrump
2013: https://web.archive.org/web/20130608234757/https://twitter.com/realDonaldTrump
2017 (Old UI): https://web.archive.org/web/20170702084625/https://twitter.com/realDonaldTrump
2020 (New UI): https://web.archive.org/web/20230407025620/https://twitter.com/realDonaldTrump
The Twitter user interface (UI) has undergone various changes.
Web archives were affected by the 2020 change because of the vast structural differences between the old and new UI.
7. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 7
Tweets in the old UI are embedded in the HTML, while the new UI requires separate JSON requests to populate content
Old UI: 20 tweets and the Twitter bio are embedded in the root HTML.
New UI: the root HTML contains only a skeleton, and all page sections are served dynamically through API JSON responses. Content is populated with follow-up XHR requests (https://api.twitter.com/2/timeline/profile/25073877.json?..).
8. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 8
Archiving new UI resulted in error or incomplete pages due to
Twitter’s API rate limiting
To archive the new UI, multiple calls
for JSON responses must be issued to
Twitter’s API.
Result: Error or incomplete pages
because of exceeding API rate limit
https://ws-dl.blogspot.com/2020/07/2020-07-15-twitter-was-already.html
9. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 9
Many web archives continued to archive the old UI
by pretending to be “Googlebot”
This technique no longer returns old Twitter UI (last observed on April 10, 2023)
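As a rough illustration of what "pretending to be Googlebot" means at the HTTP level, the sketch below builds (but does not send) a request whose User-Agent header carries a Googlebot identifier. The exact User-Agent string that individual archives sent is an assumption here, and `build_old_ui_request` is a hypothetical helper. As noted above, this approach no longer returns the old UI.

```python
import urllib.request

# A commonly published Googlebot User-Agent string; which string each
# archive's crawler actually used is an assumption, not from the study.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_old_ui_request(uri_r: str) -> urllib.request.Request:
    """Build a request object that identifies itself as Googlebot.

    The request is only constructed, never sent.
    """
    return urllib.request.Request(uri_r, headers={"User-Agent": GOOGLEBOT_UA})

request = build_old_ui_request("https://twitter.com/realDonaldTrump")
print(request.get_header("User-agent"))
```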
10. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 10
Mismatch between what we saw on the live web and
how it was replayed in the web archive
Live web (2020) vs. archived (old user interface): the archived page is missing Twitter's fact-check warning.
https://twitter.com/peterktodd/status/1325549199350435841
Many web archives had difficulty archiving the new UI, so they pretended to be “Googlebot” in order to archive the old UI.
Result: view a page on the live web, archive it, and replay it, and the two do not match.
https://web.archive.org/web/20200529145339/https://twitter.com/realDonaldTrump/status/1266231100780744704
11. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 11
Crucial data, like Twitter labels, in the new UI were not in the old UI
https://twitter.com/realDonaldTrump/status/1313449844413992961
https://twitter.com/realDonaldTrump/status/1265255835124539392
Violated Twitter Rules (VTR) labels: placing a Tweet in violation (controversial content or behavior) behind a tombstone. No engagements!
Fact-check labels: labeling a Tweet that may contain disputed or misleading information.
https://help.twitter.com/en/rules-and-policies/notices-on-twitter
12. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 12
New UI mementos may replay pages that never existed on live web
Aug 18, 2020, 05:52:23 UTC
https://ws-dl.blogspot.com/2020/11/2020-11-04-new-twitter-ui-replaying.html
13. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 13
New UI mementos may replay pages that never existed on live web
71 missing tweets
Aug 18, 2020, 05:52:23 UTC
https://ws-dl.blogspot.com/2020/11/2020-11-04-new-twitter-ui-replaying.html
14. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 14
Archives had difficulty in accurately preserving Twitter in 2020
Historians using web archives for a study of historically significant tweets
made in late 2020 might witness:
1. Mementos displaying the “Something went wrong” error
2. Mementos with different UI for the same URI-R
3. Mementos not displaying labels on disputed or controversial tweets
4. Mementos of Twitter account pages missing tweets
15. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 15
Using @realDonaldTrump to study the impact of Twitter UI
change on web archives
Timeline:
2020-05-01: Twitter stopped supporting its old UI
2021-01-08: Suspension of @realDonaldTrump (https://blog.twitter.com/en_us/topics/company/2020/suspension)
2022-11-19: Elon Musk brings Donald Trump back on Twitter (https://en.wikipedia.org/wiki/Acquisition_of_Twitter_by_Elon_Musk)
No content on the live web for ~2 years, as the account was suspended.
We collected ~8 months of archived data of @realDonaldTrump to quantify the impact of the change.
16. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 16
@realDonaldTrump is well archived
Internet Archive: http://web.archive.org/web/20200701000000*/https://twitter.com/realDonaldTrump
The Trump Archive: https://www.thetrumparchive.com/
Factbase: https://factba.se/trump/
Dedicated third-party archives were available for ground truth.
17. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 17
Twitter’s account page vs. tweet page
Profile/Account page (https://twitter.com/realDonaldTrump): The account page provides details specific to the account's owner, such as their brief description, following, followers, and the recent tweets they published or retweeted.
Tweet page (https://twitter.com/realDonaldTrump/status/1347569870578266115): The tweet page displays a single tweet and its engagement, such as the number of likes, retweets, and replies to the tweet.
18. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 18
~1.3M mementos for 8.7K @realDonaldTrump’s tweets
from 7 web archives
We collected ~8 months of archived data for @realDonaldTrump from 7 web archives. We found 64K mementos of the account page and 1.29M mementos of 8.7K tweets.
Start: 2020-05-01
(Twitter stopped supporting its old UI)
End: 2021-01-08
(Trump’s account suspended)
20. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 20
The old Twitter UI is more prominent in web archives:
93% of the 1.3M mementos were old UI
We separated the mementos into old UI and new UI. The graph shows the distribution of old UI and new UI for account page mementos and tweet page mementos for each month from May 2020 until January 2021.
a) Account page mementos b) Tweet page mementos
21. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 21
Collected 476 labeled tweets of @realDonaldTrump:
450 Fact-check and 26 VTR
1. thetrumparchive.com: https://www.thetrumparchive.com/
2. Factba.se: https://factba.se/topic/flagged-tweets
3. TwitterLabels: https://github.com/oduwsdl/TwitterLabels
[Figure: Number of Tweets (Fact-check, VTR)]
22. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 22
Twitter added VTR label to old UI at least by August 26, 2020
https://ws-dl.blogspot.com/2020/12/2020-12-08-twitter-added-labels-on-its.html
1. The red dot shows when each tweet was created.
2. Before August 26, 2020 (dotted line 1), the mementos do not have labels (blue dot).
3. After September 9, 2020 (dotted line 2), we could see the labels in the mementos (green dot).
23. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 23
“Fact-check” label never existed in old Twitter UI
24. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 24
The new UI mementos can be used to see the labeled tweet.
https://web.archive.org/web/20221122044113/https://twitter.com/realDonaldTrump/status/1265255835124539392
Archived new UI
The Fact-check label no longer exists on the live web (new UI)
Live web: no Twitter Fact-check label
25. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 25
At least 18% of 6.5K new UI mementos replayed the labels
Fact-check: at least 967 out of 5,994 (16%) new UI mementos were working and displayed the Fact-check label.
VTR: at least 213 out of 559 (38%) new UI mementos were working and displayed the VTR label.
Type of label   Tweets   New UI mementos   Working mementos   Mementos with label
Fact-check      450      5,994             1,615              967
VTR             26       559               272                213
Total           476      6,553             1,887              1,180
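The percentages quoted on this slide follow directly from the table. The short Python sketch below recomputes them; the dictionary keys and the helper name `label_share` are illustrative choices, while the counts are copied from the table.

```python
# Counts copied from the table above.
rows = {
    "Fact-check": {"new_ui": 5994, "working": 1615, "with_label": 967},
    "VTR": {"new_ui": 559, "working": 272, "with_label": 213},
}

def label_share(with_label: int, new_ui: int) -> float:
    """Percentage of new UI mementos that replayed the label."""
    return 100 * with_label / new_ui

shares = {name: label_share(r["with_label"], r["new_ui"]) for name, r in rows.items()}

# Totals across both label types.
total_new_ui = sum(r["new_ui"] for r in rows.values())
total_with_label = sum(r["with_label"] for r in rows.values())
total_share = label_share(total_with_label, total_new_ui)

for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
print(f"Total: {total_with_label} of {total_new_ui} ({total_share:.0f}%)")
```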
27. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 27
Analyzed missing tweets in new UI mementos
Time delta (Δ) = Memento-Datetime of the archived JSON - Memento-Datetime of the root HTML
Root HTML: http://web.archive.org/web/20200818055223/https://twitter.com/realdonaldtrump (Aug 18, 2020, 05:52:23 UTC)
Archived JSON (Tweets): http://web.archive.org/web/20200817004843/https://api.twitter.com/2/timeline/profile/25073877.json?..
Δ = -1 day 5 hrs 4 mins
71 missing tweets
Because Trump tweeted 71 times within this time delta of about 1 day 5 hours, the memento replays a page that never existed on the live web. This phenomenon is referred to as a temporal violation.
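The time delta for this example can be reproduced from the two 14-digit timestamps in the URI-Ms. The following Python sketch assumes the delta is the JSON capture time minus the root HTML capture time, which matches the negative value shown on the slide; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def memento_datetime(timestamp: str) -> datetime:
    """Parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")

def time_delta(root_ts: str, json_ts: str) -> timedelta:
    """Delta = Memento-Datetime of archived JSON minus that of the root HTML.

    Negative: the JSON is older than the root HTML (tweets may be missing).
    Positive: the JSON is newer than the root HTML (future tweets may appear).
    """
    return memento_datetime(json_ts) - memento_datetime(root_ts)

delta = time_delta(root_ts="20200818055223", json_ts="20200817004843")
print(delta.total_seconds() / 3600)  # delta expressed in hours
```

The exact difference is 1 day, 5 hours, 3 minutes, and 40 seconds in the past, which the slide rounds to -1 day 5 hrs 4 mins.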
28. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 28
Calculated time deltas for 1.8K new UI account page mementos
Time delta (Δ) = Memento-Datetime of the archived JSON - Memento-Datetime of the root HTML
Example memento (root HTML captured Aug 18, 2020, 05:52:23 UTC), with page sections Bio, Tweets, You might like, What’s happening, and Media timeline:
Tweets: -1 day 5 hrs 4 mins (71 missing tweets)
Other time deltas shown: -1 day 5 hrs 4 mins, -1 day 5 hrs 19 mins, -1 day 5 hrs 19 mins, -24 days 21 hrs 29 mins
29. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 29
Temporal spread for new UI account page mementos
We analyzed the maximum and minimum value of the time delta for 1.8K new UI mementos to obtain temporal spread
Tweets
Bio
Media timeline
You might like
What’s happening
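A minimal sketch of how the minimum and maximum time delta could be obtained for a single memento is shown below, using the five deltas listed for the August 18, 2020 example. Treating the spread as the maximum minus the minimum is an assumption made for illustration; the talk does not spell out the exact definition, and the deltas are deliberately not assigned to specific page sections.

```python
from datetime import timedelta

# The five section deltas shown for the Aug 18, 2020 example memento.
deltas = [
    -timedelta(days=1, hours=5, minutes=4),
    -timedelta(days=1, hours=5, minutes=19),
    -timedelta(days=1, hours=5, minutes=19),
    -timedelta(days=1, hours=5, minutes=4),
    -timedelta(days=24, hours=21, minutes=29),
]

oldest = min(deltas)  # component furthest in the past
newest = max(deltas)  # component closest to (or furthest past) the root HTML
# Assumption: spread is the distance between the two extremes.
spread = newest - oldest

print(oldest, newest, spread)
```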
30. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 30
49% of 1.8K new UI mementos of @realDonaldTrump were
temporally violative
We looked at the number of missing (negative delta) or future (positive delta) tweets in each memento.
The linear relationship shows that as the time delta increases, the number of tweets the memento is off by also increases.
JSON from 6 days in the future -> the memento is off by more than 250 tweets
JSON from 4 days in the past -> the memento is missing ~130 tweets
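One way to count how many tweets a memento is off by is to count the tweets posted between the two capture times. The Python sketch below assumes a list of tweet creation times is available from a ground-truth source; the sample timestamps are hypothetical, and `tweets_off_by` is an illustrative helper rather than the study's code.

```python
from datetime import datetime

def tweets_off_by(tweet_times, root_dt, json_dt):
    """Signed count of tweets posted between the two capture times.

    Negative: JSON is older than the root HTML, so those tweets are missing.
    Positive: JSON is newer than the root HTML, so those tweets are from the future.
    """
    lo, hi = sorted((root_dt, json_dt))
    count = sum(1 for t in tweet_times if lo < t <= hi)
    return -count if json_dt < root_dt else count

root_dt = datetime(2020, 8, 18, 5, 52, 23)
json_dt = datetime(2020, 8, 17, 0, 48, 43)

# Hypothetical tweet creation times, for illustration only.
sample_tweets = [
    datetime(2020, 8, 16, 23, 0, 0),  # before the JSON capture
    datetime(2020, 8, 17, 12, 0, 0),  # between the two captures
    datetime(2020, 8, 18, 1, 0, 0),   # between the two captures
    datetime(2020, 8, 18, 9, 0, 0),   # after the root HTML capture
]

missing = tweets_off_by(sample_tweets, root_dt, json_dt)
future = tweets_off_by(sample_tweets, json_dt, root_dt)  # roles swapped
print(missing, future)
```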
31. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 31
49% of 1.8K new UI mementos of @realDonaldTrump were
temporally violative
We looked at the number of missing (negative delta) or future (positive delta) tweets in each memento.
The linear relationship shows that as the time delta increases, the number of tweets the memento is off by also increases.
Outliers: very high activity by @realDonaldTrump in a small time delta, e.g., 115 tweets in under 7.7 hours
This relationship only holds for highly active accounts. For accounts with less activity, the time delta would have to be larger for a temporal violation to be apparent.
32. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 32
Conclusions
● Change in Twitter’s UI in 2020 brought new challenges for web
archives
● Old UI was more prominent (93.3% of 1.3M mementos) than new
UI mementos
● Missing labels in web archives:
○ No “Fact-check” label in old UI
○ VTR was added to old UI at least by August 26, 2020
○ 18% of 6.5K new UI mementos of 476 labeled tweets
replayed the label
● Missing tweets:
○ Temporal violation can occur with components (JSON
response) from either the past or future
○ 49% of 1.8K mementos were temporally violative
Github Repo: https://github.com/oduwsdl/TwitterLabels
33. Challenges in Replaying Archived Twitter Pages | Kritika Garg <@kritika_garg> 33
What’s happening now?
● Twitter no longer provides its old UI for “Googlebot”
● Web archives are archiving the new UI
● Mementos from late 2020 and 2021 contain both the old and new Twitter UI
● The Fact-check label no longer exists on the live web
● The VTR label still exists on the live web