This document summarizes interviews conducted with web archiving practitioners about their experiences curating COVID-19 collections. The interviews covered which content was included in the collections, how long collecting continued, the challenges faced, and how the content was made accessible for research. Regarding content, practitioners focused on their national domains and languages while trying to capture a global phenomenon. Collecting was still ongoing at many institutions in 2022, and the UK Web Archive planned to continue until the WHO declares an end to the pandemic. Challenges included technical and resource limitations as well as defining the national scope on social media. Collections were made accessible online or on-site with some restrictions. The document provides insights into early pandemic web archiving practices across multiple countries and organizations.
1. The online archives of COVID-19
An oral history of born-digital collecting practices during the pandemic
WARCnet Closing Conference
Aarhus, 18 October 2022
Friedel Geeraert, Jane Winters, Nicola Bingham, Niels Brügger, Frédéric Clavert, Sophie Gebeil, Federico Nanni,
Caroline Nyvang, Valérie Schafer, Helle Strandgaard Jensen, Karin de Wild
2. Behind the scenes of born-digital COVID-19 collections
Friedel Geeraert
3. • In-depth interviews with web archiving practitioners about curating
COVID-19 collections
• Denmark, France, Hungary, Iceland, Luxembourg, Switzerland, the UK,
The Netherlands, the US, Australia, Singapore and the IIPC
collaborative collection
4. ● Which content is included?
○ How do you delimit a global phenomenon on the national level?
● How long is the collecting continued?
● What were the biggest challenges?
● How is the content made accessible for research?
5. Which content?
How do you archive a global event on a national level?
• Focus on the national domain (.nl, .fr, .dk, …)
• Focus on the national language(s)
• Information about the country
• Information published by national citizens/organisations
• Information published in the country
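The first criterion above (focus on the national domain) can be sketched as a simple seed filter. This is a minimal illustration only, not the method of any institution interviewed: the function names and the TLD set are hypothetical, with the TLDs taken from the examples on the slide. Seeds outside the national domain are not rejected but set aside for the other criteria (language, subject, publisher), which need a curator's judgement.

```python
from urllib.parse import urlsplit

# Hypothetical set of national TLDs, taken from the examples on the slide.
NATIONAL_TLDS = {"nl", "fr", "dk"}


def in_national_domain(url, tlds=NATIONAL_TLDS):
    """Return True if the URL's host name ends in one of the given TLDs."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() in tlds


def split_seeds(urls, tlds=NATIONAL_TLDS):
    """Split seeds into those on a national TLD and those needing manual review."""
    in_scope, needs_review = [], []
    for url in urls:
        if in_national_domain(url, tlds):
            in_scope.append(url)
        else:
            needs_review.append(url)
    return in_scope, needs_review


if __name__ == "__main__":
    seeds = ["https://www.example.nl/corona", "https://example.com/covid"]
    print(split_seeds(seeds))
```

A filter like this only covers the domain criterion; content about the country published on .com or on social media platforms falls into the review list by design.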
7. Platforms archived per institution (matrix in the original slide; cells marked Archived / Not archived)
Platforms: Websites, Twitter, Facebook, Instagram, YouTube, TikTok, Podcasts, Reddit, Twitch
Institutions:
• Swiss National Library
• National Library of Iceland
• National Library of Luxembourg
• National Library of France
• National Library of Hungary
• Royal Danish Library
• IIPC collaborative collection
• UK Web Archive
• Institut national de l’Audiovisuel (FR)
• Royal Library of The Netherlands
• Library of Congress
• National Library Board (Singapore)
• National Library of Australia
9. When? (start of collecting, January to March 2020)
January
• January - National Library of France (topical news series)
• 26/01 - Royal Danish Library (cartoon)
February
• 13/02 - IIPC collaborative collection
• 29/02 - Royal Library of The Netherlands
• February - National Library of Hungary
• February - National Library Board (Singapore)
• February - National Library of Australia
March
• Early March - UK Web Archive
• 06/03 - Swiss National Library
• 12/03 - Royal Danish Library
• 13/03 - Institut national de l’Audiovisuel
• 16/03 - National Library of France
• 16/03 - National Library of Luxembourg
• 24/03 - Library of Congress
• Late March - National Library of Iceland
10. Challenges
• Technical challenges related to capturing certain content
• Financial and human resources limitations
• Defining the national sphere on social media
• Permissions
• Required a sustained effort over several years
• Delimiting the COVID-19 collection since the pandemic touched
upon every aspect of life
• Identifying gaps in collections: overlooked themes / groups
• COVID-denial websites or controversial content
11. Inclusivity
• Limited number of curators leading to subjectivity
• Overlooked themes / groups
• Web archives remain largely unknown
12. Curation approaches per institution (matrix in the original slide; cells marked Yes / No)
Approaches:
• Curating within web archiving team
• Internal network of curators throughout the institution
• External network of curators
• Recommendations from the public
Institutions:
• Swiss National Library
• National Library of Iceland
• National Library of Luxembourg
• National Library of France
• National Library of Hungary
• Royal Danish Library
• IIPC collaborative collection
• UK Web Archive
• Institut national de l’Audiovisuel (FR)
• Royal Library of The Netherlands
• Library of Congress
• National Library Board (Singapore)
• National Library of Australia
13. ‘We … launched a call for participation at the end of March and there too
we got a very good response from the media and a few communities that we
would not have otherwise thought of suggested their websites … For
example, the Muslim community and shoura.lu. I hadn't thought of looking
for religious communities. The Muslim community posted online information
and recommendations for its members about services in mosques, religious
holidays, etc. Based on this suggestion, we then looked more closely at other
religious communities.’
Ben Els (Bibliothèque nationale du Luxembourg) interviewed by Valérie Schafer
14. ‘We also collaborated with the Digital Heritage Network in The
Netherlands, together we initiated a call for heritage institutions
to help build a national collection about the coronavirus and its
effects in The Netherlands.’
Peter de Bode (Royal Library of The Netherlands) interviewed by Karin de Wild and Ismini Kyritsis
15. ‘Web archive Switzerland is a collaboration between the Swiss
National Library and 30 Swiss institutions, mostly libraries and
archives.’
Barbara Signori (Swiss National Library) interviewed by Friedel Geeraert
16. ‘The NLA continues to maintain partner arrangements for the
selective component [...] of the larger web archiving program. The
partner organisations include: the state libraries of Victoria, New
South Wales, Queensland, South Australia and Western Australia;
the Library and Archives of the Northern Territory; the Australian
War Memorial and, the National Gallery of Australia.’
Paul Koerbin (National Library of Australia) interviewed by Olga Holownia and Friedel
Geeraert
17. ‘Because there were some subject matter experts that were not on
this team, we did a lot of consultations with experts across the
Library. [...] We did a general analysis after a year or so of collecting
and were able to identify other gaps. Then we’d say: “This month,
or this week, this is the gap area we’re going to focus on, or let’s
focus on these groups”.’
Jennifer Harbster (Library of Congress) interviewed by Olga Holownia and Friedel Geeraert
18. How do you
handle fake
news and
controversial
topics?
• Exclude from collection but keep a record
• Include
• Include but contextualise
19. ‘I was particularly concerned about representativeness,
choosing sites that reflected all viewpoints. That was all
the more important because there were highly
controversial topics like chloroquine …’
David Benoist (National Library of France) interviewed by Sophie Gebeil and Valérie Schafer
20. ‘We’re not collecting them because of their authority; we are
collecting them because they were an example of misinformation.
The solution was to go to one of these big aggregators that listed
them all - NewsGuard - so that is in the archive and so there is
content that represents the misinformation.’
Jennifer Harbster (Library of Congress) interviewed by Olga Holownia and Friedel Geeraert
21. ‘It was quite an extreme right-wing website that had an anti-vaccination
policy [and said] that the coronavirus was a made up pandemic. It looked like
it was quite factual and verified information. …
So the decision was taken to not add this website to the collection, because
there is a risk that somebody might look at that article … [and that] it could
potentially cause danger to health. … We kept a record that the website had
been nominated. We recorded what the content of the website was and why
we decided to not include it in the collection.’
Nicola Bingham (IIPC collection) interviewed by Friedel Geeraert
22. ACCESS
• Collections freely available online
• IIPC collaborative collection
• Icelandic web archive (vefsafn.is)
• National Library of Australia
• Collections partly available online (permission granted), partly on
site
• Library of Congress
• UK Web Archive
• National Library Board of Singapore
• Collections available on site (and in other partner institutions)
• National Library of France + regional libraries
• Institut National de l’Audiovisuel (France) + regional libraries
• Royal Library of The Netherlands
• National Library of Luxembourg
• Swiss National Library
• Royal Danish Library: remote access & data dumps
• Published seed lists / metadata
26. Wearing three hats: the archivist researcher’s perspective on collecting
COVID-19
Nicola Bingham
29. What was collected?
● Mainly websites or sections of websites relevant to COVID-19
● Limited social media due to technical difficulties and ethical
concerns
● c. 1000 Twitter accounts of public figures and official agencies
● English language, also Welsh and Scottish Gaelic
● Capture frequency is on a site-by-site basis (“one-off”, daily,
weekly, monthly)
● Focus is only on UK due to Legal Deposit Regulations
○ Websites hosted on UK TLDs and published in the UK
○ Extensive news coverage gets global perspective
○ Include diaspora communities e.g. Chinese in the UK
● Collaboration with other archives e.g. IIPC
● When to end collecting?
○ When WHO declares an end to the pandemic
○ Continue related collections separately, e.g. Cost-of-Living
Crisis
30. UK Web Archive Covid-19 Collection in use
Datathon undertaken by Working Group 2, January 2021
- Outputs included “Chicken and Egg paper”
- Learning outcomes for the archivist
- The importance of contextual information, such as selection
decisions, where are the gaps and technical limitations, what
couldn’t we archive but wanted to, what can we do with the
data? Legal situation of exporting data.
‘Covid Stories’ Learning resource at British Library
- Creative response to collection items e.g. archived websites,
oral history interviews with NHS workers, broadcast news and
radio
https://cc.au.dk/fileadmin/dac/Projekter/WARCnet/Aasman_et_al_Chicken_and_Egg.pdf
32. • 34 IIPC Members contributed nominations
• c.2000 nominations from the public
• Collection scope
• Published information prioritised rather than
social media
• Specify seeds at appropriate section of website
• Deprioritise rich media websites (data budget
concerns)
33. IIPC CDG Novel Coronavirus (COVID-19) stats, October 2022
- 13,855 seeds nominated by IIPC members
- 2,018 seeds nominated by public
- 2,411 = number of crawls run
- 16,000 = total number of seeds archived
- 72 million = number of documents archived
- 5.8 TB = total data archived
- 180 Top Level Domains
- 70 languages
- 145 countries represented in Collection (see map)
Link to collection: https://archive-it.org/collections/13529
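The figures on the stats slide above can be cross-checked with a little arithmetic. The sketch below is an illustration, not part of the original presentation: it uses only the numbers reported on the slide, treats the rounded totals (16,000 seeds, 72 million documents, 5.8 TB) as exact, and assumes 1 TB = 10^12 bytes. The derived averages are therefore rough orders of magnitude only.

```python
# Figures copied from the stats slide (October 2022).
SEEDS_FROM_MEMBERS = 13_855
SEEDS_FROM_PUBLIC = 2_018
SEEDS_ARCHIVED_ROUNDED = 16_000   # rounded total reported on the slide
DOCUMENTS_ARCHIVED = 72_000_000   # "72 million", also rounded
DATA_ARCHIVED_BYTES = 5.8e12      # assumption: 1 TB = 10**12 bytes


def collection_ratios():
    """Derive simple ratios from the reported collection statistics."""
    nominated = SEEDS_FROM_MEMBERS + SEEDS_FROM_PUBLIC
    return {
        "seeds_nominated": nominated,
        "public_share": SEEDS_FROM_PUBLIC / nominated,
        "documents_per_seed": DOCUMENTS_ARCHIVED / SEEDS_ARCHIVED_ROUNDED,
        "bytes_per_document": DATA_ARCHIVED_BYTES / DOCUMENTS_ARCHIVED,
    }


if __name__ == "__main__":
    for name, value in collection_ratios().items():
        print(f"{name}: {value:,.3f}")
```

The two nomination counts sum to 15,873, which is consistent with the rounded total of 16,000 seeds; public nominations make up roughly one in eight of the nominated seeds.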
34. IIPC CDG Novel Coronavirus (COVID-19): researcher use of collections
• IIPC Research Working Group worked with Bibliotheca Alexandrina to develop services that offer additional functionality beyond conventional web archive playback to accommodate research use cases.
• SolrWayback: Danish Web Archive
• LinkGate: data service, data extraction tool and visualization front end for scalable temporal graph visualisation for web archive research. Project leads: Bibliotheca Alexandrina and National Library of New Zealand
• AWAC2 (Analysing Web Archives of the COVID Crisis through the IIPC Novel Coronavirus Dataset). WARCnet network: Susan Aasman, Niels Brügger, Frédéric Clavert, Karin De Wild, Sophie Gebeil, Valérie Schafer, Joshgun Sirajzade
35. Some highlights
98% of respondents describe themselves as professionals
Library and Archive are the most popular organisation type (82%)
68% are at National organisational level
After the UK (29%), Belgium is the most represented (22%)
Most respondents are Archivists (43%)
27% of respondents have 100-250 members of staff
68% had a COVID-19 special collection of web materials and/or social
media
70% say the initiative came from staff
52% collaborated with partners
72% added descriptive metadata
Scope was determined by subject/theme for 65%
64% have not stopped collecting content
71% collected WARC data
WG2 WARCnet survey
COVID-19 WEB COLLECTIONS
● June-September 2022
● Survey of European GLAM
organisations on Covid-19
collecting
● Analysis and results
forthcoming (2023)
36. Let’s talk about web archiving … three institutions, many possibilities
Valérie Schafer
38. • To document the rather “invisible” work, the shadows, hidden infrastructures and agencies
• To document the collections
• To better understand curation of collections, as well as perimeters, geographical coverage, temporalities, size, teams, inclusiveness, participation…
• To take advantage of the WARCnet project which is gathering researchers and web archivists, and involve everybody within WG2 (scholars from several countries)
• Already an expertise through the Terrorist Attacks collections + study of governance of Web Archives + “Do Web archives have politics?” with F. Musiani …
• Special collections are special …
• National specificities
40. …and differences
- scope and perimeters
- experience (and notably in live collections)
- relation to social networks
- formats and methods of crawling (API, IA, …)
- time and organisation of the interview
44. Analysing web archives
- Temporalities to be retrieved
- New content
- Representativeness
- Preparation of a research
- Over-representations and silences
52. • Documentation of hidden infrastructures
• Documentation for future historians
• OH as both a tool and an object of study (and notably for scalable reading)
53. Reading web archivists’ interviews at a distance
Jane Winters (on behalf of Frédéric Clavert)
61. Key
Libraries:
01 Denmark
02 France
03 Hungary
04 IIPC (International)
05 Switzerland
06 UK
07 Iceland
08 Luxembourg
Sections:
01 The reasons for the special
collection
02 The scope of the collection
03 The framing of the collection
04 Accessibility & searchability
05 Partnerships and uses
Analysis by Frédéric Clavert, C²DH
66. Web archive histories and scalable reading
Helle Strandgaard Jensen
67. The advantages of Scalable Reading
Digital Humanities and the separation of ‘the digital’ and ‘the humanities’
Staying in conversation with existing research (methods?)
Connecting close and distant reading (using distant reading as context
for individual data points and systematic selection of cases)