Preserving Online Information: towards a Belgian Strategy
(PROMISE)
Studiedag ‘Het web gearchiveerd?’ (study day ‘The web archived?’) – 11/10/2018
Peter Mechant (imec-mict-UGent)
1988: creation of the .be domain
1994: 129 registered .be domains
2012: creation of .vlaanderen and .brussels
2018: 1.6 million .be, 6,500 .vlaanderen and 4,500 .brussels domains
1. Identify best practices in the field of web archiving
2. Develop a Belgian web archiving strategy
3. Pilot archiving the Belgian web & providing access to the collections
4. Make recommendations for implementing a sustainable web archiving service
BROAD CRAWLS (superficial capture)
1. The national domain (top-level domain crawls)
2. Other websites that are considered interesting
SELECTIVE CRAWLS (complete capture)
1. Themes
2. Events
3. Emergencies
SELECTIVE CRAWLS (complete capture): themes, events, ...
Country | Institution | Broad crawls | Selective: thematic | Selective: events | Selective: other
Netherlands | Nat. Library | No | Yes | No | No
France | Nat. Library | Yes (top-layers only) | Yes | Yes | Yes (emergencies)
UK | British Library | Yes (non-print legal deposit web) | Yes (open UK web archive) | Yes (open UK web archive) | Yes (emergencies; open UK web archive)
Luxembourg | Nat. Library | Yes | No | Yes | No
Denmark | Royal Danish Library | Yes | Yes | Yes | Yes (emergencies, research projects, videos)
Portugal | Arquivo.pt | Yes | No | Yes | No
Ireland | Nat. Library | Yes | Yes | Yes | No
Canada | Libr. & Arch. Canada | No (in preparation) | Yes | Yes | Yes (emergencies, risk of disappearing)
Canada | Nat. Libr. & Arch. Quebec | No | Yes | Yes | No
Switzerland | Nat. Libr. | No | Yes | Yes | No
Selection policy: social media
Institution | Facebook | Twitter | YouTube | Instagram | Flickr
France (Nat. Libr.) | Not anymore | Yes | No | No | No
Denmark (Roy. Libr.) | Yes | Yes | Yes | Yes | No
Luxembourg (Nat. Libr.) | Yes | Yes | Yes | Yes | No
UK (British Library) | Yes | Yes | No | No | No
Ireland (Nat. Libr.) | No | Yes | Yes | No | Starting 2018
UK (Nat. Arch.) | No | Yes | Yes | No | No
Library and Archives (Canada) | Yes | Yes | Yes | Yes | No
“A common feature of most web archiving backed by legal deposit legislation is some sort of restrictions on the access afforded to the end user of the archive” (Webster, 2017: p. 180).
Webster, P. (2017). Users, technologies, organisations: Towards a cultural history of world web archiving. In N. Brügger (Ed.), Web 25. Histories from 25 years of the world wide web (pp. 175-190). New York: Peter Lang.
Types of access:
1. Open and freely accessible online + physical access on location, for everyone
2. Open and freely accessible online + physical access on location, for certain profiles and/or for certain content only
3. Only physical access on location
4. No access
Country | Institution | Access method: open & freely accessible online | Access method: physical access on location | Who has access?
The Netherlands | National Library | No | Yes | Everyone with a paid library card. Big data researchers can gain access after a meeting and having signed a contract.
The Netherlands | National Archive | Yes (for websites with an ‘open’ status) | Yes (for websites with a ‘restricted’ or ‘offline’ status) | ‘Open’ & ‘offline’ status websites: everybody. Some items are ‘restricted’, which means you need a special permission (a research proposal is required to obtain this permission, or proof that the subject of the archived content is dead). Together with the special permission, a signed form is needed stating you understand your own responsibilities under the privacy law.
France | National Library | No | Yes (but also from within the 26 partner libraries) | Authorized users of the BnF (18 years or older and for university studies, professional or personal research. For the latter two categories, interviews are conducted before accreditation is given.)
Luxembourg | National Library | No | No | No public system yet.
UK | British Library | Yes (for the UK web archive) | Yes (for the legal deposit UK web archive and JISC domain dataset) | Everyone with a reader’s pass.
UK | National Archives | Yes | No | Everyone
Denmark | Royal Danish Library | Yes (only for researchers conducting research on a PhD level or above) | Yes (only for researchers) | Only for research purposes after filling an application form that needs to be evaluated.
Portugal | Foundation for Science & Technology | Yes | No | Everyone
Ireland | National Library | Yes | No | Everyone
Search options
Country | Institution | URL | Full-text | Topical browsing | Alphabetic browsing
The Netherlands | National Library | Yes | No | No | No
The Netherlands | National Archive | No | No | No | No
France | National Library | Yes | Yes | Yes | No
Luxembourg | National Library | closed for public | closed for public | closed for public | closed for public
UK | British Library | Yes | Yes | Yes | No
UK | National Archives | Yes | Yes | No | Yes
Denmark | Royal Danish Library | Yes | Yes | No | No
Portugal | Foundation for Science and Technology | Yes | Yes | No | No
Ireland | National Library | Yes | Yes | No | Yes
Vlassenroot, E., Chambers, S., Di Pretoro, E., Geeraert, F., Haesendonck, G., Michel, A., & Mechant, P. (2019). Web Archives As a Data Resource for Digital Scholars. International Journal of Digital Humanities, x(x). (forthcoming, Spring 2019)
https://goo.gl/2qWhju
“I use web archives to find historical documents pertaining to works of digital art (e.g. reviews of shows in arts magazines and newspapers, interviews with artists, etc.).”
“I try to compare the development of library homepages by design, with programming tools based on different harvested versions from archive.org.”
“I tried to get back lost texts of blog articles after the update of my personal blog by archive.org.”
“The main reason to use web archives, for me as a genealogist, is tracking family members.”
“Clearly Google is the norm for the participants. They expected a search engine to work on full text, just like Google.”
Ras, M., & Van Bussel, S. (2007). Web archiving user survey. Online at: https://www.kb.nl/sites/default/files/docs/kb_usersurvey_webarchive_en.pdf.
“There is not just one way of using web archives. Narrow, pre-selective collections will only meet the requirements of small groups of researchers and disappoint the most. Large-scale, national collections with limited access methods will equally fail to meet scholarly requirements by being in danger of ‘one size fits nobody’.”
Hockx-Yu, H. (2013). Web Archiving and Scholarly Use of Web Archives. Online at: http://docplayer.net/10376122-Scholarly-use-of-web-archives.html.
“Access and use (…) remains a perceived area of need. Likewise, metadata is identified as an area that would benefit from ongoing knowledge-sharing around best practices. Social media and quality assurance continue to be recognized as areas for which better and more accessible tools are needed.”
Bailey, J., Grotke, A., McCain, E., Moffatt, C., & Taylor, N. (2017). Web Archiving in the United States: A 2016 Survey. National Digital Stewardship Alliance. Online at: https://ndsa.org/documents/WebArchivingintheUnitedStates_A2016Survey.pdf.
“(…) better discoverability options for the archived content, data selection and management, as well as better access to more ways of analysing the data is needed.”
Costea, M.-D. (2018). Report on the Scholarly Use of Web Archives. NetLab. Online at: http://netlab.dk/wp-content/uploads/2018/02/Costea_Report_on_the_Scholarly_Use_of_Web_Archives.pdf.
Thank you!
peter.mechant@ugent.be

3e Studiedag Webarchivering (3rd study day on web archiving) - PROMISE
