THE CENTRE FOR INTERNET STUDIES
Niels Brügger, Director, the Centre for Internet Studies & co-director NetLab
Web archiving matters, 26 September 2013
Web archiving matters
Web archives and digital research
infrastructures
[Screenshots: bl.uk as archived in the Internet Archive, 1997, 1999, 2003, 2007]
Lessons to be learned?
The importance of the web is
growing
More and more of our societal, cultural,
political, etc. communication takes place
on the web
The web of the past disappears
40% changed, 40% removed, 20% still
there after one year
If we want to document the
present or study the past on the
web, we have to archive it
‘We’ can be a scholar, a group of scholars
or a (trans)national web archive such as
the Internet Archive or the British Library (BL)
Web archiving matters...
... for anyone who wants to use the web
as a source in any kind of study
Therefore web archives have
been established
A short history of web archives — 12-14
years — three main phases
The pre-history of web archives
• from the beginning of the 1990s onwards
• individuals, families, organizations, institutions...
• HTML, screen dumps
• no considerations about archiving
• no considerations about cultural heritage
Static web publications in national libraries
• approx. the same period
• national libraries
• static web documents that look like journals and books
• overall approach that of print culture (cataloguing...)
• more professional
• legal deposit laws
The dynamic web in (trans)national web archives
• a little later
• crawlers, a spin-off of search engine technology
• the number of archiving initiatives increases
• dynamic web material
• librarian approach challenged
• other transnational stakeholders
Examples
• The Internet Archive, 1996
• Kulturarw3, Sweden, 1996/97
• Pandora, Australia, 1996
• Netarkivet, Denmark, 2005
• UK Government Web Archive, 1997; UK Web Archive, 2005
Web archives and scholars?
Archived web may differ from
what was once online
No matter how an archived web
document has been created, and no
matter in what archive it is found, one
cannot expect it to be an identical copy on a 1:1
scale of what was actually on the live web
at a given time
Consequence?
Web archives make decisions with great
impact on the scholarly use of the archive
Business as usual, but more extreme:
choices are more complex, the
consequences are not always known,
documentation is scarce, there is no
baseline, no original
With a view to minimising the
problem, co-operation is
needed
[Diagram labels: Scholar, BL]
Co-operation on a transnational
scale
RESAW — a REsearch infrastructure for
the Study of Archived Web materials
[Diagram labels: IIPC; Fr, DK, UK, ...; BL, BnF, KB, NA, PWA, INA. Legend: exist, to be established]
October 2012: proposal for an EU ‘Topic proposal for integrating and
opening existing national research infrastructures’, linking FP7
initiatives to Horizon 2020.
40 web archiving institutions, research groups, and scholars
March 2013, evaluation: on the list of projects of 'High potential' and
with 'merit for future Horizon 2020 Research Infrastructure Actions…'.
Aim: to submit an application to the Horizon 2020 research
infrastructure programme.
Research-driven: based on research questions and projects, primarily
within the humanities and the social sciences
Foster cooperation and networking between web archiving institutions
and research communities (European and global)
Develop digital analytical tools to be used in web archives
Build the relevant skills for development and use of software-supported
methods for studying internet materials across different national web
archives (technology, legal issues...)
Build strategies for archiving relevant web materials which are not taken
care of within existing institutional frameworks (e.g. .eu and other
non-national domains: .net, .info, .biz, .mobi, etc.).
Validate the quality of the various archives, given the different
principles and combinations of methods used in building the archives
Initiate investigations as to how other internet activities (e.g. email,
apps for smartphones and tablets, Facebook data, etc.) can be integrated
into general internet & web archives

Niels Brügger's slides from the Digital Conversations event on 26/09/2013
