Web@rchive Austria
Archiving Online Media



                                        Michaela Mayr
                                        Michaela.Mayr@onb.ac.at

                                        Austrian National Library
                                        webarchiv.onb.ac.at



University of Liechtenstein, 03.04.13                               1
2
http://www.yahoo.com/, 17.10.1996, Source: www.archive.org
3
http://www.google.com/, 11.11.1998, Source: www.archive.org
4
http://www.youtube.com/, 25.06.2005, Source: www.archive.org
5
http://www.flickr.com/, 29.04.2004, Source: www.archive.org
6
http://www.cnn.com/, 17.08.2000, Source: www.archive.org
7
http://www.apple.com/, 15.07.1997, Source: www.archive.org
Internet Archive
          www.archive.org
          Non-Profit Organization
          USA, founded 1996

          • More than 10 petabytes in
            total
          • Grows by 20 terabytes per
            month
          • 280 billion pages
          • Public online archive
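
The archived snapshots cited on the preceding slides (a URL plus a capture date) can be looked up in the public archive. A minimal sketch, assuming the commonly used Wayback Machine address pattern `https://web.archive.org/web/<timestamp>/<url>`; the helper name `wayback_url` is made up for this illustration, and the archive serves the capture closest to the requested date, which may not be the exact day:

```python
from datetime import date

# Public Wayback Machine prefix (assumed address pattern, see lead-in).
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, day: date) -> str:
    """Build a lookup URL for the capture closest to the given day."""
    # The Wayback Machine uses 14-digit timestamps (YYYYMMDDhhmmss);
    # a shorter YYYYMMDD prefix is enough to request a given day.
    timestamp = day.strftime("%Y%m%d")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Slide 2 cites http://www.yahoo.com/ as captured on 17.10.1996.
    print(wayback_url("http://www.yahoo.com/", date(1996, 10, 17)))
```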
                                8
Internet Archive uni.li




                                              9
http://www.uni.li/, Source: www.archive.org
10
http://www.uni.li/, 04.02.2011, Source: www.archive.org
International Internet
Preservation Consortium IIPC
www.netpreserve.org
• Founded 2003 by 12 national libraries +
  Internet Archive
• 44 members (Austria since 2008)
• Working groups + projects




                                            11
Our Project History

• 2001: Pilot project TU Vienna
• 2008: Start Web-archiving project
• 2009: Legal Deposit for born digital media
• 2010: Public Access
• 2013: Online Access (search, metadata)
                                                                           12
Pilot project: http://www.ifs.tuwien.ac.at/~aola/beschreibung.html
Access
• Only on site at selected libraries,
  not online (special terminals)
• No electronic processing, just printing
• Single concurrent user principle for
  password-protected content
                   • Authorized libraries
                      – Federal Chancellery,
                        Parliament
                      – Austrian State Archives
                      – State and university
                        libraries
                                                  13
Web@rchive Austria
                      • Team: 2 FTE, Digital Library:
                             – Project manager / Curator
                             – Developer / Crawl Engineer / System Administrator
                      • Open Source Software (Heritrix, Wayback)
                      • NetarchiveSuite cooperation with Netarchive.dk
                        and the National Library of France
                      • Storage and Back-Up outsourced to Austrian
                        Federal Computing Centre (+ copy St. Johann)




                                                                               14
Graphic: Kurier, http://kurier.at/techno/2004890.php
Collection Strategies (1)
• Domain Harvesting
 – Entire top-level-domain .at
   (currently approx. 1.2 million
   domains, source: nic.at)
 – Other top-level-domains with
   relation to Austria (no legal
   definition, manual process)
 – Every 2 years, currently 3rd
   domain crawl running
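
The scoping rule described above (everything under .at, plus manually selected hosts elsewhere) can be sketched as a simple URL filter. This is an illustration only, not the project's actual Heritrix or NetarchiveSuite configuration; the function name `in_domain_crawl_scope` and the allowlist entry `example-austria.com` are hypothetical:

```python
from urllib.parse import urlsplit

# Hypothetical, manually curated hosts outside .at that relate to Austria.
MANUAL_ALLOWLIST = frozenset({"example-austria.com"})


def in_domain_crawl_scope(url: str, allowlist=MANUAL_ALLOWLIST) -> bool:
    """Return True if the URL's host is under .at or on the manual list."""
    # hostname is None for URLs without a scheme/authority part.
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if host.endswith(".at"):
        return True
    # Accept an allowlisted domain itself and any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in allowlist)


if __name__ == "__main__":
    for candidate in (
        "http://www.onb.ac.at/about/",
        "https://www.example-austria.com/news",
        "http://www.yahoo.com/",
    ):
        print(candidate, in_domain_crawl_scope(candidate))
```

Matching on the parsed host rather than on the raw string avoids false positives such as a path or a `.com` name that merely contains "at".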

                                   15
Development .at Domain




                                       16
Source: www.nic.at
Collection Strategies (2)
• Selective Harvesting
  – Selected important websites that
    change regularly
  – Harvesting in appropriate intervals
  – Content:
     •   Media national and regional,
     •   Government/administration (.gv.at),
     •   Academics/universities (.ac.at)
     •   Society, economy, culture etc.
  – Ongoing Collections:
     • 2011 Media
     • 2012 Austrian Authors
     • 2013 Politics
                                               17
Collection Strategies (3)
• Event Harvesting
  – Special occasions and events (e.g.
    elections)
  – Many websites only exist for the time
    of the event
  – Previous event harvestings:
    •   (EURO 2008 – soccer championship)
    •   (Parliamentary elections 2008)
    •   EU elections 2009
    •   Olympic Winter Games 2010
    •   Presidential elections 2010
    •   ORF.Futurezone 2010 (technology portal)

                                                  18
Data Web@rchive Austria (1)

Currently, nearly 90% of the data
comes from domain crawls.




                           19
Data Web@rchive Austria (2)

• Physical storage 19 TB
• Raw data 32 TB
• Number of objects 1,241,650,566
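
To put these figures in proportion, a small worked calculation follows. It is a sketch under assumptions the slide does not state: terabytes are taken as decimal (10^12 bytes), and the gap between raw data and physical storage is read as the effect of compression or similar space savings. The function names are made up for this example.

```python
TB = 10**12  # assumption: decimal terabytes

PHYSICAL_BYTES = 19 * TB   # "Physical storage 19 TB"
RAW_BYTES = 32 * TB        # "Raw data 32 TB"
OBJECTS = 1_241_650_566    # "Number of objects"


def storage_ratio(physical: int, raw: int) -> float:
    """Fraction of the raw volume that is actually occupied on disk."""
    return physical / raw


def average_object_size(raw: int, objects: int) -> float:
    """Mean raw size per archived object, in bytes."""
    return raw / objects


if __name__ == "__main__":
    ratio = storage_ratio(PHYSICAL_BYTES, RAW_BYTES)
    avg = average_object_size(RAW_BYTES, OBJECTS)
    print(f"physical/raw ratio: {ratio:.2f}")        # about 0.59
    print(f"average object size: {avg / 1000:.1f} kB")  # roughly 26 kB
```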




                                    20
File Formats
Number of Objects   Storage




                              21
Challenges
                      • Time
                            – Short lifecycle of webpages:
                              approx. 44-75 days
                              (source: Library of Congress)
                            – Digital Preservation:
                              Migration, Emulation
                      • Content
                            –    Careful Selection
                            –    Deep Web
                            –    New technologies
                            –    Unwanted Content:
                                 Parking sites etc.

                                                                                                22
Image sources: http://mrg.bz/DJIHP6, http://foldedstory.files.wordpress.com/2012/03/iceberg.jpg
Short lifecycle


ARCHIVE                                                         LIVE WEB 




 Russian National Public Library for Science and Technology, 08.04.2011       23
                                                                         24
Demo
                                                                            25
Nominate a website
http://www.onb.ac.at/about/seiten_nominieren.htm




                                               26
webarchiving

Further Information:
http://webarchiv.onb.ac.at

Social Media:
http://twitter.com/AT_Webarchive
http://www.facebook.com/ATWebarchive
http://www.slideshare.net/ATWebarchive
http://screenr.com/user/AT_Webarchive
           Digitale Langzeitarchivierung ADV, 19.09.2012   27
