Web@rchive Austria (Archiving Online Media)

  1. Web@rchive Austria: Archiving Online Media. Michaela Mayr (Michaela.Mayr@onb.ac.at), Austrian National Library, webarchiv.onb.ac.at. University of Liechtenstein, 03.04.13
  2. http://www.yahoo.com/, 17.10.1996, Source: www.archive.org
  3. http://www.google.com/, 11.11.1998, Source: www.archive.org
  4. http://www.youtube.com/, 25.06.2005, Source: www.archive.org
  5. http://www.flickr.com/, 29.04.2004, Source: www.archive.org
  6. http://www.cnn.com/, 17.08.2000, Source: www.archive.org
  7. http://www.apple.com/, 15.07.1997, Source: www.archive.org
  8. Internet Archive (www.archive.org): non-profit organization, USA, founded 1996 • > 10 petabytes in total • + 20 terabytes per month • 280 billion pages • Public online archive
  9. Internet Archive: uni.li. http://www.uni.li/, Source: www.archive.org
  10. http://www.uni.li/, 04.02.2011, Source: www.archive.org
  11. International Internet Preservation Consortium (IIPC), www.netpreserve.org • Founded 2003 by 12 national libraries + Internet Archive • 44 members (Austria since 2008) • Working groups + projects
  12. Our Project History • 2001: Pilot project TU Vienna • 2008: Start of the web archiving project • 2009: Legal deposit for born-digital media • 2010: Public access • 2013: Online access (search, metadata). Pilot project: http://www.ifs.tuwien.ac.at/~aola/beschreibung.html
  13. Access • Only on site at selected libraries, not online (special terminals) • No electronic processing, just printing • Single concurrent user principle for password-protected content • Authorized libraries: – Federal Chancellery, Parliament – Austrian State Archives – State and university libraries
  14. Web@rchive Austria • Team: 2 FTE, Digital Library: – Project manager / Curator – Developer / Crawl Engineer / System Administrator • Open source software (Heritrix, Wayback) • Cooperation on NetarchiveSuite with Netarchive.dk and the National Library of France • Storage and backup outsourced to the Austrian Federal Computing Centre (+ copy in St. Johann). Image: Kurier, http://kurier.at/techno/2004890.php
  15. Collection Strategies (1) • Domain Harvesting – Entire top-level domain .at (currently approx. 1.2 million domains, source: nic.at) – Other top-level domains related to Austria (no legal definition, manual process) – Every 2 years; the 3rd domain crawl is currently running
  16. Development of the .at domain. Source: www.nic.at
  17. Collection Strategies (2) • Selective Harvesting – Selected important websites that change regularly – Harvesting at appropriate intervals – Content: national and regional media; government/administration (.gv.at); academia/universities (.ac.at); society, economy, culture etc. – Ongoing collections: 2011 Media; 2012 Austrian Authors; 2013 Politics
  18. Collection Strategies (3) • Event Harvesting – Special occasions and events (e.g. elections) – Many websites exist only for the duration of the event – Previous event harvests: (EURO™ 2008 – soccer championship); (Parliamentary elections 2008); EU elections 2009; Olympic Winter Games 2010; Presidential elections 2010; ORF.Futurezone 2010 (technology portal)
  19. Data Web@rchive Austria (1): Currently nearly 90% of the data comes from domain crawls
  20. Data Web@rchive Austria (2) • Physical storage: 19 TB • Raw data: 32 TB • Number of objects: 1,241,650,566
  21. File Formats (charts: number of objects, storage)
  22. Challenges • Time – Short lifecycle of webpages: approx. 44-75 days (source: Library of Congress) – Digital preservation: migration, emulation • Content – Careful selection – Deep web – New technologies – Unwanted content: parking sites etc. Image sources: http://mrg.bz/DJIHP6, http://foldedstory.files.wordpress.com/2012/03/iceberg.jpg
  23. Short lifecycle: ARCHIVE vs. LIVE WEB. Russian National Public Library for Science and Technology, 08.04.2011
  24. Russian National Public Library for Science and Technology, 08.04.2011
  25. Demo
  26. Nominate a website: http://www.onb.ac.at/about/seiten_nominieren.htm
  27. webarchiving • Further information: http://webarchiv.onb.ac.at • Social media: http://twitter.com/AT_Webarchive, http://www.facebook.com/ATWebarchive, http://www.slideshare.net/ATWebarchive, http://screenr.com/user/AT_Webarchive. Digitale Langzeitarchivierung (digital long-term preservation), ADV, 19.09.2012
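
Several slides cite historical snapshots from www.archive.org by URL and capture date (for example http://www.yahoo.com/, 17.10.1996). As a rough illustration of how such a citation can be turned into a link, the sketch below builds a Wayback Machine address of the form https://web.archive.org/web/<timestamp>/<url>. The function name `wayback_url` and the constant `WAYBACK_PREFIX` are hypothetical and not part of the presentation. It also assumes the DD.MM.YYYY date format used on the slides, and that the Wayback Machine resolves a day-level timestamp to a nearby capture.

```python
from datetime import datetime

# Public Wayback Machine prefix; the timestamp and original URL are appended.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, slide_date: str) -> str:
    """Build a Wayback Machine link from a URL and a DD.MM.YYYY date.

    Hypothetical helper for illustration only. The day-level timestamp
    (YYYYMMDD) is an assumption; the archive decides which capture near
    that date is actually served.
    """
    captured = datetime.strptime(slide_date, "%d.%m.%Y")
    timestamp = captured.strftime("%Y%m%d")
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


if __name__ == "__main__":
    # URL and date pairs taken from the slide transcript.
    citations = [
        ("http://www.yahoo.com/", "17.10.1996"),
        ("http://www.google.com/", "11.11.1998"),
        ("http://www.uni.li/", "04.02.2011"),
    ]
    for url, date in citations:
        print(wayback_url(url, date))
```

Whether a capture exists for a given day depends on the archive, so the generated link is a starting point for a lookup rather than a guaranteed reference to the exact snapshot shown on the slide.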
