
Archiving News on the Web

Michaela Mayr
IIPC General Assembly 2010
6 May 2010

  1. Web@rchive Austria: Archiving News on the Web
     Michaela Mayr, Austrian National Library, [email_address]
  2. Austrian National Library
     • Based in Vienna
     • Dating back to the 14th century
     • 8 million objects
     • Web archiving since 2008
  3. Web@rchive Austria (1)
     • Web archiving project started in 2008
     • Legal deposit for born-digital media in force since March 2009
     • Staff: 2 FTE, Digital Library department
        - Project manager
        - Developer/crawl engineer
        - System administrator
     • Storage and backup outsourced to the Austrian Federal Computing Centre
     • Software
        - Crawler: Heritrix
        - Crawl management: NetarchiveSuite
        - Access: Wayback Machine
  4. Web@rchive Austria (2)
     • Domain harvesting:
        - 930,000 .at domains + content related to Austria
        - Every 2 years
        - First Austrian domain crawl currently in progress
     • Event harvesting:
        - Mainly sports events (2) and elections (3)
        - IIPC collaborations: EU elections, Olympics 2010
     • Selective harvesting:
        - Starting mid-2010
        - National and regional media
        - Society, economy, culture
        - Government agencies, public authorities
        - Science, research, universities
        - New techniques, net art
  5. Access
     • On site at the Austrian National Library (special terminals)
     • Open to everybody, not only researchers
     • Plus 20 other libraries in Austria (National Archives, Parliament, state and university libraries)
     • Access starting May 2010
  6. News on the Web
     • Online newspapers, TV channel websites
     • Highly dynamic, changing constantly
     • Some content password-protected (archives)
  7. Multimedia content
     • Graphics, videos, Flash, streaming, embedded content
  8. Interactive content
     • Chats, postings, ratings
  9. Interactive content
     • Customizing, RSS
  10. Social media
     • Facebook, Twitter, etc.
  11. Advertisements
     • Content from the live web
  12. Examples
     • Ads, graphics
     [Screenshot panels: LIVE, ARCHIVE]
  13. Examples
     • Ads, live ticker
     [Screenshot panels: LIVE, ARCHIVE]
  14. How archiving news has changed
     • Speed and amount of information have increased
     • Type of information has changed
     • Amalgamation of different content
     • Increased complexity
     • Monitoring & QA
  15. Thank you for your attention!
  16. Examples
     • Ads
     [Screenshot panels: LIVE, ARCHIVE]
  17. Examples
     • Ads
     [Screenshot panels: LIVE, ARCHIVE]
  18. Examples
     • Ads
     [Screenshot panels: LIVE, ARCHIVE]