How to Face the Challenges of Web Archiving? The Experiences of a Small Library on the Edge

  1. How to Face the Challenges of Web Archiving? The experiences of a small library on the edge.
     Chloe Martin, Internet Memory; Catherine Ryan, National Library of Ireland
     LIBER 2012 - 1
  2. Context: National Library of Ireland
     • Beginnings: Established by the Dublin Science and Art Museum Act, 1877
     • Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”
     • The Digital Record: Born Digital Programme established in 2010, covering web archiving
     • Web Archive Projects: 2 pilot projects in 2011
  3. Context: Internet Memory
     European Archive / Internet Memory Foundation
     • Established in 2004 in Amsterdam (offices also in Paris)
     • Mission: to preserve Web content as a new medium for current and future generations
     • Actions: awareness-raising, partnerships, R&D
     • Open Access Collections: UK National Archives & Parliament, PRONI, CERN and The National Library of Ireland
     Internet Memory Research
     • Spin-off of IM established in June 2011 in Paris
     • Mission: to operate large-scale or selective crawls & develop new technologies (crawl, access, processing and extraction)
  4. Web Archiving Project: Project Origins
     National Library of Ireland
     Building a 21st Century Library:
     – Born Digital
     – Digitisation
     – Single Integrated Catalogue
     – Digital Repository
     – OSCAIL, the Digital Library Programme
  5. Web Archiving Project: Project Origins
     National Library of Ireland
     Born Digital Materials:
     • Natural progression for NLI’s strong political, cultural and historical collections
     • How best to approach this in a time of unprecedented financial difficulty?
     • Born Digital Programme established to examine requirements and produce a policy document for the next steps
  6. Web Archiving Project: Project Origins
     National Library of Ireland
     The Hand of History:
     – Snap General Election
     – Five Weeks
  7. Web Archiving Project: Project Origins
     National Library of Ireland
     Just do it
  8. Web Archiving Project: Project Origins
     National Library of Ireland
     Just do it
     How?
  9. Web Archiving Project: Project Origins
     National Library of Ireland
     Requirements:
     – Technical skills in the NLI, but working on other projects
     – Leverage NLI’s strong curatorial experience, esp. in politics
     – Fast!
     Collaborative Partnership:
     – Partner that had the skills we needed, that suited our requirements and that had experience with others in the cultural sector
  10. Web Archiving Project: Project Origins
      National Library of Ireland
      Project phases:
      – Project scoping and contract
      – Site selection
      – Permissions gathering
      – QA (look and feel)
      – Publication and promotion
  11. Site Selection and Permissions
      National Library of Ireland
      Selection Criteria:
      – Website presence
      – Technical reasons
      – Cut-off date
      – Women candidates
      Permissions:
      – All sites contacted and provided with a brief
      – Pressurised but necessary phase
  12. Scope of projects
      National Library of Ireland
      General Election:
      – Crawl: 200 snapshots
      – Scope: 100 seeds
      – Frequency: 2 times
      – Date: Feb. 2011
      Presidential Election:
      – Crawl: 80 snapshots
      – Scope: 70 seeds
      – Frequency: 3 times
      – Date: Oct.-Nov. 2011
  13. Crawl
      Internet Memory
      • Seeds validation: URLs, duplication, redirection, external links, dynamic websites
      • Scope parameters: domain, host and path; social Web content; frequency; robots.txt exclusion; politeness
      • Specific incidents → technical changes on the fly: modification of scope; pending crawls; adaptation of politeness
      • Improvement of the second crawl
  14. Quality Assurance (QA)
      National Library of Ireland
      • Manual QA
      • Jira software
      • IM – Technical QA
      • NLI – ‘Look and Feel’ QA
      • Multiple browsers
      • Communication with site owners (building relationships and promotion)
  15. Quality Assurance (QA)
      Internet Memory
      • Why?
      • How?
        • Manual and visual method: homepage + 2
        • Resolution of issues
      • Temporal coherence
  16. Access
      National Library of Ireland
      • Available to the public
      • Full-text search
      • IM website: search by keyword or URL
      • NLI catalogue: keyword search via a widget developed by the NLI IS team and IM
      • Future: access through NLI’s own interfaces; issue of integrating results
  17. Publication and Promotion
      National Library of Ireland
      • NLI social media initiative (Twitter and blog)
      • Project participants
      • Print media (esp. in the area of technology)
      • And IM!
      • Usage figures have increased, but the real value will become more apparent in 5-10 years
  18. Usage Statistics of Web Archive
      National Library of Ireland
      – 21/09/2011: Official launch of NLI Web archives (Tweets)
      – 26/10/2011: Blog post on nli.ie/blog and article in thejournal.ie
      – 25/11/2011: Article on irishtimes.com
      – 20/01/2012: Article on irishtimes.com
      – 17/03/2012: Post on soundofthearchives.wordpress.com
      – 04/05/2012: Article on irisheconomy.ie
  19. Advantages of Web Archiving
      National Library of Ireland
      Web archiving:
      – New opportunities for delivery of materials to users
      – Works with existing users’ expectations that content be online
      – Reach new audiences
  20. Advantages of Web Archiving
      National Library of Ireland
      Political web archives
      Irish General Election:
      – Researchers can compare online content pre- and post-election
      – Facilitates research into how ‘online’ this election was
      – Assess impact of technological developments in campaign communications
      – Record of campaign information
  21. Benefits of Working Together
      National Library of Ireland
      Pilot project for a long-term activity:
      – Allowed us to enter a new collecting area despite a lack of technical expertise
      – Facilitated collection of important material that no one else was collecting
      – Collect material quickly
      – Leverage curatorial skills
      – Gained new technical skills
  22. Benefits of Working Together
      Internet Memory
      • To support the development of Web archiving initiatives
      • To enable rapid deployment of Web archives
      • To address new challenges in this area:
        • Social media content
        • QA
        • Automation
  23. Conclusion
      General Election:
      • 18,495,771 URLs
      • 1.14 TB
      • 10,405 ARCs
      Presidential Election:
      • 7,333,399 URLs
      • 278.10 GB
      • 2,513 ARCs
      View the NLI collections at: http://www.nli.ie/en/udlist/digital-collections.a
      View the Web archive blog entry at: http://www.nli.ie/blog/index.php/2011/10/26/
      View Internet Memory Collections at: http://collections.europarchive.org/
      To be continued…
  24. Questions?
      Thanks for your attention!
      Catherine Ryan, National Library of Ireland: http://www.nli.ie, cryan@nli.ie, @NLIreland
      Chloe Martin, Internet Memory: http://internetmemory.org, chloe@internetmemory.net, @InternetMemory
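
The Crawl slide (13) lists seed validation, scope parameters (domain, host and path), robots.txt exclusion and politeness without showing how they fit together. The Python below is a minimal sketch of those checks under stated assumptions; it is not Internet Memory's crawler. All function names (`normalize_seed`, `dedupe_seeds`, `in_scope`, `parse_robots`, `politeness_delay`) are hypothetical. Bare host names are assumed to default to `http`, "domain" scope is assumed to strip a leading `www.`, and the 1-second default delay is an illustrative value. Only the standard library's `urllib.parse` and `urllib.robotparser` are used. Redirection and external-link checks need live HTTP requests and are left out.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def normalize_seed(url):
    """Normalise a seed URL so trivially different spellings compare equal."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: bare host names default to http
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # .hostname also drops any user:password@ part
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    path = parts.path or "/"  # "http://host" and "http://host/" are the same seed
    # Drop the #fragment: it never changes what the server returns.
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def dedupe_seeds(urls):
    """Return normalised seeds in their original order, without duplicates."""
    seen, unique = set(), []
    for url in urls:
        seed = normalize_seed(url)
        if seed not in seen:
            seen.add(seed)
            unique.append(seed)
    return unique


def in_scope(url, seed, mode="host"):
    """Decide whether a discovered URL falls inside a seed's crawl scope."""
    u, s = urlsplit(url), urlsplit(seed)
    url_host, seed_host = (u.hostname or ""), (s.hostname or "")
    if mode == "domain":
        # Assumption: strip a leading "www." so sibling subdomains are included.
        base = seed_host[4:] if seed_host.startswith("www.") else seed_host
        return url_host == base or url_host.endswith("." + base)
    if mode == "host":
        return url_host == seed_host
    if mode == "path":
        return url_host == seed_host and u.path.startswith(s.path or "/")
    raise ValueError(f"unknown scope mode: {mode}")


def parse_robots(lines):
    """Build a robots.txt parser from already-fetched lines (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


def politeness_delay(parser, user_agent, default_delay=1.0):
    """Seconds between requests to one host: never less than the site's Crawl-delay."""
    declared = parser.crawl_delay(user_agent)
    return default_delay if declared is None else max(default_delay, float(declared))
```

For example, `dedupe_seeds(["www.Example.ie", "http://www.example.ie/#top"])` collapses both spellings into the single seed `http://www.example.ie/`. Keeping the original order makes the validated list easy to compare against the curators' selection spreadsheet.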

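Internet Memory's QA slide (15) names temporal coherence but does not say how it is measured. One simple, hypothetical measure compares the capture time of a page with the capture times of its embedded resources (images, CSS, scripts) and flags resources captured too far apart. The sketch below assumes 14-digit `YYYYMMDDhhmmss` timestamps, the form used in ARC record headers and Wayback-style URLs. It also assumes an illustrative one-hour tolerance. The function names are made up for this example.

```python
from datetime import datetime

# Assumption: captures carry 14-digit timestamps (YYYYMMDDhhmmss).
TS_FORMAT = "%Y%m%d%H%M%S"


def parse_ts(ts):
    """Turn a 14-digit capture timestamp into a datetime."""
    return datetime.strptime(ts, TS_FORMAT)


def incoherent_resources(page_ts, resource_ts, tolerance_seconds=3600):
    """List embedded resources captured more than `tolerance_seconds` away from their page.

    page_ts:     capture timestamp of the HTML page
    resource_ts: {resource URL: capture timestamp}
    Returns (url, offset in seconds) pairs, largest offset first.
    """
    page_time = parse_ts(page_ts)
    flagged = []
    for url, ts in resource_ts.items():
        offset = abs((parse_ts(ts) - page_time).total_seconds())
        if offset > tolerance_seconds:
            flagged.append((url, int(offset)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def coherence_ratio(page_ts, resource_ts, tolerance_seconds=3600):
    """Share of embedded resources captured within the tolerance (1.0 = fully coherent)."""
    if not resource_ts:
        return 1.0
    bad = len(incoherent_resources(page_ts, resource_ts, tolerance_seconds))
    return 1 - bad / len(resource_ts)
```

Sorting by offset puts the worst mismatches first, which suits a manual "homepage + 2" review where only a few pages per site are inspected.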
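
The Conclusion slide (23) gives URL counts, total size and ARC file counts for both collections. Dividing through gives average figures per ARC file. The slide does not say whether TB and GB are decimal or binary units, so this quick calculation assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes). `COLLECTIONS` and `per_arc` are illustrative names.

```python
# Figures copied from the Conclusion slide.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes); the slide does not say.
COLLECTIONS = {
    "General Election": {"urls": 18_495_771, "bytes": 1.14e12, "arcs": 10_405},
    "Presidential Election": {"urls": 7_333_399, "bytes": 278.10e9, "arcs": 2_513},
}


def per_arc(stats):
    """Average size (MB) and number of URLs per ARC file for one collection."""
    return {
        "mb_per_arc": stats["bytes"] / stats["arcs"] / 1e6,
        "urls_per_arc": stats["urls"] / stats["arcs"],
    }


for name, stats in COLLECTIONS.items():
    avg = per_arc(stats)
    print(f"{name}: {avg['mb_per_arc']:.1f} MB and {avg['urls_per_arc']:.0f} URLs per ARC")
```

Under these assumptions, both collections average roughly 110 MB per ARC file (109.6 MB and 110.7 MB). The Presidential Election ARCs hold more URLs each (about 2,918 against 1,778), which suggests smaller resources on average.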