  • 1. Library and Archives Canada’s Web Archiving Program. Presentation to the International Internet Preservation Consortium General Assembly, Open Session, May 5, 2009. Gillian Cantello, A/Director General, Published Heritage Branch
  • 2. Purpose of LAC’s Web Archiving Program
    • To acquire, preserve and make accessible knowledge and information from the Canadian Internet for current and future generations of Canadians
  • 3. Collection Development Policy for Websites
    • LAC’s website selection guidelines form part of its Digital Collection Development Policy http://www.collectionscanada.gc.ca/collection/003-200-e.html
    • Two-pronged approach:
      • Individual capture of websites
      • Domain or thematic harvests
  • 4. Websites/Domains in LAC’s Collection
    • Acquired using Heritrix software:
      • Government of Canada web domain: 2005-2006, 2007, 2008
      • Provincial/Territorial web domains: 2006, 2008
      • Federal Elections: 2006, 2008
      • Provincial Elections: Alberta, Quebec 2008; Newfoundland, Northwest Territories, Ontario, Saskatchewan 2007
      • Olympic & Paralympic Games: 2006, 2008
    • Acquired using MetaPro software:
      • Selected individual sites – Added to LAC’s E-Collection
  • 5. Artists Online Website Accessible in the Electronic Collection and AMICUS
  • 6. Government of Canada Web Archive Search Interface
  • 7. WebCan
    • Developed by LAC to allow acquisitions staff to manage all aspects of web harvesting: seed lists, crawls, QA & indexing
    • To be released as open source
  • 8. Vancouver 2010 Olympic and Paralympic Winter Games
    • Will provide an archive of significant websites associated with the Vancouver 2010 Olympic and Paralympic Games
    • Based on the National Library of Australia (NLA) model for the Sydney Olympics
    • Partnership with Department of Canadian Heritage
    • Two test crawls completed so far
  • 9. National Research Council Project
    • Data mining research partnership with NRC
    • Content from the third crawl of the Government of Canada Web Archive will be used for NRC research on bilingual machine translation
    • NRC researchers will advise LAC on automated means of enhancing preservation of and access to content in the archive
  • 10. Plans for 2009-2010
    • Acquisitions
      • Fourth crawl of the .gc.ca domain
      • Third crawl of provincial/territorial government web domains
      • Thematic/topical websites still to be selected
      • Vancouver 2010 Olympics
    • Non-GC thematic crawls to be made accessible