  1. Library and Archives Canada’s Web Archiving Program. Presentation to the International Internet Preservation Consortium General Assembly, Open Session, May 5, 2009. Gillian Cantello, A/Director General, Published Heritage Branch
  2. Purpose of LAC’s Web Archiving Program
       • To acquire, preserve and make accessible knowledge and information from the Canadian Internet for current and future generations of Canadians
  3. Collection Development Policy for Websites
       • LAC’s website selection guidelines form part of its Digital Collection Development Policy
       • Two-pronged approach:
           • Individual capture of websites
           • Domain or thematic harvests
  4. Websites/Domains in LAC’s Collection
       • Acquired using Heritrix software:
           • Government of Canada web domain: 2005-2006, 2007, 2008
           • Provincial/Territorial web domains: 2006, 2008
           • Federal Elections: 2006, 2008
           • Provincial Elections: Alberta, Quebec 2008; Newfoundland, Northwest Territories, Ontario, Saskatchewan 2007
           • Olympic & Paralympic Games: 2006, 2008
       • Acquired using MetaPro software:
           • Selected individual sites (added to LAC’s E-Collection)
  5. Artists Online Website Accessible in the Electronic Collection and AMICUS
  6. Government of Canada Web Archive Search Interface
  7. WebCan
       • Developed by LAC to allow acquisitions staff to manage all aspects of web harvesting: seed lists, crawls, QA & indexing
       • To be released as open source
  8. Vancouver 2010 Olympic and Paralympic Winter Games
       • Will provide an archive of significant websites associated with the Vancouver 2010 Olympic and Paralympic Winter Games
       • Based on the NLA model for the Sydney Olympics
       • Partnership with the Department of Canadian Heritage
       • Two test crawls completed so far
  9. National Research Council Project
       • Data mining research partnership with NRC
       • Content from the third crawl of the Government of Canada Web Archive will be used for NRC research on bilingual machine translation
       • NRC researchers will advise LAC on automated means of enhancing the preservation of, and access to, content in the archive
  10. Plans for 2009-2010
       • Acquisitions
           • Fourth crawl of the Government of Canada web domain
           • Third crawl of provincial/territorial government web domains
           • Thematic/topical websites still to be selected
           • Vancouver 2010 Olympics
       • Non-GC thematic crawls to be made accessible