Preserving access


  1. Preserving access: Making more informed “guesses” about what works
     Prepared by: Maxine Davis, Collaboration Research Officer
     Presented by: David Pearson, Acting Director, Web Archiving & Digital Preservation, National Library of Australia
     IIPC Open Day, San Francisco, 7 October 2009
  2. Presentation Outline
     • The problem
     • Case study: PANDORA Web Archive
     • Some approaches & options
       – Approach 1: Unified Digital Format Registry (UDFR)
       – Approach 2: Wikipedia
       – Approach 3: Another way: documenting what web archives actually use/used
  3. The problem
     • The World Wide Web is constantly evolving
       – Requires combinations of software/hardware to render web content
       – But what is used for creation and access changes
     • Web archives
       – Contain snapshots of websites taken at different times (different sites or same sites multiple times)
       – Lots of files, many file formats, various versions
       – Aim for ongoing access
  4. Process of version “creep” in the archive
     • Mixed accessibility resulting from:
       – Different browsers, plug-ins, operating systems in use (then and now)
       – Backwards compatibility not guaranteed
       – Changes in standards and coding practices (deprecated, dead & non-standard tags)
       – Obsolescence of file formats & renderers
     • Changes to access paths
       – Incremental loss of access not directly obvious
       – Alternative access paths not specified
  5. Case study: PANDORA, Australia’s Web Archive (1)
     • Selective archive began collecting in 1996
       – Sites individually selected by NLA & partners
       – As at July 2009, over 70.6 million files
       – Accessible over the web using a standard web browser
     • .au whole domain harvests
       – 4 annual harvests 2005-2008 completed, 2009 underway with Internet Archive
       – Combined harvests 2005-2008: ~2.3 billion files
       – Not currently publicly available
  6. Case study: PANDORA, Australia’s Web Archive (2)
  7. IIPC Preservation Working Group discussions
     • Need for documenting the technical environment
     • Support required for alternative preservation action strategies
       – Emulation of past environments
       – Migration to standard formats
       – Risk notification
       – Recording conversion and alternate access paths
     • Exploring different approaches
     • Sharing information sensible
  8. Technical information of interest
     • Browsers + plug-ins/helper applications: versions & dependencies
     • Used approximately when?
     • Appropriate for which individual file format, type of file format, or whole archive?
  9. Already documented?
     • Manufacturers’/vendors’ websites
     • Developers’ networks, forums, blogs, etc.
     • File format registries
     • File extension resources
     • Software archives/download sites
     • Internet history websites
     • Internet statistics websites
     • Wikipedia
  10. Possible Approach 1: UDFR
     • Digital format registry will result from proposed merger of PRONOM and GDFR
     • Pros
       – Considerable intellectual investment already
       – Could be used for general digital preservation and potential interaction with other tools
     • Cons
       – Under development
       – Web archive requirements need to be specified, use cases developed, data model changed, relevant data populated and regularly updated
       – Temporal aspect not currently catered for
       – Entry point is an individual file format or software type [could be a pro?]
  11. Possible Approach 2: Wikipedia (1)
     • Pros
       – Existing free, web-based collaborative multilingual project
       – Draws together a rich set of information
         • browsers, layout engines, plug-ins & software, statistics, creators, standards, etc.
         • lists, history, comparisons, timelines, links to internal & external references
       – Updated by many voluntary contributors
  12. Possible Approach 2: Wikipedia (2)
     • Cons
       – General audience, not specific to web archive requirements or a specific web archive
       – Amount of detail varies (between different language versions, articles)
       – Can be edited by multiple users (+ & -)
       – Not designed to interact with other digital preservation tools as UDFR has potential to do
  13. Extract example
  14. Possible Approach 3: Documenting what web archives are using/used
     • Pros
       – Time-based software suite approach
       – Starting point for
         • Potential UDFR seed list
         • Identifying commonly used software
         • Inferring additional software requirements
         • Identifying alternate access paths
     • Cons
       – Easier to document current versions
       – Obscure/obsolete material in our collections may be unknown
  15. Individual web archives as sources of information
     • Analysis of archive contents & harvesting statistics
     • Web archivists’ observations & records
       – UK Web Archive Technology Watch blog
     • Website usage statistics
       – Browser versions & operating systems
       – Indicative of popularity
     • Archived sites
       – Plug-in requirements, file type information
       – May include useful information websites
       – Internet Archive complementary collection
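As one possible starting point for the "analysis of archive contents" listed on this slide, file extensions in a harvest's URL list can be tallied. The following is a minimal Python sketch under stated assumptions: the function name `tally_extensions` and the sample URLs are hypothetical and not drawn from PANDORA, and a real analysis would more likely rely on MIME types or format identification output than on extensions alone.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def tally_extensions(urls):
    """Count lower-cased file extensions found in the path of each URL."""
    counts = Counter()
    for url in urls:
        # Only the path is inspected, so query strings do not affect the result.
        path = urlparse(url).path
        suffix = PurePosixPath(path).suffix.lower()
        counts[suffix or "(none)"] += 1
    return counts


# Hypothetical sample URLs for illustration only.
sample_urls = [
    "http://example.org/index.html",
    "http://example.org/report.PDF",
    "http://example.org/media/intro.swf",
    "http://example.org/about/",
    "http://example.org/docs/guide.pdf?version=2",
]

for extension, count in tally_extensions(sample_urls).most_common():
    print(extension, count)
```

A tally like this only hints at which renderers or plug-ins an archive may need; it does not identify format versions.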
  16. Example: NLA web archiving software environment, July 2009
     • Operating system: Windows XP
     • Computer: Windows PC, Intel Pentium 4
     • Browser: Internet Explorer 7 (main browser), IE8, Firefox 3.0
     • Additional software:
       – Adobe Reader 8
       – Adobe Shockwave Player
       – Adobe Flash Player 10
       – Real Player 10
       – Apple QuickTime 7
       – Windows Media Player 11
       – Java 6 Update 11
       – JavaScript enabled
       – Word, Excel, PowerPoint 2003
       – WinZip
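An environment like the one on this slide could be captured as a dated, structured record so that it can be shared or compared later. The sketch below is illustrative only: the class name `SoftwareEnvironment` and its field names are assumptions, not a schema proposed in the presentation; only the values are transcribed from the slide.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SoftwareEnvironment:
    """One dated snapshot of the software used to view archived content."""
    institution: str
    as_at: str  # year and month of the snapshot, e.g. "2009-07"
    operating_system: str
    hardware: str
    browsers: list = field(default_factory=list)
    additional_software: list = field(default_factory=list)


# Values transcribed from the NLA July 2009 slide; field names are hypothetical.
nla_2009 = SoftwareEnvironment(
    institution="National Library of Australia",
    as_at="2009-07",
    operating_system="Windows XP",
    hardware="Windows PC, Intel Pentium 4",
    browsers=["Internet Explorer 7", "Internet Explorer 8", "Firefox 3.0"],
    additional_software=[
        "Adobe Reader 8",
        "Adobe Shockwave Player",
        "Adobe Flash Player 10",
        "Real Player 10",
        "Apple QuickTime 7",
        "Windows Media Player 11",
        "Java 6 Update 11",
        "JavaScript enabled",
        "Word, Excel, PowerPoint 2003",
        "WinZip",
    ],
)

# Serialise to JSON so the record could be posted to a wiki or exchanged.
record_json = json.dumps(asdict(nla_2009), indent=2)
print(record_json)
```

Keeping the snapshot date in the record is what would let several such records be ordered over time.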
  17. Example: Earlier NLA Software Environment
     2005                    | 2000                      | 1996
     Windows 2000            | Windows 95                | Windows 3.1/Windows for Workgroups
     Windows PC              | Windows PC                | Windows PC
     IE6 (since June 2002)   | Netscape Navigator 4.08   | Netscape Navigator 1, 2 or 3?
     Adobe Acrobat Reader    | Acrobat Reader            | Acrobat Reader
     Macromedia Shockwave    | Macromedia Shockwave      | Macromedia Shockwave
     Macromedia Flash player | Macromedia Flash player   | ?
     Real Player             | Real Player               | Real Audio player
     Apple QuickTime         | Apple QuickTime           | QuickTime
     Windows Media Player 9? | Windows Media Player 6.4? | Netscape Media Player?
     Java ?                  | Java ?                    | Java ?
     JavaScript enabled      | JavaScript enabled        | JavaScript enabled
     Word, Excel, PowerPoint | Word, Excel, PowerPoint   | Word, Excel, PowerPoint
     WinZip                  | WinZip                    | PKUnzip ?
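The time-based idea behind these snapshots can be sketched as a lookup: given the year a site was harvested, find the most recent documented environment at or before that year. This is a rough Python illustration, not a method described in the presentation; the `ENVIRONMENTS` list and the function `environment_for` are hypothetical, and the entries carry over the slides' own uncertainty markers.

```python
from bisect import bisect_right

# (snapshot year, main browser) pairs transcribed from the NLA slides,
# sorted by year. Uncertain entries keep their "?" from the source.
ENVIRONMENTS = [
    (1996, "Netscape Navigator 1, 2 or 3?"),
    (2000, "Netscape Navigator 4.08"),
    (2005, "IE6 (since June 2002)"),
    (2009, "Internet Explorer 7"),
]


def environment_for(harvest_year):
    """Return the latest documented (year, browser) at or before harvest_year."""
    years = [year for year, _ in ENVIRONMENTS]
    index = bisect_right(years, harvest_year)
    if index == 0:
        return None  # harvested before the earliest documented snapshot
    return ENVIRONMENTS[index - 1]


print(environment_for(2001))
```

Snapshot years are documentation points rather than switch-over dates (the 2005 entry notes IE6 was in use from June 2002), so a lookup like this gives only an approximate guess.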
  18. Example: Comparison of NLA and BnF software environments
     • NLA web archivist’s software, 2009
       – Internet Explorer 7 and 8
       – Firefox 3.0
       – Adobe Reader 8
       – Adobe Shockwave Player
       – Adobe Flash Player 10
       – Real Player 10
       – Apple QuickTime 7
       – Windows Media Player 11
       – Java 6 Update 11
       – JavaScript enabled
       – Word, Excel, PowerPoint 2003
       – WinZip
     • BnF librarian’s software, since 2005
       – Internet Explorer
       – Acrobat Reader*
       – Macromedia Flash player*
       – Windows Media Player*
       – QuickTime*
       – Java Virtual Machine (Microsoft)*
       – Later additions: Firefox, RealOne Player 10
     • BnF public in-house access software, 2008
       – Internet Explorer
       – Adobe Reader
       – Adobe Flash player
       – Adobe Shockwave player
       – VLC Media player
       – Real player
       – Word, Excel & PowerPoint Viewers
       – Java Virtual Machine
     *Software versions progressively updated to latest compatible with Windows XP
  19. Going forward
     • Is it worth pursuing approach 3?
     • If so, where would we record it (IIPC PWG wiki? other suggestions)?
     • Interested in contributing?
  20. Questions?
     Contact
     • David Pearson dapearson@nla.gov.au
     • Maxine Davis madavis@nla.gov.au
     Report to IIPC PWG by end October 2009
     Everything, for Everyone Forever
