Preserving access
Published in: Technology
Transcript

  • 1. Preserving access: Making more informed “guesses” about what works
      Prepared by: Maxine Davis, Collaboration Research Officer
      Presented by: David Pearson, Acting Director, Web Archiving & Digital Preservation, National Library of Australia
      IIPC Open Day, San Francisco, 7 October 2009
  • 2. Presentation Outline
      • The problem
      • Case study: PANDORA Web Archive
      • Some approaches & options
          – Approach 1: Unified Digital Format Registry (UDFR)
          – Approach 2: Wikipedia
          – Approach 3: Another way: documenting what web archives actually use/used
  • 3. The problem
      • The World Wide Web is constantly evolving
          – Rendering web content requires particular combinations of software and hardware
          – But the software and hardware used for creation and access change over time
      • Web archives
          – Contain snapshots of websites taken at different times (different sites, or the same sites multiple times)
          – Lots of files, many file formats, various versions
          – Aim for ongoing access
  • 4. Process of version “creep” in the archive
      • Mixed accessibility resulting from:
          – Different browsers, plug-ins, operating systems in use (then and now)
          – Backwards compatibility not guaranteed
          – Changes in standards and coding practices (deprecated, dead & non-standard tags)
          – Obsolescence of file formats & renderers
      • Changes to access paths
          – Incremental loss of access not directly obvious
          – Alternative access paths not specified
  • 5. Case study: PANDORA, Australia’s Web Archive (1)
      • Selective archive, began collecting in 1996
          – Sites individually selected by NLA & partners
          – As at July 2009, over 70.6 million files
          – Accessible over the web using a standard web browser
      • .au whole domain harvests
          – 4 annual harvests 2005-2008 completed; 2009 underway with Internet Archive
          – Combined 2005-2008 harvests: ~2.3 billion files
          – Not currently publicly available
  • 6. Case study: PANDORA, Australia’s Web Archive (2)
  • 7. IIPC Preservation Working Group discussions
      • Need for documenting the technical environment
      • Support required for alternative preservation action strategies
          – Emulation of past environments
          – Migration to standard formats
          – Risk notification
          – Recording conversion and alternate access paths
      • Exploring different approaches
      • Sharing information is sensible
  • 8. Technical information of interest
      • Browsers + plug-ins/helper applications: versions & dependencies
      • Used approximately when?
      • Appropriate for which individual file format, type of file format, or whole archive?
  • 9. Already documented?
      • Manufacturers’/vendors’ websites
      • Developers’ networks, forums, blogs, etc.
      • File format registries
      • File extension resources
      • Software archives/download sites
      • Internet history websites
      • Internet statistics websites
      • Wikipedia
  • 10. Possible Approach 1: UDFR
      • Digital format registry that will result from the proposed merger of PRONOM and GDFR
      • Pros
          – Considerable intellectual investment already
          – Could be used for general digital preservation, with potential interaction with other tools
      • Cons
          – Under development
          – Web archive requirements need to be specified, use cases developed, the data model changed, and relevant data populated and regularly updated
          – Temporal aspect not currently catered for
          – Entry point is an individual file format or software type [could be a pro?]
  • 11. Possible Approach 2: Wikipedia (1)
      • Pros
          – Existing free, web-based, collaborative, multilingual project
          – Draws together a rich set of information
              • browsers, layout engines, plug-ins & software, statistics, creators, standards, etc.
              • lists, history, comparisons, timelines, links to internal & external references
          – Updated by many voluntary contributors
  • 12. Possible Approach 2: Wikipedia (2)
      • Cons
          – General audience; not specific to web archive requirements or to a specific web archive
          – Amount of detail varies (between different language versions and between articles)
          – Can be edited by multiple users (+ & -)
          – Not designed to interact with other digital preservation tools, as UDFR has the potential to do
  • 13. Extract example
  • 14. Possible Approach 3: Documenting what web archives are using/used
      • Pros
          – Time-based software suite approach
          – Starting point for
              • Potential UDFR seed list
              • Identifying commonly used software
              • Inferring additional software requirements
              • Identifying alternate access paths
      • Cons
          – Easier to document current versions
          – Obscure/obsolete material in our collections may be unknown
  • 15. Individual web archives as sources of information
      • Analysis of archive contents & harvesting statistics
      • Web archivists’ observations & records
          – UK Web Archive Technology Watch blog
      • Website usage statistics
          – Browser versions & operating systems
          – Indicative of popularity
      • Archived sites
          – Plug-in requirements, file type information
          – May include useful information websites
          – Internet Archive complementary collection
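One way to approach the "analysis of archive contents" item above is to tally file extensions across a list of archived URLs. The sketch below is only an illustration under stated assumptions: the function name `tally_extensions` and the example URLs are hypothetical, not taken from PANDORA or any NLA tool, and a file extension is at best a hint about the real file format.

```python
from collections import Counter
from posixpath import splitext
from urllib.parse import urlsplit


def tally_extensions(urls):
    """Count file extensions found in the path part of each URL.

    Hypothetical helper: extensions are lower-cased, query strings are
    ignored, and URLs without an extension are counted as "(none)".
    """
    counts = Counter()
    for url in urls:
        path = urlsplit(url).path          # drops ?query and #fragment
        ext = splitext(path)[1].lower()    # e.g. ".PDF" -> ".pdf"
        counts[ext or "(none)"] += 1
    return counts


# Hypothetical example URLs, not real archive contents.
sample = [
    "http://example.com/report.PDF",
    "http://example.com/report.pdf?version=2",
    "http://example.com/media/intro.swf",
    "http://example.com/about/",
]
print(tally_extensions(sample).most_common())
```

A tally like this only suggests which renderers and plug-ins an archive may depend on; confirming the actual formats would need format identification against a registry.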
  • 16. Example: NLA Web archiving software environment, July 2009
      • Operating system: Windows XP
      • Computer: Windows PC, Intel Pentium 4
      • Browser: Internet Explorer 7 (main browser), IE8, Firefox 3.0
      • Additional software:
          – Adobe Reader 8
          – Adobe Shockwave Player
          – Adobe Flash Player 10
          – Real Player 10
          – Apple QuickTime 7
          – Windows Media Player 11
          – Java 6 Update 11
          – JavaScript enabled
          – Word, Excel, PowerPoint 2003
          – WinZip
  • 17. Example: Earlier NLA Software Environment

      2005                      | 2000                         | 1996
      Windows 2000              | Windows 95                   | Windows 3.1 / Windows for Workgroups
      Windows PC                | Windows PC                   | Windows PC
      IE6 (since June 2002)     | Netscape Navigator 4.08      | Netscape Navigator 1, 2 or 3?
      Adobe Acrobat Reader      | Acrobat Reader               | Acrobat Reader
      Macromedia Shockwave      | Macromedia Shockwave         | Macromedia Shockwave
      Macromedia Flash player   | Macromedia Flash player      | ?
      Real Player               | Real Player                  | Real Audio player
      Apple QuickTime           | Apple QuickTime              | QuickTime
      Windows Media Player 9?   | Windows Media Player 6.4?    | Netscape Media Player?
      Java ?                    | Java ?                       | Java ?
      JavaScript enabled        | JavaScript enabled           | JavaScript enabled
      Word, Excel, PowerPoint   | Word, Excel, PowerPoint      | Word, Excel, PowerPoint
      WinZip                    | WinZip                       | PKUnzip ?
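The time-based software suite idea behind these snapshots could be recorded as dated environment entries and looked up by a site's capture date. The following is a minimal sketch, not an NLA tool: the `Environment` class, the `environment_for` function, and the start dates (1 January of each snapshot year, and 1 July 2009) are assumptions made for illustration. The slides give only snapshot years, and the note that IE6 was in use since June 2002 shows that real changeover dates fall between snapshots.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Environment:
    documented_from: date   # assumed date, see lead-in
    operating_system: str
    browser: str


# Operating system and browser values come from the slides;
# "?" marks uncertainty recorded in the source itself.
ENVIRONMENTS = [
    Environment(date(1996, 1, 1), "Windows 3.1 / Windows for Workgroups",
                "Netscape Navigator 1, 2 or 3?"),
    Environment(date(2000, 1, 1), "Windows 95", "Netscape Navigator 4.08"),
    Environment(date(2005, 1, 1), "Windows 2000", "IE6 (since June 2002)"),
    Environment(date(2009, 7, 1), "Windows XP", "Internet Explorer 7"),
]


def environment_for(capture_date: date) -> Optional[Environment]:
    """Return the latest documented environment at or before capture_date.

    Rough heuristic only: it picks the nearest earlier snapshot and
    returns None when the capture predates every documented snapshot.
    """
    earlier = [e for e in ENVIRONMENTS if e.documented_from <= capture_date]
    if not earlier:
        return None
    return max(earlier, key=lambda e: e.documented_from)


match = environment_for(date(2001, 5, 1))
print(match.operating_system, "/", match.browser)
```

Recording an explicit date range per software item, rather than one date per snapshot, would capture overlaps such as IE6 being adopted between the 2000 and 2005 snapshots.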
  • 18. Example: Comparison of NLA and BnF software environments
      • NLA web archivist’s software, 2009
          – Internet Explorer 7 and 8
          – Firefox 3.0
          – Adobe Reader 8
          – Adobe Shockwave Player
          – Adobe Flash Player 10
          – Real Player 10
          – Apple QuickTime 7
          – Windows Media Player 11
          – Java 6 Update 11
          – JavaScript enabled
          – Word, Excel, PowerPoint 2003
          – WinZip
      • BnF librarian’s software since 2005
          – Internet Explorer
          – Acrobat Reader*
          – Macromedia Flash player*
          – Windows Media Player*
          – QuickTime*
          – Java Virtual Machine (Microsoft)*
          – Later additions: Firefox, RealOne Player 10
      • BnF public in-house access software, 2008
          – Internet Explorer
          – Adobe Reader
          – Adobe Flash player
          – Adobe Shockwave player
          – VLC Media player
          – Real player
          – Word, Excel & PowerPoint Viewers
          – Java Virtual Machine
      * Software versions progressively updated to the latest compatible with Windows XP
  • 19. Going forward
      • Is it worth pursuing approach 3?
      • If so, where would we record it (IIPC PWG wiki? other suggestions)?
      • Interested in contributing?
  • 20. Questions?
      Contact
      • David Pearson dapearson@nla.gov.au
      • Maxine Davis madavis@nla.gov.au
      Report to IIPC PWG by end October 2009
      Everything, for Everyone Forever
