Preserving access: Making more informed “guesses” about what works
Prepared by: Maxine Davis, Collaboration Research Officer
Presented by: David Pearson, Acting Director

Web Archiving & Digital Preservation,
National Library of Australia

IIPC Open Day, San Francisco, 7 October 2009
Presentation Outline

• The problem
• Case study: PANDORA Web Archive
• Some approaches & options
  – Approach 1: Unified Digital Format
    Registry (UDFR)
  – Approach 2: Wikipedia
  – Approach 3: Another way, documenting
    what web archives actually use or used



The problem
• The World Wide Web is constantly
  evolving
  – Requires combinations of software/hardware
    to render web content
  – But the software used to create and access
    web content changes over time
• Web archives
  – Contain snapshots of websites taken at
    different times (different sites or same sites
    multiple times)
  – Lots of files, many file formats, various
    versions
  – Aim for ongoing access
Process of version “creep”
in the archive
• Mixed accessibility resulting from:
  – Different browsers, plug-ins, operating
    systems in use (then and now)
  – Backwards compatibility not guaranteed
  – Changes in standards and coding practices
    (deprecated, dead & non-standard tags)
  – Obsolescence of file formats & renderers
• Changes to access paths
  – Incremental loss of access not immediately
    obvious
  – Alternative access paths not specified
Case study:
PANDORA Australia’s Web Archive (1)
 • Selective archive, collecting since 1996
   – Sites individually selected by NLA &
     partners
   – As at July 2009, over 70.6 million files
   – Accessible over the web using a standard
     web browser
 • .au whole domain harvests
   – 4 annual harvests (2005-2008) completed;
     2009 harvest underway with the Internet Archive
   – Combined 2005-2008 harvests: ~2.3 billion files
   – Not currently publicly available

Case study:
PANDORA Australia’s Web Archive (2)




IIPC Preservation Working Group
discussions
• Need for documenting the
  technical environment
• Support required for alternative
  preservation action strategies
  –   Emulation of past environments
  –   Migration to standard formats
  –   Risk notification
  –   Recording conversion and alternate
      access paths
• Exploring different approaches
• Sharing information is sensible
Technical information of interest

• Versions and dependencies of browsers and
  plug-ins/helper applications

• Approximately when were they used?

• Appropriate for which individual file format,
  type of file format, or the whole archive?
Already documented?

• Manufacturer/vendor websites
• Developer networks, forums, blogs, etc.
• File format registries
• File extension resources
• Software archives/download sites
• Internet history websites
• Internet statistics websites
• Wikipedia

Possible Approach 1: UDFR
• A digital format registry that will result from
  the proposed merger of PRONOM and GDFR
• Pros
   – Considerable intellectual investment already
   – Could be used for general digital preservation and
     could potentially interact with other tools
• Cons
   – Under development
   – Web archive requirements need to be specified, use
     cases developed, the data model changed, and the registry
     populated with relevant data and regularly updated
   – Temporal aspect not currently catered for
   – Entry point is an individual file format or software type
     [could be a pro?]
Possible Approach 2: Wikipedia (1)

• Pros
  – Existing free, web-based
    collaborative multilingual
    project
  – Draws together a rich set of
    information
     • browsers, layout engines,
       plug-ins & software, statistics,
       creators, standards, etc.
     • lists, history, comparisons,
       timelines, links to internal &
       external references
  – Updated by many voluntary
    contributors
Possible Approach 2: Wikipedia (2)
• Cons
   – Written for a general audience; not specific to web archive
     requirements or to any particular web archive
   – Amount of detail varies (between language versions and
     between articles)
   – Can be edited by multiple users (both a plus and a minus)
   – Not designed to interact with other digital
     preservation tools, as UDFR potentially could




Extract example




Possible Approach 3:
Documenting what web archives
are using/used
• Pros
  – Time-based software suite approach
  – Starting point for
     •   Potential UDFR seed list
     •   Identifying commonly used software
     •   Inferring additional software requirements
     •   Identifying alternate access paths
• Cons
  – Easier to document current versions
  – Obscure/obsolete material in our collections
    may be unknown
Individual web archives as
sources of information
• Analysis of archive contents & harvesting
  statistics

• Web archivists’ observations & records
   – UK Web Archive Technology Watch blog

• Website usage statistics
   – Browser versions & operating systems
   – Indicative of popularity

• Archived sites
   – Plug-in requirements, file type information
   – May include websites with useful information
   – Internet Archive complementary collection
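Analysis of archive contents could begin with a per-harvest tally of file formats. The sketch below is illustrative only and assumes the archive can export (harvest year, MIME type) pairs, for example from crawl logs or an index; that input format and the `formats_by_year` name are hypothetical. Server-reported content types may carry parameters such as `; charset=utf-8`, so they are stripped before counting.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def formats_by_year(records: Iterable[tuple[int, str]]) -> dict[int, Counter]:
    """Tally MIME types per harvest year.

    `records` is a hypothetical export of (harvest_year, mime_type) pairs.
    """
    tallies: defaultdict[int, Counter] = defaultdict(Counter)
    for year, mime in records:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        base_type = mime.split(";", 1)[0].strip().lower()
        tallies[year][base_type] += 1
    return dict(tallies)


# Illustrative input only; these are not real PANDORA statistics.
sample = [
    (2005, "text/html; charset=ISO-8859-1"),
    (2005, "text/html"),
    (2005, "application/pdf"),
    (2008, "application/x-shockwave-flash"),
    (2008, "TEXT/HTML"),
]
for year, counts in sorted(formats_by_year(sample).items()):
    print(year, counts.most_common(2))
```

Comparing the most common types across harvest years is one way to spot formats whose renderers (e.g. a Flash plug-in) would need to be documented.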
Example: NLA Web archiving
software environment July 2009
• Operating system: Windows XP
• Computer: Windows PC, Intel Pentium 4
• Browser: Internet Explorer 7 (main browser),
  IE8, Firefox 3.0
• Additional software:
   –   Adobe Reader 8
   –   Adobe Shockwave Player
   –   Adobe Flash Player 10
   –   Real Player 10
   –   Apple QuickTime 7
   –   Windows Media Player 11
   –   Java 6 Update 11
   –   JavaScript enabled
   –   Word, Excel, PowerPoint 2003
   –   WinZip
Example: Earlier NLA Software Environment
2005                      2000                       1996
Windows 2000              Windows 95                 Windows 3.1/Windows for Workgroups
Windows PC                Windows PC                 Windows PC
IE6 (since June 2002)     Netscape Navigator 4.08    Netscape Navigator 1, 2 or 3?
Adobe Acrobat Reader      Acrobat Reader             Acrobat Reader
Macromedia Shockwave      Macromedia Shockwave       Macromedia Shockwave
Macromedia Flash player   Macromedia Flash player    ?
Real Player               Real Player                Real Audio player
Apple QuickTime           Apple QuickTime            QuickTime
Windows Media Player 9?   Windows Media Player 6.4?  Netscape Media Player?
Java?                     Java?                      Java?
JavaScript enabled        JavaScript enabled         JavaScript enabled
Word, Excel, PowerPoint   Word, Excel, PowerPoint    Word, Excel, PowerPoint
WinZip                    WinZip                     PKUnzip?
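The time-based software suite idea can be sketched in Python. This is a minimal illustration, not an NLA tool: it assumes an archive keeps dated snapshots of its access environment (here abridged from the NLA examples for 1996, 2000, 2005 and July 2009) and, for a given harvest year, returns the most recent snapshot on or before it. The names `SNAPSHOTS` and `suite_for` are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right

# Hypothetical record of dated access environments, abridged from the NLA
# examples (a "?" marks an entry the source itself is unsure about).
SNAPSHOTS: dict[int, list[str]] = {
    1996: ["Windows 3.1/Windows for Workgroups", "Netscape Navigator 1, 2 or 3?",
           "Acrobat Reader", "PKUnzip?"],
    2000: ["Windows 95", "Netscape Navigator 4.08", "Acrobat Reader", "WinZip"],
    2005: ["Windows 2000", "IE6", "Adobe Acrobat Reader", "WinZip"],
    2009: ["Windows XP", "Internet Explorer 7", "Adobe Reader 8", "WinZip"],
}


def suite_for(year: int,
              snapshots: dict[int, list[str]] = SNAPSHOTS) -> tuple[int, list[str]] | None:
    """Return (snapshot_year, suite) for the latest snapshot on or before `year`.

    Returns None when `year` predates every recorded snapshot.
    """
    years = sorted(snapshots)
    # bisect_right gives the number of snapshot years that are <= year.
    i = bisect_right(years, year)
    if i == 0:
        return None
    chosen = years[i - 1]
    return chosen, snapshots[chosen]


# A site harvested in 2003 maps to the 2000 snapshot.
print(suite_for(2003))
```

A snapshot only records what was in use when it was documented, so for years between snapshots the lookup is itself an informed "guess"; recording first-used and last-used dates per product would tighten it.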
Example: Comparison of NLA and BnF software environments
• NLA web archivist’s software (2009)
  – Internet Explorer 7 and 8
  – Firefox 3.0
  – Adobe Reader 8
  – Adobe Shockwave Player
  – Adobe Flash Player 10
  – Real Player 10
  – Apple QuickTime 7
  – Windows Media Player 11
  – Java 6 Update 11
  – JavaScript enabled
  – Word, Excel, PowerPoint 2003
  – WinZip
• BnF librarian’s software (since 2005)
  – Internet Explorer
  – Acrobat Reader*
  – Macromedia Flash player*
  – Windows Media Player*
  – QuickTime*
  – Java Virtual Machine (Microsoft)*
  – Later additions: Firefox, RealOne Player 10
  *Software versions progressively updated to the latest
   compatible with Windows XP
• BnF public in-house access software (2008)
  – Internet Explorer
  – Adobe Reader
  – Adobe Flash player
  – Adobe Shockwave player
  – VLC Media player
  – Real player
  – Word, Excel & PowerPoint Viewers
  – Java Virtual Machine
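Comparing environments across archives could feed the "identifying commonly used software" and "alternate access paths" goals of Approach 3. The sketch below is an illustration under stated assumptions: the product families are hand-normalised from the comparison above (version numbers dropped, the Microsoft JVM grouped under "Java", "JavaScript enabled" omitted as a setting rather than software), and `ENVIRONMENTS` and `shared_and_unique` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter

# Product families hand-normalised from the NLA/BnF comparison (an assumption):
# version numbers are dropped so "Adobe Reader 8" and "Acrobat Reader" match.
ENVIRONMENTS: dict[str, set[str]] = {
    "NLA web archivist (2009)": {
        "Internet Explorer", "Firefox", "Adobe Reader", "Shockwave Player",
        "Flash Player", "RealPlayer", "QuickTime", "Windows Media Player",
        "Java", "Microsoft Office", "WinZip",
    },
    "BnF librarian (since 2005)": {
        "Internet Explorer", "Adobe Reader", "Flash Player", "Windows Media Player",
        "QuickTime", "Java", "Firefox", "RealPlayer",
    },
    "BnF public access (2008)": {
        "Internet Explorer", "Adobe Reader", "Flash Player", "Shockwave Player",
        "VLC", "RealPlayer", "Microsoft Office viewers", "Java",
    },
}


def shared_and_unique(envs: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return families present in every environment, and those unique to each one."""
    # Count how many environments include each product family.
    counts = Counter(family for suite in envs.values() for family in suite)
    common = {family for family, n in counts.items() if n == len(envs)}
    unique = {name: {f for f in suite if counts[f] == 1} for name, suite in envs.items()}
    return common, unique


common, unique = shared_and_unique(ENVIRONMENTS)
print(sorted(common))
for name, only_here in unique.items():
    print(name, sorted(only_here))
```

Families found in every environment are plausible candidates for a shared seed list, while entries unique to one archive (such as VLC in the BnF public setup) may hint at alternate access paths worth recording.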
Going forward

 • Is it worth pursuing approach 3?
 • If so, where would we record it
   (IIPC PWG wiki? other suggestions?)
 • Interested in contributing?




Questions?



                           Contact
                           •   David Pearson
                               dapearson@nla.gov.au
                           •   Maxine Davis
                               madavis@nla.gov.au


                           Report to IIPC PWG by
                             the end of October 2009
Everything, for Everyone
        Forever

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 

Preserving access

  • 1. Preserving access: Making more informed “guesses” about what works
    Prepared by: Maxine Davis, Collaboration Research Officer
    Presented by: David Pearson, Acting Director
    Web Archiving & Digital Preservation, National Library of Australia
    IIPC Open Day, San Francisco, 7 October 2009
  • 2. Presentation Outline
    • The problem
    • Case study: PANDORA Web Archive
    • Some approaches & options
      – Approach 1: Unified Digital Format Registry (UDFR)
      – Approach 2: Wikipedia
      – Approach 3: Another way of documenting what web archives actually use/d
  • 3. The problem
    • The World Wide Web is constantly evolving
      – Rendering web content requires combinations of software/hardware
      – But the software and hardware used for creation and access change over time
    • Web archives
      – Contain snapshots of websites taken at different times (different sites, or the same sites multiple times)
      – Lots of files, many file formats, various versions
      – Aim for ongoing access
  • 4. Process of version “creep” in the archive
    • Mixed accessibility resulting from:
      – Different browsers, plug-ins and operating systems in use (then and now)
      – Backwards compatibility not guaranteed
      – Changes in standards and coding practices (deprecated, dead & non-standard tags)
      – Obsolescence of file formats & renderers
    • Changes to access paths
      – Incremental loss of access not directly obvious
      – Alternative access paths not specified
  • 5. Case study: PANDORA, Australia’s Web Archive (1)
    • Selective archive began collecting in 1996
      – Sites individually selected by NLA & partners
      – As at July 2009, over 70.6 million files
      – Accessible over the web using a standard web browser
    • .au whole-domain harvests
      – 4 annual harvests 2005-2008 completed; 2009 harvest underway with Internet Archive
      – Combined harvests 2005-2008: ~2.3 billion files
      – Not currently publicly available
  • 7. IIPC Preservation Working Group discussions
    • Need for documenting the technical environment
    • Support required for alternative preservation action strategies
      – Emulation of past environments
      – Migration to standard formats
      – Risk notification
      – Recording conversion and alternate access paths
    • Exploring different approaches
    • Sharing information is sensible
  • 8. Technical information of interest
    • Browsers and plug-ins/helper applications: versions & dependencies
    • Used approximately when?
    • Appropriate for which individual file format/type, or for the whole archive?
  • 9. Already documented?
    • Manufacturer/vendor websites
    • Developer networks, forums, blogs, etc.
    • File format registries
    • File extension resources
    • Software archives/download sites
    • Internet history websites
    • Internet statistics websites
    • Wikipedia
  • 10. Possible Approach 1: UDFR
    • A digital format registry that will result from the proposed merger of PRONOM and GDFR
    • Pros
      – Considerable intellectual investment already
      – Could be used for general digital preservation, with potential interaction with other tools
    • Cons
      – Under development
      – Web archive requirements would need to be specified, use cases developed, the data model changed, and the registry populated with relevant data and regularly updated
      – Temporal aspect not currently catered for
      – Entry point: individual file format or software type [could be a pro?]
  • 11. Possible Approach 2: Wikipedia (1)
    • Pros
      – Existing free, web-based, collaborative, multilingual project
      – Draws together a rich set of information
        • browsers, layout engines, plug-ins & software, statistics, creators, standards, etc.
        • lists, history, comparisons, timelines, links to internal & external references
      – Updated by many voluntary contributors
  • 12. Possible Approach 2: Wikipedia (2)
    • Cons
      – General audience; not specific to web archive requirements or to a specific web archive
      – Amount of detail varies (between different language versions and articles)
      – Can be edited by multiple users (+ & -)
      – Not designed to interact with other digital preservation tools, as UDFR has the potential to do
  • 14. Possible Approach 3: Documenting what web archives are using/used
    • Pros
      – Time-based software suite approach
      – Starting point for
        • Potential UDFR seed list
        • Identifying commonly used software
        • Inferring additional software requirements
        • Identifying alternate access paths
    • Cons
      – Easier to document current versions
      – Obscure/obsolete material in our collections may be unknown
  • 15. Individual web archives as sources of information
    • Analysis of archive contents & harvesting statistics
    • Web archivists’ observations & records
      – UK Web Archive Technology Watch blog
    • Website usage statistics
      – Browser versions & operating systems
      – Indicative of popularity
    • Archived sites
      – Plug-in requirements, file type information
      – May include websites with useful information
      – Internet Archive complementary collection
  • 16. Example: NLA web archiving software environment, July 2009
    • Operating system: Windows XP
    • Computer: Windows PC, Intel Pentium 4
    • Browser: Internet Explorer 7 (main browser), IE8, Firefox 3.0
    • Additional software:
      – Adobe Reader 8
      – Adobe Shockwave Player
      – Adobe Flash Player 10
      – Real Player 10
      – Apple QuickTime 7
      – Windows Media Player 11
      – Java 6 Update 11
      – JavaScript enabled
      – Word, Excel, PowerPoint 2003
      – WinZip
  • 17. Example: Earlier NLA software environments
    • 2005
      – Operating system: Windows 2000
      – Computer: Windows PC
      – Browser: IE6 (since June 2002)
      – Additional software: Adobe Acrobat Reader; Macromedia Shockwave; Macromedia Flash player; Real Player; Apple QuickTime; Windows Media Player 9?; Java ?; JavaScript enabled; Word, Excel, PowerPoint; WinZip
    • 2000
      – Operating system: Windows 95
      – Computer: Windows PC
      – Browser: Netscape Navigator 4.08
      – Additional software: Acrobat Reader; Macromedia Shockwave; Macromedia Flash player; Real Player; Apple QuickTime; Windows Media Player 6.4?; Java ?; JavaScript enabled; Word, Excel, PowerPoint; WinZip
    • 1996
      – Operating system: Windows 3.1/Windows for Workgroups
      – Computer: Windows PC
      – Browser: Netscape Navigator 1, 2 or 3?
      – Additional software: Acrobat Reader; Macromedia Shockwave; Flash player ?; Real Audio player; QuickTime; Netscape Media Player?; Java?; JavaScript enabled; Word, Excel, PowerPoint; PKUnzip ?
  • 18. Example: Comparison of NLA and BnF software environments
    • NLA web archivist’s in-house software, 2009
      – Internet Explorer 7 and 8; Firefox 3.0; Adobe Reader 8; Adobe Shockwave Player; Adobe Flash Player 10; Real Player 10; Apple QuickTime 7; Windows Media Player 11; Java 6 Update 11; JavaScript enabled; Word, Excel, PowerPoint 2003; WinZip
    • BnF librarian’s software since 2005
      – Internet Explorer; Acrobat Reader*; Macromedia Flash player*; Windows Media Player*; QuickTime*; Java Virtual Machine (Microsoft)*; Word, Excel & PowerPoint Viewers
      – Later additions: Firefox; RealOne Player 10
    • BnF public access software, 2008
      – Internet Explorer; Adobe Reader; Adobe Flash player; Adobe Shockwave player; VLC Media player; Real player; Java Virtual Machine
    *Software versions progressively updated to the latest compatible with Windows XP
  • 19. Going forward
    • Is it worth pursuing approach 3?
    • If so, where would we record the information (IIPC PWG wiki? other suggestions?)
    • Interested in contributing?
  • 20. Questions?
    • Contact
      – David Pearson: dapearson@nla.gov.au
      – Maxine Davis: madavis@nla.gov.au
    • Report to IIPC PWG by end of October 2009
    Everything, for Everyone Forever
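Approach 3 proposes a time-based record of the software suites web archives actually use, and slide 8 asks "Used approximately when?", but the deck does not define a data model. As a minimal sketch, assuming one record per documented environment date, the Python below maps a capture date to the most recent environment documented on or before it. The class `EnvironmentSnapshot`, the function `environment_for`, and the exact month/day values are hypothetical (the slides give only years, or July 2009); the operating system and browser strings are taken from slides 16 and 17.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EnvironmentSnapshot:
    """A documented access environment (hypothetical structure)."""
    as_of: date                 # date the environment was documented (assumed)
    operating_system: str
    browsers: Tuple[str, ...]


# OS and browser entries from slides 16-17; month/day values are assumptions.
NLA_SNAPSHOTS: List[EnvironmentSnapshot] = [
    EnvironmentSnapshot(date(1996, 1, 1), "Windows 3.1/Windows for Workgroups",
                        ("Netscape Navigator 1, 2 or 3?",)),
    EnvironmentSnapshot(date(2000, 1, 1), "Windows 95",
                        ("Netscape Navigator 4.08",)),
    EnvironmentSnapshot(date(2005, 1, 1), "Windows 2000",
                        ("IE6",)),
    EnvironmentSnapshot(date(2009, 7, 1), "Windows XP",
                        ("Internet Explorer 7", "IE8", "Firefox 3.0")),
]


def environment_for(capture: date,
                    snapshots: List[EnvironmentSnapshot]) -> Optional[EnvironmentSnapshot]:
    """Return the latest snapshot documented on or before `capture`, or None."""
    ordered = sorted(snapshots, key=lambda s: s.as_of)
    dates = [s.as_of for s in ordered]
    # bisect_right counts the snapshots whose as_of is <= capture
    i = bisect_right(dates, capture)
    return ordered[i - 1] if i > 0 else None


if __name__ == "__main__":
    env = environment_for(date(2001, 3, 1), NLA_SNAPSHOTS)
    print(env.operating_system, env.browsers)
```

Sorting before bisecting means archives could contribute snapshots in any order. A snapshot only records what was in use when it was documented, so a lookup like this is an informed guess about the likely rendering environment, not a guarantee.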