1. PANDORA: An Overview
    Future-proofing Institutional Websites, 19-20 January 2006, London
    Matthew Walker, Deputy Director, Collection Infrastructure IT Division, National Library of Australia
2. Introduction
    - Origin: proof of concept
    - Selection work started in 1996
    - Archiving began late 1996/early 1997
        - Few automated processes
        - Progressed to a more automated approach
    - Now: an important NLA archiving activity
3. How?
    - Dynamic approach
        - Low structure, high flexibility
        - Processes developed “on the fly”
    - Result
        - Outcomes achieved
        - Best use of available resources
4. Who?
    - NLA
        - Digital Archiving Section
            - Business responsibility (~7 staff)
        - Librarians (support as needed)
            - Cataloguing
        - Information Technology
            - Support (~1 staff)
            - Enhancement/Redevelopment (~4 staff)
5. Who?
    - Partner institutions
        - Libraries:
            - Northern Territory Library, State Library of New South Wales, State Library of Queensland, State Library of South Australia, State Library of Victoria, State Library of Western Australia
        - Other:
            - Australian Institute of Aboriginal and Torres Strait Islander Studies, Australian War Memorial, National Film and Sound Archive
6. What?
    - NLA responsibilities
        - National Library Act 1960
            - No legal deposit legislation for electronic resources!
        - Maintain and develop a national collection of ‘library material’
        - Comprehensive collection relating to Australia and the Australian people
        - Leadership role
7. Characteristics
    - Selective approach
    - Scalable to available resources
    - Negotiate permission to archive
    - Manual quality assurance processes
    - Access to the archived resources
8. Issues
    - Missing resources for future researchers
    - Labour intensive
    - Full linking structure of the Internet not retained
    - Deep web content not archived
9. Workflow
    - Nominating/Identifying
        - Publisher self-nomination
            - Nomination form (http://pandora.nla.gov.au/registration_form.html)
        - Indexing/abstracting agency nominations
            - Nomination form (http://pandora.nla.gov.au/indexerform.html)
        - NLA’s Digital Archiving Section (DAS)
        - Partner institutions
10. Workflow
    - Selecting
        - DAS
            - NLA selection guidelines (http://pandora.nla.gov.au/selectionguidelines.html)
        - Partner institutions
            - Own selection guidelines
        - Type of content
            - Documents (e.g. PDF)
            - Whole and partial websites
11. Workflow
    - Gathering
        - Mechanisms
            - HTTrack crawling (http://www.httrack.com)
            - FTP from publisher
            - Email from publisher
        - Preservation copy
        - Post-crawl processing
        - Working area
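
The split between an untouched preservation copy and an editable working area can be sketched as follows. This is a minimal illustration, assuming the crawl output lands in a single directory. The function name `stage_gather`, the directory layout and the read-only policy are all hypothetical and do not describe PANDAS behaviour.

```python
import shutil
import stat
from pathlib import Path


def stage_gather(crawl_dir: Path, preservation_root: Path,
                 working_root: Path, instance_id: str) -> tuple[Path, Path]:
    """Copy one gathered instance into a preservation area and a working area.

    Hypothetical helper: the preservation copy keeps the crawl output as
    gathered; the working copy is where post-crawl processing and QA fixes
    are allowed to happen.
    """
    preservation = preservation_root / instance_id
    working = working_root / instance_id
    shutil.copytree(crawl_dir, preservation)  # raises if the instance already exists
    shutil.copytree(crawl_dir, working)

    # Mark every preserved file read-only (r--r--r--) so later edits go
    # to the working copy instead.
    for path in preservation.rglob("*"):
        if path.is_file():
            path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return preservation, working
```

Keeping two full copies costs disk space. In exchange, any fix made during quality assurance can always be compared against what the crawler actually collected.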
12. Workflow
    - Processing
        - Quality assurance
            - Manual check for viewing/linking errors
            - Completeness and functionality
            - New content (compare with previous instance)
            - No unexpected content
        - Modifications
            - Write access to the working area
                - Add missing files, fix broken links, etc.
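
The comparison with the previous instance is a manual check, but a file-level comparison can show the reviewer where to look first. This is a minimal sketch, assuming each gathered instance is stored as a plain directory tree. `manifest` and `compare_instances` are hypothetical helpers, not PANDAS functions.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def compare_instances(previous: Path, current: Path) -> dict[str, list[str]]:
    """List files added, removed or changed between two instances of a title."""
    old, new = manifest(previous), manifest(current)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }
```

A hash difference only shows that a file's bytes changed. Whether the change is meaningful new content remains a judgement for the reviewer.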
13. Workflow
    - Archiving
        - Transfer master display copy from working area to Digital Object Storage System (DOSS)
        - Transfer preservation copy to preservation area on the DOSS
        - Create display copy on web server
        - Still not published!
14. Workflow
    - Publishing
        - Title Entry Page (TEP)
            - Created from metadata
            - Additional links to notes, links to serial issues, copyright statement, etc.
            - Creation makes the archived copy publicly accessible
        - Persistent Identifiers (PIs)
            - e.g. nla.arc-25849-20051113-www.bullyingnoway.com.au/default.html
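
The example persistent identifier appears to combine a title number, a gather date and the original address of the page. The sketch below splits an identifier on that assumption. The pattern, the field names and `parse_pi` are inferred from this single example and are hypothetical, not taken from NLA documentation.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical pattern inferred from the one example on the slide:
# nla.arc-<title id>-<gather date YYYYMMDD>-<original host and path>
PI_PATTERN = re.compile(r"^nla\.arc-(\d+)-(\d{8})-(.+)$")


@dataclass(frozen=True)
class ArchivePI:
    title_id: int
    gathered: date
    original: str  # host and path of the page as originally published


def parse_pi(pi: str) -> ArchivePI:
    """Split an archive PI into its assumed components."""
    match = PI_PATTERN.match(pi)
    if match is None:
        raise ValueError(f"not a recognised archive PI: {pi!r}")
    title_id, stamp, original = match.groups()
    gathered = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    return ArchivePI(int(title_id), gathered, original)
```

Under this reading, `parse_pi("nla.arc-25849-20051113-www.bullyingnoway.com.au/default.html")` gives title 25849, gathered on 13 November 2005, originally at `www.bullyingnoway.com.au/default.html`.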
15. Workflow
    - Cataloguing
        - Bibliographic details
            - NLA catalogue
            - National Bibliographic Database (NBD)
        - Metadata imported into PANDORA TEPs
16. Workflow
    - Permissions
        - No legal deposit
            - Explicit permission of the publisher is sought prior to archiving
        - Copyright, etc.
            - Publisher’s permission to make publicly available
                - Restrictions
17. Workflow
    - Restrictions
        - Publisher restrictions on access
            - Period
                - e.g. accessible from restricted location/s for 5 years
                - Location is specified by IP address and subnet mask
            - Date
                - e.g. accessible from restricted location/s between 3 December 2005 and 31 January 2007
                - Location is specified by IP address and subnet mask
            - Authenticated group
                - e.g. accessible by username/password credentials
            - Can be enabled/disabled in PANDAS
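
Location-based restrictions combine a time window with networks given as an IP address and subnet mask. Below is a minimal sketch of the "Date" case using Python's standard `ipaddress` module. It assumes, although the slide does not say so, that outside the window the title reverts to normal public access. `LocationRestriction` is a hypothetical name, and the addresses in the usage are documentation ranges.

```python
import ipaddress
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocationRestriction:
    """Hypothetical 'Date' restriction: access limited to given networks in a window."""

    networks: tuple[tuple[str, str], ...]  # (IP address, subnet mask) pairs
    start: date
    end: date  # inclusive

    def in_force(self, today: date) -> bool:
        return self.start <= today <= self.end

    def allows(self, client_ip: str, today: date) -> bool:
        # Assumption: outside the window the restriction no longer applies.
        if not self.in_force(today):
            return True
        client = ipaddress.ip_address(client_ip)
        # "192.0.2.0/255.255.255.0" is accepted as address plus dotted netmask.
        return any(
            client in ipaddress.ip_network(f"{address}/{mask}", strict=False)
            for address, mask in self.networks
        )
```

For example, a rule with `networks=(("192.0.2.0", "255.255.255.0"),)` running from 3 December 2005 to 31 January 2007 admits `192.0.2.45` during the window and rejects `198.51.100.7`. A "Period" rule could reuse the same shape by deriving `end` from the start date plus the period.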
18. NLA Tools
    - PANDAS
        - http://pandora.nla.gov.au/pandas.html
        - Web archive management system
    - XINQ
        - http://www.nla.gov.au/xinq/
        - Making deep web database archives accessible by browse/search
19. Other Tools
    - PageVault
        - http://www.projectcomputing.com/products/pageVault/
        - Archives your website by keeping a copy of every accessed version of a page as it passes through your web server
    - HTTrack
        - http://www.httrack.com
        - Desktop/command-line tool for crawling websites
    - Heritrix
        - http://crawler.archive.org/
        - Internet Archive’s tool for crawling the web
        - Designed for large-scale crawls rather than individual websites
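
The server-side approach attributed to PageVault captures pages as the server sends them rather than crawling them afterwards. It can be illustrated with a small WSGI middleware that keeps a copy of a successful response whenever its content differs from the last copy kept for that path. This is a hypothetical sketch of the idea, not how PageVault is implemented. `VersionCapture` is an invented name, and it holds versions in memory, where a real archive would write them to durable storage.

```python
import hashlib
from datetime import datetime, timezone


class VersionCapture:
    """Hypothetical WSGI middleware: keep each distinct 200 OK body per URL path."""

    def __init__(self, app):
        self.app = app
        self.last_digest: dict[str, str] = {}  # path -> hash of last stored body
        self.versions: list[tuple[str, datetime, bytes]] = []  # (path, time, body)

    def __call__(self, environ, start_response):
        statuses = []

        def recording_start_response(status, headers, exc_info=None):
            statuses.append(status)  # remember the status the app sent
            return start_response(status, headers, exc_info)

        result = self.app(environ, recording_start_response)
        try:
            body = b"".join(result)  # buffer the whole response body
        finally:
            if hasattr(result, "close"):
                result.close()  # WSGI requires close() to be called if present

        path = environ.get("PATH_INFO", "/")
        digest = hashlib.sha256(body).hexdigest()
        is_ok = bool(statuses) and statuses[-1].startswith("200")
        if is_ok and self.last_digest.get(path) != digest:
            self.last_digest[path] = digest
            self.versions.append((path, datetime.now(timezone.utc), body))
        return [body]
```

Storing only changed bodies keeps the archive from filling with identical copies of popular pages. The trade-off is that every response is buffered in full before it is sent.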
20. PANDORA Resources
    - Selection guidelines
        - http://pandora.nla.gov.au/selectionguidelinesallpartners.html
    - Papers & presentations
        - http://pandora.nla.gov.au/papers.html
21. Other Resources
    - PANDORA Archiving Issues FAQ: http://pandora.nla.gov.au/manual/pandas/faq.html
    - NLA Digital Archiving Section - General Procedures (Procedures for handling Internet resources): http://pandora.nla.gov.au/manual/general_procedures.html
    - NLA Digital Archiving Section Manual - Check List for Scheduled Gatherings: http://pandora.nla.gov.au/manual/checklist.html
    - NLA Digital Archiving Section Manual - Gathering Schedule Guidelines: http://pandora.nla.gov.au/manual/schedule_guidelines.html
22. Future Directions/Issues
    - Deep web: database archiving
    - Historical repository of tools for viewing archive content
    - New & future ways of authoring & publishing to the web
        - XML publishing, blogs, DB-driven sites, wikis…
        - What’s coming in 2, 5 or 10 years’ time?
23. Recommendations for starting out
    - Do something small & do it now.
    - Build on what you already have.
    - Think about what you have done and revise/expand as necessary.
24. Summary
    - The PANDORA story
    - Tools and resources
    - Futures/ideas