Designing Preservable Websites
Nicholas Taylor (@nullhandle)
Presentation given at the DC, VA and MD Search Engine Marketing Meetup on how to design websites to improve their archiving and preservation.
  • Design decisions have a major effect on website preservability.
  • “Benign neglect” may have been sufficient for physical objects; digital objects need more active intervention.
  • Design and usage inform each other; where does web archiving fit?
  • Because web archivists care about recreating the user experience, they care about all assets being crawled.
  • Good also for usability and SEO. Web crawlers access sites like a text browser. Replay platform must accommodate coding idiosyncrasies.
  • CSS and JavaScript directories matter for archiving but perhaps not for search engine indexing.
  • Crawler can only capture links it sees. User of archived site can only navigate by following links. Avoid relying on Flash, JavaScript, or other technologies that obscure links. Use a site map.
  • Link rot is common, and web archiving tools are URL-sensitive. Stable URLs and redirects make for seamless archive access.
  • Copyright law lacks explicit provisions for digital preservation. Many libraries ask for permission to archive websites. Creative Commons provides affirmative permission to be crawled and preserved.
  • Websites contain many different file types, each with distinct preservation risks. Favor open standards and file formats, but beware those that are poorly documented or that allow vendor-specific extensions.
  • Embedded metadata makes it easier to replay and preserve archived sites.
  • Platform providers are more likely to accommodate commercial search indexers than archival crawlers. If you care about archiving, inquire about policies, examine the robots.txt, or look at how the website appears in the Internet Archive’s Wayback Machine. If you’re using an open source CMS, be sure to review the bundled robots.txt.
  • While following these recommendations won’t guarantee perfect archiving, not following them will ensure additional challenges.
    1. Designing Preservable Websites — Nicholas Taylor (@nullhandle), DC, VA & MD Search Engine Marketing Meetup, July 18, 2012. (“found glass” by Flickr user nuanc under CC BY-NC-ND 2.0)
    2. why preserve the web? (copy of the first webpage)
    3. web archivists aren’t visible stakeholders (diagram: design, archiving, usage)
    4. search engine crawler ≠ archival crawler (“GoogleBots” by Flickr user ares64 under CC BY 2.0)
    5. what is a “preservable” website? (“Fish Preserver” by Flickr user ecstaticist under CC BY-NC-SA 2.0)
    6. three priorities:
       • capture: can resources be acquired by current web archiving technologies?
       • replay: can the user’s experience of the original website be recreated from the archived resources?
       • preservation: how can it be assured that the archived website remains coherent over time?
    7. follow web standards and accessibility guidelines (“Web Standards Fortune Cookie” by Flickr user mherzber under CC BY-SA 2.0)
    8. be careful with robots.txt exclusions (robots.txt for Last.fm)
    9. use a site map, transparent links, and contiguous navigation (“Card sorting” by Flickr user Manchester Library under CC BY-SA 2.0)
    10. maintain stable URLs and redirect when necessary (“Improvised detour sign” by Flickr user Jason McHuff under CC BY-SA 2.0)
    11. consider using a Creative Commons license (“2500 Creative Commons Licenses” by Flickr user qthomasbower under CC BY-SA 2.0)
    12. use durable data formats (“Lascaux cave painting” by Flickr user qoforchris under CC BY-ND 2.0)
    13. embed metadata, especially the character encoding (source code of http://www.seo.com/)
    14. use archiving-friendly platform providers and CMSs (robots.txt for Drupal 7)
    15. three tips:
        1. see how well your site validates on http://validator.w3.org/
        2. see how your site looks on http://archive.org/
        3. your favorite online sitemap generator is a good starting point
        (“Highlighters” by Flickr user KJGarbutt under CC BY-ND 2.0)
    16. thank you! Nicholas Taylor @nullhandle
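The robots.txt caution (slide 8) and the note that CSS and JavaScript directories matter for archiving can be made concrete. A minimal sketch, with illustrative paths not taken from any real site: blanket exclusions on asset directories keep archival crawlers from capturing stylesheets and scripts, so the archived pages replay without their styling.

```text
# Hypothetical robots.txt sketch; all paths are illustrative.
# Avoid blanket rules such as:
#   Disallow: /css/
#   Disallow: /js/
# They prevent archival crawlers from capturing stylesheets and
# scripts, so the archived site replays unstyled.
User-agent: *
Disallow: /admin/
Sitemap: http://www.example.com/sitemap.xml
```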
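The site-map advice (slide 9) follows from the note that a crawler can only capture links it sees. A minimal XML sitemap sketch (the URLs are placeholders) enumerates pages explicitly, so both search and archival crawlers can find them even when on-page navigation relies on Flash or JavaScript:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sitemap following the sitemaps.org 0.9 protocol. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2012-07-18</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about/</loc>
  </url>
</urlset>
```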
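The stable-URL advice (slide 10) can be illustrated with a server-side permanent redirect. A sketch for Apache’s mod_alias, with hypothetical paths: a 301 keeps old bookmarks, inbound links, and archived snapshots resolving after a page moves.

```text
# Hypothetical Apache configuration sketch (mod_alias);
# paths are illustrative. A permanent (301) redirect keeps
# old URLs working after content moves.
Redirect 301 /old-page.html http://www.example.com/new-page.html
```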
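The character-encoding point (slide 13) amounts to a single line of markup. A sketch showing the two equivalent declarations (use one or the other in a page’s head, as early as possible, so replay tools render archived pages correctly):

```html
<!-- HTML5 form -->
<meta charset="utf-8">
<!-- Pre-HTML5 equivalent -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```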