Designing Preservable Websites
Presentation given at the DC, VA and MD Search Engine Marketing Meetup on how to design websites to improve their archiving and preservation.

Published in: Technology, Design
  • Design decisions have a major effect on website preservability.
  • “Benign neglect” may have been sufficient for physical objects; more active interventions are needed for digital ones.
  • Design and usage inform each other; where does web archiving fit?
  • Because web archivists care about recreating the user experience, they care about all assets being crawled.
  • Good also for usability and SEO. Web crawlers access sites like a text browser does. The replay platform must accommodate coding idiosyncrasies.
  • CSS and JavaScript directories matter for archiving but perhaps not for search engine indexing.
  • Crawler can only capture links it sees. User of archived site can only navigate by following links. Avoid relying on Flash, JavaScript, or other technologies that obscure links. Use a site map.
  • Link rot is common. Web archiving tools are URL-sensitive. Stable/redirect URLs make for seamless archive access.
  • Copyright law lacks explicit provisions for digital preservation. Many libraries ask for permission to archive websites. Creative Commons provides affirmative permission to be crawled and preserved.
  • Websites contain many different file types, each with distinct preservation risks. Favor open standards and file formats, but be wary of those that are poorly documented or that permit vendor-specific extensions.
  • Embedded metadata makes it easier to replay and preserve archived sites.
  • Platform providers are more likely to accommodate commercial search indexers than archival crawlers. If you care about archiving, inquire about policies, examine the robots.txt, or look at how the website appears in the Internet Archive’s Wayback Machine. If you’re using an open source CMS, be sure to review the bundled robots.txt.
  • While following these recommendations won’t guarantee perfect archiving, not following them will ensure additional challenges.
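
The caution above about robots.txt exclusions can be checked programmatically. A minimal sketch using Python's standard-library robots.txt parser, with a hypothetical robots.txt that disallows asset directories and a hypothetical crawler user-agent name:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks asset directories -- a common
# pattern that leaves pages crawlable but breaks archival replay.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /js/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A robots-honoring archival crawler could fetch the page itself but
# not the stylesheets needed to recreate its appearance on replay.
print(parser.can_fetch("archive-bot", "/index.html"))    # True: page allowed
print(parser.can_fetch("archive-bot", "/css/site.css"))  # False: style blocked
```

Running a check like this against your own robots.txt before publishing is a quick way to spot exclusions that would strip an archived copy of its look and feel.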
  • Transcript

    • 1. Designing Preservable Websites. Nicholas Taylor, @nullhandle. DC, VA & MD Search Engine Marketing Meetup, July 18, 2012. “found glass” by Flickr user nuanc under CC BY-NC-ND 2.0
    • 2. why preserve the web? copy of the first webpage
    • 3. web archivists aren’t visible stakeholders: design, archiving, usage
    • 4. search engine crawler ≠ archival crawler “GoogleBots” by Flickr user ares64 under CC BY 2.0
    • 5. what is a “preservable” website? “Fish Preserver” by Flickr user ecstaticist under CC BY-NC-SA 2.0
    • 6. three priorities:
      • capture: can resources be acquired by current web archiving technologies?
      • replay: can the user’s experience of the original website be recreated from the archived resources?
      • preservation: how can it be assured that the archived website remains coherent over time?
    • 7. follow web standards and accessibility guidelines. “Web Standards Fortune Cookie” by Flickr user mherzber under CC BY-SA 2.0
    • 8. be careful with robots.txt exclusions. robots.txt for Last.fm
    • 9. use a site map, transparent links, and contiguous navigation. “Card sorting” by Flickr user Manchester Library under CC BY-SA 2.0
    • 10. maintain stable URLs and redirect when necessary “Improvised detour sign” by Flickr user Jason McHuff under CC BY-SA 2.0
    • 11. consider using a Creative Commons license “2500 Creative Commons Licenses” by Flickr user qthomasbower under CC BY-SA 2.0
    • 12. use durable data formats “Lascaux cave painting” by Flickr user qoforchris under CC BY-ND 2.0
    • 13. embed metadata, especially the character encoding. source code of http://www.seo.com/
    • 14. use archiving-friendly platform providers and CMSs. robots.txt for Drupal 7
    • 15. three tips:
      1. see how well your site validates on http://validator.w3.org/
      2. see how your site looks on http://archive.org/
      3. your favorite online sitemap generator is a good starting point
      “Highlighters” by Flickr user KJGarbutt under CC BY-ND 2.0
    • 16. thank you! Nicholas Taylor, @nullhandle
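
Slide 13's advice to embed metadata, especially the character encoding, can be verified with a quick check. A minimal sketch, using an illustrative HTML snippet, that confirms a page declares its encoding inline so an archived copy renders correctly even if the original HTTP Content-Type header is lost:

```python
import re

# Illustrative page: the inline <meta charset> declaration travels with
# the document itself, unlike an HTTP header, so it survives archiving.
HTML = b"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Example</title></head>
<body></body></html>"""

# Look for an HTML5-style <meta charset="..."> declaration.
match = re.search(rb'<meta\s+charset=["\']?([\w-]+)', HTML, re.IGNORECASE)
encoding = match.group(1).decode("ascii") if match else None
print(encoding)  # utf-8
```

A page with no declared encoding would yield None here, signaling that replay tools must guess how to decode the archived bytes.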

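As a footnote to the transparent-links advice in slide 9: a crawler can only capture links that appear in the delivered HTML, so links written by JavaScript at runtime are invisible to it. A minimal sketch with Python's standard-library HTML parser, using two illustrative page snippets:

```python
from html.parser import HTMLParser

# Collect href values from <a> tags, roughly what a crawler "sees".
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Same link, delivered two ways: directly in the markup, and generated
# by a script. The parser treats <script> content as opaque text.
STATIC_PAGE = '<a href="/about.html">About</a>'
SCRIPTED_PAGE = "<script>document.write('<a href=\"/about.html\">About</a>')</script>"

for page in (STATIC_PAGE, SCRIPTED_PAGE):
    extractor = LinkExtractor()
    extractor.feed(page)
    print(extractor.links)  # ['/about.html'] then []
```

The script-generated link never surfaces, which is why sites that rely on JavaScript for navigation need a site map or plain HTML links to be archivable.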