Approaches to Archiving Professional Blogs Hosted in the Cloud
iPRES 2010, Vienna, Austria
Tuesday, September 21st 2010
Marieke Guy, Research Officer, UKOLN
www.bath.ac.uk
UKOLN is supported by:
This work is licensed under an Attribution-NonCommercial-ShareAlike 2.0 licence
http://www.ukoln.ac.uk/web-focus/papers/pres-2010/paper25/

Introduction to UKOLN
  • UKOLN is a centre of excellence in digital information management, providing advice and services to the library, information and cultural heritage communities
  • Library and cataloguing background
  • Located at the University of Bath, UK
  • Funded by JISC to advise UK HE and FE communities
  • Also project funding, including EU funding
  • Many areas of work including metadata, repositories, dissemination activities, eScience, etc.
  • Digital preservation projects: DRIVER, CEDARS, eBank, JISC Preservation of Web Resources, Beginners Guide, etc.
  • Digital Curation Centre

Why blogs? Why in the Cloud?
  • Ease of creation, ease of use, ease of sharing
  • Increasingly used for reflecting, analyzing, questioning, critiquing, recording, discussing, learning, etc.
  • Very important for information professionals
  • Many dissemination benefits
  • Lack of institutional blogging infrastructure
  • UKOLN supports innovation
  • Cloud is an agile, cost-effective, highly useable way to deliver a service
  • Now own institutional service and over 15 blogs

The Professional’s blog
  • Established 2006
  • 750+ posts
  • 240 users per day
  • Personal style
  • Institution vs individual?

The Project blog
  • 2008 - 2010
  • 118 posts
  • 141 comments
  • 6 contributors
  • Professional style

The Event blog
  • June – August 2009
  • 68 posts
  • 3 contributors + guests
  • Video, interviews, photos, discussion
  • Informal/professional style

Why Preserve blogs?
  • Contain useful information
  • Information not available elsewhere
  • Look and feel relevant
  • Cultural significance
  • Reliance on 3rd party services
  • Blogs disappear (UK HE funding cuts…)
  • ‘Archiving’ - ways in which blog content can be migrated to alternative environments in order to satisfy a number of business functions
  • Focus on short-term continuity and management
  • Could comprise part of a preservation strategy

Different Approaches:
  • New Static Master Copy
  • Backup Copy
  • Migration to Another Platform
  • Physical Manifestation
  • Other technical approaches
What are the issues with each of these?
http://www.flickr.com/photos/mnsc/433436548/

New Static Master Copy
  • Migrate blog to static HTML
  • Point to new static resource
  • IWMW – WinHTTrack static copy
  • Issues:
      • No interactivity
      • Loss of technical architecture e.g. plugins
      • Loss of other elements e.g. comments
      • Look and feel

Backup Copy
  • Using XML, using HTML?
  • Where? On the server? On a disc? On an external hard drive?
  • On the same blog platform?
  • ArchivePress
  • On alternate blog platform?
  • JP XML version on Intranet
  • IWMW static version on Intranet
  • Issues:
      • Access

Migration to Another Platform
  • Live blog to alternate platform
  • Could just be for data mining purposes – can’t do on current environment
  • UKWF – VOX platform, RSS feeds used, Yahoo Pipes
  • Export feature
  • Issues:
      • Access
      • Loss of technical architecture e.g. plugins
      • Loss of other elements e.g. comments
      • Look and feel

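As an illustration of the feed-based route mentioned on this slide, the following is a minimal sketch of reading posts out of an RSS 2.0 feed so they can be handed to another platform. It is not the method used for the UKWF migration: the sample feed, the example.org URLs and the function name extract_posts are all assumptions made for the example. Feeds often carry only recent posts and may leave out comments, which is one way the "loss of other elements" issue listed above can arise.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (assumption); a real feed would be downloaded
# from the blog's own feed URL before being parsed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/2009/06/first-post/</link>
      <pubDate>Mon, 01 Jun 2009 09:00:00 +0000</pubDate>
      <description>Hello world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2009/06/second-post/</link>
      <pubDate>Tue, 02 Jun 2009 09:00:00 +0000</pubDate>
      <description>More content</description>
    </item>
  </channel>
</rss>"""


def extract_posts(feed_xml):
    """Return one dict per <item> in an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        posts.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "body": item.findtext("description", default=""),
        })
    return posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE_FEED):
        print(post["published"], post["title"], post["link"])
```

The resulting list of dictionaries is platform-neutral, so the same records could feed an import tool, a backup file or a data mining script.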
Physical Manifestation
  • Create a hard copy print out e.g. self-publishing
  • Create PDF of site, RSS2PDF
  • UKWF – Lulu self-published book available
  • Purpose specific
  • Issues:
      • Obviously not interactive but record unlikely to degrade like other options

Technical Approaches
  • HTML Scraping
      • HTTrack – static Web site created
  • Third-party Web archiving
      • UK Web Archive
      • Internet Archive
  • Not always complete capture but useful for look and feel
  • URL submitted for case study blogs

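To make the scraping approach more concrete, here is a small sketch of one step any HTML scraper needs: finding the links on a page that belong to the same blog, so that those pages can be captured too. It does not describe how HTTrack works internally. The class LinkCollector, the function same_site_links and the blog.example.org address are hypothetical names chosen for the illustration, and the sketch works on an HTML string rather than fetching anything from the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address
                self.links.append(urljoin(self.base_url, value))


def same_site_links(html, base_url):
    """Return de-duplicated links that stay on the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    kept = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in kept:
            kept.append(link)
    return kept


if __name__ == "__main__":
    page = (
        '<a href="/2009/06/post-1/">One</a> '
        '<a href="http://other.example.com/">Elsewhere</a> '
        '<a href="/2009/06/post-1/">One again</a>'
    )
    print(same_site_links(page, "http://blog.example.org/"))
```

Links to other hosts are dropped here, which is one reason a scraped copy may not be a complete capture: embedded media and third-party widgets served from elsewhere are left behind unless they are handled separately.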
Freezing a blog
  • Assessment of status of blog
  • Audit - Get your house in order: links to embeds, comments, spam, etc.
  • Preliminary posts
  • Statistics: dates, posts, comments, spam, contributors, theme, plugins, software, licence etc.
  • Archive page/sidebar widget
  • Final post
  • Indication that blog is archived
  • Close comments
  • Archive blog
http://www.flickr.com/photos/plousia/93646438/

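The statistics step could be sketched roughly as below, assuming the posts have already been exported into simple records. The record fields (date, author, comments), the sample data and the function name blog_statistics are assumptions for illustration only; items such as theme, plugins, software and licence would still need to be noted by hand or read from the platform itself.

```python
from datetime import date

# Hypothetical exported post records (assumption): one dict per post.
SAMPLE_POSTS = [
    {"date": date(2009, 6, 1), "author": "alice", "comments": 2},
    {"date": date(2009, 7, 20), "author": "bob", "comments": 0},
    {"date": date(2009, 8, 14), "author": "alice", "comments": 3},
]


def blog_statistics(posts):
    """Summarise a blog for its archive page before it is frozen."""
    if not posts:
        raise ValueError("no posts to summarise")
    dates = [post["date"] for post in posts]
    return {
        "first_post": min(dates),
        "last_post": max(dates),
        "posts": len(posts),
        "comments": sum(post["comments"] for post in posts),
        # A set removes repeat authors; sorting gives a stable order
        "contributors": sorted({post["author"] for post in posts}),
    }


if __name__ == "__main__":
    for key, value in blog_statistics(SAMPLE_POSTS).items():
        print(f"{key}: {value}")
```

A summary of this kind could be pasted into the archive page or final post so that readers can see the scope of the blog at the point it was frozen.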
The Archive Page
General Issues
  • What constitutes a blog? – content, layout, plugins, comments, tags, images, multimedia, etc.
  • Who owns a blog?
  • Identity, copyright, ownership and licences
  • Privacy
  • Permissions to access blogs belonging to individuals
  • Understandability of pages if out of context
  • Blog policies
  • Availability

Best Practice Checklist
  • Planning
  • Clarification of rights
  • Monitoring of technologies used
  • Auditing
  • Understanding of costs and benefits
  • Identification and implementation of archiving strategy
  • Dissemination
  • Learning
  • Organisational Audit

Lessons Learnt
  • Need for a risk assessment framework if using third party services
  • Importance of planning and writing of blog policy at start of blog lifecycle
  • Useful to consider a combination of approaches rather than just one
  • Value of sharing best practice of blog archiving

Questions?
  • Twitter Id: mariekeguy
  • Email: m.guy@ukoln.ac.uk
  • Slides: http://www.slideshare.net/MariekeGuy
  • All resource URLs tagged with ipres2010-blogs: http://delicious.com/mariekeguy/ipres2010-blogs
