Gareth Millward: Interrogating the archived UK web

Digital History seminar
4 November 2014
Live Stream: http://ihrdighist.blogs.sas.ac.uk/2014/10/28/tuesday-4-november-interrogating-the-archived-uk-web-historians-and-social-scientists-research-experiences/

  1. Interrogating the archived UK web: “RNIB”
     Gareth Millward – gareth.millward@lshtm.ac.uk – Centre for History in Public Health
     Improving health worldwide
     http://history.lshtm.ac.uk
  2. “The best-laid schemes o’ mice an’ men…
     • Original plan to investigate the presence of information for disabled people on the UK web
     • Also to look at the accessibility of that info through Web Accessibility Standard 1.0 (1998)
     • Search for major organisations and key disability words
     • Run sample through validation tools
     [Image: Pieter Bruegel the Elder, The Tower of Babel (Vienna), Google Art Project, edited. From Wikipedia.]
  3. … Gang aft agley.”
     • Far too much stuff!
     • Search terms such as “RADAR”, “SCOPE” and “MIND” obviously… problematic…
     • No discernible pattern from code validation
     • “Experience” of using screen readers impossible (for now)*
     • Defining “information” or “reach” not a simple task
     • Still major problems with assessing “importance” and “relevance”
     * At least within the design scope of this project…!
     [Image: Macintosh Performa 5200, a mid-90s Apple computer. From Wikipedia.]
  4. “RNIB”
     • A simple four-letter string
     • Played a key role in promoting web standards in Britain
     • Just over half a million “hits” – a significant number compared to other disability organisations
     [Image: RNIB logo © RNIB – RNIB.org.uk]
  5. Large number of instances relative to peers…

     Search term                                            Instances
     RNIB                                                     516,165
     MENCAP                                                   218,439
     RNID                                                     217,963
     "disability alliance"                                     22,421
     royal association for disability and rehabilitation       16,072
     BCODP                                                     12,501
     UKDPC                                                      2,348
     "spinal injuries association"                             45,477
     "centre for independent living"                           23,185
     "disability benefits consortium"                           2,205
     disability                                            12,909,868
     *.* (all)                                          2,023,288,655

     [Chart: Instances of search terms relative to *.*, 1996–2010. Series: RNIB, MENCAP, RNID. X-axis: year, 1996 to 2010. Y-axis: instances p.a. as percentage of whole p.a., 0.00% to 0.05%.]
  6. … and not all self-referential
     [Chart: Instances per domain as percentage of total for "RNIB". Y-axis: 0.00% to 30.00%.]
  7. Predominance of .org.uk
     [Chart: Domains of instances as percentage of total of "RNIB". Categories: .org.uk, .co.uk, .gov.uk, .ac.uk, .nhs.uk, .parliament.uk. Y-axis: 0.00% to 60.00%.]
  8. The trouble begins: links

     Links to                 Instances
     -> rnib.org.uk             259,421
     -> w3.org                   71,798
     -> mla.gov.uk               34,435
     -> openharmonise.org        32,071
     -> facebook.com             31,098

     • Disaggregated statistics are basically meaningless
     • Second most common link is to W3.org, which had virtually nothing to do with the actual activities of RNIB
     • openharmonise.org is the CMS for mla.gov.uk. It reflects references on the MLA site, not the activity of RNIB
  9. The bloody Guardian…
  10. Commensurability goes out the window…
     • Once you start filtering out the areas that aren’t “really” part of your search, it becomes impossible to compare one search term with another.
     • You will lose “useful” information and keep “useless” stuff
     • Can begin to build a “human readable” corpus, but what the heck do I actually have here? Certainly not what I originally intended to look at…
     [Image: xkcd, “Thesis Defence”]
  11. Whittling down
     • REMOVED LINKS TO W3.org (usually just a mention of WAI)
     • REMOVED RNIB.org.uk (I can browse the main site; more interested in external material)
     • REMOVED 2009 & 2010 (made the sample smaller, and these use a different crawling system)
     • REMOVED RNIB.co.uk
     • REMOVED big-print.co.uk
     • REMOVED MLA.gov.uk (mentions RNIB a lot, but becomes noise)
     • The result of all this? The corpus is down to 71,112
     • (Actually, by reducing the date range further and adding a couple of extra tweaks, now down to 39,270)
  12. What did we learn today?
     • Visible effects of the impact of RNIB on UK web standards
     • Sheer presence suggests RNIB was better than its peers at establishing itself on the internet
     • Google has made us (me) lazy
     • An archive without an archivist or a catalogue is highly problematic for researchers
     [Image: The British Library. From Wikicommons.]
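
The arithmetic behind slide 5 and the exclusions listed on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the project: the record layout (`url`, `year`, `links`), the function names, the sample URLs and the choice to match on host suffix are all assumptions made for the example. Only the counts and the excluded domains and years come from the slides. The share is computed over the whole period, because the transcript does not preserve the per-year counts behind the chart.

```python
from urllib.parse import urlparse

# Counts copied from the table on slide 5.
INSTANCES = {"RNIB": 516_165, "MENCAP": 218_439, "RNID": 217_963}
ALL_INSTANCES = 2_023_288_655  # the "*.* (all)" row


def share_of_whole(count, total=ALL_INSTANCES):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Exclusions listed on slide 11. Matching on host suffix is an assumption.
EXCLUDED_HOSTS = ("rnib.org.uk", "rnib.co.uk", "big-print.co.uk", "mla.gov.uk")
EXCLUDED_LINK_HOSTS = ("w3.org",)
EXCLUDED_YEARS = {2009, 2010}


def host_matches(url, hosts):
    """True if the URL's host equals, or is a subdomain of, one of hosts.

    URLs without a scheme have no parsed hostname and never match.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in hosts)


def whittle(records):
    """Keep the records that survive every exclusion.

    Each record is a hypothetical dict: {"url": str, "year": int, "links": [str]}.
    Reading "removed links to W3.org" as "drop any record that links to w3.org"
    is an assumption.
    """
    kept = []
    for rec in records:
        if rec["year"] in EXCLUDED_YEARS:
            continue
        if host_matches(rec["url"], EXCLUDED_HOSTS):
            continue
        if any(host_matches(link, EXCLUDED_LINK_HOSTS) for link in rec["links"]):
            continue
        kept.append(rec)
    return kept


# Invented sample records, for illustration only.
sample = [
    {"url": "http://www.rnib.org.uk/about", "year": 2003, "links": []},
    {"url": "http://example.org.uk/access", "year": 2004,
     "links": ["http://www.w3.org/WAI/"]},
    {"url": "http://example.ac.uk/library", "year": 2010, "links": []},
    {"url": "http://example.co.uk/news", "year": 2005,
     "links": ["http://www.rnib.org.uk/"]},
]

for term, count in INSTANCES.items():
    print(f"{term}: {share_of_whole(count):.4f}% of all instances")
print([rec["url"] for rec in whittle(sample)])
```

Each exclusion changes what the remaining count means, which is the commensurability problem raised on slide 10: a figure produced by `whittle` for "RNIB" cannot be set directly against an unfiltered count for another search term.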
