UK Web Archive - How researchers are using it?

Presentation by Jason Webber, Web Archive Engagement Manager, given at the ALISS Christmas event 2019.

  1. UK Web Archive – How researchers are using it? Jason Webber, Web Archive Engagement Manager
  2. What is the UK Web Archive? (www.bl.uk)
  3. What we aim to do: Capture and preserve the entire UK Web Space
  4. What do we collect? Millions of websites and billions of individual assets every year.
  5. What don’t we collect?
      • Email
      • Intranets
      • Anything behind a personal user login
      • Flash (remember that??)
      • YouTube, SoundCloud, etc.
      • Facebook, Instagram, Snapchat (some Twitter)
  6. When did we start?
      • Selective collecting since 2005 – thousands of websites (15th anniversary in 2020!)
      • Bulk archiving since 2013 – millions of websites
  7. How is it stored?
  8. Why do we do this?
  9. 1996, 2001, 2006, 2011, 2016, 2017
  11. How do we ‘archive’ the web?
      1. Identify targets (websites) ‘in scope’ (UK) for capture
      2. Send out ‘crawl’ bots
      3. Download websites into WARC files
      4. Index the collection (so all the words can be searched)
      5. ‘Playback’ via a website interface (webarchive.org.uk)
      6. Carry out Quality Assurance (QA) on some content
      7. Request open access licence for some websites
  12. What do you get? Archived webpages should look and operate as they did originally
  15. Mixed access
      • All of the collection can be discovered via the main website www.webarchive.org.uk
      • Some of the websites themselves can be viewed from anywhere
      • Most of the content can only be viewed in Legal Deposit Library reading rooms (on library terminals).
  16. www.webarchive.org.uk
  17. Search
  18. Topics and Themes
  19. Researchers
      • Caio Mello
      • Liam Markey
      • Emmanouil Tranos
      • Barbara McGillivray
  20. Olympics legacy – Caio Mello. The goal is to detect the kind of legacy that had been covered by the media and also the sentiments behind the articles. blogs.bl.uk/webarchive/2019/12/what-is-left-behind-exploring-the-olympic-games-legacies-through-the-uk-web-archive-.html
  21. Militarism and its role in the commemoration of British war dead – Liam Markey. blogs.bl.uk/webarchive/2019/11/militarism-liam-markey-01.html
  22. Secondary Datasets
  23. What Secondary data is available? data.webarchive.org.uk
      • Format profiles
      • Geoindex
      • Host links
      • Crawled URL index
  24. What do you get?
  25. Geographical – Emmanouil Tranos. “If you are a quantitative social scientist there are few things more fascinating than free, under-utilised, quirky and easy to download data that also fits well the narrative of 'big data'.” Emmanouil Tranos, University of Birmingham and Alan Turing Fellow
  29. Geographical – Emmanouil Tranos. BL blog post by Emmanouil Tranos. Hypothesis: “The availability of internet content of local interest can attract people online in order to access and take advantage of the potential on-line opportunities such as accessing local products and services. The first results seem to support our hypothesis.”
  30. Discovering polarity change in word meanings – Barbara McGillivray. Words can change meaning, and polarity, quite fast. Spectral clustering methods and natural language processing can offer a solution to this problem.
  31. Your web archive needs YOU!
      • Save a website! – Nominate
      • Create a Collection – Get in touch
      • Tell five friends about the web archive
  32. Contact
      • www.webarchive.org.uk/
      • www.webarchive.org.uk/shine
      • jason.webber@bl.uk
      • @UKWebArchive
  33. Cats – Dogs – Birds
  34. The force is strong in this one
  35. War on Terror
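
Two of the archiving steps listed on slide 11, deciding which websites are ‘in scope’ and indexing the collection so all the words can be searched, can be illustrated with a small Python sketch. This is not the UK Web Archive's actual crawler or index. The `.uk` host rule, the function names and the sample pages are all assumptions made for illustration, and a real scope decision would likely consider more than the domain suffix.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def in_scope(url: str) -> bool:
    """Hypothetical scope rule: accept only hosts under the .uk domain."""
    host = urlparse(url).hostname or ""
    return host.endswith(".uk")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of in-scope URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        if not in_scope(url):
            continue  # out-of-scope pages are never indexed
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up pages standing in for text extracted from archived captures.
sample_pages = {
    "https://www.example.co.uk/olympics": "The Olympic legacy in London",
    "https://www.example.org.uk/memorial": "War memorials in London",
    "https://www.example.com/elsewhere": "London seen from abroad",
}

sample_index = build_index(sample_pages)
london_hits = search(sample_index, "London")
legacy_hits = search(sample_index, "olympic london")
```

Here `london_hits` holds the two `.uk` pages and leaves out the `.com` page, which fails the assumed scope rule, while `legacy_hits` narrows the result to the single page containing both words.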
