ArAcAi - The Problem of Access: Prototyping a Researcher Dashboard for the UK Government Web Archive

Web archives and the problem of access: prototyping a researcher dashboard for the UK Government Web Archive
By Mark Bell, Tom Storrar and Jane Winters
January 2020

  1. 1. Web archives and the problem of access: prototyping a researcher dashboard for the UK Government Web Archive
      Mark Bell, Tom Storrar and Jane Winters
      15 January 2020
  2. 2. The UK Government Web Archive (UKGWA)
      The National Archives is the official archive of UK government: collecting, preserving and giving access to 1,000 years of history.
      Alongside paper and digitised records, our web archive collections are growing rapidly, and the UKGWA is our largest collection:
      ■ 1996 to present: over 23 years of government websites and social media
      ■ 6 billion resources, 150TB+ (compressed) data
      ■ It has gov.uk domains but lots more, too - wherever government hosts content (at present, over 800 websites!)
  3. 3. Use of the UKGWA
      The UKGWA is openly available and well-used. Typical routes into the content of the collection include:
      ■ Through Google and other search engines
      ■ Redirection to it from government websites or from references to historic documents within other documents
      ■ Direct “research sessions” - often returning users who have a specific information need. They will often use our search service: https://webarchive.nationalarchives.gov.uk/search/
      ■ Increasing use of the collection “as data” - but this is challenging in a number of ways
  4. 4. What do researchers want to do with the UKGWA?
      ❏ Essential primary source for the history of the late 20th and early 21st century (mid 1990s to the present day)
      ❏ Record of government (central and local) and its interactions with its citizens online
      ❏ Need to understand both its scope and its scale, and this means moving beyond keyword searching (the default for many humanities researchers)
      ❏ Gain insight into the collection processes, how these have changed over time, and the factors that have influenced when and how data is harvested (these are patchwork or ‘Frankenstein’ archives)
  5. 5. What do researchers want to do with the UKGWA?
      ❏ Extract different kinds of data from the archive (text, images, remove navigation etc.)
      ❏ Analyse trends in the data, e.g. cultural and linguistic change
      ❏ Study online networks of government and the flow of information between and within departments
      ❏ Deploy visualisation to aid navigation and analysis (macro- and micro-level)
      [Image: Elevation for clock dial for Big Ben tower]
  6. 6. Web archiving as collaboration
      ❏ The challenges posed by web archives (for researchers, web archivists and research software engineers) are too complex to be solved by individuals or organisations working on their own
      ❏ Researchers need web archivists, and web archivists need researchers
      ❏ Through collaboration, we can develop a robust community of practice and knowledge
      ❏ We can argue for enhanced access to web archives, for researchers and the wider public
      ❏ We can experiment, innovate and sometimes fail
      ❏ We can make the case for greater investment in web archiving (and in web archiving institutions)
  7. 7. Dashboard basics
  8. 8. Rise and Fall of the Web
  9. 9. What are we analysing? - Macroscopic view Archive -> Domain -> Sub-domain -> Page -> Resource
  10. 10. What are we analysing? - Content
      History of salt
      The craving for salt
      Human beings have an intimate relationship with salt. Our tears, blood and sweat taste of salt. The chemical reactions inside our bodies need sodium - one of the two elements that make up salt (with chloride). We can't survive without sodium, but it was about five million years before humans began to eat their sodium as salt. Hunters in Greenland ate no salt until they were introduced to it by whaling Europeans in the 17th century. Like our prehistoric forebears, Lapps, Samoyeds, Kirghiz, Bedouin, Masai and Zulus used to consume all the sodium they needed from the animals and fish they ate.
      Agriculture and salt
      Archaeologists believe that salt eating developed as humans learned how to keep animals and grow crops in the years after 10,000 BC. As the proportion of meat in their diet fell, people had to find salt for themselves and for their domesticated animals.
  11. 11. What is content? Content
  12. 12. What are we analysing? - Page Structure
  13. 13. What are we analysing? - Site Structure
      https://webarchive.nationalarchives.gov.uk/20190102181627/https://www.gov.uk/guidance/cartels-confess-and-apply-for-leniency
      Warning: This doesn’t exist!
  14. 14. Topic Modelling
      0: research councils council innovation rcuk funding public government review business executive working training development work group
      1: museum maritime national greenwich royal nmm time london observatory family house rights world visit reserved events
      2: day information fruit local health navigation legal school scheme contact children vegetable healthy vegetables department content
      3: ocr science information gateway aqa including edexcel chemistry physics teachers webpage wjec teaching revision gcse century
      4: science triple learning support resources latest physics students schools programme teaching teachers gcse resource feedback comments
      5: food eat foods people bacteria meat fish agency fridge don standards raw cooked pregnant date find
      6: army museum national british war general nam enquiries pm services quick britain follow world field soldiers
      7: salt eat fruit foods fat food high good eating day milk diet children vitamin vegetables healthy
  15. 15. Doc2Vec - Like word2vec but with documents
      ● Find similar documents
      ● Group documents together
      ● Enable semantic search
  16. 16. Document Summarisation
  17. 17. Scale reduction [Diagram: a site tree (Home Page; Sub-section A, Sub-section B; Page 1 to Page 4; PDF 1, PDF 2) shown next to a reduced tree (Home Page; Sub-section A, Sub-section B; Page 1, Page 2), with the labels “10s of millions” and “1000s”]
  18. 18. Change over time [Figure labels: Content, Structure, Static, Dormant]
  19. 19. Components of a dashboard
      ■ Scope: select sites for analysis, manually or by similarity
      ■ Granularity: level at which to perform analysis (archive, domain, page)
      ■ Time: filter by time period (state at time; activity during period)
      ■ Compare: compare change in one set of sites with another
      ■ Content/Structure: analyse by content or structure (page, site, network)
      ■ Visualise: charts, networks, word clouds etc.
      ■ £: charges (paying for computation)
      ■ Export: exporting results and visualisations
  20. 20. Key Context on the creation of the UKGWA
      Web archives are created through actions and decisions, both human and machine. Human actions involve decisions on when and how to capture a resource or a website, but also why. Data on this is kept as part of the archive but most of it is not public. Machines make decisions based on the parameters or rules they are given by human actors.
      We can add trust and transparency to this process by revealing as much of this as we can to our users. We can commit to publishing this knowledge, but publishing it in a way that adds to users’ comprehension of the web archive is a challenge. Static datasets (csv) are a start, leading to queryable ones (APIs…)
  21. 21. Collaborate!
      We’re not alone; we are part of a vibrant community of web archives and researchers. We are taking inspiration (and code!) from the great work being done by Archives Unleashed, the Internet Archive, the British Library and many others.
      We’ve also been gaining more and more hands-on experience of running research projects using UKGWA data, for example, recently:
      ■ Alan Turing Institute Data Challenge - Identifying Topics and Trends (December 2019)
      ■ CAS Network Analysis Workshop (June 2019)
      These are crucial to our work and there are many more to come!
  22. 22. Conclusion
      ❏ Bring stakeholders together regularly (workshops, hackathons etc.)
      ❏ A wide range of skills and expertise are required but some interventions can lower barriers
      ❏ Artificial intelligence is already helping us to explore web archives, and will continue to transform access
      ❏ … but it is not enough on its own
      [Image: Wartime storage of documents in the Long Gallery at Haddon Hall]
  23. 23. Colossus electronic digital computer
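
The archived URL on slide 13 appears to combine the archive host, a 14-digit number and the original URL. The sketch below splits such a URL into its parts, which is a first step towards analysing site structure. It assumes the 14 digits are a YYYYMMDDhhmmss capture timestamp, as in Wayback-style replay URLs; the `Capture` type and `parse_archived_url` function are illustrative names, not part of any UKGWA tooling.

```python
import re
from datetime import datetime
from typing import NamedTuple
from urllib.parse import urlsplit


class Capture(NamedTuple):
    """Parts of one archived URL (hypothetical helper type)."""
    captured_at: datetime
    original_url: str
    host: str
    path: str


# Assumed pattern: <archive host>/<14-digit timestamp>/<original URL>
ARCHIVED_URL = re.compile(
    r"^https?://webarchive\.nationalarchives\.gov\.uk/(\d{14})/(.+)$"
)


def parse_archived_url(url: str) -> Capture:
    """Split an archived URL into capture time, original URL, host and path."""
    match = ARCHIVED_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognised archived URL: {url}")
    timestamp, original = match.groups()
    parts = urlsplit(original)
    return Capture(
        # Assumption: the digits encode YYYYMMDDhhmmss.
        captured_at=datetime.strptime(timestamp, "%Y%m%d%H%M%S"),
        original_url=original,
        host=parts.netloc,
        path=parts.path,
    )


capture = parse_archived_url(
    "https://webarchive.nationalarchives.gov.uk/20190102181627/"
    "https://www.gov.uk/guidance/cartels-confess-and-apply-for-leniency"
)
print(capture.captured_at, capture.host, capture.path)
```

Grouping captures by `host` and then by `path` would give a rough version of the Archive -> Domain -> Sub-domain -> Page view from slide 9.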
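
Slide 14 lists the top words for eight topics but does not say which topic-modelling method produced them. As one possible illustration, the sketch below implements latent Dirichlet allocation (LDA) with collapsed Gibbs sampling in plain Python. The four toy documents, the hyperparameters (`alpha`, `beta`, `n_iter`) and the function names are all assumptions made for the example, not details from the project.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=5):
    """Most frequent words per topic, skipping words whose count fell to zero."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Invented toy documents that reuse a few words from the slide.
docs = [
    "salt food eat diet healthy salt".split(),
    "eat fruit vegetables healthy diet food".split(),
    "museum army war national soldiers museum".split(),
    "national museum maritime greenwich royal observatory".split(),
]
topic_word = lda_gibbs(docs, n_topics=2)
topics = top_words(topic_word)
for index, words in enumerate(topics):
    print(index, ":", " ".join(words))
```

On a real collection, an established library would be a more practical choice than this loop; the sketch only shows where lists like those on the slide can come from.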
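
Slide 15 names three uses for document vectors: finding similar documents, grouping them, and semantic search. The sketch below illustrates only the first, and it is not Doc2Vec: to stay dependency-free it swaps in TF-IDF vectors with cosine similarity, which matches shared vocabulary rather than learned meaning. The toy documents and function names are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenised document to a sparse {word: tf-idf weight} dict."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            w: (c / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_index, vectors):
    """Index of the document closest to the one at query_index."""
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Invented toy documents, already tokenised.
docs = [
    "salt diet healthy eating food".split(),
    "fruit vegetables healthy eating children".split(),
    "army museum war soldiers".split(),
]
vectors = tfidf_vectors(docs)
print(most_similar(0, vectors))
```

Here the first document is matched to the second because they share "healthy" and "eating", while the third shares no words with it. A Doc2Vec model would instead learn dense vectors from the corpus, so it could also relate documents that use different words for the same subject.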
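
Slide 19 describes the dashboard's time filter as "state at time; activity during period". The sketch below shows one way those two modes could behave over a list of capture timestamps. It assumes "state at time" means the latest capture at or before a given moment and "activity during period" means every capture inside a date range; the function names and sample dates are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def state_at(captures, moment):
    """Latest capture at or before `moment`, or None if there is none."""
    ordered = sorted(captures)
    index = bisect_right(ordered, moment)
    return ordered[index - 1] if index else None


def activity_during(captures, start, end):
    """All captures with start <= time <= end, oldest first."""
    return sorted(c for c in captures if start <= c <= end)


# Hypothetical capture times for one page.
captures = [
    datetime(2018, 6, 1),
    datetime(2019, 1, 2),
    datetime(2019, 9, 15),
]

snapshot = state_at(captures, datetime(2019, 3, 1))
in_2019 = activity_during(captures, datetime(2019, 1, 1), datetime(2019, 12, 31))
print(snapshot, len(in_2019))
```

The same two functions could be applied per page, per domain or across the whole archive, which is where the slide's Granularity setting would come in.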
