
Data visualisation workshop


Slides accompanying newspaper data visualisation workshop, organised by British Library and London College of Communication, 2 October 2019

Published in: Data & Analytics

  1. Newspaper Data Visualisation Workshop, 2 October 2019, British Library & London College of Communication
  2. Newspapers at the British Library
     • The national collection, from 1619 to the present day
     • Over 34,000 titles, or 60 million individual issues, or 450 million pages
     • 25,000 titles from the UK and Ireland
     • 34 million pages are available at
     • To date, selection of newspaper titles for digitisation has been undertaken by Findmypast
     • To date, all data created through digitisation is owned by Findmypast
  3. Heritage Made Digital Newspapers
     • British Library project to digitise newspapers for itself, alongside the Findmypast operation
     • Target: 1.3 million pages by 2022
     • Over 200 titles, with a focus on titles in poor or unfit condition published in London
     • Titles will all be made available on the British Newspaper Archive
     • All titles will eventually be openly available online
     • All metadata, including derived data (OCR, entities), is owned by the British Library and is to be made openly available, with other newspaper data to follow
  4. Our newspaper data plans
     • To encourage multiple uses of data derived from newspapers
     • Treating newspaper data as a ‘collection’ in its own right
     Outputs
     • Bibliographical list of all BL UK and Irish newspapers
     • HMD and other newspaper data openly available through BL’s new digital repository
     • Visualisations of the collection to help explore and understand it
     Users
     • Academics using ‘big data’ for new kinds of research
     • General users unfamiliar with data or easy-to-use tools
     • Creatives
  5. Workshop goals
     • We aim to develop a series of workshops with London College of Communication, which will integrate newspaper data analytics with creative design
     • Today is a trial workshop to test ideas
     • The wider goal is to help researchers understand how to visualise the complexities of the Library’s newspaper collection
  6. Programming tools we use
     • Python: programming language, becoming the industry standard for data analytics
     • R: programming language with good visualisation packages
     • Jupyter Notebooks & JupyterHub: tools which allow for interactive and shareable Python code, e.g. BL Labs version coming soon
  7. Programming tools we use
     • Learn how to use these at:
     • Watch out for Adult Learning courses at the British Library next Autumn
  8. Visualisation tools we use
     • Voyant: easy-to-use web application for text mining
     • Palladio: online tool for visualising structured historical data (people and places)
  9. Methods used for today
     • Named Entity Recognition using the Python library NLTK
     • Named Entity Recognition is the term for a range of methods for detecting certain types of words in text, such as people, places and organisations
     • Can be used to analyse the prominence of certain people or places in a large dataset
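
The idea of detecting entities and measuring their prominence can be sketched in a few lines of Python. This is not the NLTK approach named on the slide: it swaps in a much simpler lookup-list (gazetteer) method that needs only the standard library, so it finds only names it has been told about. The gazetteer contents, the function names `find_entities` and `entity_prominence`, and the sample sentences are all invented here for illustration.

```python
import re
from collections import Counter

# Hypothetical gazetteer: a hand-made lookup of known names and their types.
# A statistical tool such as NLTK would use a trained model instead.
GAZETTEER = {
    "London": "PLACE",
    "Manchester": "PLACE",
    "Queen Victoria": "PERSON",
    "British Library": "ORGANISATION",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (name, type) pairs for every gazetteer name found in the text."""
    if not gazetteer:
        return []
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)]) for m in pattern.finditer(text)]


def entity_prominence(texts, gazetteer=GAZETTEER):
    """Count how often each (name, type) pair is mentioned across many texts."""
    counts = Counter()
    for text in texts:
        counts.update(find_entities(text, gazetteer))
    return counts


# Made-up sample sentences, not taken from any real newspaper.
sample = [
    "Queen Victoria arrived in London on Tuesday.",
    "Trade between London and Manchester increased.",
]
prominence = entity_prominence(sample)
for (name, kind), n in prominence.most_common():
    print(f"{name} ({kind}): {n}")
```

Sorting the counts puts the most frequently mentioned people and places first, which is the kind of "prominence" ranking the slide describes. On real newspaper text, a trained recogniser would be needed to catch names that are not in a hand-made list.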
  10. Methods used for today
     • Simple text mining with R and the library tidytext
     • Counting word frequencies can help us to ‘see inside’ large amounts of text, and detect patterns we might otherwise miss
     • We might use it to understand the focus of a particular newspaper title, or to compare reporting over different chunks of time
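
Word-frequency counting can be illustrated with a short sketch. The slide names R and tidytext; the version below is a rough Python standard-library analogue of the same idea (split text into words, drop common stop words, count what remains), not a translation of any workshop code. The stop-word list, the function names and the two sample snippets are assumptions made up for this example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real analyses usually use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "on", "was", "were"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count each word in the text, ignoring stop words."""
    return Counter(word for word in tokenize(text) if word not in stop_words)


def compare_periods(texts_by_period, top_n=3):
    """Return the top_n most frequent words for each labelled chunk of text."""
    return {
        period: word_frequencies(text).most_common(top_n)
        for period, text in texts_by_period.items()
    }


# Made-up snippets standing in for reporting from two different periods.
sample = {
    "1850s": "The railway opened and the railway company celebrated the railway.",
    "1890s": "The election was held and the election results were contested.",
}
top_words = compare_periods(sample, top_n=1)
for period, words in top_words.items():
    print(period, words)
```

Comparing the top words for each period side by side is one simple way to see how the focus of reporting shifts between chunks of time.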
  11. Thanks for attending. We really value your participation and your thoughts.