
News data at the British Library

A presentation by Luke McKernan, Lead Curator News & Moving Image at the British Library, for the workshop 'Working with News Data across Different Media', 7 September 2015

Published in: News & Politics

  1. News data at the British Library
     Luke McKernan, Lead Curator, News and Moving Image
     Working with news data across different media, 7 September 2015
  2. Changing news
     Image: Map of news stories in the UK as read via Twitter (created using bit.ly links), Guardian Datablog, 16 May 2012
  3. News content strategy
     • Moving from a world-class newspaper service to a world-class news service
     • Newspapers, television, radio and Web news
     • Reflection of the significant changes in news production and consumption taking place today, but it also reflects how news has always been consumed
     • News does not exist in any one form. It is sought out and selected by its users, from the multiple forms of information on offer
     • A change in how we manage news data is an essential part of how to deliver such change
     • “News is information of current interest for a specific audience”
     Images: The Newcastle Courant, The Huffington Post, Today, Al Jazeera English
  4. Newspapers
     • The UK national collection
     • 34,000 newspaper titles: approximately 60M issues or 450M individual pages, from the 17th century to the present day
     • Current acquisition: 1,500 daily or weekly titles
     • Print copies acquired under legal deposit, but acquisition will move increasingly towards digital
     • Physical access at the Newsroom and Boston Spa
     • Online access to 11M pages via British Newspaper Archive (http://www.britishnewspaperarchive.com)
     • Approximately a third of the collection has microfilm access copies; around 2.5% has been digitised so far
     Image: British Newspaper Archive
  5. Television and radio news
     • Began recording television and radio news programmes receivable in the UK in May 2010
     • Collection of over 60,000 programmes, recorded off-air from 20 channels inc. BBC, Al-Jazeera, Russia Today, CNN, CCTV (China), NHK, Bloomberg, France 24, World Service, LBC
     • 30 hours of TV and 22 hours of radio captured per day
     • Born digital archive, including Electronic Programme Guide data and subtitles where available
     • Access onsite only, owing to copyright restrictions, via Broadcast News service
     Image: Broadcast News
  6. Web news
     • Non-print legal deposit legislation introduced in April 2013 means British Library can start harvesting UK websites
     • First annual crawl collected 4.5M .uk websites and web pages – collection now amounts to around 3Bn digital assets
     • Harvesting c.1000 UK news websites (newspapers and web-only sites e.g. hyperlocals) on daily/weekly basis, from end of 2013, with another 500 to be added soon
     • Access onsite only at British Library and other Legal Deposit libraries
     • Also Open UK Web Archive, smaller collection of selected websites, openly available at http://www.webarchive.org.uk
     Image: UK Web Archive
  7. Our news research services
     • Explore.bl.uk
     • The Newsroom
     • Boston Spa reading room
     • British Newspaper Archive
     • UK Web Archive
     • Broadcast News
  8. News data
     • 2M 19thC British newspaper pages – XML, images
     • UK television news data 2010 onwards – EPG data for 45,000 programmes, subtitles (XML) for c.25,000 programmes, some speech-to-text files for 2011 broadcasts (XML)
     • UK radio news data 2010 onwards – EPG data for 15,000 programmes, some speech-to-text files for 2011 broadcasts (XML)
     • Financial Times – four years of content (1888, 1939, 1966, 1991) – XML, images
     • Web news selection – possibly Financial Times, 1893 and 2008
  9. Plans
     • All out-of-copyright UK newspapers on British Newspaper Archive, issue level data for research re-use, covered by single agreement, available through an API. Possibly…
     • Title-level data for all newspapers we hold (34,000 titles) released as open data
     • More partner initiatives
     • Hackathon on 16 November 2015, to be followed by other news data events in 2016
     • User-led development
     Image: BBC radio news script, 14/7/1969
  10. Dreams
     • An open news dataset
     • An archive news data model
     • All British Library news records available at issue level
     Images: Hyperlocal news sites: On the Wight, The City Talking, A Little Bit of Stone
  11. Questions
     • Copyright constraints limit use of much material to BL premises – how can tools such as named entity extraction work as a means to get round this?
     • How can print, web, television, radio news, and other news media, be linked up together, and to other resources, and how would this benefit research?
     • What research questions will we be able to support through a greater focus on news data?
     • Is news data only for the specialist, or can more general user-friendly applications be produced?
     • What can news archives learn from the management tools for current news?
     • How can we help each other?
     Image: TV news idents
  12. Contact
     • Email: luke.mckernan@bl.uk
     • Twitter: @BL_newsroom
     • Web: http://bl.uk/subjects/news-media
     • Blog: http://britishlibrary.typepad.co.uk/thenewsroom
