Implementing Digital Preservation Strategy: Collection Profiling at the British Library 
Michael Day, Akiko Kimura, Maureen Pennock The British Library 
Ann MacDonald University of Kent, Canterbury 
Michael.Day@bl.uk 
Digital Libraries 2014, London, 8-12 September 2014
www.bl.uk 
Presentation outline 
•The British Library context 
•Identifying high-level collection types 
•Developing a framework for collection profiling 
•Some challenges
The British Library context (1) 
•The British Library is increasingly a digital library 
–The result of digitisation activities and partnerships that have been operating over many years, covering many different content streams, e.g. books, newspapers, maps, sound content, manuscripts and archives 
–The collection of ‘born-digital’ content, initially through negotiation or voluntary deposit, e.g. geographical data, personal archives 
–Since April 2013, facilitated through the Legal Deposit Libraries (Non-Print Works) Regulations 2013, e.g. for collection of e-journals, eBooks and Web content (domain harvesting)
The British Library context (2) 
•Infrastructures 
–Investment in the means to acquire, store and manage large amounts of digital content, the Digital Library System (DLS) 
•Strategies 
–British Library Content Strategy, 2013-2015 
–British Library Digital Preservation Strategy, 2013-2016 
•Digital Preservation Team 
–Established in 2005, now part of Collection Management 
–Enabling the implementation of appropriate and timely preservation practices across the Library
The British Library context (3) 
•Documenting preservation requirements includes:
–What content do we have and what is important about it? (collection profiling) 
–Assessments of file formats, preservation tools, workflows, etc. to inform preservation planning 
•Aims of collection profiling: 
–Documenting key knowledge about the Library’s top-tier digital collections 
–Considering preservation requirements / preservation intent for these collections 
–A tool for liaising with curators and collection specialists
Identifying high-level collection types (1)
Identifying high-level collection types (2) 
•Taxonomy of high-level collections: 
–No standard list of content types 
–Various lists are available on the Library website and catalogue, but they are inconsistent 
–Needed a pragmatic starting point for collection profiling: 
•Developed a new taxonomy based on the Library’s existing digital asset register 
•Attempted to identify logical groupings, e.g. ignoring distinctions between digitised and ‘born-digital’ content, where possible 
•It is not perfect (subject to revision)
Identifying high-level collection types (3) 
Type (with its collections)
•News Content
–Digitised newspapers
–Web content
–Sound recordings
–Born-digital newspapers
•Books
–Digitised printed books
–NPLD eBooks
–Voluntary deposit
•Manuscripts / Archives
–Digitised Manuscripts
–Digitised archives
–Personal digital archives
•Music
–Digitised Music Collections
–Sheet Music
•Maps
–Digital mapping supplied by Ordnance Survey (GIS)
–Digitised maps
Identifying high-level collection types (4) 
Type (with its collections)
•Academic journals
–NPLD eJournals
–Voluntary deposit e-Journals
–Subscription e-Journals
–Voluntary deposit
•Theses
–eTheses
•Patents
–Patent databases
•Web archives
–Open UK Web Archive
–Legal Deposit UK Web Archive
•Sound / multimedia
–Archive sound recordings
–Sound Archive (field recordings)
–Digitised sound / video
•Stamps
–Digitised stamps
•Photographs
–Digitised photographs
•Printed ephemera
–Digitised ephemera
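
A taxonomy of this kind can be held as a simple mapping from content type to collections. The following is a minimal sketch, not the Library's actual digital asset register: the names `TAXONOMY` and `type_of` are hypothetical, and only a subset of the types listed above is included for illustration.

```python
from typing import Dict, List, Optional

# Illustrative subset of the high-level taxonomy (type -> collections).
# The structure and names are assumptions made for this sketch only.
TAXONOMY: Dict[str, List[str]] = {
    "Theses": ["eTheses"],
    "Patents": ["Patent databases"],
    "Web archives": ["Open UK Web Archive", "Legal Deposit UK Web Archive"],
    "Stamps": ["Digitised stamps"],
    "Photographs": ["Digitised photographs"],
    "Printed ephemera": ["Digitised ephemera"],
}


def type_of(collection: str) -> Optional[str]:
    """Return the high-level type a collection belongs to, or None if unlisted."""
    for content_type, collections in TAXONOMY.items():
        if collection in collections:
            return content_type
    return None
```

A lookup such as `type_of("Open UK Web Archive")` then gives the content type under which that collection would be profiled.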
Collection profile framework (1) 
•Main inspirations: 
–MIT Libraries - Digital Content Reviews (DCR) for Life Cycle Management 
–Purdue University Libraries - Data Curation Profiles 
–National Library of Australia - Preservation Intent Statements
Collection profile framework (2) 
•Summary
–Content Type (from high-level list)
–Brief Description
–Location
–Curators / collection owners
–Interviews held
–Legal Deposit status
–Creation status
–Accrual status
–Number of digital objects (approximate)
•Background
–An introduction to the content type, providing background on the collection/s covered by the profile
•Acquisition
–Identifying the main current acquisition routes for collection content
•Preservation Intent
–Summary of points agreed by curators / content owners, identifying the main characteristics of collections that will need to be preserved
•Acquisition Format
–Identifying the main formats currently being acquired (where collections are complex, this does not need to be exhaustive)
•Issues
–Highlighting any specific current challenges
•Profile Metadata
–Information about the completed collection profile itself, e.g. identifying creators, dates, and status / version number
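
The framework sections above could be captured as a structured record. The sketch below is one possible representation, not the Library's template: the class names, field names and types, and the `missing_sections` helper are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProfileSummary:
    """The 'Summary' section; field names are illustrative, not official."""
    content_type: str                  # from the high-level list
    brief_description: str
    location: str
    curators: List[str]                # curators / collection owners
    interviews_held: List[str]
    legal_deposit_status: str
    creation_status: str
    accrual_status: str
    approx_object_count: Optional[int] = None


@dataclass
class CollectionProfile:
    """A collection profile holding the remaining framework sections."""
    summary: ProfileSummary
    background: str = ""
    acquisition: str = ""
    preservation_intent: List[str] = field(default_factory=list)
    acquisition_formats: List[str] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)
    profile_metadata: Dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> List[str]:
        """List the sections that are still empty in a draft profile."""
        sections = {
            "Background": self.background,
            "Acquisition": self.acquisition,
            "Preservation Intent": self.preservation_intent,
            "Acquisition Format": self.acquisition_formats,
            "Issues": self.issues,
            "Profile Metadata": self.profile_metadata,
        }
        return [name for name, value in sections.items() if not value]
```

Because profiles are drafted with curators and reviewed over time, a helper like `missing_sections` gives a quick view of which parts of a draft still need input.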
Current draft profiles 
•Draft profiles worked on so far: 
–e-Journals (including Legal Deposit content) 
–eBooks (including Legal Deposit content) 
–Web content (including Legal Deposit content) 
–Archives and manuscripts 
–News content 
–eTheses
Some challenges 
•Collections 
–The complexity of collections - many are modular aggregations of many other kinds of content, e.g. text, images, video, sound, games, software, data, etc. 
–Rapidly changing user expectations - it is difficult to specify preservation intent (and it will change over time) 
•Collection profiling and preservation planning 
–Integrating collection profiling with other digital preservation activities 
–Collection profiling is just a starting point (profiles will need to be reviewed on a regular basis)
Thank you
