Digitisation: nuts & bolts at the Wellcome Library
In the picture: getting the most out of images inside & outside your collection. 
CILIP, September 2014 
Dave Thompson 
Digital Curator, Wellcome Library
The Wellcome Library 
• Part of Wellcome Collection, an astonishing public
venue in London developed by the Wellcome
Trust, where people can learn more about
medicine through the ages & across cultures.
• More than 10,000 readers visit us each year, 
including historians, academics, students, health 
professionals & consumers, journalists, artists & 
members of the general public. 
Digitisation in the Wellcome Library 
• Strategic approach, conscious planned decisions. 
• Library transformation strategy, physical to digital. 
• From ‘project’ to ‘production’. 
• Digitisation as a sustainable end-to-end process. 
• Sustainable activity delivering access to content.
Overview - three IT systems… 
1. Workflow management system – ‘Goobi’ = 
PRODUCTION. 
2. Digital object repository – ‘Preservica’ = 
STORAGE. 
3. Front end - ‘the player’ = ACCESS. 
Remember, this doesn’t include cataloguing or bibliographic systems. Here 
we’re just talking about the process of creating, storing & delivering digital 
content. You have to assume that those other systems are also in place.
Goobi is our core digitisation system 
• Goobi can be used to normalise image formats, 
e.g. TIFFs into JPEG2000. 
• Used for reporting, volumes, numbers, etc. 
• Web based, used by all staff involved in 
digitisation. 
• Produces METS files, flexible & standards based. 
Goobi is the primary interface for most staff involved in digitisation. It’s the only 
software that many use, which simplifies training & delivery.
Goobi workflow tracking & management 
• Manages & tracks the production of content. 
• Workflow driven. Already highly automated. 
• Allows us to set very granular access conditions. 
• Scalable & highly adaptable to different projects. 
Goobi has been in production for about 3 years now & has already processed
some 2.5 million images, content which is publicly available in our player.
Digitisation – the steps
MARC records are imported from Sierra into 
Goobi
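A minimal sketch of the kind of metadata a workflow system could read from an imported record, assuming the record arrives as MARCXML. The slide does not say how Sierra exports records or how Goobi reads them; the sample record, its placeholder title, the use of field 001 for the record number and the function name summarise_marc are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARCXML "slim" schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Illustrative record only: the title is a placeholder, not real catalogue data,
# and storing the record number in field 001 is an assumption.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">b20422155</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Placeholder title</subfield>
  </datafield>
</record>"""


def summarise_marc(xml_text):
    """Return the control number (001) and title proper (245 $a) of one record."""
    record = ET.fromstring(xml_text)
    control = record.find("marc:controlfield[@tag='001']", MARC_NS)
    title = record.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    return {
        "control_number": control.text if control is not None else None,
        "title": title.text if title is not None else None,
    }


summary = summarise_marc(SAMPLE_MARCXML)
print(summary)
```

Keying everything on the catalogue record number is what would let later steps tie images back to the same record.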
Digitisation – enter the humans 
Digitised images are imported into Goobi & 
automatically associated with that metadata 
We use cameras not scanners for better resolution & quicker imaging.
Digitisation – enter the humans 
METS files are created in Goobi
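To give a feel for what a METS file carries, here is a small sketch that builds only the physical structure map, the part that records page order and printed page labels. It is not Goobi's own output; the function name build_physical_structmap and the page labels are made up for illustration, and a production METS file would normally also carry descriptive metadata and a file section.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_physical_structmap(page_labels):
    """Build a METS element whose physical structMap lists pages in order."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    struct_map = ET.SubElement(
        mets, f"{{{METS_NS}}}structMap", {"TYPE": "PHYSICAL"}
    )
    sequence = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", {"TYPE": "physSequence"}
    )
    for order, label in enumerate(page_labels, start=1):
        # ORDER is the position in the sequence; ORDERLABEL is the printed label.
        ET.SubElement(
            sequence,
            f"{{{METS_NS}}}div",
            {"TYPE": "page", "ORDER": str(order), "ORDERLABEL": label},
        )
    return mets


# Hypothetical pagination: two roman-numbered preliminaries, then pages 1 and 2.
mets = build_physical_structmap(["i", "ii", "1", "2"])
xml_text = ET.tostring(mets, encoding="unicode")
print(xml_text)
```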
Digitisation – enter the humans 
Goobi initiates ingest of the JPEG2000
images & metadata into Preservica
Digitisation – enter the humans 
Player pulls images from 
Preservica using metadata in the 
METS file
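As a rough illustration of how a viewer could use a METS file to find and order images, the sketch below resolves each page in the physical structure map to the file location it points at. This is not the player's actual code; the sample METS, the file paths and the function name pages_in_order are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical METS fragment; pages are deliberately listed out of order.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OBJECTS">
      <mets:file ID="FILE_0001" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="objects/page_0001.jp2"/>
      </mets:file>
      <mets:file ID="FILE_0002" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="objects/page_0002.jp2"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ORDER="2" ORDERLABEL="ii">
        <mets:fptr FILEID="FILE_0002"/>
      </mets:div>
      <mets:div TYPE="page" ORDER="1" ORDERLABEL="i">
        <mets:fptr FILEID="FILE_0001"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def pages_in_order(mets_xml):
    """Return (order, label, href) for each page, sorted by reading order."""
    root = ET.fromstring(mets_xml)
    # Map each file ID to the location recorded in its FLocat element.
    locations = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get(XLINK_HREF)
    pages = []
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is not None:
            href = locations.get(fptr.get("FILEID"))
            pages.append((int(div.get("ORDER")), div.get("ORDERLABEL"), href))
    return sorted(pages, key=lambda page: page[0])


ordered = pages_in_order(SAMPLE_METS)
print(ordered)
```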
Goobi – exit the humans 
• Key steps in Goobi are still performed by humans.
• There are high levels of automation, but not 
everything is automated. 
• Ambition is to build fully automated workflows. 
• Scalable & highly adaptable to different projects. 
Remember, humans are still an important part of digitisation. There are some 
decisions that only a human can make, & there will always be a need for 
human driven processes.
Working with digitised content
[Diagram; the labels below are all that survive from it]
• Systems: Goobi, Preservica.
• Content sources: In-house, Institutions, Contractors, Harvesting.
• Formats & delivery: TIFF or JP2; HD & ftp; Grey literature PDF.
• Processing: Normalises TIFF to JP2; Manual; Automatic; Jpylyzer validates JP2; Auto harvesting of JP2 & DMD.
• Roles: Ingest Officer / Digital Curator; Snagging.
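Jpylyzer performs a full structural validation of JP2 files. The sketch below is a much lighter pre-check, not a replacement for it: it only confirms that a file starts with the 12-byte JP2 signature box, which is enough to catch a TIFF that has been given a .jp2 extension. The function names and file names are illustrative.

```python
import tempfile
from pathlib import Path

# The 12-byte signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def looks_like_jp2(path):
    """Return True if the file starts with the JP2 signature box."""
    with open(path, "rb") as handle:
        return handle.read(len(JP2_SIGNATURE)) == JP2_SIGNATURE


def demo():
    """Check one fake JP2 and one mislabelled TIFF written to a temp folder."""
    with tempfile.TemporaryDirectory() as folder:
        good = Path(folder) / "page_0001.jp2"
        bad = Path(folder) / "page_0002.jp2"
        good.write_bytes(JP2_SIGNATURE + b"\x00" * 16)
        # "II*\x00" is the little-endian TIFF header.
        bad.write_bytes(b"II*\x00" + b"\x00" * 16)
        return looks_like_jp2(good), looks_like_jp2(bad)


results = demo()
print(results)
```

A check like this could run before full validation so that obviously mislabelled files are flagged for snagging early.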
Goobi – 19th century book project 
• Internet Archive (IA) is digitising our 19th century 
books. 
• Content is uploaded by them to the IA website. 
• IA perform Optical Character Recognition (OCR) on the books &
create structure.
• Goobi harvests the files that the IA create to 
automatically process content. 
http://www.kuka-robotics.com/l
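A sketch of how a harvester might pick out the files it needs from an Internet Archive item, under some stated assumptions: that the item's file names end in _scandata.xml, _abbyy.gz and _jp2.zip, and that files are fetched from archive.org/download/<identifier>/<filename>. The identifier below is hypothetical, the function name select_harvest_files is made up, and this is not how Goobi's harvester is actually written.

```python
from urllib.parse import quote

# Assumed file-name suffixes for the derivatives a harvester would want.
WANTED_SUFFIXES = {
    "scandata": "_scandata.xml",  # pagination & structure
    "abbyy": "_abbyy.gz",         # raw OCR output
    "images": "_jp2.zip",         # page images
}


def select_harvest_files(identifier, file_names):
    """Map each wanted role to a download URL, skipping roles with no match."""
    selected = {}
    for role, suffix in WANTED_SUFFIXES.items():
        for name in file_names:
            if name.endswith(suffix):
                selected[role] = (
                    f"https://archive.org/download/{quote(identifier)}/{quote(name)}"
                )
                break
    return selected


# Hypothetical item and file listing, for illustration only.
listing = [
    "exampleitem00unkn.pdf",
    "exampleitem00unkn_abbyy.gz",
    "exampleitem00unkn_jp2.zip",
    "exampleitem00unkn_scandata.xml",
]
urls = select_harvest_files("exampleitem00unkn", listing)
print(urls)
```

The function works on a plain list of names so that it can be tried without any network access.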
Looking at the IA website 
https://archive.org/details/wellcomelibrary
Looking at the IA website – metadata
How the automation works 
• Goobi builds a process using the MARC record. 
• Against this process it imports the images. 
• Uses the scandata file to create a METS file with 
pagination & structure. 
• Uses the raw ABBYY file to create ALTO files that
allow us to search for words & highlight search
term hits.
http://www.impactautomation.com.au/automation
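ALTO makes hit highlighting possible because it stores the position of every word on the page. A minimal sketch, assuming an ALTO file with String elements carrying CONTENT, HPOS, VPOS, WIDTH & HEIGHT attributes; the sample page and the function name find_hits are invented, and this is not the player's search code.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line ALTO page, for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="B1"><TextLine>
    <String CONTENT="History" HPOS="100" VPOS="200" WIDTH="90" HEIGHT="24"/>
    <SP/>
    <String CONTENT="of" HPOS="200" VPOS="200" WIDTH="30" HEIGHT="24"/>
    <SP/>
    <String CONTENT="medicine," HPOS="250" VPOS="200" WIDTH="110" HEIGHT="24"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

BOX_ATTRIBUTES = ("HPOS", "VPOS", "WIDTH", "HEIGHT")


def find_hits(alto_xml, term):
    """Return the bounding box of every word that matches the search term."""
    needle = term.casefold()
    hits = []
    for element in ET.fromstring(alto_xml).iter():
        # Compare local names so the code works across ALTO schema versions.
        if element.tag.rsplit("}", 1)[-1] != "String":
            continue
        word = element.get("CONTENT", "").strip(".,;:!?()\"'").casefold()
        if word == needle:
            hits.append({key: float(element.get(key, 0)) for key in BOX_ATTRIBUTES})
    return hits


hits = find_hits(SAMPLE_ALTO, "Medicine")
print(hits)
```

Each returned box can be drawn over the page image to highlight the hit.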
Here’s the record in our OPAC 
b20422155
Here’s the book in our player
How it all works…
So, to wrap up… 
• Digitisation is a strategic activity. 
• We have built an end-to-end process from 
selection to access. 
• Working at scale so efficiency is important. 
• Integrated in our OPAC. No silos. 
• Well articulated architecture.
Thank you 
Questions now, questions later…? 
Dave Thompson, Digital Curator 
Wellcome Library 
d.thompson@wellcome.ac.uk - @d_n_t 
http://wellcomelibrary.org/
