Goobi in the Wellcome Library. Digitisation Roadshow, Linz, Feb 2013. Dave Thompson, Digital Curator, Wellcome Library
Goobi in the Wellcome Library • In production since March 2012. • 6 servers running Goobi – test & production. • 11 staff users, some part time. • 1.2 million images processed & available via Library website. • Can upload a maximum of just under 1,000 objects into SDB per 24 hrs. • Total space allocated to Goobi is 40 TB.
A strategic approach • Library transformation strategy, physical to digital. • From ‘project’ to ‘production’. • Digitisation as a sustainable end-to-end process. • 18 month pilot/implementation project. • Just taken into production.
Diverse sources of content • In-house digitisation. • External contractors. • Contractors working in-house. • External organisations digitising their content for us.
Where did Goobi come from? • Late 2010 to early 2011: as plans for developing SDB grew, we realised that we needed a means of mass import of digital content. • Began to think about high volume production & the management of that. • Early modelling of our systems suggested that we needed a tool to manage production of content. • Began looking at workflow tracking systems.
Perceived benefits of Goobi • Web-based, distributed access for concurrent users. • Flexible workflow-based processing, managed through ‘Projects’. • Workflow process enforced, ensures accuracy & efficiency. • Adaptable to different types of content. • Initiates & manages external processes via Intranda task manager (ITM). • METS as basis of access & access control.
Rapid evolution of Goobi • The Goobi we have now is quite different from what we bought. • Initial configuration to import MARC XML descriptive metadata (DMD) & to automate ingest into SDB. • Initially Goobi didn’t scale to meet our ambition. • Initial install monolithic, now running Goobi as distributed services. • Developed new features with Intranda, e.g. Jpylyzer-based validation.
Working with DMD • Upload MARC XML DMD exported from Sierra using standard Goobi features. • MARC fields edited to provide a consistent Goobi process title, e.g. using shelf mark. • MARC leader position 06 identifies content type, e.g. ‘Archive’ or ‘Monograph’. • Content ‘type’ used by Goobi to set default METS access conditions. • DMD not delivered to end user; that comes from the live catalogue.
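The step from leader value to content type to default access condition can be sketched as below. This is a minimal illustration, not Goobi's actual configuration: the mapping tables, the access-condition values and the function name `default_access` are all hypothetical, and only the MARC XML namespace and the 24-character leader layout come from the MARC 21 standard.

```python
import xml.etree.ElementTree as ET

# Standard MARC XML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical mappings, for illustration only. The real values
# used at the Wellcome Library are not given in the slides.
TYPE_BY_LEADER06 = {"a": "Monograph", "p": "Archive"}
ACCESS_BY_TYPE = {"Monograph": "open", "Archive": "restricted"}


def default_access(marcxml: str) -> tuple[str, str]:
    """Return (content type, default access condition) for one record."""
    root = ET.fromstring(marcxml)
    leader = root.find(".//marc:leader", MARC_NS)
    if leader is None or leader.text is None or len(leader.text) < 7:
        raise ValueError("record has no usable leader")
    code = leader.text[6]  # leader position 06 (zero-based index 6)
    if code not in TYPE_BY_LEADER06:
        raise ValueError(f"no content type configured for leader/06 {code!r}")
    content_type = TYPE_BY_LEADER06[code]
    return content_type, ACCESS_BY_TYPE[content_type]


SAMPLE = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    "<leader>00000nam a2200000 a 4500</leader>"
    "</record>"
)
print(default_access(SAMPLE))
```

Raising an error for an unmapped leader value, rather than falling back to a default, keeps unexpected content types from silently receiving the wrong access condition.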
Uploading content • Content upload using the Sync2Goobi Tool for bulk import. • Drag-and-drop interface. • Content can be either TIFF or JP2. • Project-based workflow templates manage either format. • Use Goobi Mount Tool (GMT) to access/manage content already uploaded.
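One way to picture how an upload could be routed to a format-specific workflow template is to inspect the first bytes of each file. The sketch below is an assumption about how such routing might work, not a description of Sync2Goobi or Goobi internals; the template names are hypothetical. The byte signatures are the standard ones for classic TIFF and for the JP2 signature box.

```python
# Classic TIFF starts with a byte-order mark plus the number 42.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")
# JP2 files start with a 12-byte signature box.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"

# Hypothetical template names, for illustration only.
TEMPLATE_BY_FORMAT = {
    "TIFF": "tiff_workflow",
    "JP2": "jp2_workflow",
}


def sniff_format(header: bytes) -> str:
    """Classify a file from its leading bytes."""
    if header.startswith(TIFF_MAGIC):
        return "TIFF"
    if header.startswith(JP2_SIGNATURE):
        return "JP2"
    return "unknown"


def pick_template(header: bytes) -> str:
    """Choose a workflow template, rejecting unsupported formats."""
    fmt = sniff_format(header)
    if fmt not in TEMPLATE_BY_FORMAT:
        raise ValueError("unsupported image format")
    return TEMPLATE_BY_FORMAT[fmt]
```

In practice the header would come from something like `open(path, "rb").read(12)`. Checking content rather than the file extension guards against misnamed files from external contractors.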
Using METS Editor • Main point of human interaction with Goobi. Goobi automates METS creation. • METS is the basis for access control & usage conditions for material. • Basis for retrieval of content from SDB by using SDB PUIDs. • Goobi automates ingest of content into SDB & receives administrative metadata (AMD) in return.
How we use METS • Setting material type & default values for access based on DMD. • Access restrictions can be at the item level. • DMD in METS not delivered to end user, serves only to help a human identify content when snagging.
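To make the idea of a default access value plus item-level restrictions concrete, here is a minimal sketch that builds a skeletal METS document with Python's standard library. It is not the METS that Goobi generates and it has not been validated against the METS schema. The METS namespace, `amdSec`, `rightsMD`, `mdWrap`, `structMap`, `div` and `ADMID` are standard METS; the `accessCondition` payload element, the ID scheme and the access values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def add_rights(amd_sec: ET.Element, rights_id: str, access: str) -> None:
    """Add one rightsMD section holding a hypothetical accessCondition."""
    rights = ET.SubElement(amd_sec, q("rightsMD"), ID=rights_id)
    wrap = ET.SubElement(rights, q("mdWrap"), MDTYPE="OTHER")
    data = ET.SubElement(wrap, q("xmlData"))
    ET.SubElement(data, "accessCondition").text = access


def build_mets(default_access: str, item_overrides: dict[str, str]) -> ET.Element:
    """Build a METS skeleton with a default and per-item access values."""
    root = ET.Element(q("mets"))
    amd_sec = ET.SubElement(root, q("amdSec"))
    struct = ET.SubElement(root, q("structMap"), TYPE="PHYSICAL")
    # The whole object points at the default rights section.
    top = ET.SubElement(struct, q("div"), TYPE="object", ADMID="RIGHTS_DEFAULT")
    add_rights(amd_sec, "RIGHTS_DEFAULT", default_access)
    # Each restricted item gets its own rights section and ADMID link.
    for n, (label, access) in enumerate(item_overrides.items(), start=1):
        rights_id = f"RIGHTS_{n:04d}"
        add_rights(amd_sec, rights_id, access)
        ET.SubElement(top, q("div"), TYPE="item", LABEL=label, ADMID=rights_id)
    return root


doc = build_mets("open", {"page 12": "restricted"})
print(ET.tostring(doc, encoding="unicode"))
```

Linking each `div` to a rights section through `ADMID` keeps all access-related metadata in one place, which is the role the slides give to METS.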
Shared development • Wellcome Trust is not a development house. Rely on Intranda to provide development support. • Developed specific requirements for extensions to Goobi, e.g. Jpylyzer for JPEG2000 validation. • Development proposals from both sides. We have an idea, Intranda helps us make that idea a reality. • Benefit from community developments commissioned by others.
Additional Tools • Lurawave for converting TIFF to JPEG2000. • Jpylyzer for validating JPEG2000 files. • Sync2Goobi Tool for bulk upload of content. • Goobi Mount Tool/MS Windows File Explorer for access to ‘Home’ folders.
Goobi – the future • Built-in OCR & creation of ALTO files. • Further refinement of Sync2Goobi Tool. • Further development/integration of validation tools. • Integration of FTP with Goobi for 3rd party direct upload of content. • Establishment of separate database server for Goobi.
Lessons learned - systems • We were ambitious but underestimated what capacity we would require. • Underestimated storage requirements. • Underestimated the desirability of high levels of automation. • Focus human interaction at as few points as possible.
Lessons learned - Intranda • Have relied heavily on input & support from Intranda. • Share information with Intranda & trust them to provide answers. • Be prepared to share development. But be prepared to accept some pain.
Lessons learned - Goobi • In less than a year Goobi has become key to delivering the Library’s content. • Centralised user activities in one system – Goobi – less to learn, more efficient. • Streamline & automate. High volume efficient production essential. • Streamline other digitisation & access processes to match Goobi. • METS an efficient single place for access related metadata.
Thank you. Questions now, questions later…? Dave Thompson, Digital Curator, Wellcome Library email@example.com http://wellcomelibrary.org/