Systems, processes & how we
stop the wheels falling off
Digitisation Open Day, September 2013
Dave Thompson
Digital Curator, Wellcome Library
Digitisation – process overview
Plan project
Catalogue
Identify material
Identify resources
Plan process
Review as you go
Digitise/process
Deliver
Refine processes
Document/share
Document/share
Document/share
Funding, staff, equipment, IT,
storage, data management
planning
Open source player
Meanwhile, at the coal face…
[Diagram: administrative metadata, descriptive metadata, digitised images, ingestion into repository, creation of METS, access]
Thinking conceptually… OAIS
In OAIS speak this is a SIP: an aggregation of an object & its
metadata in a form that is acceptable to the repository, e.g.
JPEG2000 images and MARC XML.
The Open Archival Information System (OAIS) reference model is an ISO
standard (ISO 14721) that describes a conceptual model of an archive. It sets out the activities of an
archive & the processes involved in submission, storage & access. Developed
through the Consultative Committee for Space Data Systems (CCSDS), of which NASA is a member, after space data was ‘lost’ through obsolescence.
Thinking conceptually… OAIS
In OAIS speak this is an AIP. This is the object & its metadata
stored in a repository.
OAIS talks of 3 information packages.
1. Submission Information Package (SIP) = what is ingested
2. Archival Information Package (AIP) = what is stored
3. Dissemination Information Package (DIP) = what is made available
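The three packages can be pictured as successive views of the same object. The Python sketch below is only an illustration of that idea: the class fields and the `ingest` and `disseminate` helpers are hypothetical names invented here, not part of OAIS or of any repository software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SIP:
    """Submission Information Package: what is ingested."""
    identifier: str
    files: List[str]
    descriptive_metadata: Dict[str, str]


@dataclass
class AIP:
    """Archival Information Package: what is stored."""
    identifier: str
    files: List[str]
    descriptive_metadata: Dict[str, str]
    administrative_metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)


@dataclass
class DIP:
    """Dissemination Information Package: what is made available."""
    identifier: str
    files: List[str]
    descriptive_metadata: Dict[str, str]


def ingest(sip: SIP) -> AIP:
    # Hypothetical ingest step: administrative metadata is added per file.
    amd = {name: {"format": name.rsplit(".", 1)[-1]} for name in sip.files}
    return AIP(sip.identifier, list(sip.files), dict(sip.descriptive_metadata), amd)


def disseminate(aip: AIP, restricted: Set[str]) -> DIP:
    # Only the parts that may be made available end up in the DIP.
    available = [name for name in aip.files if name not in restricted]
    return DIP(aip.identifier, available, dict(aip.descriptive_metadata))
```

For example, ingesting a SIP with two images and then disseminating it with one image restricted yields a DIP that lists only the other image.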
Thinking conceptually… OAIS
In OAIS speak this is a DIP. These are the parts of the object & its
metadata that we are able to make available.
As defined in the Digital Preservation Coalition (DPC) handbook, access is assumed to mean continued,
ongoing usability of a digital resource, retaining all qualities of authenticity,
accuracy and functionality deemed to be essential for the purposes the digital
material was created and/or acquired for.
Let's tackle the basics… processing
Administrative metadata (AMD): a technical description of the files.
Created automatically by Safety Deposit Box (SDB) on ingest
into our repository. Used by the player for display purposes.
Administrative metadata is typically created automatically. It could include:
• File size
• Image height x width
• File format
• Checksum
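To make the list above concrete, here is a minimal Python sketch that gathers some of these values for one file using only the standard library. It is not how SDB produces its administrative metadata: the function name `describe_file` and the dictionary keys are assumptions made for illustration, the format is only guessed from the file extension, and image dimensions are left out because reading them would need an image library.

```python
import hashlib
import mimetypes
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Return basic technical metadata for one file (illustrative only)."""
    digest = hashlib.sha256()
    # Read in chunks so large master images do not have to fit in memory.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    # Guess the format from the extension; None means it was not recognised.
    mime_type, _ = mimetypes.guess_type(path.name)
    return {
        "name": path.name,
        "size_bytes": path.stat().st_size,
        "format": mime_type or "application/octet-stream",
        "checksum_sha256": digest.hexdigest(),
    }
```

Recomputing the checksum later and comparing it with the stored value is one common way to detect that a file has changed.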
Let's tackle the basics… processing
DMD. MARC, converted to MARC XML. This becomes MODS in
the METS. Material must be catalogued before we can store it &
make it available.
Descriptive metadata (DMD) is typically human generated, AKA cataloguing
metadata: ISAD(G) for archival material, MARC for bibliographic material.
MODS = Metadata Object Description Schema.
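As a rough illustration of the MARC XML to MODS step, the Python sketch below copies a single field, the title in MARC 245 $a, into a minimal MODS record. The sample record and the function name `marc_title_to_mods` are made up for this example. A real conversion maps many more fields, and this is not a description of the conversion used in our workflow.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"

# Invented sample record containing only a title field.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def marc_title_to_mods(marcxml: str) -> ET.Element:
    """Copy the MARC 245 $a title into a minimal MODS record."""
    record = ET.fromstring(marcxml)
    title = record.findtext(
        f"{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield[@code='a']",
        default="",
    )
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title.strip()
    return mods
```

Calling `ET.tostring(marc_title_to_mods(SAMPLE_MARCXML))` serialises the resulting MODS fragment.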
Let's tackle the basics… processing
Safety Deposit Box (SDB) is the place where we store digital stuff.
Ingest is automatically initiated by Goobi. SDB is a database that
associates objects with DMD & AMD, & is the source for dissemination.
Digital Repositories offer a convenient infrastructure through which to store,
manage, re-use and curate digital materials. They are used by a variety of
communities, may carry out many different functions, and can take many
forms.
Let's tackle the basics… processing
Metadata about structure & pagination is created by
humans; the METS file is built automatically.
A Metadata Encoding & Transmission Standard (METS) file is an aggregated
collection of DMD & AMD (a file list with structure) that provides a mechanism
for managed access. A METS file allows metadata from different systems to
be combined into a portable format.
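The "file list with structure" idea can be sketched in Python with the standard library. The example below builds a skeletal METS document with a descriptive section, an administrative section, a file list and a physical structure map that records page order and pagination labels. The IDs, file paths and the `build_mets` helper are assumptions for illustration. It is not the METS profile we produce, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(pages):
    """pages: list of (file_id, href, order_label) tuples in reading order."""
    mets = ET.Element(q("mets"))
    # Descriptive metadata: a MODS record would sit inside xmlData.
    dmd = ET.SubElement(mets, q("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q("mdWrap"), MDTYPE="MODS")
    ET.SubElement(wrap, q("xmlData"))
    # Administrative metadata (left empty in this sketch).
    ET.SubElement(mets, q("amdSec"), ID="AMD1")
    # File list and physical structure.
    file_sec = ET.SubElement(mets, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"), USE="master")
    struct = ET.SubElement(mets, q("structMap"), TYPE="PHYSICAL")
    sequence = ET.SubElement(struct, q("div"), TYPE="physSequence", DMDID="DMD1")
    for order, (file_id, href, label) in enumerate(pages, start=1):
        file_el = ET.SubElement(group, q("file"), ID=file_id, MIMETYPE="image/jp2")
        ET.SubElement(file_el, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # ORDERLABEL carries the pagination as it appears on the original.
        page = ET.SubElement(sequence, q("div"), TYPE="page",
                             ORDER=str(order), ORDERLABEL=label)
        ET.SubElement(page, q("fptr"), FILEID=file_id)
    return mets
```

Keeping page order (`ORDER`) separate from the printed page label (`ORDERLABEL`) is what lets pagination follow the original, for example roman numerals in front matter.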
The formats
• JPEG2000 is our master image format.
• We create dissemination images (JPEG) on the
fly.
• Also use PDF, MPEG2, MP3
The systems
• Goobi. Manages & tracks the production of
digitised content.
• SDB. Repository that stores digitised content
along with its DMD & AMD.
• Player. User interface to view digitised material.
How Goobi works – the basics
• Project based.
• Workflow driven.
• Users accept ‘tasks’.
• A user's role determines what projects they belong
to & what roles they have.
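The bullets above describe a general pattern that can be modelled in a few lines of Python. This is a hypothetical sketch of project-based, role-gated task acceptance. The `User`, `Task` and `Process` classes are invented for illustration and do not reflect Goobi's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class User:
    name: str
    roles: Set[str]
    projects: Set[str]


@dataclass
class Task:
    title: str
    required_role: str
    accepted_by: Optional[str] = None
    done: bool = False


@dataclass
class Process:
    """One item moving through a project's workflow."""
    project: str
    tasks: List[Task] = field(default_factory=list)

    def open_task(self) -> Optional[Task]:
        # Tasks are worked in order: the open task is the first unfinished one.
        return next((task for task in self.tasks if not task.done), None)

    def accept(self, user: User) -> Task:
        task = self.open_task()
        if task is None:
            raise ValueError("workflow already complete")
        # A user may accept a task only within their projects and roles.
        if self.project not in user.projects or task.required_role not in user.roles:
            raise PermissionError(f"{user.name} cannot accept '{task.title}'")
        task.accepted_by = user.name
        return task
```

In this model a photographer could accept a scanning task, but the same user would be refused the following METS editing task unless they also held that role.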
How Goobi works – a workflow
How Goobi works – METS editing
Pagination as per original
Descriptive metadata
Structure
Lessons from Goobi
• Design your workflows in advance. But be flexible.
• Automate as much as possible: it saves time &
is more efficient.
• Document processes & procedures.
• Share what you learn.
How SDB works – the basics
• Workflow based; easily ‘talks’ to other systems.
• Content agnostic.
• Creates administrative metadata on ingest.
• Preservation orientated.
How SDB works
How SDB works – behind the scenes
• No public access to SDB.
• Little direct staff access to SDB content.
• High levels of automation of ingest, initiated by Goobi.
• Platform for dissemination mediated by the player.
Lessons from SDB
• Plan your systems integration, which system talks
to which, and how.
• Plan workflows & processes.
• Have a data management plan. Your eggs are in one basket.
• Plan what you’ll do when it all turns to custard.
How the player works – the basics
How the player works
• Makes HTTP request to SDB for content.
• Draws access conditions from METS file.
• Permitted actions drawn from METS.
• Draws DMD from live catalogue.
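One way to picture "permitted actions drawn from METS" is the Python sketch below. It assumes that the access condition is recorded in a MODS `accessCondition` element inside the METS file, and it uses an invented mapping from condition to actions. Both are assumptions for illustration, not a description of how the player actually works.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical mapping from an access condition to the actions a viewer allows.
PERMITTED_ACTIONS = {
    "open": {"view", "download"},
    "registration required": {"view"},
    "closed": set(),
}

# Invented sample METS document carrying one access condition.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                     xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:accessCondition>Registration required</mods:accessCondition>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""


def permitted_actions(mets_xml: str) -> set:
    """Return the actions allowed by the first access condition found."""
    root = ET.fromstring(mets_xml)
    condition = root.findtext(f".//{{{MODS_NS}}}accessCondition", default="closed")
    # Unknown or missing conditions fall back to the most restrictive option.
    return PERMITTED_ACTIONS.get(condition.strip().lower(), set())
```

Defaulting to no permitted actions when the condition is missing or unrecognised means a metadata error restricts access rather than opening it.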
Summary
• Digitisation is an end-to-end process that brings
together objects & metadata.
• You have to think about the whole system to deliver
results. The process is one of combining metadata
from different systems.
• Document plans & document process.
• Be prepared to be flexible & to change as
necessary. But try to stick to the plan!
Further reading
• Wellcome Library – http://wellcomelibrary.org
• Metadata Encoding & Transmission Standard at the Library of Congress -
http://www.loc.gov/standards/mets/
• Reference Model for an Open Archival Information System (OAIS).
Magenta Book. Issue 2. June 2012 -
http://public.ccsds.org/publications/RefModel.aspx
• Tessella, Safety Deposit Box - http://www.tessella.com/tag/safety-deposit-box/
• Data management planning - http://www.dcc.ac.uk/resources/data-management-plans
• Repository Software Comparison: Building Digital Library Infrastructure at
LSE - http://www.ariadne.ac.uk/issue64/fay
Thank you
Questions now, questions later…?
Dave Thompson, Digital Curator
Wellcome Library
d.thompson@wellcome.ac.uk - #welldigi
http://wellcomelibrary.org/

 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 

Systems, processes & how we stop the wheels falling off

  • 1. Systems, processes & how we stop the wheels falling off. Digitisation Open Day, September 2013. Dave Thompson, Digital Curator, Wellcome Library.
  • 2. Digitisation – process overview. [Flow diagram; labels: Plan project; Catalogue; Identify material; Identify resources; Plan process; Review as you go; Digitise/process; Deliver; Refine processes; Document/share (at three stages); Funding, staff, equipment, IT, storage, data management planning; Open source player.]
  • 3. Meanwhile, at the coal face… [Process diagram; labels, joined by + and = signs: Administrative metadata; Descriptive metadata; Digitised images; Ingestion into repository; Creation of METS; Access.]
  • 4. Thinking conceptually… OAIS. [Process diagram from slide 3, repeated.] In OAIS speak this is a SIP: an aggregation of an object & its metadata in a form that is acceptable to the repository, e.g. JPEG2000 images and MARC XML. The Open Archival Information System Reference Model (OAIS) is an ISO standard (ISO 14721) that describes a conceptual model of an archive. It sets out the activities of an archive & the processes involved in submission, storage & access. Developed by NASA after they ‘lost’ space data through obsolescence.
  • 5. Thinking conceptually… OAIS. [Process diagram from slide 3, repeated.] In OAIS speak this is an AIP: the object & its metadata stored in a repository. OAIS talks of 3 information packages:
      1. Submission Information Package (SIP) = what is ingested
      2. Archival Information Package (AIP) = what is stored
      3. Dissemination Information Package (DIP) = what is made available
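
The three packages can be pictured as one object passing through two transformations. The following is only a minimal sketch of that idea: the class names, the `ingest` and `disseminate` functions and the `file_count` field are hypothetical illustrations, not part of OAIS or of any repository product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionPackage:
    """SIP: what is ingested (images plus descriptive metadata)."""
    images: tuple
    descriptive: dict


@dataclass(frozen=True)
class ArchivalPackage:
    """AIP: what is stored (the SIP content plus administrative metadata)."""
    images: tuple
    descriptive: dict
    administrative: dict


@dataclass(frozen=True)
class DisseminationPackage:
    """DIP: what is made available (possibly only part of the AIP)."""
    images: tuple
    descriptive: dict


def ingest(sip):
    # Hypothetical ingest step: administrative metadata is generated here.
    administrative = {"file_count": len(sip.images)}
    return ArchivalPackage(sip.images, dict(sip.descriptive), administrative)


def disseminate(aip, permitted):
    # Hypothetical access step: only permitted images leave the repository.
    visible = tuple(name for name in aip.images if name in permitted)
    return DisseminationPackage(visible, dict(aip.descriptive))


sip = SubmissionPackage(("0001.jp2", "0002.jp2"), {"title": "Example volume"})
aip = ingest(sip)
dip = disseminate(aip, permitted={"0001.jp2"})
print(dip)
```

The point of keeping three separate types is that what is stored and what is shown need not be the same thing.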
  • 6. Thinking conceptually… OAIS. [Process diagram from slide 3, repeated.] In OAIS speak this is a DIP: the parts of the object & its metadata that we are able to make available. As defined in the (#DPC) handbook, access is assumed to mean continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and/or acquired for.
  • 7. Let's tackle the basics… processing. [Process diagram from slide 3, repeated.] Administrative metadata (AMD) is a technical description of the files. It is created automatically by Safety Deposit Box (SDB) on ingest into our repository and is used by the player for display purposes. Administrative metadata is typically created automatically; it could be:
      • File size
      • Image height × width
      • File format
      • Checksum
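
As a rough illustration of how such values can be generated without human input, here is a minimal sketch using only the Python standard library. It is not how SDB does it: the function name and dictionary keys are hypothetical, the format is guessed from the file extension alone, and image height × width is left out because reading it would need an image library.

```python
import hashlib
import os
import tempfile


def administrative_metadata(path, chunk_size=65536):
    """Collect a few automatically derivable facts about one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large master images do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return {
        "file_name": os.path.basename(path),
        "file_size": os.path.getsize(path),      # bytes
        "file_format": extension or "unknown",   # guess from extension only
        "checksum_sha256": digest.hexdigest(),
    }


# Usage: describe a small throwaway file standing in for a master image.
with tempfile.NamedTemporaryFile(suffix=".jp2", delete=False) as tmp:
    tmp.write(b"not a real image")
record = administrative_metadata(tmp.name)
os.remove(tmp.name)
print(record)
```

Storing the checksum at ingest is what later lets a repository check that a file has not changed.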
  • 8. Let's tackle the basics… processing. [Process diagram from slide 3, repeated.] Descriptive metadata (DMD): MARC, converted to MARC XML, which becomes MODS (Metadata Object Description Schema) in the METS. Material must be catalogued before we can store it & make it available. Descriptive metadata is typically human generated, AKA cataloguing metadata: ISAD(G) for archival material, MARC for bibliographic material.
  • 9. Let's tackle the basics… processing. [Process diagram from slide 3, repeated.] Safety Deposit Box (SDB) is the place where we store digital stuff. Ingest is automatically initiated by Goobi. SDB is a database that associates objects with DMD & AMD, and it is the source for dissemination. Digital repositories offer a convenient infrastructure through which to store, manage, re-use and curate digital materials. They are used by a variety of communities, may carry out many different functions, and can take many forms.
  • 10. Let's tackle the basics… processing. [Process diagram from slide 3, repeated.] METS records metadata about structure & pagination, which is created by humans; the METS file itself is built automatically. A Metadata Encoding & Transmission Standard (METS) file is an aggregated collection of DMD & AMD (a file list with structure) that provides a mechanism for managed access. A METS file allows metadata from different systems to be combined into a portable format.
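
To make "a file list with structure" concrete, the sketch below builds a very small METS document with Python's standard `xml.etree.ElementTree`. The element and attribute names (`dmdSec`, `fileSec`, `structMap`, `fptr` and so on) come from the METS schema, but everything else is an assumption for illustration: the `build_mets` function, the IDs, the `TYPE` and `USE` values and the MIME type are hypothetical, the result is not schema-validated, and it does not claim to match what Goobi produces.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(namespace, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{namespace}}}{tag}"


def build_mets(title, pages):
    """pages: list of (file_name, page_label, size_in_bytes) in reading order."""
    root = ET.Element(q(METS, "mets"))

    # Descriptive metadata (DMD): a MODS record wrapped in a dmdSec.
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # File list, carrying some administrative facts as attributes.
    file_sec = ET.SubElement(root, q(METS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS, "fileGrp"), USE="master")

    # Structure: one div per page, in order, with the original pagination.
    struct = ET.SubElement(root, q(METS, "structMap"), TYPE="PHYSICAL")
    item = ET.SubElement(struct, q(METS, "div"), TYPE="item", DMDID="DMD1")

    for order, (name, label, size) in enumerate(pages, start=1):
        file_id = f"FILE{order:04d}"
        entry = ET.SubElement(group, q(METS, "file"), ID=file_id,
                              MIMETYPE="image/jp2", SIZE=str(size))
        ET.SubElement(entry, q(METS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK, "href"): name})
        page = ET.SubElement(item, q(METS, "div"), TYPE="page",
                             ORDER=str(order), ORDERLABEL=label)
        ET.SubElement(page, q(METS, "fptr"), FILEID=file_id)
    return root


root = build_mets("Example volume",
                  [("0001.jp2", "i", 1024), ("0002.jp2", "1", 2048)])
print(ET.tostring(root, encoding="unicode"))
```

Each page `div` points at a file by ID, so pagination, file list and description stay linked inside one portable document.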
  • 11. The formats
      • JPEG2000 is our master image format.
      • We create dissemination images (JPEG) on the fly.
      • We also use PDF, MPEG2 & MP3.
  • 12. The systems
      • Goobi. Manages & tracks the production of digitised content.
      • SDB. Repository that stores digitised content along with its DMD & AMD.
      • Player. User interface to view digitised material.
  • 13. How Goobi works – the basics
      • Project based.
      • Workflow driven.
      • Users accept ‘tasks’.
      • A user's role determines what projects they belong to & what roles they have.
  • 14. How Goobi works – a workflow
  • 15. How Goobi works – METS editing. [Screenshot; labels: Pagination as per original; Descriptive metadata; Structure.]
  • 16. Lessons from Goobi
      • Design your workflows in advance, but be flexible.
      • Automate as much as possible; it saves time & is more efficient.
      • Document processes & procedures.
      • Share what you learn.
  • 17. How SDB works – the basics
      • Workflow based; easily ‘talks’ to other systems.
      • Content agnostic.
      • Creates administrative metadata on ingest.
      • Preservation orientated.
  • 19. How SDB works – behind the scenes
      • No public access to SDB.
      • Little direct staff access to SDB content.
      • High levels of automation of ingest, driven by Goobi.
      • Platform for dissemination, mediated by the player.
  • 20. Lessons from SDB
      • Plan your systems integration: which system talks to which, and how.
      • Plan workflows & processes.
      • Have a data management plan. Your eggs are in one basket.
      • Plan what you’ll do when it all turns to custard.
  • 21. How the player works – the basics
  • 22. How the player works
      • Makes an HTTP request to SDB for content.
      • Draws access conditions from the METS file.
      • Draws permitted actions from the METS file.
      • Draws DMD from the live catalogue.
  • 23.
  • 24. Summary
      • Digitisation is an end-to-end process that brings together objects & metadata.
      • We have to think about the whole system to deliver results. The process is one of combining metadata from different systems.
      • Document plans & document processes.
      • Be prepared to be flexible & to change as necessary. But try to stick to the plan!
  • 25. Further reading
      • Wellcome Library - http://wellcomelibrary.org
      • Metadata Encoding & Transmission Standard at the Library of Congress - http://www.loc.gov/standards/mets/
      • Reference Model for an Open Archival Information System (OAIS). Magenta Book. Issue 2. June 2012 - http://public.ccsds.org/publications/RefModel.aspx
      • Tessella, Safety Deposit Box - http://www.tessella.com/tag/safety-deposit-box/
      • Data management planning - http://www.dcc.ac.uk/resources/data-management-plans
      • Repository Software Comparison: Building Digital Library Infrastructure at LSE - http://www.ariadne.ac.uk/issue64/fay
  • 26. Thank you. Questions now, questions later…? Dave Thompson, Digital Curator, Wellcome Library. d.thompson@wellcome.ac.uk - #welldigi - http://wellcomelibrary.org/
