Bolette Ammitzbøll Jurik Asger Askov Blekinge Kåre Fiedler Christiansen
baj@statsbiblioteket.dk abr@statsbiblioteket.dk kfc@statsbiblioteket.dk
Minimal Effort Ingest
en.statsbiblioteket.dk/minimal-effort-ingest
Dec 3, 2015
State and University Library
● A National Library
– Responsible for preserving the Danish Cultural
Heritage
● Many diverse collections, from many legacy
systems
– These collections must be preserved, but very few
users want access.
What is Minimal Effort Ingest?
● A different approach to ingest and Quality
Assurance
● In OAIS, detailed QA is part of ingest
– Strict compliance required before ingest
● Minimal Effort Ingest postpones most of QA
– Data ingested as is
– QA is done just after ingest, or even later if resources
are scarce
– Failure in QA is handled within the repository
Why do Minimal Effort Ingest?
● Secure the incoming data quickly
● Old collections are preserved
– even if resources for QA are not available
● Update and rerun preservation actions as
needed
Minimal Effort Ingest – An example
● Collection: WAV files and a CSV file with metadata
1) Ingest all the files, just as File Objects
2) Generate technical metadata for the File Objects
3) Parse the CSV file and create Track Objects
4) Generate Access Copies for the Track Objects
5) Verify that the Track Metadata is correct
– Simple checks, such as duration
– Complex checks, which could be akin to forensics
6) Do speech2text to generate better indexes
You can do as many of these steps as you have the budget
for.
If you do only step 1, the collection is still well preserved.
If you also do step 2, you will be able to plan for format
preservation risks.
If you do step 3, the collection can be made available for
discovery.
If you do step 4, the collection can be made available for
access.
If you do step 5, you can verify that your collection actually
contains what you believe it does.
If you do step 6, you can greatly improve discovery.
Note that steps 4 and 5 can be done in reverse
order if quality is more important than access.
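As a rough illustration of how steps 1, 2, 3 and 5 could be kept independent of each other, here is a minimal Python sketch using only the standard library. It is not the authors' implementation: the class and function names, the CSV column names `filename` and `duration_s`, and the 0.5-second tolerance are all assumptions made for the example. Steps 4 and 6 are left out.

```python
import csv
import hashlib
import io
import wave
from dataclasses import dataclass, field


@dataclass
class FileObject:
    """Step 1: a file stored as-is, identified by name and checksum."""
    name: str
    data: bytes
    sha256: str = ""
    technical: dict = field(default_factory=dict)  # filled in later by step 2

    def __post_init__(self):
        self.sha256 = hashlib.sha256(self.data).hexdigest()


def ingest(files: dict[str, bytes]) -> dict[str, FileObject]:
    """Step 1: ingest everything without inspecting the content."""
    return {name: FileObject(name, data) for name, data in files.items()}


def add_technical_metadata(obj: FileObject) -> None:
    """Step 2: derive technical metadata. A broken file is recorded, not rejected."""
    try:
        with wave.open(io.BytesIO(obj.data), "rb") as w:
            obj.technical = {
                "format": "wav",
                "duration_s": w.getnframes() / w.getframerate(),
            }
    except (wave.Error, EOFError):
        obj.technical = {"format": "unknown"}


def parse_tracks(csv_obj: FileObject) -> list[dict]:
    """Step 3: create Track Objects from the CSV (hypothetical column layout)."""
    return list(csv.DictReader(io.StringIO(csv_obj.data.decode("utf-8"))))


def check_duration(track: dict, objects: dict[str, FileObject],
                   tolerance_s: float = 0.5) -> bool:
    """Step 5, simple check: does the stated duration match the file?"""
    actual = objects[track["filename"]].technical.get("duration_s")
    return actual is not None and abs(actual - float(track["duration_s"])) <= tolerance_s
```

Because each step only reads what earlier steps stored, any of them can be postponed or rerun later without touching the ingested bytes.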
In a timely fashion...
● The important matter is that everything, data and
metadata and context, is available when needed,
and not before
● This includes information not known at the time of
creation
● So the question becomes not
– How much metadata do I need?
● but rather
– When would I need this metadata?
Some metadata is only available at the time of
creation, even if it is only used much later, e.g.
the digitization hardware.
While it is good practice to get as much metadata as
possible as early as possible, do not assume you
can get it all.
Some metadata requires tools (speech2text, OCR) which are
still improving.
Some metadata requires special skills to both
generate and understand.
The most important metadata might not be something
the creator can provide.
Journals and citation counts are one such example.
Truthfulness is another.
Expensive Understandings
● In our experience, the most expensive part of
digital preservation is understanding your
collection
● This cost turned out to be fairly constant,
irrespective of the collection size
● This is even more true for Research Data
● Preserving the files and preserving the
understanding are very different challenges
Understanding a collection allows you to build data
models and to do QA.
Data models are important for access systems.
QA is only really important if you are able to get a
better version of the data.
When receiving data from a provider, you can
often request a new version if something is broken.
When “represerving” an old collection or when receiving
research data, the data is what it is, broken or not.
QA becomes less valuable, as a broken file is still
more valuable than no file.
Preserving understanding: is it necessary, and how
much? Should I preserve the JPEG spec along with
my JPEG files? How about a dictionary?
Preservation Events
● The life of an archival record will often consist of these three
phases
1) Raw Ingest
2) Enrichment and transformation to data model
3) Preservation Actions
● The history of a Record should include all these phases.
This happens naturally if the transformation happens inside
the repository.
● Unfortunately, many traditional systems do their most
important transformations before ingest.
With Minimal Effort Ingest, even the preparation
happens inside the repository. So whatever
version/event tracking system the repository uses
will also list the initial transformations.
It is hard to prove authenticity if you cannot show
what changes happened from “files on disk” to
“SIP”, even if you know everything that happened
from “SIP” onwards.
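To make the point about history concrete, the following Python sketch shows one way a record could carry an event log that starts at raw ingest. It is only an illustration: the `Event` and `Record` classes, the phase labels and the example actions are hypothetical and do not describe any particular repository system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    phase: str        # "raw-ingest", "enrichment" or "preservation-action"
    action: str       # free-text description of what was done
    timestamp: datetime


@dataclass
class Record:
    identifier: str
    history: list[Event] = field(default_factory=list)

    def log(self, phase: str, action: str) -> None:
        """Append an event; earlier events are never rewritten."""
        self.history.append(Event(phase, action, datetime.now(timezone.utc)))

    def phases(self) -> list[str]:
        """Return the phases in the order they were recorded."""
        return [event.phase for event in self.history]


record = Record("track-0001")
record.log("raw-ingest", "stored file exactly as found on disk")
record.log("enrichment", "created Track Object from CSV row")
record.log("preservation-action", "generated access copy")
```

Since the first event is the raw ingest itself, everything done to the files after they left the disk appears in the same history, which is the property the slide argues for.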
Preservation 2.0?
● Web 1.0 was the web of static webpages, and
the user would read but never contribute
● Web 2.0 is perhaps best exemplified by Wikis,
where the user is also an editor
● Records are updated, but with strong
versioning and history
This does not mean everybody can edit; it means that
the system is built around the concept of updating
and enriching content. We still envisage a strong
curatorial presence.
The dead archival record is a thing of the past. Records in the
repository are alive. They are updated, changed
and interlinked during their lifetime.
Design your preservation systems not as the archives
of old, but as the wikis of today.
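A wiki-like record with strong versioning and a curatorial gate might look roughly like the Python sketch below. This is an assumption-laden illustration, not a description of the authors' repository: the `VersionedRecord` class, the `curators` set and the method names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRecord:
    """A record that is updated by appending versions, never by overwriting."""
    identifier: str
    curators: set[str]                      # hypothetical: who is allowed to edit
    versions: list[dict] = field(default_factory=list)

    def update(self, editor: str, changes: dict) -> int:
        """Store a new version built on the latest one and return its number."""
        if editor not in self.curators:
            raise PermissionError(f"{editor} is not a curator of {self.identifier}")
        latest = self.versions[-1]["content"] if self.versions else {}
        self.versions.append({"editor": editor, "content": {**latest, **changes}})
        return len(self.versions)

    def current(self) -> dict:
        """Return the content of the newest version."""
        return self.versions[-1]["content"] if self.versions else {}

    def at(self, version: int) -> dict:
        """Return the content as it was at a given 1-based version number."""
        return self.versions[version - 1]["content"]
```

Every update adds a version, so earlier states stay retrievable through `at`, while the curator check reflects that not everybody can edit.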

Minimal Effort Ingest for DPC Metadata meeting
