This document discusses the concept of Minimal Effort Ingest, an approach to digital preservation that focuses on quickly ingesting data and collections into a repository with minimal upfront quality assurance. It postpones most quality assurance activities until after ingest, allowing data to be preserved even if full QA resources are not available. This helps secure incoming data and collections quickly while still allowing for future QA and access activities. The document provides an example of how Minimal Effort Ingest could be applied to a collection of audio files and metadata, and emphasizes that the most important goal is having all data, files, and context available over time even if full understanding is not achieved immediately.
2. Bolette Ammitzbøll Jurik (baj@statsbiblioteket.dk), Asger Askov Blekinge (abr@statsbiblioteket.dk), Kåre Fiedler Christiansen (kfc@statsbiblioteket.dk)
State and University Library
● A National Library
– Responsible for preserving the Danish Cultural Heritage
● Many diverse collections, from many legacy systems
– These collections must be preserved, but very few users want access.
3. What is Minimal Effort Ingest?
● A different approach to ingest and Quality Assurance
● In OAIS, detailed QA is part of ingest
– Strict compliance is required before ingest
● Minimal Effort Ingest postpones most of the QA
– Data is ingested as is
– QA is done just after ingest, or even later if resources are sparse
– Failures in QA are handled within the repository (sketched below)
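A minimal sketch of the idea in Python, assuming a hypothetical repository object with store, record_event and queue_task methods (none of these names come from the slides): data is preserved first, QA runs whenever resources allow, and a QA failure becomes repository state instead of blocking ingest.

import hashlib
from pathlib import Path

def ingest_as_is(repo, source_dir):
    """Ingest every file exactly as received; no validation up front."""
    for path in Path(source_dir).rglob("*"):
        if path.is_file():
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            # 'repo' and its methods are hypothetical placeholders.
            object_id = repo.store(path, checksum=checksum)
            repo.record_event(object_id, "ingest", outcome="success")
            repo.queue_task(object_id, "qa")  # QA is deferred, not skipped

def run_deferred_qa(repo, checker):
    """Run QA later; failures become repository events, not ingest rejections."""
    for object_id in repo.pending_tasks("qa"):
        problems = checker(repo.fetch(object_id))
        outcome = "success" if not problems else "failure"
        repo.record_event(object_id, "qa", outcome=outcome, detail=problems)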
4. Why do Minimal Effort Ingest?
● Secure the incoming data quickly
● Old collections are preserved
– even if resources for QA are not available
● Update and rerun preservation actions as needed
5. Minimal Effort Ingest – An example
● Collection: WAV files and a CSV file with metadata
1) Ingest all the files, just as File Objects
2) Generate technical metadata for the File Objects
3) Parse the CSV file and create Track Objects
4) Generate Access Copies for the Track Objects
5) Verify that the Track Metadata is correct
– Simple checks such as duration
– Complex checks could be akin to forensics
6) Do speech-to-text to generate better indexes
You can do as many of these steps as you have the budget for (see the pipeline sketch after these notes).
If you do only 1, the collection is still well preserved.
If you also do 2, you will be able to plan for format preservation risks.
If you do 3, the collection can be made available for discovery.
If you do 4, the collection can be made available for access.
If you do 5, you can verify that your collection actually contains what you believe it does.
If you do 6, you can improve discovery greatly.
Note that steps 4 and 5 can be done in reverse order if quality is more important than access.
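The six steps can be read as independent pipeline stages that are run, skipped or rerun later as budget allows; stopping after stage 1 already leaves the collection preserved. The sketch below is illustrative only: the repo object and the helpers extract_technical_metadata, make_access_copy, check_duration and speech_to_text are hypothetical and not part of the slides.

import csv
from pathlib import Path

# 'repo' and the helper functions used below are hypothetical placeholders.

def stage1_ingest_files(repo, collection_dir):
    """Step 1: store every file (WAVs and the CSV) as a plain File Object, as is."""
    return [repo.store_file(p) for p in Path(collection_dir).iterdir() if p.is_file()]

def stage2_technical_metadata(repo, file_ids):
    """Step 2: attach technical metadata (format, duration, sample rate, ...)."""
    for fid in file_ids:
        repo.add_metadata(fid, extract_technical_metadata(repo.fetch(fid)))

def stage3_create_tracks(repo, csv_path):
    """Step 3: parse the descriptive CSV and create Track Objects linked to the files."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [repo.create_track(row) for row in csv.DictReader(f)]

def stage4_access_copies(repo, track_ids):
    """Step 4: derive access copies (e.g. MP3) from the preserved WAV masters."""
    for tid in track_ids:
        repo.attach_access_copy(tid, make_access_copy(repo.fetch(tid)))

def stage5_verify_metadata(repo, track_ids):
    """Step 5: simple checks first, e.g. does the stated duration match the file?"""
    for tid in track_ids:
        repo.record_event(tid, "qa", outcome=check_duration(repo, tid))

def stage6_speech_to_text(repo, track_ids):
    """Step 6: enrich discovery with transcripts when budget allows."""
    for tid in track_ids:
        repo.add_metadata(tid, {"transcript": speech_to_text(repo.fetch(tid))})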
6. In a timely fashion...
● The important matter is that everything (data, metadata and context) is available when needed, and not before
● This includes information not known at the time of creation
● So the question becomes not
– How much metadata do I need?
● but rather
– When would I need this metadata?
Some metadata is only available at the time of creation, even if it is only used much later, e.g. details of the digitization hardware.
While it is good practice to get as much metadata as possible as early as possible, do not assume you can get it all.
Some metadata requires tools (speech-to-text, OCR) which are still improving.
Some metadata requires special skills to both generate and understand.
The most important metadata might not be something the creator can provide.
Journals and citation counts are one such example; truthfulness is another.
7. Expensive Understandings
● In our experience, the most expensive part of digital preservation is understanding your collection
● This cost turned out to be fairly constant, irrespective of the collection size
● This is even more true for Research Data
● Preserving the files and preserving the understanding are very different challenges
Understanding a collection allows you to build data models and to do QA.
Data models are important for Access systems.
QA is only really important if you are able to get a better version of the data.
When receiving data from a provider, you can often request a new version if something is broken.
When “re-preserving” an old collection, or when receiving research data, the data is what it is, broken or not.
QA becomes less valuable, as a broken file is still more valuable than no file.
Preserving understanding: is it necessary, and how much? Should I preserve the JPEG spec along with my JPEG files? How about a dictionary?
8. Preservation Events
● Our archival record's life will often consist of these three phases
1) Raw Ingest
2) Enrichment and transformation to the data model
3) Preservation Actions
● The history of a Record should include all these phases. This happens naturally if the transformation happens inside the repository.
● Unfortunately, many traditional systems do their most important transformations before ingest.
With Minimal Effort Ingest, even the preparation happens inside the repository, so whatever version/event tracking system the repository uses will also list the initial transformations.
It is hard to prove authenticity if you cannot show what changes happened from “files on disk” to “SIP”, even if you know everything that happened from “SIP” onwards.
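A sketch of what a record history covering all three phases could look like: every phase appends an event to the record itself, so the trail from “files on disk” to the enriched record is kept inside the repository. The fields are loosely inspired by PREMIS-style events; the classes and example values are illustrative, not an actual system.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PreservationEvent:
    event_type: str   # e.g. "raw-ingest", "transformation", "migration"
    agent: str        # person or software responsible
    detail: str
    outcome: str = "success"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Record:
    record_id: str
    events: list = field(default_factory=list)

    def add_event(self, event):
        self.events.append(event)

# All three phases leave a trace on the same record (example values are made up):
record = Record("track-0001")
record.add_event(PreservationEvent("raw-ingest", "ingest-script", "stored bytes as received"))
record.add_event(PreservationEvent("transformation", "csv-mapper", "created Track Object from CSV row"))
record.add_event(PreservationEvent("migration", "wav-to-bwf", "preservation action on audio master"))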
9. Preservation 2.0?
● Web 1.0 was the web of static webpages, where the user would read but never contribute
● Web 2.0 is perhaps best exemplified by wikis, where the user is also an editor
● Records are updated, but with strong versioning and history
This does not mean everybody can edit; it means that the system is built around the concept of updating and enriching content. We still envisage a strong curatorial presence.
The dead archival record is a thing of the past. Records in the repository are alive: they are updated, changed and interlinked during their lifetime.
Design your preservation systems not as the archives of old, but as the wikis of today.
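A tiny illustration of “updated, but with strong versioning and history”: changes never overwrite a record, they append a new version, so every earlier state stays retrievable. The class is a sketch, not part of any particular repository system.

from datetime import datetime, timezone

class VersionedRecord:
    """A record that is never overwritten: every change appends a new version."""

    def __init__(self, record_id, metadata):
        self.record_id = record_id
        self._versions = [(datetime.now(timezone.utc), dict(metadata))]

    def update(self, changes, editor):
        """Apply changes as a new version; the previous version stays retrievable."""
        latest = dict(self._versions[-1][1])
        latest.update(changes)
        latest["_edited_by"] = editor  # keep a trace of the curatorial editor
        self._versions.append((datetime.now(timezone.utc), latest))

    def current(self):
        return self._versions[-1][1]

    def history(self):
        """Full history, oldest first; nothing is ever deleted."""
        return list(self._versions)

Updating and enriching a record then amounts to calling update(), while history() keeps the full provenance available to curators.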