Normalizing existing digitized content into standardized packages for robust long-term management. A report on SFU Library's METS-Bagger tool, with a discussion of the benefits, design principles used for the packaging specification, and potential next steps.
Presented at Code4Lib BC, November 28, 2013.
UBC Library's Digital Preservation Strategy (UBC Library)
Presented by Bronwen Sprout & Sarah Romkey, UBC Library.
In early 2011, UBC Library began work on creating a digital preservation strategy in collaboration with Vancouver-based Artefactual Systems. Based on the results of a number of pilot projects, the strategy developed for UBC Library consists of using the open-source Archivematica digital preservation system to provide preservation functionality for the Library’s digitized and born-digital holdings. In addition, the strategy identifies the software requirements, existing and new system components, staffing and business processes that can be implemented to establish operational digital preservation systems and processes. They will discuss the strategy generally and cover three areas of implementation in greater detail: UBC Library’s Rare Books and Special Collections, cIRcle, a DSpace-based institutional repository, and CONTENTdm, UBC Library’s access system for digitized objects.
In April 2014, the Bentley Historical Library received a $355,000 grant from the Mellon Foundation to integrate ArchivesSpace, Archivematica and DSpace into an end-to-end digital archives workflow. This presentation will identify key project goals and outcomes and demonstrate features and functionality of Archivematica’s new “Appraisal and Arrangement” tab.
PERICLES Information Packaging Techniques (PERICLES_FP7)
This presentation by Anna-Grit Eggers (University of Goettingen) introduces the main methods and standards for information packaging.
PERICLES is a four-year Integrated Project (2013-2017) funded by the European Union under its Seventh Framework Programme (ICT Call 9).
http://pericles-project.eu/
Presentation to the PREMIS Implementation Fair at iPRES 2016, about how PREMIS in METS metadata is implemented in the Archivematica digital preservation system.
NCompass Live - Nov. 21, 2018
http://nlc.nebraska.gov/ncompasslive/
To enhance access to their diverse materials, libraries are digitizing those materials and making them freely available online as digital collections on digital platforms. These collections provide another way for libraries to re-envision their materials and make them relevant to their communities. This presentation will cover best practices for creating and preserving digital collections, including workflows, standards, and staffing. It will also discuss the policies which should be developed for building successful digital collections, as well as the privacy issues which should be considered. In this presentation, individual digital collections from the University of Nebraska at Omaha and Creighton University Law Library, including the Omaha Oral History Collection and the Delaney Tokyo Trial Papers, will be demonstrated.
Presenters: Corinne Jacox, Catalog/Reference Librarian, Creighton University Law Library & Yumi Ohira, Digital Initiatives Librarian, UNO Criss Library.
Archivematica and Local Authority Archive Services (Paweł Jaskulski)
Presentation accompanying a demonstration of Archivematica to EERAC (East of England Regional Archives Council) members, introducing the OAIS (Open Archival Information System) methodology. It identifies operations common to both the transfer and ingest of born-digital archives into a digital repository and the accessioning of paper-based archives, and discusses how digital preservation relates to and fits within traditional archival processing.
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ... (PERICLES_FP7)
This presentation was delivered by Noa Campos López and Marcel Hellkamp from PERICLES project partner Georg-August-Universität Göttingen (GWDG), at the interactive workshop ‘Eye of the Storm: Preserving Digital Content in an Ever-Changing World’ (Wellcome Collection Conference Centre, London, 2 December 2016).
This full-day event aimed to introduce and experiment with the PERICLES model-driven approach, demonstrating its usefulness for managing change in evolving digital ecosystems.
http://pericles-project.eu/
Digital Asset Management and Digital Asset Management Software explained by leading vendor Asset Bank. This presentation starts with a definition of Digital Asset Management (DAM) and DAM software. It then covers key elements such as uploading files, organising assets, user permissions, downloading files, lightboxes, searching, enterprise features, reporting, storage, and pricing, and finishes with a bit more information about Asset Bank.
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1? (BIOVIA)
Collection Update 1 for Pipeline Pilot 8.5 includes key new features for the Pipeline Pilot Client, as well as the Imaging, Next Gen Sequencing, Chemistry, Documents and Text, and Statistics and Modeling collections. An exciting new feature for the Pipeline Pilot Client is Protocol Comparison – the ability to compare protocols, or versions of protocols, allowing you to see and resolve differences between them.
Goal Implement a complete search engine. Milestones.docx (smile790243)
Goal: Implement a complete search engine. Milestones Overview
Milestone Goal #1 Produce an initial index for the corpus and a basic retrieval component
#2 Complete Search System
PROJECT: SEARCH ENGINE Corpus: all ICS web pages We will provide you with the crawled data as a zip file (webpages_raw.zip). This contains the downloaded content of the ICS web pages that were crawled by a previous quarter. You are expected to build your search engine index off of this data. Main challenges: Full HTML parsing, File/DB handling, handling user input (either using command line or desktop GUI application or web interface) COMPONENT 1 - INDEX: Create an inverted index for all the corpus given to you. You can either use a database to store your index (MongoDB, Redis, memcached are some examples) or you can store the index in a file. You are free to choose an approach here. The index should store more than just a simple list of documents where the token occurs. At the very least, your index should store the TF-IDF of every term/document. Sample Index:
Note: This is a simplistic example provided for your understanding. Please do not consider this as the expected index format. A good inverted index will store more information than this. Index Structure: token – docId1, tf-idf1 ; docId2, tf-idf2
Example: informatics – doc_1, 5 ; doc_2, 10 ; doc_3, 7 You are encouraged to come up with heuristics that make sense and will help in retrieving relevant search results. For example, words in bold and in headings (h1, h2, h3) could be treated as more important than other words. These are useful metadata that could be added to your inverted index data. Optional (1 point for each metadata item, up to 2 points max): Extra credit will be given for ideas that improve the quality of the retrieval, so you may add more metadata to your index if you think it will help improve the quality of the retrieval. For this, instead of storing a simple TF-IDF count for every page, you can store more information related to the page (e.g., the position of the words in the page). To store this information, you need to design your index in such a way that it can store and retrieve all this metadata efficiently. Your index lookup during search should not be horribly slow, so pay attention to the structure of your index. COMPONENT 2 – SEARCH AND RETRIEVE: Your program should prompt the user for a query. This doesn’t need to be a Web interface; it can be a console prompt. At the time of the query, your program will look up your index, perform some calculations (see ranking below) and give out the ranked list of pages that are relevant for the query.
COMPONENT 3 - RANKING:
At the very least, your ranking formula should include tf-idf scoring, but you should feel free to add additional components to this formula if you think they improve the retrieval. Optional (1 point for each parameter up to 2 points max): Extra credit will be given if your ranking formula includes par.
Islandora & Archivematica combined NDSA RAG poster for LITA (aaroncollie)
This is a poster I created for LITA describing a proposed integration of Archivematica and Islandora. It attempts to convey, using a red-amber-green chart, the perceived benefit of the two systems working in tandem.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes hard work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
SFU Library's METS-Bagger Tool
1. METS-Bagger Tool
Normalizing existing digitized content into standardized
packages for robust long-term management.
Marcus Emmanuel Barnes
#c4lbc
2013-11-28
2. Background
● SFU Library holds about 15 TB of content
○ the Library has created high-quality master versions
of content it has digitized using ‘preservation-friendly’ formats.
○ descriptive metadata exists for almost all of it.
However, this content was not previously managed according to generally accepted digital preservation practices.
3. Solution
● SFU Library Digitized Content Packaging
Specification
● METS-Bagger tool for normalizing existing
digitized content based on this specification
for robust long-term management.
4. METS-Bagger Tool
● Two components:
○ Collection normalization script
○ Integrity scripts based on collection
manifest
5. Collection Normalization
● Processes existing collections of files into a format
compliant with the SFU Library Digitized Content
Packaging Specification
● Packaging Formats:
○ METS (http://www.loc.gov/standards/mets/)
○ BagIt (http://tools.ietf.org/html/draft-kunze-bagit)
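The bagging side of the specification can be sketched with nothing but the Python standard library. The following is an illustrative minimal bag builder, not the METS-Bagger tool's actual code; a production workflow would normally use a maintained BagIt implementation such as the Library of Congress's bagit-python.

```python
import hashlib
import shutil
from pathlib import Path

def make_bag(src_dir: str, bag_dir: str) -> None:
    """Create a minimal BagIt-style bag from a payload directory.

    Illustrative sketch only: it writes the two required pieces of a bag,
    a bagit.txt declaration and a payload manifest over data/.
    """
    src, bag = Path(src_dir), Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True)

    # Copy the payload (item files plus the generated METS file) into data/.
    for f in sorted(src.rglob("*")):
        if f.is_file():
            dest = data / f.relative_to(src)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest)

    # bagit.txt declares the BagIt version and tag-file encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
    )

    # The payload manifest lists a checksum for every file under data/.
    lines = []
    for f in sorted(data.rglob("*")):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            lines.append(f"{digest}  {f.relative_to(bag).as_posix()}")
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")
```

Because the manifest covers every payload file, a bag like this can later be verified bit-for-bit without any knowledge of the collection it came from.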
6. How Collection Normalization Works
1. Configuration file for settings
2. Script walks the directory tree of a collection, compiles
list of files to be preserved
3. Files are collated into items (e.g., newspaper issue),
METS file is generated
4. Item files and the associated METS file are bagged (and
serialized)
5. Future: A collection manifest is created for the collection
for integrity checking (automatic or manual).
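Steps 2 and 3, walking the directory tree and collating files into items, might look something like the sketch below. The grouping rule (one item per directory) and the parameter names are assumptions made for illustration; in the real tool this behaviour is driven by the configuration file.

```python
from collections import defaultdict
from pathlib import Path

def collate_items(collection_root: str, extensions: set) -> dict:
    """Walk a collection directory, keep only files of the configured
    types, and group them into items keyed by their parent directory
    (e.g., one directory per newspaper issue).

    Hypothetical sketch; the actual tool's grouping logic is not shown
    in the slides.
    """
    items = defaultdict(list)
    for path in sorted(Path(collection_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            items[path.parent.name].append(path)
    return dict(items)
```

Each resulting group would then feed the METS generation step, one METS file per item.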
8. Design Principles
● a minimalist implementation - uses as few METS and
BagIt options as possible.
● incorporates three widely implemented and understood
standards: METS, BagIt, and UUIDs (Universally Unique Identifiers)
● Technical metadata in the METS file should include, at a minimum, bit-level checksums, file type identification, creating application, and, where possible, format validity
● Whenever possible, include descriptive metadata for the
item in the METS file.
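The minimum technical metadata named above (identifiers, checksums, file type) can be illustrated with a small standard-library sketch. Note this is a stand-in: the tool itself delegates file identification and validation to FITS, whereas mimetypes here guesses a type from the file extension alone.

```python
import hashlib
import mimetypes
import uuid
from pathlib import Path

def basic_tech_metadata(path: str) -> dict:
    """Minimal per-file technical metadata of the kind the design
    principles call for: a UUID, a bit-level checksum, and a file type.

    Sketch only; field names are assumptions, not the tool's schema.
    """
    p = Path(path)
    mime, _ = mimetypes.guess_type(p.name)
    return {
        "uuid": str(uuid.uuid4()),  # random (version 4) UUID
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        "size_bytes": p.stat().st_size,
        "mime_type": mime or "application/octet-stream",
    }
```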
9. Script Details
● Configuration file, main script, log file, processed
collection output directory
● Written in Python so the tool can be used on multiple platforms
● Plugins for technical metadata (FITS) and descriptive
metadata.
● Configuration options include:
○ test run (limited run size)
○ skipping technical metadata creation
○ file types of interest
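A hypothetical configuration file covering the options listed above might look like the following; the section and option names are invented for illustration, since the slides do not show the tool's actual format.

```python
import configparser

# Invented option names illustrating the settings described on the slide:
# a limited test run, skipping technical metadata, and file types of interest.
SAMPLE_CONFIG = """
[run]
test_run = true
test_run_limit = 10
skip_technical_metadata = false

[files]
extensions = .tif, .jp2, .pdf
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)

test_run = config.getboolean("run", "test_run")
extensions = [e.strip() for e in config.get("files", "extensions").split(",")]
```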
10. Future
● Addition of manifest and integrity checking
tools that check a collection against its
manifest
● Additional plugins
● Sharing code on GitHub
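The planned manifest-based integrity check could work roughly as follows. This is a sketch under the assumption of a simple "checksum, two spaces, relative path" manifest format, which the slides do not specify.

```python
import hashlib
from pathlib import Path

def check_manifest(collection_root: str, manifest_path: str) -> list:
    """Compare files on disk against a '<sha256>  <relative path>' manifest
    and return a list of problems (missing files or checksum mismatches).

    Hypothetical sketch of the integrity tooling the slides describe only
    at a high level.
    """
    root = Path(collection_root)
    problems = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(None, 1)
        target = root / rel.strip()
        if not target.is_file():
            problems.append(f"missing: {rel.strip()}")
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {rel.strip()}")
    return problems
```

An empty return value would mean the collection still matches its manifest, whether the check is run automatically on a schedule or manually.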
11. Thank You
This work was made possible by the support of:
● Simon Fraser University Library
● SFU Library Systems group
● Mark Jordan @mjordan