Digital archivists from the Universities of Hull (UK), Stanford, and Yale are currently collaborating on an Andrew W. Mellon Foundation-funded project. Born-Digital Collections: An Inter-Institutional Model for Stewardship (AIMS) will produce a common framework for managing born-digital archives. Each digital archivist presents a short case study covering one area of the workflow for electronic records: collection development, accessioning, arrangement and description, and discovery and access.
Practical Legacy Data Remediation - Redgrave LLP
There are plenty of people echoing the risks associated with legacy data and a "keep everything" mentality. Join us for a webinar that takes those discussions a step further, offering insight from both a legal and technical perspective into how remediation projects can be managed cost-effectively and in a manner that does not up-end everyday business operations. During this one-hour discussion, Redgrave LLP Partner Andy Cosgrove and Analysts Diana Fasching and Christian Rummelhoff also outline a defensible framework for the disposition of legacy data, and share real-world examples of paper and electronic remediation projects. Victoria Edelman, Vice President of Education for the ALSP and Director of Training for iCONECT Development, facilitates.
Presentation on electronic records management and archival issues. Originally presented at the Fall 2008 meeting of the Southeastern Wisconsin Archivists Group
University of Bath Research Data Management training for researchers - Jez Cope
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT - Tony Ross-Hellauer
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
DEVELOPING A KNOWLEDGE MANAGEMENT SPIRAL FOR THE LONG-TERM PRESERVATION SYSTE... - cscpconf
The goal of long-term preservation (LTP) is to keep archives sustainable for the foreseeable future. These efforts are hampered primarily by the lack of standards, formal methodology, and workflow models for archiving. This research explores the LTP of various kinds of documents independently of the passage of time and of changes in technique within digital environments. Basic requirements arise from the integration of storage management and information management, and from securing the preservation of data, metadata, indexes, etc. This paper presents the evolutionary development of an LTP process for governmental archive and knowledge management. Further tasks include effective search across resources, efficient storage and access to data, recovery drawing on co-located backups, and dynamic regulation of authentication and security management. Finally, a pilot Semantic Data Grid and its service-matching mechanisms are described, in which the ontology plays a crucial role.
Digital Preservation Best Practices: Lessons Learned From Across the Pond - Benoit Pauwels
Digital Preservation Best Practices: Lessons Learned From Across the Pond. Slavko Manojlovich (Associate University Librarian (IT) / Manager, Digital Archives Initiative, Memorial University, St. John's, Canada) and Benoit Pauwels (Head, Library Automation Team, Université libre de Bruxelles, Belgium)
A presentation for researchers, mainly scientists, on how to prepare a proposal with a well-structured and documented data management plan. The presentation also covers key aspects of data management planning as well as its importance. What are donors or funders looking for in a research proposal?
Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.
Introduction to archival processing, presented as part of a one-day workshop on the same topic, Drexel University, April 23, 2010. Adapted with permission from training materials created by Holly Mengel for the PACSCL Hidden Collections Processing Project. http://clir.pacscl.org/
Archival description and archival standards, an introduction to General International Standard Archival Description ISAD(G) and International Standard Archival Authority Record for Corporate Bodies ISAAR(CPF).
Archivematica and Local Authority Archive Services - Paweł Jaskulski
Presentation accompanying a demonstration of Archivematica to EERAC (East of England Regional Archives Council) members, introducing OAIS (Open Archival Information System) methodology. Identifies operations common to both the transfer and ingest of born-digital archives into a digital repository and the accessioning of paper-based archives, and shows how digital preservation relates to, and fits within, traditional archival processing.
Update on the University of Michigan Bentley Historical Library's "ArchivesSpace, Archivematica - DSpace Workflow Integration" project (funded by a generous grant from the Andrew W. Mellon Foundation). The project seeks to integrate these platforms into an end-to-end digital archives workflow that will facilitate the deposit of content into a digital repository and enable the reuse of descriptive and administrative metadata across platforms. This presentation was given at the March 27, 2015 meeting of the Mid-Michigan Digital Practitioners in Ann Arbor.
Rebecca Grant - Archival Description and Archival Arrangement - dri_ireland
Presentation given by Rebecca Grant of the Digital Repository of Ireland as part of a training session in The National Irish Visual Arts Library (NIVAL), 12 August 2014.
An introduction to the main principles of archival arrangement and description, including an overview of hierarchical arrangement of archives and the archival descriptive standard ISAD(G).
Lecture presented at the Seminar-Workshop on the theme "Organizing and Digitizing Library Archival Materials: ISAD(G) and Technology", organized by the Philippine Librarians Association, Inc. – Western Visayas Region Librarians Council (PLAI-WVRLC) in coordination with the National Committee for Libraries and Information Services – National Commission for Culture and the Arts (NCLIS-NCCA), held at the Colegio de San Agustin—Bacolod, Bacolod City, 27 September 2012.
Lecture conducted for Department of Health personnel during a 5-day seminar organized by the Society of Philippine Health History, Inc. on "Basic Library Management" at Kimberly Hotel, Pedro Gil, Ermita, Manila, Philippines (2004 Oct 8)
Presented at PAARL's Summer Conference on Promoting Skills Enhancement and Core Competencies for the Professionalization of Librarians, held at Casa Pilar Resort, Boracay, Malay, Aklan, Philippines on 2002 April 10
Introduction to Arrangement and Description (Feb 4 & 5, 2012) - Amanda Hill
Slides presented at the 'Introduction to Arrangement and Description' workshop at the University of Guelph on February 4 and 5, 2012. They include an overview of key elements of the Rules for Archival Description and an introduction to creating descriptions for the new Archeion service.
A 3-day training program developed for the seminar-workshop on Archival Management, sponsored by South Manila Inter-Institutional Consortium Committee of Librarians, held on March 26-28, 2008.
Webinar presented for WiLS by Emily Pfotenhauer, Recollection Wisconsin Program Manager, June 24, 2014. Based on information from the Demystifying Born Digital reports from OCLC Research and the Digital Preservation Education and Outreach (DPOE) curriculum developed by the Library of Congress.
This presentation provides a few key tips for effective data management: how to plan ahead, how to organize data, how to preserve data, and how to market it.
This presentation will provide an overview of issues in digital preservation. Presentation was delivered during the joint DPE/Planets/CAPAR/nestor training event, ‘The Preservation challenge: basic concepts and practical applications’ (Barcelona, March 2009)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)dri_ireland
Presentation given by Rebecca Grant, Digital Archivist with Digital Repository of Ireland, part of a workshop on Digital Archiving and Digital Preservation held as part of Figshare Fest in London, May 12th 2016. Figshare is an online digital repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos. Its annual Figshare Fest is a chance to gather together institutional clients, advocates and friends to talk about open research.
OAIS and Its Applicability for Libraries, Archives, and Digital Repositories... - faflrt
ALA/FAFLRT Workshop on the Open Archival Information System (OAIS). Presented by Robin Dale, RLG. Sponsored by the ALA Federal and Armed Forces Libraries Round Table (FAFLRT). Presented on June 16, 2001 at the ALA Annual Conference.
Who Decides? Reinterpreting archival processes for the management of digital ... - GarethKnight
The management of digital records can benefit from the contributions of both digital curators and archivists. The presentation outlines the efforts of the PEKin project at King's College London to develop a management strategy that combines these disparate skill sets.
SAA Session 502: Born-Digital Archives in Collecting Repositories
1. Born-Digital Archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities. Gretchen Gueguen, Mark A. Matienzo, Simon Wilson, and Peter Chan. Session 502, 27 August 2011, Society of American Archivists Annual Meeting
2. AIMS Project "Born-Digital Collections: An Inter-Institutional Model for Stewardship" A two-year project to create a framework for stewardship of born-digital archival records in collecting repositories. Funded by the Andrew W. Mellon Foundation
4. Grant Goals Processing of Hybrid Collections Software Development Community Development Unconference (May 2011, Charlottesville, VA) UK Symposium (June 2011, London, England) Workshop (August 2011, Chicago, IL) White Paper and Project Report
5. Framework Development A framework for collecting and delivering the born-digital materials that are quickly beginning to constitute the collections of contemporary scholarly, literary, and political figures and organizations.
8. What is Collection Development? Actions and policies of institutions to bring in material for end users (both current and future); includes prioritizing, developing relationships with creators, assessments, negotiating agreements and preparing for accessioning. Within the AIMS framework: a viable, practical method to capture and process born-digital material from hybrid collections requires sound work at the beginning (i.e. policies, practices, agreements with donors, etc.) to set up later work
9. Elements of Collection Development Prerequisites Establish relationship with donor Analyze Feasibility Negotiate Agreements Prepare for Accessioning
10. Prerequisites… Neil Beagrie, "Plenty of Room at the Bottom? Personal Digital Libraries and Collections," D-Lib Magazine (June 2005) Blagofaire. http://xkcd.com/239/
15. Prepare for Accessioning... Scope and extent determined? Coordination with acquisition of analog material? Method and time determined? Pre-acquisition appraisal performed? Enhanced curation carried out? Test capture if needed? Development of new methodologies undertaken as needed/possible?
17. What is Accessioning? Archival institution takes physical and legal custody of a group of records from a donor and documents the transfer in a register or other representation of the institution’s holdings Within AIMS Framework Processes which establish physical, administrative and intellectual control over transferred records; assessment and documentation of future needs; documentation of actions taken; beginning of safe storage and maintenance
18. Elements of Accessioning Prerequisites Transfer records and gain administrative control Physical control and stabilization Intellectual control and documentation to support further processes Maintain accessioned records
19. Case Study: Re-Accessioning at Yale Collaborative capacity building across two repositories: Manuscripts and Archives; Beinecke Rare Book and Manuscript Library. Addressing previously received accessions containing electronic records on media. Still in testing phase, but working towards implementing in production
20. Types of Records and Media Wide variety of records creators: literary authors; university faculty; university offices; architectural firms. Common types of media: floppy disks (5.25” and 3.5”); optical media (CD-ROM, CD-R, DVD-R, etc.); Zip disks; USB flash drives
21. Goals of Re-Accessioning Identify, document, and register media Mitigate risk of media deterioration and obsolescence Extract basic metadata from filesystems on media and files contained on filesystems
23. Disk Imaging Using “forensic” (bit-level) imaging process Ensure data on media is not manipulated using write-protection Uses software to acquire images Includes hash-based verification process
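The hash-based verification step mentioned above can be sketched in Python. This is an illustrative stand-in, not the repositories' actual forensic tooling; `verify_image` and the throwaway "image" file are hypothetical:

```python
import hashlib
import tempfile

def verify_image(image_path, expected_sha256, chunk_size=1 << 20):
    """Recompute a disk image's SHA-256 in chunks and compare it to the
    checksum recorded at acquisition time (hash-based verification)."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Demo with a throwaway stand-in for a real floppy/CD image:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 1024)  # fake media contents
    path = f.name

# Checksum that would have been recorded when the image was acquired:
acquired = hashlib.sha256(b"\x00" * 1024).hexdigest()
ok = verify_image(path, acquired)
print(ok)  # True while the stored image still matches the acquisition hash
```

Reading in chunks keeps memory flat even for multi-gigabyte images, which is why imaging tools verify this way rather than loading the whole file.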
25. Media Log Using SharePoint list Contains unique identifier of media Records physical/logical characteristics of media Documents success, failure, or status of various processes and additional notes
28. Metadata Extraction Can be repurposed for descriptive, administrative, and technical metadata Uses command-line tools (Sleuthkit, fiwalk) Outputs XML document
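As a rough illustration of that extraction step, the snippet below pulls per-file metadata out of a simplified DFXML-like fragment of the kind fiwalk emits. Real fiwalk output is namespaced and far richer; the element layout and hash values here are placeholders, not actual fiwalk output:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for fiwalk's DFXML output.
dfxml = """
<dfxml>
  <fileobject>
    <filename>letters/draft1.doc</filename>
    <filesize>24576</filesize>
    <hashdigest type="md5">0cc175b9c0f1b6a831c399e269772661</hashdigest>
  </fileobject>
  <fileobject>
    <filename>letters/draft2.doc</filename>
    <filesize>30208</filesize>
    <hashdigest type="md5">92eb5ffee6ae2fec3ad71c777531578f</hashdigest>
  </fileobject>
</dfxml>
"""

# Flatten each <fileobject> into a record that could be repurposed as
# descriptive, administrative, or technical metadata.
records = []
for fo in ET.fromstring(dfxml).iter("fileobject"):
    records.append({
        "filename": fo.findtext("filename"),
        "filesize": int(fo.findtext("filesize")),
        "md5": fo.findtext("hashdigest"),
    })

print(records[0]["filename"])  # letters/draft1.doc
```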
29. Packaging and Transfer Using BagIt packages/Bagger application Packages contain disk images, extracted metadata, imaging logs, and high-level accession information Transfer to storage is verified by comparison against manifest
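The manifest comparison performed after transfer can be sketched as follows: a minimal BagIt-style bag with a `data/` payload and a `manifest-md5.txt` of "checksum path" lines. `make_bag` and `verify_bag` are hypothetical helpers for illustration, not the Bagger application itself:

```python
import hashlib
import os
import tempfile

def make_bag(bag_dir, payload):
    """Write a minimal BagIt-style bag: a data/ payload directory plus a
    manifest-md5.txt listing 'checksum  path' for each payload file."""
    data = os.path.join(bag_dir, "data")
    os.makedirs(data, exist_ok=True)
    lines = []
    for name, content in payload.items():
        with open(os.path.join(data, name), "wb") as f:
            f.write(content)
        lines.append(f"{hashlib.md5(content).hexdigest()}  data/{name}")
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w") as f:
        f.write("\n".join(lines) + "\n")

def verify_bag(bag_dir):
    """Re-hash every payload file and compare against the manifest,
    as done to verify a transfer to storage."""
    with open(os.path.join(bag_dir, "manifest-md5.txt")) as f:
        for line in f:
            checksum, rel = line.strip().split("  ", 1)
            with open(os.path.join(bag_dir, rel), "rb") as payload_file:
                if hashlib.md5(payload_file.read()).hexdigest() != checksum:
                    return False
    return True

# A bag might hold a disk image alongside its extracted metadata:
bag = tempfile.mkdtemp()
make_bag(bag, {"disk001.img": b"\x00" * 512, "disk001.xml": b"<dfxml/>"})
ok = verify_bag(bag)
print(ok)  # True: payload matches manifest
```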
32. Purpose of Arrangement & Description The general objectives for Arrangement & Description are: - to preserve context - to establish intellectual control of the material - to provide a means of discovery SAA definition, emphasis on minimizing the amount of handling Within the AIMS framework Processes which establish intellectual control of the material including implementation of policies and agreements with donors etc. to enable subsequent discovery and access
33. Elements of Arrangement and Description 1. Prerequisites 2. Plan for processing - gather supporting information; files captured from media (accessioning); convert files (for viewing); appraisal strategy; assess arrangement options; consider preservation issues 3. Processing - implement arrangement strategy; add descriptive metadata and wider context (e.g. collection-level description); copyright & other legal considerations 4. Prepare for Discovery & Access - remove restricted access to born-digital material during processing
34. Case Study - Stephen Gallagher Background: 2005: 42 boxes of paper archives. 2010: born-digital material: 14,320 files (13.6 GB) transferred to us via external hard drive and a box of Amstrad disks. Create an integrated catalogue to accommodate paper, born-digital and future accruals
35. Case Study - Stephen Gallagher Approach: - current work had higher priority in his filing system - considered each work a distinct 'project' - structure reflects his way of working & the archival principles of control that creator, archivist & user can all understand - series level was the most logical solution: all related files placed in the series - reasonable return for our effort
37. commercial implications: access via repository = publication? - re-purposing of work from one (unsuccessful) project to another
42. can we appraise without knowing the contents? similar to paper material that is in a different language?
43. Challenges faced Volume of material: - depositor perception that 'storage is cheap' - does this mean we shouldn’t appraise the material we receive? - wide range of file types encountered - not practical to describe each and every file - risk management if you don’t check every file for sensitive information - we need to automate as much of the processing as possible
46. ability to return to original order of the material
47. view some file types, add descriptive metadata etc
48. high level of granularity when applying rights & permissions. Technical metadata (acquired at accessioning) and descriptive metadata feed the Discovery & Access process
50. What is Discovery & Access Discovery and Access refers to the systems and workflows that make processed or unprocessed material and the metadata that support it available to users.
61. D&A – Tag & Annotation by Invited Persons / Public Annotation:
62. Impacts from Collection Development File formats: no restriction. Computer medium: no restriction (punch card, open reel tape, 5.25 inch floppy, 3.5 inch floppy). File type: no restriction (computer program, data set, document, spreadsheet). Agreement: permission to post contents online.
63. Impacts from Accessioning Built a 5.25 inch floppy capture station. Asked the Computer History Museum to read punch cards. Open reel tapes – still outstanding
64. Impacts from Processing AccessData FTK was used to search files for restricted information, annotate files with appropriate descriptive metadata (book titles, articles, etc.) and rights metadata (access restrictions), and generate technical metadata for the delivery platform to act upon. Transit Solution was used to transform files to HTML format for display on the web. An XSLT program was written to transform the XSL-FO output from FTK into an XML content document. A Ruby program was written to ingest the XML content document, original files, and the display derivatives into Fedora.
71. Want to know more? http://born-digital-archives.blogspot.com Gretchen Gueguen gmg2n@virginia.edu Mark Matienzo mark.matienzo@yale.edu Simon Wilson s.wilson@hull.ac.uk Peter Chan pchan3@stanford.edu
Editor's Notes
Hello and welcome to session 502: Born-Digital Archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities. My name is Gretchen Gueguen and I’m Digital Archivist at the University of Virginia. This morning, along with my colleagues Mark Matienzo from Yale, Simon Wilson from the University of Hull, and Peter Chan from Stanford, I’m going to talk with you about the AIMS project.
AIMS is the short title for a Mellon-funded grant project entitled Born-Digital Collections: An Inter-Institutional Model for Stewardship. This two-year project set out to create a framework for stewardship of born-digital archival records in the collecting repositories.
As I’ve mentioned, the grant partners include UVA, Stanford, Hull, and Yale, with Virginia serving as the PI.
The grant set out to achieve its goal through four different areas of activity. The first was the processing of several hybrid collections, which you are going to hear about later this morning. The digital archivists at each institution, the four of us here this morning, were funded by the grant to carry out this processing. To facilitate this stewardship, the partners also sought to develop some software solutions. You won’t hear as much about these this morning, but they include Rubymatica, a Ruby-based reworking of Archivematica for the creation of Submission Information Packages, and functional requirements for a software tool to facilitate arrangement, description, and access to born-digital archival materials. These requirements led to work on developing Hypatia, which is what is known as a “Hydra Head,” or a module for the Fedora/Solr/Blacklight Hydra stack, for access to born-digital materials. The partners also hosted several events to garner feedback and to encourage communication among the archival community, including a workshop that took place here in Chicago earlier this week. The final project deliverables will include a White Paper that synthesizes the research done during the project and a project report to the Mellon Foundation.
A large part of the White Paper focuses on what we are currently referring to as the AIMS framework: “A framework for collecting and delivering the born-digital materials that are quickly beginning to constitute the collections of contemporary scholarly, literary, and political figures and organizations.” This is really a high-level look at the tools, strategies, methodologies, and practices needed to effectively manage born-digital content.
The framework is characterized by four main functions of stewardship: Collection Development, Accessioning, Arrangement and Description, and Discovery and Access. You’ll notice that we do not include “preservation” as an explicit function here. That is an intentional omission, because we believe that preservation is implicit in all of these functions. In addition, aspects such as developing a preservation repository or undertaking preservation activities are outside of this scope because they are larger institutional initiatives. They are mentioned as prerequisites to being able to do work in many steps, but since there are many guidelines out there we didn’t feel the need to reiterate them here. We are going to focus the rest of our presentation this morning on these four areas and share with you some of the work we have done. If you are interested in more on the background of the project, I encourage you to check out our project blog, called Born-Digital Archives; I’ll put a URL up for the blog at the end of the presentation.
We are starting our model with activities related to Collection Development. These are the activities undertaken in order to bring material into the institution. They include activities we may be very familiar with, like prioritizing, developing relationships with creators, doing assessments, and negotiating agreements. Within the concept of the AIMS model, which is primarily a hybrid collection environment, this work will be necessary to develop sound capturing and processing activities later.
We’ve defined collection development as having five distinct stages, which I’m going to go over with you this morning: Prerequisites; Establish relationship with donor; Analyze feasibility; Negotiate agreements; Prepare for accessioning.
The first step is going through some prerequisites, like having an appraisal process: how will you assess or evaluate materials? How will you be able to determine value? You also need to evaluate your storage capacity: do you have enough space to keep this material in both the short- and long-term? What about future transfers? Do you have a sound data preservation strategy or methodology? One of the most important prerequisites is establishing collection policies. Defining what it is that we want to collect raises a couple of different questions. The first might be what types of material we are interested in, in the traditional collecting sense: prominent people, organizational records, etc. Next, we need to consider what part of those figures’ lives we are collecting. We use our digital devices for private activities as well as more public ones…which are we interested in collecting? The next logical step then is to think about where this information might be on digital devices: stored files probably yes, but do we also need software, operating systems, hardware, internet activity, or cloud material? All of these factors, and more, come together in a collection development policy, and it can be very difficult to write, especially when you are just starting and don’t know yet what you will receive.
Assuming that you have the needed prerequisites in place, or have the capacity to work on them, you can move on to the actual work of collection development. The first step is establishing a relationship with the donor. In many ways this is parallel to existing analog work, but when dealing with born-digital materials you should start thinking early about how digital archive staff will need to be involved. This is potentially going to be very different from access to physical materials, and now is the time to discuss options. Now is also the time to discuss the creation of the data with the donor and capture any documentation that will help with later processing and access. But how comfortable is your donor with digital concepts and access to digital materials? As an example of the difficulty that this can cause, I’d like to show some work that the AIMS project did in this regard. This is a digital donor survey that the AIMS project created, based on one created for the PARADIGM workbook. The original intention was that a donor could fill this out before accessioning. This is the first page…and this is the second…and the third…and the fourth…and this is part two! We quickly realized that this would be overwhelming to potential donors, especially ones who hadn’t really thought much about things like their online persona or email preservation. We changed tactics and now recommend that this survey be used as a prompt sheet for the archivist in an interview.
Such an interview may be part of a program of enhanced curation, something Jeremy Leighton John at the British Library describes as "not only collect[ing] the original archive but add[ing] value to it." Enhanced curation techniques include things like documenting the creator’s workspace with high-resolution digital photography, creating a digital film of an oral history interview with donors about their computers and their computing habits, or perhaps capturing video screencasts of the donor describing the organization of material on their computer. This type of information can be invaluable as materials are accessioned and processed, as the level of abstraction or unfamiliarity with a new system can make it difficult to gain intellectual control.
Okay, so you are ready to move on to considering whether or not you even *can* acquire this material, or more likely whether it is worth the costs. What is the cost analysis and risk analysis? Try a test capture…how does it work? Do you have the needed infrastructure and policies, or can you create them? Can you even view files in order to appraise them? Do you need these guys to accomplish this? Or maybe these guys? It’s very easy to say “analyze costs” or “evaluate your home institution infrastructure,” but if you’ve never encountered a particular piece of software or hardware, it’s difficult to be prepared for it. This is where having technologists or digital archivists involved early in the process can help. If possible, during a test capture they can do a triage to determine if there are serious preservation concerns, if any forensic processing might be needed to recover damaged or deleted files, etc.
Moving on then, the next step is negotiating agreements. One of the big problems here is that there is a lack of models for agreements and appraisals. Many elements of standard agreements remain applicable in the hybrid or born-digital archive, but have different implications. It’s not the same to provide unrestricted access to paper documents in a reading room and unrestricted access to digital materials online. Furthermore, you have a much larger potential for capturing and inadvertently exposing sensitive electronic information like financial and health information, passwords, and other personal data. The legal agreement with the donor needs to specify: an agreement about copyright (either transferred to the repository/institution or remaining with the creator/heirs); an understanding that the collecting repository will be the “sole” repository of born-digital material; an understanding of current capabilities/limits for capturing born-digital material; an understanding of preservation strategies and capabilities; an understanding of current delivery capabilities and limits; an understanding of what/how files will be restricted or deleted and how this will be confirmed; an understanding of capabilities/limits of appraisal, viewing, and description/processing of born-digital material; and an understanding of the creative process and relationship with born-digital materials, computers, hand-held devices, cloud computing, etc.
The final step in collection development is to prepare for processing. This may seem a little odd in a traditional sense, but what we are alluding to here is making sure that all of your technical steps for transfer, which may not be in the agreement, are planned ahead of time. Specifically: scope and extent determined; method and time determined; pre-acquisition appraisal performed; test capture if needed; development of new methodologies undertaken as needed/possible; enhanced curation carried out; and coordination with acquisition of analog material. This is really the “action” step where many of the activities you have been planning are carried out. Overall, the steps in Collection Development help to set up later activities. By the end of the collection development step, the institution should be ready to take legal and physical custody of material. Doing this in a forward-thinking, planful manner will help later processes go much more smoothly. You’ve made it to the finish line of collection development, but now we need to move on to Accessioning.
Accessioning is generally understood as the set of processes wherein a repository takes physical and legal custody of records from a donor and formally documents, or "registers," the transfer. The processes have clear links to both collection development and arrangement and description, and in some cases institutions may view them as part of those processes. However, we have situated accessioning as a primary function within the AIMS framework. Within our framework, accessioning serves a vital role in allowing a collecting repository to establish physical, administrative, and intellectual control over records that have been transferred. The accessioning processes allow archivists to gather a wide variety of information that will inform and prioritize other processes, such as arrangement and description, further appraisal, and requirements for access. Accessioning also provides an environment in which archivists can document their actions and ultimately transfer the accessioned records into an environment for their storage and maintenance. The goals of accessioning therefore reflect the need to establish control over, and ensure the authenticity and reliability of, transferred records. Archivists must therefore be diligent during accessioning and ensure that they understand the potential impact of the actions they take during these processes. If a collecting repository is unable to establish an adequate level of control over transferred electronic records, then it is likely that it has not successfully accessioned them. Accordingly, archivists with "legacy" accessions of electronic records, such as those containing computer media, may want to consider "reaccessioning" those transfers to establish a suitable level of control.
The prerequisites, like the other areas of the AIMS model, broadly fall into several categories; in this case, they are policies, procedures, and infrastructure. Many policies are required to support accessioning properly. These may range from departmental preferences to requirements set at the institutional level. Procedures may account for a number of different options, such as minimal processing, accessioning of born-digital materials with paper records, deferment of digital accessioning, accessioning as resources allow, and retrospective accessioning of previously received electronic records. Infrastructure to support accessioning includes a wide variety of software, hardware, and expertise. This infrastructure will take resources to build, and archivists are urged to consider collaborative partnerships to allow for better sharing of knowledge. The transfer and administrative control processes in the AIMS framework are very similar to those for other formats of records. Archivists working with electronic records should be familiar with the various types of transfers and their implications. Types of transfers can include receipt of retired media formerly in use by a creator, records copied to media only used for transfer (such as external hard drives, CDs, or DVDs), or a direct transfer using disk imaging software or by copying files across a network. Once the records are under administrative control, archivists should focus their efforts on gaining physical control over records and media. Much of this work concerns identifying and potentially addressing preservation threats in the records, such as viruses, unknown file formats, and the physical condition of media if appropriate. Archivists next need to establish intellectual control and gather documentation that will enable further work necessary to process, maintain, or use the records.
For some transfers, a listing of directories or files may be repurposed for archival description if the existing arrangement appears to be of value.Finally, the archivist should prepare the records to be maintained over time. This may include actions such as normalizing to preservation formats. Ultimately, the records should also be transferred to a secure storage location that can be monitored by the collecting repository.
At Yale University, we have worked on a reaccessioning project that has allowed us to develop our thinking of how this accessioning of electronic records could best be realized for us going forward. Two repositories, Manuscripts and Archives and the Beinecke Rare Book and Manuscript Library, have worked in collaboration to implement software, hardware, and procedures that can be shared to support accessioning. In our reaccessioning project, we are working to establish better control over previously transferred accessions that contain electronic records on media such as floppy disks and CD-ROMs. These pieces of media were often received as part of a hybrid accession that also contained paper records, but in some cases we have received accessions of boxes containing only media.
The goals of our reaccessioning project are fairly straightforward and relate to the three types of control discussed previously. First, we seek to establish administrative control of the media by identifying what it is and documenting its physical and logical characteristics and by assigning a unique identifier to each piece. Secondly, we are working towards gaining physical control of the media, which will allow us to mitigate the risks of media deterioration and obsolescence. Finally, we are trying to establish a basic level of intellectual control by extracting metadata about the filesystems and files contained on the media, such as file names, directory structures, and creation, access, and modification dates.
Our reaccessioning workflow roughly looks like the following. We begin by retrieving the media and bringing it to the electronic records workstation, documenting its change in location within the Archivists’ Toolkit. We then assign unique identifiers to each of the media. We establish the best means by which to write-protect the media for imaging and record its identifying characteristics in a media log. We then put the media in the appropriate drive and create a forensic bit-level disk image, which includes all the files, the filesystem metadata, unused space – in other words, the entirety of the data on the media. We verify the image against the raw contents of the media and extract metadata from the disk image. Finally, we package the images and metadata and transfer the package into storage and complete the rest of the documentation.
To acquire the data off media, we are using a forensic imaging process that extracts the entirety of the data off the media at the lowest level possible. To ensure that we do not intentionally or accidentally manipulate any of the data on the original media, we write-protect the media or reader. For floppy disks, we can use physical write protect tabs. For USB flash media, hard drives, and the like, we connect the drive or reader to a write-blocker, which is a piece of hardware connected to the computer that blocks low-level write signals from a computer. We use a variety of software to acquire the images, such as FTK Imager. The imaging software extracts the data from the media and calculates a cryptographic hash of the data on the media and the data within the image file. If the checksums match, the imaging is viewed as successful.
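The verification step just described can be sketched in Python as a simplified stand-in (FTK Imager performs this internally; the function names and chunked-reading approach here are our own illustrative choices, not the tool's actual implementation):

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-1 hash of a file, reading in chunks so that
    multi-gigabyte disk images never have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(media_path, image_path):
    """Imaging is considered successful when the hash of the raw
    media matches the hash of the acquired image file."""
    return sha1_of_file(media_path) == sha1_of_file(image_path)
```

On Linux the `media_path` could be a raw device node such as `/dev/sdb` (an assumption about the capture setup, not something the slides specify).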
This is a screenshot of FTK Imager, which we use to image media and to inspect disk images. You can see that the file listing includes regular files, slack or unused space on the disk, and deleted files, as denoted by the red X on the file icons.
Our media log is a SharePoint list that contains identifying characteristics and physical and logical information about the media, such as the type of media, when it was imaged, the text of a label or writing on the media, and the type of filesystem or filesystems it contains. We assign each piece of media a unique identifier, which is a combination of the accession number and an incremental number. The media log also contains the workflow status of the accessioning process for each piece of media and whether processes succeeded or failed.
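The identifier scheme described above (accession number plus an incremental number) can be sketched as follows; the separator and zero-padding are illustrative assumptions, not the exact Yale convention:

```python
def media_identifiers(accession_number, count, start=1):
    """Generate unique media identifiers by combining an accession
    number with a zero-padded incremental number, one identifier
    per piece of media in the accession."""
    return [f"{accession_number}.{n:04d}" for n in range(start, start + count)]
```

For example, an accession with three floppy disks might yield `2011-M-045.0001` through `2011-M-045.0003` (the accession number itself is hypothetical).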
The first screenshot is an overview for several pieces of media. You can see the unique media identifiers, the media format, and the workflow status.
This expanded view shows all the fields, including further documentation about the disk image, the filesystem contained, and additional notes.
If imaging is successful, we then extract metadata from the filesystem and files within the image. This is a software-based process that provides metadata such as file names, directory structures, creation and modification times, and approximate categorization of the types of files. This metadata can be repurposed in a variety of ways and provides a basic level of intellectual control that is comparable to a box list or other type of inventory for paper records. We are using open source software such as Sleuthkit and fiwalk to perform this extraction, but occasionally we need to rely on other tools for older or less common types of file systems.
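As a rough illustration of how such extracted metadata can be repurposed as a box-list-style inventory, the sketch below parses a simplified, namespace-free fragment shaped like fiwalk's DFXML output (one `<fileobject>` per file); real DFXML uses XML namespaces and many more fields, so treat the element names here as an approximation:

```python
import xml.etree.ElementTree as ET


def list_files(dfxml_text):
    """Pull file names, sizes, and modification times out of a
    DFXML-like report, yielding one dict per file object."""
    root = ET.fromstring(dfxml_text)
    entries = []
    for fo in root.iter("fileobject"):
        entries.append({
            "name": fo.findtext("filename"),
            "size": fo.findtext("filesize"),
            "mtime": fo.findtext("mtime"),
        })
    return entries


# A tiny hand-made sample (hypothetical file names and dates):
sample = """
<dfxml>
  <volume>
    <fileobject>
      <filename>letters/draft1.wpd</filename>
      <filesize>20480</filesize>
      <mtime>1996-03-04T10:22:00</mtime>
    </fileobject>
  </volume>
</dfxml>
"""
```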
Finally, we create a transfer package using the BagIt specification as developed by the Library of Congress and the California Digital Library. To create the packages, we are using the Library of Congress-developed Bagger application. These packages contain the disk images, extracted metadata, and logs generated by the disk imaging software during the acquisition process. The BagIt packages also contain high-level information about the accession. For the time being, we are making a rough connection of one bag per accession, but we realize we may need to modify depending on the size of the accessions.
This is an overview of a sample bag, showing the structure and high-level metadata. Once packaged, we transfer the package to storage and verify the success of the transfer using procedures from the BagIt specification, which compare the contents of the package against its manifest. If successful, we complete the rest of the documentation and record the success in the media log. We also record the storage location of the transferred package within the Archivists’ Toolkit and add the date of completion.
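The manifest comparison described above can be sketched as a minimal BagIt payload check, assuming an MD5 manifest; real validation (e.g. with the Bagger application or the Library of Congress bagit tools) also checks tag files, Payload-Oxum, and payload completeness:

```python
import hashlib
from pathlib import Path


def verify_bag_payload(bag_dir):
    """Recompute the checksum of every payload file listed in the
    bag's manifest-md5.txt and compare it against the recorded
    value. Returns True only if every file matches."""
    bag = Path(bag_dir)
    manifest = bag / "manifest-md5.txt"
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        recorded, rel_path = line.split(None, 1)
        data = (bag / rel_path.strip()).read_bytes()
        if hashlib.md5(data).hexdigest() != recorded:
            return False
    return True
```

A tampered or truncated payload file makes the recomputed hash diverge from the manifest, so the transfer fails verification rather than silently corrupting the accession.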
The SAA definition of description puts emphasis on minimizing the amount of handling; it needs to be updated to consider preservation actions due to file format obsolescence, etc.
- a reasonable return for our effort: describing the ‘project’ and the indicative content that we held
What is sensitive will vary from collection to collection: personal information (social security number, personal e-mail address, mobile number, etc.), but it could also be the discussion behind a decision (e.g. Larkin 25 funding).
As a result of their experiences tackling arrangement and description, the AIMS digital archivists defined the requirements for a new tool - designed to work with technical and professional standards - which uses drag'n'drop to create intellectual arrangement by changing a relationship between digital assets (the asset doesn’t move) using Fedora "sets" - rights & permissions can apply to a single file, a discrete series, or the entire collection
“the systems and workflows that make processed or unprocessed material and the metadata that support it available to users.” Discovery and access is also not possible without completion of many of the prior steps described in this model. The outcomes of those steps have a significant impact on what is either appropriate or achievable in terms of discovery and access. Given the impact of these prior steps on discovery and access, it is crucial to consider the desired outcomes for discovery and access as early as possible — ideally during the Collection Development phase — and to continue to update and revise these plans as work on the collection progresses.
Overall though, we have three major goals in discovery and access. The first is to make material available to user communities. This includes ensuring that users can find the material, understand if it’s available, and get access to it if possible. However, that access must follow guidelines for access restrictions related to privacy and intellectual property. An overarching goal of all three is to ensure that the significant properties of the material are inherent in whichever form the delivery takes.
We plan to deliver the Stephen Jay Gould papers on the Hypatia platform. Hypatia is built on Fedora. In Hypatia, we have one EAD for the hybrid collection; Series 6 is for the born-digital material. We provide a link for people to go to an interface where they can browse and perform full-text search on the born-digital material of the papers.
We convert files in obsolete file formats, such as WordPerfect, to HTML. Otherwise, people would have to download the files and find a viewer to view them, or create an emulated environment to view them.
Discovery and access is also not possible without completion of many of the prior steps described in this model. Some institutions accept certain file formats only.
Researchers may also need to bookmark or label the files they find.
In addition to Hypatia, mentioned above, Stanford also tried using FTK (the software we use for processing) to deliver born-digital materials. One of the features of FTK which I believe will interest researchers is the ability to generate fuzzy hashes. Files with the same hash are identical in content, but what about similar files? A fuzzy hash tells you how close files are — for example, in full-text terms, how many characters are misspelt between two drafts. Fuzzy hashing is a tool which provides the ability to compare two different files and determine a fundamental level of similarity. This similarity is expressed as a score from 0-100. The higher the score reported, the more similar the two pieces of data. A score of 100 would indicate that the files are close to identical. Alternatively, a score of 0 would indicate no meaningful common sequence of data between the two files.
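FTK's fuzzy hashing uses context-triggered piecewise hashing (in the style of ssdeep), which is not in the Python standard library; the sketch below approximates the same 0-100 similarity idea with `difflib`, purely as an illustrative stand-in for the real algorithm:

```python
import difflib


def similarity_score(a, b):
    """Return a 0-100 similarity score between two byte sequences.
    Identical inputs score 100; inputs with no common subsequence
    score 0. This compares the raw data directly, whereas true
    fuzzy hashing compares compact hash signatures instead."""
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    return round(ratio * 100)
```

Two successive drafts of the same document would score high, while unrelated files would score near zero, which is exactly the signal an archivist could use to surface reworked versions of a text across a collection.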
I mentioned before that the goal of D&A is to ensure that the significant properties of the material are inherent in whichever form the delivery takes. For design files, I believe a virtual machine (VM) is the appropriate platform. I have built a virtual machine containing some design files with the associated fonts. People want to know the exact fonts, font spacing, etc. used. If they don’t have the fonts, then even if they download the file, they cannot recreate its appearance. The virtual machine was created using Parallels Desktop.
How do you deliver 50,000 emails? I worked with a colleague at Stanford to produce a network graph of 50,000 emails using Gephi, an open-source tool for visualizing and analyzing large network graphs.
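One common way to prepare an email corpus for a network tool like Gephi is to reduce each message to sender-recipient edges and export them as a Source/Target CSV, which Gephi's spreadsheet importer accepts. The sketch below (our own illustration; the slides don't describe the actual Stanford pipeline) does this with the standard library:

```python
import csv
import email


def edges_from_messages(raw_messages):
    """Turn raw RFC 822 message strings into (sender, recipient)
    pairs, one edge per addressee on the To and Cc lines."""
    edges = []
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        sender = (msg["From"] or "").strip()
        for field in ("To", "Cc"):
            value = msg[field]
            if value:
                for addr in value.split(","):
                    edges.append((sender, addr.strip()))
    return edges


def write_edge_csv(edges, path):
    """Write edges in the Source,Target CSV layout Gephi imports."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target"])
        writer.writerows(edges)
```

Note this naive splitting on commas breaks on display names containing commas ("Doe, Jane <jane@example.org>"); a production pipeline would use `email.utils.getaddresses` instead.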
I was very lucky to meet a Computer Science candidate at Stanford, Sudheendra Hangal, who built an email visualization tool for sentiment analysis. It draws on the psychology literature to define what words constitute happiness, love, etc., and performs topic analysis using software.