Digital archivists from the Universities of Hull (UK), Stanford, and Yale are currently collaborating on an Andrew W. Mellon Foundation-funded project. Born-Digital Collections: An Inter-Institutional Model for Stewardship (AIMS) will produce a common framework for managing born-digital archives. Each digital archivist presents a short case study covering one area of the workflow for electronic records: collection development, accessioning, arrangement and description, and discovery and access.
Practical Legacy Data Remediation - Redgrave LLP
There are plenty of people echoing the risks associated with legacy data and a "keep everything" mentality. Join us for a webinar that takes those discussions a step further, offering insight from both a legal and technical perspective into how remediation projects can be managed cost-effectively and in a manner that does not up-end everyday business operations. During this one-hour discussion, Redgrave LLP Partner Andy Cosgrove and Analysts Diana Fasching and Christian Rummelhoff also outline a defensible framework for the disposition of legacy data, and share real-world examples of paper and electronic remediation projects. Victoria Edelman, Vice President of Education for the ALSP and Director of Training for iCONECT Development, facilitates.
Presentation on electronic records management and archival issues. Originally presented at the Fall 2008 meeting of the Southeastern Wisconsin Archivists Group
University of Bath Research Data Management training for researchers - Jez Cope
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT - Tony Ross-Hellauer
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
DEVELOPING A KNOWLEDGE MANAGEMENT SPIRAL FOR THE LONG-TERM PRESERVATION SYSTE... - cscpconf
The goal of long-term preservation (LTP) is to keep archives sustainable for the foreseeable future. These efforts are hampered primarily by the lack of standards, formal methodology, and workflow models for archiving. This research explores the LTP of various kinds of documents independently of the passage of time and of changes in technique within digital environments. Basic requirements arise from the integration of storage management and information management, and from securing the preservation of data, metadata, indexes, etc. This paper presents the evolutionary development of an LTP process for governmental archive and knowledge management. Further tasks include effective search across resources, efficient storage and access to data, recovery drawing on co-located backups, and dynamic regulation of authentication and security management. Finally, a pilot Semantic Data Grid and its service-matching mechanisms are described, in which the ontology plays a crucial role.
Digital Preservation Best Practices: Lessons Learned From Across the Pond - Benoit Pauwels
Digital Preservation Best Practices: Lessons Learned From Across the Pond. Slavko Manojlovich (Associate University Librarian (IT) / Manager, Digital Archives Initiative, Memorial University, St. John's, Canada) and Benoit Pauwels (Head, Library Automation Team, Université libre de Bruxelles, Belgium)
A presentation for researchers, mainly scientists, on how to prepare a proposal with a well-structured and documented data management plan. The presentation also covers key aspects of data management planning as well as its importance. What are donors or funders looking for in a research proposal?
Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.
Introduction to archival processing, presented as part of a one-day workshop on the same topic, Drexel University, April 23, 2010. Adapted with permission from training materials created by Holly Mengel for the PACSCL Hidden Collections Processing Project. http://clir.pacscl.org/
Archival description and archival standards, an introduction to General International Standard Archival Description ISAD(G) and International Standard Archival Authority Record for Corporate Bodies ISAAR(CPF).
Archivematica and Local Authority Archive Services - Paweł Jaskulski
Presentation accompanying a demonstration of Archivematica to EERAC (East of England Regional Archives Council) members, introducing OAIS (Open Archival Information System) methodology. Identifies operations common to both the transfer and ingest of born-digital archives into a digital repository and the accessioning of paper-based archives, and shows how digital preservation relates to, and fits within, traditional archival processing.
Update on the University of Michigan Bentley Historical Library's "ArchivesSpace, Archivematica - DSpace Workflow Integration" project (funded by a generous grant from the Andrew W. Mellon Foundation). The project seeks to integrate these platforms into an end-to-end digital archives workflow that will facilitate the deposit of content into a digital repository and enable the reuse of descriptive and administrative metadata across platforms. This presentation was given at the March 27, 2015 meeting of the Mid-Michigan Digital Practitioners in Ann Arbor.
Rebecca Grant - Archival Description and Archival Arrangement - dri_ireland
Presentation given by Rebecca Grant of the Digital Repository of Ireland as part of a training session in The National Irish Visual Arts Library (NIVAL), 12 August 2014.
An introduction to the main principles of archival arrangement and description, including an overview of hierarchical arrangement of archives and the archival descriptive standard ISAD(G).
Lecture presented at the Seminar-Workshop on the theme "Organizing and Digitizing Library Archival Materials: ISAD(G) and Technology", organized by the Philippine Librarians Association, Inc. – Western Visayas Region Librarians Council (PLAI-WVRLC) in coordination with the National Committee for Libraries and Information Services – National Commission for Culture and the Arts (NCLIS-NCCA), held at the Colegio de San Agustin—Bacolod, Bacolod City, 27 September 2012.
Lecture conducted for Department of Health personnel during a 5-day seminar organized by the Society of Philippine Health History, Inc. on "Basic Library Management" at Kimberly Hotel, Pedro Gil, Ermita, Manila, Philippines (2004 Oct 8)
Presented at PAARL's Summer Conference on Promoting Skills Enhancement and Core Competencies for the Professionalization of Librarians, held at Casa Pilar Resort, Boracay, Malay, Aklan, Philippines on 2002 April 10
Introduction to Arrangement and Description (Feb 4 & 5, 2012) - Amanda Hill
Slides presented at the 'Introduction to Arrangement and Description' workshop at the University of Guelph on February 4 and 5, 2012. They include an overview of key elements of the Rules for Archival Description and an introduction to creating descriptions for the new Archeion service.
A 3-day training program developed for the seminar-workshop on Archival Management, sponsored by South Manila Inter-Institutional Consortium Committee of Librarians, held on March 26-28, 2008.
Webinar presented for WiLS by Emily Pfotenhauer, Recollection Wisconsin Program Manager, June 24, 2014. Based on information from the Demystifying Born Digital reports from OCLC Research and the Digital Preservation Education and Outreach (DPOE) curriculum developed by the Library of Congress.
This presentation provides a few key tips for effective data management: how to plan ahead, how to organize data, how to preserve data, and how to market it.
This presentation will provide an overview of issues in digital preservation. Presentation was delivered during the joint DPE/Planets/CAPAR/nestor training event, ‘The Preservation challenge: basic concepts and practical applications’ (Barcelona, March 2009)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)dri_ireland
Presentation given by Rebecca Grant, Digital Archivist with Digital Repository of Ireland, part of a workshop on Digital Archiving and Digital Preservation held as part of Figshare Fest in London, May 12th 2016. Figshare is an online digital repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos. Its annual Figshare Fest is a chance to gather together institutional clients, advocates and friends to talk about open research.
OAIS and Its Applicability for Libraries, Archives, and Digital Repositories... - faflrt
ALA/FAFLRT Workshop on the Open Archival Information System (OAIS). Presented by Robin Dale, RLG. Sponsored by the ALA Federal and Armed Forces Libraries Round Table (FAFLRT). Presented on June 16, 2001 at the ALA Annual Conference.
Who Decides? Reinterpreting archival processes for the management of digital ... - GarethKnight
The management of digital records can benefit from the contributions of both digital curators and archivists. The presentation outlines the efforts of the PEKin project at King's College London to develop a management strategy that combines these disparate skill sets.
SAA Session 502: Born-Digital Archives in Collecting Repositories
1. Born-Digital Archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities. Gretchen Gueguen, Mark A. Matienzo, Simon Wilson, and Peter Chan. Session 502, 27 August 2011, Society of American Archivists Annual Meeting
2. AIMS Project "Born-Digital Collections: An Inter-Institutional Model for Stewardship" A two-year project to create a framework for stewardship of born-digital archival records in collecting repositories. Funded by the Andrew W. Mellon Foundation
4. Grant Goals Processing of Hybrid Collections Software Development Community Development Unconference (May 2011, Charlottesville, VA) UK Symposium (June 2011, London, England) Workshop (August 2011, Chicago, IL) White Paper and Project Report
5. Framework Development A framework for collecting and delivering the born-digital materials that are quickly beginning to constitute the collections of contemporary scholarly, literary, and political figures and organizations.
8. What is Collection Development? Actions and policies of institutions to bring in material for end users (both current and future); includes prioritizing, developing relationships with creators, assessments, negotiating agreements and preparing for accessioning. Within the AIMS framework: a viable, practical method to capture and process born-digital material from hybrid collections requires sound work at the beginning (i.e. policies, practices, agreements with donors, etc.) to set up later work
9. Elements of Collection Development Prerequisites Establish relationship with donor Analyze Feasibility Negotiate Agreements Prepare for Accessioning
10. Prerequisites… Neil Beagrie, "Plenty of Room at the Bottom? Personal Digital Libraries and Collections," D-Lib Magazine (June 2005) Blagofaire. http://xkcd.com/239/
15. Prepare for Accessioning... Scope and extent determined? Coordination with acquisition of analog material? Method and time determined? Pre-acquisition appraisal performed? Enhanced curation carried out? Test capture if needed? Development of new methodologies undertaken as needed/possible?
17. What is Accessioning? Archival institution takes physical and legal custody of a group of records from a donor and documents the transfer in a register or other representation of the institution’s holdings Within AIMS Framework Processes which establish physical, administrative and intellectual control over transferred records; assessment and documentation of future needs; documentation of actions taken; beginning of safe storage and maintenance
18. Elements of Accessioning Prerequisites Transfer records and gain administrative control Physical control and stabilization Intellectual control and documentation to support further processes Maintain accessioned records
19. Case Study: Re-Accessioning at Yale Collaborative capacity building across two repositories: Manuscripts and Archives; Beinecke Rare Book and Manuscript Library. Addressing previously received accessions containing electronic records on media. Still in testing phase, but working towards implementing in production
20. Types of Records and Media Wide variety of records creators: literary authors; university faculty; university offices; architectural firms. Common types of media: floppy disks (5.25” and 3.5”); optical media (CD-ROM, CD-R, DVD-R, etc.); Zip disks; USB flash drives
21. Goals of Re-Accessioning Identify, document, and register media Mitigate risk of media deterioration and obsolescence Extract basic metadata from filesystems on media and files contained on filesystems
23. Disk Imaging Using “forensic” (bit-level) imaging process Ensure data on media is not manipulated using write-protection Uses software to acquire images Includes hash-based verification process
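The hash-based verification step mentioned above can be sketched in Python. This is an illustrative stand-in, not the repositories' actual forensic tooling; `verify_image` and the throwaway "image" file are hypothetical:

```python
import hashlib
import tempfile

def verify_image(image_path, expected_sha256, chunk_size=1 << 20):
    """Recompute a disk image's SHA-256 in chunks and compare it to the
    checksum recorded at acquisition time (hash-based verification)."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Demo with a throwaway stand-in for a real floppy/CD image:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 1024)  # fake media contents
    path = f.name

# Checksum that would have been recorded when the image was acquired:
acquired = hashlib.sha256(b"\x00" * 1024).hexdigest()
ok = verify_image(path, acquired)
print(ok)  # True while the stored image still matches the acquisition hash
```

Reading in chunks keeps memory flat even for multi-gigabyte images, which is why imaging tools verify this way rather than loading the whole file.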
25. Media Log Using SharePoint list Contains unique identifier of media Records physical/logical characteristics of media Documents success, failure, or status of various processes and additional notes
28. Metadata Extraction Can be repurposed for descriptive, administrative, and technical metadata Uses command-line tools (Sleuthkit, fiwalk) Outputs XML document
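As a rough illustration of that extraction step, the snippet below pulls per-file metadata out of a simplified DFXML-like fragment of the kind fiwalk emits. Real fiwalk output is namespaced and far richer; the element layout and hash values here are placeholders, not actual fiwalk output:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for fiwalk's DFXML output.
dfxml = """
<dfxml>
  <fileobject>
    <filename>letters/draft1.doc</filename>
    <filesize>24576</filesize>
    <hashdigest type="md5">0cc175b9c0f1b6a831c399e269772661</hashdigest>
  </fileobject>
  <fileobject>
    <filename>letters/draft2.doc</filename>
    <filesize>30208</filesize>
    <hashdigest type="md5">92eb5ffee6ae2fec3ad71c777531578f</hashdigest>
  </fileobject>
</dfxml>
"""

# Flatten each <fileobject> into a record that could be repurposed as
# descriptive, administrative, or technical metadata.
records = []
for fo in ET.fromstring(dfxml).iter("fileobject"):
    records.append({
        "filename": fo.findtext("filename"),
        "filesize": int(fo.findtext("filesize")),
        "md5": fo.findtext("hashdigest"),
    })

print(records[0]["filename"])  # letters/draft1.doc
```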
29. Packaging and Transfer Using BagIt packages/Bagger application Packages contain disk images, extracted metadata, imaging logs, and high-level accession information Transfer to storage is verified by comparison against manifest
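The manifest comparison performed after transfer can be sketched as follows: a minimal BagIt-style bag with a `data/` payload and a `manifest-md5.txt` of "checksum path" lines. `make_bag` and `verify_bag` are hypothetical helpers for illustration, not the Bagger application itself:

```python
import hashlib
import os
import tempfile

def make_bag(bag_dir, payload):
    """Write a minimal BagIt-style bag: a data/ payload directory plus a
    manifest-md5.txt listing 'checksum  path' for each payload file."""
    data = os.path.join(bag_dir, "data")
    os.makedirs(data, exist_ok=True)
    lines = []
    for name, content in payload.items():
        with open(os.path.join(data, name), "wb") as f:
            f.write(content)
        lines.append(f"{hashlib.md5(content).hexdigest()}  data/{name}")
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w") as f:
        f.write("\n".join(lines) + "\n")

def verify_bag(bag_dir):
    """Re-hash every payload file and compare against the manifest,
    as done to verify a transfer to storage."""
    with open(os.path.join(bag_dir, "manifest-md5.txt")) as f:
        for line in f:
            checksum, rel = line.strip().split("  ", 1)
            with open(os.path.join(bag_dir, rel), "rb") as payload_file:
                if hashlib.md5(payload_file.read()).hexdigest() != checksum:
                    return False
    return True

# A bag might hold a disk image alongside its extracted metadata:
bag = tempfile.mkdtemp()
make_bag(bag, {"disk001.img": b"\x00" * 512, "disk001.xml": b"<dfxml/>"})
ok = verify_bag(bag)
print(ok)  # True: payload matches manifest
```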
32. Purpose of Arrangement & Description The general objectives for Arrangement & Description are: - to preserve context - to establish intellectual control of the material - to provide a means of discovery SAA definition, emphasis on minimizing the amount of handling Within the AIMS framework Processes which establish intellectual control of the material including implementation of policies and agreements with donors etc. to enable subsequent discovery and access
33. Elements of Arrangement and Description 1. Prerequisites 2. Plan for processing - gather supporting information; files captured from media (accessioning); convert files (for viewing); appraisal strategy; assess arrangement options; consider preservation issues 3. Processing - implement arrangement strategy; add descriptive metadata and wider context (e.g. collection-level description); copyright & other legal considerations 4. Prepare for Discovery & Access - remove restricted access to born-digital material during processing
34. Case Study - Stephen Gallagher Background: 2005: 42 boxes of paper archives. 2010: born-digital material: 14,320 files (13.6 GB) transferred to us via external hard drive and a box of Amstrad disks. Create an integrated catalogue to accommodate paper, born-digital and future accruals
35. Case Study - Stephen Gallagher Approach: - current work had higher priority in his filing system - considered each work a distinct 'project' - structure reflects his way of working & the archival principles of control that creator, archivist & user can all understand - series level was the most logical solution: all related files placed in the series - reasonable return for our effort
37. commercial implications: access via repository = publication? - re-purposing of work from one (unsuccessful) project to another
42. can we appraise without knowing the contents? similar to paper material that is in a different language?
43. Challenges faced Volume of material: - depositor perception that 'storage is cheap' - does this mean we shouldn’t appraise the material we receive? - wide range of file types encountered - not practical to describe each and every file - risk management if you don’t check every file for sensitive information - we need to automate as much of the processing as possible
46. ability to return to original order of the material
47. view some file types, add descriptive metadata etc
48. high level of granularity when applying rights & permissions. Technical metadata (acquired at accessioning) and descriptive metadata feed the Discovery & Access process
50. What is Discovery & Access Discovery and Access refers to the systems and workflows that make processed or unprocessed material and the metadata that support it available to users.
61. D&A – Tag & Annotation by Invited Persons / Public Annotation:
62. Impacts from Collection Development File formats: no restriction. Computer medium: no restriction (punch card, open reel tape, 5.25 inch floppy, 3.5 inch floppy). File type: no restriction (computer program, data set, document, spreadsheet). Agreement: permission to post contents online.
63. Impacts from Accessioning Built a 5.25 inch floppy capture station. Asked the Computer History Museum to read punch cards. Open reel tapes – still outstanding
64. Impacts from Processing AccessData FTK was used to search files for restricted information, annotate files with appropriate descriptive metadata (book titles, articles, etc.) and rights metadata (access restrictions), and generate technical metadata for the delivery platform to act upon. Transit Solution was used to transform files to HTML format for display on the web. An XSLT program was written to transform the XSL-FO output from FTK into an XML content document. A Ruby program was written to ingest the XML content document, original files, and the display derivatives into Fedora.
71. Want to know more? http://born-digital-archives.blogspot.com Gretchen Gueguen gmg2n@virginia.edu Mark Matienzo mark.matienzo@yale.edu Simon Wilson s.wilson@hull.ac.uk Peter Chan pchan3@stanford.edu
Editor's Notes
Hello and welcome to session 502: Born-Digital Archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities. My name is Gretchen Gueguen and I’m Digital Archivist at the University of Virginia. This morning, along with my colleagues Mark Matienzo from Yale, Simon Wilson from the University of Hull, and Peter Chan from Stanford, I’m going to talk with you about the AIMS project.
AIMS is the short title for a Mellon-funded grant project entitled Born-Digital Collections: An Inter-Institutional Model for Stewardship. This two-year project set out to create a framework for stewardship of born-digital archival records in the collecting repositories.
As I’ve mentioned, the grant partners include UVA, Stanford, Hull, and Yale, with Virginia serving as the PI.
The grant set out to achieve its goal through four different areas of activity. The first was the processing of several hybrid collections, which you are going to hear about later this morning. The digital archivists at each institution, the four of us here this morning, were funded by the grant to carry out this processing. To facilitate this stewardship, the partners also sought to develop some software solutions. You won’t hear as much about these this morning, but they include Rubymatica, a Ruby-based reworking of Archivematica for the creation of Submission Information Packages, and functional requirements for a software tool to facilitate arrangement, description, and access to born-digital archival materials. These requirements led to work on developing Hypatia, which is what is known as a “Hydra Head,” or a module for the Fedora/Solr/Blacklight Hydra stack, for access to born-digital materials. The partners also hosted several events to garner feedback and to encourage communication among the archival community, including a workshop that took place here in Chicago earlier this week. The final project deliverables will include a White Paper that synthesizes the research done during the project and a project report to the Mellon Foundation.
A large part of the White Paper focuses on what we are currently referring to as the AIMS framework: “A framework for collecting and delivering the born-digital materials that are quickly beginning to constitute the collections of contemporary scholarly, literary, and political figures and organizations.” This is really a high-level look at the tools, strategies, methodologies, and practices needed to effectively manage born-digital content.
The framework is characterized by four main functions of stewardship: Collection Development, Accessioning, Arrangement and Description, and Discovery and Access. You’ll notice that we do not include “preservation” as an explicit function here. That is an intentional omission, because we believe that preservation is implicit in all of these functions. In addition, aspects such as developing a preservation repository or undertaking preservation activities are outside of this scope because they are larger institutional initiatives. They are mentioned as prerequisites to being able to do work in many steps, but since there are many guidelines out there we didn’t feel the need to reiterate them here. We are going to focus the rest of our presentation this morning on these four areas and share with you some of the work we have done. If you are interested in more on the background of the project, I encourage you to check out our project blog, called Born-Digital Archives; I’ll put a URL up for the blog at the end of the presentation.
We are starting our model with activities related to Collection Development. These are the activities undertaken in order to bring material into the institution. They include activities we may be very familiar with, like prioritizing, developing relationships with creators, doing assessments, and negotiating agreements. Within the concept of the AIMS model, which is primarily a hybrid collection environment, this work will be necessary to develop sound capturing and processing activities later.
We’ve defined collection development as having five distinct stages, which I’m going to go over with you this morning: Prerequisites; Establish relationship with donor; Analyze feasibility; Negotiate agreements; Prepare for accessioning.
The first step is going through some prerequisites, like having an appraisal process: how will you assess or evaluate materials? How will you be able to determine value? You also need to evaluate your storage capacity: do you have enough space to keep this material in both the short- and long-term? What about future transfers? Do you have a sound data preservation strategy or methodology? One of the most important prerequisites is establishing collection policies. Defining what it is that we want to collect raises a couple of different questions. The first might be what types of material we are interested in, in the traditional collecting sense: prominent people, organizational records, etc. Next, we need to consider what part of those figures’ lives we are collecting. We use our digital devices for private activities as well as more public ones…which are we interested in collecting? The next logical step then is to think about where this information might be on digital devices: stored files probably yes, but do we also need software, operating systems, hardware, internet activity, or cloud material? All of these factors, and more, come together in a collection development policy, and it can be very difficult to write, especially when you are just starting and don’t know yet what you will receive.
Assuming that you have the needed prerequisites in place, or have the capacity to work on them, you can move on to the actual work of collection development. The first step is establishing a relationship with the donor. In many ways this is parallel to existing analog work, but when dealing with born-digital materials you should start thinking early about how digital archive staff will need to be involved. This is potentially going to be very different from access to physical materials, and now is the time to discuss options. Now is also the time to discuss the creation of the data with the donor and capture any documentation that will help with later processing and access. But how comfortable is your donor with digital concepts and access to digital materials? As an example of the difficulty that this can cause, I’d like to show some work that the AIMS project did in this regard. This is a digital donor survey that the AIMS project created, based on one created for the PARADIGM workbook. The original intention was that a donor could fill this out before accessioning. This is the first page…and this is the second…and the third…and the fourth…and this is part two! We quickly realized that this would be overwhelming to potential donors, especially ones who hadn’t really thought much about things like their online persona or email preservation. We changed tactics and now recommend that this survey be used as a prompt sheet for the archivist in an interview.
Such an interview may be part of a program of enhanced curation, something Jeremy Leighton John at the British Library describes as "not only collect[ing] the original archive but add[ing] value to it." Enhanced curation techniques include things like documenting the creator’s workspace with high-resolution digital photography, creating a digital film of an oral history interview with donors about their computers and their computing habits, or perhaps capturing video screencasts of the donor describing the organization of material on their computer. This type of information can be invaluable as materials are accessioned and processed, as the level of abstraction or unfamiliarity with a new system can make it difficult to gain intellectual control.
Okay, so you are ready to move on to considering whether or not you even *can* acquire this material, or more likely whether it is worth the costs. What is the cost analysis and risk analysis? Try a test capture…how does it work? Do you have the needed infrastructure and policies, or can you create them? Can you even view files in order to appraise them? Do you need these guys to accomplish this? Or maybe these guys? It’s very easy to say “analyze costs” or “evaluate your home institution infrastructure,” but if you’ve never encountered a particular piece of software or hardware, it’s difficult to be prepared for it. This is where having technologists or digital archivists involved early in the process can help. If possible, during a test capture they can do a triage to determine if there are serious preservation concerns, if any forensic processing might be needed to recover damaged or deleted files, etc.
Moving on then, the next step is negotiating agreements. One of the big problems here is that there is a lack of models for agreements and appraisals. Many elements of standard agreements remain applicable in the hybrid or born-digital archive, but have different implications. It’s not the same to provide unrestricted access to paper documents in a reading room and unrestricted access to digital materials online. Furthermore, you have a much larger potential for capturing and inadvertently exposing sensitive electronic information like financial and health information, passwords, and other personal data. The legal agreement with the donor needs to specify: an agreement about copyright (either transferred to the repository/institution or remaining with the creator/heirs); an understanding that the collecting repository will be the “sole” repository of born-digital material; an understanding of current capabilities/limits for capturing born-digital material; an understanding of preservation strategies and capabilities; an understanding of current delivery capabilities and limits; an understanding of what/how files will be restricted or deleted and how this will be confirmed; an understanding of capabilities/limits of appraisal, viewing, and description/processing of born-digital material; and an understanding of the creative process and relationship with born-digital materials, computers, hand-held devices, cloud computing, etc.
The final step in collection development is to prepare for processing. This may seem a little odd in a traditional sense, but what we are alluding to here is making sure that all of your technical steps for transfer, which may not be in the agreement, are planned ahead of time. Specifically: scope and extent determined; method and time determined; pre-acquisition appraisal performed; test capture if needed; development of new methodologies undertaken as needed/possible; enhanced curation carried out; and coordination with acquisition of analog material. This is really the “action” step where many of the activities you have been planning are carried out. Overall, the steps in Collection Development help to set up later activities. By the end of the collection development step, the institution should be ready to take legal and physical custody of material. Doing this in a forward-thinking, planful manner will help later processes go much more smoothly. You’ve made it to the finish line of collection development, but now we need to move on to Accessioning.
Accessioning is generally understood as the set of processes wherein a repository takes physical and legal custody of records from a donor and formally documents, or "registers," the transfer. The processes have clear links to both collection development and arrangement and description, and in some cases institutions may view them as part of those processes. However, we have situated accessioning as a primary function within the AIMS framework. Within our framework, accessioning serves a vital role in allowing a collecting repository to establish physical, administrative, and intellectual control over records that have been transferred. The accessioning processes allow archivists to gather a wide variety of information that will inform and prioritize other processes, such as arrangement and description, further appraisal, and requirements for access. Accessioning also provides an environment in which archivists can document their actions and ultimately transfer the accessioned records into an environment for their storage and maintenance. The goals of accessioning therefore reflect the need to establish control over, and ensure the authenticity and reliability of, transferred records. Archivists must therefore be diligent during accessioning and ensure that they understand the potential impact of the actions they take during these processes. If a collecting repository is unable to establish an adequate level of control over transferred electronic records, then it is likely that it has not successfully accessioned them. Accordingly, archivists with "legacy" accessions of electronic records, such as those containing computer media, may want to consider "reaccessioning" those transfers to establish a suitable level of control.
The prerequisites, like the other areas of the AIMS model, broadly fall into several categories; in this case, they are policies, procedures, and infrastructure. Many policies are required to support accessioning properly. These may range from departmental preferences to requirements set at the institutional level. Procedures may account for a number of different options, such as minimal processing, accessioning of born-digital materials with paper records, deferment of digital accessioning, accessioning as resources allow, and retrospective accessioning of previously received electronic records. Infrastructure to support accessioning includes a wide variety of software, hardware, and expertise. This infrastructure will take resources to build, and archivists are urged to consider collaborative partnerships to allow for better sharing of knowledge. The transfer and administrative control processes in the AIMS framework are very similar to those for other formats of records. Archivists working with electronic records should be familiar with the various types of transfers and their implications. Types of transfers can include receipt of retired media formerly in use by a creator, records copied to media only used for transfer (such as external hard drives, CDs, or DVDs), or a direct transfer using disk imaging software or by copying files across a network. Once the records are under administrative control, archivists should focus their efforts on gaining physical control over records and media. Much of this work concerns identifying and potentially addressing preservation threats in the records, such as viruses, unknown file formats, and the physical condition of media if appropriate. Archivists next need to establish intellectual control and gather documentation that will enable further work necessary to process, maintain, or use the records.
For some transfers, a listing of directories or files may be repurposed for archival description if the existing arrangement appears to be of value.Finally, the archivist should prepare the records to be maintained over time. This may include actions such as normalizing to preservation formats. Ultimately, the records should also be transferred to a secure storage location that can be monitored by the collecting repository.
At Yale University, we have worked on a reaccessioning project that has allowed us to develop our thinking of how this accessioning of electronic records could best be realized for us going forward. Two repositories, Manuscripts and Archives and the Beinecke Rare Book and Manuscript Library, have worked in collaboration to implement software, hardware, and procedures that can be shared to support accessioning. In our reaccessioning project, we are working to establish better control over previously transferred accessions that contain electronic records on media such as floppy disks and CD-ROMs. These pieces of media were often received as part of a hybrid accession that also contained paper records, but in some cases we have received accessions of boxes containing only media.
The goals of our reaccessioning project are fairly straightforward and relate to the three types of control discussed previously. First, we seek to establish administrative control of the media by identifying what it is and documenting its physical and logical characteristics and by assigning a unique identifier to each piece. Secondly, we are working towards gaining physical control of the media, which will allow us to mitigate the risks of media deterioration and obsolescence. Finally, we are trying to establish a basic level of intellectual control by extracting metadata about the filesystems and files contained on the media, such as file names, directory structures, and creation, access, and modification dates.
Our reaccessioning workflow roughly looks like the following. We begin by retrieving the media and bringing it to the electronic records workstation, documenting its change in location within the Archivists’ Toolkit. We then assign unique identifiers to each of the media. We establish the best means by which to write-protect the media for imaging and record its identifying characteristics in a media log. We then put the media in the appropriate drive and create a forensic bit-level disk image, which includes all the files, the filesystem metadata, unused space – in other words, the entirety of the data on the media. We verify the image against the raw contents of the media and extract metadata from the disk image. Finally, we package the images and metadata and transfer the package into storage and complete the rest of the documentation.
To acquire the data off media, we are using a forensic imaging process that extracts the entirety of the data off the media at the lowest level possible. To ensure that we do not intentionally or accidentally manipulate any of the data on the original media, we write-protect the media or reader. For floppy disks, we can use physical write protect tabs. For USB flash media, hard drives, and the like, we connect the drive or reader to a write-blocker, which is a piece of hardware connected to the computer that blocks low-level write signals from a computer. We use a variety of software to acquire the images, such as FTK Imager. The imaging software extracts the data from the media and calculates a cryptographic hash of the data on the media and the data within the image file. If the checksums match, the imaging is viewed as successful.
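The verification step just described can be sketched in Python as a simplified stand-in (FTK Imager performs this internally; the function names and chunked-reading approach here are our own illustrative choices, not the tool's actual implementation):

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-1 hash of a file, reading in chunks so that
    multi-gigabyte disk images never have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(media_path, image_path):
    """Imaging is considered successful when the hash of the raw
    media matches the hash of the acquired image file."""
    return sha1_of_file(media_path) == sha1_of_file(image_path)
```

On Linux the `media_path` could be a raw device node such as `/dev/sdb` (an assumption about the capture setup, not something the slides specify).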
This is a screenshot of FTK Imager, which we use to image media and to inspect disk images. You can see that the file listing includes regular files, slack or unused space on the disk, and deleted files, as denoted by the red X on the file icons.
Our media log is a SharePoint list that contains identifying characteristics and physical and logical information about the media, such as the type of media, when it was imaged, the text of a label or writing on the media, and the type of filesystem or filesystems it contains. We assign each piece of media a unique identifier, which is a combination of the accession number and an incremental number. The media log also contains the workflow status of the accessioning process for each piece of media and whether processes succeeded or failed.
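The identifier scheme described above (accession number plus an incremental number) can be sketched as follows; the separator and zero-padding are illustrative assumptions, not the exact Yale convention:

```python
def media_identifiers(accession_number, count, start=1):
    """Generate unique media identifiers by combining an accession
    number with a zero-padded incremental number, one identifier
    per piece of media in the accession."""
    return [f"{accession_number}.{n:04d}" for n in range(start, start + count)]
```

For example, an accession with three floppy disks might yield `2011-M-045.0001` through `2011-M-045.0003` (the accession number itself is hypothetical).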
The first screenshot is an overview for several pieces of media. You can see the unique media identifiers, the media format, and the workflow status.
This expanded view shows all the fields, including further documentation about the disk image, the filesystem contained, and additional notes.
If imaging is successful, we then extract metadata from the filesystem and files within the image. This is a software-based process that provides metadata such as file names, directory structures, creation and modification times, and approximate categorization of the types of files. This metadata can be repurposed in a variety of ways and provides a basic level of intellectual control that is comparable to a box list or other type of inventory for paper records. We are using open source software such as Sleuthkit and fiwalk to perform this extraction, but occasionally we need to rely on other tools for older or less common types of file systems.
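As a rough illustration of how such extracted metadata can be repurposed as a box-list-style inventory, the sketch below parses a simplified, namespace-free fragment shaped like fiwalk's DFXML output (one `<fileobject>` per file); real DFXML uses XML namespaces and many more fields, so treat the element names here as an approximation:

```python
import xml.etree.ElementTree as ET


def list_files(dfxml_text):
    """Pull file names, sizes, and modification times out of a
    DFXML-like report, yielding one dict per file object."""
    root = ET.fromstring(dfxml_text)
    entries = []
    for fo in root.iter("fileobject"):
        entries.append({
            "name": fo.findtext("filename"),
            "size": fo.findtext("filesize"),
            "mtime": fo.findtext("mtime"),
        })
    return entries


# A tiny hand-made sample (hypothetical file names and dates):
sample = """
<dfxml>
  <volume>
    <fileobject>
      <filename>letters/draft1.wpd</filename>
      <filesize>20480</filesize>
      <mtime>1996-03-04T10:22:00</mtime>
    </fileobject>
  </volume>
</dfxml>
"""
```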
Finally, we create a transfer package using the BagIt specification as developed by the Library of Congress and the California Digital Library. To create the packages, we are using the Library of Congress-developed Bagger application. These packages contain the disk images, extracted metadata, and logs generated by the disk imaging software during the acquisition process. The BagIt packages also contain high-level information about the accession. For the time being, we are making a rough connection of one bag per accession, but we realize we may need to modify depending on the size of the accessions.
This is an overview of a sample bag, showing the structure and high-level metadata. Once packaged, we transfer the package to storage and verify the success of the transfer using procedures from the BagIt specification, which compare the contents of the package against its manifest. If successful, we complete the rest of the documentation and record the success in the media log. We also record the storage location of the transferred package within the Archivists’ Toolkit and add the date of completion.
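The manifest comparison described above can be sketched as a minimal BagIt payload check, assuming an MD5 manifest; real validation (e.g. with the Bagger application or the Library of Congress bagit tools) also checks tag files, Payload-Oxum, and payload completeness:

```python
import hashlib
from pathlib import Path


def verify_bag_payload(bag_dir):
    """Recompute the checksum of every payload file listed in the
    bag's manifest-md5.txt and compare it against the recorded
    value. Returns True only if every file matches."""
    bag = Path(bag_dir)
    manifest = bag / "manifest-md5.txt"
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        recorded, rel_path = line.split(None, 1)
        data = (bag / rel_path.strip()).read_bytes()
        if hashlib.md5(data).hexdigest() != recorded:
            return False
    return True
```

A tampered or truncated payload file makes the recomputed hash diverge from the manifest, so the transfer fails verification rather than silently corrupting the accession.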
The SAA definition of description puts emphasis on minimizing the amount of handling; it needs to be updated to consider preservation actions due to file format obsolescence, etc.
- a reasonable return for our effort: describing the ‘project’ and the indicative content that we held
What is sensitive will vary from collection to collection: personal information (social security number, personal e-mail address, mobile number, etc.), but it could also be the discussion behind a decision (e.g. Larkin 25 funding).
As a result of their experiences tackling arrangement and description, the AIMS digital archivists defined the requirements for a new tool - designed to work with technical and professional standards - which uses drag'n'drop to create intellectual arrangement by changing a relationship between digital assets (the asset doesn’t move) using Fedora "sets" - rights & permissions can apply to a single file, a discrete series, or the entire collection
“the systems and workflows that make processed or unprocessed material and the metadata that support it available to users.” Discovery and access is also not possible without completion of many of the prior steps described in this model. The outcomes of those steps have a significant impact on what is either appropriate or achievable in terms of discovery and access. Given the impact of these prior steps on discovery and access, it is crucial to consider the desired outcomes for discovery and access as early as possible — ideally during the Collection Development phase — and to continue to update and revise these plans as work on the collection progresses.
Overall though, we have three major goals in discovery and access. The first is to make material available to user communities. This includes ensuring that users can find the material, understand if it’s available, and get access to it if possible. However, that access must follow guidelines for access restrictions related to privacy and intellectual property. An overarching goal of all three is to ensure that the significant properties of the material are inherent in whichever form the delivery takes.
We plan to deliver the Stephen Jay Gould papers on the Hypatia platform. Hypatia is built on Fedora. In Hypatia, we have one EAD for the hybrid collection; Series 6 is for the born-digital material. We provide a link for people to go to an interface where they can browse and perform full-text search on the born-digital material of the papers.
We convert files in obsolete file formats, such as WordPerfect, to HTML. Otherwise, people would have to download the files and find a viewer to view them, or create an emulated environment to view them.
Discovery and access is also not possible without completion of many of the prior steps described in this model. Some institutions accept certain file formats only.
Researchers may also need to bookmark or label the files they find.
In addition to Hypatia, mentioned above, Stanford also tried using FTK (the software we use for processing) to deliver born-digital materials. One of the features of FTK which I believe will interest researchers is the ability to generate fuzzy hashes. Files with the same hash are identical in content, but what about similar files? A fuzzy hash tells you how close files are — for example, in full-text terms, how many characters are misspelt between two drafts. Fuzzy hashing is a tool which provides the ability to compare two different files and determine a fundamental level of similarity. This similarity is expressed as a score from 0-100. The higher the score reported, the more similar the two pieces of data. A score of 100 would indicate that the files are close to identical. Alternatively, a score of 0 would indicate no meaningful common sequence of data between the two files.
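FTK's fuzzy hashing uses context-triggered piecewise hashing (in the style of ssdeep), which is not in the Python standard library; the sketch below approximates the same 0-100 similarity idea with `difflib`, purely as an illustrative stand-in for the real algorithm:

```python
import difflib


def similarity_score(a, b):
    """Return a 0-100 similarity score between two byte sequences.
    Identical inputs score 100; inputs with no common subsequence
    score 0. This compares the raw data directly, whereas true
    fuzzy hashing compares compact hash signatures instead."""
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    return round(ratio * 100)
```

Two successive drafts of the same document would score high, while unrelated files would score near zero, which is exactly the signal an archivist could use to surface reworked versions of a text across a collection.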
I mentioned before that the goal of D&A is to ensure that the significant properties of the material are inherent in whichever form the delivery takes. For design files, I believe a virtual machine (VM) is the appropriate platform. I have built a virtual machine containing some design files with the associated fonts. People want to know the exact fonts, font spacing, etc. used. If they don’t have the fonts, then even if they download the file, they cannot recreate its appearance. The virtual machine was created using Parallels Desktop.
How do you deliver 50,000 emails? I worked with a colleague at Stanford to produce a network graph of 50,000 emails using Gephi, an open-source tool for visualizing and analyzing large network graphs.
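One common way to prepare an email corpus for a network tool like Gephi is to reduce each message to sender-recipient edges and export them as a Source/Target CSV, which Gephi's spreadsheet importer accepts. The sketch below (our own illustration; the slides don't describe the actual Stanford pipeline) does this with the standard library:

```python
import csv
import email


def edges_from_messages(raw_messages):
    """Turn raw RFC 822 message strings into (sender, recipient)
    pairs, one edge per addressee on the To and Cc lines."""
    edges = []
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        sender = (msg["From"] or "").strip()
        for field in ("To", "Cc"):
            value = msg[field]
            if value:
                for addr in value.split(","):
                    edges.append((sender, addr.strip()))
    return edges


def write_edge_csv(edges, path):
    """Write edges in the Source,Target CSV layout Gephi imports."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target"])
        writer.writerows(edges)
```

Note this naive splitting on commas breaks on display names containing commas ("Doe, Jane <jane@example.org>"); a production pipeline would use `email.utils.getaddresses` instead.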
I was very lucky to meet a Computer Science candidate at Stanford, Sudheendra Hangal, who built an email visualization tool for sentiment analysis. It draws on the psychology literature to define what words constitute happiness, love, etc., and performs topic analysis using software.