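Considerations for Data Preservation across the Information Life Cycle
National Digital Archiving Strategy Research TF, Internal Seminar
April 1, 2010
정영임, Korea Institute of Science and Technology Information (KISTI), Information Distribution Division, Knowledge Base Office
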
Table of Contents
- Digital Archiving in the Framework of Information Life Cycle Management
- Creation
- Acquisition
- Cataloging/Identification
- Storage
- Preservation
- Access

Digital Archiving in the Framework of Information Life Cycle Management
- Digital archiving framework
  - Considered at all stages of information life cycle management
- Information life cycle
  - Creation
  - Acquisition
  - Cataloging/Identification
  - Storage
  - Preservation
  - Access

Creation
- Creation
  - Defined as the act of producing the information product, in the broadest sense
  - Should be regarded as the starting point of long-term archiving and preservation
- Suggestion to provide a preservation indicator for creators
  - U.S. Department of Agriculture's Digital Publications Preservation Steering Committee
- Establishment of guidelines for creators
  - Oak Ridge National Laboratory, USA
    - A Guide To Record Series Supporting Epidemiological Studies Conducted for the Department of Energy
  - Limits on software
  - Format and layout of the documents

Creation
- Adoption of standard descriptive languages
  - Standards groups incorporate XML and RDF architectures
- Attachment of metadata to digital content

Acquisition and Collection Development
- Three main aspects of acquiring digital objects
  - Collection policies
  - Gathering methods
  - Intellectual property concerns

Establishment of Collection Policies
- Collection policies
  - Selecting what to archive
    - Purpose
      - For dark archiving: back issues
      - For light archiving: current issues
    - Criteria
      - Ease of content acquisition
      - Quality of content
      - Utilization
      - Ongoing access fee
    - Content type coverage: e-journals, R&D reports, patents, scientific data
  - Determining extent
  - Archiving links
  - Refreshing the archived content

Considerations on Gathering Methods
- Gathering methods
  - Hand selection
    - Value judgment and retention scheduling (Edinburgh University Library)
      - Not preserved
      - Preserved for a defined period
      - Preserved indefinitely
  - Automatic selection
    - National Library of Sweden: automatic acquisition without making value judgments (priority: periodicals, static documents, HTML pages >> conferences, Usenet groups, FTP archives)
    - EVA project: time limits established to avoid overloading

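The three retention outcomes listed for hand selection can be modeled as a small data structure. The following is a minimal sketch only; the names `Retention`, `Schedule` and `is_retained` are hypothetical and are not taken from the Edinburgh University Library report.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Retention(Enum):
    """The three retention outcomes named on the slide."""
    NOT_PRESERVED = "not preserved"
    DEFINED_PERIOD = "preserved for a defined period"
    INDEFINITE = "preserved indefinitely"


@dataclass
class Schedule:
    decision: Retention
    # Only meaningful for DEFINED_PERIOD (hypothetical field).
    retain_until: Optional[date] = None


def is_retained(schedule: Schedule, today: date) -> bool:
    """Return True if an item with this schedule should still be held today."""
    if schedule.decision is Retention.NOT_PRESERVED:
        return False
    if schedule.decision is Retention.INDEFINITE:
        return True
    # DEFINED_PERIOD: keep the item up to and including the end date.
    return schedule.retain_until is not None and today <= schedule.retain_until
```

A repository could run `is_retained` over its holdings on a regular basis and route expired items to a human reviewer rather than deleting them automatically.
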
Considerations on Intellectual Property Concerns
- Reliance on legislation
  - Freedom of Information Act 2001
    - The public may have unrestricted access to certain records. (Consider which categories of information may need to be viewed by the public; these records need to remain accessible at all times.)
  - In general, due to the absence of international digital deposit legislation
    - The PANDORA project seeks permission from the copyright owner
    - The Swedish and Finnish national library projects do not contact the owners
- Making agreements with content providers
  - E-journals: publishers or academic associations
    - CLIR/DLF draft model license, NESLi2 standard license model
    - Agreement of Cornell University with publishers
  - Government documents: open to the public
  - Scientific data: individual creators or data centers
    - The Arts and Humanities Data Service provides information on what is needed for a digital archive and what creators are likely to be willing to deposit

Agreement of Cornell University with Publishers
- Topics identified in the agreement (Thomas and Kroch, 2000)
  - The general responsibilities of the publishers and Cornell
  - Characteristics of the data, accompanying metadata, and any additional documentation that are to be deposited
  - Guidelines on transmission methods and media for deposit
  - Procedures for the deposit
  - Procedures and protocols Cornell will use to verify the arrival and completeness of the data
  - Rights of the depositing organizations to audit the repository
  - The respective roles, responsibilities, and rights of Cornell and the data producers with regard to the data
  - Articulation of Cornell's responsibilities and capabilities with regard to the accessioning, description, management, and even transformation of the deposited data
  - Access policies for users of the repository, and how they may vary over time
  - Conditions on the use of the data, and again how they may vary over time
  - Fees (if any) associated with the deposit
  - Cornell's ability to share the data with partners to create an agreed-upon level of redundancy
  - Clarification of issues surrounding copyright retained by authors

Identification and Cataloging
- Identification
  - Provides a unique key for finding the digital object and linking it to other related objects
- Cataloging in the form of metadata
  - Supports organization, access and curation

Persistent Identification
- Problems in using a URL as an identifier
  - Using the server as a location identifier can result in a lack of persistence over time, both for the source object and for any linked objects
  - Continued use of URLs
- New approaches to persistent identification
  - OCLC: PURLs
  - ACS: Digital Object Identifier (DOI), MN (Manuscript Number)
  - DTIC: Handle® system
  - AAS: Bibcode, PubRef numbers

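The approaches above share one idea: citations carry a stable identifier, and a resolver maps it to the current location. The toy resolver below illustrates that indirection only. It is not the API of PURL, DOI or the Handle system, and the class name, identifier string and URLs are hypothetical.

```python
from typing import Dict


class Resolver:
    """Toy resolver: maps a persistent identifier to its current location."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        """Record the first known location of an object."""
        self._locations[pid] = url

    def move(self, pid: str, new_url: str) -> None:
        """Update the location when the object moves; the identifier is unchanged."""
        if pid not in self._locations:
            raise KeyError(f"unknown identifier: {pid}")
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        """Return the current location for a persistent identifier."""
        return self._locations[pid]


resolver = Resolver()
resolver.register("pid:demo/0001", "https://old-server.example.org/report.pdf")
# The file moves to a new server; links that cite the identifier keep working.
resolver.move("pid:demo/0001", "https://archive.example.org/objects/0001")
current_location = resolver.resolve("pid:demo/0001")
```

Only the resolver table is updated when content moves, so documents that cite the identifier need no correction.
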
Creation of Metadata at the Cataloging Stage (1/3)
- Methods of creating metadata
  - Manual creation of metadata
  - Automatic generation of metadata
    - A project by the US Environmental Protection Agency
    - Defense Information Technology Testbed project

Creation of Metadata at the Cataloging Stage (2/3)
- Formats of descriptive metadata
  - E-journals
    - Full MARC cataloging
      - Traditional library cataloging standard
      - NLA's PANDORA Archive
    - Current development of descriptive metadata standards
      - MARCXML, MODS (Metadata Object Description Schema)
  - Web-based resources
    - Dublin Core-like format
      - EVA project
  - Non-textual data
    - Identification of metadata elements needed for non-textual data types such as images, video and multimedia
      - Z39.87 NISO/AIIM technical metadata for digital still images
      - AES X089 core audio metadata

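As an illustration of a "Dublin Core-like" record for a web resource, the sketch below serializes a few descriptive fields as XML using the Dublin Core element namespace. The wrapper element name `record`, the function name and the sample values are assumptions for illustration; this is not the EVA project's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: Dict[str, str]) -> str:
    """Serialize simple name/value pairs as dc:* elements inside a wrapper."""
    root = ET.Element("record")  # wrapper name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": "Sample archived web page",      # hypothetical values
    "creator": "Example Organization",
    "date": "2010-04-01",
    "format": "text/html",
})
```
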
Creation of Metadata at the Cataloging Stage (3/3)
- Management of heterogeneous metadata formats
  - Translation between various metadata formats
    - Key to the development of networked, heterogeneous archives
  - Adoption of packaging metadata standards
    - Open Archival Information System (OAIS) Reference Model
      - Developed by the Consultative Committee for Space Data Systems (CCSDS) and adopted as an ISO standard
      - Encapsulates specific metadata as needed for each object type in a consistent data model
    - Metadata Encoding and Transmission Standard (METS)
      - Produced by the Library of Congress Standards Office and the Digital Library Federation
      - Provides a framework for holding all types of metadata for a digital object
    - Others
      - MPEG-21 Digital Item Declaration Language
      - IMS Global Learning Consortium Content Packaging standards
      - Sharable Content Object Reference Model (SCORM)
      - CCSDS XML packaging scheme

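Translation between metadata formats is often implemented as a crosswalk, a table that maps each source field to a target field. The sketch below assumes a flat source record and uses an illustrative three-entry mapping from Dublin Core-like names to MODS-like paths. The mapping is an example, not an official crosswalk, and `translate` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Illustrative mapping only; a production crosswalk would be far larger.
CROSSWALK: Dict[str, str] = {
    "title": "titleInfo.title",
    "creator": "name.namePart",
    "subject": "subject.topic",
}


def translate(
    record: Dict[str, str], crosswalk: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Rename fields via the crosswalk and report fields with no mapping."""
    translated: Dict[str, str] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)  # surfaced so information loss is visible
        else:
            translated[target] = value
    return translated, unmapped


converted, lost = translate(
    {"title": "Annual Report", "creator": "Example Lab", "rights": "public"},
    CROSSWALK,
)
```

Returning the unmapped fields alongside the result makes any loss during translation explicit, which matters when archives with different schemas exchange records.
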
Development of a Technical Model for Storage
- Recommendations for developing a technical model for the repository (Cornell University)
  - Establishing a baseline of e-journal software and file format needs
  - Specifying the archival repository
  - Specifying monitoring tools that will flag documents within the repository that require migration
  - Specifying a baseline hardware and software infrastructure to house the repository
  - Exploring the need for redundancy in the repository and models for implementing it

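The monitoring tool recommended above could be as simple as a scan that compares each stored file's format against a watch list of at-risk formats. This is a minimal sketch under that assumption; `StoredFile`, `flag_for_migration` and the sample formats are hypothetical and do not describe Cornell's actual tooling.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StoredFile:
    path: str
    file_format: str  # e.g. a format label recorded at ingest


def flag_for_migration(
    files: Iterable[StoredFile], at_risk_formats: Iterable[str]
) -> List[StoredFile]:
    """Return the files whose format appears on the at-risk watch list."""
    risky = {fmt.lower() for fmt in at_risk_formats}
    return [f for f in files if f.file_format.lower() in risky]


holdings = [
    StoredFile("vol1/article1.pdf", "PDF"),
    StoredFile("vol1/article2.wpd", "WordPerfect"),
    StoredFile("vol2/article3.sgml", "SGML"),
]
flagged = flag_for_migration(holdings, ["wordperfect"])
```
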
Issues on Changing Storage Media
- Problem of changing storage media
  - Block size, tape size and tape drive mechanisms have changed over time
- Common solution
  - Data migration to new storage systems
    - High cost and imperfect transfer are still issues
    - Check/validation algorithms are extremely important
    - Manual checking is still necessary
  - The Atmospheric Radiation Monitoring Center plans to migrate to new storage systems every 4-5 years
    - Each data migration will take 6-12 months

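One common form of check/validation during migration is a fixity check: compute a checksum before the copy and compare it with the checksum of the migrated file. The sketch below uses SHA-256 from the Python standard library. The function names are hypothetical, and the slide does not say which algorithms the cited centers use.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(source: Path, target: Path) -> bool:
    """Copy source to target and confirm the copy is bit-identical."""
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    return sha256_of(target) == expected
```

A matching checksum shows only that the bits survived the transfer. It says nothing about whether the content is still interpretable, which is one reason manual checking remains necessary.
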
Issues on Terabytes of Data Storage
- Problem of dealing with large-scale data
  - Extensive validation routines are needed to ensure the quality of the information as it is migrated
    - NCBI has 30 Ph.D.s reviewing the information manually, even after it has passed a variety of validation algorithms
  - Similar costs are incurred for
    - Corrections and additions to particular records
    - Maintenance of a history of changes
    - Approval by the owner of all changes controlled by NCBI
- Common solution
  - Large-scale data can be stored in different file formats
    - Biological sequence data is held in simple ASCII files for preservation purposes
    - Data in a structured database is provided for searching, reporting and maintenance
  - Extensive tasks can be transitioned to a non-profit consortium
    - Protein Data Bank: Research Collaboratory for Structural Bioinformatics

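The split between a plain ASCII preservation copy and a structured database for searching can be sketched as follows. The flat-file layout (a `>` header line followed by sequence lines), the table schema and the function names are assumptions for illustration and do not describe NCBI's actual pipeline.

```python
import sqlite3
from typing import Dict


def parse_flat_file(text: str) -> Dict[str, str]:
    """Parse '>identifier' header lines followed by one or more sequence lines."""
    records: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def build_index(records: Dict[str, str]) -> sqlite3.Connection:
    """Load parsed records into an in-memory database used only for searching."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences (id TEXT PRIMARY KEY, residues TEXT, length INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [(key, seq, len(seq)) for key, seq in records.items()],
    )
    return conn


# The ASCII text is the archival master; the database can be rebuilt from it.
master_copy = ">seq1\nACGT\nAC\n>seq2\nGG\n"
index = build_index(parse_flat_file(master_copy))
long_ids = index.execute("SELECT id FROM sequences WHERE length > 3").fetchall()
```

Because the database is derived from the flat file, it can be discarded and regenerated when database technology changes, while the ASCII master remains readable with minimal software.
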
Preservation
- Long-term preservation
  - No common agreement on the definition of long-term preservation
- Main aspects of preservation
  - Selection of digital preservation strategies/technologies
  - Cycle for hardware/software migration
    - No specific investigation of the hardware/software migration cycle has been done
    - Depending on the particular technologies and subject disciplines, it can vary from 2 to 10 years
  - Preservation of the "look and feel" of digital content

Digital Preservation Strategies
- Bitstream copying
- Refreshing
- Durable/persistent media
- Technology preservation
- Digital archaeology
- Analog backups
- Migration (software and hardware migration)
- Replication
- Reliance on standards
- Normalization
- Canonicalization
- Emulation
- Encapsulation
- Universal Virtual Computer

Hardware and Software Migration
- Problems with migration
  - Migration is not guaranteed to work for all data types
  - Migration of information products that use sophisticated software features is unreliable
  - Generally there is no backward compatibility; where it is possible, there is certainly a loss of integrity in the result
- Emulation as an alternative to migration
  - Encapsulates the behavior of the hardware/software with the objects
    - Example: an MS Word 2000 document with metadata indicating how to reconstruct the document at the engineering level
  - Creates an emulation registry that identifies the hardware/software environment and provides information on how to recreate it

Advantages and Disadvantages of Preservation Strategies

Selection of Preservation Strategies
- Figure: a schematic diagram for selecting preservation techniques for digital information (Lee et al., 2002)

Preservation of the Look and Feel
- Formats used to save the "look and feel" of material
  - TIFF
    - The most prevalent format among organizations involved in converting paper back files
    - Example: JSTOR
    - Does not allow embedded references to be active hyperlinks
  - SGML/HTML
    - Used by many large publishers after years of converting publication systems from proprietary formats to SGML
    - American Astronomical Society (AAS)
  - PDF
    - The most prevalent format for purely electronic documents, used for both formal publications and grey literature
    - National Library of Sweden
    - Concerns remain about long-term preservation
    - It may not be accepted as a legal deposit format because of its proprietary nature

Normalization vs. Native Formats
- Normalization
  - Process of converting the native format to a standard format
  - AAS and ACS transform incoming files into SGML-tagged ASCII format
    - The electronic master copy can serve as a robust electronic archival copy
    - A well-tagged copy can be updated periodically at very little cost
    - It takes advantage of advances in both technology and standards
    - Content remains unchanged, but the public electronic version can be updated to remain compatible with browsers and other access technology
- Examples of data normalization in the data community
  - NASA Distributed Active Archive Centers
    - Transform incoming satellite and ground monitoring information into the standard Common Data Format
  - U.K.'s National Digital Archive of Datasets
    - Transforms the native format into one of its own devising
    - Normalized formats are considered to be the archival versions
    - Intellectual property question

Reliance on Standards
- Emphasis on standards
  - DOE OSTI
    - Limited the number of acceptable input formats
      - Text: SGML (and its relatives HTML and XML), PDF, WordPerfect and Word
      - Images: TIFF Group 4 and PDF Image

Preservation Strategies Used in Major Projects
- Abbreviations
  - CSI: CISTI Csi
  - ECO: OCLC Electronic Collections Online
  - EJO: OhioLINK Electronic Journal Center
  - KB: KB e-Depot
  - KOP: Kopal DDB
  - LA: LOCKSS Alliance
  - LANL: Los Alamos National Laboratory Research Library
  - NLA: National Library of Australia PANDORA
  - OSP: Ontario Scholars Portal
  - PMC: PubMed Central
  - PORT: Portico

Issues on Access
- Access mechanisms
  - Access and display mechanisms
  - Providing access
  - Restricting access
- Rights management and security requirements
  - Security and version control
  - Creation of metadata to manage encryption, watermarks and digital signatures

Access Mechanisms
- Providing access
  - NLM's Profiles in Science
    - Creates an electronic archive of photographs, text, video, etc.
    - The electronic archive is used to create new access versions as access mechanisms change
  - Technologies for providing access
    - Superdistribution
    - Value-chain support
- Restricting access
  - Usage rules
  - Persistent protection

Access Rights Management and Security Requirements
- The most difficult access issues for digital archiving
  - Security and version control impact digital archiving
  - Rights management includes providing or restricting access as appropriate
- Content protection technologies
  - Content encryption
  - Trusted environment
- Metadata for managing encryption, watermarks and digital signatures needs to be created

References
- CLIR, 2002. The State of Digital Preservation: An International Perspective [online] [cited 2009-07-23]
- Hodge, 2000. Best Practices for Digital Archiving: An Information Life Cycle Approach, D-Lib Magazine 6(1) [online] [cited 2009-07-23] <http://www.dlib.org/dlib/january00/01hodge.html>
- Hodge et al., 2004. Digital Preservation and Permanent Access to Scientific Information [online] [cited 2009-07-23]
- ICPSR, 2009. Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems [online] [cited 2009-12-03] <http://www.icpsr.umich.edu/dpm/index.html>
- Kenney, A. R., Entlich, R., Hirtle, P. B., McGovern, N. Y. and Buckley, E. L., 2006. E-Journal Archiving Metes and Bounds: A Survey of the Landscape [online] [cited 2009-12-03]
- Lee, K., Slattery, O., Lu, R., Tang, X. and McCrary, V., 2002. The State of the Art and Practice in Digital Preservation, Journal of Research of the National Institute of Standards and Technology 107(1), 93-106.
- Thomas, S. E. and Kroch, C. A., 2000. Project Harvest: The Cornell University Library's Proposal to The Andrew W. Mellon Foundation To Develop a Repository for E-Journals [online] [cited 2010-03-26] <http://www.diglib.org/preserve/cornellprop.htm>
- Edinburgh University Library Digital Archives Research Project. A report and recommendations

20100401 정영임 da 전략 tft_0330

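  • 1. Considerations for Data Preservation Across the Information Life Cycle - National Digital Archiving Strategy Research TF internal seminar - April 1, 2010. 정영임, Korea Institute of Science and Technology Information (KISTI), Information Distribution Division, Knowledge Base Office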
  • 2. Table of Contents: Digital Archiving in the Framework of Information Life Cycle Management; Creation; Acquisition; Cataloging/Identification; Storage; Preservation; Access
  • 3. Digital Archiving in the Framework of Information Life Cycle Management. Digital archiving framework: considered at all stages of information life cycle management. Information life cycle: Creation, Acquisition, Cataloging/Identification, Storage, Preservation, Access
  • 4. Creation. Defined as the act of producing the information product in the broadest sense. Should be regarded as the starting point of long-term preservation. Suggestion to provide a preservation indicator for creators (U.S. Department of Agriculture’s Digital Publications Preservation Steering Committee). Establishment of guidelines for creators (Oak Ridge National Laboratory, USA: A Guide To Record Series Supporting Epidemiological Studies Conducted for the Department of Energy): limits on software; format and layout of the documents
  • 5. Creation. Adoption of standard descriptive languages: standards groups incorporate XML and RDF architectures. Attachment of metadata to digital content
  • 6. Acquisition and Collection Development. Three main aspects of acquiring digital objects: collection policies; gathering methods; intellectual property concerns
  • 7. Establishment of Collection Policies. Selecting what to archive. Purpose: dark archiving (back issues); light archiving (current issues). Criteria: ease of content acquisition; quality of content; utilization; ongoing access fee. Content type coverage: e-journals, R&D reports, patents, scientific data. Determining extent: archiving links; refreshing the archived content
  • 8. Considerations on Gathering Methods. Hand selection: value judgment and retention scheduling (Edinburgh University Library), with three outcomes: not preserved; preserved for a defined period; preserved indefinitely. Automatic selection: National Library of Sweden: automatic acquisition without making value judgments (priority: periodicals, static documents, HTML pages >> conferences, usenet groups, ftp archives); EVA project: establishment of time limits to avoid overloading
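The three retention outcomes listed for hand selection can be sketched as a small data model. This is only an illustration: the class names, the `retention_years` field and the rule that an item expires on the anniversary of its acquisition date are assumptions, not part of the Edinburgh University Library scheme.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Retention(Enum):
    """The three retention outcomes named on the slide."""
    NOT_PRESERVED = "not preserved"
    DEFINED_PERIOD = "preserved for defined period"
    INDEFINITE = "preserved indefinitely"


@dataclass
class ArchivedItem:
    title: str
    acquired: date
    retention: Retention
    retention_years: Optional[int] = None  # hypothetical field, used only for DEFINED_PERIOD


def is_due_for_disposal(item: ArchivedItem, today: date) -> bool:
    """Return True when the retention schedule no longer requires keeping the item."""
    if item.retention is Retention.NOT_PRESERVED:
        return True
    if item.retention is Retention.INDEFINITE:
        return False
    if item.retention_years is None:
        raise ValueError("DEFINED_PERIOD items need retention_years")
    expiry_year = item.acquired.year + item.retention_years
    try:
        expiry = item.acquired.replace(year=expiry_year)
    except ValueError:
        # Acquired on 29 February and the expiry year is not a leap year.
        expiry = item.acquired.replace(year=expiry_year, month=2, day=28)
    return today >= expiry
```

A scheduled job could call `is_due_for_disposal` for each item and hand the due items to a human reviewer rather than deleting them automatically, since the slide treats retention as a value judgment.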
  • 9. Considerations on Intellectual Property Concerns. Reliance on legislation: Freedom of Information Act 2001: the public may have unrestricted access to certain records. (Consider what categories of information may need to be viewed by the public; these records need to remain accessible at all times.) In general, because there is no international digital deposit legislation: the PANDORA project seeks permission from the copyright owner; the Swedish and Finnish national library projects do not contact the owners. Making agreements with content providers: E-journals: publishers or academic associations (CLIR/DLF draft model license; NESLi2 standard license model; agreement of Cornell University with publishers). Government documents: open to the public. Scientific data: individual creators or data centers. The Arts and Humanities Data Service provides information on what is needed for a digital archive and what creators are likely to be willing to deposit
  • 10. Agreement of Cornell University with Publishers. Topics identified in the agreement (Thomas and Kroch, 2000): the general responsibilities of the publishers and Cornell; characteristics of the data, accompanying metadata, and any additional documentation that are to be deposited; guidelines on transmission methods and media for deposit; procedures for the deposit; procedures and protocols Cornell will use to verify the arrival and completeness of the data; rights of the depositing organizations to audit the repository; the respective roles, responsibilities, and rights of Cornell and the data producers with regard to the data; articulation of Cornell's responsibilities and capabilities with regard to the accessioning, description, management, and even transformation of the deposited data; access policies for users of the repository, and how they may vary over time; conditions on the use of the data, and again how they may vary over time; fees (if any) associated with the deposit; Cornell's ability to share the data with partners to create an agreed-upon level of redundancy; clarification of issues surrounding copyright retained by authors
  • 11. Identification and Cataloging. Identification: provision of a unique key for finding the digital object and linking it to other related objects. Cataloging in the form of metadata: support for organization, access and curation
  • 12. Persistent Identification. Problems in using a URL as an identifier: using the server as a location identifier can result in a lack of persistence over time, both for the source object and for any linked objects; continuous use of URLs. New approaches to persistent identification: OCLC: PURLs; ACS: Digital Object Identifier (DOI), MN (Manuscript Number); DTIC: Handle® system; AAS: Bibcode, PubRef numbers
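The common idea behind the persistent identifier schemes listed above is a level of indirection: citations carry a stable identifier, and a resolver maps it to the current location. The sketch below illustrates that idea only; the `Resolver` class, the identifier string and the example.org URLs are hypothetical and do not reproduce the PURL, DOI or Handle protocols.

```python
class Resolver:
    """Minimal in-memory resolver: persistent identifier -> current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, pid: str, url: str) -> None:
        """Assign a persistent identifier to an object's first location."""
        if pid in self._locations:
            raise ValueError(f"identifier already registered: {pid}")
        self._locations[pid] = url

    def relocate(self, pid: str, new_url: str) -> None:
        """Record that the object moved; the identifier itself never changes."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        """Return the current location for a persistent identifier."""
        return self._locations[pid]


resolver = Resolver()
resolver.register("example:0001", "http://old-server.example.org/papers/1.pdf")
# The hosting server changes, but links that cite "example:0001" keep working.
resolver.relocate("example:0001", "http://new-server.example.org/archive/1.pdf")
```

Only the resolver table has to be updated when a server changes, which is what keeps both the source object and any objects linking to it reachable.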
  • 13. Creation of Metadata at Cataloging Stage (1/3). Methods of metadata creation: manual creation of metadata; automatic generation of metadata (a project by the US Environmental Protection Agency; the Defense Information Technology Testbed project)
  • 14. Creation of Metadata at Cataloging Stage (2/3). Formats of descriptive metadata. E-journals: full MARC cataloging, i.e. traditional library cataloging standards (NLA’s PANDORA Archive); current development of descriptive metadata standards: MARCXML, MODS (Metadata Object Description Schema). Web-based resources: Dublin Core-like format (EVA project). Non-textual data: identification of metadata elements needed for non-textual data types such as images, video, multimedia and others: Z39.87 NISO/AIIM technical metadata for digital still images; AES X089 core audio metadata
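A Dublin Core-like record for a web-based resource can be sketched with the Python standard library. The namespace URI is the Dublin Core element set namespace; the wrapping `record` element, the function name and the sample field values are assumptions made for illustration and are not taken from the EVA project.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields: dict) -> str:
    """Serialize simple name/value pairs as Dublin Core elements.

    The enclosing <record> element is a hypothetical wrapper chosen for
    this sketch; real archives define their own container format.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_dc_record({
    "title": "Best Practices for Digital Archiving",
    "creator": "Hodge",
    "date": "2000",
    "format": "text/html",
})
```

Because the record is plain XML with a published namespace, it can later be translated to other descriptive formats without depending on the software that produced it.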
  • 15. Creation of Metadata at Cataloging Stage (3/3). Management of heterogeneous metadata formats. Translation between various metadata formats: key to the development of networked, heterogeneous archives. Adoption of packaging metadata standards. Open Archival Information System (OAIS) Reference Model: developed by the Consultative Committee for Space Data Systems (CCSDS) and published as an ISO standard; encapsulates specific metadata as needed for each object type in a consistent data model. Metadata Encoding and Transmission Standard (METS): produced by the Library of Congress Standards Office and the Digital Library Federation; provides a framework for holding all types of metadata for a digital object. Others: MPEG-21 Digital Item Declaration Language; IMS Global Learning Consortium Content Packaging Standards; Sharable Content Object Reference Model (SCORM); CCSDS XML packaging scheme
  • 16. Development of a Technical Model for Storage. Recommendations for developing a technical model for the repository (Cornell University): establishing a baseline of e-journal software and file format needs; specifying the archival repository; specifying monitoring tools that will flag documents within the repository that require migration; specifying a baseline hardware and software infrastructure to house the repository; exploring the need and implementation models for redundancy in the repository
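A monitoring tool that flags documents requiring migration could, in its simplest form, compare each file's format against a list of formats the repository considers at risk. The sketch below identifies formats by file extension only, and the extension list is a made-up example; neither is drawn from the Cornell proposal.

```python
from pathlib import PurePath

# Hypothetical list of extensions the repository has decided to migrate away from.
AT_RISK_EXTENSIONS = frozenset({".wpd", ".doc"})


def flag_for_migration(filenames, at_risk=AT_RISK_EXTENSIONS):
    """Return the names whose extension is on the at-risk list, in input order."""
    return [name for name in filenames if PurePath(name).suffix.lower() in at_risk]


holdings = ["vol1/article.pdf", "vol1/letter.WPD", "vol2/notes.doc", "vol2/fig.tiff"]
flagged = flag_for_migration(holdings)
```

A production tool would more likely identify formats from file contents or recorded technical metadata, since extensions can be missing or wrong.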
  • 17. Issues on Changing Storage Media. Problem of changing storage media: block size, tape size and tape drive mechanisms have changed over time. Common solution: data migration to new storage systems. High cost and imperfect transfer are still issues; check/validation algorithms are extremely important; manual checks are still necessary. The Atmospheric Radiation Monitoring Center plans to migrate to new storage systems every 4-5 years; each data migration will take 6-12 months
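One widely used check/validation step during migration is a fixity check: compute a checksum before the copy and again afterwards, and reject the copy if they differ. The following is a minimal sketch of that technique using SHA-256; the function names are illustrative, and it is not a description of the procedure used by any of the organizations named above.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Checksum a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(src, dst) -> str:
    """Copy src to dst and verify the copy bit for bit; return the checksum."""
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise IOError(f"checksum mismatch after migrating {src} to {dst}")
    return after


if __name__ == "__main__":
    # Small demonstration in a temporary directory.
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "old_media.dat"
        source.write_bytes(b"archived bitstream")
        print(migrate_file(source, Path(workdir) / "new_media.dat"))
```

A matching checksum only shows that the bits were copied intact. It says nothing about whether the content is still interpretable, which is one reason the slide notes that manual checks remain necessary.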
  • 18. Issues on Terabytes of Data Storage. Problem of dealing with large-scale data: extensive validation routines are needed to ensure the quality of the information as it is migrated. NCBI has 30 Ph.D.s reviewing the information manually, even after it has passed a variety of validation algorithms. Similar cost has been spent on: corrections and additions to particular records; maintenance of a history of changes; approval by the owner of all changes controlled by NCBI. Common solutions: large-scale data can be stored in different file formats (biological sequence data is held in simple ASCII files for preservation purposes; data in a structured database is provided for searching, reporting and maintenance); extensive tasks can be transitioned to a non-profit consortium (Protein Data Bank: Research Collaboratory for Structural Bioinformatics)
  • 19. Preservation. Long-term preservation: there is no common agreement on its definition. Main aspects of preservation: selection of digital preservation strategies/technologies; cycle for hardware/software migration (no specific investigation of the HW/SW migration cycle has been done; depending on the particular technologies and subject disciplines, it can vary from 2 to 10 years); preservation of the “look and feel” of digital content
  • 20. Digital Preservation Strategies: Bitstream Copying; Refreshing; Durable/Persistent Media; Technology Preservation; Digital Archaeology; Analog Backups; Migration (SW, HW migration); Replication; Reliance on Standards; Normalization; Canonicalization; Emulation; Encapsulation; Universal Virtual Computer
  • 21. Hardware and Software Migration. Problems with migration: migration is not guaranteed to work for all data types; migration of information products that use sophisticated software features is unreliable; generally there is no backward compatibility, and where it is possible, there is certainly loss of integrity in the result. Emulation as an alternative to migration: encapsulates the behavior of the hardware/software with the objects (e.g., an MS Word 2000 document with metadata indicating how to reconstruct the document at the engineering level); creates an emulation registry identifying the HW/SW environment and providing information on how to recreate the environment
  • 22. Advantages and Disadvantages of Preservation Strategies
  • 23. Selection of Preservation Strategies. A schematic diagram for the selection of preservation techniques for digital information (Lee et al, 2002)
  • 24. Preservation of the Look and Feel. Formats used to save the “look and feel” of material. TIFF: the most prevalent format among organizations involved in converting paper back files (e.g., JSTOR); it does not allow embedded references to be active hyperlinks. SGML/HTML: used by many large publishers after years of converting publication systems from proprietary formats to SGML (American Astronomical Society (AAS)). PDF: the most prevalent format for purely electronic documents, used for both formal publications and grey literature (National Library of Sweden); concerns remain for long-term preservation; it may not be accepted as a legal deposit format because of its proprietary nature
  • 25. Normalization vs. Native Formats. Normalization: the process of converting the native format to a standard format. AAS and ACS transform the incoming file into SGML-tagged ASCII format: the electronic master copy can serve as the robust electronic archival copy; a well-tagged copy can be updated periodically at very little cost; it takes advantage of advances in both technology and standards; content remains unchanged, but the public electronic version can be updated to remain compatible with browsers and other access technology. Examples of data normalization from the data community: NASA Distributed Active Archive Centers transform incoming satellite and ground monitoring information into the standard Common Data Format; the U.K.'s National Digital Archive of Datasets transforms the native format into one of its own devising. Normalized formats are considered to be the archival versions. Intellectual property question
  • 26. Reliance on Standards. Emphasis on standards: DOE OSTI limited the number of acceptable input formats: text in SGML (and its relatives HTML and XML), PDF, WordPerfect and Word; images in TIFF Group 4 and PDF Image
  • 27. Preservation Strategies Used in Major Projects. Abbreviations: CSI: CISTI Csi; ECO: OCLC Electronic Collections Online; EJO: OhioLINK Electronic Journal Center; KB: KB e-Depot; KOP: Kopal DDB; LA: LOCKSS Alliance; LANL: Los Alamos National Laboratory Research Library; NLA: National Library of Australia PANDORA; OSP: Ontario Scholars Portal; PMC: PubMed Central; PORT: Portico
  • 28. Issues on Access. Access mechanisms: access and display mechanisms; providing access; restricting access. Rights management and security requirements: security and version control; creation of metadata to manage encryption, watermarks and digital signatures
  • 29. Access Mechanisms. Providing access: NLM’s Profiles in Science creates an electronic archive of the photographs, text, video, etc.; the electronic archive is used to create new access versions as access mechanisms change. Technologies for providing access: super distribution; value-chain support. Restricting access: usage rules; persistent protection
  • 30. Access Rights Management and Security Requirements. These are the most difficult access issues for digital archiving. Security and version control impact digital archiving. Rights management includes providing or restricting access as appropriate. Content protection technologies: content encryption; trusted environment. Metadata for managing encryption, watermarks and digital signatures needs to be created
  • 31. References. CLIR, 2002. The State of Digital Preservation: An International Perspective [online] [cited 2009-07-23]; Hodge, 2000. Best Practices for Digital Archiving: An Information Life Cycle Approach, D-Lib Magazine: 6(1) [online] [cited 2009-07-23] <http://www.dlib.org/dlib/january00/01hodge.html>; Hodge et al, 2004. Digital Preservation and Permanent Access to Scientific Information [online] [cited 2009-07-23]; ICPSR, 2009. Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems [online] [cited 2009-12-03] <http://www.icpsr.umich.edu/dpm/index.html>; Kenney, A. R., Entlich, R., Hirtle, P. B., McGovern, N. Y. and Buckley, E. L., 2006. E-Journal Archiving Metes and Bounds: A Survey of the Landscape [online] [cited 2009-12-03]; Lee, K., Slattery, O., Lu, R., Tang, X. and McCrary, V., 2002. The State of the Art and Practice in Digital Preservation, Journal of Research of the National Institute of Standards and Technology: 107(1), 93-106; Thomas, S. E. and Kroch, C. A., 2000. Project Harvest: The Cornell University Library's Proposal to The Andrew W. Mellon Foundation To Develop a Repository for E-Journals [online] [cited 2010-03-26] <http://www.diglib.org/preserve/cornellprop.htm>; Edinburgh University Library Digital Archives Research Project. A report and recommendations