20100401 정영임 da 전략 tft_0330
Considerations for data preservation at each stage of the information life cycle
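  1. Considerations for Data Preservation across the Information Life Cycle
     Internal Seminar of the National Digital Archiving Strategy Research Task Force
     2010. 4. 1.
     정영임
     Korea Institute of Science and Technology Information (한국과학기술정보연구원), Information Distribution Division, Knowledge Base Office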
  2. Table of Contents
     • Digital Archiving in the Framework of Information Life Cycle Management
     • Creation
     • Acquisition
     • Cataloging/Identification
     • Storage
     • Preservation
     • Access
  3. Digital Archiving in the Framework of Information Life Cycle Management
     • Digital archiving framework
       • Considered at all stages of information life cycle management
     • Information life cycle
       • Creation
       • Acquisition
       • Cataloging/Identification
       • Storage
       • Preservation
       • Access
  4. Creation
     • Creation
       • Defined as the act of producing the information product in the broadest sense
       • Should be regarded as the starting point of long-term preservation
     • Suggested provision of a preservation indicator for creators
       • U.S. Department of Agriculture’s Digital Publications Preservation Steering Committee
     • Establishment of guidelines for creators
       • Oak Ridge National Laboratory, USA
         • A Guide To Record Series Supporting Epidemiological Studies Conducted for the Department of Energy
         • Limits on software
         • Format and layout of the documents
  5. Creation
     • Adoption of Standard Descriptive Languages
       • Standards groups incorporate XML and RDF architectures
     • Attachment of Metadata to Digital Contents
  6. Acquisition and Collection Development
     • Three main aspects of the acquisition of digital objects
       • Collection policies
       • Gathering methods
       • Intellectual property concerns
  7. Establishment of Collection Policies
     • Collection policies
       • Selecting What to Archive
         • Purpose
           • For dark archiving: back issues
           • For light archiving: current issues
         • Criteria
           • Ease of content acquisition
           • Quality of contents
           • Utilization
           • Ongoing access fee
         • Content type coverage: e-journals, R&D reports, patents, scientific data
       • Determining Extent
       • Archiving Links
       • Refreshing the Archived Contents
  8. Considerations on Gathering Method
     • Gathering methods
       • Hand selection
         • Value Judgment and Retention Scheduling (Edinburgh University Library)
           • Not preserved
           • Preserved for defined period
           • Preserved indefinitely
       • Automatic selection
         • National Library of Sweden: automatic acquisition without making value judgments (priority: periodicals, static documents, HTML pages >> conferences, usenet groups, ftp archives)
         • EVA project: establishment of time limits to avoid overloading
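
The three retention categories listed for hand selection can be sketched as a small data model. This is only an illustration, not the Edinburgh University Library's actual scheme: the names `Retention`, `RetentionDecision`, `review_date` and `is_retained` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Retention(Enum):
    """The three outcomes of a value judgment named on the slide."""
    NOT_PRESERVED = "not preserved"
    DEFINED_PERIOD = "preserved for defined period"
    INDEFINITE = "preserved indefinitely"


@dataclass
class RetentionDecision:
    category: Retention
    # Hypothetical field: only meaningful for DEFINED_PERIOD.
    review_date: Optional[date] = None


def is_retained(decision: RetentionDecision, today: date) -> bool:
    """Return True if the object should still be held on the given day."""
    if decision.category is Retention.NOT_PRESERVED:
        return False
    if decision.category is Retention.INDEFINITE:
        return True
    # DEFINED_PERIOD: keep the object until its review date has passed.
    return decision.review_date is not None and today <= decision.review_date
```

A selector would record one `RetentionDecision` per object at acquisition time, and a periodic job could call `is_retained` to find objects whose defined period has expired.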
  9. Considerations on Intellectual Property Concerns
     • Reliance on Legislation
       • Freedom of Information Act 2001
         • The public may have unrestricted access to certain records.
         • (Consider what categories of information may need to be viewed by the public; these records need to remain accessible at all times.)
       • In general, international digital deposit legislation is absent
         • PANDORA project seeks permission from the copyright owner
         • Swedish and Finnish national library projects do not contact the owners
     • Making Agreements with Content Providers
       • E-journals: publishers or academic associations
         • CLIR/DLF draft model license, NESLi2 Standard license model
         • Agreement of Cornell University with publishers
       • Government documents: open to the public
       • Scientific data: individual creators or data centers
         • Arts and Humanities Data Service provides information on what is needed for a digital archive and what creators are likely to be willing to deposit
  10. Agreement of Cornell University with Publishers
     • Topics identified in the agreement (Thomas and Kroch, 2000)
       • The general responsibilities of the publishers and Cornell
       • Characteristics of the data, accompanying metadata, and any additional documentation that are to be deposited
       • Guidelines on transmission methods and media for deposit
       • Procedures for the deposit
       • Procedures and protocols Cornell will use to verify the arrival and completeness of the data
       • Rights of the depositing organizations to audit the repository
       • The respective roles, responsibilities, and rights of Cornell and the data producers with regard to the data
       • Articulation of Cornell's responsibilities and capabilities with regard to the accessioning, description, management, and even transformation of the deposited data
       • Access policies for users of the repository, and how they may vary over time
       • Conditions on the use of the data, and again how they may vary over time
       • Fees (if any) associated with the deposit
       • Cornell's ability to share the data with partners to create an agreed-upon level of redundancy
       • Clarification of issues surrounding copyright retained by authors
  11. Identification and Cataloging
     • Identification
       • Provision of a unique key for finding the digital object and linking the object to other related objects
     • Cataloging in the form of metadata
       • Support for organization, access and curation
  12. Persistent Identification
     • Problems in using a URL as an identifier
       • Using the server as the location identifier can result in a lack of persistence over time, both for the source object and for any linked objects
       • Continuous use of URL
     • New approaches to persistent identification
       • OCLC: PURLs
       • ACS: Digital Object Identifier (DOI), MN (Manuscript Number)
       • DTIC: Handle® system
       • AAS: Bibcode, PubRef numbers
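
What the persistent-identifier approaches above have in common is a level of indirection: citations carry a stable identifier, and a resolver maps it to the object's current location. A minimal sketch of that idea follows. It is not the PURL, DOI or Handle API; the class `ResolverRegistry`, its methods, and the identifier and URLs in the usage note are all hypothetical.

```python
class ResolverRegistry:
    """Toy resolver: a stable identifier maps to whatever URL is current."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        # Registering again with the same identifier models a server move:
        # the identifier stays fixed, only the location record changes.
        self._locations[identifier] = url

    def resolve(self, identifier: str) -> str:
        # Raises KeyError for identifiers that were never registered.
        return self._locations[identifier]
```

For example, after `registry.register("example:0001", "http://old.example.org/a.pdf")` and a later `registry.register("example:0001", "http://new.example.org/a.pdf")`, every link that cites `example:0001` resolves to the new location without being edited, which is exactly what a raw URL cannot offer.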
  13. Creation of Metadata at Cataloging Stage (1/3)
     • Creation Method of Metadata
       • Manual creation of metadata
       • Automatic generation of metadata
         • A project by the US Environmental Protection Agency
         • Defense Information Technology Testbed project
  14. Creation of Metadata at Cataloging Stage (2/3)
     • Formats of Descriptive Metadata
       • E-journals
         • Full MARC cataloging
           • Traditional library cataloging standards
           • NLA’s PANDORA Archive
         • Current development of descriptive metadata standards
           • MARCXML, MODS (Metadata Object Description Schema)
       • Web-based resources
         • Dublin Core-like format
           • EVA project
       • Non-textual data
         • Identification of metadata elements needed for non-textual data types such as images, video, multimedia and others
           • Z39.87 NISO/AIIM Technical metadata for digital still images
           • AES X089 core audio metadata
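
To make the "Dublin Core-like format" for web-based resources concrete, the sketch below serializes a few descriptive fields as XML using the Dublin Core element namespace. The wrapper element `record`, the function name `build_dc_record` and the sample values are assumptions for illustration; they are not taken from the EVA project.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: dict[str, str]) -> str:
    """Serialize {element name: value} pairs as dc:* child elements."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `build_dc_record({"title": "Sample report", "creator": "Example Author", "date": "2010-04-01"})` yields a single `record` element with `dc:title`, `dc:creator` and `dc:date` children, which is roughly the level of description a harvesting project can generate without full MARC cataloging.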
  15. Creation of Metadata at Cataloging Stage (3/3)
     • Management of Heterogeneous Metadata Formats
       • Translation between various metadata formats
         • Key to the development of networked, heterogeneous archives
       • Adoption of packaging metadata standards
         • Open Archival Information System (OAIS) Reference Model
           • Developed by the Consultative Committee for Space Data Systems (CCSDS) and adopted as an ISO standard (ISO 14721)
           • Encapsulates specific metadata as needed for each object type in a consistent data model
         • Metadata Encoding and Transmission Standard (METS)
           • Produced by the Library of Congress Standards Office and the Digital Library Federation
           • Provides a framework for holding all types of metadata for a digital object
         • Others
           • MPEG-21 Digital Item Declaration Language
           • IMS Global Learning Consortium Content Packaging Standards
           • Sharable Content Object Reference Model (SCORM)
           • CCSDS XML Packaging scheme
  16. Development of Technical Model for Storage
     • Recommendations for developing a technical model for the repository (Cornell University)
       • Establishing a baseline of e-journal software and file format needs
       • Specifying the archival repository
       • Specifying monitoring tools that will flag documents within the repository that require migration
       • Specifying a baseline hardware and software infrastructure to house the repository
       • Exploring the need and implementation models for redundancy in the repository
  17. Issues on Changing Storage Media
     • Problem of changing storage media
       • Block size, tape size and tape drive mechanisms have changed over time.
     • Common Solution
       • Data migration to new storage systems
         • High cost and imperfect transfer remain issues.
         • Check/validation algorithms are extremely important.
         • Manual checking is still necessary.
       • Atmospheric Radiation Monitoring Center plans to migrate to new storage systems every 4-5 years
         • Each data migration will take 6-12 months
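
The check/validation step during migration is commonly implemented as a fixity check: compute a checksum before the copy and compare it with one computed from the migrated copy. The sketch below assumes SHA-256 and a plain file copy; the slides do not say which algorithm or transfer mechanism any of the named centers use, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_check(source: Path, target: Path) -> str:
    """Copy source to target and verify the copy is bit-identical."""
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    actual = sha256_of(target)
    if actual != expected:
        raise IOError(f"checksum mismatch after migrating {source} to {target}")
    return actual
```

A check like this only proves that the bitstream survived the move. It says nothing about whether the content is still interpretable, which is one reason manual checking remains necessary.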
  18. Issues on Terabytes of Data Storage
     • Problem of dealing with large-scale data
       • Extensive validation routines are needed to ensure the quality of the information as it is migrated
         • NCBI has 30 Ph.D.s reviewing the information manually, even after it has passed a variety of validation algorithms
       • Similar cost has been spent on
         • Corrections and additions to particular records
         • Maintenance of a history of changes
         • Approval by the owner of all changes controlled by NCBI
     • Common Solution
       • Large-scale data can be stored in different file formats
         • Biological sequence data is held in simple ASCII files for preservation purposes.
         • Data in a structured database is provided for searching, reporting and maintenance
       • Extensive tasks can be transitioned to a non-profit consortium
         • Protein Data Bank: Research Collaboratory for Structural Bioinformatics
  19. Preservation
     • Long-term preservation
       • No common agreement on the definition of long-term preservation
     • Main aspects of preservation
       • Selection of digital preservation strategies/technologies
       • Cycle for hardware/software migration
         • No specific investigation of the hardware/software migration cycle has been done.
         • Depending on the particular technologies and subject disciplines, it can vary from 2 to 10 years.
       • Preservation of the “look and feel” of digital contents
  20. Digital Preservation Strategies
     • Bitstream Copying
     • Refreshing
     • Durable/Persistent Media
     • Technology Preservation
     • Digital Archaeology
     • Analog Backups
     • Migration (software and hardware migration)
     • Replication
     • Reliance on Standards
     • Normalization
     • Canonicalization
     • Emulation
     • Encapsulation
     • Universal Virtual Computer
  21. Hardware and Software Migration
     • Problems with Migration
       • Migration is not guaranteed to work for all data types
       • Migration of information products that use sophisticated software features is unreliable
       • Generally, there is no backward compatibility, and where it is possible, there is certainly loss of integrity in the result.
     • Emulation as an alternative to migration
       • Encapsulates the behavior of the hardware/software with the objects
         • E.g., an MS Word 2000 document with metadata indicating how to reconstruct the document at the engineering level
       • Creates an emulation registry identifying the hardware/software environment and providing information on how to recreate the environment
  22. Advantages and Disadvantages of Preservation Strategies
  23. Selection of Preservation Strategies
     • A schematic diagram for the selection of preservation techniques for digital information (Lee et al, 2002)
  24. Preservation of the Look and Feel
     • Format of materials
       • Chosen in order to save the “look and feel” of the material
       • TIFF
         • The most prevalent format for organizations involved in the conversion of paper backfiles
         • E.g., JSTOR
         • Does not allow embedded references to be active hyperlinks
       • SGML/HTML
         • Used by many large publishers after years of converting publication systems from proprietary formats to SGML
         • American Astronomical Society (AAS)
       • PDF
         • The most prevalent format for purely electronic documents, used for both formal publications and grey literature
         • National Library of Sweden
         • Concerns remain for long-term preservation
           • It may not be accepted as a legal depository form because of its proprietary nature
  25. Normalization vs. Native Formats
     • Normalization
       • Process of converting the native format to a standard format
       • AAS and ACS transform the incoming file into SGML-tagged ASCII format
         • The electronic master copy is able to serve as the robust electronic archival copy.
         • A well-tagged copy can be updated periodically, at very little cost.
         • It takes advantage of advances in both technology and standards.
         • Content remains unchanged, but the public electronic version can be updated to remain compatible with browsers and other access technology
       • Examples of normalization from the data community
         • NASA Distributed Active Archive Centers
           • Transform incoming satellite and ground monitoring information into the standard Common Data Format
         • UK’s National Digital Archive of Datasets
           • Transforms the native format into one of its own devising
       • Normalized formats are considered to be the archival versions
         • Intellectual property question
  26. Reliance on Standards
     • Emphasis on Standards
       • DOE OSTI
         • Limited the number of acceptable input formats
           • Text in SGML (and its relatives HTML and XML), PDF, WordPerfect and Word
           • Images in TIFF Group4 and PDF Image
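
Limiting the acceptable input formats amounts to an allow-list applied at ingest. The sketch below checks file extensions only, which is an assumption made for brevity: the extension-to-format mapping is hypothetical, and an extension cannot confirm details such as TIFF Group4 compression, so a real ingest check would have to inspect the file itself.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from file extension to the format classes named on the slide.
ACCEPTED_EXTENSIONS = {
    ".sgml": "text", ".sgm": "text",
    ".html": "text", ".htm": "text", ".xml": "text",
    ".wpd": "text",   # WordPerfect
    ".doc": "text",   # Word
    ".pdf": "text or image",
    ".tif": "image", ".tiff": "image",
}


def accepted_category(filename: str) -> Optional[str]:
    """Return the format class for an accepted file, or None if rejected."""
    return ACCEPTED_EXTENSIONS.get(PurePath(filename).suffix.lower())
```

With this table, `accepted_category("scan.tiff")` returns `"image"`, while a spreadsheet such as `table.xls` returns `None` and would be sent back to the depositor for conversion.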
  27. Preservation Strategies Used in Major Projects
     • Abbreviations: CSI: CISTI Csi, ECO: OCLC Electronic Collections Online, EJO: Ohio LINK Electronic Journal Center, KB: KB e-Depot, KOP: Kopal DDB, LA: LOCKSS Alliance, LANL: Los Alamos National Laboratory Research Library, NLA: National Library of Australia PANDORA, OSP: Ontario Scholars Portal, PMC: PubMed Central, PORT: Portico
  28. Issues on Access
     • Access Mechanisms
       • Access and display mechanisms
       • Providing access
       • Restricting access
     • Rights Management and Security Requirements
       • Security and version control
       • Creation of metadata to manage encryption, watermarks and digital signatures
  29. Access Mechanisms
     • Providing Access
       • NLM’s Profiles in Science
         • Creates an electronic archive of the photographs, text, video, etc.
         • The electronic archive is used to create new access versions as access mechanisms change
       • Providing access technologies
         • Super Distribution
         • Value-chain support
     • Restricting Access
       • Usage rule
       • Persistent protection
  30. Access
     • Rights Management and Security Requirements
       • The most difficult access issues for digital archiving
       • Security and version control impact digital archiving
       • Rights management includes providing or restricting access as appropriate
       • Content protection technologies
         • Contents Encryption
         • Trusted Environment
       • Metadata for managing encryption, watermarks and digital signatures needs to be created.
  31. References
     • CLIR, 2002. The State of Digital Preservation: An International Perspective [online] [cited 2009-07-23]
     • Hodge, 2000. Best Practices for Digital Archiving: An Information Life Cycle Approach, D-Lib Magazine: 6(1) [online] [cited 2009-07-23] <http://www.dlib.org/dlib/january00/01hodge.html>
     • Hodge et al, 2004. Digital Preservation and Permanent Access to Scientific Information [online] [cited 2009-07-23]
     • ICPSR, 2009. Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems [online] [cited 2009-12-03] <http://www.icpsr.umich.edu/dpm/index.html>
     • Kenney, A. R., Entlich, R., Hirtle, P. B., McGovern, N. Y. and Buckley, E. L., 2006. E-Journal Archiving Metes and Bounds: A Survey of the Landscape [online] [cited 2009-12-03]
     • Lee, K., Slattery, O., Lu, R., Tang, X. and McCrary, V., 2002. The State of the Art and Practice in Digital Preservation, Journal of Research of the National Institute of Standards and Technology: 107(1), 93-106.
     • Thomas, S. E. and Kroch, C. A., 2000. Project Harvest: The Cornell University Library's Proposal to The Andrew W. Mellon Foundation To Develop a Repository for E-Journals [online] [cited 2010-03-26] <http://www.diglib.org/preserve/cornellprop.htm>
     • Edinburgh University Library Digital Archives Research Project. A report and recommendations