Data Preservation
  • Note for slide 2: The Data Intensive Computing Environment group at the San Diego Supercomputer Center has 16 full-time staff members and 6-10 associated graduate students, working on the following topics:
    • Data handling systems (Wan, Rajasekar)
    • Collection management (Rajasekar)
    • Collection building (Kremenek, Zhu)
    • Information management (Baru, Ludäscher, Marciano)
    • Knowledge management (Ludäscher, Gupta)
    • Presentation systems and GIS systems (Zaslavsky)
    • User interfaces (Cowart, Ludäscher, Marciano, Zaslavsky, Zhu)

Presentation Transcript

  • Preservation and Long Term Access to Data and Records in a Knowledge-based Society
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • [email_address]
    • http://www.npaci.edu/DICE/
  • Data and Knowledge Systems Group
    • Staff
    • Reagan Moore
    • Ilkai Altintas
    • Chaitan Baru
    • Sheau Yen Chen
    • Charles Cowart
    • Amarnath Gupta
    • George Kremenek
    • M. Kulrul
    • Bertram Ludäscher
    • Richard Marciano
    • A. Memon
    • XuFei Qian
    • Roman Olshanowsky
    • Arcot Rajasekar
    • Abe Singer
    • Michael Wan
    • Ilya Zaslavsky
    • Bing Zhu
    • Graduate Students
    • A. Bagchi
    • S. Bansal
    • A. Behere
    • R. Bharath
    • S. Bharath
    • L. Sui
    • Undergraduate Interns
    • N. Cotofana
    • D. Le
    • J. Trang
    • L. Yin
    • +/- NN
  • Topics
    • Building persistent archives
    • Data grids
    • Authenticity mechanisms
    • Managing technology evolution
    • Knowledge-based access
  • Archival Processes
      • Appraisal - determine the archivable content.
      • Accession - determine the initial physical location for the data, and the relationship of the new collection to existing collections.
      • Arrangement - add administrative control, describe the information content (provenance, authenticity, structure, administrative), and decompose digital objects into their components as needed.
      • Description - complete the definition of collection attributes by iterating between arrangement, reformatting, and representation.
      • Preservation - build an archivable form of the digital entities, characterize the collection context, and manage their storage.
      • Access - provide query mechanisms for discovering, retrieving, and presenting the digital entities.
  • ERA Concept model
  • Common Approach (digital library, persistent archive, data grid)
    • Logical name space used to organize digital entities, and associate attributes
    • Separation of information management from data storage management
    • Definition of abstraction mechanisms for dealing with repositories
    • Emergence of need for knowledge management
  • SDSC Storage Resource Broker & Meta-data Catalog: Levels of Abstraction (diagram)
    • Application (clients): C, C++, Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; Web; WSDL; Prolog Predicate
    • Servers: Logical Name Space; Latency Management; Data Transport; Metadata Transport; Consistency Management / Authorization-Authentication; Prime Server
    • Storage Abstraction: Archives (HPSS, ADSM, UniTree, DMF); Databases (DB2, Oracle, Postgres); File Systems (Unix, NT, Mac OSX); HRM
    • Catalog Abstraction: Databases (DB2, Oracle, Sybase)
  • Authenticity
    • Guarantee that the data has not been changed
      • Collection owned data, only accessible through the data handling system
      • Support roles defining access (curation, owner, annotation, read)
      • Support access controls mapping users to roles
    • Audit trails that record all operations on files
    • Digital signatures - cryptographic checksums
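
The authenticity mechanisms listed above can be illustrated with a minimal Python sketch. It is not the SRB implementation: the `Collection` class, the `ROLE_PERMISSIONS` table, and the choice of SHA-256 are assumptions made for illustration. A bare checksum detects changes but is not by itself a digital signature, which would also require a signing key.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical mapping from the roles named on the slide to allowed operations.
ROLE_PERMISSIONS = {
    "curation": {"read", "annotate", "write"},
    "owner": {"read", "annotate", "write"},
    "annotation": {"read", "annotate"},
    "read": {"read"},
}


class Collection:
    """Toy collection that owns its data and mediates every access."""

    def __init__(self):
        self._objects = {}     # logical name -> bytes
        self._checksums = {}   # logical name -> SHA-256 hex digest
        self._user_roles = {}  # user -> role
        self.audit_trail = []  # (timestamp, user, operation, name, allowed)

    def grant(self, user, role):
        """Access control: map a user to one of the known roles."""
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._user_roles[user] = role

    def _authorize(self, user, operation, name):
        """Record the attempt in the audit trail, then allow or refuse it."""
        role = self._user_roles.get(user)
        allowed = role is not None and operation in ROLE_PERMISSIONS[role]
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((stamp, user, operation, name, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not {operation} {name}")

    def write(self, user, name, data):
        self._authorize(user, "write", name)
        self._objects[name] = data
        self._checksums[name] = hashlib.sha256(data).hexdigest()

    def read(self, user, name):
        self._authorize(user, "read", name)
        data = self._objects[name]
        # Verify that the stored bytes still match the recorded checksum.
        if hashlib.sha256(data).hexdigest() != self._checksums[name]:
            raise ValueError(f"checksum mismatch for {name}")
        return data
```

Because every operation passes through `_authorize`, refused attempts are logged as well as successful ones, which is what makes the audit trail useful as evidence.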
  • Managing Technology Evolution
    • Data grids provide interoperability mechanisms to access data in multiple administration domains and multiple types of storage systems.
    • Persistent archives migrate collections from old technology to new technology to support presentation on new systems
    • Both require the ability to access heterogeneous systems
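
Both cases rest on one uniform interface over heterogeneous storage systems. The sketch below shows that idea in Python, using in-memory stand-ins; `StorageDriver`, the two driver classes, and `migrate` are hypothetical names, not part of any SDSC software.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Uniform interface that hides the details of one storage system."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class OldArchiveDriver(StorageDriver):
    """Stand-in for a legacy archive (an in-memory dict for illustration)."""

    def __init__(self):
        self._tapes = {}

    def put(self, key, data):
        self._tapes[key] = data

    def get(self, key):
        return self._tapes[key]


class NewFileSystemDriver(StorageDriver):
    """Stand-in for a newer storage system (also an in-memory dict)."""

    def __init__(self):
        self._files = {}

    def put(self, key, data):
        self._files[key] = data

    def get(self, key):
        return self._files[key]


def migrate(keys, source: StorageDriver, target: StorageDriver) -> int:
    """Copy each object from source to target and verify the copy.

    Returns the number of objects migrated.
    """
    count = 0
    for key in keys:
        data = source.get(key)
        target.put(key, data)
        if target.get(key) != data:
            raise IOError(f"verification failed for {key}")
        count += 1
    return count
```

Since `migrate` depends only on the abstract interface, the same routine serves any pair of storage systems for which a driver exists.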
  • Presentation of Digital Objects (diagram)
    • Components: Storage System, Operating System, Application, Digital Object, Display System
  • Technology Management - Emulation (diagram)
    • Components: New Storage System, New Operating System, Old Application, Digital Object, New Display System
    • Action: Wrap Application
  • Technology Management (diagram)
    • Components: New Storage System, New Operating System, Old Application, Digital Object, New Display System
    • Action: Add Operating System Call
  • Technology Management (diagram)
    • Components: Old Storage System, New Operating System, Old Application, Digital Object, Old Display System
    • Actions: Add Operating System Call (shown twice)
  • Technology Management - Migration (diagram)
    • Components: New Storage System, New Operating System, New Application, Digital Object, New Display System
    • Action: Migrate Encoding Format
  • Technology Management - SDSC (diagram)
    • Components: Old Storage System, New Operating System, New Application, Digital Object, Old Display System
    • Actions: Wrap Storage System, Wrap Display System, Migrate Encoding Format
  • Accessing Archived Data
    • Name transparency
      • Access data without knowing the file name
      • Map from attributes to a local file name
    • Location transparency
      • Access data without knowing where it is stored
      • Map from global file name to local file name
    • Collection transparency
      • Access data without knowing the collection attributes
      • Map from concept space to collection attributes
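
The three mappings above can be chained, as in this minimal Python sketch. All names, paths, and attribute values are invented for illustration, and resolving attributes to a global name before picking a local name is an assumption about how the layers compose.

```python
# Collection transparency: concept -> collection attributes (hypothetical data).
CONCEPT_TO_ATTRIBUTES = {
    "rainfall": {"variable": "precipitation"},
}

# Name transparency: attributes -> global (logical) file names.
CATALOG = [
    {"logical_name": "/demo/obs/rain-2001.dat",
     "attributes": {"variable": "precipitation", "year": "2001"}},
    {"logical_name": "/demo/obs/temp-2001.dat",
     "attributes": {"variable": "temperature", "year": "2001"}},
]

# Location transparency: global file name -> local file names (replicas).
REPLICAS = {
    "/demo/obs/rain-2001.dat": ["hpss-archive:/vault/0001", "unix-fs:/data/r1"],
    "/demo/obs/temp-2001.dat": ["unix-fs:/data/t1"],
}


def find_by_attributes(query):
    """Return logical names whose attributes match every key in the query."""
    return [
        entry["logical_name"]
        for entry in CATALOG
        if all(entry["attributes"].get(k) == v for k, v in query.items())
    ]


def locate(logical_name):
    """Return one local name for a logical name (here: the first replica)."""
    return REPLICAS[logical_name][0]


def find_by_concept(concept):
    """Translate a concept into attributes, then search the catalog."""
    return find_by_attributes(CONCEPT_TO_ATTRIBUTES[concept])
```

A caller that knows only the concept `"rainfall"` can call `find_by_concept` and then `locate`, without ever seeing a file name or a storage location.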
  • Information Management - Logical Name Space
    • Set of attributes to describe digital entities that are registered into the logical name space
        • SRB metadata - Unix file system semantics
        • Provenance metadata - Dublin Core
        • Resource metadata - User access control lists
        • Discipline metadata - User defined attributes
    • Each digital entity may have unique attributes
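
One way to picture a registered entity is as a record with one attribute group per metadata category above. The Python sketch below is illustrative only: the `DigitalEntity` class and its field names are assumptions, and the sample keys (`title`, `creator`) are borrowed from the Dublin Core element set.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalEntity:
    """One entry in a logical name space, with four attribute groups."""

    logical_name: str
    system: dict = field(default_factory=dict)          # file-system-like: size, owner
    provenance: dict = field(default_factory=dict)      # Dublin Core elements
    access_control: dict = field(default_factory=dict)  # user -> permission
    discipline: dict = field(default_factory=dict)      # user-defined attributes

    def all_attributes(self):
        """Flatten every group into one dict with 'group.key' names."""
        merged = {}
        for group in ("system", "provenance", "access_control", "discipline"):
            for key, value in getattr(self, group).items():
                merged[f"{group}.{key}"] = value
        return merged


entity = DigitalEntity(
    logical_name="/demo/survey/image-42",
    system={"size": 2048, "owner": "demo"},
    provenance={"title": "Sample image", "creator": "Demo Survey"},
    access_control={"demo": "write", "guest": "read"},
    discipline={"wavelength": "infrared"},
)
```

The `discipline` group is free-form, so two entities in the same collection can carry different attribute names, which is the point of the last bullet above.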
  • Knowledge Management - Discovery across Collections
    • Mapping from collection attributes to discipline concepts
      • Make queries based on discipline concepts
    • Characterization of relationships between attributes
      • Semantic / logical - cross-walks
      • Procedural / temporal - records management
      • Structural / spatial - GIS
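
A semantic cross-walk can be sketched as a table that maps one discipline concept onto the differently named attribute in each collection. The Python example below uses invented collections and values; `CROSSWALK`, `COLLECTIONS`, and `query_by_concept` are hypothetical names.

```python
# Hypothetical cross-walk: concept -> attribute name used by each collection.
CROSSWALK = {
    "author": {"library": "creator", "records": "originator"},
}

# Two toy collections that describe the same idea with different attributes.
COLLECTIONS = {
    "library": [
        {"creator": "Smith", "title": "Report A"},
        {"creator": "Jones", "title": "Report B"},
    ],
    "records": [
        {"originator": "Smith", "series": "X"},
    ],
}


def query_by_concept(concept, value):
    """Search every collection using its own attribute name for the concept."""
    hits = []
    for collection, attribute in CROSSWALK[concept].items():
        for item in COLLECTIONS[collection]:
            if item.get(attribute) == value:
                hits.append((collection, item))
    return hits
```

Here `query_by_concept("author", "Smith")` finds matches in both collections even though neither one has an attribute called `author`.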
  • Knowledge Based Data Grids (diagram)
    • Columns: Ingest Services, Management, Access Services
    • Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse
    • Information: Attributes, Semantics; Information Repository; Attribute-based Query
    • Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); Feature-based Query
    • Interface labels: Rules - KQL; XTM DTD; XML DTD; SDLIP; MCAT/HDF; Grids; (Model-based Access); (Data Handling System - SRB)
  • Further Information http://www.npaci.edu/DICE