Rdap12 wrap up reagan moore
 

Presentation at Research Data Access & Preservation Summit

23 March 2012

    Presentation Transcript

    • RDAP Summary
        Topics that drive future digital libraries
        Reagan Moore
        4/4/2012 ASIST RDAP 2012
    • Topics
        • Data Management Plans and Policies
            – Scientific research data support
            – Planning for NSF Data Management Plans
        • Data Citation Panel
            – Digital identifiers
            – Data representation (context)
        • Curation Service Models
            – Institution-based repositories
        • SIG-DL Sustainability Panel
            – Cost model
            – Business model
        • Training Data Management Practitioners
            – Theory for information and knowledge, but not digital data
            – Teaching eScience librarians how to manage data for researchers
    • Data Management Plans
        • Enforcement of regulations:
            – IRB, FERPA, HIPAA
        • Enforcement of agency policies:
            – NSF Data management plans
        • Enforcement of institutional policies:
            – Trustworthiness
        • Compliance with community consensus on collection properties
            – Compliance with standards for discovery and access
        • Enforcement of management policies:
            – Integrity, authenticity, retention, disposition, replication
        • Automation of administrative tasks
            – Migration
        • Validation of assessment criteria
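
The management policies listed on this slide (integrity, replication, retention and disposition) can be expressed as automated checks. The following is only a minimal sketch under stated assumptions: the record layout, the `assess` function, the two-replica threshold, and the site names are all hypothetical and are not taken from the talk or from any particular data management system.

```python
import hashlib
from datetime import date


def assess(record, today, min_replicas=2):
    """Return the policy violations found for one hypothetical catalog record."""
    violations = []
    # Integrity: the stored checksum must match the current content.
    if hashlib.sha256(record["content"]).hexdigest() != record["checksum"]:
        violations.append("integrity")
    # Replication: require at least min_replicas copies (assumed threshold).
    if len(record["replicas"]) < min_replicas:
        violations.append("replication")
    # Retention/disposition: flag records whose retention period has ended.
    if today > record["retain_until"]:
        violations.append("disposition")
    return violations


good = {
    "content": b"abc",
    "checksum": hashlib.sha256(b"abc").hexdigest(),
    "replicas": ["siteA", "siteB"],
    "retain_until": date(2030, 1, 1),
}
bad = {
    "content": b"abd",  # content no longer matches the recorded checksum
    "checksum": hashlib.sha256(b"abc").hexdigest(),
    "replicas": ["siteA"],
    "retain_until": date(2010, 1, 1),
}
report = {name: assess(rec, date(2012, 4, 4)) for name, rec in [("good", good), ("bad", bad)]}
```

Running such a check periodically over a collection is one way the "validation of assessment criteria" bullet could be automated.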
    • Data Identifiers
        • Generate identifiers that are location independent
            – Handle system, hash function
            – Data management system updates link from identifier to representation of location (replicas)
        • Given an identifier, what does it represent?
            – Landing page that provides context for the data
            – Data model that approximates data in space and time
            – Direct access to the data
            – Access to procedure that generates the data
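
As an illustration of location-independent identifiers, the sketch below derives an identifier from a hash of the content and keeps a separate catalog that links the identifier to its replica locations. It is a minimal sketch, not the Handle system or any specific data management system: `make_identifier`, `ReplicaCatalog`, and the site paths are hypothetical names introduced for illustration.

```python
import hashlib


def make_identifier(content):
    """Derive an identifier from the bytes themselves, not from where they live."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class ReplicaCatalog:
    """Hypothetical catalog linking identifiers to current replica locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        self._locations.setdefault(identifier, set()).add(location)

    def move(self, identifier, old, new):
        # The identifier is unchanged; only its link to a location is updated.
        replicas = self._locations[identifier]
        replicas.discard(old)
        replicas.add(new)

    def resolve(self, identifier):
        return sorted(self._locations.get(identifier, set()))


catalog = ReplicaCatalog()
data = b"temperature,12.5\n"
pid = make_identifier(data)
catalog.register(pid, "siteA:/vault/run1.csv")
catalog.register(pid, "siteB:/archive/run1.csv")
catalog.move(pid, "siteA:/vault/run1.csv", "siteC:/vault/run1.csv")
```

A citation that uses `pid` keeps working after the move, because only the catalog entry changed.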
    • Data Identifiers
        • For derived data
            – NASA Level 0 – raw data
            – NASA Level 1 – Calibrated
            – NASA Level 2 – Transformed to physical quantities
            – NASA Level 3 – Functional transformations, projections
        • Can we identify the process that created the data?
            – Generalization of workflow provenance
            – Re-execute the workflow to re-create the data
        • Create identifier for the workflow
            – Need workflow virtualization
        • Reproducible science
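
The idea of identifying derived data by the workflow that produced it can be sketched as follows. This is an assumption-laden toy: the step names (`calibrate`, `to_physical`), their parameters, and the `workflow_id` scheme (a hash of a canonical JSON description) are hypothetical and only loosely mirror the NASA processing levels named on the slide.

```python
import hashlib
import json

# Hypothetical step registry: step name -> function(values, **params).
STEPS = {
    "calibrate": lambda values, offset: [v - offset for v in values],
    "to_physical": lambda values, scale: [v * scale for v in values],
}


def workflow_id(workflow):
    """Hash a canonical JSON form of the workflow description."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return "wf:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def run(workflow, raw):
    """Re-execute the workflow on raw data to re-create the derived product."""
    values = list(raw)
    for step in workflow:
        values = STEPS[step["name"]](values, **step["params"])
    return values


workflow = [
    {"name": "calibrate", "params": {"offset": 2}},
    {"name": "to_physical", "params": {"scale": 10}},
]
level0 = [12, 14, 17]            # raw data
level2 = run(workflow, level0)   # derived data, reproducible from level0 + workflow
```

Changing a parameter yields a different workflow identifier, so each derived version can be traced to the exact procedure that generated it.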
    • Curation Service Models
        • Driven by user requirements
            – Unique services for each science and engineering domain
            – Different data formats, data analyses, semantics
        • Can generic software support each unique collection?
            – View curation as a continuum with varying policies and procedures for each stage of the data life cycle
            – Characterize domains by access methods, policies, and procedures
        • Are there standard best practices for a data center?
            – Data colocation – minimize administrative costs
            – Evolution of center to broaden range of supported communities
    • Standard Services
        • Data discovery
        • Data access
        • Data manipulation
            – Re-creation of derived data products
            – Transformation
            – Feature detection
            – Indexing
            – Representation – fit polynomial in space and time
                • Manipulate data based on polynomial
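
To make the "fit polynomial, then manipulate data based on polynomial" bullet concrete, here is a minimal one-dimensional sketch (time only, whereas the slide speaks of space and time). It uses an ordinary least-squares fit solved with exact fractions; the function names and the sample values are assumptions made for illustration, not a method described in the talk.

```python
from fractions import Fraction


def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns exact coefficients c[0] + c[1]*x + ..."""
    n = degree + 1
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gauss-Jordan elimination, searching for a nonzero pivot in each column.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [ar - factor * ac for ar, ac in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]


times = [0, 1, 2, 3, 4]
values = [1, 3, 9, 19, 33]                 # samples of 1 + 2*t**2
model = fit_polynomial(times, values, 2)
between = evaluate(model, Fraction(5, 2))  # value at an unsampled time
rate = derivative(model)                   # rate of change, taken from the model
```

Once the data are represented by `model`, interpolation and differentiation operate on the coefficients rather than on the stored samples.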
    • Sustainability
        • Business models
            – Identification of a sustaining community
            – Quantification of benefit
        • Cost model
            – Distribution of cost across entire community
            – Membership fee
            – Pro-rated per item cost
        • Minimizing cost
            – Automate curation
            – Transfer curation tasks to submitter
            – FITS file (astronomy)
                • Metadata for project/observatory
                • Metadata for each image
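
One way to read the cost model bullets is as a two-part charge: a flat membership fee plus a cost pro-rated by the number of items each community stores. The arithmetic below is a sketch under that assumption; the function name, the split into a fixed and a variable pool, and all figures are hypothetical.

```python
from fractions import Fraction


def allocate_costs(fixed_pool, variable_pool, items_by_member):
    """Split costs: equal membership fee plus a per-item share of the variable pool."""
    members = len(items_by_member)
    total_items = sum(items_by_member.values())
    membership_fee = Fraction(fixed_pool, members)
    per_item = Fraction(variable_pool, total_items)
    return {
        name: membership_fee + per_item * items
        for name, items in items_by_member.items()
    }


# Hypothetical holdings (item counts) for three member communities.
holdings = {"astronomy": 6000, "biology": 3000, "geoscience": 1000}
shares = allocate_costs(fixed_pool=30000, variable_pool=60000, items_by_member=holdings)
```

Here each member pays a fee of 10000 and 6 per item, so the shares add up to the full 90000.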
    • Creating a Repository
        • Identify a support community
            – Tie to requirements of researchers
            – Tie to new science and research initiatives
            – Tie to intellectual capital of the university
        • Identify cost benefit
            – Co-location of services
            – Benefit of scale
        • Demonstrate responsiveness
            – Support for users
    • Educating Next Generation
        • Identify a motivating challenge
        • Curriculum development
            – Coupling of research to education
            – Competency in scientific data management and technology
        • Data intensive science
            – Interest driven by a domain
            – Multi-disciplinary problems
            – Treat as a skill
        • Work with live data
            – Enable students to make a discovery
    • Data – Information – Knowledge (iRODS)
        • Data – instantiation of an approximation to reality
            – Form of representation of reality
            – Requires description of the physical approximation (context)
        • Information – application of label to data
            – Requires identification of the relationships that must be satisfied for the label to be applied
            – Reification of knowledge (extraction of features)
        • Knowledge – relationships between labels
            – Requires procedures to parse data to see if relationships are present
        • Data science – transformation of data into knowledge
            – Use case driven
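
The data, information, knowledge distinction on this slide can be sketched in a few lines: data are records with context, information is a label applied when a stated relationship holds, and knowledge is a relationship between labels. The records, thresholds, and label names below are invented for illustration and are not iRODS features.

```python
from collections import Counter
from itertools import combinations

# Data: measurements together with their context (station name).
readings = [
    {"station": "A", "temp_c": 31, "humidity": 20},
    {"station": "B", "temp_c": 15, "humidity": 25},
    {"station": "C", "temp_c": 33, "humidity": 10},
    {"station": "D", "temp_c": 32, "humidity": 80},
]

# Information: a label applies only if its relationship is satisfied.
LABEL_RULES = {
    "hot": lambda r: r["temp_c"] >= 30,
    "dry": lambda r: r["humidity"] < 30,
}


def labels_for(record):
    """Apply every label whose rule the record satisfies."""
    return {name for name, rule in LABEL_RULES.items() if rule(record)}


def label_relationships(records):
    """Knowledge: count how often pairs of labels occur on the same record."""
    pairs = Counter()
    for record in records:
        pairs.update(combinations(sorted(labels_for(record)), 2))
    return pairs


relationships = label_relationships(readings)
```

In this toy data set "hot" and "dry" co-occur at two stations, which is a (very small) piece of knowledge derived from labels rather than from raw values.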
    • Digital Library Evolution
        • Witnessing rapid evolution of digital libraries
            – Item level indexing
            – Item level searching
            – Data manipulation services
        • Driven by scale
            – Completeness of semantics
                • Represent every word in the English language (15 million)
                • Represent cultural knowledge (~ 1 Tbyte)
            – Types of reified relationships
                • Index based on more than 100 relationships present within documents (IBM-Watson)
                • Spatial, temporal, organizational, familial, …
            – Ability to couple indexing to data within storage
    • Vision
        • Dynamic digital library
            – Continually extract features from data
            – Generate index based on features within the data
        • Create knowledge base
            – Link local index to community index
        • Support evolution of the library
            – Define new relationships
            – Analyze contents
            – Generate new index
    • Implications
        • Characterize scientific data by the workflow that creates the published version
            – Transform from a library of data files into a library of workflows
        • Support re-execution of workflows
            – Modify input parameters, generate new version
        • Generate discovery semantics (features) through reification of relationships
            – Must be able to parse each file
            – Create algorithm that tests for the desired relationship
            – Apply algorithms within storage systems
            – Build terabyte index of reified relationships for each storage system
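
The last bullet group (parse each file, test for a relationship, build an index) can be sketched as a small feature index. This is a toy under stated assumptions: `FILES` is an in-memory stand-in for a storage system, and the file names, contents, and relationship tests are hypothetical.

```python
import re

# Hypothetical in-memory stand-in for the files held by a storage system.
FILES = {
    "obs1.txt": "Observed 2012-03-23 at Mauna Kea",
    "notes.txt": "Meeting notes, no date recorded",
    "obs2.txt": "Observed 2012-04-04 at Kitt Peak",
}

# Each test checks whether one relationship is present in a parsed file.
TESTS = {
    "has_iso_date": lambda text: re.search(r"\b\d{4}-\d{2}-\d{2}\b", text) is not None,
}


def build_index(files, tests):
    """Map each relationship name to the files in which it was detected."""
    index = {name: [] for name in tests}
    for path in sorted(files):
        for name, test in tests.items():
            if test(files[path]):
                index[name].append(path)
    return index


index = build_index(FILES, TESTS)

# Defining a new relationship and re-indexing models the evolving library.
TESTS["is_notes"] = lambda text: "notes" in text.lower()
new_index = build_index(FILES, TESTS)
```

Discovery then becomes a lookup in the index, for example `new_index["has_iso_date"]`, instead of a scan over every file.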
    • Virtualization
        • Digital library represents data as searchable metadata
        • Collection virtualization defines and manages the properties of the collection
            – Assertions about each file in the collection
            – Location independent naming and access
            – Management of state information
        • Workflow virtualization defines the properties of procedures
            – Provenance information for each procedure
            – Location independent naming and execution
            – Management of state information
    • Digital Library in 2050
        • Links contents to cultural knowledge
            – Terabyte indices
        • Enables analysis of library contents
            – Feature detection services
        • Provides workspace in which research is conducted
            – Coupling of processing to data storage
        • Validates assertions about collection properties
            – Published policies
        • Scalable infrastructure