IEDA Overview & Updates, March 2014


IEDA Overview presentation given by Kerstin Lehnert at the March 2014 IEDA Policy Committee Meeting. Location: NSF, Arlington, VA.

Published in: Technology
  • Development, governance, and promotion of domain-specific, community-based standards for data and metadata
  • Provenance documentation, uncertainties, semantics (vocabularies, taxonomy), formats
  • Data quality
  • Quality assessment & control of ingested & served data
  • Science-driven software tools for data discovery, access, and analysis
  • Harmonization & integration of data for advanced data mining and analysis (data products)
  • User support/training for data management
  • Mapping of data to standards-based interfaces for interoperability
  • IEDA is a data facility that hosts observational solid earth data and tools from the marine, terrestrial, and polar environments.
    • Multiple diverse data systems that were developed independently, serving both
      • sensor data from large collaborative cruise programs
      • sample-based measurements from unique analytical laboratories
    • IEDA data systems enable the data to be discovered and reused.
    1. IEDA Overview & Updates, March 2014

    2. IEDA Supports the Full Data Life Cycle
    3. Domain-Specific Data Stewardship
       • Domain-specific guidelines, templates, software tools, and user support/training that facilitate data submission
         • including domain-specific tools for data management planning and compliance reporting
       • Development, maintenance, and promotion of domain-specific, community-based standards for data and metadata
       • Provenance documentation, uncertainties, semantics (vocabularies, taxonomy), formats
       • User interfaces optimized for science questions
       • Harmonization & integration of data for advanced mining & analysis
       • Access to external data in relevant other
       • Mapping of data to standards-based interfaces for interoperability
    4. Central Role of Discipline-specific Repositories
       • Diagram labels (center): Domain-specific Repository
       • Diagram labels (groups): Science Community, Libraries, Archives, Computer Science, Publishers, editors, Funding Agencies, Data Facilities, Registries
       • Diagram labels (functions): Metadata registration, Software (tool) development, Interoperability, Data policies, Persistent access, Bibliometrics, Data Curation, Data access & discovery, Data products, Data harmonization (standards), User Support
    5. IEDA Foci
       • Foci: Data Discovery & Access; Data Preservation & Curation; Data Analysis; Investigator Support
       • QA/QC, documentation
       • Persistent identification (DOI)
       • Long-term archiving
    6. Marine Geoscience Data System
       • Data Collections and Custom Data Access:
         • GeoPRISMS
         • Ridge2000
         • MARGINS
         • Academic Seismic Portal (ASP)
         • Antarctic and Southern Ocean (ASODS)
       • Metadata Catalog and File repository
       • Catalog inventory > 0.5 million files, 47 TB, 2,500 programs
    7. EarthChem Library
       • Repository for geochemical data
         • analytical data sets
         • syntheses
         • models
         • reports
       • Online data submission
       • Templates for data annotations
       • Quality control following the Editors Roundtable best practices
    8. IGSN / SESAR
       • IGSN: Unique, persistent, resolvable identifiers
       • SESAR: registry of samples in the Earth Sciences
       • Searchable catalog of samples across the Earth Sciences
       • Preservation and persistent access of sample metadata
       • Used across all Earth Science communities that deal with samples
       • User services for sample metadata management
         • submission, editing, transfer of ownership, tracking of subsamples, etc.
       • International governance by the IGSN e.V.
         • non-profit organization, founded in 2011, registered in Germany
         • currently 13 members (4 new members in 2013)
    9. IEDA Foci
       • Foci: Data Discovery & Access; Data Preservation & Curation; Data Analysis; Investigator Support
       • Web-based user interfaces (specialists & non-specialists)
       • Programmatic access interfaces (interoperability)
       • GeoMapApp, GoogleEarth, etc.
       • Links to the literature
    10. New in 2013: IEDA Data Browser
    11. IEDA Foci
       • Foci: Data Discovery & Access; Data Preservation & Curation; Data Analysis; Investigator Support
       • Visualization tools (GeoMapApp, Virtual Ocean, Earth Observer)
       • Syntheses & Products
    12. Global Multi-Resolution Topography Synthesis
       • Compilation of multi-beam sonar data collected by scientists and institutions worldwide, edited and merged into a single continuously updated compilation of high-resolution seafloor topography.
    13. Global Synthesis of rock compositions (EarthChem, PetDB)
       • Map of basalt samples from mid-ocean ridges
       • Color scaled to the 87Sr/86Sr ratio measured on these samples
    14. IEDA Foci
       • Foci: Data Discovery & Access; Data Preservation & Curation; Data Analysis; Investigator Support
       • Web-based data submission
       • Data Management Plan tool
       • Data Compliance Report tool
       • Community
    15. Use of DMP Tool
        Target Program   Count
        other               27
        OIA                  3
        OCE                134
        OPP                 11
        SBE                  1
        BIO                  5
        EAR                 94
        Total              275
    16. IEDA Infrastructure
       • Cooperative Agreement with NSF
       • Sustainable funding
       • Formal community governance & guidance
       • Professional data management policies & procedures
       • Persistent identification of data & samples (DOI, IGSN)
       • Standards-compliant metadata catalog
       • Long-term archiving agreements with National Geophysical Data Center & Columbia University Libraries
       • Risk management
       • “Accreditation” as member of the World Data System
       • Disciplinary expertise
    17. System Usage
       • Number of unique visitors to the IEDA web site increased by 251%
       • 7998 unique visitors between Oct 2012 and Sept 2013
       • Primary pages accessed: Data Management Plan & IEDA collections
       • Figure: results of the user survey of the project “Stakeholder Alignment in the Geosciences: Assessing the Potential Impacts of EarthCube”, showing that IEDA ranks among the top 5-8 most cited data sources in the Earth Sciences (J. Cutcher-Gershenfeld, presentation at the EarthCube Domain End-user workshop for Paleogeoscience, February 2013)
    18. Downloads from IEDA Systems
        Data Collection      Year 2    Year 3
        PetDB                 2,166     2,326
        SedDB                    52       200
        EarthChem Library        95       401
        EarthChem Portal    (1,153)       567
        MGDS                  5,049     4,331
        GMRT                  7,200    10,177
    19. Citations of IEDA Systems
       • Chart: citations from 2003 to 2013 (vertical axis 0 to 600); series: PetDB + EarthChem, MGDS + GMRT + GMA
    20. IEDA Data Publication
    21. IEDA Data Publication: “Best Practice”
       • Diagram labels: Journal, Article; Trusted Data Repository, Data File; Reciprocal citation by DOI
    22. Data Publication Flow
       • Diagram labels: Synthesis databases, Journal, Portal, IEDA Metadata Catalog, Data, Manuscript, DOI linking, Review, IEDA Data Managers, Editors, Submission, Publication, Integration
    23. Links to Journals
    24. Links to Journals
    25. New: Linking with Data Journals
    26. New: Linking Samples, Data, & Publications
    27. Future Capabilities
       • Mockups by Elsevier Developer Beate Specker
    28. Editors Roundtable
       • Based on the Editors Roundtable in Geochemistry (2007/8)
         • policy recommendations for reporting of geochemical data
       • Goal: Establish an ongoing forum for information exchange between editors, publishers, professional societies, and data facilities
         • regular meetings at major conferences
         • wiki (knowledge hub) for best practices, guidelines, capabilities for data publication and data citation
         • focus on domain-specific requirements, practices, data facilities, etc.
       • Will be international and independent of a specific institution or society (ESIP?)
       • Could serve as a role model for other disciplines
    29. IEDA Data Rescue Initiative
       • preserve valuable legacy data sets that are in danger because of impending retirement or degradation
       • augment data collections maintained by IEDA
       • improve procedures and tools for user contributions
       • 2013 International Data Rescue Award in the Geosciences
       • IEDA Data Rescue Mini-Awards
       • Data Rescue Process Study (collaboration with Elsevier Research Data Services)
    30. IEDA Data Rescue Mini-awards
       • Delano J, Hauri E, Saal A, Shearer C: “Geochemistry of Lunar Glasses”
       • Gill J: “Geochemical & geochronological data from Fiji, IBM, and Endeavor segments”
       • Tivey M: “Near-bottom Magnetic Data Rescue”
    31. Lessons Learned
       • Investigator Lessons
         • Take ownership of your own legacy
         • Data curation by others may not be complete or correct
         • Data rescue of an entire career does not need to be overwhelming
         • Start with small steps
         • Disciplinary repositories will help and guide you to what is needed
         • Despite the time investment, data rescue is worth it
         • Others will now be able to re-use the data
         • Notes taken years ago actually explain anomalies
       • Repository Lessons
         • For Long Tail Data, every project is different
         • A small incentive will motivate investigators
         • Data Rescue missions help the repository determine next steps for development of tools and services
    32. $5,000 award (sponsored by Elsevier) plus trophy; international jury; 16 submissions
    33. Award Ceremony at AGU 2013
    35. Collaborations
       • New subawards
         • to UTIG for ASP@UTIG
         • to M. Ghiorso (OFM-Research) to migrate LEPR data system into IEDA infrastructure (includes Trace KD database developed by Roger Nielsen)
       • Industry collaborations
         • Elsevier funds Data Rescue Process Study
         • ESRI will help with GeoMapApp
       • EarthCube projects
    36. EarthCube Projects
       • “Deploying Web Services Across Multiple Geoscience Domains”
         • (BB, lead: T. Ahern, IRIS; IEDA co-PI: Carbotte): The project is focused on developing web services for broadening access to data collections of IRIS, IEDA, UNAVCO, UCAR, Caltech, and SDSC by other disciplines. (main)
       • “Community Inventory of EarthCube Resources for Geosciences Interoperability (CINERGI)”
         • (BB, lead: I. Zaslavsky, SDSC/UCSD; IEDA co-PI: Lehnert): The project focuses on developing an inventory of EarthCube resources, including data systems, standards, services, etc.
       • “Leveraging Semantics and Crowdsourcing in Data Sharing and Discovery”
         • (BB, lead: T. Narock, University of Maryland; IEDA co-PI: R. Arko): The project focuses on applying Semantic Web technologies, including Linked Data, to support sharing and integration of ocean science data sets.
       • “C4P: Collaboration and Cyberinfrastructure for Paleobiosciences”
         • (RCN, lead: K. Lehnert, IEDA): The project focuses on advancing cyberinfrastructure for paleobiosciences.
       • “Building a Sediment Experimentalist Network (SEN)”
         • (RCN, lead: W. Kim, UT Austin; IEDA co-PI: Hsu)
       • “EarthCube Test Enterprise Governance: An Agile Approach”
         • (Test Enterprise Governance, lead: L. Allison, University of Arizona; IEDA sub-awardee: Lehnert)
    37. Council of Data Facilities
       “The mission of the Council of Data Facilities is to serve in a coordinating and facilitating role”
       • Provide a collective voice on behalf of the member data facilities to the NSF and other foundations and associations, as appropriate.
       • Identify, endorse, and promote standards and best or exemplary practices in the organization and operation of a data facility.
       • Identify and support the development and utilization of shared infrastructure services, including computing services, professional staff development and training services, and related activities.
       • Foster innovation through collaborative projects.
       • Collaborate with standard-setting bodies with respect to standards for data sharing and interoperability, metadata, and related matters.
    38. Council of Data Facilities
       • Definition: “A data facility is eligible for membership in the Council if it acquires, curates, preserves, and/or disseminates data, software, models and data services for one or more defined communities in the geosciences.”
       • Category A: NSF-funded not-for-profit or academic data facilities
       • Category B: Federally Funded Research and Development Centers (FFRDCs) and other federal, state, and local data facilities.
       • Category C: International, private, and other not-for-profit or academic data facilities.
       • Category D: Associate members
       • Membership categories A, B, and C are all voting members of the Council, with each member sending one designated representative to the General Assembly.
    39. Council of Data Facilities
       • provide advice and guidance to the NSF via the Council’s Executive Committee on matters pertaining
       • identify and develop opportunities for collaboration (shared infrastructure, professional development of staff, etc.)
       • contribute to the development of geoscience cyberinfrastructure standards and identified best practices and their implementation or adoption, and help ensure compliance and integration into architectures and workflows in their respective facilities
       • educate other members of the Council on new developments relevant to data centers in their respective fields, disciplines, and domains (international, private foundation, etc.).
    40. IEDA: A Multi-Disciplinary Microcosm
       • geochemistry, marine geophysics, marine geology, geochronology, and more
       • sensor data versus sample-based observations & experiments
       • raw data (e.g. multi-beam), field data, lab data, derived data, samples
       • gridded data, point data, time-series data, maps, photos, and more
       • file sizes vary from a few kilobytes to terabytes