Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Interoperability


This series of presentations was given at the EarthCube Data Facilities End-User Workshop held January 15-17, 2014 in Washington, DC. This workshop provided a forum to discuss the unique requirements and challenges associated with developing the communication, collaboration, interoperability, and governance structures that will be required to build EarthCube in conjunction with existing and emerging NSF/GEO facilities.

This panel and discussion, specifically, outlined and explained several current concepts in data sharing and interoperability, featuring presentations by:

Paul Morin (UMN): Polar Cyberinfrastructure
Don Middleton (UCAR): Atmospheric/Climate
Kerstin Lehnert (LDEO): Domain Repositories & Physical Samples
David Schindel (CBOL, GRBio): Biological Perspective & Collections
Hank Loescher (NEON): Observation Networks
Daniel Fuka (Virginia Tech) and Ruth Duerr (NSIDC): Brokering
Ilya Zaslavsky (UCSD): Cross-Domain Interoperability

Published in: Technology
  • For example: being able to position ourselves globally. NEO Task Force Assessment Working Group (AWG): first national assessment by July 1, 2012; “Societal Benefit Areas” (SBAs) are the organizing construct; 12 SBA Teams + 1 Reference Measurements Team.
  • We will take several columns from this diagram

    1. 1. PANEL DISCUSSION – CURRENT CONCEPTS IN DATA SHARING & INTEROPERABILITY EarthCube Data Facilities Workshop Wednesday, January 15th 2014
    2. 2. White Island AWS D-10
    3. 3. Archive includes 5 satellites: WorldView-2, GeoEye, QuickBird, Ikonos, WorldView-1. New tasking is WV-1 and WV-2.
    4. 4. 7 PGC Imagery Viewers · June 24, 2013
    5. 5. 8 PGC Imagery Viewers · June 24, 2013
    6. 6. 9 PGC Imagery Viewers · June 24, 2013
    7. 7. 10 PGC Imagery Viewers · June 24, 2013
    8. 8. Data System Interoperability and Standards for UCAR/NCAR and Collaborative Activities August 13, 2013 Data Facility Workshop; Arlington, VA. Don Middleton (on behalf of many others) University Corporation for Atmospheric Research U.S. National Center for Atmospheric Research Computational and Information Systems Laboratory Boulder, Colorado, USA
    9. 9. Data Cyberinfrastructure for “Big Head” and “Long Tail” Scientific Research Research Data Archive Mauna Loa Solar Observatory High Altitude Observatory (HAO) Field Project Archive Earth Observing Lab (EOL) Earth System Grid Community Data Portal ACADIS Arctic Gateway Computational and Information Systems Laboratory (CISL) and Earth System Laboratory (NESL) NCAR Wyoming Supercomputing Center, Cheyenne. Disk, archive, and computational resources. ACADIS is a joint venture of NCAR EOL & CISL, the National Snow and Ice Data Center, and UCAR Unidata. UCAR Unidata: netCDF, THREDDS, TDS, LDM, IDV, Rosetta. These systems federate in various ways among themselves, across organizations such as ACADIS, and with external programs such as GCMD, the UN/WMO WIS, ESGF, TIGGE, and others.
    10. 10. Automated Modeling and Observation Systems Federation with Other Systems (GCMD, WMO, ADE) Data Users and Publishers (SelfPub) RESTful Pub Services ACADIS Gateway Identity Management (OpenID, SAML) Discovery Services (Apache SOLR) Publishing Services Metadata and Database Services Catalog Harvester (OpenSearch, DIF, THREDDS) Metrics Data Services, Access Control OAI-PMH Repository (DC, DIF, ISO) Core Technology • Spring Framework • Hibernate • Liquibase • Apache SOLR • OpenID4Java/OpenSAML • OAI-PMH, OpenSearch • ActiveMQ • FreeMarker • Java NetCDF Library • DOIs via EZID/DataCite • BagIt (from the LoC) EOL ACADIS Collections (via THREDDS) NSIDC Arctic Collections (via Brokering) Future Federated Collections NWSC GLADE ACADIS Arctic Collections RDB HPSS ACADIS is sponsored by NSF/GEO/PLR
    11. 11. The Chronopolis Data Preservation Network • A consortium of UCSD Libraries, SDSC, Univ. of Maryland, and NCAR • Using LoC BagIt for deposits • Based on iRODS and ACE (Audit Control Environment) • TRAC-certified (i.e. ISO 16363)
    12. 12. Questions?
    13. 13. courtesy of: Lesley Wyborn, Geoscience Australia (talk at the IGSN workshop at IGC 2012) EarthCube Data Facilities Workshop 18
    14. 14. • Access to the physical samples is needed to verify & reproduce published observations. • Access to sample metadata is needed for proper interpretation and re-use of sample-based data. • Access to both is needed to facilitate sharing of samples for use & re-use. ▪ Samples are often expensive to collect (drilling, remote locations). ▪ Many samples are unique and irreplaceable. ▪ Re-analysis augments utility of existing data.
    15. 15. Geochemistry • Structural Geology and Tectonics • Experimental Stratigraphy • Critical Zone Community • Envisioning a Digital Crust • Cyberinfrastructure for Paleogeoscience • Petrology and Geochemistry • Inland Waters • Deep Seafloor Processes and Dynamics • Coral Reef Systems Science • Geochronology • Rock Deformation and Mineral Physics Research
    16. 16. • “Global Access to Global Collections: establish repositories for all physical samples and the biological, geochemical and physical measurements made from those samples.” (Paleogeoscience) • “Poor and uneven access and management of sample collections, incomplete sample tracking and linking of samples to analyses in the literature and databases, discoverability of existing samples” (Petrology & Geochem) • “Most geological terrains of interest do not have sufficient or even sample density through space and time.” (Petrology & Geochem) • “Central archive of experimental samples with integrated workflows, database templates, and community-wide DOI system for samples” (Mineral Physics & Rock Deformation)
    17. 17. EarthCube Data Facilities Workshop 22
    18. 18. • Infrastructure and resources for preservation and access of physical samples • Tools for repositories to efficiently manage and improve online access to their collections. • Online registry for discovery, access, and preservation of sample data & metadata • Best practices & standards: for sample curation and sample sharing; for sample data & data exchange • Funding strategies, business models
    19. 19. • A multi-institutional initiative to build a “Digital Environment for Sample Curation”: to advance access and re-use of physical samples; to support and simplify the work of curators; to advance best practices, standards, & policies for sample curation, distribution, attribution, and citation
    20. 20. • Physical collection facilities: NSF-funded repositories (LDEO, OSU, SIO, LacCore, WHOI, USPRR, UT Austin, ARF, and growing); State Surveys (AASG), USGS; Industry • Data facilities & systems: IGSN/SESAR, IMLGS, USGIN • Computer & Information Science: RENCI, UT Austin • Biocollection informatics: iPlant, iDigBio
    21. 21. Curators (Admin GUI) Samplers (User GUI) Public (Admin GUI) DESC (data, tools, services) Data Systems IGSN Registry Publications
    22. 22. US Interagency Working Group on Scientific Collections (IWGSC) • Covers all scientific disciplines • Created under White House S&T Council, reports to Life Sciences Subcommittee • ~10 participating Departments/Agencies • USDA and Smithsonian Co-chairs 2009 recommendations included: • Increase impact and improve management of collections • Clarify and standardize management and budgeting for collections • Create an online clearinghouse of information on Federal scientific collections SciColl • Covers all scientific disciplines • Created under OECD Global Science Forum • Independent project, no legal status • National and Institutional memberships • Governance by Executive Board • Secretariat Office at Smithsonian SciColl Priorities: • Develop first cross-disciplinary registry of object-based scientific collections (GRSciColl) • Promote interdisciplinary research utilizing scientific collections
    23. 23. Global Registry of Scientific Collections (GRSciColl) GRSciColl Disease banks Veterinary samples Human medical samples Human artefacts And more, what else? Standards repositories Air, water, soil samples Rocks, sediment and ice cores Extraterrestrial samples SciColl and IWGSC ask: How can we connect collections across disciplines? Fossils and microfossils Microbes in BRCs Living material in genebanks, culture collections Plants and animals in zoos, botanical gardens, aquariums Plants and animals in museums, herbaria
    24. 24. Structure of GRSciColl. Institutional Collection Table: • Institution ID • Collection ID • Collection Name • Collection Discipline • Content Type(s) • Primary Contact. Personal Collection Table: • Institution ID = “Personal” • Collection ID • Collection Name • Collection Discipline • Content Type(s) • Primary Contact. Institution Table: • Institution ID • Institution Name • Institution Discipline(s) • Primary Contact. Contacts Table: • Contact Name • Primary Institution • Primary Collection • Additional Inst/Coll. SciColl and IWGSC ask: What terms constitute the common vocabularies of discipline and content type?
    25. 25. INTEROPERABILITY PHILOSOPHY (OBSERVATIONAL INFRASTRUCTURE) Hank Loescher | National Ecological Observatory Network (NEON) Director Strategic Development | CEO Office
    26. 26. Get Specific Data: Many respondents wanted more specific details and expressed an interest in data communicated in a form they can readily use in their work.
    27. 27. Lots and lots of data… 9/2008 3/2010 10/2009 2/2011
    28. 28. Data as a National Resource NSF Director Suresh’s emphasis on: • “Era of Observations” • “Era of Data and Information” March 2012: White House $200M “Big Data” initiative: • NSF • NIH • DOE • DOD • DARPA • USGS
    29. 29. The President’s Council of Advisors on Science and Technology (PCAST) The PCAST report (2011) urges that even as the government deals with our nation’s economic challenges, it must: “…address the threats to both the environmental and the economic aspects of well-being that derive from the accelerating degradation of the environmental capital – the Nation’s ecosystems and the biodiversity they contain”. PCAST New Directions…..
    30. 30. Global Themes – Global Observations Increasing importance on designing new x-discipline data structures to support policy/decision-making Societal Benefit Areas (SBAs) Agriculture Biodiversity Climate Disasters Ecosystems Energy Health Water Weather Essential Climate Variables (ECVs) Essential Biodiversity Variables (EBVs) Essential Carbon Variables (ECVs) Aligned with OSTP (NEO, US-GEO) NSF/EU Strategic Planning Aligned with GEO, GEO-BON, GCOS, Diversitas, WMO, WCRP, etc… Aligned with Suresh, S., 2012. Research funding: Global challenges need global solutions, Nature, 490, 337-338, doi:10.1038/490337a
    31. 31. Why Interoperability? • The rapid pace of large-scale environmental global changes underscores the value of accessible long-term data sets. • Natural, managed, and socioeconomic systems are subject to complex interacting stresses that play out over extended periods of time and space. • An era of large-scale, interdisciplinary science fueled by large data sets. • Data Interoperability enhances the value of current scientific efforts and investment. • Interoperability is needed to forecast future conditions for basic understanding, and for future planning, policy, and societal benefit. • Currently, there is no accepted approach to make large datasets interoperable • Provides new leadership opportunities for Scientists globally
    32. 32. Interoperability Philosophy - scientific utility 1. Linking Science Questions and Hypotheses and Requirements • Mapping Questions to ‘what must be done’ • ‘how’ data can/will be used jointly • Defining Joint Science Scope • Defines interfaces and Functionality 2. Traceability of Measurements • Use of Recognized Standards • Traceability to Recognized Standards, or First Principles • Known and managed signal:noise • Managing QA/QC • Uncertainty budgets (ISO traceable) 3. Algorithms/Procedures • What is the algorithm or procedural process to create a data product? • Provides “consistent and compatible” data • Managed through intercomparisons • What are their relative uncertainties? 4. Informatics • Standards - Data Formats • Standards - Metadata formats • Persistent Identifiers / Open-source / Policies • Discovery tools / Dissemination / Discovery • Ontologies, semantics and controlled vocabularies • Archival and Curation Activities • Provenance
    33. 33. Interoperability Philosophy - scientific utility. The degree to which Observatories are truly interoperable is the degree to which these four elements are adopted by collaborative facilities. Signal:noise and uncertainty estimates must also be known in order for data to have broader, global utility and prognostic capability (ecological forecasting). Provides the frame for individual approaches and creativity, spans organizational and programmatic maturity. Is a framework by which all parties can engage (policy and social dimension, incl). Facilitates establishing a Baseline/infrastructure with scientific creativity. Real work, real tasks can be defined. This Interoperability Framework is currently being implemented as part of a joint EU FP7 and US NSF Project called CoopEUS (
    34. 34. Frontiers - Interoperability Top-down Organizations Bottom-up Organizations European Union - ICOS European Union - Lifewatch Australia – TERN (EU) France – ANAEE Mexico and Canada – CarboNA / MexFlux Korea – KEON/KoFlux/AsiaFlux China – CERN iLTER - global
    35. 35. The National Ecological Observatory Network is a project sponsored by the National Science Foundation and managed under cooperative agreement by NEON Inc.
    36. 36. Stacking Environmental Observatories- SoS Other Terrestrial Datasets Biodiversity Observatories NEON
    37. 37. Stacking Environmental Observatories - SoS Biodiversity datasets Others NEON Collapse the layers
    38. 38. “Stuff” in the middle
    39. 39. NEON Interactions – Other Organizations The Type of Interaction and Efficacy is Dependent on the Organizational Development of the other Institution • Balancing Scientific Creativity vs. Baseline Infrastructure • Level of System Engineering Maturity • Base Capacity - Critical Mass • Cultural Sensitivity
    40. 40. BCube: A Broker Framework for Next Generation Geoscience Siri Jodha - PI
    41. 41. Brokering Framework Principles • A broker connects information resources by mediating interactions between those resources without requiring the maintainers of those resources to adapt their existing systems EPOS Workshop, Erice 2013
    42. 42. Brokers mediate between Service Buses Discover Evaluate Access Use
    43. 43. What if.... A scientist could find data and services that matched their interests as easily as subscribing to the news? Greenland 1 km DEM has been published A Digital Elevation Model (DEM) of Greenland acquired by A. Researcher is available in binary format at a 1 km grid spacing in a polar stereographic projection ... more myData Greenland Ice Sheet Melt Characteristics Data updated Greenland Ice Sheet Melt Characteristics now available via OpenSearch API Preparing Data for Ingest, presented 10/27/09 by R. Duerr LID590DCL Foundations of Data Curation
    44. 44. What if.... Scientists could advertise AND INDEX their data so other scientists could find it AND REFERENCE IT, as simply as... 1 - Filling out a web form 2 - Saving it to your website 3 - Adding its link to your site
    45. 45. BCube Broker • Service Bus • Mediator • Scientific Field to Field Translator • Crawling, Advertising, (and Indexing)
    46. 46. Domain data repositories and cross-disciplinary data integration: governance issues, technical issues. ILYA ZASLAVSKY AND THE EARTHCUBE CINERGI PROJECT (NSF ICER-1343816)
    47. 47. High-level inventory and readiness assessment: viewer
    48. 48. Community Inventory of EarthCube Resources for Geoscience Interoperability (CINERGI). Data discovery is the most often cited issue in executive summaries on the EarthCube web site.
    49. 49. Short questionnaire: Potential added value by a cross-domain system. Integration with cross-domain search. Key characteristics for CINERGI. Each function is rated for importance on a scale from 1 (Unimportant) to 7 (Essential), or NA / DK: • Making metadata from your facility available for search using standard metadata, via standard APIs • Tracking demand for and cross-domain usage of your resources • Identifying issues related to data and metadata quality and completeness • Tracking search hits that become searches for resources managed by your data facility • Connecting owners of relevant datasets to your facility for potential longer-term data management • Connecting data from your facility with people, publications, models, and projects • Identifying communities using data, tools, and models from your facility • Validating published metadata and service signatures from your facility • Finding and reporting to you resources that appear as duplicates across multiple registries. Comments
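The GRSciColl table structure listed on slide 24 can be sketched as a small set of record types. This is only an illustrative sketch: the field names follow the slide, but the Python types, the `is_personal` property, the `collections_by_discipline` helper, and all example records are assumptions made here, not part of the actual GRSciColl registry.

```python
from dataclasses import dataclass, field
from typing import List

# Slide 24: personal collections carry Institution ID = "Personal".
PERSONAL = "Personal"


@dataclass
class Institution:
    institution_id: str
    name: str
    disciplines: List[str]
    primary_contact: str


@dataclass
class Collection:
    # One type covers both the institutional and the personal collection
    # tables, since the slide lists the same fields for each.
    institution_id: str
    collection_id: str
    name: str
    discipline: str
    content_types: List[str]
    primary_contact: str

    @property
    def is_personal(self) -> bool:
        # Hypothetical convenience check, not a registry feature.
        return self.institution_id == PERSONAL


@dataclass
class Contact:
    name: str
    primary_institution: str
    primary_collection: str
    additional: List[str] = field(default_factory=list)


def collections_by_discipline(collections, discipline):
    """Hypothetical helper: case-insensitive match on the discipline term."""
    wanted = discipline.strip().lower()
    return [c for c in collections if c.discipline.strip().lower() == wanted]


# Invented example records, for illustration only.
example = [
    Collection("INST-A", "ROCKS", "Example rock cores", "Geology",
               ["cores"], "A. Curator"),
    Collection(PERSONAL, "P-1", "Example field samples", "geology",
               ["hand samples"], "B. Researcher"),
    Collection("INST-B", "HERB", "Example herbarium", "Botany",
               ["plants"], "C. Curator"),
]

geology = collections_by_discipline(example, "Geology")
```

The lookup only works because the two geology records happen to use the same term up to letter case. Free-text discipline and content-type values would not match so neatly, which is one way to read the question SciColl and IWGSC pose about common vocabularies.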