Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Earth Science Platform


Published on

Science platforms are made up of (at least) four planks: data formats, services, tools and conventions. I focus here on formats and conventions, specifically the HDF5 format, already used in many disciplines, and the Climate-Forecast and HDF-EOS Conventions. Many science disciplines have already agreed on HDF as the preferred format for storing and sharing data. It is well established in high performance computing and supports arbitrary grouping and annotation. Community conventions are critical for useful data on top of the format. The Climate-Forecast (CF) conventions were created for relatively simple gridded data types while the HDF-EOS conventions originally considered more complex data (swaths). Making simple conventions more complex makes adoption more difficult. Community input and the need for stable data processing systems must be balanced in governance of conventions.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

Earth Science Platform

  1. 1. The Earth Science Platform Ted Habermann, Mike Folk, The HDF Group Conventions Tools Formats Services December 12, 2013 AGU, Fall 2013 1
  2. 2. Formats with HDF Inside HDF5 December 12, 2013 AGU, Fall 2013 2
  3. 3. High Performance / Parallel Computing Problem: Support I/O and analysis needs for stateof-the-art plasma physics code Novel Accomplishments:  Ran Trillion particle VPIC simulation on 120,000 hopper cores and generated 350 TB dataset  Parallel HDF5 obtained peak 35GB/s I/O rate and 80% sustained bandwidth  Developed hybrid parallel FastQuery using FastBit to utilize multicore hardware  FastQuery took 10 minutes to index and 3 seconds to query energetic particles  SC12 paper, XLDB 2012 poster Impact  Demonstrated software scalability for writing and analyzing ~40TB HDF5 files  Enabled novel discoveries in plasma physics *Vector Particle-in-Cell December 12, 2013 AGU, Fall 2013 3
  4. 4. Grouping Data and Metadata (HDF-EOS) HDF File with HDF-EOS Conventions Grids Points Swaths Zonal Averages Grid_1 Data Fields Grid_N Attributes Swath_1 Data Fields Swath_N Geolocation Fields Profile Fields Latitude Data Field.1 Data Field.1 Profile Field.1 Longitude Data Field.2 Data Field.2 Time Profile Field.2 Colatitude December 12, 2013 AGU, Fall 2013 4
  5. 5. Conventions / History Processing Level 3 1 Derived geophysical variables at the same resolution and location as Level 1 source data. Reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters (e.g., platform ephemeris) computed and appended but not applied to Level 0 data. December 12, 2013 CF CF ? AGU, Fall 2013 Grid HDF-EOS 2 Model Results / Variables mapped on uniform space-time grid scales, usually with some completeness and consistency. Zonal Average CF Feature Types: Points Timeseries Trajectory Profile TimeSeriesProfile TrajectoryProfile ? Points Swath 5
  6. 6. Convention Governance Community / Users December 12, 2013 AGU, Fall 2013 Operational Data Processing System 6
  7. 7. Community Using HDF to share data? Tweet #HDFInside December 12, 2013 AGU, Fall 2013 7
  8. 8. Acknowledgements This work was partially supported by NASA contract number NNG10HP02C. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of NASA or The HDF Group. December 12, 2013 AGU, Fall 2013 8