HDF5 and The HDF Group

This is the latest information about The HDF Group and HDF5.

  1. HDF5 and The HDF Group. The HDF Group, May 2014. www.hdfgroup.org
  2. THE HDF GROUP
  3. Mission
     To provide high quality software for managing large complex data, to provide outstanding services for users of these technologies, and to ensure effective management of data throughout the data life cycle.
  4. Goals of The HDF Group
     • To create, maintain, and evolve software and services that enable society to manage large complex data at every stage of the data life cycle.
     • To establish and maintain a sustainable organization with a highly-skilled and committed team devoted to accomplishing the first goal.
  5. The HDF Group
     • 1988-2006: Software group at University of Illinois National Center for Supercomputing Applications
     • 2005-present: Non-profit company in Champaign, IL
     • Passionate about managing large, complex, heterogeneous data throughout its life cycle
     • Creators and stewards of HDF4 and HDF5
     • Own HDF4 and HDF5
     • Formats, libraries, and tools are open and free
     • Committed to high quality and reliability
     • Currently employ 33 staff
  6. Current project list for The HDF Group
     • NASA – Earth Observing System (EOS)
       – The basis for global climate research
       – HDF is the standard archive and distribution format for EOS
       – Hundreds of data products, 8 petabyte archive and growing
     • NOAA/NASA – JPSS
       – Next generation weather satellite system and EOS
       – HDF5 is the primary distribution format (6 TB/day)
     • Sandia National Laboratories
       – High throughput, multi-stream satellite image management
     • Synchrotron community
       – Scalable solutions for high throughput data acquisition and management
     • ExaHDF5 (Lawrence Berkeley National Lab)
       – High end scientific simulations
       – Tuning HDF5 for high performance parallel I/O
     • FastForward Computing (DOE)
       – Solving I/O challenges for exascale computing
  7. The HDF Group Services
     • Helpdesk and Mailing Lists
       – Available to all users as a first level of support
     • Priority Support
       – Rapid issue resolution and advice
     • Consulting
       – Needs assessment, troubleshooting, design reviews, etc.
     • Training
       – Tutorials and hands-on practical experience
     • Enterprise Support
       – Coordinating HDF activities across departments
     • Special Projects
       – Adapting customer applications to HDF
       – New features and tools
       – Research and Development
  8. WHO USES HDF5?
  9. Who uses HDF5?
     • Applications that deal with large or complex data
     • Over 200 different application areas
     • >2 million data product users world-wide
     • Academia, government agencies, industry
  10. Members of the HDF support community
      • NASA – Earth Observing System
      • NOAA/NASA/Riverside Tech – NPOESS
      • A large financial institution
      • DOE – projects w/LBNL & PNNL, ANL & ORNL
      • Lawrence Livermore National Lab
      • Army Geospatial Center
      • NIH/Geospiza (bio software company)
      • Lawrence Berkeley National Lab
      • University of Illinois/NCSA
      • Sandia National Labs
      • A leading U.S. aerospace company
      • Projects for petroleum industry, vehicle testing, weapons research, others
      • “In kind” support
  11. New Areas We’re Exploring
      • Fusion research data storage
        – Submitted proposal for ITER project’s data management w/large industrial fusion partner
      • Astronomy
        – Submitted NSF SI2 grant w/NRAO
        – Working toward new standard for radio astronomy data storage
      • Electron Microscopy
        – Submitted NSF SI2 grant w/LSU, et al.
        – Proposing new standard for storing imaging data
      • Synthesis of HDF5 and cloud storage w/Microsoft
        – Developing “RESTful” API for accessing HDF5 data in Azure cloud
  12. HDF5 SCIENCE APPLICATIONS
  13. NASA EOS Remote Sensed Data
      • HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
      • Petabytes of data stored in HDF4 and HDF5 to support the Global Climate Change Research Program.
  14. [Image slide; no text]
  15. What is JPSS?
      • JPSS is the next generation of NOAA's polar-orbiting environmental satellites.
      • JPSS observations enable forecasting severe weather like hurricanes, tornadoes and blizzards, and assessing environmental hazards such as droughts, forest fires, poor air quality and harmful coastal waters.
      • JPSS will provide continuity of critical, global Earth observations, including our atmosphere, oceans and land, through 2025.
      • During Hurricane Sandy in October 2012, JPSS data helped forecasters and scientists accurately predict Sandy's hurricane track and infamous 'left hook' landfall into New York and New Jersey, more than five days in advance.
  16. CFD General Notation System (CGNS)
  17. What is CFD?
      Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows.
  18. [Image] This CFD computer generated image shows a model of the space shuttle. CFD has taken the place of wind tunnels for many evaluations of aircraft and, as computing power increases and computer models become more sophisticated, CFD will largely replace wind tunnels.
  19. What is CGNS?
      • Standard Interface Data Structures (SIDS)
        – Collection of conventions and definitions that defines the intellectual content of CFD-related data.
      • SIDS to ADF Mapping
        – Advanced Data Format
      • SIDS to HDF5 Mapping
        – Defines how the SIDS is represented in HDF5
      • CGNS Mid-Level Library (MLL)
        – Application Programming Interface (API) which conforms to the SIDS
        – Built on top of ADF/HDF5, which do I/O operations
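
The layering described on this slide (an application-facing API that follows agreed conventions, sitting on a storage layer that does the I/O) can be sketched roughly as follows. This is only an illustration of the design idea in plain Python, not the CGNS Mid-Level Library: the class names, the `size` node, and the in-memory backend are all hypothetical stand-ins.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Low-level layer that performs the I/O (the role ADF or HDF5 plays)."""

    @abstractmethod
    def write_node(self, path, value):
        """Store a value at a slash-separated node path."""

    @abstractmethod
    def read_node(self, path):
        """Return the value stored at a node path."""


class InMemoryBackend(StorageBackend):
    """Hypothetical stand-in backend; a real one would call ADF or HDF5."""

    def __init__(self):
        self.nodes = {}

    def write_node(self, path, value):
        self.nodes[path] = value

    def read_node(self, path):
        return self.nodes[path]


class MidLevelLibrary:
    """Application-facing API: knows the node conventions, not the file format."""

    def __init__(self, backend):
        self.backend = backend

    def write_zone(self, base, zone, n_vertices, n_cells):
        # The "/base/zone/size" layout is an assumption made for this sketch.
        self.backend.write_node(f"/{base}/{zone}/size", (n_vertices, n_cells))

    def read_zone_size(self, base, zone):
        return self.backend.read_node(f"/{base}/{zone}/size")


backend = InMemoryBackend()
mll = MidLevelLibrary(backend)
mll.write_zone("Base", "Zone1", 1000, 900)
size = mll.read_zone_size("Base", "Zone1")
```

Because the application only talks to the mid-level class, swapping the backend leaves application code unchanged, which is the property that allows one API to sit on either ADF or HDF5.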
  20. CGNS and HDF5*
      • CGNS was originally built using the ADF format.
      • However, ADF does not have parallel I/O or data compression capabilities, and does not have the support and tools that HDF5 offers.
      • HDF5 has rapidly grown to become a world-wide format standard for scientific data.
      • HDF5 has parallel capability as well as a broader support base than ADF.
      • Therefore, CGNS has adopted HDF5 as the default (official) data storage mechanism.
      * Paraphrased from http://cgns.sourceforge.net/hdf5.html.
  21. ENZO
      • An adaptive mesh refinement (AMR), grid-based hybrid code which is designed to do simulations of cosmological structure formation.
  22. [Image] Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman
  23. What is ENZO for?
      • At UC San Diego ENZO cosmology is used to simulate the universe from first principles, starting near the Big Bang.
      • Researchers using ENZO have conducted the most detailed simulations ever of a region of the universe more than 1.5 billion light years across.
      • “We need to zoom in on these dense regions to capture the key physical processes -- including gravitation, flows of normal and ‘dark’ matter, and shock heating and radiative cooling of the gas,” said Mike Norman. “This requires ENZO’s ‘adaptive mesh refinement’ capability.”
  24. ENZO (continued)
      • “AMR codes begin with a coarse grid spacing, and then spawn more detailed subgrids as needed to track key processes in higher density regions.”
      • “We achieved unprecedented detail by reaching seven levels of subgrids throughout the survey volume -- something never done before -- producing more than 400,000 subgrids,” said SDSC computational scientist Robert Harkness.
      • “Norman is one of the largest users of supercomputing time in the world, with 16 million computing hours at the TACC, and millions more on TeraGrid systems at SDSC, PSC, and NCSA.”
      • “The HDF Group provided important support for handling the output, and SDSC’s data storage environment allowed the researchers to efficiently store and manage the massive data.”
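
The refinement idea quoted on this slide (start coarse, spawn finer subgrids only where the density is high) can be illustrated with a toy one-dimensional sketch. This is not ENZO's algorithm: the density function, the threshold, the four-cells-per-grid split, and the sampling rule are all assumptions chosen to keep the example small.

```python
import math


def density(x):
    """Hypothetical density field: a narrow peak at x = 0.5 on a flat background."""
    return 1.0 + 50.0 * math.exp(-((x - 0.5) / 0.02) ** 2)


def is_dense(lo, hi, threshold, samples=8):
    """Flag a cell if the density at any of a few evenly spaced points exceeds the threshold."""
    step = (hi - lo) / samples
    return any(density(lo + k * step) > threshold for k in range(samples + 1))


def refine(lo, hi, level, max_level, threshold, cells=4):
    """Return (level, lo, hi) for this grid and every subgrid spawned beneath it.

    The grid [lo, hi) is split into `cells` cells; each cell flagged as dense
    becomes a subgrid one level deeper, until `max_level` is reached.
    """
    grids = [(level, lo, hi)]
    if level == max_level:
        return grids
    width = (hi - lo) / cells
    for i in range(cells):
        c_lo = lo + i * width
        c_hi = c_lo + width
        if is_dense(c_lo, c_hi, threshold):
            grids.extend(refine(c_lo, c_hi, level + 1, max_level, threshold, cells))
    return grids


# Seven subgrid levels, matching the depth mentioned in the quote above.
grids = refine(0.0, 1.0, level=0, max_level=7, threshold=5.0)
deepest = max(level for level, _, _ in grids)
```

In this sketch the deep subgrids cluster around the density peak while the rest of the domain stays at the coarse level, which is why an AMR run produces many small grids of differing resolution that then have to be stored and managed.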
  25. NeXus
  26. What is NeXus?
      • In recent years, scientists and programmers in neutron and synchrotron facilities around the world concluded that a common data format would fulfill a valuable function in the scattering community.
      • As instrumentation becomes more complex and data visualization more challenging, scientists find it difficult to keep up with new developments.
      • A common data format makes it easier to exchange experimental results and to exchange ideas about how to analyze them. It promotes greater cooperation in software development and stimulates the design of more sophisticated visualization tools.
      • The NeXus data format has been developed in response to these needs.
  27. HDF5 TECHNOLOGIES
  28. Data challenges addressed by HDF5
      • Ability to organize complex collections of data
      • Efficient and scalable data storage and access
      • A growing need to integrate a wide variety of types of data
      • The evolution of data technologies
      • Long term preservation of data
  29. HDF is…
      • HDF stands for ‘Hierarchical Data Format’
      • A file format for storing any kind of data
      • Software system to manage data in the format
      • Designed for high volume or complex data
      • Designed for every size and type of system
      • Open format and software library, tools
      • There are two HDFs: HDF4 and HDF5
      • Here we focus on HDF5
  30. HDF5 Technology Platform
      • HDF5 data model
        – The “building blocks” for data organization
      • HDF5 software
        – Library, language interfaces, tools
      • HDF5 file format
        – Byte-level organization of data
  31. Professionally managed
      • Source under version control, public access
      • Automatic daily testing
        – 200+ configurations
        – Performance, backward/forward compatibility
      • C, C++, Fortran, Java, Python APIs
      • Builds with Autotools and CMake
      • Sound development, coding practices
      • Maintenance releases every May, November
  32. Professionally supported
      • Helpdesk
      • Forum and mailing lists
      • Extensive web documentation
        – User’s Guide, Ref Manual, examples, tutorials, other docs
      • Community friendly
        – Integrate contributions from external developers
        – Solicit feedback on new features and pre-releases
        – Collaborate on projects, especially in testing
  33. HDF5 file
      An HDF5 file is a container that holds data objects.

      lat | lon | temp
      ----|-----|-----
      12  | 23  | 3.1
      15  | 24  | 4.2
      17  | 21  | 3.6
  34. HDF5 file organization
      HDF5 groups and links organize data objects.

      Diagram labels:
      • Groups: /, SimOut, Viz
      • Experiment Notes: Serial Number: 99378920; Date: 3/13/09; Config: Standard 3
      • Parameters: 10;100;1000
      • Timestep: 36,000

      lat | lon | temp
      ----|-----|-----
      12  | 23  | 3.1
      15  | 24  | 4.2
      17  | 21  | 3.6
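
The statement that groups and links organize data objects can be made concrete with a toy in-memory model. This is a plain-Python sketch of the concept only, not the HDF5 library or any of its language bindings: the `Group` and `Dataset` classes are hypothetical, and the placement of the slide's values under `/SimOut` and `/Viz` is an assumption, since the original diagram layout is not recoverable from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """An array of values plus small descriptive attributes."""
    data: list
    attrs: dict = field(default_factory=dict)


@dataclass
class Group:
    """A container whose links map names to groups or datasets."""
    links: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)

    def create_group(self, name):
        self.links[name] = Group()
        return self.links[name]

    def resolve(self, path):
        """Follow a slash-separated path such as '/SimOut/table'."""
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.links[part]
        return node


root = Group()  # plays the role of the root group "/"
root.attrs["Experiment Notes"] = {
    "Serial Number": 99378920,
    "Date": "3/13/09",
    "Config": "Standard 3",
}

sim = root.create_group("SimOut")
viz = root.create_group("Viz")

table = Dataset(
    data=[(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
    attrs={"columns": ("lat", "lon", "temp")},
)
sim.links["table"] = table
sim.links["Parameters"] = Dataset([10, 100, 1000])
sim.links["Timestep"] = Dataset([36000])

# A second link to the same object: two paths, one dataset.
viz.links["table"] = table

row = root.resolve("/SimOut/table").data[1]
```

The second link shows why links are listed separately from groups: an object is stored once, and any number of paths can lead to it.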
  35. HDF5 Philosophy: A single platform with multiple uses
      • One general data model
      • One general format
      • One library
      • Adaptable for almost any kind of data
      • Works on almost any architecture
      • Ability to interact well with other technologies
      • Attention to past, present, future compatibility
  36. HDF5 Software Layers & Storage
      • Tools
        – h5dump tool, HDFView tool, h5repack tool, …
        – High Level APIs
      • HDF5 Library
        – Language Interfaces: C, Fortran, C++
        – HDF5 Data Model: Groups, Datasets, Attributes, …
        – Internals: Datatype Conversion, data compression, Chunked Storage, Version Compatibility, and so on…
        – I/O Drivers: Posix I/O, Split Files, Parallel I/O, Custom
      • Storage (HDF5 File Format)
        – File, Split Files, File on Parallel Filesystem, Other
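
Two of the internals named on this slide, chunked storage and data compression, combine in a way that a short sketch can show: the array is split into fixed-size tiles, each tile is compressed on its own, and a read touches only the tile it needs. The example below uses only Python's standard library; the dataset shape, chunk shape, byte layout, and function names are assumptions and do not reflect the actual HDF5 file format or library API.

```python
import struct
import zlib

ROWS, COLS = 6, 8          # dataset shape (assumed for the sketch)
CHUNK_R, CHUNK_C = 3, 4    # chunk shape (assumed for the sketch)

# A small 2D integer array where element (r, c) holds r * COLS + c.
data = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]


def write_chunks(array):
    """Split the array into fixed-size tiles and compress each tile separately."""
    chunks = {}
    for r0 in range(0, ROWS, CHUNK_R):
        for c0 in range(0, COLS, CHUNK_C):
            tile = [array[r][c]
                    for r in range(r0, r0 + CHUNK_R)
                    for c in range(c0, c0 + CHUNK_C)]
            raw = struct.pack(f"<{len(tile)}i", *tile)
            chunks[(r0 // CHUNK_R, c0 // CHUNK_C)] = zlib.compress(raw)
    return chunks


def read_element(chunks, r, c):
    """Decompress only the chunk that holds (r, c) and pick out that element."""
    raw = zlib.decompress(chunks[(r // CHUNK_R, c // CHUNK_C)])
    tile = struct.unpack(f"<{CHUNK_R * CHUNK_C}i", raw)
    return tile[(r % CHUNK_R) * CHUNK_C + (c % CHUNK_C)]


chunks = write_chunks(data)
value = read_element(chunks, 4, 6)
```

Compressing per chunk rather than per file is the design choice that matters here: a reader interested in one region pays the decompression cost for that region only.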
  37. HDF ecosystem
      Diagram labels:
      • Applications: EOS Applications, MATLAB, IDL, HDF tools
      • EOS Domain Data Objects: HDF-EOS Library (Swath, Grid, Point, etc.)
      • HDF Library
      • Storage
  38. Other Software
      • The HDF Group
        – HDFView – an HDF4 & HDF5 browser
        – Command-line utilities
        – Regression and performance testing software
      • 3rd Party
        – NetCDF-4, IDL, MATLAB, Mathematica, PyTables, Pandas
      • Communities
        – EOS, ASC, CGNS, Energistics, NeXus
      • Integration with other software
        – iRODS, OPeNDAP, MPI
  39. www.hdfgroup.org
