3. www.hdfgroup.org
Mission
To provide high quality software for
managing large complex data,
to provide outstanding services for users
of these technologies,
and to ensure effective management of
data throughout the data life cycle.
HDF5 and The HDF Group
4. www.hdfgroup.org
Goals of The HDF Group
• To create, maintain, and evolve software and
services that enable society to manage large
complex data at every stage of the data life
cycle.
• To establish and maintain a sustainable
organization with a highly skilled and
committed team devoted to accomplishing the
first goal.
5. www.hdfgroup.org
The HDF Group
• 1988-2006: Software group at University of Illinois
National Center for Supercomputing Applications
• 2005-present: Non-profit company in Champaign, IL
• Passionate about managing large, complex,
heterogeneous data throughout its life cycle
• Creators and stewards of HDF4 and HDF5
• Own HDF4 and HDF5
• Formats, libraries, and tools are open and free
• Committed to high quality and reliability
• Currently employ 33 staff
6. www.hdfgroup.org
Current project list for The HDF Group
• NASA – Earth Observing System (EOS)
  – The basis for global climate research
  – HDF is the standard archive and distribution format for EOS
  – Hundreds of data products, 8 petabyte archive and growing
• NOAA/NASA – JPSS
  – Next generation weather satellite system and EOS
  – HDF5 is the primary distribution format (6 TB/day)
• Sandia National Laboratory
  – High throughput, multi-stream satellite image management
• Synchrotron community
  – Scalable solutions for high throughput data acquisition and
    management
• ExaHDF5 (Lawrence Berkeley National Lab)
  – High end scientific simulations
  – Tuning HDF5 for high performance parallel I/O
• FastForward Computing (DOE)
  – Solving I/O challenges for exascale computing
7. www.hdfgroup.org
The HDF Group Services
• Helpdesk and Mailing Lists
  – Available to all users as a first level of support
• Priority Support
  – Rapid issue resolution and advice
• Consulting
  – Needs assessment, troubleshooting, design reviews, etc.
• Training
  – Tutorials and hands-on practical experience
• Enterprise Support
  – Coordinating HDF activities across departments
• Special Projects
  – Adapting customer applications to HDF
  – New features and tools
  – Research and Development
9. www.hdfgroup.org
Who uses HDF5?
• Applications that deal with large or complex
data
• Over 200 different application areas
• More than 2 million data product users worldwide
• Academia, government agencies, industry
10. www.hdfgroup.org
Members of the HDF support community
• NASA – Earth Observing System
• NOAA/NASA/Riverside Tech – NPOESS
• A large financial institution
• DOE – projects w/LBNL & PNNL, ANL & ORNL
• Lawrence Livermore National Lab
• Army Geospatial Center
• NIH/Geospiza (bio software company)
• Lawrence Berkeley National Lab
• University of Illinois/NCSA
• Sandia National Lab
• A leading U.S. aerospace company
• Projects for petroleum industry, vehicle testing,
weapons research, others
• "In kind" support
11. www.hdfgroup.org
New Areas We're Exploring
• Fusion research data storage
  – Submitted proposal for ITER project's data
    management w/large industrial fusion partner
• Astronomy
  – Submitted NSF SI2 grant w/NRAO
  – Working toward new standard for radio astronomy
    data storage
• Electron Microscopy
  – Submitted NSF SI2 grant w/LSU, et al.
  – Proposing new standard for storing imaging data
• Synthesis of HDF5 and cloud storage w/Microsoft
  – Developing "RESTful" API for accessing HDF5
    data in Azure cloud
13. www.hdfgroup.org
NASA EOS Remote Sensed Data
• HDF is the standard file format for
storing data from NASA's Earth Observing
System (EOS) mission.
• Petabytes of data stored in HDF4 and HDF5
to support the Global Climate Change
Research Program.
15. www.hdfgroup.org
What is JPSS?
• JPSS is the next generation of NOAA's polar-orbiting
environmental satellites.
• JPSS observations enable forecasting severe
weather like hurricanes, tornadoes and blizzards, and
assessing environmental hazards such as droughts,
forest fires, poor air quality and harmful coastal
waters.
• JPSS will provide continuity of critical, global Earth
observations, including our atmosphere, oceans and
land, through 2025.
• During Hurricane Sandy in October 2012, JPSS data
helped forecasters and scientists accurately predict
Sandy's hurricane track and infamous 'left hook'
landfall into New York and New Jersey, more than
five days in advance.
17. www.hdfgroup.org
What is CFD?
Computational fluid dynamics (CFD) is a
branch of fluid mechanics that uses numerical
methods and algorithms to solve and analyze
problems that involve fluid flows.
18. www.hdfgroup.org
This CFD computer generated image shows a model of
the space shuttle. CFD has taken the place of wind tunnels
for many evaluations of aircraft and, as computing power
increases and computer models become more
sophisticated, CFD will largely replace wind tunnels.
19. www.hdfgroup.org
What is CGNS?
• Standard Interface Data Structures (SIDS)
  – Collection of conventions and definitions that
    defines the intellectual content of CFD-related
    data.
• SIDS to ADF Mapping
  – Advanced Data Format
• SIDS to HDF5 Mapping
  – Defines how the SIDS is represented in HDF5
• CGNS Mid-Level Library (MLL)
  – Application Programming Interface (API) which
    conforms to the SIDS
  – Built on top of ADF/HDF5, which do I/O operations
20. www.hdfgroup.org
CGNS and HDF5*
• CGNS was originally built using the ADF format.
• However, ADF does not have parallel I/O or data
compression capabilities, and does not have the
support and tools that HDF5 offers.
• HDF5 has rapidly grown to become a worldwide
format standard for scientific data.
• HDF5 has parallel capability as well as a broader
support base than ADF.
• Therefore, CGNS has adopted HDF5 as the
default (official) data storage mechanism.
* Paraphrased from http://cgns.sourceforge.net/hdf5.html.
21. www.hdfgroup.org
• ENZO: an adaptive mesh refinement (AMR), grid-based
hybrid code designed to simulate
cosmological structure formation.
22. HDF5 and The HDF Group
Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman
23. www.hdfgroup.org
What is ENZO for?
• At UC San Diego, ENZO cosmology is used to
simulate the universe from first principles, starting
near the Big Bang.
• Researchers using ENZO have conducted the
most detailed simulations ever of a region of the
universe more than 1.5 billion light years across.
• "We need to zoom in on these dense regions to
capture the key physical processes -- including
gravitation, flows of normal and 'dark' matter, and
shock heating and radiative cooling of the gas,"
said Mike Norman. "This requires ENZO's
'adaptive mesh refinement' capability."
24. www.hdfgroup.org
• "AMR codes begin with a coarse grid spacing, and then
spawn more detailed subgrids as needed to track key
processes in higher density regions."
• "We achieved unprecedented detail by reaching seven levels
of subgrids throughout the survey volume -- something never
done before -- producing more than 400,000 subgrids," said
SDSC computational scientist Robert Harkness.
• "Norman is one of the largest users of supercomputing time
in the world, with 16 million computing hours at the TACC,
and millions more on TeraGrid systems at SDSC, PSC, and
NCSA."
• "The HDF Group provided important support for handling the
output, and SDSC's data storage environment allowed the
researchers to efficiently store and manage the massive
data."
26. www.hdfgroup.org
What is NeXus?
• In recent years, scientists and programmers in
neutron and synchrotron facilities around the world
concluded that a common data format would fulfill a
valuable function in the scattering community.
• As instrumentation becomes more complex and data
visualization more challenging, scientists find it
difficult to keep up with new developments.
• A common data format makes it easier to exchange
experimental results and to exchange ideas about
how to analyze them. It promotes greater cooperation
in software development and stimulates the design of
more sophisticated visualization tools.
• The NeXus data format has been developed in
response to these needs.
28. www.hdfgroup.org
Data challenges addressed by HDF5
• Ability to organize complex collections of data
• Efficient and scalable data storage and access
• A growing need to integrate a wide variety of
types of data
• The evolution of data technologies
• Long term preservation of data
29. www.hdfgroup.org
HDF is…
• HDF stands for "Hierarchical Data Format"
• A file format for storing any kind of data
• Software system to manage data in the format
• Designed for high volume or complex data
• Designed for every size and type of system
• Open format and software library, tools
• There are two HDFs: HDF4 and HDF5
• Here we focus on HDF5
30. www.hdfgroup.org
HDF5 Technology Platform
• HDF5 data model
  – The "building blocks" for data organization
• HDF5 software
  – Library, language interfaces, tools
• HDF5 file format
  – Byte-level organization of data
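As one concrete illustration of "byte-level organization", the HDF5 file format specification states that a file is identified by an 8-byte signature at the start of its superblock, which may sit at byte offset 0, 512, 1024, 2048, and so on. The sketch below checks a byte buffer for that signature using only the Python standard library. The function name `find_hdf5_signature` is hypothetical and is not part of the HDF5 library or its tools.

```python
# Signature bytes as given in the HDF5 file format specification.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(buf):
    """Return the offset of the HDF5 signature in buf, or None.

    Hypothetical helper for illustration only. It probes offsets
    0, 512, 1024, 2048, ... (each double the previous one).
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(buf):
        if buf[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# Example buffers (made up for the demonstration).
at_start = HDF5_SIGNATURE + b"\x00" * 8
after_user_block = b"\x00" * 512 + HDF5_SIGNATURE
not_hdf5 = b"not hdf5"
```

To try it on a real file, read the first few kilobytes in binary mode and pass them to the function.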
31. www.hdfgroup.org
Professionally managed
• Source under version control, public access
• Automatic daily testing
  – 200+ configurations
  – Performance, backward/forward compatibility
• C, C++, Fortran, Java, Python APIs
• Build supports Autotools (configure) and CMake
• Sound development, coding practices
• Maintenance releases every May, November
32. www.hdfgroup.org
Professionally supported
• Helpdesk
• Forum and mailing lists
• Extensive web documentation – User's Guide,
Ref Manual, examples, tutorials, other docs
• Community friendly
  – Integrate contributions from external
    developers
  – Solicit feedback on new features and pre-releases
  – Collaborate on projects, especially in testing
33. www.hdfgroup.org
HDF5 file
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
An HDF5 file is a
container that
holds data
objects.
34. www.hdfgroup.org
HDF5 file organization
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
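Experiment Notes:
Serial Number: 99378920
Date: 3/13/09
Config: Standard 3
[Diagram: the root group "/" links to groups "SimOut" and "Viz", which link to the data objects shown on this slide.]
HDF5 groups and links organize data objects.
Parameters: 10;100;1000
Timestep: 36,000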
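The idea that groups and links organize data objects can be sketched with a small toy model in plain Python. This is not the HDF5 API. The `Group` and `Dataset` classes, the `link` and `resolve` methods, the dataset name `table`, and the placement of objects under particular groups are all assumptions made for illustration. The group names `SimOut` and `Viz` are one reading of the slide's diagram labels.

```python
class Dataset:
    """Toy stand-in for a dataset: values plus descriptive attributes."""

    def __init__(self, data, **attrs):
        self.data = data
        self.attrs = dict(attrs)


class Group:
    """Toy stand-in for a group: a set of named links to other objects."""

    def __init__(self, **attrs):
        self.links = {}
        self.attrs = dict(attrs)

    def link(self, name, obj):
        """Add a named link from this group to obj and return obj."""
        self.links[name] = obj
        return obj

    def resolve(self, path):
        """Follow links along a '/'-separated path starting at this group."""
        obj = self
        for part in path.strip("/").split("/"):
            if part:
                obj = obj.links[part]
        return obj


root = Group()  # plays the role of "/"
sim_out = root.link("SimOut", Group())
viz = root.link("Viz", Group())

table = sim_out.link(
    "table",
    Dataset(
        [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
        columns=("lat", "lon", "temp"),
    ),
)
sim_out.link("Parameters", Dataset([10, 100, 1000]))
sim_out.link("Timestep", Dataset(36000))

# A second link to the same object: two paths, one dataset.
viz.link("table", table)
```

Here `root.resolve("/SimOut/table")` and `root.resolve("/Viz/table")` return the same object, which is the point of separating links from the objects they name.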
35. www.hdfgroup.org
A single platform with multiple uses
• One general data model
• One general format
• One library
• Adaptable for almost any kind of data
• Works on almost any architecture
• Ability to interact well with other technologies
• Attention to past, present, future compatibility
HDF5 Philosophy
36. www.hdfgroup.org
HDF5 Software Layers & Storage
[Diagram, top layer to bottom layer]
• Tools: h5dump, h5repack, HDFView, …; High Level APIs
• HDF5 Library
  – Language Interfaces: C, Fortran, C++
  – HDF5 Data Model: Groups, Datasets, Attributes, …
  – Internals: Datatype Conversion, Data Compression, Chunked Storage, Version Compatibility, and so on…
  – I/O Drivers: POSIX I/O, Split Files, Parallel I/O, Custom
• Storage: HDF5 File Format in a File, Split Files, File on Parallel Filesystem, Other
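Two of the internals named on this slide, chunked storage and data compression, work together: when each chunk is compressed independently, a partial read only has to decompress the chunks it overlaps. The following is a minimal standard-library sketch of that idea, not HDF5's implementation. The function names, the one-byte-per-value encoding, and the chunk size of 4 are assumptions chosen to keep the example short.

```python
import zlib

CHUNK = 4  # elements per chunk (hypothetical value for the demo)


def write_chunks(values, chunk=CHUNK):
    """Split values (ints 0..255) into fixed-size chunks, compressing each one."""
    chunks = []
    for start in range(0, len(values), chunk):
        raw = bytes(values[start:start + chunk])  # one byte per value
        chunks.append(zlib.compress(raw))
    return chunks


def read_slice(chunks, start, stop, chunk=CHUNK):
    """Return (values[start:stop], number of chunks decompressed).

    Assumes 0 <= start < stop <= total number of stored values.
    """
    out = []
    touched = 0
    first, last = start // chunk, (stop - 1) // chunk
    for idx in range(first, last + 1):
        block = zlib.decompress(chunks[idx])
        touched += 1
        lo = max(start - idx * chunk, 0)
        hi = min(stop - idx * chunk, len(block))
        out.extend(block[lo:hi])
    return out, touched


values = list(range(20))
chunks = write_chunks(values)            # 5 compressed chunks of 4 values
subset, touched = read_slice(chunks, 5, 11)
```

Reading elements 5 through 10 decompresses only the second and third chunks, so the other three stay compressed and are never read.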