What will be new in HDF5?

October 15, 2008

HDF and HDF-EOS Workshop XII

1
HDF5 Road Map

Performance
Ease of use
Robustness
Innovation

October 15, 2008

HDF and HDF-EOS Workshop XII

2
Outline
• Performance improvements
• Fortran 2003 features
• HDF5 file recover (metadata journaling)

October 15, 2008

HDF and HDF-EOS Workshop XII

3
Performance Improvements

October 15, 2008

HDF and HDF-EOS Workshop XII

4
Performance Improvements in HDF5
• Examples of completed work:
• New implementation of metadata cache to
improve I/O performance and memory usage when
accessing many objects (HDF5 1.8.0)
• Faster, more scalable storage and access for
large groups (HDF5 1.8.0)

October 15, 2008

HDF and HDF-EOS Workshop XII

5
Performance Improvements in HDF5
• Work in progress
• New implementation of free-space management
• Affects “dynamic” applications that
add/delete/modify existing objects
• When creating HDF5 objects, space is allocated
from available space tracked by the free-space
manager
• When deleting objects, unused space is added to
the free-space pool via free-space manager
• Current implementation uses O(N2) operations for
each N allocations or freeing space
• New implementation O(log2N)
October 15, 2008

HDF and HDF-EOS Workshop XII

6
Free-space Management
• Test: creating/deleting attributes
•
•
•
•

Create first set of attributes
Delete odd-numbered attributes from the first set
Create second set of attributes
Delete all attributes from the second set

Number of
attributes

Old
New
Improvement
implementation implementation ratio

500

786.5 sec

68.2 sec

11.5x

1000

11000 sec

289 sec

38x

October 15, 2008

HDF and HDF-EOS Workshop XII

7
Performance Improvements in HDF5
• Work in progress
• Fast data append (along slowest changing
dimension)

• Future areas of interest
• Efficient chunking cache implementation
(NetCDF4)
• Efficient handling of the variable-length data
including compression (will affect sizes of NPOESS
files)

October 15, 2008

HDF and HDF-EOS Workshop XII

8
Fortran 2003 features

October 15, 2008

HDF and HDF-EOS Workshop XII

9
Status of the HDF5 Fortran Library
• HDF5 Fortran library is a part of standard HDF5
distribution
• First release goes back to 1999
• Implemented as Fortran90 wrappers on top of the
HDF5 C library
• Supported on Linux, Windows, Mac Intel, Solaris,
VMS, clusters, etc.
• Compilers
• Open source gfortran, g95
• Vendors (SUN, Intel, PGI, Absoft)
• 32 and 64-bit versions
October 15, 2008

HDF and HDF-EOS Workshop XII

10
HDF5 Fortran Library
• Mimics HDF5 C APIs
• Fortran 90 features used
•
•
•
•
•

Modules
Function overloading
Function interfaces
Dynamic memory allocation
Optional parameters
• Many Fortran APIs are simpler than their C
counterparts

October 15, 2008

HDF and HDF-EOS Workshop XII

11
Current Limitations
• Supports only “native” Fortran types such as
•
•
•
•

INTEGER
REAL
CHARACTER
DOUBLE PRECISION (obsolete Fortran feature)

• No support for INTEGER*1, INTEGER*2,
INTEGER*8, INTEGER*16, REAL*8, REAL*16
• Cannot write/read buffers of those types

• Fortran types have to match C types
• No support for –r8 and –r16 flags (Fortran real =/=
C float)
October 15, 2008

HDF and HDF-EOS Workshop XII

12
Current Limitations
• No support for
INTEGER(KIND=n) and REAL(KIND=m)
• Integers n and m are called KIND parameters
• Returned by
selected _int_kind (r)
-10r < n < 10r
selected_real_kind(p,r)
p – precision
r – decimal exponent range

October 15, 2008

HDF and HDF-EOS Workshop XII

13
Current Limitations
• Limited support for derived types (compare with C
structures and HDF5 compound datatypes)
• Supports derived types with “native” fields only
• Doesn’t support complex HDF5 datatypes
(e.g., with array member, or nested compound)

• Writes/reads HDF5 compound datasets by fields
only
• Cannot be used in parallel applications
• No support for enum types

• No support for callback functions

October 15, 2008

HDF and HDF-EOS Workshop XII

14
Current Limitations - Summary
• Any application written according to Fortran
95/2003 standard will struggle using HDF5
Fortran Library
• Many HDF5 features are not available to Fortran
applications

October 15, 2008

HDF and HDF-EOS Workshop XII

15
Fortran 2003 Features
• Fortran 2003 provides a standardized mechanism
for interoperating with C
• Module ISO_C_BINDING for interoperability of
intrinsic types
• C_PTR and C_FUNCPTR for interoperability with
C pointers
• C_LOC(x) and C_FUNLOC(x) inquiry functions for
getting addresses of variables and procedures
• BIND attribute for interoperability of derived types
and C structures and enumerated types

October 15, 2008

HDF and HDF-EOS Workshop XII

16
Fortran 2003 Features and HDF5
• New 2003 features allowed us to support
• Any Fortran INTEGER and REAL type data in
HDF5 files
• Fortran derived types and HDF5 compound
datatypes
• Fortran enumerated types and HDF5 enumerated
types
• HDF5 APIs with callbacks

October 15, 2008

HDF and HDF-EOS Workshop XII

17
Fortran Compilers with 2003 Features
Compiler

Current versions and
status of HDF5

Future versions

Intel

Versions 10.1 and 11
All F2003 functionality
works in HDF5

g95

August 2008 and later; All
Fortran 2003 functionality
works in HDF5

gfortran

Version 4.4
Limited support for C
interoperability; HDF5
doesn’t work

No plans from compiler
developers to improve the
support

SUN compilers

Express-July 2008 build;
HDF5 doesn’t work

No timeline for fixes; may
be in a year

PGI

Version 7.2.1; HDF5
doesn’t work

Fixes will be available in
Version 8.

October 15, 2008

HDF and HDF-EOS Workshop XII

Version 11 will have a fix
that will allow us to
remove common blocks

18
Example

October 15, 2008

HDF and HDF-EOS Workshop XII

19
Example

October 15, 2008

HDF and HDF-EOS Workshop XII

20
Example
HDF5 "SDScompound.h5" {
GROUP "/" {
DATASET "ArrayOfStructures" {
DATATYPE H5T_COMPOUND {
H5T_ARRAY { [13] H5T_STRING {
STRSIZE 1;
STRPAD H5T_STR_SPACEPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} } "chr_name";
H5T_STD_I8LE "a_name";
H5T_IEEE_F64LE "c_name";
H5T_IEEE_F32LE "b_name";
}
…..
October 15, 2008

HDF and HDF-EOS Workshop XII

21
Information

October 15, 2008

HDF and HDF-EOS Workshop XII

22
HDF5 file recovery or
Surviving a System Failure
through Metadata
Journaling
October 15, 2008

HDF and HDF-EOS Workshop XII

23
Surviving a System Failure in HDF5
• Problem:
• Data in an opened HDF5 files susceptible to
corruption in the event of an application or system
crash.
• Corruption possible if an opened HDF5 file has
been updated when the crash occurs.

• Initial Objective:
• Guarantee an HDF5 file with consistent metadata
can be reconstructed in the event of a crash.
• No guarantee on state of raw data – contains
whatever made it to disk prior to crash.
October 15, 2008

HDF and HDF-EOS Workshop XII

24
Crash Survivability in HDF5
• Approach: Metadata Journaling
• When an HDF5 file is opened with Metadata
Journaling enabled, a companion Journal file is
created.
• When an HDF5 API function that modifies
metadata is completed, a transaction is recorded in
the Journal file.
• If the application crashes, a recovery program can
replay the journal by applying in order all metadata
writes until the end of the last completed
transaction written to the journal file.

October 15, 2008

HDF and HDF-EOS Workshop XII

25
HDF5 Metadata Journaling Recovery
Application
crashed
h5recover tool

liFe Corrupted 5DFH

Restored
HDF5
File

Companion Journal File

Oct. 16 2008

HDF and HDF-EOS Workshop XII

26
Implementation Status
• Serial HDF5 with synchronous write mode
• Alpha1 released August 2008
• User interface (API definition and h5recover tool)
and file format may change

October 15, 2008

HDF and HDF-EOS Workshop XII

27
Metadata Journaling Plans
• Serial HDF5 with synchronous write mode
• Finalize User interface definitions and file format

• Serial HDF5 with asynchronous write mode
• To improve Journal file write speed

• More features (need funding)
• Make raw data operations atomic
• Allow "super‐transactions" to be created by
applications
• Enable journaling for Parallel HDF5

October 15, 2008

HDF and HDF-EOS Workshop XII

28
Questions?

October 15, 2008

HDF and HDF-EOS Workshop XII

29
Acknowledgement
• This report is based upon work supported in part
by a Cooperative Agreement with the National
Aeronautics and Space Administration (NASA)
under NASA Awards NNX06AC83A and
NNX08AO77A. Any opinions, findings, and
conclusions or recommendations expressed in
this material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.

October 16, 2008

HDF and HDF-EOS Workshop XII

30

What will be new in HDF5?

  • 1.
    What will benew in HDF5? October 15, 2008 HDF and HDF-EOS Workshop XII 1
  • 2.
    HDF5 Road Map Performance Easeof use Robustness Innovation October 15, 2008 HDF and HDF-EOS Workshop XII 2
  • 3.
    Outline • Performance improvements •Fortran 2003 features • HDF5 file recover (metadata journaling) October 15, 2008 HDF and HDF-EOS Workshop XII 3
  • 4.
    Performance Improvements October 15,2008 HDF and HDF-EOS Workshop XII 4
  • 5.
    Performance Improvements inHDF5 • Examples of completed work: • New implementation of metadata cache to improve I/O performance and memory usage when accessing many objects (HDF5 1.8.0) • Faster, more scalable storage and access for large groups (HDF5 1.8.0) October 15, 2008 HDF and HDF-EOS Workshop XII 5
  • 6.
    Performance Improvements inHDF5 • Work in progress • New implementation of free-space management • Affects “dynamic” applications that add/delete/modify existing objects • When creating HDF5 objects, space is allocated from available space tracked by the free-space manager • When deleting objects, unused space is added to the free-space pool via free-space manager • Current implementation uses O(N2) operations for each N allocations or freeing space • New implementation O(log2N) October 15, 2008 HDF and HDF-EOS Workshop XII 6
  • 7.
    Free-space Management • Test:creating/deleting attributes • • • • Create first set of attributes Delete odd-numbered attributes from the first set Create second set of attributes Delete all attributes from the second set Number of attributes Old New Improvement implementation implementation ratio 500 786.5 sec 68.2 sec 11.5x 1000 11000 sec 289 sec 38x October 15, 2008 HDF and HDF-EOS Workshop XII 7
  • 8.
    Performance Improvements inHDF5 • Work in progress • Fast data append (along slowest changing dimension) • Future areas of interest • Efficient chunking cache implementation (NetCDF4) • Efficient handling of the variable-length data including compression (will affect sizes of NPOESS files) October 15, 2008 HDF and HDF-EOS Workshop XII 8
  • 9.
    Fortran 2003 features October15, 2008 HDF and HDF-EOS Workshop XII 9
  • 10.
    Status of theHDF5 Fortran Library • HDF5 Fortran library is a part of standard HDF5 distribution • First release goes back to 1999 • Implemented as Fortran90 wrappers on top of the HDF5 C library • Supported on Linux, Windows, Mac Intel, Solaris, VMS, clusters, etc. • Compilers • Open source gfortran, g95 • Vendors (SUN, Intel, PGI, Absoft) • 32 and 64-bit versions October 15, 2008 HDF and HDF-EOS Workshop XII 10
  • 11.
    HDF5 Fortran Library •Mimics HDF5 C APIs • Fortran 90 features used • • • • • Modules Function overloading Function interfaces Dynamic memory allocation Optional parameters • Many Fortran APIs are simpler than their C counterparts October 15, 2008 HDF and HDF-EOS Workshop XII 11
  • 12.
    Current Limitations • Supportsonly “native” Fortran types such as • • • • INTEGER REAL CHARACTER DOUBLE PRECISION (obsolete Fortran feature) • No support for INTEGER*1, INTEGER*2, INTEGER*8, INTEGER*16, REAL*8, REAL*16 • Cannot write/read buffers of those types • Fortran types have to match C types • No support for –r8 and –r16 flags (Fortran real =/= C float) October 15, 2008 HDF and HDF-EOS Workshop XII 12
  • 13.
    Current Limitations • Nosupport for INTEGER(KIND=n) and REAL(KIND=m) • Integers n and m are called KIND parameters • Returned by selected _int_kind (r) -10r < n < 10r selected_real_kind(p,r) p – precision r – decimal exponent range October 15, 2008 HDF and HDF-EOS Workshop XII 13
  • 14.
    Current Limitations • Limitedsupport for derived types (compare with C structures and HDF5 compound datatypes) • Supports derived types with “native” fields only • Doesn’t support complex HDF5 datatypes (e.g., with array member, or nested compound) • Writes/reads HDF5 compound datasets by fields only • Cannot be used in parallel applications • No support for enum types • No support for callback functions October 15, 2008 HDF and HDF-EOS Workshop XII 14
  • 15.
    Current Limitations -Summary • Any application written according to Fortran 95/2003 standard will struggle using HDF5 Fortran Library • Many HDF5 features are not available to Fortran applications October 15, 2008 HDF and HDF-EOS Workshop XII 15
  • 16.
    Fortran 2003 Features •Fortran 2003 provides a standardized mechanism for interoperating with C • Module ISO_C_BINDING for interoperability of intrinsic types • C_PTR and C_FUNCPTR for interoperability with C pointers • C_LOC(x) and C_FUNLOC(x) inquiry functions for getting addresses of variables and procedures • BIND attribute for interoperability of derived types and C structures and enumerated types October 15, 2008 HDF and HDF-EOS Workshop XII 16
  • 17.
    Fortran 2003 Featuresand HDF5 • New 2003 features allowed us to support • Any Fortran INTEGER and REAL type data in HDF5 files • Fortran derived types and HDF5 compound datatypes • Fortran enumerated types and HDF5 enumerated types • HDF5 APIs with callbacks October 15, 2008 HDF and HDF-EOS Workshop XII 17
  • 18.
    Fortran Compilers with2003 Features Compiler Current versions and status of HDF5 Future versions Intel Versions 10.1 and 11 All F2003 functionality works in HDF5 g95 August 2008 and later; All Fortran 2003 functionality works in HDF5 gfortran Version 4.4 Limited support for C interoperability; HDF5 doesn’t work No plans from compiler developers to improve the support SUN compilers Express-July 2008 build; HDF5 doesn’t work No timeline for fixes; may be in a year PGI Version 7.2.1; HDF5 doesn’t work Fixes will be available in Version 8. October 15, 2008 HDF and HDF-EOS Workshop XII Version 11 will have a fix that will allow us to remove common blocks 18
  • 19.
    Example October 15, 2008 HDFand HDF-EOS Workshop XII 19
  • 20.
    Example October 15, 2008 HDFand HDF-EOS Workshop XII 20
  • 21.
    Example HDF5 "SDScompound.h5" { GROUP"/" { DATASET "ArrayOfStructures" { DATATYPE H5T_COMPOUND { H5T_ARRAY { [13] H5T_STRING { STRSIZE 1; STRPAD H5T_STR_SPACEPAD; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; } } "chr_name"; H5T_STD_I8LE "a_name"; H5T_IEEE_F64LE "c_name"; H5T_IEEE_F32LE "b_name"; } ….. October 15, 2008 HDF and HDF-EOS Workshop XII 21
  • 22.
    Information October 15, 2008 HDFand HDF-EOS Workshop XII 22
  • 23.
    HDF5 file recoveryor Surviving a System Failure through Metadata Journaling October 15, 2008 HDF and HDF-EOS Workshop XII 23
  • 24.
    Surviving a SystemFailure in HDF5 • Problem: • Data in an opened HDF5 files susceptible to corruption in the event of an application or system crash. • Corruption possible if an opened HDF5 file has been updated when the crash occurs. • Initial Objective: • Guarantee an HDF5 file with consistent metadata can be reconstructed in the event of a crash. • No guarantee on state of raw data – contains whatever made it to disk prior to crash. October 15, 2008 HDF and HDF-EOS Workshop XII 24
  • 25.
    Crash Survivability inHDF5 • Approach: Metadata Journaling • When an HDF5 file is opened with Metadata Journaling enabled, a companion Journal file is created. • When an HDF5 API function that modifies metadata is completed, a transaction is recorded in the Journal file. • If the application crashes, a recovery program can replay the journal by applying in order all metadata writes until the end of the last completed transaction written to the journal file. October 15, 2008 HDF and HDF-EOS Workshop XII 25
  • 26.
    HDF5 Metadata JournalingRecovery Application crashed h5recover tool liFe Corrupted 5DFH Restored HDF5 File Companion Journal File Oct. 16 2008 HDF and HDF-EOS Workshop XII 26
  • 27.
    Implementation Status • SerialHDF5 with synchronous write mode • Alpha1 released August 2008 • User interface (API definition and h5recover tool) and file format may change October 15, 2008 HDF and HDF-EOS Workshop XII 27
  • 28.
    Metadata Journaling Plans •Serial HDF5 with synchronous write mode • Finalize User interface definitions and file format • Serial HDF5 with asynchronous write mode • To improve Journal file write speed • More features (need funding) • Make raw data operations atomic • Allow "super‐transactions" to be created by applications • Enable journaling for Parallel HDF5 October 15, 2008 HDF and HDF-EOS Workshop XII 28
  • 29.
    Questions? October 15, 2008 HDFand HDF-EOS Workshop XII 29
  • 30.
    Acknowledgement • This reportis based upon work supported in part by a Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA Awards NNX06AC83A and NNX08AO77A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration. October 16, 2008 HDF and HDF-EOS Workshop XII 30

Editor's Notes

  • #25 Maybe Objective bullets do belong on later slide… not sure.