Migrating from HDF5 1.6 to
HDF5 1.8

October 15, 2008

HDF and HDF-EOS Workshop XII

1
Outline
• Status of the HDF5 1.6 and 1.8 releases
• Overview of the HDF5 1.8 features
• How to move applications to HDF5 1.8?

Status of HDF5 releases

Current HDF5 Releases
• HDF5 1.8.0 was released in February 2008
• Major update of HDF5 1.6.* series (stable set of
features and APIs since 1998)
• New features
• 200 new APIs
• Changes to file format
• Changes to APIs
• Backward compatible

• HDF5 1.8.1 was released in June 2008
• Minor bug fixes
• Included Fortran90 APIs for new C functions
Current HDF5 Releases
• HDF5 1.6.7 was released in February 2008
• Addressed backward compatibility bug for reading
files with corrupted object header information

• New maintenance releases will be in November
2008
• HDF5 1.6.8 and 1.8.2
• Minor bug fixes
• Tools improvements

• Current plans are to support HDF5 1.6 and 1.8
until November 2009

Information About Current Releases

http://www.hdfgroup.org/HDF5

Goal of the Tutorial
• Help with transition to the 1.8 releases
• Discuss new features beneficial to applications
written for 1.6 releases
• Raise awareness about forward/backward
compatibility issues with the 1.8 releases
• Get feedback from the users who already moved
to 1.8 releases

Why New Features?
• Need to address some deficiencies in initial
design
• Examples:
• Big overhead in file sizes
• Non-tunable metadata cache implementation
• Handling of free-space in a file

Why New Features?
• Need to address new requirements
• Add support for
• New types of indexing (object creation order)
• Big volumes of variable-length data (DNA
sequences)
• Simultaneous real-time streams (fast append to
one-dimensional datasets)
• UTF-8 encoding for objects’ path names
• Accessing objects stored in other HDF5 files
(external or user-defined links)

What Did We Do in HDF5 1.8?
• Extended File Format Specification
• Reviewed group implementations
• Introduced new link object
• Revamped metadata cache implementation
• Improved handling of datasets and datatypes
• Introduced shared object header message
• Extended error handling
• Enhanced backward/forward APIs and file format
compatibility

What Did We Do in HDF5 1.8?
And much more good stuff to make HDF5

Better and Faster

HDF5 File Format Extension

HDF5 File Format Extension
• Why:
• Address deficiencies of the original file format
• Address space overhead in an HDF5 file
• Enable new features

• What:
• New routine that instructs the HDF5 library to
create all objects using the latest version of the
HDF5 file format (by default, each object is written
with the earliest format version in which it became
available; for example, the array datatype)

HDF5 File Format Extension
Example
/* Use the latest version of a file format for each
object created in a file */
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST,
H5F_LIBVER_LATEST);
fid = H5Fcreate(…,…,…,fapl_id);
or
fid = H5Fopen(…,…,fapl_id);

Group Revisions

Better Large Group Storage
• Why:
• Faster, more scalable storage and access for large
groups

• What:
• New format and method for storing groups with
many links

Informal Benchmark
• Create a file and a group in a file
• Create up to 10^6 groups with one dataset in
each group
• Compare file sizes and performance of HDF5
1.8.1 using the latest group format against
HDF5 1.8.1 (default, old format) and 1.6.7
• Note: Default 1.8.1 and 1.6.7 became very slow
after 700,000 groups

Time to Open and Read a Dataset

Time to Close the File

File Size

Access Links by Creation Order
• Why:
• Allow iteration and lookup of group’s links
(children) by creation order as well as by name
order
• Support NetCDF access model for NetCDF-4

• What:
• Option to access objects in group according to
relative creation time
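
The difference between the two orderings can be sketched in plain C, without the HDF5 library. This is only an illustration of the idea: `demo_link_t`, `corder` and `demo_sort_links` are hypothetical names, and the sketch says nothing about how HDF5 stores its indices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a link: its name plus a creation counter. */
typedef struct {
    const char *name;
    unsigned    corder;   /* 0 for the first link created, 1 for the next, ... */
} demo_link_t;

/* Comparator for name order. */
static int by_name(const void *a, const void *b)
{
    return strcmp(((const demo_link_t *)a)->name,
                  ((const demo_link_t *)b)->name);
}

/* Comparator for creation order. */
static int by_corder(const void *a, const void *b)
{
    unsigned x = ((const demo_link_t *)a)->corder;
    unsigned y = ((const demo_link_t *)b)->corder;
    return (x > y) - (x < y);
}

/* Sort n links in place by name (use_corder == 0) or by creation order. */
void demo_sort_links(demo_link_t *links, size_t n, int use_corder)
{
    assert(links != NULL);
    qsort(links, n, sizeof links[0], use_corder ? by_corder : by_name);
}
```

For links created in the order "c", "b", "a", name order lists them as a, b, c, while creation order lists them as c, b, a.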

Access Links by Creation Order
Example
/* Track and index creation order of the links */
H5Pset_link_creation_order(gcpl_id,
(H5P_CRT_ORDER_TRACKED | H5P_CRT_ORDER_INDEXED));
/* Create a group */
gid = H5Gcreate(fid, GNAME, H5P_DEFAULT, gcpl_id,
H5P_DEFAULT);

Example: h5dump --group=1 tordergr.h5
HDF5 "tordergr.h5" {
GROUP "1" {
   GROUP "a" {
      GROUP "a1" {
      }
      GROUP "a2" {
         GROUP "a21" {
         }
         GROUP "a22" {
         }
      }
   }
   GROUP "b" {
   }
   GROUP "c" {
…
Example: h5dump --sort_by=creation_order
HDF5 "tordergr.h5" {
GROUP "1" {
   GROUP "c" {
   }
   GROUP "b" {
   }
   GROUP "a" {
      GROUP "a1" {
      }
      GROUP "a2" {
         GROUP "a22" {
         }
         GROUP "a21" {
         }
      }
   }
Compact Groups
• Why:
• Save space and access time for small groups
• If groups are small, don’t need B-tree overhead

• What:
• Alternate storage for groups with few links
• Default storage when “latest format” is specified
• Library converts to “original” storage (B-tree based)
using default or user-specified threshold

Compact Groups
• Example
• File with 11,600 groups
• With original group structure, file size ~ 20 MB
• With compact groups, file size ~ 12 MB
• Total savings: 8 MB (40%)
• Average savings/group: ~700 bytes

Compact Groups
Example
/* Change storage to “dense” if number of group
members is bigger than 16 and go back to compact
storage if number of group members is smaller than
12 */
H5Pset_link_phase_change(gcpl_id, 16, 12);
/* Create a group */
g_id = H5Gcreate(…,…,…,gcpl_id,…);
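
The two thresholds form a hysteresis band: a group does not flip back and forth while its link count stays between them. A minimal model of that rule, as described in the comment above, is sketched here in plain C. The names `demo_storage_t` and `demo_next_storage` are hypothetical and this is not the library's code.

```c
#include <assert.h>

typedef enum { DEMO_COMPACT, DEMO_DENSE } demo_storage_t;

/*
 * Return the storage form after the link count changes to nlinks.
 * max_compact and min_dense play the roles of the second and third
 * arguments of H5Pset_link_phase_change (16 and 12 in the slide).
 */
demo_storage_t demo_next_storage(demo_storage_t cur, unsigned nlinks,
                                 unsigned max_compact, unsigned min_dense)
{
    assert(min_dense <= max_compact);
    if (cur == DEMO_COMPACT && nlinks > max_compact)
        return DEMO_DENSE;      /* grew past the upper threshold */
    if (cur == DEMO_DENSE && nlinks < min_dense)
        return DEMO_COMPACT;    /* shrank below the lower threshold */
    return cur;                 /* inside the band: no conversion */
}
```

With thresholds 16 and 12, a compact group stays compact at 16 links and converts at 17; a dense group stays dense at 14 links and converts back at 11.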

Intermediate Group Creation
• Why:
• Simplify creation of a series of connected groups
• Avoid having to create each intermediate group
separately, one by one

• What:
• Intermediate groups can be created when creating
an object in a file, with one function call

Intermediate Group Creation
• Want to create “/A/B/C/dset1”
• “A” exists, but “B”, “C” and “dset1” do not

Before:  /        After:  /
         A                A
                          B
                          C
                          dset1

One call creates groups “B” & “C”, then creates
“dset1”

Intermediate Group Creation
Example
/* Create link creation property list */
lcrp_id = H5Pcreate(H5P_LINK_CREATE);
/* Set flag for intermediate group creation
Groups B and C will be created automatically */
H5Pset_create_intermediate_group(lcrp_id, TRUE);
ds_id = H5Dcreate (file_id, "/A/B/C/dset1",…,…,
lcrp_id,…,…);

Link Revisions

What are Links?
• Links connect groups to their members
• “Hard” links point to a target by address
• “Soft” links store the path to a target

[Diagram: the root group holds a hard link (<address>) and a soft link
(“/target dataset”), each leading to a dataset.]
Links: Before and After
• New data model for handling links
• Links may have properties (UTF-8 name
encoding, creation order indexing, storage
property, etc.)
Before: Group -> Name -> Object
After:  Group -> Name and other properties -> Object

Anonymous Object
• Object can be created without being immediately
linked into graph structure
• Group, dataset and datatype

• See new H5*create_anon APIs
• Use H5O* APIs to manipulate the objects
[Diagram: a group and an object.]
New: External Links
• Why:
• Access objects stored in other HDF5 files in a
transparent way

• What:
• Store location of file and path within that file
• Can link across files

New: External Links
[Diagram: file1.h5 and file2.h5, each with its own root group. In file1.h5
the link “External_link” stores the file name “file2.h5” and the path
“/A/B/C/D/E”; in file2.h5 that path leads to the target group.]

External link object “External_link” in file1.h5 points to the group
/A/B/C/D/E in file2.h5
External Links
Example
/* Create an external link */
H5Lcreate_external(TARGET_FILE, "/A/B/C/D/E",
source_file_id, "External_link", …,…);
/* Use the external link to create a group in the
target file */
gr_id = H5Gcreate(source_file_id, "External_link/F",…,
…,…);
/* We can access group "External_link/F" in the source
file and group "/A/B/C/D/E/F" in the target file */
New: User-defined Links
• Why:
• Allow applications to create their own kinds of links and
link operations, such as
• Create “hard” external link that finds an object by address
• Create link that accesses a URL
• Keep track of how often a link is accessed, or other behavior

• What:
• Applications can create new kinds of links by supplying
custom callback functions
• Can do anything HDF5 hard, soft, or external links do

Traversing an HDF5 File

Traversing HDF5 File
• Why:
• Allow applications to iterate through the objects in a
group or visit recursively all objects under a group

• What:
• New APIs to traverse a group hierarchy
• New APIs to iterate through a group using different
types of indices (name or creation order)
• H5Giterate is deprecated in favor of new functions

Traversing HDF5 File
Example of some new APIs
/* Check whether a link named "A/B" exists under the root group */
H5Lexists(file_id, "A/B", …);
/* Iterate through group members of a root group
using name as an index; this function doesn’t
recursively follow links into subgroups */
H5Literate(file_id, H5_INDEX_NAME, H5_ITER_INC, &idx,
iter_link_cb, &info);
/* Visit all links under the root group; this
function recursively follows links into subgroups */
H5Lvisit(file_id, H5_INDEX_NAME, H5_ITER_INC,
visit_link_cb, &info);
Traversing HDF5 File
• Things to remember
• Never use H5Ldelete in any HDF5 iterate or
visit callback function
• Always close the parent object before deleting a
child object

Shared Object Header
Messages

Shared Object Header Messages
• Why: metadata duplicated many times, wasting
space
• Example:
• You create a file with 10,000 datasets
• All use the same datatype and dataspace
• HDF5 needs to write this information 10,000 times!
Dataset 1      Dataset 2      Dataset 3
datatype       datatype       datatype
dataspace      dataspace      dataspace
data 1         data 2         data 3

Shared Object Header Messages
What:
• Enable messages to be shared automatically
• HDF5 shares duplicated messages on its own!

[Diagram: Dataset 1 and Dataset 2 each keep their own data (data 1, data 2)
but refer to a single shared datatype and a single shared dataspace.]

Shared Messages
• Happens automatically
• Works with datatypes, dataspaces, attributes, fill
values, and filter pipelines
• Saves space if these objects are relatively large
• May be faster if HDF5 can cache shared
messages
• Drawbacks
• Usually slower than non-shared messages
• Adds overhead to the file
• Index for storing shared datatypes
• 25 bytes per instance

• Older library versions can’t read files with shared
messages
Two Informal Tests
• File with 24 datasets, all with same big datatype
• 26,000 bytes normally
• 17,000 bytes with shared messages enabled
• Saves 375 bytes per dataset

• But, make a bad decision: invoke shared
messages but only create one dataset…
• 9,000 bytes normally
• 12,000 bytes with shared messages enabled
• Probably slower when reading and writing, too.

• Moral: shared messages can be a big help, but
only in the right situation!
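
The per-dataset figure follows directly from the two file sizes. A small plain-C check of that arithmetic, using only the numbers quoted above (the function name `demo_saving_per_dataset` is hypothetical):

```c
#include <assert.h>

/* Bytes saved per dataset when sharing is enabled; negative means bytes lost. */
long demo_saving_per_dataset(long plain_bytes, long shared_bytes, long ndatasets)
{
    assert(ndatasets > 0);
    return (plain_bytes - shared_bytes) / ndatasets;
}
```

For the first test, (26,000 - 17,000) / 24 gives 375 bytes saved per dataset; for the second, (9,000 - 12,000) / 1 gives a loss of 3,000 bytes.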

Error Handling

Extendible Error-handling APIs
• Why: Enable application to integrate error reporting with
HDF5 library error stack
• What: New error handling API
• H5Epush - push major and minor error ID on specified error stack
• H5Eprint - print specified stack
• H5Ewalk - walk through specified stack
• H5Eclear - clear specified stack
• H5Eset_auto - turn error printing on/off for specified stack
• H5Eget_auto - return settings for specified stack traversal

Error-handling Programming Model
• Create new class, major and minor error messages
• Register messages with the HDF5 library
• Manage errors
• Use default or create new error stack
• Push error
• Print error stack
• Close stack

Error-handling Example
#define ERR_CLS_NAME       "Error Test"
#define PROG_NAME          "Error Program"
#define PROG_VERS          "1.0"
……
#define ERR_MAJ_TEST_MSG   "Error in test"
#define ERR_MIN_MYFUNC_MSG "Error in my function"
……
/* Initialize error information for application */
ERR_CLS = H5Eregister_class(ERR_CLS_NAME, PROG_NAME, PROG_VERS);
ERR_MAJ_TEST = H5Ecreate_msg(ERR_CLS, H5E_MAJOR, ERR_MAJ_TEST_MSG);
ERR_MIN_MYFUNC = H5Ecreate_msg(ERR_CLS, H5E_MINOR,
ERR_MIN_MYFUNC_MSG);
……..
/* Unregister major and minor error, and class handles when done */
H5Eunregister_class(ERR_CLS);

Error-handling Example
/* This function creates and writes a dataset */
static herr_t my_function(hid_t fid)
{
…….
/* Force this function to fail and make it push error */
H5E_BEGIN_TRY {
dataset = H5Dcreate1(FAKE_ID, DSET_NAME, H5T_STD_I32BE, space,
H5P_DEFAULT);
} H5E_END_TRY;
if(dataset < 0) {
H5Epush(H5E_DEFAULT, __FILE__, FUNC_my_function, __LINE__,
ERR_CLS, ERR_MAJ_IO, ERR_MIN_CREATE, "H5Dcreate failed");
goto error;
} /* end if */
……
Error-handling Example
Error Test-DIAG: Error detected in Error Program (1.0) thread 0:
  #000: error_example.c line 160 in main(): Error stack test failed
    major: Error in test
    minor: Error in my function
  #001: error_example.c line 100 in my_function(): H5Dcreate failed
    major: Error in IO
    minor: Error in H5Dcreate
HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
  #002: H5Ddeprec.c line 154 in H5Dcreate1(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #003: H5Gloc.c line 241 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
Metadata Cache

Metadata Cache Improvements
• Why:
• Improve I/O performance and memory usage when
accessing many objects

• What:
• New metadata cache APIs
• control cache size
• monitor actual cache size and current hit rate

• Under the hood: adaptive cache resizing
• Automatically detects the current working set size
• Sets max cache size to the working set size

Metadata Cache Improvements
• Note: most applications do not need to worry
about the cache.
• See “Special topics” in the HDF5 User’s Guide for
details.
• And if you do see unusual memory growth or poor
performance, please contact us. We want to help
you.

Dataset and Datatype
Improvements

Text-based Datatype Descriptions
• Why:
• Simplify data type creation
• Make data type creation code more readable
• Facilitate debugging by printing the text description
of a data type

• What:
• New routines to create an HDF5 data type through
the text description of the data type and get a text
description from the HDF5 data type

Text Datatype Description
Example
/* Create the data type from DDL text description */
dtype = H5LTtext_to_dtype(
"H5T_IEEE_F32BE\n", H5LT_DDL);
/* Convert the data type back to text: the first call, with a
NULL buffer, only returns the required length in str_len */
H5LTdtype_to_text(dtype, NULL, H5LT_DDL, &str_len);
dt_str = (char*)calloc(str_len, sizeof(char));
H5LTdtype_to_text(dtype, dt_str, H5LT_DDL, &str_len);

Serialized Datatypes and Dataspaces
• Why:
• Allow datatype and dataspace info to be
transmitted between processes
• Allow datatype/dataspace to be stored in non-HDF5 files

• What:
• A new set of routines to serialize/deserialize HDF5
datatypes and dataspaces.

Serialized Datatypes and Dataspaces
Example
/* Find the buffer length and encode a datatype into
buffer */
status = H5Tencode(t_id, NULL, &cmpd_buf_size);
cmpd_buf = (unsigned char*)calloc(1, cmpd_buf_size);
H5Tencode(t_id, cmpd_buf, &cmpd_buf_size);
/* Decode a binary description of a datatype and
return a datatype handle */
t_id = H5Tdecode(cmpd_buf);

Integer to Float Convert During I/O
• Why:
• HDF5 1.6 and earlier supported conversion only within
the same class (16-bit integer <-> 32-bit integer,
64-bit float <-> 32-bit float)
• Conversion needed to support NetCDF-4
programming model

• What:
• Integer to float conversion supported during I/O

Integer to Float Convert During I/O
Example: conversion is transparent to application
/* Create a dataset of 64-bit little-endian floating-point type */
dset_id = H5Dcreate(loc_id, "Mydata",
H5T_IEEE_F64LE, space_id,
H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
/* Write integer data to "Mydata" */
status = H5Dwrite(dset_id, H5T_NATIVE_INT, …);

Revised Conversion Exception Handling
• Why:
• Give apps greater control over exceptions (range
errors, etc.) during datatype conversion
• Needed to support NetCDF-4 programming model

• What:
• Revised conversion exception handling

Revised Conversion Exception Handling
• To handle exceptions during conversions, register
handling function through H5Pset_type_conv_cb().
• Cases of exception:
• H5T_CONV_EXCEPT_RANGE_HI
• H5T_CONV_EXCEPT_RANGE_LOW
• H5T_CONV_EXCEPT_TRUNCATE
• H5T_CONV_EXCEPT_PRECISION
• H5T_CONV_EXCEPT_PINF
• H5T_CONV_EXCEPT_NINF
• H5T_CONV_EXCEPT_NAN

• Return values: H5T_CONV_ABORT,
H5T_CONV_UNHANDLED,
H5T_CONV_HANDLED

Compression Filter for N-bit Data
• Why:
• Compact storage for user-defined datatypes

• What:
• When data is stored on disk, padding bits are chopped
off and only significant bits are stored
• Works with compound datatypes

N-bit Compression Example
• In memory, one value of N-Bit datatype is stored like this:
| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|
S-sign bit

P-significant bit

?-padding bit

• After passing through the N-Bit filter, all padding bits are
chopped off, and the bits are stored on disk like this:
|    1st value    |    2nd value    |
|SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...

• Opposite (decompress) when going from disk to memory
• Limited to integer and floating-point data
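
The bit layout above (16 significant bits starting 4 bits above bit 0 of a 32-bit word) can be mimicked with plain shifts and masks. This is a simplified sketch, not the filter's implementation: `demo_nbit_pack` and `demo_nbit_unpack` are hypothetical names, and the sketch simply zeroes the padding bits on the way back instead of modelling how the library fills them.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only `precision` significant bits that start `offset` bits above bit 0. */
uint32_t demo_nbit_pack(uint32_t in_memory, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision < 32 && offset + precision <= 32);
    uint32_t mask = (UINT32_C(1) << precision) - 1;
    return (in_memory >> offset) & mask;
}

/* Put the significant bits back at their in-memory position; padding becomes 0. */
uint32_t demo_nbit_unpack(uint32_t packed, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision < 32 && offset + precision <= 32);
    uint32_t mask = (UINT32_C(1) << precision) - 1;
    return (packed & mask) << offset;
}
```

With offset 4 and precision 16, the in-memory word 0xABC1234F packs to 0x1234, and unpacking 0x1234 gives 0x00012340.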
N-bit Compression Example
Example
/* Create a N-bit datatype */
dt_id = H5Tcopy(H5T_STD_I32LE);
H5Tset_precision(dt_id, 16);
H5Tset_offset(dt_id, 4);
/* Create and write a dataset */
dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl_id, …);
H5Pset_nbit(dcpl_id);
dset_id = H5Dcreate(…,…,…,…,…,dcpl_id,…);
H5Dwrite(dset_id,…,…,…,…,buf);

Offset+size Storage Filter
• Why:
• Use less storage when less precision needed

• What:
• Performs scale/offset operation on each value
• Truncates result to fewer bits before storing
• Currently supports integers and floats
• Precision may be lost

Example with Floating-Point Type
• Data: {104.561, 99.459, 100.545, 105.644}
• Choose scaling factor: decimal precision to keep
E.g. scale factor D = 2
1. Find minimum value (offset): 99.459
2. Subtract minimum value from each element
Result: {5.102, 0, 1.086, 6.185}
3. Scale data by multiplying by 10^D = 100
Result: {510.2, 0, 108.6, 618.5}
4. Round the data to integer
Result: {510 , 0, 109, 619}
5. Pack and store using min number of bits
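
The five steps can be followed in a few lines of plain C. This sketch mirrors the steps as listed above, not the filter's internal code; `demo_scaleoffset` and `demo_min_bits` are hypothetical names. Note that the last sample value sits exactly on a rounding boundary (618.5), so in binary floating point it may round to 618 or 619.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Steps 1-4: subtract the minimum, multiply by 10^d, round to the
 * nearest integer. Returns the minimum, which serves as the offset.
 */
double demo_scaleoffset(const double *in, unsigned long *out, size_t n, unsigned d)
{
    assert(in != NULL && out != NULL && n > 0);
    double min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];
    double scale = 1.0;
    for (unsigned k = 0; k < d; k++)
        scale *= 10.0;
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned long)((in[i] - min) * scale + 0.5); /* values are >= 0 */
    return min;
}

/* Step 5: smallest number of bits able to hold every value in out[]. */
unsigned demo_min_bits(const unsigned long *out, size_t n)
{
    assert(out != NULL);
    unsigned long max = 0;
    for (size_t i = 0; i < n; i++)
        if (out[i] > max)
            max = out[i];
    unsigned bits = 1;
    while (max >>= 1)
        bits++;
    return bits;
}
```

For the four sample values and D = 2, every scaled integer is below 1024, so 10 bits per value are enough instead of the 32 or 64 bits of the original floating-point type.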

Offset+size Storage Filter
Example
/* Use scale+offset filter on integer data; let
library figure out the minimum number of bits
necessary to store the data without loss of
precision */
H5Pset_scaleoffset
(dcpl_id,H5Z_SO_INT,H5Z_SO_INT_MINBITS_DEFAULT);
H5Pset_chunk(dcpl_id,…,…);
dset_id = H5Dcreate(…,…,…,…,…,dcpl_id, …);

/* Use scale+offset filter on floating-point data;
compression may be lossy */
H5Pset_scaleoffset(dcpl_id,H5Z_SO_FLOAT_DSCALE,2 );

“NULL” Dataspace
• Why:
• Allow datasets with no elements to be described
• NetCDF-4 needed a “place holder” for attributes

• What:
• A dataset with no dimensions, no data

NULL Dataspace
Example
/* Create a dataset with "NULL" dataspace */
sp_id = H5Screate(H5S_NULL);
dset_id = H5Dcreate(…,"IntArray",…,sp_id,…,…,…);

HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  NULL
      DATA {
      }
   }
}
}
Part II
How to Move an Application
to HDF5 1.8

If you are new to HDF5 …
• Start with the HDF5 Tutorial and use HDF5 1.8.1
(or later) libraries
http://www.hdfgroup.org/HDF5/Tutor/

• Look at the comprehensive set of examples

http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-

• Use HDF5 Reference Manual 1.8.1 or later
http://www.hdfgroup.org/HDF5/doc
to get the latest APIs

• Use h5cc, h5fc, h5c++ scripts to build your
applications
If you are new to HDF5 …
• To check library version
• Talk to your system administrator
• Try any HDF5 command line utility with -V flag
./h5dump -V
h5dump: Version 1.8.1

• Look at the libhdf5.settings file on your system
(under lib directory in the HDF5 installation
directory); you should see the following lines
HDF5 Version: 1.8.1
Default API Mapping: v18 (can be v16)

• API mapping indicates API version (1.8 in this
case)
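
A build script or helper tool may want to test the version string it finds there. A minimal plain-C sketch, assuming the version has already been extracted as text in the form "major.minor.release" (the function name `demo_is_18_or_later` is hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse "major.minor.release". Returns 1 if the version is 1.8.0 or
 * newer, 0 if it is older, and -1 if the text cannot be parsed.
 */
int demo_is_18_or_later(const char *version)
{
    unsigned maj, min, rel;
    assert(version != NULL);
    if (sscanf(version, "%u.%u.%u", &maj, &min, &rel) != 3)
        return -1;
    return (maj > 1 || (maj == 1 && min >= 8)) ? 1 : 0;
}
```

Called with "1.8.1" the function returns 1, with "1.6.7" it returns 0.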
If you are new to HDF5 …
• Use the latest version of API from the Reference
Manual (for example, H5Gcreate or H5Gcreate2
instead of H5Gcreate1)
• Be aware of forward compatibility issues
• If you use HDF5 features introduced in 1.8.0 and
later, you may create a file that old applications
and third party tools may not be able to read
http://www.hdfgroup.org/HDF5/doc/ADGuide/CompatFormat180.html

• All HDF5 command line utilities and tools should
read HDF5 files created during the last 10 years
• Report problems to help@hdfgroup.org
If you used HDF5 before….
• And would like to switch to 1.8….
• Or you build and install HDF5 Libraries for
yourself or others….
• You need to know about HDF5 API versioning.

Introduction to HDF5 API
Versioning

API Versioning Example
• Old way
H5Gcreate ( loc_id, "New/My old group", 0 )
H5Gcreate1( loc_id, "New/My old group", 0 )
• No new features can be invoked

• New way
H5Gcreate ( loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)
H5Gcreate2( loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)

• New features can be invoked
• Creation order
• Unicode names
• Compact storage
• Intermediate group creation

• In 1.8.0 and later, some functions have a version number
suffix.
HDF5 Macros for API Compatibility
Name: H5Gcreate
Signature:
hid_t H5Gcreate( hid_t loc_id, const char *name, size_t
size_hint )
hid_t H5Gcreate( hid_t loc_id, const char *name, hid_t
lcpl_id, hid_t gcpl_id, hid_t gapl_id )
Purpose:
Creates a new empty group and links it to a location in the
file.
Description:
H5Gcreate is a macro that is mapped to either
H5Gcreate1 or
H5Gcreate2, depending on the needs of the application.
HDF5 Library Configuration
• Mapping of APIs is set at configuration time
• Several configuration flags are provided to enable
different mappings
• Macro name is mapped to old API, e.g., H5Gcreate
is mapped to H5Gcreate1
• Macro name is mapped to new API, e.g.,
H5Gcreate is mapped to H5Gcreate2 (will have
different parameters from 1.6 H5Gcreate)
• Disable old APIs completely, e.g., H5Gcreate1 is
not available in the library
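
The mapping mechanism itself is ordinary preprocessor work. The sketch below shows the general idea with made-up names (`demo_create`, `demo_create1`, `demo_create2`, `DEMO_USE_16_API`); it is not a copy of the HDF5 headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical versioned functions standing in for H5Gcreate1 / H5Gcreate2. */
int demo_create1(const char *name, size_t size_hint)
{
    assert(name != NULL);
    (void)size_hint;
    return 1;   /* reports which version ran */
}

int demo_create2(const char *name, int lcpl, int gcpl, int gapl)
{
    assert(name != NULL);
    (void)lcpl; (void)gcpl; (void)gapl;
    return 2;
}

/* The unversioned name is only a macro; a compile-time switch picks the target. */
#ifdef DEMO_USE_16_API
#define demo_create demo_create1
#else
#define demo_create demo_create2
#endif
```

Compiled without `-DDEMO_USE_16_API`, a call such as `demo_create("g", 0, 0, 0)` resolves to `demo_create2`; both versioned names stay callable either way, which is the situation the first two configurations above describe.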

If you are an HDF5 library maintainer…
• If you install the HDF5 Library, there
are several choices
• Install library with 1.8 APIs as default APIs (default,
recommended).
• Install library with 1.6 APIs.
• Install library without 1.6 APIs to get smaller
footprint.
• Install library with strict file format checking option
to detect files that do not comply with the File
Format Specification.

HDF5 Library Configuration
Configure flag                             Public API version   Example / effect
--with-default-api-version=v18 (default)   1.8                  H5Gcreate is mapped to H5Gcreate2; old H5Gcreate is H5Gcreate1
--enable-strict-format-checks              1.8                  Library checks compliance with the file format
--disable-deprecated-symbols               1.8                  H5Gcreate is mapped to H5Gcreate2; H5Gcreate1 is not available
--with-default-api-version=v16             1.6                  H5Gcreate is mapped to H5Gcreate1; H5Gcreate2 is available

If you used HDF5 before…
• And want to use new HDF5 1.8 Library with the
application written for the older versions
h5cc -DH5_USE_16_API my_program.c
• H5Gcreate is mapped to H5Gcreate1; all three
H5Gcreate1, H5Gcreate2 and H5Gcreate can be
used

• HDF5 compilation scripts and H5_USE_16_API
flag can be used with GNU auto tools to build
packages like HDF-EOS5.
• The HDF Group runs daily tests for HDF-EOS5
using the latest HDF5 Libraries under development

If you used HDF5 before …
• Other available options:
Assuming both deprecated and new symbols are
available in the library:

h5cc my_program.c
• H5Gcreate1, H5Gcreate2 and H5Gcreate may all be used
• H5Gcreate will have new signature

h5cc -DH5_NO_DEPRECATED_SYMBOLS my_program.c
• Only new symbols are available for application;
H5Gcreate is mapped to H5Gcreate2; application
may use both, but cannot use H5Gcreate1

Example: --with-default-api-version=v18
hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Create several groups in a file */
grp1_id = H5Gcreate (file_id, "New/A",
H5P_DEFAULT, gcpt, gapt);
grp2_id = H5Gcreate1(file_id, "/B", 0);
…
grp3_id = H5Gcreate2(file_id, "New/C",
H5P_DEFAULT, gcpt, gapt);

Example: --with-default-api-version=v16
hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Create several groups in a file */
grp1_id = H5Gcreate (file_id, "/A", 0);
grp2_id = H5Gcreate1(file_id, "/B", 0);
grp3_id = H5Gcreate2(file_id, "New/C",
H5P_DEFAULT, gcpt, gapt);

Example: --disable-deprecated-symbols
hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Create several groups in a file */
grp1_id = H5Gcreate (file_id, "New/A",
H5P_DEFAULT, gcpt, gapt);
/* Compilation will fail: H5Gcreate1 is not available */
grp2_id = H5Gcreate1(file_id, "/B", 0);
grp3_id = H5Gcreate2(file_id, "New/C",
H5P_DEFAULT, gcpt, gapt);

Thank you!
Questions?

Acknowledgement
• This report is based upon work supported in part
by a Cooperative Agreement with the National
Aeronautics and Space Administration (NASA)
under NASA Awards NNX06AC83A and
NNX08AO77A. Any opinions, findings, and
conclusions or recommendations expressed in
this material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.


Migrating from HDF5 1.6 to 1.8

  • 1.
    Migrating from HDF51.6 to HDF5 1.8 October 15, 2008 HDF and HDF-EOS Workshop XII 1
  • 2.
    Outline • Status ofthe HDF5 1.6 and 1.8 releases • Overview of the HDF5 1.8 features • How to move applications to HDF5 1.8 ? October 15, 2008 HDF and HDF-EOS Workshop XII 2
  • 3.
    Status of HDF5releases October 15, 2008 HDF and HDF-EOS Workshop XII 3
  • 4.
    Current HDF5 Releases •HDF5 1.8.0 was released in February 2008 • Major update of HDF5 1.6.* series (stable set of features and APIs since 1998) • • • • • New features 200 new APIs Changes to file format Changes to APIs Backward compatible • HDF5 1.8.1 was released in June 2008 • Minor bug fixes • Included Fortran90 APIs for new C functions October 15, 2008 HDF and HDF-EOS Workshop XII 4
  • 5.
    Current HDF5 Releases •HDF5 1.6.7 was released in February 2008 • Addressed backward compatibility bug for reading files with corrupted object header information • New maintenance releases will be in November 2008 • HDF5 1.6.8 and 1.8.2 • Minor bug fixes • Tools improvements • Current plans are to support HDF5 1.6 and 1.8 until November 2009 October 15, 2008 HDF and HDF-EOS Workshop XII 5
  • 6.
    Information About CurrentReleases http://www.hdfgroup.org/HDF5 October 15, 2008 HDF and HDF-EOS Workshop XII 6
  • 7.
    Goal of theTutorial • Help with transition to the 1.8 releases • Discuss new features beneficial to applications written for 1.6 releases • Raise awareness about forward/backward compatibility issues with the 1.8 releases • Get feedback from the users who already moved to 1.8 releases October 15, 2008 HDF and HDF-EOS Workshop XII 7
  • 8.
    Why New Features? •Need to address some deficiencies in initial design • Examples: • Big overhead in file sizes • Non-tunable metadata cache implementation • Handling of free-space in a file October 15, 2008 HDF and HDF-EOS Workshop XII 8
  • 9.
    Why New Features? •Need to address new requirements • Add support for • New types of indexing (object creation order) • Big volumes of variable-length data (DNA sequences) • Simultaneous real-time streams (fast append to one -dimensional datasets) • UTF-8 encoding for objects’ path names • Accessing objects stored in another HDF5 files (external or user-defined links) October 15, 2008 HDF and HDF-EOS Workshop XII 9
  • 10.
    What Did WeDo in HDF5 1.8? • • • • • • • • Extended File Format Specification Reviewed group implementations Introduced new link object Revamped metadata cache implementation Improved handling of datasets and datatypes Introduced shared object header message Extended error handling Enhanced backward/forward APIs and file format compatibility October 15, 2008 HDF and HDF-EOS Workshop XII 10
  • 11.
    What Did We Do in HDF5 1.8?
    And much more good stuff to make HDF5 Better and Faster
  • 12.
    HDF5 File Format Extension
  • 13.
    HDF5 File Format Extension
    • Why:
      • Address deficiencies of the original file format
      • Address space overhead in an HDF5 file
      • Enable new features
    • What:
      • New routine that instructs the HDF5 library to create all objects using the latest version of the HDF5 file format (compare with the earliest version in which the object became available, for example, the array datatype)
  • 14.
    HDF5 File Format Extension Example
    /* Use the latest version of a file format for each object created in a file */
    fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    fid = H5Fcreate(…,…,…,fapl_id);
    or
    fid = H5Fopen(…,…,fapl_id);
  • 15.
    Group Revisions
  • 16.
    Better Large Group Storage
    • Why:
      • Faster, more scalable storage and access for large groups
    • What:
      • New format and method for storing groups with many links
  • 17.
    Informal Benchmark
    • Create a file and a group in a file
    • Create up to 10^6 groups with one dataset in each group
    • Compare file sizes and performance of HDF5 1.8.1 using the latest group format with the performance of HDF5 1.8.1 (default, old format) and 1.6.7
    • Note: Default 1.8.1 and 1.6.7 became very slow after 700000 groups
  • 18.
    Time to Open and Read a Dataset
  • 19.
    Time to Close the File
  • 20.
    File Size
  • 21.
    Access Links by Creation Order
    • Why:
      • Allow iteration and lookup of group’s links (children) by creation order as well as by name order
      • Support NetCDF access model for NetCDF-4
    • What:
      • Option to access objects in group according to relative creation time
  • 22.
    Access Links by Creation Order Example
    /* Track and index creation order of the links */
    H5Pset_link_creation_order(gcpl_id, (H5P_CRT_ORDER_TRACKED | H5P_CRT_ORDER_INDEXED));
    /* Create a group */
    gid = H5Gcreate(fid, GNAME, H5P_DEFAULT, gcpl_id, H5P_DEFAULT);
  • 23.
    Example: h5dump --group=1 tordergr.h5
    HDF5 "tordergr.h5" {
    GROUP "1" {
       GROUP "a" {
          GROUP "a1" {
          }
          GROUP "a2" {
             GROUP "a21" {
             }
             GROUP "a22" {
             }
          }
       }
       GROUP "b" {
       }
       GROUP "c" {
       …
  • 24.
    Example: h5dump --sort_by=creation_order
    HDF5 "tordergr.h5" {
    GROUP "1" {
       GROUP "c" {
       }
       GROUP "b" {
       }
       GROUP "a" {
          GROUP "a1" {
          }
          GROUP "a2" {
             GROUP "a22" {
             }
             GROUP "a21" {
             }
          }
       }
  • 25.
    Compact Groups
    • Why:
      • Save space and access time for small groups
      • If groups are small, don’t need B-tree overhead
    • What:
      • Alternate storage for groups with few links
      • Default storage when “latest format” is specified
      • Library converts to “original” storage (B-tree based) using default or user-specified threshold
  • 26.
    Compact Groups
    • Example
      • File with 11,600 groups
      • With original group structure, file size ~ 20 MB
      • With compact groups, file size ~ 12 MB
      • Total savings: 8 MB (40%)
      • Average savings/group: ~700 bytes
  • 27.
    Compact Groups Example
    /* Change storage to “dense” if number of group members is bigger than 16
       and go back to compact storage if number of group members is smaller than 12 */
    H5Pset_link_phase_change(gcpl_id, 16, 12);
    /* Create a group */
    g_id = H5Gcreate(…,…,…,gcpl_id,…);
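The two thresholds in the H5Pset_link_phase_change call above form a hysteresis band: a group only switches storage when it leaves the 12 to 16 range, so adding and removing a few links near one boundary does not cause repeated conversions. The sketch below restates that rule in plain C. It is a model for illustration only, not HDF5 code; the type `storage_t` and the function `next_storage` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical model of group storage; not part of the HDF5 API. */
typedef enum { STORAGE_COMPACT, STORAGE_DENSE } storage_t;

/* Decide the storage of a group after its link count changes.
 * max_compact: largest link count still kept in compact storage.
 * min_dense:   smallest link count still kept in dense storage. */
storage_t next_storage(storage_t current, unsigned nlinks,
                       unsigned max_compact, unsigned min_dense)
{
    assert(min_dense <= max_compact);
    if (current == STORAGE_COMPACT && nlinks > max_compact)
        return STORAGE_DENSE;    /* grew past the upper threshold */
    if (current == STORAGE_DENSE && nlinks < min_dense)
        return STORAGE_COMPACT;  /* shrank below the lower threshold */
    return current;              /* inside the band: no conversion */
}
```

With the slide's values (16, 12), a compact group with 16 links stays compact and converts at 17; a dense group with 12 links stays dense and converts back at 11.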
  • 28.
    Intermediate Group Creation
    • Why:
      • Simplify creation of a series of connected groups
      • Avoid having to create each intermediate group separately, one by one
    • What:
      • Intermediate groups can be created when creating an object in a file, with one function call
  • 29.
    Intermediate Group Creation
    • Want to create “/A/B/C/dset1”
    • “A” exists, but “B/C/dset1” do not
    [Figure: file hierarchy before (/ with A) and after (/ with A, B, C, dset1)]
    One call creates groups “B” & “C”, then creates “dset1”
  • 30.
    Intermediate Group Creation Example
    /* Create link creation property list */
    lcrp_id = H5Pcreate(H5P_LINK_CREATE);
    /* Set flag for intermediate group creation;
       groups B and C will be created automatically */
    H5Pset_create_intermediate_group(lcrp_id, TRUE);
    ds_id = H5Dcreate(file_id, "/A/B/C/dset1",…,…, lcrp_id,…,…);
  • 31.
    Link Revisions
  • 32.
    What are Links?
    • Links connect groups to their members
    • “Hard” links point to a target by address
    • “Soft” links store the path to a target
    [Figure: root group with a hard link (<address>) and a soft link (“/target dataset”) to a dataset]
  • 33.
    Links: Before and After
    • New data model for handling links
    • Links may have properties (UTF-8 name encoding, creation order indexing, storage property, etc.)
    [Figure: Before: Group, Name, Object. After: Group, Name and other properties, Object]
  • 34.
    Anonymous Object
    • Object can be created without being immediately linked into graph structure
      • Group, dataset and datatype
      • See new H5*create_anon APIs
    • Use H5O* APIs to manipulate the objects
    [Figure: a group and an object that is not yet linked]
  • 35.
    New: External Links
    • Why:
      • Access objects stored in other HDF5 files in a transparent way
    • What:
      • Store location of file and path within that file
      • Can link across files
  • 36.
    New: External Links
    [Figure: in file1.h5, the root group holds “External_link”, which stores “file2.h5” and “/A/B/C/D/E”; in file2.h5, the root group leads to the “target object” (a group)]
    External link object “External_link” in file1.h5 points to the group /A/B/C/D/E in file2.h5
  • 37.
    External Links Example
    /* Create an external link */
    H5Lcreate_external(TARGET_FILE, "/A/B/C/D/E", source_file_id, "External_link", …,…);
    /* We will use external link to create a group in a target file */
    gr_id = H5Gcreate(source_file_id, "External_link/F",…,…,…,…);
    /* We can access group “External_link/F” in the source file
       and group “/A/B/C/D/E/F” in the target file */
  • 38.
    New: User-defined Links
    • Why:
      • Allow applications to create their own kinds of links and link operations, such as
        • Create “hard” external link that finds an object by address
        • Create link that accesses a URL
        • Keep track of how often a link is accessed, or other behavior
    • What:
      • Applications can create new kinds of links by supplying custom callback functions
      • Can do anything HDF5 hard, soft, or external links do
  • 39.
    Traversing an HDF5 File
  • 40.
    Traversing HDF5 File
    • Why:
      • Allow applications to iterate through the objects in a group or visit recursively all objects under a group
    • What:
      • New APIs to traverse a group hierarchy
      • New APIs to iterate through a group using different types of indices (name or creation order)
      • H5Giterate is deprecated in favor of new functions
  • 41.
    Traversing HDF5 File
    Example of some new APIs
    /* Check if object "A/B" exists in a root group */
    H5Lexists(file_id, "A/B", …);
    /* Iterate through group members of a root group using name as an index;
       this function doesn’t recursively follow links into subgroups */
    H5Literate(file_id, H5_INDEX_NAME, H5_ITER_INC, &idx, iter_link_cb, &info);
    /* Visit all objects under the root group;
       this function recursively follows links into subgroups */
    H5Lvisit(file_id, H5_INDEX_NAME, H5_ITER_INC, visit_link_cb, &info);
  • 42.
    Traversing HDF5 File
    • Things to remember
      • Never use H5Ldelete in any HDF5 iterate or visit callback functions
      • Always close parent object before deleting a child object
  • 43.
    Shared Object Header Messages
  • 44.
    Shared Object Header Messages
    • Why: metadata duplicated many times, wasting space
    • Example:
      • You create a file with 10,000 datasets
      • All use the same datatype and dataspace
      • HDF5 needs to write this information 10,000 times!
    [Figure: Dataset 1, Dataset 2 and Dataset 3, each with its own datatype, dataspace and data]
  • 45.
    Shared Object Header Messages
    • What:
      • Enable messages to be shared automatically
      • HDF5 shares duplicated messages on its own!
    [Figure: Dataset 1 and Dataset 2 share one datatype and one dataspace; each keeps its own data]
  • 46.
    Shared Messages
    • Happens automatically
      • Works with datatypes, dataspaces, attributes, fill values, and filter pipelines
      • Saves space if these objects are relatively large
      • May be faster if HDF5 can cache shared messages
    • Drawbacks
      • Usually slower than non-shared messages
      • Adds overhead to the file
        • Index for storing shared datatypes
        • 25 bytes per instance
      • Older library versions can’t read files with shared messages
  • 47.
    Two Informal Tests
    • File with 24 datasets, all with same big datatype
      • 26,000 bytes normally
      • 17,000 bytes with shared messages enabled
      • Saves 375 bytes per dataset
    • But, make a bad decision: invoke shared messages but only create one dataset…
      • 9,000 bytes normally
      • 12,000 bytes with shared messages enabled
      • Probably slower when reading and writing, too.
    • Moral: shared messages can be a big help, but only in the right situation!
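The per-dataset figure in the first test follows directly from the two file sizes: (26,000 - 17,000) / 24 = 375 bytes. The second test gives a negative result, which is the overhead paid when there is nothing to share. A minimal sketch of that arithmetic in C; the function name `bytes_saved_per_dataset` is hypothetical and the inputs are only the rounded sizes quoted on the slide.

```c
#include <assert.h>

/* Average bytes saved per dataset when enabling shared messages changes
 * the file size from size_plain to size_shared. A negative result means
 * sharing made the file larger. */
long bytes_saved_per_dataset(long size_plain, long size_shared,
                             long n_datasets)
{
    assert(n_datasets > 0);
    return (size_plain - size_shared) / n_datasets;
}
```

Calling it with (26000, 17000, 24) reproduces the 375 bytes quoted above, while (9000, 12000, 1) yields -3000, the cost of the shared-message index for a single dataset.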
  • 48.
    Error Handling
  • 49.
    Extendible Error-handling APIs
    • Why: Enable application to integrate error reporting with HDF5 library error stack
    • What: New error handling API
      • H5Epush - push major and minor error ID on specified error stack
      • H5Eprint - print specified stack
      • H5Ewalk - walk through specified stack
      • H5Eclear - clear specified stack
      • H5Eset_auto - turn error printing on/off for specified stack
      • H5Eget_auto - return settings for specified stack traversal
  • 50.
    Error-handling Programming Model
    • Create new class, major and minor error messages
    • Register messages with the HDF5 library
    • Manage errors
      • Use default or create new error stack
      • Push error
      • Print error stack
      • Close stack
  • 51.
    Error-handling Example
    #define ERR_CLS_NAME "ErrorTest"
    #define PROG_NAME "Error Program"
    #define PROG_VERS "1.0"
    ……
    #define ERR_MAJ_TEST_MSG "Error in test"
    #define ERR_MIN_MYFUNC_MSG "Error in my function"
    ……
    /* Initialize error information for application */
    ERR_CLS = H5Eregister_class(ERR_CLS_NAME, PROG_NAME, PROG_VERS);
    ERR_MAJ_TEST = H5Ecreate_msg(ERR_CLS, H5E_MAJOR, ERR_MAJ_TEST_MSG);
    ERR_MIN_MYFUNC = H5Ecreate_msg(ERR_CLS, H5E_MINOR, ERR_MIN_MYFUNC_MSG);
    ……..
    /* Unregister major and minor error, and class handles when done */
    H5Eunregister_class(ERR_CLS);
  • 52.
    Error-handling Example
    /* This function creates and writes a dataset */
    static herr_t my_function(hid_t fid)
    {
        …….
        /* Force this function to fail and make it push error */
        H5E_BEGIN_TRY {
            dataset = H5Dcreate1(FAKE_ID, DSET_NAME, H5T_STD_I32BE, space, H5P_DEFAULT);
        } H5E_END_TRY;
        if(dataset < 0) {
            H5Epush(H5E_DEFAULT, __FILE__, FUNC_my_function, __LINE__,
                    ERR_CLS, ERR_MAJ_IO, ERR_MIN_CREATE, "H5Dcreate failed");
            goto error;
        } /* end if */
        ……
  • 53.
    Error-handling Example
    Error Test-DIAG: Error detected in Error Program (1.0) thread 0:
      #000: error_example.c line 160 in main(): Error stack test failed
        major: Error in test
        minor: Error in my function
      #001: error_example.c line 100 in my_function(): H5Dcreate failed
        major: Error in IO
        minor: Error in H5Dcreate
    HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
      #002: H5Ddeprec.c line 154 in H5Dcreate1(): not a location ID
        major: Invalid arguments to routine
        minor: Inappropriate type
      #003: H5Gloc.c line 241 in H5G_loc(): invalid object ID
        major: Invalid arguments to routine
        minor: Bad value
  • 54.
    Metadata Cache
  • 55.
    Metadata Cache Improvements
    • Why:
      • Improve I/O performance and memory usage when accessing many objects
    • What:
      • New metadata cache APIs
        • control cache size
        • monitor actual cache size and current hit rate
    • Under the hood: adaptive cache resizing
      • Automatically detects the current working set size
      • Sets max cache size to the working set size
  • 56.
    Metadata Cache Improvements
    • Note: most applications do not need to worry about the cache.
    • See “Special topics” in the HDF5 User’s Guide for details.
    • And if you do see unusual memory growth or poor performance, please contact us. We want to help you.
  • 57.
    Dataset and Datatype Improvements
  • 58.
    Text-based Datatype Descriptions
    • Why:
      • Simplify data type creation
      • Make data type creation code more readable
      • Facilitate debugging by printing the text description of a data type
    • What:
      • New routines to create an HDF5 data type through the text description of the data type and get a text description from the HDF5 data type
  • 59.
    Text Datatype Description Example
    /* Create the data type from DDL text description */
    dtype = H5LTtext_to_dtype("H5T_IEEE_F32BE\n", H5LT_DDL);
    /* Convert the data type back to text: first query the required length,
       then allocate the buffer and fetch the text */
    H5LTdtype_to_text(dtype, NULL, H5LT_DDL, &str_len);
    dt_str = (char*)calloc(str_len, sizeof(char));
    H5LTdtype_to_text(dtype, dt_str, H5LT_DDL, &str_len);
  • 60.
    Serialized Datatypes and Dataspaces
    • Why:
      • Allow datatype and dataspace info to be transmitted between processes
      • Allow datatype/dataspace to be stored in non-HDF5 files
    • What:
      • A new set of routines to serialize/deserialize HDF5 datatypes and dataspaces.
  • 61.
    Serialized Datatypes and Dataspaces Example
    /* Find the buffer length and encode a datatype into buffer */
    status = H5Tencode(t_id, NULL, &cmpd_buf_size);
    cmpd_buf = (unsigned char*)calloc(1, cmpd_buf_size);
    H5Tencode(t_id, cmpd_buf, &cmpd_buf_size);
    /* Decode a binary description of a datatype and return a datatype handle */
    t_id = H5Tdecode(cmpd_buf);
  • 62.
    Integer to Float Convert During I/O
    • Why:
      • HDF5 1.6 and earlier supported conversion within the same class (16-bit integer → 32-bit integer, 64-bit float → 32-bit float)
      • Conversion needed to support NetCDF-4 programming model
    • What:
      • Integer to float conversion supported during I/O
  • 63.
    Integer to Float Convert During I/O
    Example: conversion is transparent to application
    /* Create a dataset of 64-bit little-endian floating-point type */
    dset_id = H5Dcreate(loc_id, "Mydata", H5T_IEEE_F64LE, space_id,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    /* Write integer data to "Mydata" */
    status = H5Dwrite(dset_id, H5T_NATIVE_INT, …);
  • 64.
    Revised Conversion Exception Handling
    • Why:
      • Give apps greater control over exceptions (range errors, etc.) during datatype conversion
      • Needed to support NetCDF-4 programming model
    • What:
      • Revised conversion exception handling
  • 65.
    Revised Conversion Exception Handling
    • To handle exceptions during conversions, register handling function through H5Pset_type_conv_cb().
    • Cases of exception:
      • H5T_CONV_EXCEPT_RANGE_HI
      • H5T_CONV_EXCEPT_RANGE_LOW
      • H5T_CONV_EXCEPT_TRUNCATE
      • H5T_CONV_EXCEPT_PRECISION
      • H5T_CONV_EXCEPT_PINF
      • H5T_CONV_EXCEPT_NINF
      • H5T_CONV_EXCEPT_NAN
    • Return values: H5T_CONV_ABORT, H5T_CONV_UNHANDLED, H5T_CONV_HANDLED
  • 66.
    Compression Filter for N-bit Data
    • Why:
      • Compact storage for user-defined datatypes
    • What:
      • When data is stored on disk, padding bits are chopped off and only significant bits are stored
      • Works with compound datatypes
  • 67.
    N-bit Compression Example
    • In memory, one value of N-Bit datatype is stored like this:
        | byte 3 | byte 2 | byte 1 | byte 0 |
        |????????|????SPPP|PPPPPPPP|PPPP????|
        S-sign bit P-significant bit ?-padding bit
    • After passing through the N-Bit filter, all padding bits are chopped off, and the bits are stored on disk like this:
        |    1st value    |    2nd value    |
        |SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...
    • Opposite (decompress) when going from disk to memory
    • Limited to integer and floating-point data
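The bit layout above corresponds to a 32-bit word with precision 16 and offset 4. The sketch below shows the mask-and-shift idea on a single value in plain C. It is an illustration, not the HDF5 filter: the function names `nbit_pack` and `nbit_unpack` are hypothetical, the real filter packs consecutive values contiguously into a byte stream, and restoring padding bits as zero is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the `precision` significant bits that start at bit `offset`
 * of a 32-bit word, discarding the padding bits on both sides. */
uint32_t nbit_pack(uint32_t word, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision < 32 && offset + precision <= 32);
    uint32_t mask = (UINT32_C(1) << precision) - 1u;
    return (word >> offset) & mask;
}

/* Put the significant bits back at their original position.
 * Padding bits are restored as zero (an assumption of this sketch). */
uint32_t nbit_unpack(uint32_t bits, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision < 32 && offset + precision <= 32);
    uint32_t mask = (UINT32_C(1) << precision) - 1u;
    return (bits & mask) << offset;
}
```

For the word 0xFFF1234F, whose top 12 bits and low 4 bits are padding, packing with offset 4 and precision 16 keeps 0x1234; unpacking gives 0x00012340.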
  • 68.
    N-bit Compression Example
    /* Create a N-bit datatype */
    dt_id = H5Tcopy(H5T_STD_I32LE);
    H5Tset_precision(dt_id, 16);
    H5Tset_offset(dt_id, 4);
    /* Create and write a dataset */
    dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl_id, …);
    H5Pset_nbit(dcpl_id);
    dset_id = H5Dcreate(…,…,…,…,…,dcpl_id,…);
    H5Dwrite(dset_id,…,…,…,…,buf);
  • 69.
    Offset+size Storage Filter
    • Why:
      • Use less storage when less precision needed
    • What:
      • Performs scale/offset operation on each value
      • Truncates result to fewer bits before storing
      • Currently supports integers and floats
      • Precision may be lost
  • 70.
    Example with Floating-Point Type
    • Data: {104.561, 99.459, 100.545, 105.644}
    • Choose scaling factor: decimal precision to keep
      E.g. scale factor D = 2
      1. Find minimum value (offset): 99.459
      2. Subtract minimum value from each element
         Result: {5.102, 0, 1.086, 6.185}
      3. Scale data by multiplying by 10^D = 100
         Result: {510.2, 0, 108.6, 618.5}
      4. Round the data to integer
         Result: {510, 0, 109, 619}
      5. Pack and store using min number of bits
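Steps 1 to 4 above, plus the bit count needed for step 5, can be sketched in plain C as follows. This is an illustration of the arithmetic only, not the HDF5 filter; the function name `scale_offset_encode` and its signature are hypothetical. Note that the last element scales to 618.5, which sits exactly on a rounding boundary, so binary floating point may yield 618 or 619 for it; either value needs the same number of bits.

```c
#include <assert.h>
#include <stddef.h>

/* Encode n doubles with decimal scale factor d:
 * subtract the minimum, multiply by 10^d, round to the nearest integer.
 * Writes the integers to out, the minimum (offset) to *min_out, and
 * returns the number of bits needed for the largest encoded value. */
unsigned scale_offset_encode(const double *in, size_t n, unsigned d,
                             unsigned long *out, double *min_out)
{
    assert(n > 0);

    /* Step 1: find the minimum value (the offset). */
    double min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];

    /* 10^d, computed by repeated multiplication. */
    double scale = 1.0;
    for (unsigned k = 0; k < d; k++)
        scale *= 10.0;

    /* Steps 2-4: subtract, scale, round (values are non-negative). */
    unsigned long max = 0;
    for (size_t i = 0; i < n; i++) {
        double s = (in[i] - min) * scale;
        out[i] = (unsigned long)(s + 0.5);
        if (out[i] > max)
            max = out[i];
    }

    /* Bits needed to represent the largest encoded value. */
    unsigned bits = 0;
    while (max > 0) {
        bits++;
        max >>= 1;
    }

    *min_out = min;
    return bits;
}
```

For the slide's data with D = 2, the largest encoded value is below 1024, so 10 bits per value suffice instead of 32 or 64.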
  • 71.
    Offset+size Storage Filter Example
    /* Use scale+offset filter on integer data; let library figure out the
       minimum number of bits necessary to store the data without loss of precision */
    H5Pset_scaleoffset(dcpl_id, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
    H5Pset_chunk(dcpl_id,…,…);
    dset_id = H5Dcreate(…,…,…,…,…,dcpl_id, …);
    /* Use scale+offset filter on floating-point data; compression may be lossy */
    H5Pset_scaleoffset(dcpl_id, H5Z_SO_FLOAT_DSCALE, 2);
  • 72.
    “NULL” Dataspace
    • Why:
      • Allow datasets with no elements to be described
      • NetCDF-4 needed a “place holder” for attributes
    • What:
      • A dataset with no dimensions, no data
  • 73.
    NULL Dataspace Example
    /* Create a dataset with “NULL” dataspace */
    sp_id = H5Screate(H5S_NULL);
    dset_id = H5Dcreate(…,"IntArray",…,sp_id,…,…,…);

    HDF5 "SDS.h5" {
    GROUP "/" {
       DATASET "IntArray" {
          DATATYPE H5T_STD_I32LE
          DATASPACE NULL
          DATA {
          }
       }
    }
    }
  • 74.
    Part II
    How to Move an Application to HDF5 1.8
  • 75.
    If you are new to HDF5 …
    • Start with the HDF5 Tutorial and use HDF5 1.8.1 (or later) libraries
      http://www.hdfgroup.org/HDF5/Tutor/
    • Look at the comprehensive set of examples
      http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-
    • Use HDF5 Reference Manual 1.8.1 or later to get the latest APIs
      http://www.hdfgroup.org/HDF5/doc
    • Use h5cc, h5fc, h5c++ scripts to build your applications
  • 76.
    If you are new to HDF5 …
    • To check library version
      • Talk to your system administrator
      • Try any HDF5 command line utility with -V flag
          ./h5dump -V
          h5dump: Version 1.8.1
      • Look at the libhdf5.settings file on your system (under lib directory in the HDF5 installation directory); you should see the following lines
          HDF5 Version: 1.8.1
          Default API Mapping: v18 (can be v16)
    • API mapping indicates API version (1.8 in this case)
  • 77.
    If you are new to HDF5 …
    • Use the latest version of API from the Reference Manual (for example, H5Gcreate or H5Gcreate2 instead of H5Gcreate1)
    • Be aware of forward compatibility issues
      • If you use HDF5 features introduced in 1.8.0 and later, you may create a file that old applications and third party tools may not be able to read
        http://www.hdfgroup.org/HDF5/doc/ADGuide/CompatFormat180.html
    • All HDF5 command line utilities and tools should read HDF5 files created during the last 10 years
      • Report problems to help@hdfgroup.org
  • 78.
    If you used HDF5 before….
    • And would like to switch to 1.8….
    • Or you build and install HDF5 Libraries for yourself or others….
    • You need to know about HDF5 API versioning.
  • 79.
    Introduction to HDF5 API Versioning
  • 80.
    API Versioning Example
    • Old way
        H5Gcreate ( loc_id, "New/My old group", 0 )
        H5Gcreate1( loc_id, "New/My old group", 0 )
      • No new features can be invoked
    • New way
        H5Gcreate ( loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)
        H5Gcreate2( loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)
      • New features can be invoked
        • Creation order
        • Unicode names
        • Compact storage
        • Intermediate group creation
    • In 1.8.0 and later some functions have a version number suffix.
  • 81.
    HDF5 Macros for API Compatibility
    Name: H5Gcreate
    Signature:
        hid_t H5Gcreate( hid_t loc_id, const char *name, size_t size_hint )
        hid_t H5Gcreate( hid_t loc_id, const char *name, hid_t lcpl_id, hid_t gcpl_id, hid_t gapl_id )
    Purpose: Creates a new empty group and links it to a location in the file.
    Description: H5Gcreate is a macro that is mapped to either H5Gcreate1 or H5Gcreate2, depending on the needs of the application.
  • 82.
    HDF5 Library Configuration
    • Mapping of APIs is set at configuration time
    • Several configuration flags are provided to enable different mappings
      • Macro name is mapped to old API, e.g., H5Gcreate is mapped to H5Gcreate1
      • Macro name is mapped to new API, e.g., H5Gcreate is mapped to H5Gcreate2 (will have different parameters from 1.6 H5Gcreate)
      • Disable old APIs completely, e.g., H5Gcreate1 is not available in the library
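The mapping described above is ordinary C preprocessor work: one unversioned name expands to one of two versioned functions depending on a compile-time symbol. The sketch below shows the mechanism with stand-in names. It is not taken from the HDF5 headers; `demo_group_create1`, `demo_group_create2`, `demo_group_create` and `DEMO_USE_16_API` are all hypothetical, and the functions only report which version was called.

```c
#include <assert.h>

/* Stand-in for the 1.6-style call: name plus a size hint. */
int demo_group_create1(const char *name, unsigned size_hint)
{
    (void)name;
    (void)size_hint;
    return 1; /* reports "old API was used" */
}

/* Stand-in for the 1.8-style call: name plus three property lists. */
int demo_group_create2(const char *name, int lcpl, int gcpl, int gapl)
{
    (void)name;
    (void)lcpl;
    (void)gcpl;
    (void)gapl;
    return 2; /* reports "new API was used" */
}

/* The unversioned name is a macro chosen at compile time. */
#ifdef DEMO_USE_16_API
#define demo_group_create demo_group_create1
#else
#define demo_group_create demo_group_create2
#endif
```

Without the flag, `demo_group_create` takes the new argument list; compiling with `-DDEMO_USE_16_API` would switch it to the old one, while both versioned names remain callable in either case.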
  • 83.
    If you are an HDF5 library maintainer…
    • If you install the HDF5 Library, there are several choices
      • Install library with 1.8 APIs as default APIs (default, recommended).
      • Install library with 1.6 APIs.
      • Install library without 1.6 APIs to get smaller footprint.
      • Install library with strict file format checking option to detect files that do not comply with the File Format Specification.
  • 84.
    HDF5 Library Configuration
    Configure flag and the version of the public APIs it gives:
    • --with-default-api-version=v18 (default): public APIs have version 1.8
      (Example: H5Gcreate is mapped to H5Gcreate2, old H5Gcreate is H5Gcreate1)
    • --enable-strict-format-checks: public APIs have version 1.8
      Library checks compliance with the file format
    • --disable-deprecated-symbols: public APIs have version 1.8
      (Example: H5Gcreate is mapped to H5Gcreate2, H5Gcreate1 is not available)
    • --with-default-api-version=v16: public APIs have version 1.6
      (Example: H5Gcreate is mapped to H5Gcreate1, H5Gcreate2 is available)
  • 85.
    If you used HDF5 before…
    • And want to use the new HDF5 1.8 Library with an application written for the older versions
        h5cc -DH5_USE_16_API my_program.c
      • H5Gcreate is mapped to H5Gcreate1; all three of H5Gcreate1, H5Gcreate2 and H5Gcreate can be used
    • HDF5 compilation scripts and the H5_USE_16_API flag can be used with GNU auto tools to build packages like HDF-EOS5.
    • The HDF Group runs daily tests for HDF-EOS5 using the latest HDF5 Libraries under development
  • 86.
    If you used HDF5 before …
    • Other available options, assuming both deprecated and new symbols are available in the library:
        h5cc my_program.c
      • All of H5Gcreate1, H5Gcreate2 and H5Gcreate may be used
      • H5Gcreate will have new signature
        h5cc -DH5_NO_DEPRECATED_SYMBOLS my_program.c
      • Only new symbols are available for application; H5Gcreate is mapped to H5Gcreate2; application may use both, but cannot use H5Gcreate1
  • 87.
    Example: --with-default-api-version=v18
    hid_t file_id, group_id; /* identifiers */
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    /* Create several groups in a file */
    grp1_id = H5Gcreate (file_id, "New/A", H5P_DEFAULT, gcpt, gapt);
    grp2_id = H5Gcreate1(file_id, "/B", 0);
    …
    grp3_id = H5Gcreate2(file_id, "New/A", H5P_DEFAULT, gcpt, gapt);
  • 88.
    Example: --with-default-api-version=v16
    hid_t file_id, group_id; /* identifiers */
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    /* Create several groups in a file */
    grp1_id = H5Gcreate (file_id, "/A", 0);
    grp2_id = H5Gcreate1(file_id, "/B", 0);
    grp3_id = H5Gcreate2(file_id, "New/C", H5P_DEFAULT, gcpt, gapt);
  • 89.
    Example: --disable-deprecated-symbols
    hid_t file_id, group_id; /* identifiers */
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    /* Create several groups in a file */
    grp1_id = H5Gcreate (file_id, "New/A", H5P_DEFAULT, gcpt, gapt);
    /* Compilation will fail */
    grp2_id = H5Gcreate1(file_id, "/B", 0);
    grp3_id = H5Gcreate2(file_id, "New/A", H5P_DEFAULT, gcpt, gapt);
  • 90.
    Thank you! Questions?
  • 91.
    Acknowledgement
    • This report is based upon work supported in part by a Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA Awards NNX06AC83A and NNX08AO77A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Editor's Notes

  • #19 This graph shows primarily that the access time with old format groups grows almost linearly with the number of groups, while it is nearly constant with the new groups. At the upper end of the test, old groups are 2-3 orders of magnitude slower than new groups. Plateau: metadata cache felt enough pressure from cache misses that it resized the cache larger. I'm guessing that if you were able to keep going, it would start sloping upward again.
  • #20 This graph again shows that performance of the new groups is relatively unaffected by the number of groups, though the difference is not as dramatic as cold-cache access. The "old" group format, with a single [huge] local heap, is probably the reason: it's being flushed to the file when the group closes. The "new" group format, which uses a fractal heap, will never get a single block of heap data that's so large; smaller heap blocks will get evicted from the cache and flushed to disk as no more group entries are added to them. It could be that the new v2 B-tree is much more efficient than the older B*-trees, but I'm guessing that's a smaller effect.
  • #21 This shows the greater space efficiency of the new compact groups. The new indexed groups are also more space efficient, but that does not make as much difference as the compact groups.
  • #56 Give an example when it really helps old applications: size of the local heap may be bigger than cache size, all entries will be evicted when local heap is brought in; then bring in dataset with chunked properties, etc.