This tutorial is designed for new HDF5 users. We will go over a brief history of the HDF and HDF5 software, cover basic HDF5 Data Model objects and their properties, give an overview of the HDF5 libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples and the Java tool HDFView will be used to illustrate HDF5 concepts.
This presentation is a step-by-step Fast DDS tutorial on Windows, showing how to build a publisher and subscriber from scratch. It is done using Visual Studio, and also from the command line using the CMake options.
My study notes on the 2010 Haystack paper, which talks about Facebook's photo storage system. The design shares some similarity with Google's GFS (as in the 2003 paper).
Containers are incredibly convenient to package applications and deploy them quickly across the data center.
This talk will introduce RunX, a new project under LF Edge that aims at bringing containers to the edge with extra benefits. At the core, RunX is an OCI-compatible container runtime to run software packaged as containers as Xen micro-VMs. RunX allows traditional containers to be executed with a minimal overhead as virtual machines, providing additional isolation and real-time support.
It also introduces new types of containers designed with edge and embedded deployments in mind. RunX enables RTOSes and bare-metal apps to be packaged as containers, delivered to the target using the powerful container infrastructure, and deployed at runtime as Xen micro-VMs. Physical resources, such as accelerators and FPGA blocks, can be dynamically assigned to them.
This presentation will go through the architecture of RunX and the new deployment scenarios it enables. It will provide an overview of the integration with Yocto Project via the meta-virtualization layer and describe how to build a complete system with Xen and RunX.
The presentation will come with a live demo on embedded hardware.
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... (Odinot Stanislas)
After a short intro on distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks in this deck: sequential tests, random tests and, above all, a comparison of results before and after optimization. The configuration parameters touched and the optimizations applied (large page numbers, Omap data on a separate disk, ...) yield at least a 2x performance gain.
These slides accompanied a presentation by Steve Breker of Artefactual Systems, delivered as part of AtoM Camp Cambridge, a three-day boot camp held at St John's College, Cambridge University, May 9-11, 2017. For more information, see:
https://wiki.accesstomemory.org/Community/Camps/SJC2017
These slides are intended for developers who are interested in modifying the default look and feel of AtoM - known as the Dominion theme - and developing a custom theme plugin. They include some theme examples, how to register a plugin in Symfony, and some ideas of the elements you can modify via theming, with examples.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/12/making-edge-ai-inference-programming-easier-and-flexible-a-presentation-from-texas-instruments/
For more information about edge AI and computer vision, please visit:
https://www.edge-ai-vision.com
Manisha Agrawal, Product Marketing Engineer at Texas Instruments, presents the “Making Edge AI Inference Programming Easier and Flexible” tutorial at the September 2020 Embedded Vision Summit.
Deploying an AI model at the edge doesn’t have to be challenging—but it often is. Embedded processing vendors have unique sets of software tools for deploying models. It takes time and investment to learn to use proprietary tools and to optimize the edge implementation to achieve your desired performance. While embedded vendors are providing proprietary tools for model deployment, the open source community is also advancing to standardize the model deployment process and make it hardware agnostic.
Texas Instruments has adopted open source software frameworks to make model deployment easier and more flexible. In this talk, you will learn about the struggles developers face when deploying models for inference on embedded processors and how TI addresses these critical software development challenges. You will also discover how TI enables faster time-to-market using a flexible open source development approach without the need to compromise performance, accuracy or power requirements.
Best Practices in PHP Application Deployment (Shahar Evron)
An overview of the challenges in managing the web application development lifecycle and how a correct deployment system can help. A few common deployment techniques are reviewed. In addition, some info on an upcoming Zend Server deployment feature.
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019 (Sean Cohen)
Starting from the basics, we explore the advantages of using Rook as a Storage operator to serve Ceph storage, the leading Software-Defined Storage platform in the Open Source world. Ceph automates the internal storage management, while Rook automates the user-facing operations and effectively turns a storage technology into a service transparent to the user. The combination delivers an impressive improvement in UX and provides the ideal storage platform for Kubernetes.
A comprehensive examination of use cases and open problems will complement our review of the Rook architecture. We will deep-dive into what Rook does well, what it does not do (yet), and what trade-offs using a storage operator involves operationally. With live access to a running cluster, we will showcase Rook in action as we discuss its capabilities.
https://www.openstack.org/summit/denver-2019/summit-schedule/events/23515/storage-101-rook-and-ceph
Domino Tech School - Upgrading to Notes/Domino V10: Best Practices (Christoph Adler)
Are you looking to deploy Domino V10 but don’t know where to start? Upgrade servers or clients first? Should I upgrade the ODS? If you have questions like these, this session is for you. Get a complete understanding of the process to upgrade to Domino V10, and learn from best practices and tips from the field.
This tutorial targets HDF5 application developers and users who still use HDF5 1.6 releases, and anyone who is interested in the features of the HDF5 1.8.x libraries. We will discuss how applications written for versions 1.6.x and earlier can be seamlessly moved to the latest HDF5 releases. We will also talk about new features of the 1.8.x HDF5 library, such as the redesigned group object, links, creation order, and the various performance tuning knobs.
These slides demonstrate how to use visualization and analysis tools such as IDV and GrADS to access HDF data via OPeNDAP.
To see animation in some slides, please visit:
http://hdfeos.org/workshops/ws13/presentations/day1/jxl_opendap_tutorial.ppt
Accessibility and usability of NPP/NPOESS data in HDF5 can be enhanced by providing tools that simplify and standardize how data is accessed and presented. In this project, The HDF Group is creating such tools in the form of software to read and write certain key data types and data aggregates used in NPP/NPOESS data products, and extending HDFView to extract, present and export these data effectively. In particular, the work will focus on NPP/NPOESS use of HDF5 region references and quality flags. The HDF Group will also provide high quality user support for the project.
Data produced by the Ozone PEATE from the Ozone Mapping and Profiler Suite (OMPS) instruments are to be stored in HDF5, not HDF-EOS, but will still need some features similar to those in HDF-EOS. In particular, a mechanism for handling dimension names will be needed. This poster proposes a method to handle dimension names for arrays in HDF5 in a manner commensurate with HDF-EOS5.
An update on HDF, including a status report on the HDF Group, an overview of recent changes to the HDF4 and HDF5 libraries and tools, plans for future releases, HDF Group projects and collaborations, and future plans.
As the volume and complexity of data from myriad Earth Observing platforms, both remote sensing and in-situ, increases, so does the demand for access to both the data and the information products derived from them. The audience is no longer restricted to investigator teams with specialist science credentials. Non-specialist users, from scientists in other disciplines and the science-literate public to teachers, the general public, and decision makers, want access. What prevents them from accessing these resources? The very complexity of specialist-developed data formats, data set organizations, and specialist terminology. What can be done in response? We must shift the burden from the user to the data provider. To achieve this, our data infrastructures will likely need greater internal code and data-structure complexity in order to achieve (relatively) simpler end-user complexity. Evidence from numerous technical and consumer markets supports this scenario. We will cover the elements of modern data environments, what the new use cases are, and how we can respond to them.
In this talk, we will give an update on the HDF5 OPeNDAP project. We will describe the new features of the OPeNDAP HDF5 data handler. We will also introduce a new HDF5-friendly OPeNDAP client library and demonstrate how it can help users view and analyze remote HDF-EOS5 data served by the OPeNDAP HDF5 handler. A demo will be presented with a customized OPeNDAP visualization client (GrADS) that uses the library.
ENVI and IDL software support HDF and HDF-EOS. Capabilities and the HDF tools built on ENVI and IDL will be reviewed. The current development will be discussed and demonstrated.
HDF-EOS is a software library designed to support NASA Earth Observing System (EOS) science data. HDF is the Hierarchical Data Format developed by The HDF Group. Specific data structures in HDF-EOS5 which are containers for science data are: Grid, Point, Zonal Average and Swath. These data structures are constructed from standard HDF5 data objects, using EOS conventions, through the use of a software library. This presentation is intended to familiarize current HDF-EOS users with the structure of HDF-EOS5 files and the Grid, Swath, Point and Zonal Average structures used in these files.
This tutorial is designed for new HDF5 users. We will cover basic HDF5 Data Model objects and their properties, give an overview of the HDF5 Libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples will be used to illustrate HDF5 concepts.
This tutorial is designed for new HDF5 users. We will cover HDF5 abstractions such as datasets, groups, attributes, and datatypes. Simple C examples will cover the programming model and basic features of the API, and will give new users the knowledge they need to navigate through the rich collection of HDF5 interfaces. Participants will be guided through an interactive demonstration of the fundamentals of HDF5.
This tutorial is for new HDF5 users.
This 2009 tutorial slide will cover basic HDF5 Data Model objects and their properties. It will include an overview of the HDF5 Libraries and APIs, and describe the HDF5 programming model. Simple programming examples and the HDFView data browser will be used to illustrate HDF5 concepts and start developing your own HDF5 based applications.
This tutorial is for new HDF5 users.
This tutorial is designed for HDF5 users with some HDF5 experience.
It will cover advanced features of the HDF5 library for achieving better I/O performance and efficient storage. The following HDF5 features will be discussed: partial I/O, chunked storage layout, compression and other filters including new n-bit and scale+offset filters. Significant time will be devoted to the discussion of complex HDF5 datatypes such as strings, variable-length datatypes, array and compound datatypes.
This Tutorial gives a brief introduction to HDF5 for people who have never used it. It covers the HDF5 Data Model including HDF5 objects and their properties. It also briefly describes the HDF5 Programming Model and prepares participants for further self-study of HDF5 and hands-on sessions.
Numerous scientific teams use the HDF5 format to store very large datasets. Efficient use of this data in a distributed environment depends on client applications being able to read any subset of the data without transferring the entire file to the local machine. The goal of the HDF5-iRODS Project was to develop an HDF5-iRODS module for the iRODS datagrid server that supported this capability, and to apply the technology to an NCSA/SDSC Strategic Applications Program (SAP) project, FLASH.
A joint team from The HDF Group (representing NCSA) and the SDSC SRB group collaborated to accomplish the project goal. The team implemented five HDF5 microservices functions on the iRODS server, and developed an iRODS FLASH slice client application. The client implementation also includes a JNI interface that allows HDFView, a standard tool for browsing HDF5 files, to access HDF5 files stored remotely in iRODS. Finally, three new collection client/server calls were added to the iRODS APIs, making it easier for users to query the content of an iRODS collection.
This tutorial will introduce the three levels of the HDF-Java products: the HDF-Java wrapper (or Java Native Interfaces to the standard HDF libraries), the HDF-Java object package, and the HDFView. The Java wrapper provides standard Java APIs that allow applications to call the C HDF libraries from Java. The HDF-Java object package implements HDF data objects, e.g. Groups and Datasets, in an object-oriented form and makes it easy for applications to use the libraries. The HDFView is a visual tool for browsing and editing HDF4 and HDF5 files.
This Tutorial is designed for HDF5 users with some HDF5 experience. It will cover properties of the HDF5 objects that affect I/O performance and file sizes. The following HDF5 features will be discussed: partial I/O, chunking and compression, and complex HDF5 datatypes such as strings, variable-length arrays and compound datatypes.
We will also discuss references to objects and dataset regions and how they can be used for indexing. Participants will work with the Tutorial examples and exercises during the hands-on sessions.
This Tutorial is designed for new HDF5 users. We will cover basic HDF5 Data Model objects and their properties; we will give an overview of the HDF5 Libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples will be used to illustrate HDF5 concepts. Participants will work with the Tutorial examples and exercises during the hands-on sessions.
It will cover features of the HDF5 library for achieving better I/O performance and efficient storage. The following HDF5 features will be discussed: datatypes and partial I/O.
This tutorial is for persons who are already familiar with HDF5 and wish to take advantage of some of its advanced features.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curricula, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Introduction to HDF5
1. Introduction to HDF5
HDF & HDF-EOS Workshop XII
October 15, 2008
2. Topics Covered
- Introduce HDF5
- Describe HDF5 Data and Programming Models
- Walk Through Example Code
3. For More Information …
All workshop slides will be available from:
http://hdfeos.org/workshops/ws12/workshop_twelve.php
4. What is HDF5?
HDF = Hierarchical Data Format
• Data model, library and file format for managing data
• Tools for accessing data in the HDF5 format
5. Brief History of HDF
1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format), which became HDF.
Early 1990's: NASA adopted HDF for the Earth Observing System project.
1996: DOE's ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create "Big HDF" (the increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files). "Big HDF" became HDF5.
1998: HDF5 was released with support from National Labs, NASA, NCSA.
2006: The HDF Group spun off from the University of Illinois as a non-profit corporation.
6. Why HDF5?
In one sentence ...
7. Answering big questions …
[Figure: matter and the universe; life and nature; weather and climate. Total Column Ozone (Dobson) maps for August 24, 2001 and August 24, 2002, on a 60-610 Dobson color scale]
8. … involves big data …
9. … varied data …
[Figure: examples of varied data, from the LCI Tutorial. Thanks to Mark Miller, LLNL]
10. … and complex relationships …
[Figure: complex relationships in genome-assembly data — SNP score, contig summaries, discrepancies, contig qualities, coverage depth, traces, reads, aligned bases, read quality, contig, percent match]
11. … on big computers …
… and small computers …
12. How do we…
• Describe our data?
• Read it? Store it? Find it? Share it? Mine it?
• Move it into, out of, and between computers and repositories?
• Achieve storage and I/O efficiency?
• Give applications and tools easy access to our data?
13. Solution: HDF5!
• Can store all kinds of data in a variety of ways
• Runs on most systems
• Lots of tools to access data
• Emphasis on standards (HDF-EOS, CGNS)
• Library and format emphasis on I/O efficiency and storage
14. Structure of HDF5 Library
Applications
Object API (C, F90, C++, Java)
Library internals
Virtual file I/O
File or other “storage”
16. HDF5 Applications & Domains
Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
Communities: HDF-EOS, CGNS, ASC, … (simulation, visualization, remote sensing, …)

The software stack:
• Applications
• HDF5 Data Model & API
• Virtual File Layer (I/O drivers): stdio, split files, MPI I/O, custom, …
• Storage (HDF5 format): a file, split metadata and raw-data files, a file on a parallel file system, a user-defined device
17. Lots of Layers in HDF5!
"Ogres are like onions." – Shrek
Just like Shrek, once you get to know HDF5 you will really like it!
19. An HDF5 file is a container…
…into which you can put your data objects.

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6
21. HDF5 Data Model
Primary Objects
• Groups
• Datasets
Additional ways to organize and annotate data
• Attributes
• Storage and access properties
Everything else is built from these parts.
23. Dataspaces
Two roles:
• A dataspace contains spatial info about a dataset stored in a file:
  - rank and dimensions
  - a permanent part of the dataset definition
  (e.g., Rank = 2, Dimensions = 4x6)
• Partial I/O: a dataspace describes the application's data buffer and the data elements participating in I/O
  (e.g., Rank = 1, Dimension = 10)
24. Write – from memory to disk
[Figure: data moving from a memory buffer to a dataset on disk]
25. Partial I/O
Move just part of a dataset.
(a) A slab from a 2D array to the corner of a smaller 2D array
(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
In each case, the number of elements selected in memory and on disk must be the same.
26. Datatypes (array elements)
• Datatype – how to interpret a data element
• Permanent part of the dataset definition
• Two classes: atomic and compound
27. Datatypes
• HDF5 atomic types include:
  - integer & float
  - user-definable (e.g., 13-bit integer)
  - variable-length types (e.g., strings)
  - references to objects/dataset regions
  - enumeration – names mapped to integers
• HDF5 compound types:
  - comparable to C structs ("records")
  - members can be atomic or compound types
28. HDF5 dataset: array of records
An HDF5 dataset with dimensionality 5 x 3, where the datatype of each element is a record containing an int8, an int4, an int16, and a 2x3x2 array of float32.
29. Properties
• Properties are characteristics of HDF5 objects that can be modified
• Default properties handle most needs
• By changing properties, you can take advantage of the more powerful features in HDF5
30. Special Storage Properties
chunked      Better subsetting access time; extensible
compressed   Improves storage efficiency, transmission speed
extensible   Arrays can be extended in any direction
split file   Metadata in one file, raw data in another
             (e.g., metadata for dataset "Fred" in File A, data for "Fred" in File B)
31. Attributes (optional)
• Attribute – data of the form "name = value", attached to an object
• Operations similar to dataset operations, but…
  - not extensible
  - no compression or partial I/O
• Can be overwritten, deleted, or added during the "life" of a dataset
33. Groups
• A mechanism for organizing collections
• Every file starts with a root group ("/")
• Similar to UNIX directories
• Can have attributes

[Figure: root group "/" containing groups A, B, and C; A contains k, B contains l and m]
34. Path to HDF5 Object in a File
/ (root)
/x
/foo
/foo/temp
/foo/bar/temp
[Figure: root group "/" containing x and group foo; foo contains temp and group bar; bar contains its own temp]
37. Useful Tools For New Users
h5dump:
Tool to “dump” or display contents of HDF5 files
h5cc, h5c++, h5fc:
Scripts to compile applications
HDFView:
Java browser to view HDF4 and HDF5 files
38. H5dump Command-line Utility To View HDF5 File
h5dump [--header] [-a <names>] [-d <names>] [-g <names>]
       [-l <names>] [-t <names>] [-p] <file>

--header      Display header only; no data is displayed.
-a <names>    Display the specified attribute(s).
-d <names>    Display the specified dataset(s).
-g <names>    Display the specified group(s) and all their members.
-l <names>    Display the value(s) of the specified soft link(s).
-t <names>    Display the specified named datatype(s).
-p            Display properties.

<names> is one or more appropriate object names.
46. Simple HDF5 File in HDFView
Right-click and select "Open".
Right-click and select "Show Properties".
47. Simple HDF5 File in HDFView
52. Operations Supported by the API
• Create objects (groups, datasets, attributes, complex datatypes, …)
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use a variety of I/O "devices" (parallel, remote, etc.)
• Transform data during I/O
• Make inquiries on file and object structure, content, and properties
53. General Programming Paradigm
• Properties of object are optionally defined
Creation properties
Access property lists
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed
54. Order of Operations
• An order is imposed on operations by argument dependencies.
  For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
• Objects can be closed in any order.
55. The General HDF5 API
• Currently C, Fortran 90, Java, and C++ bindings.
• C routines begin with prefix H5?, where ? is a character corresponding to the type of object the function acts on.
Example functions:
H5D : Dataset interface,   e.g., H5Dread
H5F : File interface,      e.g., H5Fopen
H5S : dataSpace interface, e.g., H5Sclose
56. HDF5 Defined Types
For portability, the HDF5 library has its own defined types:

hid_t:    object identifiers (native integer)
hsize_t:  size used for dimensions (unsigned long or unsigned long long)
hssize_t: used for specifying coordinates and sometimes for dimensions (signed long or signed long long)
herr_t:   function return value
hvl_t:    variable-length datatype

For C, include hdf5.h in your HDF5 application.
57. The HDF5 API
• For flexibility, the API is extensive
300+ functions
• This can be daunting… but there is hope
A few functions can do a lot
Start simple
Build up knowledge as more features are needed
58. Basic Functions
H5Fcreate (H5Fopen)   create (open) File
H5Screate_simple      create dataSpace
H5Dcreate (H5Dopen)   create (open) Dataset
H5Dread, H5Dwrite     access Dataset
H5Dclose              close Dataset
H5Sclose              close dataSpace
H5Fclose              close File
60. High Level APIs
• Included along with the HDF5 library
• Simplify steps for creating, writing, and reading
objects
• Do not entirely ‘wrap’ HDF5 library
62. Steps to Create a File
1. Decide on special properties the file should have
   • Creation properties, like the size of the user block
   • Access properties, such as metadata cache size
   • Or use default properties (H5P_DEFAULT)
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed
63. Code: Create a File
hid_t  file_id;
herr_t status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);
status = H5Fclose (file_id);

This produces a file containing just the root group "/".
Note: return codes are not checked for errors in the code samples.
65. Steps to Create a Dataset
1. Define dataset characteristics
   • Dataspace – 4x6
   • Datatype – integer
   • Properties, if needed (or use H5P_DEFAULT)
2. Decide where to put it
   • Obtain a location ID:
     - a group ID puts it in that group
     - a file ID puts it in the root group ("/")
3. Create the dataset in the file
4. Close everything
66. HDF5 Pre-defined Datatype Identifiers
HDF5 defines* a set of datatype identifiers per HDF5 session. For example:

C Type  | HDF5 File Type                 | HDF5 Memory Type
--------|--------------------------------|------------------
int     | H5T_STD_I32BE, H5T_STD_I32LE   | H5T_NATIVE_INT
float   | H5T_IEEE_F32BE, H5T_IEEE_F32LE | H5T_NATIVE_FLOAT
double  | H5T_IEEE_F64BE, H5T_IEEE_F64LE | H5T_NATIVE_DOUBLE

* The value of a datatype identifier is NOT fixed.
67. Pre-defined File Datatype Identifiers
Examples:
H5T_IEEE_F64LE  Eight-byte, little-endian, IEEE floating point
H5T_STD_I32LE   Four-byte, little-endian, signed two's complement integer

The name combines an architecture* with a programming type.
NOTE: These are what you see in the file. The name is the same everywhere and explicitly defines a datatype.
* STD = "an architecture with a semi-standard type, like 2's complement integer, unsigned integer, …"
68. Pre-defined Native Datatypes
Examples of predefined native types in C:

H5T_NATIVE_INT    (int)
H5T_NATIVE_FLOAT  (float)
H5T_NATIVE_UINT   (unsigned int)
H5T_NATIVE_LONG   (long)
H5T_NATIVE_CHAR   (char)

NOTE: Memory types are different for each machine and are used for reading/writing.
69. Dataset Creation Property List
A dataset creation property list holds information on how to organize data in storage: contiguous (H5P_DEFAULT), chunked, or chunked & compressed.
70. Code: Create a Dataset
hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

/* Create a file */
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

/* Create a dataspace: rank 2, current dims 4 x 6 */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);

/* Create a dataset: pathname "A", datatype H5T_STD_I32BE,
   the dataspace, and the (default) property list */
dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE,
                        dataspace_id, H5P_DEFAULT);

/* Terminate access to the dataset, dataspace, and file */
status = H5Dclose (dataset_id);
status = H5Sclose (dataspace_id);
status = H5Fclose (file_id);
71. Example Code - H5Dwrite
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL,
                   H5S_ALL, H5P_DEFAULT, dset_data);

The first argument is the dataset identifier from H5Dcreate or H5Dopen; the second is the memory datatype.
72. Example Code – H5Dwrite
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                   H5P_DEFAULT, dset_data);

The third and fourth arguments are the memory and file dataspaces; H5S_ALL selects the entire dataspace. The fifth argument is a data transfer property list (MPI I/O, transformations, …).
73. Partial I/O
By default, both the memory and file dataspaces are H5S_ALL. For partial I/O, replace them:
• Get a dataspace: H5Screate_simple, H5Dget_space
• Modify the dataspace: H5Sselect_hyperslab, H5Sselect_elements
74. Example Code – H5Dread
status = H5Dread (dataset_id, H5T_NATIVE_INT,
H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);
75. High Level APIs: HDF5 Lite (H5LT)
#include "H5LT.h"
…
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);
status = H5LTmake_dataset (file_id, "A", 2, dims,
                           H5T_STD_I32BE, data);
status = H5Fclose (file_id);
77. Example: Create a Group
[Figure: file.h5 with root group "/" containing dataset A (a 4x6 array of integers) and group B]
78. Steps to Create a Group
1. Decide where to put it – the root group
   • Obtain the location ID
2. Decide on a name – "B"
3. Create the group in the file
4. (Eventually) close the group
79. Code: Create a Group
hid_t file_id, group_id;
...
/* Open "file.h5" */
file_id = H5Fopen ("file.h5", H5F_ACC_RDWR,
                   H5P_DEFAULT);

/* Create group "/B" in the file. The last argument is a size hint
   for the number of bytes to store names of objects; 0 = default. */
group_id = H5Gcreate (file_id, "B", 0);

/* Close group and file. */
status = H5Gclose (group_id);
status = H5Fclose (file_id);
80. Thank you!
This work was supported by the Cooperative Agreement with the
National Aeronautics and Space Administration (NASA) under NASA
grant NNX06AC83A and NNX08A077A. Any opinions, findings,
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of NASA.
Editor's Notes
The CFD General Notation System (CGNS) provides a general, portable, and extensible standard for the storage and retrieval of computational fluid dynamics (CFD) analysis data. It consists of a collection of conventions, and free and open software implementing those conventions. It is self-descriptive, machine-independent, well-documented, and administered by an international steering committee.
The CGNS implementation of SIDS, so-called MLL, was originally built using a file format called ADF (Advanced Data Format). This format was based on a common file format system previously in use at McDonnell Douglas. The ADF has worked extremely well, requiring little repair, upgrade, or maintenance over the last decade. However, ADF does not have parallel I/O or data compression capabilities, and does not have the support and tools that the storage format HDF5 offers. HDF5, supported by The HDF Group, has rapidly grown to become a world-wide format standard for storing scientific data. HDF5 has parallel capability as well as a broader support base than ADF.
This shows that you can mix objects of different types according to your needs. Typically, there will be metadata stored with objects to indicate what type of object they are.
Like HDF4, HDF5 has a grouping structure. The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.
Data Array is an ordered collection of identically typed data items distinguished by their indices
Metadata:
Dataspace – Rank, dimensions; spatial info about dataset
Datatype – Information on how to interpret your data
Storage Properties – How array is organized
Attributes – User-defined metadata (optional)
Here is an example of a basic HDF5 object.
Notice that each element in the 3D array is a record with four values in it.
To create this file, we would start by creating the file itself. When you create a file, the root group gets created with it. So every file has at least that one group.