Introduction to HDF5

This tutorial is designed for new HDF5 users. We will go over a brief history of the HDF and HDF5 software and cover the basic HDF5 Data Model objects and their properties; we will give an overview of the HDF5 libraries and APIs and discuss the HDF5 programming model. Simple C and Fortran examples and the Java tool HDFView will be used to illustrate HDF5 concepts.

Introduction to HDF5

  1. 1. Introduction to HDF5. HDF & HDF-EOS Workshop XII, October 15, 2008
  2. 2. Topics Covered - Introduce HDF5 - Describe HDF5 Data and Programming Models - Walk Through Example Code
  3. 3. For More Information … All workshop slides will be available from: http://hdfeos.org/workshops/ws12/workshop_twelve.php
  4. 4. What is HDF5? HDF = Hierarchical Data Format • Data model, library and file format for managing data • Tools for accessing data in the HDF5 format
  5. 5. Brief History of HDF
     1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format). It became HDF.
     Early 1990s: NASA adopted HDF for the Earth Observing System project.
     1996: DOE's ASC (Advanced Simulation and Computing) project began collaborating with the HDF group (NCSA) to create "Big HDF". (The increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files.) "Big HDF" became HDF5.
     1998: HDF5 was released with support from the National Labs, NASA and NCSA.
     2006: The HDF Group spun off from the University of Illinois as a non-profit corporation.
  6. 6. Why HDF5? In one sentence ...
  7. 7. Answering big questions … Matter and the universe • Life and nature • Weather and climate [Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002]
  8. 8. … involves big data …
  9. 9. … varied data … (Thanks to Mark Miller, LLNL)
  10. 10. … and complex relationships … [Figure labels: SNP Score, Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Trace, Reads, Aligned bases, Read quality, Contig, Percent match]
  11. 11. … on big computers … and small computers …
  12. 12. How do we… • Describe our data? • Read it? Store it? Find it? Share it? Mine it? • Move it into, out of, and between computers and repositories? • Achieve storage and I/O efficiency? • Give applications and tools easy access to our data?
  13. 13. Solution: HDF5! • Can store all kinds of data in a variety of ways • Runs on most systems • Lots of tools to access data • Emphasis on standards (HDF-EOS, CGNS) • Library and format emphasis on I/O efficiency and storage
  14. 14. Structure of HDF5 Library (layers, top to bottom): Applications • Object API (C, F90, C++, Java) • Library internals • Virtual file I/O • File or other “storage”
  15. 15. HDF Tools - HDFView and Java Products - Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)
  16. 16. HDF5 Applications & Domains
      Applications: simulation, visualization, remote sensing… Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
      Communities: HDF-EOS, CGNS, ASC
      HDF5 Data Model & API
      Virtual File Layer (I/O Drivers): Stdio, Split Files, MPI I/O, Custom
      Storage (HDF5 format): File; Split metadata and raw data files; File on parallel file system; User-defined device (?)
  17. 17. Lots of Layers in HDF5! “Ogres are like onions.” Shrek. HDF5 Monster?? Just like Shrek, once you get to know HDF5 you will really like it!!
  18. 18. The HDF5 Format
  19. 19. An HDF5 file is a container… …into which you can put your data objects. [Figure: a palette and a table]
      lat | lon | temp
      ----|-----|-----
       12 |  23 |  3.1
       15 |  24 |  4.2
       17 |  21 |  3.6
  20. 20. HDF5 Structures for Organizing Objects [Figure: root group “/” and group “foo” organizing a 3-D array, a table (lat | lon | temp), a palette, two raster images and a 2-D array]
  21. 21. HDF5 Data Model. Primary Objects: • Groups • Datasets. Additional ways to organize and annotate data: • Attributes • Storage and access properties. Everything else is built from these parts.
  22. 22. HDF5 Dataset = Metadata + Data
      Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
      Datatype: Integer
      Storage info: Chunked, Compressed
      Attributes: Time = 32.4, Pressure = 987, Temp = 56
  23. 23. Dataspaces. Two roles:
      • Dataspace contains spatial info about a dataset stored in a file: rank and dimensions; permanent part of the dataset definition (e.g., Rank = 2, Dimensions = 4x6)
      • Partial I/O: Dataspace describes the application’s data buffer and the data elements participating in I/O (e.g., Rank = 1, Dimension = 10)
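The rank and dimensions of a dataspace determine how many elements a dataset holds. The following is a minimal sketch in plain C with no HDF5 calls; `dataspace_npoints` is a hypothetical helper named here for illustration. It multiplies the dimension sizes, which is the quantity the library's `H5Sget_simple_extent_npoints` reports for a simple dataspace.

```c
#include <assert.h>

/* Hypothetical helper (not part of the HDF5 API): number of elements
 * described by a simple dataspace with the given rank and dimensions.
 * A rank of 0 is treated as a single (scalar) element. */
unsigned long long dataspace_npoints(int rank, const unsigned long long *dims)
{
    assert(rank >= 0);
    unsigned long long n = 1;
    for (int i = 0; i < rank; i++) {
        n *= dims[i];
    }
    return n;
}
```

With the shapes used on these slides, a 4x6 dataspace holds 24 elements and a 4x5x7 dataspace holds 140.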
  24. 24. Write – from memory to disk [Figure: memory and disk]
  25. 25. Partial I/O: move just part of a dataset between memory and disk. (a) Slab from a 2D array to the corner of a smaller 2D array. (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array. The number of elements selected in each must be the same.
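Case (a) can be pictured as plain index arithmetic. The sketch below is not HDF5 code: `copy_slab` is a hypothetical helper, both arrays are assumed to be row-major `int` buffers, and the library performs the equivalent transfer itself when given matching memory and file selections.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy a rows x cols slab that starts at
 * (row0, col0) in a row-major source array with src_cols columns
 * into the top-left corner of a row-major destination array with
 * dst_cols columns. Both selections contain rows * cols elements. */
void copy_slab(const int *src, size_t src_cols,
               size_t row0, size_t col0,
               size_t rows, size_t cols,
               int *dst, size_t dst_cols)
{
    assert(col0 + cols <= src_cols);  /* slab must fit in a source row */
    assert(cols <= dst_cols);         /* and in a destination row */
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            dst[r * dst_cols + c] = src[(row0 + r) * src_cols + (col0 + c)];
        }
    }
}
```

The caller is responsible for making the destination large enough; only the selected elements are touched, so the rest of the destination keeps its previous contents.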
  26. 26. Datatypes (array elements) • Datatype – how to interpret a data element • Permanent part of the dataset definition • Two classes: atomic and compound
  27. 27. Datatypes
      • HDF5 atomic types include: integer & float; user-definable (e.g., 13-bit integer); variable length types (e.g., strings); references to objects/dataset regions; enumeration (names mapped to integers)
      • HDF5 compound types: comparable to C structs (“records”); members can be atomic or compound types
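Because compound types are comparable to C structs, describing one usually starts from the struct's member offsets and sizes. The sketch below uses only standard C; the record layout and field names are made up for illustration and do not come from the slides. HDF5's `HOFFSET` macro is a thin wrapper around the same `offsetof` used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the field names are invented for illustration. */
typedef struct {
    int8_t  flag;
    int32_t count;
    int16_t level;
    float   samples[2][3][2];
} record_t;

/* Sum of the member sizes with no padding between them. */
size_t record_packed_size(void)
{
    return sizeof(int8_t) + sizeof(int32_t) + sizeof(int16_t)
         + sizeof(float[2][3][2]);
}

/* Byte offset of each member inside the in-memory struct, in
 * declaration order. These may include compiler-inserted padding. */
void record_member_offsets(size_t offsets[4])
{
    offsets[0] = offsetof(record_t, flag);
    offsets[1] = offsetof(record_t, count);
    offsets[2] = offsetof(record_t, level);
    offsets[3] = offsetof(record_t, samples);
    assert(offsets[0] == 0);  /* the first member starts the struct */
}
```

The in-memory struct is often larger than the packed size because of alignment padding, which is one reason a memory datatype and a file datatype for the same record may differ.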
  28. 28. HDF5 dataset: array of records. Dimensionality: 5 x 3. Datatype: record with members int8, int4, int16 and a 2x3x2 array of float32.
  29. 29. Properties • Properties are characteristics of HDF5 objects that can be modified • Default properties handle most needs • By changing properties, you can take advantage of the more powerful features in HDF5
  30. 30. Special Storage Properties
      chunked: better subsetting access time; extensible
      compressed: improves storage efficiency, transmission speed
      extensible: arrays can be extended in any direction
      split file: metadata in one file, raw data in another [Figure: Dataset “Fred”, with File A holding the metadata for Fred and File B holding the data for Fred]
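Chunked storage divides a dataset into equally sized tiles. As a rough illustration of why that helps subsetting, the sketch below computes how many chunks cover a dimension and which chunk holds a given element. It is index arithmetic only, with hypothetical helper names, and makes no claim about how the library actually indexes or lays out chunks on disk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of chunks needed to cover `dim` elements
 * along one dimension when each chunk spans `chunk` elements
 * (the last chunk may be only partly filled). */
size_t chunks_along(size_t dim, size_t chunk)
{
    assert(chunk > 0);
    return (dim + chunk - 1) / chunk;
}

/* Hypothetical helper: linear (row-major) index of the chunk that holds
 * element (row, col) of a 2-D dataset with `cols` columns. */
size_t chunk_index_2d(size_t row, size_t col, size_t cols,
                      size_t chunk_rows, size_t chunk_cols)
{
    size_t chunks_per_row = chunks_along(cols, chunk_cols);
    return (row / chunk_rows) * chunks_per_row + (col / chunk_cols);
}
```

For a 4x6 dataset with 2x4 chunks this gives a 2x2 grid of chunks, so reading one corner of the array only needs the one chunk that contains it.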
  31. 31. Attributes (optional) • Attribute – data of the form “name = value”, attached to an object • Operations similar to dataset operations, but not extensible, and with no compression or partial I/O • Can be overwritten, deleted, added during the “life” of a dataset
  32. 32. HDF5 Dataset (again) = Metadata + Data
      Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
      Datatype: Integer
      Storage info: Chunked, Compressed
      Attributes: Time = 32.4, Pressure = 987, Temp = 56
  33. 33. Groups • A mechanism for organizing collections • Every file starts with a root group • Similar to UNIX directories • Can have attributes [Figure: group tree with root “/” and members labeled A, B, C, k, l, m]
  34. 34. Path to HDF5 Object in a File: / (root), /x, /foo, /foo/temp, /foo/bar/temp [Figure: root “/” contains x and foo; foo contains temp and bar; bar contains temp]
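Each path names one link per group level below the root. As a small plain-C illustration (the helper name is hypothetical, and this is not how the library resolves paths), the depth of an object can be found by counting the names between slashes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the link names in an absolute HDF5-style
 * path such as "/foo/bar/temp". The root path "/" has zero components.
 * Repeated or trailing slashes are ignored. */
size_t path_component_count(const char *path)
{
    assert(path != NULL && path[0] == '/');
    size_t count = 0;
    int in_name = 0;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/') {
            in_name = 0;
        } else if (!in_name) {
            in_name = 1;
            count++;
        }
    }
    return count;
}
```

Here `/foo/temp` and `/foo/bar/temp` have two and three components: the two objects named `temp` are distinguished by the groups they are reached through.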
  35. 35. Shared Objects: one object can be reached by several paths, e.g. /A/P, /B/R, /C/P [Figure: root “/” with groups A, B, C and links named P, R, P]
  36. 36. Questions So Far?
  37. 37. Useful Tools For New Users
      h5dump: Tool to “dump” or display contents of HDF5 files
      h5cc, h5c++, h5fc: Scripts to compile applications
      HDFView: Java browser to view HDF4 and HDF5 files
  38. 38. H5dump Command-line Utility To View HDF5 File
      h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] [-p] <file>
        --header     Display header only; no data is displayed.
        -a <names>   Display the specified attribute(s).
        -d <names>   Display the specified dataset(s).
        -g <names>   Display the specified group(s) and all the members.
        -l <names>   Displays the value(s) of the specified soft link(s).
        -t <names>   Display the specified named datatype(s).
        -p           Display properties.
      <names> is one or more appropriate object names.
  39. 39. Example of h5dump Output (root group “/” containing dataset ‘dset’)
      HDF5 "dset.h5" {
      GROUP "/" {
         DATASET "dset" {
            DATATYPE { H5T_STD_I32BE }
            DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
            DATA {
               1, 2, 3, 4, 5, 6,
               7, 8, 9, 10, 11, 12,
               13, 14, 15, 16, 17, 18,
               19, 20, 21, 22, 23, 24
            }
         }
      }
      }
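The 24 values in the DATA block are listed in row-major (C) order for the 4 x 6 dataspace. A minimal sketch of an in-memory buffer that would produce this listing is shown below; the function name is hypothetical and the buffer would still have to be written to the file with `H5Dwrite`.

```c
#include <assert.h>

enum { NROWS = 4, NCOLS = 6 };

/* Fill a 4 x 6 buffer with 1..24 in row-major order, matching the
 * DATA block shown in the h5dump output above. */
void fill_dset_data(int data[NROWS][NCOLS])
{
    for (int i = 0; i < NROWS; i++) {
        for (int j = 0; j < NCOLS; j++) {
            data[i][j] = i * NCOLS + j + 1;
        }
    }
    assert(data[NROWS - 1][NCOLS - 1] == NROWS * NCOLS);
}
```

Element `data[1][0]` is 7, the first value on the second row of the dump.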
  40. 40. HDF5 Compile Scripts
      • h5cc – HDF5 C compiler command
      • h5fc – HDF5 F90 compiler command
      • h5c++ – HDF5 C++ compiler command
      To compile:
      % h5cc h5prog.c
      % h5fc h5prog.f90
  41. 41. Compile option: -show
      -show: displays the compiler commands and options without executing them
      % h5cc -show Sample_c.c
      gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
      gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
  42. 42. Browsing HDF5 Files with HDFView
  43. 43. HDFView: Structure of File, Contents of Dataset
  44. 44. HDFView File Menu
  45. 45. [Screenshot; no text on slide]
  46. 46. Simple HDF5 File in HDFView: Right-click and select “Open” with mouse. Right-click and select “Show Properties” with mouse.
  47. 47. Simple HDF5 File in HDFView
  48. 48. HDF-EOS5 File in HDFView
  49. 49. Right-click and select “Open As” with mouse
  50. 50. What you can’t see with slides: - Picture displayed instantly - File size is 906,229,176
  51. 51. Introduction to HDF5 Programming Model and APIs
  52. 52. Operations Supported by the API • Create objects (groups, datasets, attributes, complex data types, …) • Assign storage and I/O properties to objects • Perform complex subsetting during read/write • Use variety of I/O “devices” (parallel, remote, etc.) • Transform data during I/O • Make inquiries on file and object structure, content, properties
  53. 53. General Programming Paradigm • Properties of object are optionally defined (creation properties, access property lists) • Object is opened or created • Object is accessed, possibly many times • Object is closed
  54. 54. Order of Operations • An order is imposed on operations by argument dependencies. For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument. • Objects can be closed in any order.
  55. 55. The General HDF5 API • Currently C, Fortran 90, Java, and C++ bindings. • C routines begin with prefix H5?, where ? is a character corresponding to the type of object the function acts on. Example functions:
      H5D : Dataset interface, e.g., H5Dread
      H5F : File interface, e.g., H5Fopen
      H5S : dataSpace interface, e.g., H5Sclose
  56. 56. HDF5 Defined Types. For portability, the HDF5 library has its own defined types:
      hid_t:     object identifiers (native integer)
      hsize_t:   size used for dimensions (unsigned long or unsigned long long)
      hssize_t:  for specifying coordinates and sometimes for dimensions (signed long or signed long long)
      herr_t:    function return value
      hvl_t:     variable length datatype
      For C, include hdf5.h in your HDF5 application.
  57. 57. The HDF5 API • For flexibility, the API is extensive: 300+ functions [Image: Victorinox Swiss Army CyberTool 34] • This can be daunting… but there is hope: a few functions can do a lot; start simple; build up knowledge as more features are needed
  58. 58. Basic Functions
      H5Fcreate (H5Fopen)     create (open) File
      H5Screate_simple        create dataSpace
      H5Dcreate (H5Dopen)     create (open) Dataset
      H5Dread, H5Dwrite       access Dataset
      H5Dclose                close Dataset
      H5Sclose                close dataSpace
      H5Fclose                close File
  59. 59. Other Common Functions
      DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O)
      Groups: H5Gcreate, H5Gopen, H5Gclose
      Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
      Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate
  60. 60. High Level APIs • Included along with the HDF5 library • Simplify steps for creating, writing, and reading objects • Do not entirely ‘wrap’ HDF5 library
  61. 61. Example HDF5 Code
  62. 62. Steps to Create a File
      1. Decide on special properties the file should have
         • Creation properties, like size of user block
         • Access properties, such as metadata cache size
         • Use default properties (H5P_DEFAULT)
      2. Create property lists, if necessary
      3. Create the file
      4. Close the file and the property lists, as needed
  63. 63. Code: Create a File
      hid_t  file_id;
      herr_t status;

      file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      status = H5Fclose (file_id);

      Result: a file containing only “/” (root). Note: Return codes not checked for errors in code samples.
  64. 64. Dataset Components = Metadata + Data
      Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
      Datatype: Integer
      Storage info: Chunked, Compressed
      Attributes: Time = 32.4, Pressure = 987, Temp = 56
  65. 65. Steps to Create a Dataset
      1. Define dataset characteristics
         • Dataspace - 4x6
         • Datatype – integer
         • Properties if needed, or use H5P_DEFAULT
      2. Decide where to put it
         • Obtain location ID: a Group ID puts it in a Group; a File ID puts it in the Root Group
      3. Create dataset in file
      4. Close everything
      [Figure: “/” (root) containing A]
  66. 66. HDF5 Pre-defined Datatype Identifiers. HDF5 defines* a set of Datatype Identifiers per HDF5 session. For example:
      C Type  | HDF5 File Type                  | HDF5 Memory Type
      int     | H5T_STD_I32BE, H5T_STD_I32LE    | H5T_NATIVE_INT
      float   | H5T_IEEE_F32BE, H5T_IEEE_F32LE  | H5T_NATIVE_FLOAT
      double  | H5T_IEEE_F64BE, H5T_IEEE_F64LE  | H5T_NATIVE_DOUBLE
      * The value of a datatype identifier is NOT fixed
  67. 67. Pre-defined File Datatype Identifiers. Examples:
      H5T_IEEE_F64LE: eight-byte, little-endian, IEEE floating-point
      H5T_STD_I32LE: four-byte, little-endian, signed two's complement integer
      In each name, the first part (IEEE, STD) is the architecture* and the second part (F64LE, I32LE) is the programming type.
      NOTE: What you see in the file. Name is the same everywhere and explicitly defines a datatype.
      *STD = “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”
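To make the BE/LE suffix concrete, the sketch below writes a 32-bit signed integer into four bytes in each order. It is standard C with hypothetical helper names, shown only to illustrate what the type names describe; when the memory type (for example `H5T_NATIVE_INT`) differs from the file type, the library performs this kind of conversion itself during I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: most significant byte first, the layout that
 * the name H5T_STD_I32BE describes. */
void encode_i32_be(int32_t value, unsigned char out[4])
{
    assert(out != NULL);
    uint32_t u = (uint32_t)value;  /* two's complement bit pattern */
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Hypothetical helper: least significant byte first, the layout that
 * the name H5T_STD_I32LE describes. */
void encode_i32_le(int32_t value, unsigned char out[4])
{
    assert(out != NULL);
    uint32_t u = (uint32_t)value;
    out[0] = (unsigned char)u;
    out[1] = (unsigned char)(u >> 8);
    out[2] = (unsigned char)(u >> 16);
    out[3] = (unsigned char)(u >> 24);
}
```

The same value therefore has one fixed byte layout per file type, regardless of the byte order of the machine that wrote it.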
  68. 68. Pre-defined Native Datatypes. Examples of predefined native types in C:
      H5T_NATIVE_INT    (int)
      H5T_NATIVE_FLOAT  (float)
      H5T_NATIVE_UINT   (unsigned int)
      H5T_NATIVE_LONG   (long)
      H5T_NATIVE_CHAR   (char)
      NOTE: Memory types. Different for each machine. Used for reading/writing.
  69. 69. Dataset Creation Property List. Dataset creation property list: information on how to organize data in storage. [Figure: H5P_DEFAULT: contiguous; Chunked; Chunked & compressed]
  70. 70. Code: Create a Dataset
      hid_t   file_id, dataset_id, dataspace_id;
      hsize_t dims[2];
      herr_t  status;

      file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

      /* Create a dataspace: rank 2, current dims 4 x 6 */
      dims[0] = 4;
      dims[1] = 6;
      dataspace_id = H5Screate_simple (2, dims, NULL);

      /* Create a dataset: pathname "A", datatype H5T_STD_I32BE,
         the dataspace above, and the default property list */
      dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

      /* Terminate access to dataset, dataspace, file */
      status = H5Dclose (dataset_id);
      status = H5Sclose (dataspace_id);
      status = H5Fclose (file_id);
  71. 71. Example Code - H5Dwrite
      status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
      dataset_id: Dataset Identifier from H5Dcreate or H5Dopen. H5T_NATIVE_INT: Memory Datatype.
  72. 72. Example Code – H5Dwrite
      status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
      Third argument: Memory Dataspace. Fourth argument: File Dataspace. H5S_ALL selects entire dataspace. H5P_DEFAULT: Data Transfer Property List (MPI I/O, Transformations, …).
  73. 73. Partial I/O. Memory Dataspace: H5S_ALL. File Dataspace (disk): H5S_ALL.
      Get a Dataspace: H5Screate_simple, H5Dget_space
      Modify Dataspace: H5Sselect_hyperslab, H5Sselect_elements
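`H5Sselect_hyperslab` describes a regular selection with four values per dimension: start, stride, count and block. The sketch below lists which indices such a selection picks along a single dimension. It is plain C with a hypothetical helper name and does not call the library; the requirement that blocks not overlap is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: list the element indices picked along one
 * dimension by a regular selection: `count` blocks of `block` elements
 * each, the first block beginning at `start` and successive blocks
 * beginning `stride` elements apart. Writes the indices to `out`
 * (which must have room for count * block entries) and returns how
 * many were written. */
size_t hyperslab_indices_1d(size_t start, size_t stride,
                            size_t count, size_t block,
                            size_t *out)
{
    assert(block > 0 && stride >= block);  /* blocks must not overlap */
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < block; j++) {
            out[n++] = start + i * stride + j;
        }
    }
    return n;
}
```

With start 1, stride 4, count 3 and block 2, the selection covers indices 1, 2, 5, 6, 9 and 10. Applying one such selection per dimension gives a regular series of blocks in a multi-dimensional array.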
  74. 74. Example Code – H5Dread
      status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);
  75. 75. High Level APIs: HDF5 Lite (H5LT)
      #include "H5LT.h"
      …
      file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      status = H5LTmake_dataset (file_id, "A", 2, dims, H5T_STD_I32BE, data);
      status = H5Fclose (file_id);
  76. 76. High Level APIs • HDF5 Lite • HDF5 Image • HDF5 Table • HDF5 Dimension Scales • HDF5 Packet Table
  77. 77. Example: Create a Group [Figure: file.h5 with root group “/” containing A, a 4x6 array of integers, and B]
  78. 78. Steps to Create a Group
      1. Decide where to put it – “root group” • Obtain location ID
      2. Decide name – “B”
      3. Create group in file
      4. (Eventually) close the group.
  79. 79. Code: Create a Group
      hid_t file_id, group_id;
      ...
      /* Open "file.h5" */
      file_id = H5Fopen ("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

      /* Create group "/B" in file. The third argument is a size hint for the
         number of bytes to store names of objects; 0 = default. */
      group_id = H5Gcreate (file_id, "B", 0);

      /* Close group and file. */
      status = H5Gclose (group_id);
      status = H5Fclose (file_id);
  80. 80. Thank you! This work was supported by the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grants NNX06AC83A and NNX08A077A. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.
