Using HDF5 tools for
performance tuning and
troubleshooting

2/18/2014

HDF and HDF-EOS Workshop X, Landover, MD

1
Introduction
• HDF5 tools may be very useful for performance tuning
and troubleshooting
• Discover objects and their prope...
h5stat
• Prints different statistics about HDF5 file
• Helps
• To troubleshoot size overhead in HDF5 files
• To choose spe...
h5stat
• Reports two types of statistics:
• High-level information about objects (examples):
• Number of different objects...
h5stat
• Examples of high-level information:
File information
# of unique groups: 10008
# of unique datasets: 30
# of uniq...
h5stat
• Conclusion:
• There are a lot of empty groups in the file; good candidate for
compact group feature
• Some datase...
h5stat
• Examples of structural metadata information:
Object header size: (total/unused)
Groups: 1808/72
Datasets: 15792/8...
h5stat
• Conclusions
• File size: 6228197
• 1.5% overhead (not bad at all!)
• There some elements are of size 65535 and 32...
Case study: Using HDF5tools to debug a problem
• My applications creates files on Windows with VS2005 and VS2003. I can
re...
Case study: Using HDF5tools to debug a problem

• Conclusions
• Compound datatype “timespec” requires different
number of ...
Where is my data?
• h5ls –var be_data.h5:
Opened "be_data.h5" with sec2 driver.
/Array
Dataset {5/5, 6/6}
Location: 0:1:0:...
Questions? Comments?

?

Thank you!

2/18/2014

HDF and HDF-EOS Workshop X, Landover, MD

12
Upcoming SlideShare
Loading in …5
×

Using HDF5 tools for performance tuning and troubleshooting

478 views

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
478
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Using HDF5 tools for performance tuning and troubleshooting

  1. 1. Using HDF5 tools for performance tuning and troubleshooting 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 1
  2. 2. Introduction • HDF5 tools may be very useful for performance tuning and troubleshooting • Discover objects and their properties in HDF5 files h5dump -p • Get file size overhead information h5stat • Get locations of the objects in a file h5ls • Discover differences h5diff, h5ls • Location of raw data h5ls –vra 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 2
  3. 3. h5stat • Prints different statistics about HDF5 file • Helps • To troubleshoot size overhead in HDF5 files • To choose specific object’s properties and storage strategies • To use h5stat --help h5stat file.h5 • Spec can be found http://www.hdfgroup.org/RFC/h5stat/ • Let us know if you need some “special” type of statistics 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 3
  4. 4. h5stat • Reports two types of statistics: • High-level information about objects (examples): • Number of different objects (groups, datasets, datatypes) in a file • Number of unique datatypes • Size of raw data in a file • Information about object’s structural metadata • Sizes of structural metadata (total/free) • Object headers, local and global heaps • Sizes of B-trees • Object headers fragmentation 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 4
  5. 5. h5stat • Examples of high-level information: File information # of unique groups: 10008 # of unique datasets: 30 # of unique named datatypes: 0 …………………… Max. # of links to object: 1 Max. depth of hierarchy: 4 Max. # of objects in group: 19 …………………… Group bins: # of groups of size 0: 10000 # of groups of size 1 - 9: 7 # of groups of size 10 - 99: 1 …………………… Max. dimension size of 1-D datasets: 1643 …………………… Dataset filters information: Number of datasets with ……………… SZIP filter: 2 ……………… NBIT filter: 10 USER-DEFINED filter: 1 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 5
  6. 6. h5stat • Conclusion: • There are a lot of empty groups in the file; good candidate for compact group feature • Some datasets use “user-defined” filters and may not be readable by HDF5 library • SZIP compression is needed to read some datasets Oh… my application uses buffers of size 1024 to read data… No wonder it crashes on reading… Do I have all filters needed to read the data? 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 6
  7. 7. h5stat • Examples of structural metadata information: Object header size: (total/unused) Groups: 1808/72 Datasets: 15792/832 ……… Dataset storage information: Total raw data size: 6140688 ……… Dataset datatype #3: Count (total/named) = (2/0) Size (desc./elmt) = (10/65535) Dataset datatype #4: Count (total/named) = (1/0) Size (desc./elmt) = (10/32000) 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 7
  8. 8. h5stat • Conclusions • File size: 6228197 • 1.5% overhead (not bad at all!) • There some elements are of size 65535 and 32000 Oh… Is it really what I want? Should I use other datatype and get advantage of compression? 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 8
  9. 9. Case study: Using HDF5tools to debug a problem • My applications creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. H5dump reads both files OK and there are no differences. What am I doing wrong? • h5diff good.h5 bad.h5 Datatype: </Definitions/timespec> and </Definitions/timespec> 1 differences found • h5ls –vr good.h5 /Definitions/timespec Location: 0:1:0:900 Type • h5debug good.h5 900 Message Information: Type class: Size: compound 8 bytes • h5debug bad.h5 900 Message Information: Type class: Size: 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD compound 16 bytes 9
  10. 10. Case study: Using HDF5tools to debug a problem • Conclusions • Compound datatype “timespec” requires different number of bytes on VS2005 (16 bytes; 2x8bytes) and on VS2003 (8bytes; 2x4bytes) Oh… How do I read my data back? I assumed that my struct would need only 8 bytes for each elements but it needs 16 bytes on VS2005. I need H5Tget_native_type function to find the type of my data in memory 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 10
  11. 11. Where is my data? • h5ls –var be_data.h5: Opened "be_data.h5" with sec2 driver. /Array Dataset {5/5, 6/6} Location: 0:1:0:792 Links: 1 Modified: 2006-04-07 15:08:39 CDT Storage: 240 logical bytes, 240 allocated bytes, 100.00% utilization Type: IEEE 64-bit big-endian float Address: 2048 • 30 8-byte elements can be read from address 2048 by non-HDF5 application 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 11
  12. 12. Questions? Comments? ? Thank you! 2/18/2014 HDF and HDF-EOS Workshop X, Landover, MD 12

×