Advanced HDF5 Features
 

This tutorial is designed for HDF5 users with some HDF5 experience.

It covers advanced features of the HDF5 library for achieving better I/O performance and efficient storage. The following HDF5 features are discussed: partial I/O, the chunked storage layout, and compression and other filters, including the new N-bit and scale-offset filters. Significant time is devoted to complex HDF5 datatypes such as strings, variable-length datatypes, and array and compound datatypes.
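Since the abstract calls out the N-bit and scale-offset filters, a small illustration may help. The sketch below is not from the slides; the file name, dataset name, and the 12-bit precision are assumptions. It creates a chunked integer dataset whose datatype advertises only 12 significant bits and enables the N-bit filter so only those bits are stored. The bullets further down are the author's slide notes on selections, chunking, and the chunk cache.

#include "hdf5.h"

int main(void)
{
    /* Illustrative names and sizes: counts of a 12-bit sensor. */
    hid_t   file  = H5Fcreate("filters.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[1]  = {1000};
    hsize_t chunk[1] = {1000};
    hid_t   space = H5Screate_simple(1, dims, NULL);

    /* Declare that only 12 bits of each int are significant, so the
       N-bit filter packs 12-bit values instead of full 32-bit ints.  */
    hid_t dtype = H5Tcopy(H5T_NATIVE_INT);
    H5Tset_precision(dtype, 12);
    H5Tset_offset(dtype, 0);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);   /* filters require a chunked layout */
    H5Pset_nbit(dcpl);              /* enable the N-bit filter          */
    /* The scale-offset analogue for integer data would be:
       H5Pset_scaleoffset(dcpl, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT); */

    hid_t dset = H5Dcreate2(file, "counts", dtype, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset); H5Pclose(dcpl); H5Tclose(dtype);
    H5Sclose(space); H5Fclose(file);
    return 0;
}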

  • H5Sselect_hyperslab:
    1: dataspace identifier
    2: selection operator that determines how the new selection is combined with the already existing selection for the dataspace. Currently only the H5S_SELECT_SET operator is supported; it replaces the existing selection with the parameters from this call. Overlapping blocks are not supported with H5S_SELECT_SET.
    3: start - starting coordinates; 0 is the beginning.
    4: stride - how many elements to move in each dimension. A stride of zero is not supported; NULL means 1 in every dimension.
    5: count - how many blocks to select in each dimension.
    6: block - the size of the element block; if NULL, it defaults to a single element in each dimension.
    H5Sselect_elements:
    1: dataspace identifier
    2: H5S_SELECT_SET (see above)
    3: NUMP - the number of elements selected
    4: coord - a 2-D array, (dataspace rank) by (number of elements) in size. The order of the element coordinates in the coord array also specifies the order in which the array elements are iterated when I/O is performed.
    A selection sketch follows this note.
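A minimal sketch of both calls, assuming an existing file data.h5 with an 8x10 integer dataset named matrix (both names invented here): the hyperslab picks four 1x2 blocks from the first row, and the point selection then replaces it with three individual elements.

#include "hdf5.h"

int main(void)
{
    hid_t file   = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, "matrix", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    /* Hyperslab: starting at (0,1), select 4 blocks of 1x2 elements,
       moving 2 columns between block starts (start/stride/count/block). */
    hsize_t start[2]  = {0, 1};
    hsize_t stride[2] = {1, 2};
    hsize_t count[2]  = {1, 4};
    hsize_t block[2]  = {1, 2};
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, block);

    int     buf[8];                       /* 4 blocks x 2 elements */
    hsize_t mdims[1] = {8};
    hid_t   mspace   = H5Screate_simple(1, mdims, NULL);
    H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);

    /* Point selection: three individual elements; in the modern API the
       coord buffer holds one rank-length coordinate tuple per element,
       and its order is the I/O iteration order.                         */
    hsize_t coord[3][2] = {{0, 0}, {3, 4}, {7, 9}};
    H5Sselect_elements(fspace, H5S_SELECT_SET, 3, (const hsize_t *)coord);

    int     pts[3];
    hsize_t mdims2[1] = {3};
    hid_t   mspace2   = H5Screate_simple(1, mdims2, NULL);
    H5Dread(dset, H5T_NATIVE_INT, mspace2, fspace, H5P_DEFAULT, pts);

    H5Sclose(mspace2); H5Sclose(mspace); H5Sclose(fspace);
    H5Dclose(dset); H5Fclose(file);
    return 0;
}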
  • Dumping datasets with references may be slow, since the library has to dereference each element; improving this is on our to-do list.
  • More data will be written in this case. Ghost zones are filled with the fill value unless the fill value is disabled. A fill-value sketch follows this note.
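To make the fill-value behavior concrete, here is a hedged sketch (file and dataset names are invented): a 10x10 dataset stored in 6x6 chunks has edge chunks that extend past the dataset boundary, and those ghost regions receive the fill value unless fill writing is turned off.

#include "hdf5.h"

int main(void)
{
    hid_t   file  = H5Fcreate("fill.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[2]  = {10, 10};
    hsize_t chunk[2] = {6, 6};   /* edge chunks overhang the 10x10 extent */
    hid_t   space = H5Screate_simple(2, dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);

    int fill = -1;
    H5Pset_fill_value(dcpl, H5T_NATIVE_INT, &fill);  /* custom fill value */
    /* The "fill value disabled" case from the note above would be:
       H5Pset_fill_time(dcpl, H5D_FILL_TIME_NEVER);                        */

    hid_t dset = H5Dcreate2(file, "grid", H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
    return 0;
}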
  • A chunk remains in the cache until it is evicted. Filters are applied only on the way to the file. Different chunks may have different sizes in the file. A compression sketch follows this note.
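As one example of a filter on the write path, the sketch below (names and sizes invented) enables gzip compression on a chunked dataset; each chunk is compressed as it travels to the file, which is why stored chunk sizes can differ.

#include "hdf5.h"

int main(void)
{
    hid_t   file  = H5Fcreate("zip.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[2]  = {1000, 1000};
    hsize_t chunk[2] = {100, 100};
    hid_t   space = H5Screate_simple(2, dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);
    H5Pset_deflate(dcpl, 6);   /* gzip level 6, applied per chunk on write */

    hid_t dset = H5Dcreate2(file, "image", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
    return 0;
}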
  • A 2MB chunk doesn't fit into a 1MB cache. The chunk is allocated in the file and written once with the first frame in the chunk; after that HDF5 writes one frame at a time, so we write 20x49 + 20 = 1000 times, plus small I/O for metadata. Almost twice the data size participates in I/O.
    A 2MB chunk does fit into a 5MB cache: we write 50 frames at once when we fill the chunk, and we do that only 20 times!
    With compression the situation is even worse: we have to write the chunk every time we write a frame, then read it back to write the next frame, and so on. For each plane we write a chunk, 1000 writes in total; to modify planes 2-50 (49 of them) in each chunk, we have to read the chunk back 49 times x 20 chunks, so we get 1000 writes + 980 reads of raw data.
    A bigger cache works nicely. A cache-sizing sketch follows this note.
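A sketch of the fix described above, assuming HDF5 1.8.3 or later for H5Pset_chunk_cache; the file and dataset names, and the 5MB figure, mirror the scenario in the note rather than any real file. Enlarging the per-dataset chunk cache lets a whole 2MB chunk stay cached while frames accumulate, so each chunk is written once.

#include "hdf5.h"

int main(void)
{
    hid_t file = H5Fopen("frames.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl,
                       521,               /* hash slots; a prime works well   */
                       5 * 1024 * 1024,   /* 5MB cache, larger than one chunk */
                       1.0);              /* evict fully written chunks first */

    hid_t dset = H5Dopen2(file, "frames", dapl);

    /* ... frame-at-a-time partial writes now accumulate in the cache ... */

    H5Dclose(dset); H5Pclose(dapl); H5Fclose(file);
    return 0;
}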
  • First case: to read one plane we need to read each chunk (20 I/Os) and do it 100 times (the number of rows); the chunks don't fit into the cache.
    Second case: to read one plane we need to read each chunk (100 I/Os) and do it 100 times (for each row).
    Third case: the cache is bypassed; the library reads from the file 1000 x 100 (for each row) = 100,000 times.
    Fourth case: the chunk fits into the cache, so for one plane we do 100 reads to bring in all chunks, then do it 100 times for each row.
    A chunk-shape sketch follows this note.
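One way to avoid the multi-chunk plane reads compared above is to make each plane exactly one chunk. The sketch below is illustrative only; the 1000x100x100 dimensions are assumptions, not taken from the slides.

#include "hdf5.h"

int main(void)
{
    hid_t   file  = H5Fcreate("planes.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[3]  = {1000, 100, 100};
    hsize_t chunk[3] = {1, 100, 100};  /* one 100x100 plane per chunk, so a
                                          plane read costs a single chunk I/O */
    hid_t   space = H5Screate_simple(3, dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);

    hid_t dset = H5Dcreate2(file, "stack", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
    return 0;
}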
  • No difference among the first two cases and the fourth one: for the first two we always bring the chunk into memory and uncompress it, and in the fourth the chunk fits into the cache.
    In the third case the chunk doesn't fit into the cache, and the library reads directly from the file, getting one element at a time: 1000 x 100 (# of rows) x 100 (# of columns) = 10,000,000 reads.
