HDF5 Advanced Topics

02/18/14

HDF and HDF-EOS Workshop X, Landover, MD

Outline
• Dataset selections
• Chunking
• Datatypes
  • Overview
  • Object and dataset region references
  • Compound datatype

Working with Selections

What is a Selection?
• A selection describes the elements of a dataset that participate in partial I/O
  • Hyperslab selection
  • Point selection
  • Result of set operations (union, difference, …) on hyperslab or point selections

• Used by sequential and parallel HDF5

Example of single hyperslab selection
[Figure: a single 7 x 11 hyperslab selection within a 10 x 16 dataspace]
Example of regular hyperslab selection
[Figure: a regular hyperslab selection of 3 x 2 blocks within a 10 x 16 dataspace]
Example of irregular hyperslab selection
[Figure: an irregular hyperslab selection within a 10 x 16 dataspace]
Example of hyperslab selection
[Figure: a hyperslab selection within a 10 x 16 dataspace]
Example of point selection

Example of irregular selection

Hyperslab Description
• Offset - starting location of a hyperslab (1,1)
• Stride - number of elements that separate each block (3,2)
• Count - number of blocks (2,6)
• Block - block size (2,1)
• Everything is "measured" in number of elements

H5Sselect_hyperslab

space_id — identifier of the dataspace
op       — selection operator to use:
           H5S_SELECT_SET: replace the existing selection with the parameters from this call
           H5S_SELECT_OR: form the union with the previous selection
offset   — array with the starting coordinates of the hyperslab
stride   — array specifying which positions along each dimension to select
count    — array specifying how many blocks to select from the dataspace in each dimension
block    — array specifying the size of the element block (NULL indicates a block of a single element in each dimension)

Reading/Writing Selections
• Open the file
• Open the dataset
• Get the file dataspace
• Create a memory dataspace (describing the data buffer)
• Make the selection(s)
• Read from or write to the dataset
• Close the dataset, file dataspace, memory dataspace, and file

c-hyperslab.c example: reading two rows
Data in file: a 4 x 6 matrix holding the values 1–24.
Buffer in memory: a 1-D array of length 14, initialized to -1.
c-hyperslab.c example: reading two rows
Selection in the file: rows 1 and 2 (elements 7–18) of the 4 x 6 matrix.

offset = {1,0}
count  = {2,6}
block  = {1,1}
stride = {1,1}

filespace = H5Dget_space (dataset);
H5Sselect_hyperslab (filespace, H5S_SELECT_SET,
                     offset, NULL, count, NULL);

c-hyperslab.c example: reading two rows
Selection in memory: 12 elements starting at offset 1 in the 14-element buffer.

offset = {1}
count  = {12}

hsize_t dims[] = {14};    /* H5Screate_simple takes a dims array */
memspace = H5Screate_simple(1, dims, NULL);
H5Sselect_hyperslab (memspace, H5S_SELECT_SET,
                     offset, NULL, count, NULL);

c-hyperslab.c example: reading two rows
H5Dread (…, …, memspace, filespace, …, …);

Buffer in memory after the read:
-1 7 8 9 10 11 12 13 14 15 16 17 18 -1
HDF5 Chunking
• Chunked layout is needed for
• Extendible datasets
• Compression and other filters
• To improve partial I/O for big datasets

[Figure: chunked storage — better subsetting access time; extendible. Only two chunks need to be written/read for the shown selection.]

Creating Chunked Dataset
• Create a dataset creation property list
• Set the property list to use chunked storage layout
• Create the dataset with the above property list
• Select part or all of the data for writing or reading

plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(plist, rank, ch_dims);
dset_id = H5Dcreate (…, "Chunked",…, plist);
H5Pclose(plist);

Writing or reading to/from chunked dataset
• Use the same set of operations as for a contiguous dataset
• Selections do not need to coincide precisely with the chunks
• The chunking mechanism is transparent to the application (not the case in the HDF4 library)
• Chunking and compression parameters can affect performance (covered in the next presentation)

H5Dopen(…);
…
H5Sselect_hyperslab (…);
…
H5Dread(…);

h5zlib.c example
Creates a compressed 1000 x 20 integer dataset in the file zip.h5
h5dump –p –H zip.h5
HDF5 "zip.h5" {
GROUP "/" {
GROUP "Data" {
DATASET "Compressed_Data" {
DATATYPE H5T_STD_I32BE
DATASPACE SIMPLE { ( 1000, 20 )………
STORAGE_LAYOUT {
CHUNKED ( 20, 20 )
SIZE 5316
}

h5zlib.c example
FILTERS {
COMPRESSION DEFLATE { LEVEL 6 }
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE 0
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_INCR
}
}
}
}
}

Chunking basics to remember
• Chunking creates storage overhead in the file
• Performance is affected by
  • Chunking and compression parameters
  • Chunk cache size (H5Pset_cache call)
• Some hints for getting better performance
  • Use a chunk size no smaller than the block size (4 KB) on your system
  • Use a compression method appropriate for your data
  • Avoid selections that do not coincide with chunk boundaries

Chunking and selections
Great performance: the selection coincides with a chunk.
Poor performance: the selection spans all chunks.
HDF5 Datatypes

Datatypes
• A datatype is
• A classification specifying the interpretation of a data
element
• Specifies for a given data element
• the set of possible values it can have
• the operations that can be performed
• how the values of that type are stored

• May be shared between different datasets in one file

Hierarchy of the HDF5 datatype classes

General Operations on HDF5 Datatypes
• Create
• Derived and compound datatypes only

• Copy
• All datatypes

• Commit (save in a file to share between different datasets)
• All datatypes

• Open
• Committed datatypes only

• Discover properties (size, number of members, base type)
• Close

Basic Atomic HDF5 Datatypes

Basic Atomic Datatypes
• Atomic type classes
  • integers & floats
  • strings (fixed and variable size)
  • pointers - references to objects/dataset regions
  • opaque
  • bitfield
• An element of an atomic datatype is the smallest possible unit for an HDF5 I/O operation
  • Cannot write or read just the mantissa or exponent fields of a float, or the sign field of an integer

HDF5 Predefined Datatypes

• The HDF5 library provides predefined datatypes (symbols) for all basic atomic classes except opaque
• H5T_<arch>_<base>
• Examples:
  • H5T_IEEE_F64LE
  • H5T_STD_I32BE
  • H5T_C_S1
  • H5T_STD_B32LE
  • H5T_STD_REF_OBJ, H5T_STD_REF_DSETREG
  • H5T_NATIVE_INT
• Predefined datatypes are not compile-time constants; they are initialized when the library is initialized

When to use HDF5 Predefined Datatypes?
• In dataset and attribute creation operations
  • Argument to H5Dcreate or H5Acreate
  • c-crtdat.c example:
    H5Dcreate(file_id, "/dset", H5T_STD_I32BE, dataspace_id,
              H5P_DEFAULT);
• In dataset and attribute read/write operations
  • Argument to H5Dwrite/H5Dread, H5Awrite/H5Aread
  • Always use H5T_NATIVE_* types to describe data in memory
• To create user-defined types
  • Fixed and variable-length strings
  • User-defined integers and floats (e.g. a 13-bit integer or a non-standard floating-point type)
• In composite type definitions
• Do not use them for declaring variables

Reference Datatype
• Reference to an HDF5 object
  • Pointer to a group, dataset, or named datatype in a file
  • Predefined datatype H5T_STD_REF_OBJ
• H5Rcreate
• H5Rdereference

• Reference to a dataset region (selection)
• Pointer to the dataspace selection
• Predefined datatype H5T_STD_REF_DSETREG
• H5Rcreate
• H5Rdereference

Reference to Object
• h5-ref2obj.c
[Figure: structure of REF_OBJ.h5 — the root group contains Group1 (which contains Group2), the Integers dataset, the named datatype MYTYPE, and a dataset of object references]
Reference to Object
• h5dump REF_OBJ.h5
DATASET "OBJECT_REFERENCES" {
DATATYPE H5T_REFERENCE
DATASPACE SIMPLE { ( 4 ) / ( 4 ) }
DATA {
(0): GROUP 808 /GROUP1 , GROUP 1848 /GROUP1/GROUP2 ,
(2): DATASET 2808 /INTEGERS , DATATYPE 3352 /MYTYPE
}

Reference to Object
• Create a reference to group object
H5Rcreate(&ref[1], fileid, "/GROUP1/GROUP2",
H5R_OBJECT, -1);

• Write references to a dataset
H5Dwrite(dsetr_id, H5T_STD_REF_OBJ, H5S_ALL,
H5S_ALL, H5P_DEFAULT, ref);

• Read reference back with H5Dread and find an object it points to
type_id = H5Rdereference(dsetr_id, H5R_OBJECT, &ref[3]);
name_size = H5Rget_name(dsetr_id, H5R_OBJECT, &ref_out[3],
(char*)buf, 10);
• buf will contain /MYTYPE; name_size will be 8 (accommodating the terminating '\0')

Reference to dataset region
• h5-ref2reg.c
[Figure: structure of REF_REG.h5 — the root group contains the Matrix dataset and a dataset of region references]

Matrix data (2 x 9):
1 1 2 3 3 4 5 5 6
1 2 2 3 4 4 5 6 6
Reference to Dataset Region
• h5dump REF_REG.h5
DATASET "REGION_REFERENCES" {
DATATYPE H5T_REFERENCE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): DATASET 808 {(0,3)-(1,5)}, DATASET 808
{(0,0), (1,6), (0,8)}
}
}

Reference to Dataset Region
• Create a reference to a dataset region
H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start,
                    NULL, count, NULL);
H5Rcreate(&ref[0], file_id, "MATRIX",
          H5R_DATASET_REGION, space_id);

• Write references to a dataset
H5Dwrite(dsetr_id, H5T_STD_REF_DSETREG, H5S_ALL,
H5S_ALL, H5P_DEFAULT, ref);

Reference to Dataset Region

• Read reference back with H5Dread and find a region it points to
dsetv_id = H5Rdereference(dsetr_id,
H5R_DATASET_REGION, &ref_out[0]);
space_id = H5Rget_region(dsetr_id,
H5R_DATASET_REGION,&ref_out[0]);

• Read selection
H5Dread(dsetv_id, H5T_NATIVE_INT, H5S_ALL, space_id,
H5P_DEFAULT, data_out);

Storing strings in HDF5
• Array of characters
• Access to each character
• Extra work to access and interpret each string
• Fixed length
string_id = H5Tcopy(H5T_C_S1);
H5Tset_size(string_id, size);
• Overhead for short strings
• Can be compressed

• Variable length
string_id = H5Tcopy(H5T_C_S1);
H5Tset_size(string_id, H5T_VARIABLE);

• Overhead as for all VL datatypes (later)
• Compression will not be applied to actual data

Bitfield Datatype
• C bitfield
• Bitfield – a sequence of bits packed into some integer type
• Examples of predefined datatypes
  • H5T_NATIVE_B64 – native 8-byte bitfield
  • H5T_STD_B32LE – standard 4-byte little-endian bitfield

• Created by copying predefined bitfield type and setting
precision, offset and padding
• Use n-bit filter to store significant bits only

Bitfield Datatype
Example: a 16-bit little-endian bitfield with 0-padding (bits 15…0):

0 0 0 1 0 1 1 1 0 0 1 1 1 0 0 0

Offset 3
Precision 11
Storing Tables in HDF5 file

Example
a_name    b_name   c_name
(integer) (float)  (double)
0         0.       1.0000
1         1.       0.5000
2         4.       0.3333
3         9.       0.2500
4         16.      0.2000
5         25.      0.1667
6         36.      0.1429
7         49.      0.1250
8         64.      0.1111
9         81.      0.1000

Multiple ways to store a table:
• A dataset for each field
• A dataset with a compound datatype
• If all fields have the same type:
  • a 2-dim array
  • a 1-dim array of an array datatype

Choose to achieve your goal:
• How much overhead does each type of storage create?
• Do I always read all fields?
• Do I need to read some fields more often?
• Do I want to use compression?
• Do I want to access some records?
HDF5 Compound Datatypes

• Compound types
  • Comparable to C structs
  • Members can be atomic or compound types
  • Members can be multidimensional
  • Can be written/read by a field or a set of fields
  • Not all data filters can be applied (shuffling, SZIP)
HDF5 Compound Datatypes
• Which APIs to use?
• H5TB APIs
  • Create, read, get info about, and merge tables
  • Add, delete, and append records
  • Insert and delete fields
  • Limited control over the table's properties (only GZIP compression at level 6, default allocation time, extendible, etc.)
• PyTables http://www.pytables.org
  • Based on H5TB
  • Python interface
  • Indexing capabilities
• HDF5 APIs
  • H5Tcreate(H5T_COMPOUND) and H5Tinsert calls to create a compound datatype
  • H5Dcreate, etc.
  • See the H5Tget_member* functions for discovering properties of an HDF5 compound datatype

Creating and writing compound dataset
h5_compound.c example
typedef struct s1_t {
    int    a;
    float  b;
    double c;
} s1_t;

s1_t s1[LENGTH];
Creating and writing compound dataset
/* Create datatype in memory. */
s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a),
          H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c),
          H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b),
          H5T_NATIVE_FLOAT);

Note:
• Use HOFFSET macro instead of calculating offset by hand
• Order of H5Tinsert calls is not important if HOFFSET is used

Creating and writing compound dataset
/* Create dataset and write data */
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                    H5P_DEFAULT);
status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL,
                  H5P_DEFAULT, s1);
Note:
• In this example the memory and file datatypes are the same
• The type is not packed
• Use H5Tpack to save space in the file; it packs the type in place:
status = H5Tpack(s1_tid);
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                    H5P_DEFAULT);

File content with h5dump
HDF5 "SDScompound.h5" {
GROUP "/" {
DATASET "ArrayOfStructures" {
DATATYPE {
H5T_STD_I32BE "a_name";
H5T_IEEE_F32BE "b_name";
H5T_IEEE_F64BE "c_name"; }
DATASPACE { SIMPLE ( 10 ) / ( 10 ) }
DATA {
{
[ 0 ],
[ 0 ],
[ 1 ]
},
{
[ 1 ],
[ 1 ],
[ 0.5 ]
},
{
[ 2 ],
[ 4 ],
[ 0.333333 ]
},
….

Reading compound dataset
/* Open the dataset, discover its file datatype, and read the data. */
dataset = H5Dopen(file, DATASETNAME);
s2_tid  = H5Dget_type(dataset);
mem_tid = H5Tget_native_type(s2_tid, H5T_DIR_ASCEND);
status  = H5Dread(dataset, mem_tid, H5S_ALL,
                  H5S_ALL, H5P_DEFAULT, s1);

Note:
We could construct the memory type as we did in the writing example.
A general application needs to discover the type in the file to determine the structure to read into.

Reading compound dataset: subsetting by fields
typedef struct s2_t {
    double c;
    int    a;
} s2_t;
s2_t s2[LENGTH];
…
s2_tid = H5Tcreate (H5T_COMPOUND, sizeof(s2_t));
H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c),
          H5T_NATIVE_DOUBLE);
H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a),
          H5T_NATIVE_INT);
…
status = H5Dread(dataset, s2_tid, H5S_ALL,
                 H5S_ALL, H5P_DEFAULT, s2);

Questions? Comments?

Thank you!

Acknowledgement

This report is based upon work supported in part by a Cooperative Agreement with NASA under
NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics
and Space Administration.


FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

HDF5 Advanced Topics

  • 1. HDF5 Advanced Topics 02/18/14 HDF and HDF-EOS Workshop X, Landover, MD 1
  • 2. Outline: dataset selections; chunking; datatypes - overview, object and dataset region references, compound datatype.
  • 3. Working with Selections
  • 4. What is a Selection? A selection describes the elements of a dataset that participate in partial I/O: hyperslab selections, point selections, and the results of set operations (union, difference, …) on hyperslab or point selections. Selections are used by both sequential and parallel HDF5.
  • 5. Example of single hyperslab selection: a single 7 x 11 hyperslab selected within a 10 x 16 dataspace.
  • 6. Example of regular hyperslab selection: a regular pattern of 3 x 2 blocks selected within a 10 x 16 dataspace.
  • 7. Example of irregular hyperslab selection within a 10 x 16 dataspace.
  • 8. Example of hyperslab selection within a 10 x 16 dataspace.
  • 9. Example of point selection.
  • 10. Example of irregular selection.
  • 11. Hyperslab Description: Offset - starting location of a hyperslab, e.g. (1,1); Stride - number of elements that separate each block, e.g. (3,2); Count - number of blocks, e.g. (2,6); Block - block size, e.g. (2,1). Everything is “measured” in number of elements.
  • 12. H5Sselect_hyperslab parameters: space_id - identifier of the dataspace; op - selection operator (H5S_SELECT_SET replaces the existing selection with the parameters from this call; H5S_SELECT_OR creates a union with the previous selection); offset - array with the starting coordinates of the hyperslab; stride - array specifying how many positions to move along each dimension between blocks; count - array specifying how many blocks to select from the dataspace in each dimension; block - array specifying the size of the element block (NULL indicates a block of a single element in each dimension).
  • 13. Reading/Writing Selections: open the file; open the dataset; get the file dataspace; create a memory dataspace (data buffer); make the selection(s); read from or write to the dataset; close the dataset, file dataspace, memory dataspace, and file.
  • 14. c-hyperslab.c example: reading two rows. Data in file: a 4 x 6 matrix holding the values 1 through 24. Buffer in memory: a 1-dim array of length 14, initialized to -1.
  • 15. c-hyperslab.c example: reading two rows. File selection: offset = {1,0}, count = {2,6}, block = {1,1}, stride = {1,1}. filespace = H5Dget_space (dataset); H5Sselect_hyperslab (filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
  • 16. c-hyperslab.c example: reading two rows. Memory selection: offset = {1}, count = {12}. hsize_t mdims[1] = {14}; memspace = H5Screate_simple (1, mdims, NULL); H5Sselect_hyperslab (memspace, H5S_SELECT_SET, offset, NULL, count, NULL);
  • 17. c-hyperslab.c example: reading two rows. H5Dread (…, …, memspace, filespace, …, …); buffer after the read: -1 7 8 9 10 11 12 13 14 15 16 17 18 -1
  • 18. HDF5 Chunking: chunked layout is needed for extendible datasets, for compression and other filters, and to improve partial I/O for big datasets. A chunked dataset gives better subsetting access time and is extendible; for the selection shown, only two chunks are written/read.
  • 19. Creating Chunked Dataset: create a dataset creation property list; set the property list to use the chunked storage layout; create the dataset with that property list; select part of or all the data for writing or reading. plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk(plist, rank, ch_dims); dset_id = H5Dcreate (…, “Chunked”, …, plist); H5Pclose(plist);
  • 20. Writing or reading to/from a chunked dataset: use the same set of operations as for a contiguous dataset; selections do not need to coincide precisely with the chunks; the chunking mechanism is transparent to the application (not the same as in the HDF4 library); chunking and compression parameters can affect performance (covered in the next presentation). H5Dopen(…); … H5Sselect_hyperslab (…); … H5Dread(…);
  • 21. h5zlib.c example: creates a compressed 1000 x 20 integer dataset in the file zip.h5. h5dump -p -H zip.h5 shows: HDF5 "zip.h5" { GROUP "/" { GROUP "Data" { DATASET "Compressed_Data" { DATATYPE H5T_STD_I32BE DATASPACE SIMPLE { ( 1000, 20 ) … STORAGE_LAYOUT { CHUNKED ( 20, 20 ) SIZE 5316 }
  • 22. h5zlib.c example (continued): FILTERS { COMPRESSION DEFLATE { LEVEL 6 } } FILLVALUE { FILL_TIME H5D_FILL_TIME_IFSET VALUE 0 } ALLOCATION_TIME { H5D_ALLOC_TIME_INCR } } } } }
  • 23. Chunking basics to remember: chunking creates storage overhead in the file; performance is affected by chunking and compression parameters and by the chunk cache size (H5Pset_cache call). Some hints for getting better performance: use a chunk size no smaller than the block size (4k) on your system; use a compression method appropriate for your data; avoid selections that do not coincide with the chunking boundaries.
  • 24. Chunking and selections: great performance when the selection coincides with a chunk; poor performance when the selection spans all chunks.
  • 25. HDF5 Datatypes
  • 26. Datatypes: a datatype is a classification specifying the interpretation of a data element. For a given data element it specifies the set of possible values it can have, the operations that can be performed, and how values of that type are stored. A datatype may be shared between different datasets in one file.
  • 27. Hierarchy of the HDF5 datatype classes (diagram).
  • 28. General Operations on HDF5 Datatypes: Create (derived and compound datatypes only); Copy (all datatypes); Commit, i.e. save in a file to share between different datasets (all datatypes); Open (committed datatypes only); Discover properties (size, number of members, base type); Close.
  • 29. Basic Atomic HDF5 Datatypes
  • 30. Basic Atomic Datatypes: the atomic type classes are integers & floats; strings (fixed and variable size); pointers - references to objects/dataset regions; opaque; and bitfield. An element of an atomic datatype is the smallest possible unit for an HDF5 I/O operation: you cannot write or read just the mantissa or exponent fields of a float, or the sign field of an integer.
  • 31. HDF5 Predefined Datatypes: the HDF5 library provides predefined datatypes (symbols) for all basic atomic classes except opaque, named H5T_<arch>_<base>. Examples: H5T_IEEE_F64LE, H5T_STD_I32BE, H5T_C_S1, H5T_STD_B32LE, H5T_STD_REF_OBJ, H5T_STD_REF_DSETREG, H5T_NATIVE_INT. Predefined datatypes do not have constant values; they are initialized when the library is initialized.
  • 32. When to use HDF5 Predefined Datatypes? In dataset and attribute creation operations, as an argument to H5Dcreate or H5Acreate - c-crtdat.c example: H5Dcreate(file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT); in dataset and attribute read/write operations, as an argument to H5Dwrite/H5Dread and H5Awrite/H5Aread - always use H5T_NATIVE_* types to describe data in memory; to create user-defined types - fixed and variable-length strings, user-defined integers and floats (a 13-bit integer or a non-standard floating-point type); and in composite type definitions. Do not use them for declaring variables.
  • 33. Reference Datatype: a reference to an HDF5 object is a pointer to a group, dataset, or named datatype in a file (predefined datatype H5T_STD_REF_OBJ; created with H5Rcreate, followed with H5Rdereference). A reference to a dataset region (selection) is a pointer to a dataspace selection (predefined datatype H5T_STD_REF_DSETREG; H5Rcreate, H5Rdereference).
  • 34. Reference to Object - h5-ref2obj.c: the file REF_OBJ.h5 contains, under the root group, group Group1 (with Group2 nested inside), a dataset Integers, a named datatype MYTYPE, and a dataset of object references.
  • 35. Reference to Object - h5dump REF_OBJ.h5: DATASET "OBJECT_REFERENCES" { DATATYPE H5T_REFERENCE DATASPACE SIMPLE { ( 4 ) / ( 4 ) } DATA { (0): GROUP 808 /GROUP1 , GROUP 1848 /GROUP1/GROUP2 , (2): DATASET 2808 /INTEGERS , DATATYPE 3352 /MYTYPE } }
  • 36. Reference to Object: create a reference to a group object - H5Rcreate(&ref[1], fileid, "/GROUP1/GROUP2", H5R_OBJECT, -1); write the references to a dataset - H5Dwrite(dsetr_id, H5T_STD_REF_OBJ, H5S_ALL, H5S_ALL, H5P_DEFAULT, ref); read a reference back with H5Dread and find the object it points to - type_id = H5Rdereference(dsetr_id, H5R_OBJECT, &ref[3]); name_size = H5Rget_name(dsetr_id, H5R_OBJECT, &ref_out[3], (char*)buf, 10); buf will contain /MYTYPE, and name_size will be 8 (accommodating the terminating “\0”).
  • 37. Reference to dataset region - h5-ref2reg.c: the file REF_REG.h5 contains, under the root group, a 2 x 9 integer dataset Matrix and a dataset of object references.
  • 38. Reference to Dataset Region - h5dump REF_REG.h5: DATASET "REGION_REFERENCES" { DATATYPE H5T_REFERENCE DATASPACE SIMPLE { ( 2 ) / ( 2 ) } DATA { (0): DATASET 808 {(0,3)-(1,5)}, DATASET 808 {(0,0), (1,6), (0,8)} } }
  • 39. Reference to Dataset Region: create a reference to a dataset region - H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, count, NULL); H5Rcreate(&ref[0], file_id, “MATRIX”, H5R_DATASET_REGION, space_id); write the references to a dataset - H5Dwrite(dsetr_id, H5T_STD_REF_DSETREG, H5S_ALL, H5S_ALL, H5P_DEFAULT, ref);
  • 40. Reference to Dataset Region: read a reference back with H5Dread and find the region it points to - dsetv_id = H5Rdereference(dsetr_id, H5R_DATASET_REGION, &ref_out[0]); space_id = H5Rget_region(dsetr_id, H5R_DATASET_REGION, &ref_out[0]); read the selection - H5Dread(dsetv_id, H5T_NATIVE_INT, H5S_ALL, space_id, H5P_DEFAULT, data_out);
  • 41. Storing strings in HDF5: as an array of characters - access to each character, but extra work to access and interpret each string; as fixed-length strings - string_id = H5Tcopy(H5T_C_S1); H5Tset_size(string_id, size); overhead for short strings, but can be compressed; as variable-length strings - string_id = H5Tcopy(H5T_C_S1); H5Tset_size(string_id, H5T_VARIABLE); overhead as for all VL datatypes (later), and compression will not be applied to the actual data.
  • 42. Bitfield Datatype: like a C bitfield, a bitfield is a sequence of bits packed in some integer type. Examples of predefined datatypes: H5T_NATIVE_B64 - native 8-byte bitfield; H5T_STD_B32LE - standard little-endian 4-byte bitfield. Created by copying a predefined bitfield type and setting precision, offset, and padding. Use the n-bit filter to store the significant bits only.
  • 43. Bitfield Datatype Example: little-endian with 0-padding, offset 3, precision 11 - the significant bits occupy bit positions 3 through 13 of a 16-bit word; the remaining bits are 0-padding.
  • 44. Storing Tables in HDF5 file
  • 45. Example: a table with columns a_name (integer: 0, 1, 2, …, 9), b_name (float: 0., 1., 4., 9., …, 81.), and c_name (double: 1.0000, 0.5000, 0.3333, …, 0.1000). There are multiple ways to store a table: a dataset for each field; a dataset with a compound datatype; if all fields have the same type, a 2-dim array or a 1-dim array of an array datatype. Choose to achieve your goal: How much overhead will each type of storage create? Do I always read all fields? Do I need to read some fields more often? Do I want to use compression? Do I want to access some records?
  • 46. HDF5 Compound Datatypes: compound types are comparable to C structs; members can be atomic or compound types; members can be multidimensional; data can be written/read by a field or a set of fields; not all data filters can be applied (shuffling, SZIP).
  • 47. HDF5 Compound Datatypes - which APIs to use? H5TB APIs: create, read, get info about, and merge tables; add, delete, and append records; insert and delete fields; limited control over the table’s properties (i.e. only GZIP compression at level 6, default allocation time for the table, extendible, etc.). PyTables (http://www.pytables.org): based on H5TB, with a Python interface and indexing capabilities. HDF5 APIs: H5Tcreate(H5T_COMPOUND) and H5Tinsert calls to create a compound datatype; H5Dcreate, etc.; see the H5Tget_member* functions for discovering properties of an HDF5 compound datatype.
  • 48. Creating and writing compound dataset - h5_compound.c example: typedef struct s1_t { int a; float b; double c; } s1_t; s1_t s1[LENGTH];
  • 49. Creating and writing compound dataset: /* Create datatype in memory. */ s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t)); H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT); H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE); H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT); Note: use the HOFFSET macro instead of calculating offsets by hand; the order of the H5Tinsert calls is not important when HOFFSET is used.
  • 50. Creating and writing compound dataset: /* Create dataset and write data */ dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT); status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1); Note: in this example the memory and file datatypes are the same, and the type is not packed. Use H5Tpack to save space in the file (H5Tpack packs a type in place, so pack a copy): s2_tid = H5Tcopy(s1_tid); H5Tpack(s2_tid); dataset = H5Dcreate(file, DATASETNAME, s2_tid, space, H5P_DEFAULT);
  • 51. File content with h5dump: HDF5 "SDScompound.h5" { GROUP "/" { DATASET "ArrayOfStructures" { DATATYPE { H5T_STD_I32BE "a_name"; H5T_IEEE_F32BE "b_name"; H5T_IEEE_F64BE "c_name"; } DATASPACE { SIMPLE ( 10 ) / ( 10 ) } DATA { { [ 0 ], [ 0 ], [ 1 ] }, { [ 1 ], [ 1 ], [ 0.5 ] }, { [ 2 ], [ 4 ], [ 0.333333 ] }, …
  • 52. Reading compound dataset: /* Open the dataset, discover its type, and read the data. */ dataset = H5Dopen(file, DATASETNAME); s2_tid = H5Dget_type(dataset); mem_tid = H5Tget_native_type (s2_tid, H5T_DIR_DEFAULT); status = H5Dread(dataset, mem_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1); Note: we could construct the memory type as we did in the writing example; for general applications we need to discover the type in the file to determine the structure to read into.
  • 53. Reading compound dataset - subsetting by fields: typedef struct s2_t { double c; int a; } s2_t; s2_t s2[LENGTH]; … s2_tid = H5Tcreate (H5T_COMPOUND, sizeof(s2_t)); H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE); H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT); … status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2);
  • 54. Questions? Comments? Thank you!
  • 55. Acknowledgement: This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Editor's Notes

  1. H5Sselect_hyperslab: 1: dataspace identifier. 2: selection operator determining how the new selection is combined with the already existing selection for the dataspace; currently only the H5S_SELECT_SET operator is supported, which replaces the existing selection with the parameters from this call, and overlapping blocks are not supported with H5S_SELECT_SET. 3: start - starting coordinates (0 = beginning). 4: stride - how many elements to move in each dimension; a stride of zero is not supported, and NULL is equivalent to 1. 5: count - determines how many blocks to select in each dimension. 6: block - determines the size of the element block; if NULL, it defaults to a single element in each dimension. H5Sselect_elements: 1: dataspace identifier. 2: H5S_SELECT_SET (see above). 3: number of elements selected. 4: coord - a 2-D array of size (dataspace rank) by (number of elements). The order of the element coordinates in the coord array also specifies the order in which the array elements are iterated when I/O is performed.
  2. Format and software for scientific data. HDF5 is a different format from earlier versions of HDF, as is the library. It stores images, multidimensional arrays, tables, etc.; you can construct all of these different kinds of structures, store them in HDF5, and mix and match them in HDF5 files according to your needs. Emphasis on storage and I/O efficiency: both the library and the format are designed to address this. Free and commercial software support: as far as HDF5 goes, this is just a goal now; there is commercial support for HDF4, but little if any for HDF5 at this time, and we are working with vendors to change this. Emphasis on standards: you can store data in HDF5 in a variety of ways, so we work with users to encourage them to organize HDF5 files in standard ways. Users come from many engineering and scientific fields.