HDF5 Advanced Topics
Object’s Properties
Storage Methods and Filters
Datatypes
HDF and HDF-EOS Workshop VIII
October 26, 2004
1

HDF
Topics
General Introduction to HDF5 properties
HDF5 Dataset properties
I/O and Storage Properties (filters)

HDF5 File properties
I/O and Storage Properties (drivers)

Datatypes
Compound
Variable Length
Reference to object and dataset region

2

HDF
General Introduction to
HDF5 Properties

3

HDF
Properties
Definition
• Mechanism to control different features of the
HDF5 objects
– Implemented via H5P Interface (‘Property lists’)
– HDF5 Library sets objects’ default features
– HDF5 ‘Property lists’ modify default features
• At object creation time (creation properties)
• At object access time (access or transfer properties)

4

HDF
Properties
Definitions
• A property list is a list of name-value pairs
– Values may be of any datatype

• A property list is passed as an optional parameter
to the HDF5 APIs
• Property lists are used/ignored by all the layers of
the library, as needed

5

HDF
Types of Properties
• Predefined and user-defined property lists
• Predefined:
– File creation
– File access
– Dataset creation
– Dataset access
• Will cover each of these

6

HDF
Properties (Example)
HDF5 File
• H5Fcreate(…, creation_prop_id, …)
• Creation properties (how is the file created?)
– Library defaults
• No user block
• Predefined sizes of offsets and addresses of objects in the file
(64-bit on DEC Alpha, 32-bit on Windows)
– User settings
• User block
• 32-bit sizes on a 64-bit platform
• Control over B-trees for chunked storage (split factor)

7

HDF
Properties (Example)
HDF5 File
• H5Fcreate(…, access_prop_id)
• Access properties, or drivers (how is the file
accessed? what is its physical layout on disk?)
– Library defaults
• STDIO library (UNIX fwrite, fread)
– User defined
• MPI I/O for parallel access
• Family of files (a 100 GB HDF5 file represented by fifty 2 GB
UNIX files)
• Size of the chunk cache

8

HDF
Properties (Example)
HDF5 Dataset
• H5Dcreate(…, creation_prop_id)
• Creation properties (how is the dataset created?)
– Library defaults
• Storage: contiguous
• Compression: none
• Space is allocated when data is first written
• No fill value is written
– User settings
• Storage: compact, chunked, or external
• Compression
• Fill value
• Control over space allocation in the file for raw data
– at creation time
– at write time

9

HDF
Properties (Example)
HDF5 Dataset
• H5Dwrite/H5Dread(…, access_prop_id)
• Access (transfer) properties
– Library defaults
• 1 MB conversion buffer
• Error detection on read (if it was set during write)
• MPI independent I/O for parallel access
– User defined
• MPI collective I/O for parallel access
• Size of the datatype conversion buffer
• Control over partial I/O to improve performance

10

HDF
Properties
Programming model
• Use a predefined property type
– H5P_FILE_CREATE
– H5P_FILE_ACCESS
– H5P_DATASET_CREATE
– H5P_DATASET_ACCESS
• Create a new property instance
– H5Pcreate
– H5Pcopy
– H5*get_access_plist; H5*get_create_plist
• Modify the property (see the H5P APIs)
• Use the property to modify an object feature
• Close the property when done
– H5Pclose

11

HDF
Properties
Programming model
• General model of usage: get a property list, set
values, pass it to the library

hid_t plist = H5Pcreate(predefined_plist); /* or H5Pcopy */
/* OR */
hid_t plist = H5Xget_create_plist(…); /* or H5Xget_access_plist */
H5Pset_foo(plist, vals);
H5Xdo_something(Xid, …, plist);
H5Pclose(plist);

12

HDF
HDF5 Dataset Creation
Properties and Predefined
Filters

13

HDF
Dataset Creation Properties
• Storage layout
– Contiguous (default)
– Compact
– Chunked
– External
• Filters applied to raw data
– Compression
– Checksum
• Fill value
• Space allocation for raw data in the file
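The fill value and space-allocation properties listed above are set on the same dataset creation property list as the storage layout; a minimal sketch (the handles `file_id` and `space_id`, the dataset name, and the fill value are illustrative, and the calls follow the 1.6-era five-argument H5Dcreate used throughout this deck):

```c
#include "hdf5.h"

/* Sketch: set a fill value and early space allocation on a
 * dataset creation property list. Names and values are
 * illustrative, not recommendations. */
hid_t plist = H5Pcreate(H5P_DATASET_CREATE);

int fill = -1;                                   /* value for unwritten elements */
H5Pset_fill_value(plist, H5T_NATIVE_INT, &fill);
H5Pset_alloc_time(plist, H5D_ALLOC_TIME_EARLY);  /* allocate space at creation time */

hid_t dset = H5Dcreate(file_id, "Filled", H5T_NATIVE_INT, space_id, plist);
H5Pclose(plist);
```

With early allocation plus a fill value, a reader sees `-1` in every element that has not yet been written.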
14

HDF
Dataset Creation Properties
Storage Layouts
• Storage layout is important for I/O performance
and for the size of HDF5 files
• Contiguous (default)
– Used when the data will be written/read at once
– H5Dcreate(…, H5P_DEFAULT)
• Compact
– Used for small datasets (on the order of bytes) for better I/O
– Raw data is written/read at the time the dataset is opened
– The file is less fragmented
– To create a compact dataset, follow the ‘Properties
programming model’

15

HDF
Creating Compact Dataset
• Create a dataset creation property list
• Set the property list to use the compact storage layout
• Create the dataset with this property list

plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_layout(plist, H5D_COMPACT);
dset_id = H5Dcreate (…, “Compact”,…, plist);
H5Pclose(plist);

16

HDF
Creating Chunked Dataset
• Chunked layout is needed for
– Extendible datasets
– Compression and other filters
– Improving partial I/O for big datasets

[Figure: a chunked, extendible dataset gives better subsetting
access time; only the two chunks that intersect the selection
are written/read]

17

HDF
Creating Chunked Dataset
•
•
•

Create a dataset creation property list
Set property list to use chunked storage layout
Create dataset with the above property list
plist

= H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(plist, rank, ch_dims);
dset_id = H5Dcreate (…, “Chunked”,…, plist);
H5Pclose(plist);

18

HDF
Dataset Creation Properties
Compression and Other I/O Pipeline Filters
• HDF5 provides a mechanism (“I/O filters”) to
manipulate data while transferring it between
memory and disk
• H5Z and H5P interfaces
• HDF5 predefined filters (H5P interface)
– Compression (gzip, szip)
– Shuffle and checksum filters
• User-defined filters (H5Z and H5P interfaces)
– Example: bzip2 compression
http://hdf.ncsa.uiuc.edu/HDF5/papers/bzip2

19

HDF
Compression and other I/O Pipeline Filters
(continued)
• Currently used only with chunked datasets
• Filters can be combined
– GZIP + shuffle + checksum filters
– Checksum filter + a user-defined encryption filter
• Filters are called in the order they are defined on
writing, and in the reverse order on reading
• The user is responsible for the sanity of the filter
pipeline
– GZIP + SZIP + shuffle doesn’t make sense
– Shuffle + SZIP does

20

HDF
Creating compressed Dataset
• Compression
– Improves transmission speed
– Improves storage efficiency
– Requires chunking
– May increase the CPU time needed for compression

[Figure: chunks are compressed as they move from memory to the file]
21

HDF
Creating compressed datasets
• Create a dataset creation property list
• Set chunking (and specify chunk dimensions)
• Set the compression method
• Create the dataset with this property list

plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk (plist, ndims, chkdims);
H5Pset_deflate (plist, level); /* GZIP */
OR
H5Pset_szip (plist, options_mask, pixels_per_block); /* SZIP */
dset_id = H5Dcreate (file_id, “comp-data”,
H5T_NATIVE_FLOAT, space_id, plist);

22

HDF
Creating External Dataset
• The dataset’s raw data is stored in an external file
• Easy to include existing data in an HDF5 file
• Easy to export raw data if an application needs it
• Disadvantage: the user has to keep track of the additional
files to preserve the integrity of the HDF5 file

[Figure: the HDF5 file holds the metadata for dataset “A”;
the raw data for “A” can be stored in an external file]
23

HDF
Creating External Dataset
• Create a dataset creation property list
• Set the property list to use external storage
• Create the dataset with this property list

plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_external(plist,
“raw_data.ext”, offset, size);
dset_id = H5Dcreate (…, “External”,…, plist);
H5Pclose(plist);

24

HDF
Example of External Files
This example shows how a contiguous, one-dimensional
dataset is partitioned into three parts and each of those
parts is stored in a segment of an external file.

plist = H5Pcreate (H5P_DATASET_CREATE);
H5Pset_external (plist, “raw.data”, 3000, 1000);
H5Pset_external (plist, “raw.data”, 0, 2500);
H5Pset_external (plist, “raw.data”, 4500, 1500);

25

HDF
Checksum Filter
• HDF5 includes the Fletcher32 checksum algorithm
for error detection
• It is automatically included in the HDF5 library
• To use this filter you must add it to the filter pipeline
with H5Pset_filter

[Figure: a checksum value is appended to the data as it is
written from memory to the file]

26

HDF
Enabling Checksum Filter
• Create a dataset creation property list
• Set chunking (and specify chunk dimensions)
• Add the filter to the pipeline
• Create your dataset specifying this property list
• Close the property list

plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk (plist, ndims, chkdims);
H5Pset_filter (plist, H5Z_FILTER_FLETCHER32, 0, 0, NULL);
H5Dcreate (…, ”Checksum”,…, plist);
H5Pclose(plist);

27

HDF
Shuffling filter
• Predefined HDF5 filter
• Not a compression method; a change of byte order in a
stream of data
• Example: values 1, 23, 43
• Hexadecimal form
– 0x01 0x17 0x2B
• On a big-endian machine (4-byte integers)
– 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17 0x00 0x00
0x00 0x2B
• After shuffling (all first bytes together, then all second
bytes, and so on)
– 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01
0x17 0x2B

28

HDF
Before: 00 00 00 01 00 00 00 17 00 00 00 2B
After:  00 00 00 00 00 00 00 00 00 01 17 2B

29

HDF
Enabling Shuffling Filter
• Create a dataset creation property list
• Set chunking (and specify chunk dimensions)
• Add the shuffle filter to the pipeline
• Define the compression filter
• Create your dataset specifying this property list
• Close the property list

plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk (plist, ndims, chkdims);
H5Pset_shuffle(plist);
H5Pset_deflate(plist, level);
H5Dcreate (…, ”BetterComp”,…, plist);
H5Pclose(plist);

30

HDF
Effect of Data Shuffling (H5Pset_shuffle + H5Pset_deflate)
• Write a 4-byte integer dataset 256x256x1024 (256 MB)
• Using chunks of 256x16x1024 (16 MB)
• Values: random integers between 0 and 255

             File size    Write time    Total time
No shuffle   102.9 MB     671.049       629.45
Shuffle       67.34 MB     83.353        78.268

Compression combined with shuffling provides
• a better compression ratio
• better I/O performance

31

HDF
HDF5 Dataset Access (Transfer)
Properties

32

HDF
Dataset Access/Transfer Properties
• Improve performance
• H5Pset_buffer
– Sets the size of the datatype conversion buffer during
I/O
– Size should be large enough to hold the slice along the
slowest changing dimension
– Example: Hyperslab 100x200x300, buffer 200x300

• H5Pset_hyper_vector_size
– Sets the number of hyperslab offset and length pairs
– Improves performance for partial I/O
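The two transfer properties above can be combined on one transfer property list; a sketch for the hyperslab example given (100x200x300 elements, buffer sized for one 200x300 slice of doubles — the handles and the vector size are illustrative):

```c
#include "hdf5.h"

/* Sketch: enlarge the type-conversion buffer so one full slice
 * along the slowest-changing dimension converts in a single pass,
 * and raise the hyperslab vector size for partial I/O. */
hid_t xfer = H5Pcreate(H5P_DATASET_XFER);
H5Pset_buffer(xfer, 200 * 300 * sizeof(double), NULL, NULL);
H5Pset_hyper_vector_size(xfer, 2048);  /* offset/length pairs per I/O call */

H5Dread(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace, xfer, buf);
H5Pclose(xfer);
```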

33

HDF
Dataset Access/Transfer Properties
• H5Pset_edc_check
– For datasets created with the error-detection filter enabled
– Enables or disables error checking during read operations
– H5Z_ENABLE_EDC (default)
– H5Z_DISABLE_EDC
• H5Pset_dxpl_mpio
– Sets the data transfer mode for parallel I/O
– H5FD_MPIO_INDEPENDENT (default)
– H5FD_MPIO_COLLECTIVE
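Both properties are set on a dataset transfer property list; a sketch (handles are assumed to exist, and the collective-mode call requires a parallel HDF5 build):

```c
#include "hdf5.h"

/* Sketch: read without Fletcher32 verification, then write
 * collectively in a parallel program. Handles are illustrative. */
hid_t xfer = H5Pcreate(H5P_DATASET_XFER);

H5Pset_edc_check(xfer, H5Z_DISABLE_EDC);        /* skip checksum check on read */
H5Dread(dset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, xfer, buf);

H5Pset_dxpl_mpio(xfer, H5FD_MPIO_COLLECTIVE);   /* all ranks participate in I/O */
H5Dwrite(dset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, xfer, buf);

H5Pclose(xfer);
```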

34

HDF
User-defined Filters

35

HDF
Standard Interface for User-defined Filters
• H5Zregister : Register filter so that HDF5
knows about it
• H5Zunregister: Unregister a filter
• H5Pset_filter: Adds a filter to the filter pipeline
• H5Pget_filter: Returns information about a filter
in the pipeline
• H5Zfilter_avail: Check if filter is available
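A rough sketch of how these calls fit together, using the 1.6-era `H5Z_class_t` layout (the struct gained extra fields in later HDF5 versions, so check the H5Zregister documentation for your release; the filter ID, name, and pass-through callback here are all illustrative):

```c
#include "hdf5.h"

#define MY_FILTER_ID 305   /* IDs 256-511 were reserved for testing */

/* A do-nothing filter callback: encode on write, decode when
 * (flags & H5Z_FLAG_REVERSE) is set; return the number of valid
 * bytes in *buf, or 0 on failure. */
static size_t my_filter(unsigned int flags, size_t cd_nelmts,
                        const unsigned int cd_values[], size_t nbytes,
                        size_t *buf_size, void **buf)
{
    return nbytes;  /* pass-through placeholder */
}

H5Z_class_t cls = { MY_FILTER_ID, "my filter",
                    NULL, NULL,          /* can_apply, set_local callbacks */
                    my_filter };
H5Zregister(&cls);                       /* make the filter known to HDF5 */
H5Pset_filter(dcpl, MY_FILTER_ID, 0, 0, NULL);  /* add it to a chunked
                                                   dataset's pipeline */
```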

36

HDF
File Creation Properties

37

HDF
File Creation Properties
• H5Pset_userblock
– The user block stores user-defined information (e.g. ASCII text
describing the file) at the beginning of the file
– cat my.txt hdf5.h5 > myhdf5.h5
– Sets the size of the user block
– 512 bytes, 1024 bytes, … (2^N)
• H5Pset_sizes
– Sets the byte size of the offsets and lengths used to address objects
in the file
• H5Pset_sym_k
– Controls the rank of the B-trees for groups
– Default is 16
• H5Pset_istore_k
– Controls the rank of the B-trees for chunked datasets
– Default is 32
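The creation properties above all go on one file creation property list, passed as the third argument of H5Fcreate; a sketch (the file name and sizes are illustrative):

```c
#include "hdf5.h"

/* Sketch: create a file with a 1 KB user block and 8-byte
 * offsets/lengths. Values are illustrative. */
hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
H5Pset_userblock(fcpl, 1024);   /* must be 0 or a power of 2 >= 512 */
H5Pset_sizes(fcpl, 8, 8);       /* byte size of offsets, of lengths */

hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);
H5Pclose(fcpl);
```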

38

HDF
File Access Properties

39

HDF
File Access Properties (Performance)
• H5Pset_cache
– Sets metadata cache and raw data chunk cache parameters
– An improper size will degrade performance
• H5Pset_meta_block_size
– Reduces the number of small objects in the file
– A block of metadata is written in a single I/O operation (default 2 KB)
– The VFL driver has to set H5FD_AGGREGATE_METADATA
• H5Pset_sieve_buf_size
– Improves partial I/O
• VFL layer: file drivers
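These tuning knobs all live on a file access property list; a sketch using the 1.6-era H5Pset_cache signature (the cache sizes and file name are illustrative, not recommendations):

```c
#include "hdf5.h"

/* Sketch: tune caching before opening a file. */
hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

/* mdc_nelmts, rdcc_nelmts (chunk slots), rdcc_nbytes, rdcc_w0 */
H5Pset_cache(fapl, 0, 521, 16 * 1024 * 1024, 0.75);
H5Pset_meta_block_size(fapl, 64 * 1024);   /* aggregate small metadata writes */
H5Pset_sieve_buf_size(fapl, 256 * 1024);   /* sieve buffer for partial raw I/O */

hid_t file = H5Fopen("example.h5", H5F_ACC_RDWR, fapl);
H5Pclose(fapl);
```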

40

HDF
File Access Properties (Physical storage
and Usage of Low-level I/O Libraries)
• VFL layer: file drivers
• Define the physical storage of the HDF5 file
– Memory driver (HDF5 file in the application’s memory)
– Stream driver (HDF5 file written to a socket)
– Split (multi) file driver
– Family driver
• Define the low-level I/O library
– MPI I/O driver for parallel access
– STDIO vs. SEC2

41

HDF
Files Needn’t Be Files - the Virtual File Layer
VFL: a public API for writing I/O drivers

[Figure: an hid_t “file” handle passes through the Virtual File
I/O Layer to an I/O driver (stdio, mpio, split, family, SRB,
memory, network); each driver maps to a different kind of
“storage”: files, an SRB repository, memory, or a network]

42

HDF
Split Files
• Allows you to split metadata and raw data into separate files
• The files may reside on different file systems for better I/O
• Disadvantage: the user has to keep track of the files

[Figure: datasets “A” and “B” live in the metadata file; their
raw data lives in a separate raw data file]
43

HDF
Creating Split Files
• Create a file access property list
• Set up the file access property list to use split files
• Create the file with this property list
• Close the property list

plist = H5Pcreate (H5P_FILE_ACCESS);
H5Pset_fapl_split(plist, “.met”, H5P_DEFAULT, ”.dat”,
H5P_DEFAULT);
file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC,
H5P_DEFAULT, plist);
H5Pclose(plist);

44

HDF
File Families
• Allows you to access files larger than 2GB on
file systems that don't support large files
• Any HDF5 file can be split into a family of files
and vice versa
• A family member size must be a power of two

45

HDF
Creating a File Family
• Create a file access property list
• Set up the file access property list to use a file family
• Create the file with this property list

plist = H5Pcreate (H5P_FILE_ACCESS);
H5Pset_fapl_family (plist, family_size, H5P_DEFAULT);
file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC,
H5P_DEFAULT, plist);
H5Pclose(plist);

46

HDF
HDF5 Datatypes

47

HDF
Datatypes
• A datatype is
– A classification specifying the interpretation of
a data element
– Specifies for a given data element
• the set of possible values it can have
• the operations that can be performed
• how the values of that type are stored

– May be shared between different datasets in
one file
48

HDF
HDF5 datatypes
• Atomic types
– standard integer & float
– user-definable scalars (e.g. 13-bit integer)
– bitfields
– variable-length types (e.g. strings)
– pointers: references to objects/dataset regions
– enumerations: names mapped to integers

49
HDF
General Operations on HDF5 Datatypes
• Create
– H5Tcreate creates a datatype of the H5T_COMPOUND, H5T_OPAQUE,
or H5T_ENUM class
• Copy
– H5Tcopy creates another instance of a datatype; it can be applied to
any datatype
• Commit
– H5Tcommit creates a datatype object in the HDF5 file; a committed
datatype can be shared between different datasets
• Open
– H5Topen opens a datatype stored in the file
• Close
– H5Tclose closes a datatype object
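A sketch of how these operations chain together, using the three-argument H5Tcommit and two-argument H5Topen of the HDF5 1.6 era (later versions add property-list arguments; `file_id`, `space_id`, and the names are illustrative):

```c
#include "hdf5.h"

/* Sketch: copy a predefined type, commit it as a named datatype,
 * then reuse it from the file. */
hid_t dtype = H5Tcopy(H5T_NATIVE_DOUBLE);      /* another instance of a type */
H5Tcommit(file_id, "/shared_double", dtype);   /* store it in the file */

/* ... later, any dataset can share the committed type: */
hid_t shared = H5Topen(file_id, "/shared_double");
hid_t dset   = H5Dcreate(file_id, "D1", shared, space_id, H5P_DEFAULT);

H5Tclose(shared);
H5Tclose(dtype);
```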

50

HDF
Programming model for HDF5 Datatypes
• Use predefined HDF5 types
– No need to close them
• OR
– Create
• Create a datatype (by copying an existing one, or by creating one of the
H5T_COMPOUND, H5T_ENUM, or H5T_OPAQUE classes)
• Create a datatype by querying the datatype of a dataset
– Open a committed datatype from the file
• (Optional) Discover datatype properties (size, precision,
members, etc.)
• Use the datatype to create a dataset/attribute, to write/read a
dataset/attribute, or to set a fill value
• (Optional) Save the datatype in the file
• Close
51

HDF
HDF5 Compound Datatypes
• Compound types
– Comparable to C structs
– Members can be atomic or compound types
– Members can be multidimensional
– Can be written/read by a field or a set of fields
– Not all data filters can be applied (shuffle, SZIP)
– H5Tcreate(H5T_COMPOUND) and H5Tinsert calls create a
compound datatype
– See the H5Tget_member* functions for discovering the
properties of an HDF5 compound datatype
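The H5Tcreate/H5Tinsert pattern can be sketched for a simple two-member struct (the struct, member names, and handles are illustrative):

```c
#include <stddef.h>   /* HOFFSET builds on offsetof */
#include "hdf5.h"

/* Sketch: build a compound datatype matching a C struct. */
typedef struct {
    int    serial;
    double temperature;
} record_t;

hid_t ctype = H5Tcreate(H5T_COMPOUND, sizeof(record_t));
H5Tinsert(ctype, "serial",      HOFFSET(record_t, serial),      H5T_NATIVE_INT);
H5Tinsert(ctype, "temperature", HOFFSET(record_t, temperature), H5T_NATIVE_DOUBLE);

hid_t dset = H5Dcreate(file_id, "Records", ctype, space_id, H5P_DEFAULT);
/* to read a single field, build a compound containing just that member
 * and pass it as the memory datatype to H5Dread */
```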
52

HDF
HDF5 Fixed and Variable length array
storage
[Figure: with fixed-length storage, each record reserves the same
amount of space as data accumulates over time; with variable-length
storage, each record holds only the data it needs]
53

HDF
HDF5 Variable Length Datatypes
Programming issues
• Each element is represented by a C struct:

typedef struct {
    size_t len; /* length of the VL data */
    void  *p;   /* pointer to the VL data */
} hvl_t;

• The base type can be any HDF5 type
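A sketch of writing two variable-length integer records with this struct (the handles `file_id` and `space_id` and the dataset name are illustrative):

```c
#include "hdf5.h"

/* Sketch: two records of different lengths. */
int a[] = {1, 2, 3};
int b[] = {4, 5};
hvl_t data[2];
data[0].len = 3; data[0].p = a;
data[1].len = 2; data[1].p = b;

hid_t vltype = H5Tvlen_create(H5T_NATIVE_INT);   /* base type: native int */
hid_t dset   = H5Dcreate(file_id, "VL", vltype, space_id, H5P_DEFAULT);
H5Dwrite(dset, vltype, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
/* after H5Dread, release library-allocated element buffers
 * with H5Dvlen_reclaim */
```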

54

HDF
HDF5 Variable Length Datatypes
[Figure: a dataset with a variable-length datatype stores
descriptors in its raw data; the variable-length data itself is
kept on the global heap]

55

HDF
HDF Information
• HDF Information Center
– http://hdf.ncsa.uiuc.edu/

• HDF Help email address
– hdfhelp@ncsa.uiuc.edu

• HDF users mailing list
– hdfnews@ncsa.uiuc.edu

56

HDF

 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes

  • 1. HDF5 Advanced Topics Object’s Properties Storage Methods and Filters Datatypes HDF and HDF-EOS Workshop VIII October 26, 2004 1 HDF
  • 2. Topics General Introduction to HDF5 properties HDF5 Dataset properties I/O and Storage Properties (filters) HDF5 File properties I/O and Storage Properties (drivers) Datatypes Compound Variable Length Reference to object and dataset region 2 HDF
  • 3. General Introduction to HDF5 Properties 3 HDF
  • 4. Properties Definition • Mechanism to control different features of the HDF5 objects – Implemented via H5P Interface (‘Property lists’) – HDF5 Library sets objects’ default features – HDF5 ‘Property lists’ modify default features • At object creation time (creation properties) • At object access time (access or transfer properties) 4 HDF
  • 5. Properties Definitions • A property list is a list of name-value pairs – Values may be of any datatype • A property list is passed as an optional parameter to the HDF5 APIs • Property lists are used/ignored by all the layers of the library, as needed 5 HDF
  • 6. Type of Properties • Predefined and User defined property lists • Predefined: – File creation – File access – Dataset creation – Dataset access • Will cover each of these 6 HDF
  • 7. Properties (Example) HDF5 File • H5Fcreate(…,creation_prop_id,…) • Creation properties (how is the file created?) – Library’s defaults • no user block • predefined sizes of offsets and addresses of the objects in the file (64-bit for DEC Alpha, 32-bit on Windows) – User’s settings • User block • 32-bit sizes on a 64-bit platform • Control over B-trees for chunked storage (split factor) 7 HDF
  • 8. Properties (Example) HDF5 File • H5Fcreate(…,access_prop_id) • Access properties or drivers (How is the file accessed? What is the physical layout on disk?) – Library defaults • STDIO library (UNIX fwrite, fread) – User defined • MPI I/O for parallel access • Family of files (100 GB HDF5 file represented by 50 2 GB UNIX files) • Size of the chunk cache 8 HDF
  • 9. Properties (Example) HDF5 Dataset • H5Dcreate(…,creation_prop_id) • Creation properties (how the dataset is created) – Library’s defaults • Storage: contiguous • Compression: none • Space is allocated when data is first written • No fill value is written – User’s settings • Storage: compact, chunked, or external • Compression • Fill value • Control over space allocation in the file for raw data – at creation time – at write time 9 HDF
  • 10. Properties (Example) HDF5 Dataset • H5Dwrite&lt;read&gt;(…,access_prop_id) • Access (transfer) properties – Library defaults • 1MB conversion buffer • Error detection on read (if it was set during write) • MPI independent I/O for parallel access – User defined • MPI collective I/O for parallel access • Size of the datatype conversion buffer • Control over partial I/O to improve performance 10 HDF
  • 11. Properties Programming model • Use predefined property type – H5P_FILE_CREATE – H5P_FILE_ACCESS – H5P_DATASET_CREATE – H5P_DATASET_ACCESS • Create new property instance – H5Pcreate – H5Pcopy – H5*get_access_plist; H5*get_create_plist • Modify property (see H5P APIs) • Use property to modify object feature • Close property when done – H5Pclose 11 HDF
  • 12. Properties Programming model • General model of usage: get plist, set values, pass to library hid_t plist = H5Pcreate(copy)(predefined_plist); OR hid_t plist = H5Xget_create(access)_plist(…); H5Pset_foo( plist, vals); H5Xdo_something( Xid, …, plist); H5Pclose(plist); 12 HDF
  • 13. HDF5 Dataset Creation Properties and Predefined Filters 13 HDF
  • 14. Dataset Creation Properties • Storage – Contiguous (default) – Compact – Chunked – External • Filters applied to raw data – Compression – Checksum • Fill value • Space allocation for raw data in the file 14 HDF
  • 15. Dataset Creation Properties Storage Layouts • Storage layout is important for I/O performance and the size of HDF5 files • Contiguous (default) – Used when data will be written/read at once – H5Dcreate(…,H5P_DEFAULT) • Compact – Used for small datasets (order of O(bytes)) for better I/O – Raw data is written/read at the time the dataset is opened – File is less fragmented – To create a compact dataset follow the ‘Properties programming model’ 15 HDF
  • 16. Creating Compact Dataset • Create a dataset creation property list • Set property list to use compact storage layout • Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_layout(plist, H5D_COMPACT); dset_id = H5Dcreate (…, “Compact”,…, plist); H5Pclose(plist); 16 HDF
  • 17. Creating chunked Dataset • Chunked layout is needed for – Extendible datasets – Compression and other filters – Improving partial I/O for big datasets (Figure: chunked storage gives better subsetting access time and extendible datasets; only two chunks need to be written/read for the shown selection) 17 HDF
  • 18. Creating Chunked Dataset • Create a dataset creation property list • Set property list to use chunked storage layout • Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk(plist, rank, ch_dims); dset_id = H5Dcreate (…, “Chunked”,…, plist); H5Pclose(plist); 18 HDF
  • 19. Dataset Creation Properties Compression and other I/O Pipeline Filters • HDF5 provides a mechanism (“I/O filters”) to manipulate data while transferring it between memory and disk • H5Z and H5P interfaces • HDF5 predefined filters (H5P interface) – Compression (gzip, szip) – Shuffling and checksum filters • User-defined filters (H5Z and H5P interfaces) – Example: Bzip2 compression http://hdf.ncsa.uiuc.edu/HDF5/papers/bzip2 19 HDF
  • 20. Compression and other I/O Pipeline Filters (continued) • Currently used only with chunked datasets • Filters can be combined together – GZIP + shuffle + checksum filters – Checksum filter + user-defined encryption filter • Filters are called in the order they are defined on writing and in the reverse order on reading • User is responsible for “filter pipeline sanity” – GZIP + SZIP + shuffle doesn’t make sense – Shuffle + SZIP does 20 HDF
  • 21. Creating compressed Dataset • Compression – Improves transmission speed – Improves storage efficiency – Requires chunking – May increase CPU time needed for compression (Figure: uncompressed chunks in memory stored as compressed chunks in the file) 21 HDF
  • 22. Creating compressed datasets • Create a dataset creation property list • Set chunking (and specify chunk dimensions) • Set compression method • Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk (plist, ndims, chkdims); H5Pset_deflate (plist, level); /* GZIP */ OR H5Pset_szip (plist, options_mask, numpixels); /* SZIP */ dset_id = H5Dcreate (file_id, “comp-data”, H5T_NATIVE_FLOAT, space_id, plist); 22 HDF
  • 23. Creating external Dataset • Dataset’s raw data is stored in an external file • Easy to include existing data into an HDF5 file • Easy to export raw data if the application needs it • Disadvantage: user has to keep track of additional files to preserve the integrity of the HDF5 file (Figure: metadata for dataset “A” is stored in the HDF5 file; raw data for “A” can be stored in an external file) 23 HDF
  • 24. Creating External Dataset • Create a dataset creation property list • Set property list to use external storage layout • Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_external(plist, “raw_data.ext”, offset, size); dset_id = H5Dcreate (…, “External”,…, plist); H5Pclose(plist); 24 HDF
  • 25. Example of External Files This example shows how a contiguous, one-dimensional dataset is partitioned into three parts and each of those parts is stored in a segment of an external file. plist = H5Pcreate (H5P_DATASET_CREATE); H5Pset_external (plist, “raw.data”, 3000, 1000); H5Pset_external (plist, “raw.data”, 0, 2500); H5Pset_external (plist, “raw.data”, 4500, 1500); 25 HDF
  • 26. Checksum Filter • HDF5 includes the Fletcher32 checksum algorithm for error detection • It is automatically included in HDF5 • To use this filter you must add it to the filter pipeline with H5Pset_filter (Figure: a checksum value is appended to the data as it moves from memory to the file) 26 HDF
  • 27. Enabling Checksum Filter • Create a dataset creation property list • Set chunking (and specify chunk dimensions) • Add the filter to the pipeline • Create your dataset specifying this property list • Close property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk (plist, ndims, chkdims); H5Pset_filter (plist, H5Z_FILTER_FLETCHER32, 0, 0, NULL); H5Dcreate (…,”Checksum”,…,plist); H5Pclose(plist); 27 HDF
  • 28. Shuffling filter • Predefined HDF5 filter • Not a compression method; a change of byte order in a stream of data • Example: values 1 23 43 • Hexadecimal form: 0x01 0x17 0x2B • Big-endian machine: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x2B • After shuffling: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x17 0x2B 28 HDF
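The regrouping described above can be sketched in a few lines of plain C. This is an illustration of the idea only, not the HDF5 filter implementation: all bytes at position 0 of each value come first, then all bytes at position 1, and so on, which groups the (often zero) high-order bytes together and helps the compressor that runs afterwards.

```c
#include <stddef.h>

/* Sketch of what the shuffle filter does: regroup an array of
 * fixed-width values byte-plane by byte-plane.  'nvals' values,
 * each 'width' bytes wide; 'out' must hold nvals*width bytes. */
void shuffle_bytes(const unsigned char *in, unsigned char *out,
                   size_t nvals, size_t width)
{
    for (size_t b = 0; b < width; b++)       /* byte position  */
        for (size_t v = 0; v < nvals; v++)   /* value index    */
            out[b * nvals + v] = in[v * width + b];
}
```

Applied to the big-endian bytes of 1, 23, 43 from the slide, this produces nine leading zero bytes followed by 0x01 0x17 0x2B, matching the shuffled stream shown.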
  • 29. (Figure: the byte stream 00 00 00 01 00 00 00 17 00 00 00 2B becomes 00 00 00 00 00 00 00 00 00 01 17 2B after shuffling) 29 HDF
  • 30. Enabling Shuffling Filter • Create a dataset creation property list • Set chunking (and specify chunk dimensions) • Add the shuffle filter to the pipeline • Define compression filter • Create your dataset specifying this property list • Close property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk (plist, ndims, chkdims); H5Pset_shuffle(plist); H5Pset_deflate(plist, level); H5Dcreate (…,”BetterComp”,…,plist); H5Pclose(plist); 30 HDF
  • 31. Effect of data shuffling (H5Pset_shuffle + H5Pset_deflate) • Write 4-byte integer dataset 256x256x1024 (256MB) • Using chunks of 256x16x1024 (16MB) • Values: random integers between 0 and 255 • No Shuffle: file size 102.9MB, total time 671.049, write time 629.45 • Shuffle: file size 67.34MB, total time 83.353, write time 78.268 • Compression combined with shuffling provides – Better compression ratio – Better I/O performance 31 HDF
  • 32. HDF5 Dataset Access (Transfer) Properties 32 HDF
  • 33. Dataset Access/Transfer Properties • Improve performance • H5Pset_buffer – Sets the size of the datatype conversion buffer during I/O – Size should be large enough to hold the slice along the slowest changing dimension – Example: Hyperslab 100x200x300, buffer 200x300 • H5Pset_hyper_vector_size – Sets the number of hyperslab offset and length pairs – Improves performance for partial I/O 33 HDF
  • 34. Dataset Access/Transfer Properties • H5Pset_edc_check – For datasets created with error detection filter enabled – Enables error checking during read operation – H5Z_ENABLE_EDC (default) – H5Z_DISABLE_EDC • H5Pset_dxpl_mpio – Sets data transfer mode for parallel I/O – H5FD_MPIO_INDEPENDENT (default) – H5FD_MPIO_COLLECTIVE 34 HDF
  • 36. Standard Interface for User-defined Filters • H5Zregister : Register filter so that HDF5 knows about it • H5Zunregister: Unregister a filter • H5Pset_filter: Adds a filter to the filter pipeline • H5Pget_filter: Returns information about a filter in the pipeline • H5Zfilter_avail: Check if filter is available 36 HDF
  • 38. File Creation Properties • H5Pset_userblock – User block stores user-defined information (e.g. ASCII text to describe a file) at the beginning of the file – cat my.txt hdf5.h5 > myhdf5.h5 – Sets the size of the user block – 512 bytes, 1024 bytes, 2^N • H5Pset_sizes – Sets the byte size of the offsets and lengths used to address objects in the file • H5Pset_sym_k – Controls the rank of B-trees for groups – Default is 16 • H5Pset_istore_k – Controls the rank of B-trees for chunked datasets – Default is 32 38 HDF
  • 40. File Access Properties (Performance) • H5Pset_cache – Sets metadata cache and raw data chunk cache parameters – Improper size will degrade performance • H5Pset_meta_block_size – Reduces the number of small objects in the file – Block of metadata is written in a single I/O operation (default 2K) – VFL driver has to set H5FD_AGGREGATE_METADATA • H5Pset_sieve_buf_size – Improves partial I/O • VFL layer: file drivers 40 HDF
  • 41. File Access Properties (Physical storage and Usage of Low-level I/O Libraries) • VFL layer: file drivers • Define physical storage of the HDF5 file – Memory driver (HDF5 file in the application’s memory) – Stream driver (HDF5 file written to a socket) – Split (multi) file driver – Family driver • Define low level I/O library – MPI I/O driver for parallel access – STDIO vs. SEC2 41 HDF
  • 42. Files needn’t be files - Virtual File Layer • VFL: a public API for writing I/O drivers (Figure: an hid_t “file” handle passes through the Virtual File I/O Layer to I/O drivers (stdio, mpio, split, family, SRB, memory, network), which map the HDF5 file onto different kinds of “storage”: files, an SRB repository, memory, or a network) 42 HDF
  • 43. Split Files • Allows you to split metadata and raw data into separate files • May reside on different file systems for better I/O • Disadvantage: user has to keep track of the files (Figure: one logical HDF5 file split into a metadata file, holding datasets “A” and “B”, and a raw data file, holding Data A and Data B) 43 HDF
  • 44. Creating Split Files • Create a file access property list • Set up file access property list to use split files • Create the file with this property list • Close the property list plist = H5Pcreate (H5P_FILE_ACCESS); H5Pset_fapl_split(plist, “.met”, H5P_DEFAULT, “.dat”, H5P_DEFAULT); file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist); H5Pclose(plist); 44 HDF
  • 45. File Families • Allows you to access files larger than 2GB on file systems that don't support large files • Any HDF5 file can be split into a family of files and vice versa • A family member size must be a power of two 45 HDF
  • 46. Creating a File Family • Create a file access property list • Set up file access property list to use file family • Create the file with this property list plist = H5Pcreate (H5P_FILE_ACCESS); H5Pset_fapl_family (plist, family_size, H5P_DEFAULT); file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist); H5Pclose(plist); 46 HDF
  • 48. Datatypes • A datatype is – A classification specifying the interpretation of a data element – Specifies for a given data element • the set of possible values it can have • the operations that can be performed • how the values of that type are stored – May be shared between different datasets in one file 48 HDF
  • 49. HDF5 datatypes • Atomic types – standard integer & float – user-definable scalars (e.g. 13-bit integer) – bitfields – variable length types (e.g. strings) – pointers - references to objects/dataset regions – enumeration - names mapped to integers 49 HDF
  • 50. General Operations on HDF5 Datatypes • Create – H5Tcreate creates a datatype of the H5T_COMPOUND, H5T_OPAQUE, and H5T_ENUM classes • Copy – H5Tcopy creates another instance of the datatype; can be applied to any datatype • Commit – H5Tcommit creates a datatype object in the HDF5 file; a committed datatype can be shared between different datasets • Open – H5Topen opens a datatype stored in the file • Close – H5Tclose closes the datatype object 50 HDF
  • 51. Programming model for HDF5 Datatypes • Use predefined HDF5 types – No need to close • OR – Create • Create a datatype (by copying an existing one, or from one of the H5T_COMPOUND, H5T_ENUM, or H5T_OPAQUE classes) • Create a datatype by querying the datatype of a dataset – Open a committed datatype from the file • (Optional) Discover datatype properties (size, precision, members, etc.) • Use the datatype to create a dataset/attribute, to write/read a dataset/attribute, to set a fill value • (Optional) Save the datatype in the file • Close 51 HDF
  • 52. HDF5 Compound Datatypes • Compound types – Comparable to C structs – Members can be atomic or compound types – Members can be multidimensional – Can be written/read by a field or a set of fields – Not all data filters can be applied (shuffling, SZIP) – H5Tcreate(H5T_COMPOUND), H5Tinsert calls to create a compound datatype – See H5Tget_member* functions for discovering properties of an HDF5 compound datatype 52 HDF
  • 53. HDF5 Fixed and Variable length array storage (Figure: fixed-length storage holds the same number of data elements at each time step; variable-length storage holds a varying number of data elements over time) 53 HDF
  • 54. HDF5 Variable Length Datatypes Programming issues • Each element is represented by the C struct typedef struct { size_t len; void *p; } hvl_t; • Base type can be any HDF5 type 54 HDF
  • 55. HDF5 Variable Length Datatypes (Figure: a dataset with a variable length datatype stores its raw data in the global heap) 55 HDF
  • 56. HDF Information • HDF Information Center – http://hdf.ncsa.uiuc.edu/ • HDF Help email address – hdfhelp@ncsa.uiuc.edu • HDF users mailing list – hdfnews@ncsa.uiuc.edu 56 HDF

Editor's Notes

  1. Format and software for scientific data. HDF5 is a different format from earlier versions of HDF, as is the library. Stores images, multidimensional arrays, tables, etc. That is, you can construct all of these different kinds of structures and store them in HDF5. You can also mix and match them in HDF5 files according to your needs. Emphasis on storage and I/O efficiency: both the library and the format are designed to address this. Free and commercial software support: as far as HDF5 goes, this is just a goal now. There is commercial support for HDF4, but little if any for HDF5 at this time. We are working with vendors to change this. Emphasis on standards: you can store data in HDF5 in a variety of ways, so we try to work with users to encourage them to organize HDF5 files in standard ways. Users from many engineering and scientific fields