Substituting HDF5 tools with Python/H5py scripts
Daniel Kahn
Science Systems and Applications Inc.

HDF HDF-EOS Workshop XIV, 28 Sep. 2010

1 of 14
What are HDF5 tools?
HDF5 tools are command line programs distributed with the HDF5
library. They allow users to manipulate HDF5 files.
h5dump: dump HDF5 data as ASCII text.
h5import: convert non-HDF5 data to HDF5
h5diff: show differences between HDF5 files.
h5copy: Copy objects between HDF5 files.
h5repack: Copy entire file while changing storage
properties of HDF5 objects.
h5edit: (proposed) add attributes to HDF5 objects.

HDF5 tools have a long history as the first (and for a long time only)
way to manipulate HDF5 files conveniently, i.e. without writing a C or
Java program and without buying expensive commercial software such
as IDL or Matlab.

2 of 14
The tools can be characterized as having three parts:
Text Processing – Evaluate command arguments, process input
text files, match group names.
Tree Walking – Search HDF5 file hierarchy for objects by name.
Object Level Operations – Operate on the objects: copy, diff,
repack, etc.

The tools are simple to use and convenient as they are
distributed with the HDF5 library.
3 of 14
Disadvantage of HDF5 tools:
The command line arguments limit tool capability.
Adding new features with command line syntax which is both
readable and does not break the legacy syntax becomes difficult.

Development time for designing and implementing new features is
long (weeks...months).
Use cases must be evaluated, a solution proposed in an RFC, the
proposal implemented, and the new code distributed in the next
release.

4 of 14
Here's an example from HDF documentation:
h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array" -d "/array"

But suppose we had multiple datasets named arrayNNN,
where each N is a digit 0–9. We'd like to write something like:
h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array\d{3}"

So that \d{3} would provide a match to all such objects.
Extending the tool syntax to meet this use case, and then
again for the next use case would be a never ending game of
catch up.
A more flexible substitute is desirable...
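
In Python the wished-for pattern match takes only a few lines. The following is a minimal sketch, assuming h5py, numpy and the standard re module are available. The function name copy_matching and the demo file contents are hypothetical, chosen only for illustration, and the sketch looks only at objects directly under the root group.

```python
import os
import re
import tempfile

import h5py
import numpy


def copy_matching(src_path, dst_path, pattern):
    """Copy every root-level object whose name fully matches `pattern`."""
    regex = re.compile(pattern)
    copied = []
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "a") as dst:
        for name in src:                  # iterate over root-level names
            if regex.fullmatch(name):
                src.copy(name, dst)       # h5py's Group.copy keeps the name
                copied.append(name)
    return sorted(copied)


# Demo on throwaway files (illustrative names and data only).
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "test1.h5")
dst_file = os.path.join(workdir, "test1.out.h5")
with h5py.File(src_file, "w") as f:
    for name in ("array001", "array002", "array12", "other"):
        f[name] = numpy.arange(3)

copied = copy_matching(src_file, dst_file, r"array\d{3}")
print(copied)
```

Changing the selection rule means editing one regular expression, not waiting for a new tool option.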

5 of 14
...Python?

6 of 14
What is Python?
Python is a programming language.
It features dynamic binding of variables, like Perl or shell
scripts, IDL, Matlab, but not C or Fortran.
Unlike Perl, it supports native floating point numbers.
It has scientific array support in the style of IDL or Matlab
(numpy module). Array operations can be programmed using
normal arithmetic operators.
It has access to the HDF5 library (Andrew Collette's
h5py module).
Python is currently the only programming language in
widespread use to have all these features. They are essential to the
success of the language for easy HDF5 file manipulation.
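
As a small illustration of the array arithmetic mentioned above (a sketch assuming only that numpy is installed; the values are arbitrary):

```python
import numpy

# Two small arrays holding arbitrary illustration data.
a = numpy.array([[1.0, 2.0], [3.0, 4.0]])
b = numpy.array([[10.0, 20.0], [30.0, 40.0]])

total = a + b      # element-wise addition
scaled = 2.5 * a   # scalar times array
ratio = b / a      # element-wise division

print(total)
print(scaled)
print(ratio)
```

No explicit loops over indices are needed, which is the style IDL and Matlab users will recognize.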
7 of 14
Real world Experience: Learning Python and h5py is quick.
In the summer of 2010 SSAI hired a summer intern.
Equipped with some Perl programming experience the
intern was able to come up to speed on Python, HDF5,
h5py, and numpy within one to two weeks and, over
the summer, develop a specialized file/dataset merging
tool and a dataset conversion tool.

Python and h5py are the best way to introduce HDF5
because they allow the user to concentrate on the H in
HDF5, rather than the C API syntax.

8 of 14
Python is well suited to HDF5
Python is well suited to HDF5 because numpy array objects
carry the dimensionality, extent, and element data type
information, just as HDF5 datasets do. The object oriented
nature of Python allows these objects to be manipulated at a
high level. C, by contrast, lacks a scientific array object and
the ability to define object methods.
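
The correspondence can be seen directly. This is a minimal sketch, assuming h5py and numpy are installed; the file name shape_demo.h5 is made up for the demonstration.

```python
import os
import tempfile

import h5py
import numpy

# A numpy array knows its own shape and element type.
data = numpy.arange(1, 25, dtype="int32").reshape(4, 6)

path = os.path.join(tempfile.mkdtemp(), "shape_demo.h5")
with h5py.File(path, "w") as fid:
    fid["/TestDataset"] = data

# The HDF5 dataset reports the same information through h5py.
with h5py.File(path, "r") as fid:
    dset = fid["/TestDataset"]
    dset_shape, dset_dtype = dset.shape, dset.dtype
    readback = dset[...]          # read the whole dataset into a numpy array

print(data.shape, data.dtype)
print(dset_shape, dset_dtype)
```

Neither the write nor the read has to restate the dimensions or the element type, because both objects carry them.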

9 of 14
Example: Creating and Writing a Dataset to a New File
Python:

import h5py
import numpy
TestData = numpy.array(range(1,25),dtype='int32').reshape(4,6)
# The with block closes the file, flushing the data to disk.
with h5py.File("WrittenByH5PY.h5","w") as fid:
    fid['/TestDataset'] = TestData

Compare to C version:
#include "hdf5.h"

int main() {
    hid_t   file_id, dataspace_id, dataset_id; /* identifiers */
    herr_t  status;
    hsize_t dims[2];
    const int FirstIndex = 4, SecondIndex = 6;
    int     i, j, dset_data[4][6];

    for (i = 0; i < 4; i++) /* Initialize the dataset. */
        for (j = 0; j < 6; j++)
            dset_data[i][j] = i * 6 + j + 1;
    dims[0] = FirstIndex;
    dims[1] = SecondIndex;

    /* Create a new file, truncating any existing one. */
    file_id = H5Fcreate("WrittenByC.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    dataspace_id = H5Screate_simple(2, dims, NULL);
    dataset_id = H5Dcreate(file_id, "/TestDataset", H5T_STD_I32LE, dataspace_id,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    /* Write the dataset. */
    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
    status = H5Dclose(dataset_id);   /* Close the dataset. */
    status = H5Sclose(dataspace_id); /* Close the dataspace. */
    status = H5Fclose(file_id);      /* Close the file. */
    return 0;
}

10 of 14
And here's the output:
h5dump WrittenByH5PY.h5
HDF5 "WrittenByH5PY.h5" {
GROUP "/" {
   DATASET "TestDataset" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
      DATA {
      (0,0): 1, 2, 3, 4, 5, 6,
      (1,0): 7, 8, 9, 10, 11, 12,
      (2,0): 13, 14, 15, 16, 17, 18,
      (3,0): 19, 20, 21, 22, 23, 24
      }
   }
}
}

11 of 14
Python and the Three Pillars of HDF5 Tools
Python is well suited to Text Processing
Python has a wide range of string manipulation functions, an easy-to-use
regular expression module, and list and dictionary (hash
table) objects. No segmentation faults!

Python is well suited to Tree Walking. Recursive
functions and loops over lists are easy to write.
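
A recursive tree walk might look like the following. It is a minimal sketch, assuming h5py and numpy are installed; the function name find_datasets and the demo group layout are hypothetical.

```python
import os
import tempfile

import h5py
import numpy


def find_datasets(group, prefix=""):
    """Recursively collect the full paths of all datasets below `group`."""
    paths = []
    for name, obj in group.items():
        path = prefix + "/" + name
        if isinstance(obj, h5py.Dataset):
            paths.append(path)
        elif isinstance(obj, h5py.Group):
            paths.extend(find_datasets(obj, path))   # descend into subgroup
    return paths


# Demo on a throwaway file with a small, made-up hierarchy.
path = os.path.join(tempfile.mkdtemp(), "tree_demo.h5")
with h5py.File(path, "w") as fid:
    grp_a = fid.create_group("a")
    grp_b = grp_a.create_group("b")
    grp_b["data1"] = numpy.zeros(2)
    grp_a["data2"] = numpy.zeros(2)
    fid["top"] = numpy.zeros(2)

with h5py.File(path, "r") as fid:
    found = find_datasets(fid)
print(sorted(found))
```

h5py groups also provide a visititems method that performs a similar traversal with a callback, if an explicit recursive function is not wanted.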

Object Level Operations...Not so much.
Object Level Operations (e.g. copy, diff) are challenging to write
efficiently and should be provided as part of the API by the HDF
Group, for example H5Ocopy. API functions are available to the
Python programmer via h5py.
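
For instance, the library's object copy function can be called through h5py's low-level interface. This is a minimal sketch, assuming h5py and numpy are installed and that h5py's h5o module exposes copy as a wrapper of the C function H5Ocopy; the file and dataset names are made up.

```python
import os
import tempfile

import numpy
import h5py
from h5py import h5o   # low-level wrappers around the H5O C functions

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.h5")
dst_path = os.path.join(workdir, "destination.h5")

# Create a source file holding one small dataset (illustrative data).
with h5py.File(src_path, "w") as src:
    src["/TestDataset"] = numpy.arange(1, 25, dtype="int32").reshape(4, 6)

with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
    # The copy itself is done inside the HDF5 library, not in Python.
    # The low-level interface takes identifiers and byte-string names.
    h5o.copy(src.id, b"TestDataset", dst.id, b"TestDatasetCopy")
    copied = dst["/TestDatasetCopy"][...]

print(copied.shape)
```

The Python script only selects what to copy; the efficient object-level work stays in the library, which is the division of labor argued for here.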

12 of 14
Why use Python to substitute HDF5 tools?
Python is available now.
Some HDF5 tools are still under development as new use
cases are presented. For example, users have requested a
tool to add attributes to HDF5 files. Such a capability
already exists with h5py:
python -c "import h5py ; fid = h5py.File('FileForAttributeAddition.h5','r+') ;
fid['/TestDataset'].attrs['CmdLine1'] = 'NewValue' ; fid.close()"

It's a little ugly, but it is available today.
Python is a full programming language. It can accomplish
tasks which HDF5 tools cannot.
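
The attribute one-liner can also be written as a short, reusable script. This is a minimal sketch, assuming h5py and numpy are installed; the function name add_attribute is hypothetical, and the demo first creates the file it then modifies.

```python
import os
import tempfile

import h5py
import numpy


def add_attribute(path, object_name, attr_name, value):
    """Attach (or overwrite) one attribute on an existing HDF5 object."""
    with h5py.File(path, "r+") as fid:   # r+ opens an existing file read/write
        fid[object_name].attrs[attr_name] = value


# Demo: build a throwaway file, then add the attribute to its dataset.
path = os.path.join(tempfile.mkdtemp(), "FileForAttributeAddition.h5")
with h5py.File(path, "w") as fid:
    fid["/TestDataset"] = numpy.arange(6)

add_attribute(path, "/TestDataset", "CmdLine1", "NewValue")

with h5py.File(path, "r") as fid:
    stored = fid["/TestDataset"].attrs["CmdLine1"]
print(stored)
```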
Further Resources:
http://groups.google.com/group/h5py
http://h5py.alfven.org/

13 of 14
Recommendations:
Users should consider Python and H5py to accomplish their HDF5
file manipulation projects.
The HDF Group should concentrate on providing efficient
API functions for object level tasks: object copy, dataset
difference, etc.

The HDF Group should avoid complex enhancements to tools
where Python/h5py could be used instead.
An easily searched contributed application repository on the HDF
Group website with user ratings would be very helpful.

14 of 14


  • 1. Substituting HDF5 tools with Python/H5py scripts
       Daniel Kahn, Science Systems and Applications Inc.
       HDF HDF-EOS Workshop XIV, 28 Sep. 2010
  • 2. What are HDF5 tools?
       HDF5 tools are command-line programs distributed with the HDF5 library. They allow users to manipulate HDF5 files.
       h5dump: dump HDF5 data as ASCII text.
       h5import: convert non-HDF5 data to HDF5.
       h5diff: show differences between HDF5 files.
       h5copy: copy objects between HDF5 files.
       h5repack: copy an entire file while changing storage properties of HDF5 objects.
       h5edit: (proposed) add attributes to HDF5 objects.
       HDF5 tools have a long history as the first (and for a long time the only) way to manipulate HDF5 files conveniently, that is, without writing a C or Java program or buying expensive commercial software such as IDL or Matlab.
  • 3. The tools can be characterized as having three parts:
       Text Processing: evaluate command arguments, process input text files, match group names.
       Tree Walking: search the HDF5 file hierarchy for objects by name.
       Object Level Operations: operate on the objects (copy, diff, repack, etc.).
       The tools are simple to use and convenient because they are distributed with the HDF5 library.
  • 4. Disadvantages of HDF5 tools:
       The command-line arguments limit tool capability. It becomes difficult to add new features with a command-line syntax that is both readable and compatible with the legacy syntax.
       Development time for designing and implementing new features is long (weeks to months). Use cases must be evaluated, a solution proposed in an RFC, and the proposal implemented; the new code is then distributed in the next release.
  • 5. Here's an example from the HDF documentation:
       h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array" -d "/array"
       But suppose we had multiple datasets named arrayNNN, where each N is a digit 0–9. We'd like to write something like:
       h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array\d{3}"
       so that \d{3} would match all such objects. Extending the tool syntax to meet this use case, and then again for the next use case, would be a never-ending game of catch-up. A more flexible substitute is desirable...
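The pattern match wished for on slide 5 can be sketched in a few lines of Python with h5py. This is only an illustration, not part of the original talk: the function name `copy_matching` is hypothetical, the file names are borrowed from the h5copy example, and only objects directly under the root group are considered.

```python
import re

import h5py


def copy_matching(src_path, dst_path, pattern):
    """Copy every root-level object whose name fully matches `pattern`.

    Hypothetical helper for illustration; returns the names that were copied.
    """
    name_re = re.compile(pattern)
    copied = []
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "a") as dst:
        for name in src:  # iterating a File yields the names in the root group
            if name_re.fullmatch(name):
                # Group.copy delegates the object-level work to the HDF5 library.
                src.copy(name, dst, name=name)
                copied.append(name)
    return copied


if __name__ == "__main__":
    # Copy datasets named array000 ... array999, if any exist in test1.h5.
    print(copy_matching("test1.h5", "test1.out.h5", r"array\d{3}"))
```

The regular expression handles the text processing, the loop handles the (here trivial) tree walk, and the copy itself is left to the library.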
  • 7. What is Python?
       Python is a programming language.
       It features dynamic binding of variables, like Perl, shell scripts, IDL, and Matlab, but unlike C or Fortran.
       Unlike Perl, it supports native floating point numbers.
       It has scientific array support in the style of IDL or Matlab (numpy module). Array operations can be programmed using normal arithmetic operators.
       It has access to the HDF5 library (Andrew Collette's h5py module).
       Python is currently the only programming language in widespread use to have all these features. They are essential to the success of the language for easy HDF5 file manipulation.
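As a small illustration of array operations written with ordinary arithmetic operators, the sketch below uses numpy on the same 4 x 6 test array that appears later in the deck. The variable names and the particular operations are chosen here for illustration only.

```python
import numpy as np

# The 4 x 6 array of the integers 1..24 used in the dataset example.
test_data = np.arange(1, 25, dtype="int32").reshape(4, 6)

# Elementwise arithmetic with normal operators; no explicit loops are needed.
scaled = test_data * 0.5 + 1.0

# Reductions and boolean masks work on whole arrays as well.
row_means = scaled.mean(axis=1)
large = test_data[test_data > 20]
```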
  • 8. Real-world experience: learning Python and h5py is quick.
       In the summer of 2010 SSAI hired a summer intern. Equipped with some Perl programming experience, the intern was able to come up to speed on Python, HDF5, h5py, and numpy within one to two weeks and, over the summer, develop a specialized file/dataset merging tool and a dataset conversion tool.
       Python and h5py are the best way to introduce HDF5 because they allow the user to concentrate on the H in HDF5 rather than on the C API syntax.
  • 9. Python is well suited to HDF5
       Python is well suited to HDF5 because numpy array objects carry the dimensionality, extent, and element data type information, just as HDF5 datasets do.
       The object-oriented nature of Python allows these objects to be manipulated at a high level.
       C, by contrast, lacks a scientific array object and the ability to define object methods.
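A minimal sketch of that correspondence: a numpy array written through h5py comes back as a dataset reporting the same shape and element type. The function name `describe_roundtrip` is hypothetical, and the dataset name is borrowed from the deck's own example.

```python
import h5py
import numpy as np


def describe_roundtrip(path):
    """Write a small numpy array to `path`, then report the dataset's metadata."""
    data = np.arange(1, 25, dtype="int32").reshape(4, 6)
    with h5py.File(path, "w") as f:
        f["/TestDataset"] = data  # shape and dtype are taken from the array
    with h5py.File(path, "r") as f:
        dset = f["/TestDataset"]
        return {
            "shape": dset.shape,    # extent of each dimension
            "dtype": dset.dtype,    # element type as a numpy dtype
            "values": dset[...],    # read the whole dataset back into numpy
        }
```

No dimensions or type identifiers are passed explicitly; both libraries read them off the array object.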
  • 10. Example: Creating and Writing a Dataset to a New File
        Python:
            import h5py
            import numpy
            TestData = numpy.array(range(1,25),dtype='int32').reshape(4,6)
            h5py.File("WrittenByH5PY.h5","w")['/TestDataset'] = TestData
        Compare to the C version:
            #include "hdf5.h"
            int main() {
                hid_t file_id, dataspace_id, dataset_id; /* identifiers */
                herr_t status;
                hsize_t dims[2];
                const int FirstIndex = 4, SecondIndex = 6;
                int i, j, dset_data[4][6];
                for (i = 0; i < 4; i++) /* Initialize the dataset. */
                    for (j = 0; j < 6; j++)
                        dset_data[i][j] = i * 6 + j + 1;
                dims[0] = FirstIndex;
                dims[1] = SecondIndex;
                /* Create a new file. */
                file_id = H5Fcreate("WrittenByC.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
                dataspace_id = H5Screate_simple(2, dims, NULL);
                dataset_id = H5Dcreate(file_id, "/TestDataset", H5T_STD_I32LE, dataspace_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
                /* Write the dataset. */
                status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
                status = H5Dclose(dataset_id);   /* Close the dataset. */
                status = H5Sclose(dataspace_id); /* Close the dataspace. */
                status = H5Fclose(file_id);      /* Close the file. */
                return 0;
            }
  • 11. And here's the output:
        h5dump WrittenByH5PY.h5
            HDF5 "WrittenByH5PY.h5" {
            GROUP "/" {
               DATASET "TestDataset" {
                  DATATYPE  H5T_STD_I32LE
                  DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
                  DATA {
                  (0,0): 1, 2, 3, 4, 5, 6,
                  (1,0): 7, 8, 9, 10, 11, 12,
                  (2,0): 13, 14, 15, 16, 17, 18,
                  (3,0): 19, 20, 21, 22, 23, 24
                  }
               }
            }
            }
  • 12. Python and the Three Pillars of HDF5 Tools
        Python is well suited to Text Processing. Python has a wide range of string manipulation functions, an easy-to-use regular expression module, and list and dictionary (hash table) objects. No segmentation faults!
        Python is well suited to Tree Walking. Recursive functions and loops over lists are easy to write.
        Object Level Operations... not so much. Object-level operations (e.g. copy, diff) are challenging to write efficiently and should be provided as part of the API by the HDF Group, for example H5Ocopy. API functions are available to the Python programmer via h5py.
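To make the tree-walking point concrete, here is a short recursive sketch that lists every dataset in a file with its shape and element type. The function name `list_datasets` is hypothetical, and the sketch assumes the file contains no dangling links; h5py also provides `Group.visititems` as a built-in alternative to hand-written recursion.

```python
import h5py


def list_datasets(path):
    """Return (full_name, shape, dtype_name) for every dataset in the file."""

    def walk(group, prefix):
        # Recurse into subgroups; report datasets; ignore anything else.
        for name, obj in group.items():
            full_name = f"{prefix}/{name}"
            if isinstance(obj, h5py.Group):
                yield from walk(obj, full_name)
            elif isinstance(obj, h5py.Dataset):
                yield full_name, obj.shape, str(obj.dtype)

    with h5py.File(path, "r") as f:
        # Materialize the results while the file is still open.
        return list(walk(f, ""))
```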
  • 13. Why use Python to substitute HDF5 tools?
        Python is available now. Some HDF5 tools are still under development as new use cases are presented. For example, users have requested a tool to add attributes to HDF5 files. Such a capability already exists with h5py:
            python -c "import h5py ; fid = h5py.File('FileForAttributeAddition.h5','r+') ; fid['/TestDataset'].attrs['CmdLine1'] = 'NewValue' ; fid.close()"
        It's a little ugly, but it is available today.
        Python is a full programming language. It can accomplish tasks which HDF5 tools cannot.
        Further Resources:
            http://groups.google.com/group/h5py
            http://h5py.alfven.org/
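The same attribute addition reads more comfortably as a small script. This is a sketch rather than a proposed tool: the function name `add_attribute` is hypothetical, while the file, dataset, and attribute names are taken from the one-liner on slide 13.

```python
import h5py


def add_attribute(path, object_name, attr_name, value):
    """Attach (or overwrite) an attribute on an existing object in an HDF5 file."""
    # "r+" opens an existing file for reading and writing; the `with` block
    # closes the file even if the lookup or the assignment fails.
    with h5py.File(path, "r+") as f:
        f[object_name].attrs[attr_name] = value


if __name__ == "__main__":
    add_attribute("FileForAttributeAddition.h5", "/TestDataset",
                  "CmdLine1", "NewValue")
```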
  • 14. Recommendations:
        Users should consider Python and h5py to accomplish their HDF5 file manipulation projects.
        The HDF Group should concentrate on providing efficient API functions for object-level tasks: object copy, dataset difference, etc.
        The HDF Group should avoid complex enhancements to tools where Python/h5py could be used instead.
        An easily searched repository of contributed applications on the HDF Group website, with user ratings, would be very helpful.