SlideShare a Scribd company logo
HDF5 ⬌ Zarr
2020 ESIP Summer Meeting
Aleksandar Jelenak
• Zarr ➔ HDF5*
• HDF5 ➔ Zarr
*A portion of this work was supported by the United States Geological Survey (USGS). Any opinions, findings, conclusions, or recommendations
expressed in this material are those of the authors and do not necessarily reflect the views of United States Government.
2Overview
Zarr ➔ HDF5 3
https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314
• Created a new Zarr store: FileChunkStore
• Developed Zarr JSON metadata for (HDF5) chunk file location: .zchunkstore
• Small fixes in the zarr and xarray Python packages to support a separate Zarr store
for file chunks
4Zarr ➔ HDF5: Implementation
"zeta/9.38": {
"offset": 3883848970,
"size": 6907818
},
"zeta/9.39": {
"offset": 3890756788,
"size": 7879493
},
"zeta/9.4": {
"offset": 3648355033,
"size": 6525250
},
"zeta/9.40": {
"offset": 3898636281,
"size": 7132453
}
5Zarr ➔ HDF5: Performance comparison
Zarr reading Zarr Zarr reading HDF5
• Python >= 3.6
• HDF5-1.10.6 library
• pip install git+https://github.com/h5py/h5py.git
• pip install git+https://github.com/ajelenak/xarray.git@zarr-chunkstore
• pip install git+https://github.com/HDFGroup/zarr-python.git@hdf5
• pip install fsspec
• HDF5-to-Zarr translator:
https://gist.github.com/ajelenak/80354a95b449cedea5cca508004f97a9
6Zarr ➔ HDF5: How to try out?
• HDF5 dataset compact layout not supported
• HDF5 dataset data may be written by compressor/filter not supported by Zarr
• Storage system hosting the HDF5 file must allow partial file reading
7Zarr ➔ HDF5: Limitations
• HDF5 API access to Zarr data is provided by the HDF’s Highly Scalabale Data
Service (HSDS)
• Only for Zarr data in AWS S3
• HSDS object store schema is similar to Zarr: a combination of JSON and binary
objects
• HSDS JSON is a superset of HDF5/JSON
• Still work in progress
8HDF5 ➔ Zarr
• Using special HSDS schema chunking layout: H5D_CHUNKED_REF_INDIRECT
• This chunking layout is not supported by the HDF5 library
• Developed to enable HSDS access to chunks in HDF5 files in object storage
• Chunk information for one Zarr array is stored as an anonymous HDF5 compound
dataset
• The compound datatype has 3 fields for: byte offset (always 0), chunk object size,
and chunk object URI
• The HDF5 dataset representing the Zarr array has the
H5D_CHUNKED_REF_INDIRECT layout and its value points to the anonymous HDF5
dataset with chunk location information
9HDF5 ➔ Zarr: Implementation
• Because HSDS does not (yet) support the Blosc compressor, the original Zarr
dataset was copied with the Zlib compressor instead:
10HDF5 ➔ Zarr: Data Wrangling
from sys import stdout
import zarr
import fsspec
from numcodecs import Zlib
src_root = zarr.open(
fsspec.get_mapper('s3://pangeo-data-uswest2/esip/adcirc/adcirc_01d',
anon=False,
requester_pays=True),
mode='r')
dest_root = zarr.open(
fsspec.get_mapper('s3://hdf5-zarr/adcirc_01d.zarr’, anon=False),
mode='w')
zarr.copy_all(src_root, dest_root, shallow=False, without_attrs=False,
log=stdout, if_exists='replace', dry_run=True,
compressor=Zlib(level=6), filters=None)
HDF5 ➔ Zarr: Example translation
Zarr array HDF5 dataset with chunk info
Name : /zeta
Type : zarr.core.Array
Data type : float64
Shape : (720, 9228245)
Chunk shape : (10, 141973)
Compressor : Zlib(level=6)
No. bytes : 53154691200 (49.5G)
Chunks initialized : 4680/4680
Type : h5pyd.Dataset
Data type : compound
Shape : (72, 65)
Value:
[[(0, 1949049, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.0')
(0, 2911533, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.1')
(0, 2506163, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.2') ...
(0, 4344724, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.62')
(0, 5696617, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.63')
(0, 4275725, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.64')]
11
• Zarr array data may be written by compressor/filter not supported by HSDS
• Since both Zarr and HSDS are written in Python, it should be possible to add Zarr’s
numcodecs package to HSDS to resolve this limitation
12HDF5 ➔ Zarr: Limitations
THANK YOU!
ajelenak@hdfgroup.org
13

More Related Content

What's hot

MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
The HDF-EOS Tools and Information Center
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
The HDF-EOS Tools and Information Center
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
The HDF-EOS Tools and Information Center
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
The HDF-EOS Tools and Information Center
 
MODIS Land and HDF-EOS
MODIS Land and HDF-EOSMODIS Land and HDF-EOS
HDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF ConverterHDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF Converter
The HDF-EOS Tools and Information Center
 
Easy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAPEasy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAP
The HDF-EOS Tools and Information Center
 
NetCDF and HDF5
NetCDF and HDF5NetCDF and HDF5
HDF Update 2016
HDF Update 2016HDF Update 2016
Product Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the WebProduct Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the Web
The HDF-EOS Tools and Information Center
 
Google Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOSGoogle Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOS
The HDF-EOS Tools and Information Center
 
HDF Product Designer
HDF Product DesignerHDF Product Designer
Efficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAPEfficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAP
The HDF-EOS Tools and Information Center
 
MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
The HDF-EOS Tools and Information Center
 
NEON HDF5
NEON HDF5NEON HDF5
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
The HDF-EOS Tools and Information Center
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsThe HDF-EOS Tools and Information Center
 
HDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve InteroperabilityHDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve Interoperability
The HDF-EOS Tools and Information Center
 

What's hot (20)

MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 
MODIS Land and HDF-EOS
MODIS Land and HDF-EOSMODIS Land and HDF-EOS
MODIS Land and HDF-EOS
 
Parallel HDF5 Developments
Parallel HDF5 DevelopmentsParallel HDF5 Developments
Parallel HDF5 Developments
 
HDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF ConverterHDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF Converter
 
Easy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAPEasy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAP
 
NetCDF and HDF5
NetCDF and HDF5NetCDF and HDF5
NetCDF and HDF5
 
HDF Update 2016
HDF Update 2016HDF Update 2016
HDF Update 2016
 
Product Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the WebProduct Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the Web
 
Google Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOSGoogle Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOS
 
HDF Product Designer
HDF Product DesignerHDF Product Designer
HDF Product Designer
 
Efficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAPEfficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAP
 
MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
 
NEON HDF5
NEON HDF5NEON HDF5
NEON HDF5
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
 
HDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve InteroperabilityHDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve Interoperability
 

Similar to HDF5 <-> Zarr

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
The HDF-EOS Tools and Information Center
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
The HDF-EOS Tools and Information Center
 
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesHDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
The HDF-EOS Tools and Information Center
 
Dimension Scales in HDF-EOS2 and HDF-EOS5
Dimension Scales in HDF-EOS2 and HDF-EOS5 Dimension Scales in HDF-EOS2 and HDF-EOS5
Dimension Scales in HDF-EOS2 and HDF-EOS5
The HDF-EOS Tools and Information Center
 
HDF5 Tools Updates
HDF5 Tools UpdatesHDF5 Tools Updates
HDF5 Tools Update
HDF5 Tools UpdateHDF5 Tools Update
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
Smith Kim
 
HDF for the Cloud
HDF for the CloudHDF for the Cloud
HDF5 Advanced Topics
HDF5 Advanced TopicsHDF5 Advanced Topics
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
The HDF-EOS Tools and Information Center
 
S3 VFD
S3 VFDS3 VFD
Performance Tuning in HDF5
Performance Tuning in HDF5 Performance Tuning in HDF5
Performance Tuning in HDF5
The HDF-EOS Tools and Information Center
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Integrating HDF5 with SRB
Integrating HDF5 with SRBIntegrating HDF5 with SRB
Integrating HDF5 with SRB
The HDF-EOS Tools and Information Center
 
Introduction to NetCDF-4
Introduction to NetCDF-4Introduction to NetCDF-4
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
HDF5 Life cycle of data
HDF5 Life cycle of dataHDF5 Life cycle of data

Similar to HDF5 <-> Zarr (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesHDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
 
Dimension Scales in HDF-EOS2 and HDF-EOS5
Dimension Scales in HDF-EOS2 and HDF-EOS5 Dimension Scales in HDF-EOS2 and HDF-EOS5
Dimension Scales in HDF-EOS2 and HDF-EOS5
 
HDF5 Tools Updates
HDF5 Tools UpdatesHDF5 Tools Updates
HDF5 Tools Updates
 
HDF5 Tools Update
HDF5 Tools UpdateHDF5 Tools Update
HDF5 Tools Update
 
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
 
HDF for the Cloud
HDF for the CloudHDF for the Cloud
HDF for the Cloud
 
HDF5 Advanced Topics
HDF5 Advanced TopicsHDF5 Advanced Topics
HDF5 Advanced Topics
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
S3 VFD
S3 VFDS3 VFD
S3 VFD
 
Performance Tuning in HDF5
Performance Tuning in HDF5 Performance Tuning in HDF5
Performance Tuning in HDF5
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
Integrating HDF5 with SRB
Integrating HDF5 with SRBIntegrating HDF5 with SRB
Integrating HDF5 with SRB
 
Introduction to NetCDF-4
Introduction to NetCDF-4Introduction to NetCDF-4
Introduction to NetCDF-4
 
Using HDF5 tools for performance tuning and troubleshooting
Using HDF5 tools for performance tuning and troubleshootingUsing HDF5 tools for performance tuning and troubleshooting
Using HDF5 tools for performance tuning and troubleshooting
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
HDF5 Life cycle of data
HDF5 Life cycle of dataHDF5 Life cycle of data
HDF5 Life cycle of data
 
UML Representation of NPOESS Data Products in HDF5
UML Representation of NPOESS Data Products in HDF5UML Representation of NPOESS Data Products in HDF5
UML Representation of NPOESS Data Products in HDF5
 

More from The HDF-EOS Tools and Information Center

Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
The HDF-EOS Tools and Information Center
 
The State of HDF
The State of HDFThe State of HDF
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
The HDF-EOS Tools and Information Center
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
The HDF-EOS Tools and Information Center
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
The HDF-EOS Tools and Information Center
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
The HDF-EOS Tools and Information Center
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
The HDF-EOS Tools and Information Center
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
The HDF-EOS Tools and Information Center
 
HDF Status Update
HDF Status UpdateHDF Status Update
NASA Terra Data Fusion
NASA Terra Data FusionNASA Terra Data Fusion
HDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at ScaleHDF Cloud: HDF5 at Scale
HDF Data in the Cloud
HDF Data in the CloudHDF Data in the Cloud
HDF Kita Lab: JupyterLab + HDF Service
HDF Kita Lab: JupyterLab + HDF ServiceHDF Kita Lab: JupyterLab + HDF Service
HDF Kita Lab: JupyterLab + HDF Service
The HDF-EOS Tools and Information Center
 

More from The HDF-EOS Tools and Information Center (13)

Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
 
HDF Status Update
HDF Status UpdateHDF Status Update
HDF Status Update
 
NASA Terra Data Fusion
NASA Terra Data FusionNASA Terra Data Fusion
NASA Terra Data Fusion
 
HDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at ScaleHDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at Scale
 
HDF Data in the Cloud
HDF Data in the CloudHDF Data in the Cloud
HDF Data in the Cloud
 
HDF Kita Lab: JupyterLab + HDF Service
HDF Kita Lab: JupyterLab + HDF ServiceHDF Kita Lab: JupyterLab + HDF Service
HDF Kita Lab: JupyterLab + HDF Service
 

Recently uploaded

May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 

Recently uploaded (20)

May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 

HDF5 <-> Zarr

  • 1. HDF5 ⬌ Zarr 2020 ESIP Summer Meeting Aleksandar Jelenak
  • 2. • Zarr ➔ HDF5* • HDF5 ➔ Zarr *A portion of this work was supported by the United States Geological Survey (USGS). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of United States Government. 2Overview
  • 3. Zarr ➔ HDF5 3 https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314
  • 4. • Created a new Zarr store: FileChunkStore • Developed Zarr JSON metadata for (HDF5) chunk file location: .zchunkstore • Small fixes in the zarr and xarray Python packages to support a separate Zarr store for file chunks 4Zarr ➔ HDF5: Implementation "zeta/9.38": { "offset": 3883848970, "size": 6907818 }, "zeta/9.39": { "offset": 3890756788, "size": 7879493 }, "zeta/9.4": { "offset": 3648355033, "size": 6525250 }, "zeta/9.40": { "offset": 3898636281, "size": 7132453 }
  • 5. 5Zarr ➔ HDF5: Performance comparison Zarr reading Zarr Zarr reading HDF5
  • 6. • Python >= 3.6 • HDF5-1.10.6 library • pip install git+https://github.com/h5py/h5py.git • pip install git+https://github.com/ajelenak/xarray.git@zarr-chunkstore • pip install git+https://github.com/HDFGroup/zarr-python.git@hdf5 • pip install fsspec • HDF5-to-Zarr translator: https://gist.github.com/ajelenak/80354a95b449cedea5cca508004f97a9 6Zarr ➔ HDF5: How to try out?
  • 7. • HDF5 dataset compact layout not supported • HDF5 dataset data may be written by compressor/filter not supported by Zarr • Storage system hosting the HDF5 file must allow partial file reading 7Zarr ➔ HDF5: Limitations
  • 8. • HDF5 API access to Zarr data is provided by the HDF’s Highly Scalabale Data Service (HSDS) • Only for Zarr data in AWS S3 • HSDS object store schema is similar to Zarr: a combination of JSON and binary objects • HSDS JSON is a superset of HDF5/JSON • Still work in progress 8HDF5 ➔ Zarr
  • 9. • Using special HSDS schema chunking layout: H5D_CHUNKED_REF_INDIRECT • This chunking layout is not supported by the HDF5 library • Developed to enable HSDS access to chunks in HDF5 files in object storage • Chunk information for one Zarr array is stored as an anonymous HDF5 compound dataset • The compound datatype has 3 fields for: byte offset (always 0), chunk object size, and chunk object URI • The HDF5 dataset representing the Zarr array has the H5D_CHUNKED_REF_INDIRECT layout and its value points to the anonymous HDF5 dataset with chunk location information 9HDF5 ➔ Zarr: Implementation
  • 10. • Because HSDS does not (yet) support the Blosc compressor, the original Zarr dataset was copied with the Zlib compressor instead: 10HDF5 ➔ Zarr: Data Wrangling from sys import stdout import zarr import fsspec from numcodecs import Zlib src_root = zarr.open( fsspec.get_mapper('s3://pangeo-data-uswest2/esip/adcirc/adcirc_01d', anon=False, requester_pays=True), mode='r') dest_root = zarr.open( fsspec.get_mapper('s3://hdf5-zarr/adcirc_01d.zarr’, anon=False), mode='w') zarr.copy_all(src_root, dest_root, shallow=False, without_attrs=False, log=stdout, if_exists='replace', dry_run=True, compressor=Zlib(level=6), filters=None)
  • 11. HDF5 ➔ Zarr: Example translation Zarr array HDF5 dataset with chunk info Name : /zeta Type : zarr.core.Array Data type : float64 Shape : (720, 9228245) Chunk shape : (10, 141973) Compressor : Zlib(level=6) No. bytes : 53154691200 (49.5G) Chunks initialized : 4680/4680 Type : h5pyd.Dataset Data type : compound Shape : (72, 65) Value: [[(0, 1949049, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.0') (0, 2911533, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.1') (0, 2506163, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.2') ... (0, 4344724, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.62') (0, 5696617, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.63') (0, 4275725, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.64')] 11
  • 12. • Zarr array data may be written by compressor/filter not supported by HSDS • Since both Zarr and HSDS are written in Python, it should be possible to add Zarr’s numcodecs package to HSDS to resolve this limitation 12HDF5 ➔ Zarr: Limitations