HDF5 ⬌ Zarr
2020 ESIP Summer Meeting
Aleksandar Jelenak
Overview
• Zarr ➔ HDF5*
• HDF5 ➔ Zarr
*A portion of this work was supported by the United States Geological Survey (USGS). Any opinions, findings, conclusions, or recommendations
expressed in this material are those of the authors and do not necessarily reflect the views of the United States Government.
Zarr ➔ HDF5
https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314
Zarr ➔ HDF5: Implementation
• Created a new Zarr store: FileChunkStore
• Developed Zarr JSON metadata for (HDF5) chunk file location: .zchunkstore
• Small fixes in the zarr and xarray Python packages to support a separate Zarr store for file chunks

"zeta/9.38": {
    "offset": 3883848970,
    "size": 6907818
},
"zeta/9.39": {
    "offset": 3890756788,
    "size": 7879493
},
"zeta/9.4": {
    "offset": 3648355033,
    "size": 6525250
},
"zeta/9.40": {
    "offset": 3898636281,
    "size": 7132453
}
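The entries above pair each Zarr chunk key with a byte offset and size inside the HDF5 file. As a minimal sketch of how a store might use such an entry (this is not the FileChunkStore code; `read_chunk`, the toy offsets, and the in-memory stand-in file are all assumptions made for illustration):

```python
import io
import json

# Hypothetical chunk-location metadata shaped like the entries above.
# The offsets and sizes are toy values so the example fits a 32-byte "file".
chunk_index = json.loads("""
{
  "zeta/9.38": {"offset": 16, "size": 8},
  "zeta/9.39": {"offset": 24, "size": 4}
}
""")


def read_chunk(fileobj, index, key):
    """Return the raw (still compressed) bytes of one chunk."""
    entry = index[key]
    fileobj.seek(entry["offset"])
    data = fileobj.read(entry["size"])
    if len(data) != entry["size"]:
        raise IOError(f"short read for chunk {key!r}")
    return data


# Stand-in for an HDF5 file: 32 bytes with recognizable content.
fake_file = io.BytesIO(bytes(range(32)))
raw = read_chunk(fake_file, chunk_index, "zeta/9.38")
print(raw)
```

The returned bytes would still need to be decompressed by whatever codec the array metadata names; the sketch covers only the lookup and the partial read.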
Zarr ➔ HDF5: Performance comparison
[Figure: two panels, "Zarr reading Zarr" and "Zarr reading HDF5"]
Zarr ➔ HDF5: How to try out?
• Python >= 3.6
• HDF5-1.10.6 library
• pip install git+https://github.com/h5py/h5py.git
• pip install git+https://github.com/ajelenak/xarray.git@zarr-chunkstore
• pip install git+https://github.com/HDFGroup/zarr-python.git@hdf5
• pip install fsspec
• HDF5-to-Zarr translator: https://gist.github.com/ajelenak/80354a95b449cedea5cca508004f97a9
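One step any HDF5-to-Zarr translation needs is naming each HDF5 chunk with a Zarr chunk key. The sketch below is not taken from the translator gist; it assumes chunk positions are given as element offsets (the form in which h5py's chunk-query API reports them) and that keys use the "." separator seen in the metadata example. The function name `zarr_chunk_key` is hypothetical.

```python
def zarr_chunk_key(array_path, element_offset, chunk_shape):
    """Map a chunk's element offset to a Zarr key such as 'zeta/9.38'."""
    if len(element_offset) != len(chunk_shape):
        raise ValueError("offset and chunk shape must have the same rank")
    indices = []
    for off, size in zip(element_offset, chunk_shape):
        if off % size:
            raise ValueError("offset is not aligned to a chunk boundary")
        # Chunk index along this dimension.
        indices.append(str(off // size))
    return f"{array_path}/" + ".".join(indices)


# Chunk starting at element (90, 5394974) of an array chunked as (10, 141973).
key = zarr_chunk_key("zeta", (90, 5394974), (10, 141973))
print(key)
```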
Zarr ➔ HDF5: Limitations
• HDF5 datasets with compact layout are not supported
• HDF5 dataset data may have been written with a compressor/filter that Zarr does not support
• The storage system hosting the HDF5 file must allow partial file reads
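For HTTP-based object stores, "partial file reading" usually means byte-range requests. A minimal sketch of turning one offset/size pair into a standard HTTP Range header follows; `byte_range_header` is a hypothetical helper, and the numbers are those of the "zeta/9.38" entry shown earlier.

```python
def byte_range_header(offset, size):
    """Build an HTTP Range header covering `size` bytes starting at `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return {"Range": f"bytes={offset}-{offset + size - 1}"}


header = byte_range_header(3883848970, 6907818)
print(header)
```

A storage system that ignores the Range header would return the whole file for every chunk read, which is why this capability is listed as a requirement.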
HDF5 ➔ Zarr
• HDF5 API access to Zarr data is provided by The HDF Group's Highly Scalable Data Service (HSDS)
• Only for Zarr data in AWS S3
• HSDS object store schema is similar to Zarr: a combination of JSON and binary objects
• HSDS JSON is a superset of HDF5/JSON
• Still a work in progress
HDF5 ➔ Zarr: Implementation
• Uses a special HSDS schema chunking layout: H5D_CHUNKED_REF_INDIRECT
• This chunking layout is not supported by the HDF5 library
• Developed to enable HSDS access to chunks in HDF5 files in object storage
• Chunk information for one Zarr array is stored as an anonymous HDF5 compound dataset
• The compound datatype has 3 fields: byte offset (always 0), chunk object size, and chunk object URI
• The HDF5 dataset representing the Zarr array has the H5D_CHUNKED_REF_INDIRECT layout and its value points to the anonymous HDF5 dataset with chunk location information
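The three-field chunk record described above can be pictured with a plain Python structure. This is only an illustrative sketch for a 2-D array, not HSDS or h5pyd code: the names `ChunkRef` and `build_chunk_table` and the field names are assumptions, and the sizes are hard-coded rather than read from S3.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    offset: int  # byte offset within the chunk object (always 0 for Zarr chunks)
    size: int    # size of the chunk object in bytes
    uri: str     # location of the chunk object


def build_chunk_table(store_uri, array_name, grid_shape, sizes):
    """Build a 2-D table of ChunkRef records, one per Zarr chunk."""
    rows, cols = grid_shape
    table = []
    for i in range(rows):
        row = []
        for j in range(cols):
            key = f"{i}.{j}"
            row.append(ChunkRef(0, sizes[key], f"{store_uri}/{array_name}/{key}"))
        table.append(row)
    return table


# Object sizes would normally come from listing the store; hard-coded here.
sizes = {"0.0": 1949049, "0.1": 2911533}
table = build_chunk_table("s3://hdf5-zarr/adcirc_01d.zarr", "zeta", (1, 2), sizes)
print(table[0][1])
```

The offset is always 0 because each Zarr chunk is its own object, unlike chunks inside an HDF5 file, which start at some offset within a shared file.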
HDF5 ➔ Zarr: Data Wrangling
• Because HSDS does not (yet) support the Blosc compressor, the original Zarr dataset was copied with the Zlib compressor instead:

from sys import stdout

import fsspec
import zarr
from numcodecs import Zlib

# Source: the original Zarr store, in a requester-pays bucket.
src_root = zarr.open(
    fsspec.get_mapper('s3://pangeo-data-uswest2/esip/adcirc/adcirc_01d',
                      anon=False,
                      requester_pays=True),
    mode='r')
# Destination: new Zarr store that will hold the Zlib-compressed copy.
dest_root = zarr.open(
    fsspec.get_mapper('s3://hdf5-zarr/adcirc_01d.zarr', anon=False),
    mode='w')
# dry_run=True only logs what would be copied; set it to False to copy.
zarr.copy_all(src_root, dest_root, shallow=False, without_attrs=False,
              log=stdout, if_exists='replace', dry_run=True,
              compressor=Zlib(level=6), filters=None)
HDF5 ➔ Zarr: Example translation

Zarr array:
Name               : /zeta
Type               : zarr.core.Array
Data type          : float64
Shape              : (720, 9228245)
Chunk shape        : (10, 141973)
Compressor         : Zlib(level=6)
No. bytes          : 53154691200 (49.5G)
Chunks initialized : 4680/4680

HDF5 dataset with chunk info:
Type      : h5pyd.Dataset
Data type : compound
Shape     : (72, 65)
Value:
[[(0, 1949049, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.0')
  (0, 2911533, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.1')
  (0, 2506163, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.2') ...
  (0, 4344724, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.62')
  (0, 5696617, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.63')
  (0, 4275725, 's3://hdf5-zarr/adcirc_01d.zarr/zeta/0.64')]
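The (72, 65) shape of the chunk-info dataset follows from the Zarr array's shape and chunk shape. A short arithmetic check, using only the numbers listed above (the helper name `chunk_grid` is an assumption):

```python
import math


def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(shape, chunk_shape))


shape = (720, 9228245)
grid = chunk_grid(shape, (10, 141973))
n_chunks = math.prod(grid)
n_bytes = math.prod(shape) * 8  # float64 is 8 bytes per element

print(grid)      # (72, 65)
print(n_chunks)  # 4680
print(n_bytes)   # 53154691200
```

The three results agree with the chunk-info dataset shape, the "Chunks initialized" count, and the "No. bytes" value reported for the Zarr array.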
HDF5 ➔ Zarr: Limitations
• Zarr array data may have been written with a compressor/filter that HSDS does not support
• Since both Zarr and HSDS are written in Python, it should be possible to add Zarr's numcodecs package to HSDS to resolve this limitation
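Before exposing a Zarr array through HSDS, it may help to check which codecs the array uses. The sketch below reads the compressor and filter ids from Zarr v2 `.zarray` metadata and compares them with a set of supported ids. The `SUPPORTED` set is a placeholder assumption containing only zlib, the codec used in the copy step of this talk; it is not an authoritative list of what HSDS supports, and `unsupported_codecs` is a hypothetical helper.

```python
import json

# Assumption: placeholder set, not an authoritative list of HSDS codecs.
SUPPORTED = {"zlib"}


def unsupported_codecs(zarray_json, supported):
    """Return ids of compressors/filters in .zarray metadata not in `supported`."""
    meta = json.loads(zarray_json)
    codecs = []
    if meta.get("compressor"):
        codecs.append(meta["compressor"]["id"])
    for filt in meta.get("filters") or []:
        codecs.append(filt["id"])
    return [c for c in codecs if c not in supported]


# Trimmed, illustrative .zarray fragments (only the codec-related keys).
blosc_meta = '{"compressor": {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1}, "filters": null}'
zlib_meta = '{"compressor": {"id": "zlib", "level": 6}, "filters": null}'

print(unsupported_codecs(blosc_meta, SUPPORTED))
print(unsupported_codecs(zlib_meta, SUPPORTED))
```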
THANK YOU!
ajelenak@hdfgroup.org