The HDF Group

Parallel HDF5 Developments
Quincey Koziol

The HDF Group
koziol@hdfgroup.org

Copyright © 2010 The HDF Group. All Rights Reserved

www.hdfgroup.org
Parallel I/O in HDF5
• Goal is to be invisible: get same performance with HDF5
as with MPI I/O
• Project with LBNL/NERSC to improve HDF5 performance
on parallel applications:
• 6-12x performance improvements on various applications (so
far)

Parallel I/O In HDF5
• Up to 12 GB/s to a shared file (out of 15 GB/s) on
NERSC’s Franklin system

Recent Improvements to
Parallel HDF5

Recent Parallel I/O Improvements
• Reduce number of file truncation operations
• Distribute metadata I/O over all processes
• Detect same “shape” of selection in more cases,
allowing optimized I/O path to be taken more often
• Many other, smaller, improvements to library algorithms
for faster/better use of MPI

Reduced File Truncations
• The HDF5 library was very conservative about truncating
the file whenever H5Fflush was called.
• However, file truncation is very expensive in parallel.
• The library was modified to defer truncation until the file is closed.
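
The effect of deferring truncation can be illustrated with a toy model. This is only a sketch: the SharedFile class and count_truncations function are hypothetical names, not HDF5 API, and the model merely counts how often an (expensive) truncation would be issued under each policy.

```python
class SharedFile:
    """Toy model of a shared file; counts truncation operations only."""

    def __init__(self, truncate_on_flush):
        # True models the older, conservative policy (truncate on each flush).
        self.truncate_on_flush = truncate_on_flush
        self.truncations = 0

    def _truncate(self):
        # Stand-in for an expensive truncate of the shared file.
        self.truncations += 1

    def flush(self):
        # Data would be written out here; only the conservative
        # policy also truncates the file on every flush.
        if self.truncate_on_flush:
            self._truncate()

    def close(self):
        # Under either policy the final file size is set once at close.
        self._truncate()


def count_truncations(num_flushes, truncate_on_flush):
    """Flush num_flushes times, close, and report truncations issued."""
    shared = SharedFile(truncate_on_flush)
    for _ in range(num_flushes):
        shared.flush()
    shared.close()
    return shared.truncations
```

In this model an application that flushes ten times issues eleven truncations under the conservative policy and one under the deferred policy.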

Distributed Metadata Writes
• HDF5 caches metadata internally, to improve both
read and write performance
• Historically, process 0 writes all dirtied metadata to
HDF5 file, while other processes wait
• Changed to distribute ranges of metadata within the
file across all processes
• Results in ~10x improvement in I/O for Vorpal (see
next slide)
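
The idea of spreading metadata writes over processes can be sketched as follows. The slides do not say how the library divides the file ranges, so the even split by entry count is an assumption made for illustration, and assign_metadata_ranges and bytes_per_process are hypothetical helper names, not HDF5 routines.

```python
def assign_metadata_ranges(dirty_entries, num_procs):
    """Split dirty metadata entries into contiguous, file-ordered ranges.

    dirty_entries: list of (file_offset, size_in_bytes) tuples.
    Returns one list of entries per process (assumed even split by count).
    """
    ordered = sorted(dirty_entries)  # order by file offset
    base, extra = divmod(len(ordered), num_procs)
    ranges = []
    start = 0
    for rank in range(num_procs):
        # The first `extra` ranks take one additional entry each.
        count = base + (1 if rank < extra else 0)
        ranges.append(ordered[start:start + count])
        start += count
    return ranges


def bytes_per_process(ranges):
    """Total bytes each process would write for its assigned range."""
    return [sum(size for _, size in entries) for entries in ranges]
```

Calling assign_metadata_ranges with num_procs=1 models the historical behaviour, where a single process writes every dirty entry; a larger num_procs gives each process a contiguous slice of the file to write.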

Distributed Metadata Writes
• I/O Trace Before Changes
• Note long sequence of I/O from process 0

• I/O Trace After Changes
• Note distribution of I/O across all processes, taking much
less time

Improved Selection Matching
• When HDF5 performs I/O between regions in memory
and the file, it compares the regions to see if the
application’s buffer can be directly used for I/O
• Historically, this algorithm couldn’t detect that
regions with the same shape, but embedded in arrays
of different dimensionality, were the same
• For example, a 10x10 region in a 2-D array should compare
equal to the equivalent 1x10x10 region in a 3-D array

• Changed to detect same shaped region in arbitrary
source and destination buffer array dimensions,
allowing I/O from application’s buffer in more
circumstances.
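
A minimal sketch of the comparison described above, assuming only the case the slide illustrates (extra leading dimensions of extent 1). The real library compares selections rather than plain extents, and strip_leading_ones and same_shape are hypothetical names, not HDF5 functions.

```python
def strip_leading_ones(dims):
    """Drop leading dimensions of extent 1, keeping at least one dimension."""
    dims = tuple(dims)
    i = 0
    while i < len(dims) - 1 and dims[i] == 1:
        i += 1
    return dims[i:]


def same_shape(mem_dims, file_dims):
    """True if two regions match once leading unit dimensions are ignored."""
    return strip_leading_ones(mem_dims) == strip_leading_ones(file_dims)
```

With this rule a 10x10 region compares equal to a 1x10x10 region, and a 1-D buffer of 10 elements matches a 1x10 region of a 2-D dataset.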

Improved Selection Matching
• Change resulted in ~20x I/O performance
improvement when reading 1-D buffer from 2-D file
dataset
• From ~5-7 seconds (or worse) to ~0.25-0.5 seconds, on a
variety of machine architectures (Linux: amani, hdfdap, jam;
Solaris: linew)

Upcoming Improvements to
Parallel HDF5

High-Level “HPC” API for HDF5
• HPC environments typically have unusual, possibly even
unique, computing, network and storage configurations.
• The HDF5 distribution should provide easy-to-use
interfaces that ease scientists’ and developers’ use of
these platforms:
• Tune and adapt to the underlying parallel file system.
• New high-level API routines that wrap existing
HDF5 functionality in a way that is easier for HPC application
developers to use and help them move applications from one
HPC environment to another.
• RFC: http://www.hdfgroup.uiuc.edu/RFC/HDF5/HPC-High-LevelAPI/H5HPC_RFC-2010-09-28.pdf

High-Level “HPC” API for HDF5 – API Overview
• File System Tuning:
• Automatic file system tuning
• Pass file system tuning info to HDF5 library

• Convenience Routines:
• “Macro” routines
• Encapsulate common parallel I/O operations
• E.g. - create a dataset and write a different hyperslab from each
process, etc.

• “Extended” routines
• Provide special parallel I/O operations not available in main HDF5 API
• Examples:
• “Group” collective I/O operations
• Collective raw data I/O on multiple datasets
• Collective multiple object manipulation
• Optimized collective object operations
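
The "macro" routine example above (each process writes a different hyperslab) implies a decomposition step. The sketch below shows only that step, assuming a simple block split along the first dimension; hyperslab_for_rank is a hypothetical name and is not part of the proposed API.

```python
def hyperslab_for_rank(total_rows, num_procs, rank):
    """Return (start, count) of the block of rows owned by `rank`.

    Rows are divided as evenly as possible; the first
    total_rows % num_procs ranks receive one extra row.
    """
    base, extra = divmod(total_rows, num_procs)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count
```

Each process would pass its own start and count to a hyperslab selection call such as H5Sselect_hyperslab before writing, so that the blocks tile the dataset without overlap.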

Parallel HDF5 in the Future

HPC Funding in 2010 and Beyond
• DOE Exascale FOA w/ LBNL & PNNL Proposal Funded
• Exascale-focused enhancements to HDF5

• LLNL Support & Development Contract
• Performance, support and medium-term focused development

• DOE Exascale FOA w/ ANL and ORNL Proposal Funded
• Research on alternate file formats for Exascale I/O

• LBNL Development Contract
• Performance and short-term focus

Future Parallel I/O Improvements
• Library Enhancements Proposed:
• Remove collective metadata modification restriction
• Append-only mode, targeting restart files
• Embarrassingly parallel mode, for decoupled applications
• Overlapping compute & I/O, with asynchronous I/O
• Auto-tuning to underlying parallel file system
• Improve resiliency of changes to HDF5 files
• Bring FastBit indexing of HDF5 files into mainstream use for
queries during data analysis and visualization
• Virtual file driver enhancements

• Improved Support:
• Parallel I/O performance tracking, testing and tuning

Performance Hints for Using
Parallel HDF5

Hints for Using Parallel HDF5
• Pass along MPI Info hints to file open: H5Pset_fapl_mpio
• Use MPI-POSIX file driver to access file:
H5Pset_fapl_mpiposix
• Align objects in HDF5 file: H5Pset_alignment
• Use collective mode when performing I/O on datasets:
H5Pset_dxpl_mpio before H5Dwrite/H5Dread
• Avoid datatype conversions: make memory and file
datatypes the same
• Advanced: explicitly manage metadata flush operations
with H5Fset_mdc_config
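
To make the alignment hint concrete, the sketch below models the placement rule that H5Pset_alignment takes as its threshold and alignment parameters: objects at least threshold bytes in size are placed at an address that is a multiple of alignment. The function aligned_address is a hypothetical illustration of that arithmetic, not library code, and the sizes used with it are arbitrary.

```python
def aligned_address(next_free, object_size, threshold, alignment):
    """Address at which an object would be placed under the alignment rule.

    Objects smaller than `threshold` are placed at the next free address;
    larger ones are moved up to the next multiple of `alignment`.
    """
    if alignment <= 1 or object_size < threshold:
        return next_free
    # Round next_free up to a multiple of alignment (ceiling division).
    return -(-next_free // alignment) * alignment
```

Choosing an alignment that matches a boundary of the underlying parallel file system, such as its stripe size, is a common reason to use this hint, at the cost of some unused space between objects.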

How to use Firebase Data Connect For Flutter
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 

Parallel HDF5 Developments

  • 1. The HDF Group Parallel HDF5 Developments Quincey Koziol The HDF Group koziol@hdfgroup.org Copyright © 2010 The HDF Group. All Rights Reserved 1 www.hdfgroup.org
  • 2. Parallel I/O in HDF5
      • Goal is to be invisible: get the same performance with HDF5 as with MPI I/O
      • Project with LBNL/NERSC to improve HDF5 performance on parallel applications:
          • 6-12x performance improvements on various applications (so far)
  • 3. Parallel I/O in HDF5
      • Up to 12 GB/s to a shared file (out of 15 GB/s) on NERSC's Franklin system
  • 4. The HDF Group: Recent Improvements to Parallel HDF5
  • 5. Recent Parallel I/O Improvements
      • Reduce number of file truncation operations
      • Distribute metadata I/O over all processes
      • Detect same "shape" of selection in more cases, allowing the optimized I/O path to be taken more often
      • Many other, smaller improvements to library algorithms for faster/better use of MPI
  • 6. Reduced File Truncations
      • The HDF5 library was very conservative about truncating the file whenever H5Fflush was called.
      • However, file truncation is very expensive in parallel.
      • The library was modified to defer truncation until the file is closed.
  • 7. Distributed Metadata Writes
      • HDF5 caches metadata internally, to improve both read and write performance
      • Historically, process 0 wrote all dirtied metadata to the HDF5 file, while the other processes waited
      • Changed to distribute ranges of metadata within the file across all processes
      • Results in ~10x improvement in I/O for Vorpal (see next slide)
  • 8. Distributed Metadata Writes
      • I/O trace before changes
          • Note the long sequence of I/O from process 0
      • I/O trace after changes
          • Note the distribution of I/O across all processes, taking much less time
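
The idea of handing each process its own range of dirty metadata can be sketched as follows. This is a minimal illustration only, not the HDF5 library's actual algorithm: the function name `partition_dirty_entries`, the `(address, size)` tuple layout, and the balance-by-entry-count policy are all assumptions made for the example.

```python
def partition_dirty_entries(entries, num_procs):
    """Split dirty metadata entries into contiguous file-address ranges.

    entries   -- iterable of (file_address, size_in_bytes) tuples (hypothetical layout)
    num_procs -- number of MPI processes taking part in the flush

    Returns a list with one sub-list per rank. Entries are ordered by
    address first, so each rank ends up with one contiguous address range.
    """
    ordered = sorted(entries)  # tuples sort by file address first
    base, extra = divmod(len(ordered), num_procs)
    assignments = []
    start = 0
    for rank in range(num_procs):
        # The first `extra` ranks take one additional entry each.
        count = base + (1 if rank < extra else 0)
        assignments.append(ordered[start:start + count])
        start += count
    return assignments


if __name__ == "__main__":
    dirty = [(4096, 512), (0, 96), (1024, 256), (8192, 128), (2048, 64)]
    for rank, chunk in enumerate(partition_dirty_entries(dirty, 2)):
        print(rank, chunk)
```

With the old behaviour every entry would land on rank 0; here each rank writes only its own slice, which is the effect visible in the "after" trace.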
  • 9. Improved Selection Matching
      • When HDF5 performs I/O between regions in memory and the file, it compares the regions to see if the application's buffer can be used directly for I/O
      • Historically, this algorithm couldn't detect that regions with the same shape, but embedded in arrays of different dimensionality, were the same
      • For example, a 10x10 region in a 2-D array should compare equal to the equivalent 1x10x10 region in a 3-D array
      • Changed to detect same-shaped regions in arbitrary source and destination buffer array dimensions, allowing I/O from the application's buffer in more circumstances
  • 10. Improved Selection Matching
      • Change resulted in ~20x I/O performance improvement when reading a 1-D buffer from a 2-D file dataset
      • From ~5-7 seconds (or worse) to ~0.25-0.5 seconds, on a variety of machine architectures (Linux: amani, hdfdap, jam; Solaris: linew)
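
The 10x10 versus 1x10x10 comparison described above can be sketched like this. It is a simplified illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, selections are reduced to a single rectangular block described by its per-dimension extents, and only leading (slowest-changing) dimensions of extent 1 are ignored.

```python
def normalized_shape(block_dims):
    """Drop leading dimensions of extent 1 from a block's extents.

    (1, 10, 10) -> (10, 10); a block that is all ones collapses to (1,).
    """
    dims = list(block_dims)
    while len(dims) > 1 and dims[0] == 1:
        dims.pop(0)
    return tuple(dims)


def same_shape(mem_block, file_block):
    """Return True if the two blocks have the same shape once leading
    extent-1 dimensions are ignored, regardless of array rank."""
    return normalized_shape(mem_block) == normalized_shape(file_block)


if __name__ == "__main__":
    print(same_shape((10, 10), (1, 10, 10)))
    print(same_shape((10, 10), (10, 20)))
```

When the shapes compare equal, the application's buffer can in principle be used for the transfer directly, which is where the speedup reported on the slide comes from.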
  • 11. The HDF Group: Upcoming Improvements to Parallel HDF5
  • 12. High-Level "HPC" API for HDF5
      • HPC environments typically have unusual, possibly even unique, computing, network and storage configurations.
      • The HDF5 distribution should provide easy-to-use interfaces that ease scientists' and developers' use of these platforms:
          • Tune and adapt to the underlying parallel file system.
          • New high-level API routines that wrap existing HDF5 functionality in a way that is easier for HPC application developers to use and that helps them move applications from one HPC environment to another.
      • RFC: http://www.hdfgroup.uiuc.edu/RFC/HDF5/HPC-High-LevelAPI/H5HPC_RFC-2010-09-28.pdf
  • 13. High-Level "HPC" API for HDF5 – API Overview
      • File System Tuning:
          • Automatic file system tuning
          • Pass file system tuning info to HDF5 library
      • Convenience Routines:
          • "Macro" routines
              • Encapsulate common parallel I/O operations
              • E.g., create a dataset and write a different hyperslab from each process
          • "Extended" routines
              • Provide special parallel I/O operations not available in main HDF5 API
              • Examples:
                  • "Group" collective I/O operations
                  • Collective raw data I/O on multiple datasets
                  • Collective multiple object manipulation
                  • Optimized collective object operations
  • 14. The HDF Group: Parallel HDF5 in the Future
  • 15. HPC Funding in 2010 and Beyond
      • DOE Exascale FOA w/ LBNL & PNNL Proposal Funded
          • Exascale-focused enhancements to HDF5
      • LLNL Support & Development Contract
          • Performance, support and medium-term focused development
      • DOE Exascale FOA w/ ANL and ORNL Proposal Funded
          • Research on alternate file formats for Exascale I/O
      • LBNL Development Contract
          • Performance and short-term focus
  • 16. Future Parallel I/O Improvements
      • Library Enhancements Proposed:
          • Remove collective metadata modification restriction
          • Append-only mode, targeting restart files
          • Embarrassingly parallel mode, for decoupled applications
          • Overlapping compute & I/O, with asynchronous I/O
          • Auto-tuning to underlying parallel file system
          • Improve resiliency of changes to HDF5 files
          • Bring FastBit indexing of HDF5 files into mainstream use for queries during data analysis and visualization
          • Virtual file driver enhancements
      • Improved Support:
          • Parallel I/O performance tracking, testing and tuning
  • 17. The HDF Group: Performance Hints for Using Parallel HDF5
  • 18. Hints for Using Parallel HDF5
      • Pass along MPI Info hints to file open: H5Pset_fapl_mpio
      • Use MPI-POSIX file driver to access file: H5Pset_fapl_mpiposix
      • Align objects in HDF5 file: H5Pset_alignment
      • Use collective mode when performing I/O on datasets: H5Pset_dxpl_mpio before H5Dwrite/H5Dread
      • Avoid datatype conversions: make memory and file datatypes the same
      • Advanced: explicitly manage metadata flush operations with H5Fset_mdc_config
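
To make the alignment hint concrete, the sketch below models the placement rule that H5Pset_alignment configures: objects whose size is at least `threshold` bytes are placed at a file address that is a multiple of `alignment`. The function `aligned_address` is a hypothetical stand-in written for illustration and is not part of the HDF5 API; the 1 MiB alignment in the usage lines is likewise just an assumed value (in practice it would be chosen to suit the parallel file system, for example its stripe size).

```python
def aligned_address(next_free, size, threshold, alignment):
    """Return the address at which an object of `size` bytes would be placed.

    next_free -- first unused byte offset in the file
    threshold -- objects smaller than this are not aligned
    alignment -- aligned objects start at a multiple of this value
    """
    if alignment <= 1 or size < threshold:
        return next_free  # small object, or alignment disabled
    remainder = next_free % alignment
    if remainder == 0:
        return next_free  # already on an alignment boundary
    return next_free + (alignment - remainder)  # round up to next boundary


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(aligned_address(1000, 4096, 1024, one_mib))  # large object: moved to a boundary
    print(aligned_address(1000, 100, 1024, one_mib))   # small object: left where it is
```

The padding skipped before an aligned object is unused file space, so the threshold is what keeps small metadata objects from each consuming a full alignment unit.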