SlideShare a Scribd company logo
1 of 34
Download to read offline
HDF5 Roadmap and New Features
July 22, 2022
Dana Robinson
The HDF Group
2
What we'll cover…
• HDF5 release schedule
• Virtual Object Layer
• Onion VFD
• VFD SWMR
• "HDF5 2.0"
HDF5 Release Schedule
4
The Current Roadmap
• Published on GitHub in README.md (current as of today)
• Delays 1.13.2 by ~1 month so we can release the subfiling feature earlier
• 1.8 and 1.12 will be retired at the end of the year
5
1.13.x: New Features
HDF5 1.13.2 (August)
HDF5 1.13.3 (October)
• Selection I/O (vector I/O at the VFD level)
• Onion VFD (versioned HDF5 files)
• VFD SWMR (improved SWMR feature)
• Subfiling (I/O concentrators in parallel HDF5)
• Multi-Dataset I/O (I/O that spans multiple datasets)
6
Experimental vs Maintenance Releases
• Even number minor releases are maintenance releases (e.g., 1.12.x)
• Usual HDF5 maintenance branches you know and love
• Binary compatibility
• Stable file format
• Odd number minor releases are experimental releases (e.g., 1.13.x)
• No binary compatibility guarantees
• API calls can change
• File format can change
• Unready features may be dropped
• Used to try out new features as we prepare for the next maintenance release
• See the blog posts on this for more clarity
• https://www.hdfgroup.org/2021/12/hdf5-1-13-0-introducing-experimental-releases
Virtual Object Layer
8
Original HDF5 Architecture (pre-1.12.0)
Application
Public HDF5 API
Non-Storage
API Call Internals
(H5S, H5P, …)
Storage-Oriented
API Call Internals
(H5F, H5D, …)
GPU
sec2
split
Virtual File Layer (VFL)
Storage
core
family
direct
…
9
Current HDF5 Architecture (1.12.0+)
Application
Public HDF5 API
Non-Storage
API Call Internals
(H5S, H5P, …)
Storage-Oriented
API Call Internals
GPU
sec2
split
Virtual File Layer (VFL)
Storage
core
family
direct
…
Virtual Object Layer (VOL)
10
Current HDF5 Architecture (1.12.0+)
Application
Public HDF5 API
Non-Storage
API Call Internals
(H5S, H5P, …)
Storage-Oriented
API Call Internals
GPU
sec2
split
Virtual File Layer (VFL)
Storage
core
family
direct
…
Virtual Object Layer (VOL) New Layer
Allows you to
replace this with
your own code!
11
Two Kinds of VOL Connector
DAOS
VOL
Async
VOL
Native
VOL
Storage
Storage
Terminal
Maps HDF5 objects to
arbitrary storage schemes
Pass-Through
Perform operations (e.g.
caching, logging) before
passing the data on to the
next connector.
12
Current HDF5 Architecture (1.12.0+)
Application
HDF5 API
Non-VOL
API
Calls
H5S
H5P
…
Virtual Object Layer
Native
VOL
GPU
sec2
split
DAOS
VOL
VFL
Async
VOL
Native
VOL
Storage
Storage
Storage
Plugin
Key
Native VOL
HDF5 Library
13
Storage
VOL Connector
Terminal VOL Connectors
H5Fcreate("foo.h5", flags, fcpl , fapl)
Public API
"foo.h5"
flags
fcpl
fapl
14
VOL Toolkit Repository
• Location: https://github.com/HDFGroup/vol-toolkit
• All your VOL construction needs in a single location
• Does not contain original content
• Designed to bring important content from other repositories together with consistent versioning
• Content is mainly included as git submodules, though the docs are currently copied in
• Tags will identify "HDF5 1.13.0", etc. versions of the toolkit
• Includes a VOL construction tutorial
15
HDF5 1.13.x / 1.14.0 VOL Changes
• NOTE: ALL VOL CONNECTOR DEVELOPMENT SHOULD TARGET HDF5 1.13.x
• Do NOT use HDF5 1.12.x
• There are important changes to the VOL interface in HDF5 1.13.x that cannot be moved to
1.12.x without breaking binary compatibility
• As mentioned earlier, HDF5 1.13.x are experimental releases
• It is possible that the VOL interface could be changed in the 1.13.x versions that will be
released as HDF5 1.14.0
• Example: VOL connector feature flags being introduced this summer that describe HDF5 API
capabilities
• Should be fairly stable, and updating is straightforward
Onion VFD
17
Onion VFD
• Enables creating "versioned" HDF5 files
• "version" defined by file open/write/close cycle
• Data and metadata for additional versions stored
in a separate "onion" file
• Only one onion file, regardless of number of
versions
• Can open onion files by version
• New API calls to get the number of versions, etc.
• Command-line tools support
Original
File
Version 4
Version 2
Version 3
Onion File
"foo.h5" "foo.onion"
Linear only! No trees!
18
Onion VFD
• Will be released in HDF5 1.13.2 (August)
• Not coming to 1.12.x or earlier due to
incompatible VFL changes
Original
File
Version 4
Version 2
Version 3
Onion File
"foo.h5" "foo.onion"
Linear only! No trees!
VFD SWMR
20
SWMR
SWMR = Single Writer / Multiple Readers
HDF5
File
Writer
Reader 3
Reader 2
Reader 1
21
caches
The Problem: State
HDF5
File
Writer
State of an HDF5 file =
things written to disk
+
what's in the writer's caches
22
The Problem: State
If this tree node is on disk
And this node has not been
flushed from the cache
Any reader that tries to follow the link to the leaf node will read garbage
23
"Legacy" SWMR (1.10.0+)
• Relies on "flush dependencies" in the metadata cache
• We ensure that children are flushed before parents
• File is always consistent, but not necessarily 100% up to date wrt the writer
• Very limited! – Not implemented for all HDF5 operations
• Dataset appends only
• Can't create new file objects, no variable-length type support
• High maintenance cost as SWMR pervades the metadata cache code
24
VFD SWMR (1.13.2+)
• Uses external, temporary metadata files
• Implemented at the VFL level
• Like legacy SWMR, the file is always consistent, but not necessarily 100% up to date wrt the
writer
• Works with almost all HDF5 operations
• Lower maintenance cost as most SWMR code is isolated to a few internal modules
25
VFD SWMR (1.13.2+)
• Will be released in HDF5 1.13.2 (August)
• Not coming to 1.12.x or earlier due to
incompatible VFL changes
• Legacy SWMR remains in place for now
HDF5 2.0
27
"HDF5 2.0"
HDF5 1.14.0 will release at the end of the year. What then?
I feel it's time for a (medium-size) API reset
• HDF5 is a 25-year-old codebase that has been managed conservatively
• Not all our choices have been good ones
• Let's fix what is broken!
• Currently in the planning stage, so statements here come with forward-looking caveats
My intention is NOT to rework the API in such a way that causes pain for the community. We want
to make any changes in such a way that it's as easy as possible for HDF5 software to adapt.
28
HDF5 2.0 – Possible Changes
Smaller, more manageable things:
• Semantic versioning
• API improvements
• Deprecated functions will be removed
• Signature changes (e.g. all API calls will return herr_t, etc.)
• Renaming (e.g. "sec2" VFD --> "posix" VFD)
• Reworked error mechanism
• Drop the multi VFD (but keep the split VFD)
• Sanitize the metadata I/O layer in the library (source of most of our CVE issues)
If resources are available:
• Improved variable-length storage
29
HDF5 2.0 – Suggestions
We're a community, so let's decide this together!
Join the discussion on our forum: https://forum.hdfgroup.org/
References
31
Links to documentation
While we work on release documentation, the latest RFC documents can be found in the hdf5doc
repository on GitHub:
Selection I/O
https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/SelectionIO/selection_io_R
FC_210610.docx
Multi-Dataset I/O
https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/MultiDataset/H5HPC_Multi
Dset_RW_IO_RFC_v7_20220523.pdf
Subfiling
https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/VFD_Subfiling/RFC_VFD_
subfiling_200424.docx
32
Links to documentation
While we work on release documentation, the latest RFC documents can be found in the hdf5doc
repository on GitHub:
Onion VFD
https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/OnionVFD/Onion_VFD_RF
C_v5.docx
VFD SWMR
https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/VFD_SWMR/VFD_SWMR
_RFC_220519.pdf
33
Links to documentation
HDF5 Youtube channel
https://www.youtube.com/channel/UCRhtsIZquL3r-zH-R-r9-tQ
Onion VFD demonstration
https://www.youtube.com/watch?v=cfshkgr05mA&ab_channel=hdf5
VOL connector construction tutorial
https://www.youtube.com/watch?v=7XEbm-__QuM&t=2s&ab_channel=hdf5
THANK YOU!
Questions & Comments?
© 2022, The HDF Group.

More Related Content

Similar to HDF - Current status and Future Directions

Sunshine php practical-zf1-zf2-migration
Sunshine php practical-zf1-zf2-migrationSunshine php practical-zf1-zf2-migration
Sunshine php practical-zf1-zf2-migrationClark Everetts
 

Similar to HDF - Current status and Future Directions (20)

Parallel HDF5 Developments
Parallel HDF5 DevelopmentsParallel HDF5 Developments
Parallel HDF5 Developments
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
HDF5 OPeNDAP project update and demo
HDF5 OPeNDAP project update and demoHDF5 OPeNDAP project update and demo
HDF5 OPeNDAP project update and demo
 
Hierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) UpdateHierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) Update
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
What will be new in HDF5?
What will be new in HDF5?What will be new in HDF5?
What will be new in HDF5?
 
Introduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming ModelsIntroduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming Models
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
HDF Status and Development
HDF Status and DevelopmentHDF Status and Development
HDF Status and Development
 
Sunshine php practical-zf1-zf2-migration
Sunshine php practical-zf1-zf2-migrationSunshine php practical-zf1-zf2-migration
Sunshine php practical-zf1-zf2-migration
 
HDF OPeNDAP project update and demo
HDF OPeNDAP project update and demoHDF OPeNDAP project update and demo
HDF OPeNDAP project update and demo
 
HDF5 and Storage Resource Broker
HDF5 and Storage Resource BrokerHDF5 and Storage Resource Broker
HDF5 and Storage Resource Broker
 
Moving applications to HDF5 1.8
Moving applications to HDF5 1.8Moving applications to HDF5 1.8
Moving applications to HDF5 1.8
 
HDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at ScaleHDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at Scale
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF5 Documentation
HDF5 DocumentationHDF5 Documentation
HDF5 Documentation
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF Status Update
HDF Status UpdateHDF Status Update
HDF Status Update
 
Integrating HDF5 with SRB
Integrating HDF5 with SRBIntegrating HDF5 with SRB
Integrating HDF5 with SRB
 

More from The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

More from The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 
Google Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOSGoogle Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOS
 
Parallel Computing with HDF Server
Parallel Computing with HDF ServerParallel Computing with HDF Server
Parallel Computing with HDF Server
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
 
NASA Terra Data Fusion
NASA Terra Data FusionNASA Terra Data Fusion
NASA Terra Data Fusion
 
HDF for the Cloud
HDF for the CloudHDF for the Cloud
HDF for the Cloud
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

HDF - Current status and Future Directions

  • 1. HDF5 Roadmap and New Features July 22, 2022 Dana Robinson The HDF Group
  • 2. 2 What we'll cover… • HDF5 release schedule • Virtual Object Layer • Onion VFD • VFD SWMR • "HDF5 2.0"
  • 4. 4 The Current Roadmap • Published on GitHub in README.md (current as of today) • Delays 1.13.2 by ~1 month so we can release the subfiling feature earlier • 1.8 and 1.12 will be retired at the end of the year
  • 5. 5 1.13.x: New Features HDF5 1.13.2 (August) HDF5 1.13.3 (October) • Selection I/O (vector I/O at the VFD level) • Onion VFD (versioned HDF5 files) • VFD SWMR (improved SWMR feature) • Subfiling (I/O concentrators in parallel HDF5) • Multi-Dataset I/O (I/O that spans multiple datasets)
  • 6. 6 Experimental vs Maintenance Releases • Even number minor releases are maintenance releases (e.g., 1.12.x) • Usual HDF5 maintenance branches you know and love • Binary compatibility • Stable file format • Odd number minor releases are experimental releases (e.g., 1.13.x) • No binary compatibility guarantees • API calls can change • File format can change • Unready features may be dropped • Used to try out new features as we prepare for the next maintenance release • See the blog posts on this for more clarity • https://www.hdfgroup.org/2021/12/hdf5-1-13-0-introducing-experimental-releases
  • 8. 8 Original HDF5 Architecture (pre-1.12.0) Application Public HDF5 API Non-Storage API Call Internals (H5S, H5P, …) Storage-Oriented API Call Internals (H5F, H5D, …) GPU sec2 split Virtual File Layer (VFL) Storage core family direct …
  • 9. 9 Current HDF5 Architecture (1.12.0+) Application Public HDF5 API Non-Storage API Call Internals (H5S, H5P, …) Storage-Oriented API Call Internals GPU sec2 split Virtual File Layer (VFL) Storage core family direct … Virtual Object Layer (VOL)
  • 10. 10 Current HDF5 Architecture (1.12.0+) Application Public HDF5 API Non-Storage API Call Internals (H5S, H5P, …) Storage-Oriented API Call Internals GPU sec2 split Virtual File Layer (VFL) Storage core family direct … Virtual Object Layer (VOL) New Layer Allows you to replace this with your own code!
  • 11. 11 Two Kinds of VOL Connector DAOS VOL Async VOL Native VOL Storage Storage Terminal Maps HDF5 objects to arbitrary storage schemes Pass-Through Perform operations (e.g. caching, logging) before passing the data on to the next connector.
  • 12. 12 Current HDF5 Architecture (1.12.0+) Application HDF5 API Non-VOL API Calls H5S H5P … Virtual Object Layer Native VOL GPU sec2 split DAOS VOL VFL Async VOL Native VOL Storage Storage Storage Plugin Key Native VOL HDF5 Library
  • 13. 13 Storage VOL Connector Terminal VOL Connectors H5Fcreate("foo.h5", flags, fcpl , fapl) Public API "foo.h5" flags fcpl fapl
  • 14. 14 VOL Toolkit Repository • Location: https://github.com/HDFGroup/vol-toolkit • All your VOL construction needs in a single location • Does not contain original content • Designed to bring important content from other repositories together with consistent versioning • Content is mainly included as git submodules, though the docs are currently copied in • Tags will identify "HDF5 1.13.0", etc. versions of the toolkit • Includes a VOL construction tutorial
  • 15. 15 HDF5 1.13.x / 1.14.0 VOL Changes • NOTE: ALL VOL CONNECTOR DEVELOPMENT SHOULD TARGET HDF5 1.13.x • Do NOT use HDF5 1.12.x • There are important changes to the VOL interface in HDF5 1.13.x that cannot be moved to 1.12.x without breaking binary compatibility • As mentioned earlier, HDF5 1.13.x are experimental releases • It is possible that the VOL interface could be changed in the 1.13.x versions that will be released as HDF5 1.14.0 • Example: VOL connector feature flags being introduced this summer that describe HDF5 API capabilities • Should be fairly stable, and updating is straightforward
  • 17. 17 Onion VFD • Enables creating "versioned" HDF5 files • "version" defined by file open/write/close cycle • Data and metadata for additional versions stored in a separate "onion" file • Only one onion file, regardless of number of versions • Can open onion files by version • New API calls to get the number of versions, etc. • Command-line tools support Original File Version 4 Version 2 Version 3 Onion File "foo.h5" "foo.onion" Linear only! No trees!
  • 18. 18 Onion VFD • Will be released in HDF5 1.13.2 (August) • Not coming to 1.12.x or earlier due to incompatible VFL changes Original File Version 4 Version 2 Version 3 Onion File "foo.h5" "foo.onion" Linear only! No trees!
  • 20. 20 SWMR SWMR = Single Writer / Multiple Readers HDF5 File Writer Reader 3 Reader 2 Reader 1
  • 21. 21 caches The Problem: State HDF5 File Writer State of an HDF5 file = things written to disk + what's in the writer's caches
  • 22. 22 The Problem: State If this tree node is on disk And this node has not been flushed from the cache Any reader that tries to follow the link to the leaf node will read garbage
  • 23. 23 "Legacy" SWMR (1.10.0+) • Relies on "flush dependencies" in the metadata cache • We ensure that children are flushed before parents • File is always consistent, but not necessarily 100% up to date wrt the writer • Very limited! – Not implemented for all HDF5 operations • Dataset appends only • Can't create new file objects, no variable-length type support • High maintenance cost as SWMR pervades the metadata cache code
  • 24. 24 VFD SWMR (1.13.2+) • Uses external, temporary metadata files • Implemented at the VFL level • Like legacy SWMR, the file is always consistent, but not necessarily 100% up to date wrt the writer • Works with almost all HDF5 operations • Lower maintenance cost as most SWMR code is isolated to a few internal modules
  • 25. 25 VFD SWMR (1.13.2+) • Will be released in HDF5 1.13.2 (August) • Not coming to 1.12.x or earlier due to incompatible VFL changes • Legacy SWMR remains in place for now
  • 27. 27 "HDF5 2.0" HDF5 1.14.0 will release at the end of the year. What then? I feel it's time for a (medium-size) API reset • HDF5 is a 25-year-old codebase that has been managed conservatively • Not all our choices have been good ones • Let's fix what is broken! • Currently in the planning stage, so statements here come with forward-looking caveats My intention is NOT to rework the API in such a way that causes pain for the community. We want to make any changes in such a way that it's as easy as possible for HDF5 software to adapt.
  • 28. 28 HDF5 2.0 – Possible Changes Smaller, more manageable things: • Semantic versioning • API improvements • Deprecated functions will be removed • Signature changes (e.g. all API calls will return herr_t, etc.) • Renaming (e.g. "sec2" VFD --> "posix" VFD) • Reworked error mechanism • Drop the multi VFD (but keep the split VFD) • Sanitize the metadata I/O layer in the library (source of most of our CVE issues) If resources are available: • Improved variable-length storage
  • 29. 29 HDF5 2.0 – Suggestions We're a community, so let's decide this together! Join the discussion on our forum: https://forum.hdfgroup.org/
  • 31. 31 Links to documentation While we work on release documentation, the latest RFC documents can be found in the hdf5doc repository on GitHub: Selection I/O https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/SelectionIO/selection_io_R FC_210610.docx Multi-Dataset I/O https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/MultiDataset/H5HPC_Multi Dset_RW_IO_RFC_v7_20220523.pdf Subfiling https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/VFD_Subfiling/RFC_VFD_ subfiling_200424.docx
  • 32. 32 Links to documentation While we work on release documentation, the latest RFC documents can be found in the hdf5doc repository on GitHub: Onion VFD https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/OnionVFD/Onion_VFD_RF C_v5.docx VFD SWMR https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/VFD_SWMR/VFD_SWMR _RFC_220519.pdf
  • 33. 33 Links to documentation HDF5 Youtube channel https://www.youtube.com/channel/UCRhtsIZquL3r-zH-R-r9-tQ Onion VFD demonstration https://www.youtube.com/watch?v=cfshkgr05mA&ab_channel=hdf5 VOL connector construction tutorial https://www.youtube.com/watch?v=7XEbm-__QuM&t=2s&ab_channel=hdf5
  • 34. THANK YOU! Questions & Comments? © 2022, The HDF Group.