It will cover features of the HDF5 library for achieving better I/O performance and more efficient storage. The following HDF5 features will be discussed: datatypes and partial I/O.
This tutorial is for users who are already familiar with HDF5 and wish to take advantage of some of its advanced features.
The document summarizes several newer antiepileptic drugs including levetiracetam, lacosamide, zonisamide, vigabatrin, rufinamide, and topiramate. It discusses their mechanisms of action, indications, dosages, advantages, side effects, drug interactions, and considerations for use in women. The newer AEDs have fewer drug interactions than older AEDs, unique mechanisms of action, and broader spectrum of activity against different seizure types.
Non-motor manifestations of Parkinson's disease (Osama Ragab)
This document discusses non-motor symptoms of Parkinson's disease. It begins by providing background on the original description of Parkinson's disease in 1817 and defines it as primarily affecting motor functions. However, it notes that nearly 90% of Parkinson's patients experience non-motor manifestations as well, including neuropsychiatric symptoms, sleep disorders, autonomic dysfunction, sensory symptoms, and other issues. The document then examines the roles of different neurotransmitter systems including dopamine, serotonin, norepinephrine, glutamate, and GABA in both motor and non-motor features of the disease.
Millie, a 3-month-old girl, was brought to the pediatrician with feeding difficulties, irritability, and muscle spasms. Tests found she had no GALC enzyme activity, and an MRI showed demyelination. She was diagnosed with Krabbe disease, a rare inherited disorder caused by mutations in the GALC gene that results in myelin loss and impaired nervous system development. There is currently no approved treatment, but research is exploring inhibitors of enzymes involved in the disease process.
This document summarizes male reproductive anatomy and function. It describes how the testes produce sperm and sex hormones, regulated by LH and FSH from the pituitary gland. Androgens produced by Leydig cells mediate male sexual development and function. Spermatogenesis requires FSH and androgens. The document also discusses evaluating Leydig and Sertoli cell function, puberty, causes of delayed puberty like constitutional delay or hypogonadism, and conditions like Klinefelter's and Turner's syndromes.
This document discusses calcium and phosphate metabolism. It provides information on:
1. Calcium and phosphate are important minerals that are required for bone formation and strength. Their absorption is regulated by hormones including vitamin D, parathyroid hormone, and calcitonin.
2. Vitamin D helps regulate calcium and phosphate levels by increasing their absorption in the intestine and reabsorption in the kidneys. Parathyroid hormone also increases calcium levels by stimulating vitamin D and calcium reabsorption. Calcitonin decreases calcium levels by acting on bones and kidneys.
3. Other factors like diet, pregnancy, and growth can also influence calcium and phosphate absorption in the body. A balanced ratio of calcium to phosphate is also important for proper bone formation.
This document summarizes iron metabolism. It discusses:
- The functions of iron as part of hemoglobin, myoglobin, cytochromes and iron-containing enzymes.
- How iron is absorbed in the small intestine and transported to tissues by transferrin. Iron is stored in ferritin and hemosiderin.
- Disorders of iron metabolism include iron deficiency, which can cause anemia, and iron overload disorders like hemochromatosis.
Mucopolysaccharidoses are hereditary lysosomal storage disorders caused by deficiencies in enzymes that break down glycosaminoglycans. This results in glycosaminoglycan accumulation in tissues and organs, leading to symptoms like facial abnormalities, short stature, corneal clouding, developmental delays, bone deformities, organ enlargement, and neurological issues. Diagnosis involves testing urine for glycosaminoglycan levels and measuring enzyme activity levels. Treatment focuses on managing symptoms through supportive care, physical therapy, surgeries, and enzyme replacement therapy or hematopoietic stem cell transplantation to reduce clinical effects.
The document is about metabolism. 1) Metabolism comprises all the biochemical processes of substance transformation that constitute the dynamics of life in an organism. 2) These metabolic processes involve biochemical transformations that occur in sequences of reactions. 3) A key example is glycolysis together with cellular respiration, which transform glucose into energy stored in ATP through a series of reactions.
This tutorial is designed for HDF5 users with some HDF5 experience.
It will cover advanced features of the HDF5 library for achieving better I/O performance and efficient storage. The following HDF5 features will be discussed: partial I/O, chunked storage layout, compression and other filters including new n-bit and scale+offset filters. Significant time will be devoted to the discussion of complex HDF5 datatypes such as strings, variable-length datatypes, array and compound datatypes.
This Tutorial gives a brief introduction to HDF5 for people who have never used it. It covers the HDF5 Data Model including HDF5 objects and their properties. It also briefly describes the HDF5 Programming Model and prepares participants for further self-study of HDF5 and hands-on sessions.
The document summarizes HDF5 advanced topics presented at an HDF5 workshop. It discusses HDF5 groups and links which organize data objects in a file. It also covers HDF5 datasets and datatypes like compound and reference datatypes. HDF5 references allow accessing specific regions of datasets or objects in other files. The document provides examples in Python to demonstrate HDF5 groups, links, datatypes and references.
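The groups, links, and references described above can be sketched in Python with h5py. This is an illustrative sketch, not code from the workshop; the file name and object names are hypothetical, and it assumes h5py is installed.

```python
import h5py
import numpy as np

# Hypothetical file and object names, chosen for illustration only.
with h5py.File("workshop_demo.h5", "w") as f:
    grp = f.create_group("experiment")                     # group organizes objects
    dset = grp.create_dataset("data", data=np.arange(10))  # dataset inside the group
    f["shortcut"] = h5py.SoftLink("/experiment/data")      # soft link to the dataset by path
    grp.attrs["data_ref"] = dset.ref                       # object reference stored in an attribute

with h5py.File("workshop_demo.h5", "r") as f:
    link_sum = int(f["shortcut"][:].sum())                 # read through the soft link
    ref_name = f[f["experiment"].attrs["data_ref"]].name   # dereference back to the dataset
print(link_sum, ref_name)
```

The soft link resolves by path, while the reference resolves to the object itself, which is what makes references usable for indexing into other parts of a file.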
This 2009 tutorial deck will cover basic HDF5 Data Model objects and their properties. It will include an overview of the HDF5 Libraries and APIs and describe the HDF5 programming model. Simple programming examples and the HDFView data browser will be used to illustrate HDF5 concepts and help you start developing your own HDF5-based applications.
This tutorial is for new HDF5 users.
This tutorial is designed for new HDF5 users. We will go over a brief history of HDF and HDF5 software, and will cover basic HDF5 Data Model objects and their properties; we will give an overview of the HDF5 Libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples, and Java tool HDFView will be used to illustrate HDF5 concepts.
This document provides an overview and outline of topics related to advanced features in HDF5, including:
- HDF5 supports various datatypes like atomic, compound, array, and variable-length datatypes. It allows creation of complex user-defined datatypes.
- Partial I/O in HDF5 allows reading and writing subsets of datasets using hyperslab selections, which describe subsets through properties like start point, stride, count, and block size.
- Chunking and compression can be used to improve performance and reduce storage needs when working with subsets of large HDF5 datasets.
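The hyperslab and chunking ideas listed above can be sketched with h5py, where Python slicing on a dataset is translated into hyperslab selections. A sketch under assumptions (hypothetical file name; h5py installed), not code from the document:

```python
import h5py
import numpy as np

with h5py.File("partial_io_demo.h5", "w") as f:
    # Chunked, gzip-compressed dataset: HDF5 can then read or write a
    # subset without touching the rest of the array.
    dset = f.create_dataset("matrix", shape=(100, 100), dtype="i4",
                            chunks=(10, 10), compression="gzip")
    # Partial write: the slice below becomes a hyperslab selection with
    # start=0, stride=2, count=50, block=1 along each axis.
    dset[::2, ::2] = 1

with h5py.File("partial_io_demo.h5", "r") as f:
    corner = f["matrix"][0:4, 0:4]   # partial read of a 4x4 corner
subset_sum = int(corner.sum())
print(subset_sum)
```

In the C API the same selection would be expressed explicitly through H5Sselect_hyperslab with its start, stride, count, and block arguments.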
This tutorial is designed for new HDF5 users. We will cover basic HDF5 Data Model objects and their properties, give an overview of the HDF5 Libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples will be used to illustrate HDF5 concepts.
The document provides an introduction to the HDF5 format and The HDF Group software. It discusses:
- HDF5 is a data model, library, and file format for managing large amounts of numerical data. It was developed starting in 1996 as an improved successor to the HDF4 format.
- The HDF5 data model defines objects like datasets, groups, and attributes that provide a flexible way to organize and describe stored data. Properties of these objects can be modified to control storage and performance.
- The HDF5 library provides interfaces to work with HDF5 files and objects from C, C++, Fortran, Java, and other languages. A set of command line utilities and the HDFView browser can be used to examine and edit HDF5 files.
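The data model objects named above (groups, datasets, attributes) can be shown in a minimal h5py sketch. The file and object names are hypothetical, and h5py is assumed to be installed:

```python
import h5py
import numpy as np

with h5py.File("intro_demo.h5", "w") as f:
    grp = f.create_group("results")                    # group: a folder-like container
    temps = grp.create_dataset("temperature",
                               data=np.array([20.5, 21.0, 19.8]))
    temps.attrs["units"] = "celsius"                   # attribute: metadata on the dataset

with h5py.File("intro_demo.h5", "r") as f:
    units = f["results/temperature"].attrs["units"]
    count = f["results/temperature"].shape[0]
print(units, count)
```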
This document provides an overview of HDF5 (Hierarchical Data Format version 5) and introduces its core concepts. HDF5 is an open source file format and software library designed for storing and managing large amounts of numerical data. It supports a data model with objects such as datasets, groups, attributes, and datatypes. HDF5 files can be accessed through its software library and APIs from languages like C, Fortran, C++, Python and more. The document covers HDF5's data model, file format, programming interfaces, tools and example code.
This Tutorial is designed for users who have exposure to MPI I/O and the basic concepts of HDF5 and would like to learn about the Parallel HDF5 library. The Tutorial will cover the Parallel HDF5 design and programming model. Several C and Fortran examples will be used to illustrate the basic ideas of the Parallel HDF5 programming model. Some performance issues, including collective chunked I/O, will be discussed. Participants will work with the Tutorial examples and exercises during the hands-on sessions.
This Tutorial is designed for HDF5 users with some HDF5 experience. It will cover properties of HDF5 objects that affect I/O performance and file sizes. The following HDF5 features will be discussed: partial I/O, chunking and compression, and complex HDF5 datatypes such as strings, variable-length arrays, and compound datatypes.
We will also discuss references to objects and dataset regions and how they can be used for indexing. Participants will work with the Tutorial examples and exercises during the hands-on sessions.
This Tutorial is designed for new HDF5 users. We will cover basic HDF5 Data Model objects and their properties; we will give an overview of the HDF5 Libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples will be used to illustrate HDF5 concepts. Participants will work with the Tutorial examples and exercises during the hands-on sessions.
This tutorial is designed for new HDF5 users. We will cover HDF5 abstractions such as datasets, groups, attributes, and datatypes. Simple C examples will cover the programming model and basic features of the API, and will give new users the knowledge they need to navigate through the rich collection of HDF5 interfaces. Participants will be guided through an interactive demonstration of the fundamentals of HDF5.
This tutorial is for new HDF5 users.
In this talk we will discuss what happens to data when it is written from the HDF5 application to an HDF5 file. This knowledge will help developers to write more efficient applications and to avoid performance bottlenecks.
In this talk we will examine how to tune HDF5 performance to improve I/O speed. The talk will focus on the chunk and metadata caches, how they affect performance, and which HDF5 APIs can be used for performance tuning.
Examples of different chunking strategies will be given. We will also discuss how to reduce file overhead by using special properties of the HDF5 groups, datasets and datatypes.
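The chunk-cache tuning mentioned above is exposed in the C API through H5Pset_cache / H5Pset_chunk_cache; in h5py the same knobs surface as `rdcc_*` keyword arguments to `h5py.File`. A hedged sketch (hypothetical file name and sizes; h5py assumed installed):

```python
import h5py
import numpy as np

# Enlarge the raw-data chunk cache so the chunks touched by one row of
# the dataset stay in memory instead of being re-read from disk.
f = h5py.File("cache_demo.h5", "w",
              rdcc_nbytes=32 * 1024 * 1024,  # 32 MiB chunk cache (default is 1 MiB)
              rdcc_nslots=10007,             # prime number of hash-table slots
              rdcc_w0=0.75)                  # preemption weight for fully-read chunks
dset = f.create_dataset("data", shape=(1000, 1000), dtype="f8",
                        chunks=(100, 100))
dset[0, :] = np.arange(1000)   # one row touches 10 chunks; all fit in the cache
row_sum = float(dset[0, :].sum())
f.close()
print(row_sum)
```

Sizing the cache to the access pattern (here, a row of chunks) is the design point: a cache smaller than the working set forces each chunk to be decompressed and evicted repeatedly.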
This tutorial is designed for users with some HDF5 experience. It will cover advanced features of the HDF5 library that can be used to achieve better I/O performance and more efficient storage. The following HDF5 features will be discussed: partial I/O; compression and other filters, including the new n-bit and scale-offset filters; and data storage options. Significant time will be devoted to the discussion of complex HDF5 datatypes such as strings, variable-length datatypes, array datatypes, and compound datatypes.
The document discusses various HDF5 tools that can be used for performance tuning and troubleshooting HDF5 files. It describes how h5stat can provide statistics about file size overhead and object properties to help optimize storage strategies. H5stat reports on file structure metadata like object headers and dataset datatypes. The document also shows how h5diff and h5ls can detect differences between files and locate raw data addresses. An example demonstrates using these tools to debug issues with reading files created on different platforms due to datatype size mismatches.
This document provides an overview and introduction to the HDF5 data model, programming model, and library APIs. The morning session will include a lecture on HDF5 introduction, programming model, and library APIs. The afternoon will have hands-on sessions on introduction to HDF5 files, groups, datasets and attributes as well as advanced topics like hyperslab selections, compound datatypes, and parallel HDF5. The goals are to introduce HDF5, provide knowledge on how data is organized and used by applications, and provide examples of reading and writing HDF5 files. The afternoon aims to help users start working with the HDF5 library on the NCSA machines by running examples and creating their own programs.
In this Tutorial we will discuss different storage methods for HDF5 files (split files, family of files, multi-files) and datasets (compressed, external, compact), and related filters and properties. This tutorial will introduce advanced features of HDF5, including:
o Property lists
o Compound datatypes
o Hyperslab selections
o Point selections
o References to objects and regions
o Extendable datasets
o Mounting files
o Group iterations
This document discusses how to optimize HDF5 files for efficient access in cloud object stores. Key optimizations include using large dataset chunk sizes of 1-4 MiB, consolidating internal file metadata, and minimizing variable-length datatypes. The document recommends creating files with paged aggregation and storing file content information in the user block to enable fast discovery of file contents when stored in object stores.
This document provides an overview of HSDS (Highly Scalable Data Service), which is a REST-based service that allows accessing HDF5 data stored in the cloud. It discusses how HSDS maps HDF5 objects like datasets and groups to individual cloud storage objects to optimize performance. The document also describes how HSDS was used to improve access performance for NASA ICESat-2 HDF5 data on AWS S3 by hyper-chunking datasets into larger chunks spanning multiple original HDF5 chunks. Benchmark results showed that accessing the data through HSDS provided over 2x faster performance than other methods like ROS3 or S3FS that directly access the cloud storage.
Similar to HDF5 Advanced Topics - Datatypes and Partial I/O
This document summarizes the current status and focus of the HDF Group, a non-profit organization based in Champaign, IL, that develops and maintains HDF software and data formats. It provides an overview of recent HDF5, HDF4, and HDFView releases and notes areas of focus: software quality improvements, increased transparency, strengthening the community, and modernizing HDF products. It invites support and participation in upcoming user group meetings.
This document provides an overview of HSDS (HDF Server and Data Service), which allows HDF5 files to be stored and accessed from the cloud. Key points include:
- HSDS maps HDF5 objects like datasets and groups to individual cloud storage objects for scalability and parallelism.
- Features include streaming support, fancy indexing for complex queries, and caching for improved performance.
- HSDS can be deployed on Docker, Kubernetes, or AWS Lambda depending on needs.
- Case studies show HSDS is used by organizations like NREL and NSF to make petabytes of scientific data publicly accessible in the cloud.
This document discusses creating cloud-optimized HDF5 files by rearranging internal structures for more efficient data access in cloud object stores. It describes cloud-native and cloud-optimized storage formats, with the latter involving storing the entire HDF5 file as a single object. The benefits of cloud-optimized HDF5 include fast scanning and using the HDF5 library. Key aspects covered include using optimal chunk sizes, compression, and minimizing variable-length datatypes.
This document discusses updates and performance improvements to the HDF5 OPeNDAP data handler. It provides a history of the handler since 2001 and describes recent updates including supporting DAP4, new data types, and NetCDF data models. A performance study showed that passing compressed HDF5 data through the handler without decompressing/recompressing led to speedups of around 17-30x by leveraging HDF5 direct I/O APIs. This allows outputting HDF5 files as NetCDF files much faster through the handler.
This document provides instructions for using the Hyrax software to serve scientific data files stored on Amazon S3 using the OPeNDAP data access protocol. It describes how to generate ancillary metadata files called DMR++ files using the get_dmrpp tool that provide information about the data file structure and locations. The document explains how to run get_dmrpp inside a Docker container to process data files on S3 and generate customized DMR++ files that the Hyrax server can use to serve the files to clients.
This document provides an overview and examples of accessing cloud data and services using the Earthdata Login (EDL), Pydap, and MATLAB. It discusses some common problems users encounter, such as being unable to access HDF5 data on AWS S3 using MATLAB or read data from OPeNDAP servers using Pydap. Solutions presented include using EDL to get temporary AWS tokens for S3 access in MATLAB and providing code examples on the HDFEOS website to help users access S3 data and OPeNDAP services. The document also notes some limitations, such as tokens being valid for only 1 hour, and workarounds like requesting new tokens or using the MATLAB HDF5 API instead of the netCDF API.
The HDF5 Roadmap and New Features document outlines upcoming changes and improvements to the HDF5 library. Key points include:
- HDF5 1.13.x releases will include new features like selection I/O, the Onion VFD for versioned files, improved VFD SWMR for single-writer multiple-reader access, and subfiling for parallel I/O.
- The Virtual Object Layer allows customizing HDF5 object storage and introduces terminal and pass-through connectors.
- The Onion VFD stores versions of HDF5 files in a separate onion file for versioned access.
- VFD SWMR improves on legacy SWMR by implementing single-writer/multiple-reader capabilities at the virtual file driver level.
This document discusses user analysis of the HDFEOS.org website and plans for future improvements. It finds that the majority of the site's 100 daily users are "quiet", not posting on forums or other interactive elements. The main user types are locators, who search for examples or data; mergers, who combine or mosaic datasets; and converters, who change file formats. The document outlines recent updates focused on these user types, like adding Python examples for subsetting and calculating latitude and longitude. It proposes future work on artificial intelligence/machine learning uses of HDF files and examples for processing HDF data in the cloud.
This document summarizes a presentation about the current status and future directions of the Hierarchical Data Format (HDF) software. It provides updates on recent HDF5 releases, development efforts including new compression methods and ways to access HDF5 data, and outreach resources. It concludes by inviting the audience to share wishes for future HDF development.
The document describes H5Coro, a new C++ library for reading HDF5 files from cloud storage. H5Coro was created to optimize HDF5 reading for cloud environments by minimizing I/O operations through caching and efficient HTTP requests. Performance tests showed H5Coro was 77-132x faster than the previous HDF5 library at reading HDF5 data from Amazon S3 for NASA's SlideRule project. H5Coro supports common HDF5 elements but does not support writing or some complex HDF5 data types and messages to focus on optimized read-only performance for time series data stored sequentially in memory.
This document summarizes MathWorks' work to modernize MATLAB's support for HDF5. Key points include:
1) MATLAB now supports HDF5 1.10.7 features like single-writer/multiple-reader access and virtual datasets through new and updated low-level functions.
2) Performance benchmarks show some improvements but also regressions compared to the previous HDF5 version, and work continues to optimize code and support future versions.
3) There are compatibility considerations for Linux filter plugins, but interim solutions are provided until MathWorks can ship a single HDF5 version.
HSDS provides HDF as a service through a REST API that can scale across nodes. New releases will enable serverless operation using AWS Lambda or direct client access without a server. This allows HDF data to be accessed remotely without managing servers. HSDS stores each HDF object separately, making it compatible with cloud object storage. Performance on AWS Lambda is slower than a dedicated server but has no management overhead. Direct client access has better performance but limits collaboration between clients.
HDF5 and Zarr are data formats that can be used to store and access scientific data. This presentation discusses approaches to translating between the two formats. It describes how HDF5 files were translated to the Zarr format by creating a separate Zarr store to hold HDF5 file chunks, and storing chunk location metadata. It also discusses an implementation that translates Zarr data to the HDF5 format by using a special chunking layout and storing chunk information in an HDF5 compound dataset. Limitations of the translations include lack of support for some HDF5 dataset properties in Zarr, and lack of support for some Zarr compression methods in the HDF5 implementation.
The document discusses HDF for the cloud, including new features of the HDF Server and what's next. Key points:
- HDF Server uses a "sharded schema" that maps HDF5 objects to individual storage objects, allowing parallel access and updates without transferring entire files.
- Implementations include HSDS software that uses the sharded schema with an API and SDKs for different languages like h5pyd for Python.
- New features of HSDS 0.6 include support for POSIX, Azure, AWS Lambda, and role-based access control.
- Future work includes direct access to storage without a server intermediary for some use cases.
This document compares different methods for accessing HDF and netCDF files stored on Amazon S3, including Apache Drill, THREDDS Data Server (TDS), and HDF5 Virtual File Driver (VFD). A benchmark test of accessing a 24GB HDF5/netCDF-4 file on S3 from Amazon EC2 found that TDS performed the best, responding within 2 minutes, while Apache Drill failed after 7 minutes. The document concludes that TDS 5.0 is the clear winner based on performance and support for role-based access control and HDF4 files, but the best solution depends on use case and software.
This document discusses STARE-PODS, a proposal to NASA/ACCESS-19 to develop a scalable data store for earth science data using the SpatioTemporal Adaptive Resolution Encoding (STARE) indexing scheme. STARE allows diverse earth science data to be unified and indexed, enabling the data to be partitioned and stored in a Parallel Optimized Data Store (PODS) for efficient analysis. The HDF Virtual Object Layer and Virtual Data Set technologies can then provide interfaces to access the data in STARE-PODS in a familiar way. The goal is for STARE-PODS to organize diverse data for alignment and parallel/distributed storage and processing to enable integrative analysis at scale.
This document provides an overview and update on HDF5 and its ecosystem. Key points include:
- HDF5 1.12.0 was recently released with new features like the Virtual Object Layer and external references.
- The HDF5 library now supports accessing data in the cloud using connectors like S3 VFD and REST VOL without needing to modify applications.
- Projects like HDFql and H5CPP provide additional interfaces for querying and working with HDF5 files from languages like SQL, C++, and Python.
- The HDF5 community is moving development to GitHub and improving documentation resources on the HDF wiki site.
This document summarizes new features in HDF5 1.12.0, including support for storing references to objects and attributes across files, new storage backends using a virtual object layer (VOL), and virtual file drivers (VFDs) for Amazon S3 and HDFS. It outlines the HDF5 roadmap for 2019-2022, which includes continued support for HDF5 1.8 and 1.10, and new features in future 1.12.x releases like querying, indexing, and provenance tracking.
1. The HDF Group
HDF5 Advanced Topics
Elena Pourmal
The HDF Group
The 13th HDF and HDF-EOS Workshop
November 3-5, 2009
November 3-5, 2009
HDF/HDF-EOS Workshop XIII
1
www.hdfgroup.org
3. The HDF Group
HDF5 Datatypes
Overview
4. HDF5 Datatypes
• An HDF5 datatype is a required description of a data element:
• the set of possible values it can have
• Example: enumeration datatype
• the operations that can be performed
• Example: type conversion cannot be performed on an opaque datatype
• how the values of that type are stored
• Example: values of variable-length type are stored in a heap in the file
• Stored in the file along with the data it describes
• Not to be confused with the C, C++, Java, and Fortran types
5. HDF5 Datatypes Examples
• We provide examples of how to create, write, and read data of different types:
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/api18-c.html
7. HDF5 Datatypes
• When are HDF5 datatypes used?
• To describe application data in a file
• H5Dcreate, H5Acreate calls
• Example:
• A C application stores integer data as a 32-bit little-endian signed two's complement integer; it uses H5T_STD_I32LE with the H5Dcreate call
• A C application stores double-precision data "as is" in the application memory; it uses H5T_NATIVE_DOUBLE with the H5Acreate call
• A Fortran application stores an array of real numbers in 64-bit big-endian IEEE format; it uses H5T_IEEE_F64BE with the h5dcreate_f call; the HDF5 library will perform all necessary conversions
8. HDF5 Datatypes
• When are HDF5 datatypes used?
• To describe application data in memory
• Data buffer to be written from or read into with H5Dwrite/H5Dread and H5Awrite/H5Aread calls
• Example:
• A C application reads data from the file and stores it in an integer buffer; it uses H5T_NATIVE_INT to describe the buffer.
• A Fortran application reads floating-point data from the file and stores it in an integer buffer; it uses H5T_NATIVE_INTEGER to describe the buffer.
• The HDF5 library performs datatype conversion; overflow/underflow may occur.
9. Example
• C array of integers on a Linux platform: native integer (H5T_NATIVE_INT) is little-endian, 4 bytes
• Fortran array of integers on an AIX platform: native integer (H5T_NATIVE_INTEGER) is big-endian, 8 bytes
[Diagram: the C application writes the array to an HDF5 file with H5Dwrite using H5T_NATIVE_INT (no conversion); the Fortran application reads it with H5Dread using H5T_NATIVE_INTEGER (conversion performed). The file stores the dataset as H5T_STD_I32LE.]
• Data is stored as little-endian and converted to big-endian on read
11. Example: Writing/reading an Array to HDF5 file
• Calls you've already seen in the Intro Tutorial
• H5LTmake_dataset
• H5Dwrite, H5Dread
• APIs to handle a specific C data type
• H5LTmake_dataset_<*>
• <*> is one of "char", "short", "int", "long", "float", "double", "string"
• The whole data array is written (no subsetting)
• Data is stored in a file as it is in memory
• H5LTread_dataset, H5LTread_dataset_<*>
12. Example: Read data into array of longs
#include "hdf5.h"
#include "hdf5_hl.h"
int main( void ){
long *data;
…
/* Open file from ex_lite1.c */
file_id = H5Fopen("ex_lite1.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
/* Get information about dimensions to allocate memory buffer */
status = H5LTget_dataset_ndims(file_id, "/dset", &rank);
status = H5LTget_dataset_info(file_id, "/dset", dims, &dt_class, &dt_size);
/* Allocate buffer to read data in */
data = (long*)malloc(…
/* Read dataset */
status = H5LTread_dataset_long(file_id, "/dset", data);
13. Example: Read data into array of longs
#include "hdf5.h"
…
long *rdata;
…
/* Open file and dataset. */
file_id = H5Fopen("ex_lite1.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
dset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
/* Get information about dimensions to allocate memory buffer */
space = H5Dget_space(dset_id);
rank = H5Sget_simple_extent_dims(space, dims, NULL);
status = H5Dread(dset_id, H5T_NATIVE_LONG, H5S_ALL, H5S_ALL, H5P_DEFAULT, rdata);
14. Basic Atomic HDF5 Datatypes
15. Basic Atomic Datatypes
• Integers & floats
• Strings (fixed and variable size)
• Pointers - references to objects and dataset regions
• Bitfield
• Opaque
16. HDF5 Predefined Datatypes
• HDF5 Library provides predefined datatypes (symbols) for all basic atomic datatypes except the opaque datatype
• H5T_<arch>_<base>
• Examples:
• H5T_IEEE_F64LE
• H5T_STD_I32BE
• H5T_C_S1, H5T_FORTRAN_S1
• H5T_STD_B32LE
• H5T_STD_REF_OBJ, H5T_STD_REF_DSETREG
• H5T_NATIVE_INT
• Predefined datatypes do not have constant values; they are initialized when the library is initialized
19. HDF5 integer datatype
• HDF5 supports 1, 2, 4, and 8 byte signed and unsigned integers in memory and in the file
• Support differs by language
• C language
• All C integer types, including C99 extended integer types (when available)
• Examples:
• H5T_NATIVE_INT16 for int16_t
• H5T_NATIVE_INT_LEAST64 for int_least64_t
• H5T_NATIVE_UINT_FAST16 for uint_fast16_t
20. HDF5 integer datatype
• Fortran language
• In memory, supports only Fortran "integer"
• Example:
• H5T_NATIVE_INTEGER for integer
• In the file, supports all HDF5 integer types
• Example: a one-byte integer has to be represented by an integer in memory; it can be stored as a one-byte integer by creating a dataset with an appropriate datatype (H5T_STD_I8LE)
• The next major release of HDF5 will support any kind of Fortran integer
21. HDF5 floating-point datatype
• HDF5 supports 32- and 64-bit IEEE floating-point types, big-endian and little-endian, in memory and in the file
• Support differs by language
• C language
• H5T_IEEE_F64BE and H5T_IEEE_F32LE
• H5T_NATIVE_FLOAT
• H5T_NATIVE_DOUBLE
• H5T_NATIVE_LDOUBLE
22. HDF5 floating-point datatype
• Fortran language
• In memory, supports only Fortran "real" and "double precision" (obsolete)
• Examples:
• H5T_NATIVE_REAL for real
• H5T_NATIVE_DOUBLE for double precision
• In the file, supports all HDF5 floating-point types
• The next major release of HDF5 will support any kind of Fortran real
23. HDF5 string datatype
• HDF5 strings are characterized by
• The way each element of a string type is stored in a file
• NULL terminated (C-type string)
char mystring[] = "Once upon a time";
HDF5 stores <Once upon a time\0>
• Space padded (Fortran string)
character(len=16) :: mystring = 'Once upon a time'
HDF5 stores <Once upon a time> and adds spaces if required
• The sizes of elements in the same dataset or attribute
• Fixed-length string
• Variable-length string
24. Example: Creating fixed-length string
• C Example: "Once upon a time" has 16 characters
string_id = H5Tcopy(H5T_C_S1);
H5Tset_size(string_id, size);
• The size value has to accommodate the "\0" terminator, i.e., size=17 for the "Once upon a time" string
• Overhead for short strings, e.g., "Once" will have 13 extra bytes allocated for storage
• Compresses well
25. Example: Creating variable-length string
• C example:
string_id = H5Tcopy(H5T_C_S1);
H5Tset_size(string_id, H5T_VARIABLE);
• Overhead to store and access data
• Cannot be compressed (may be in the future)
26. Reference Datatype
• Reference to an HDF5 object
• Pointer to a group or a dataset in a file
• Predefined datatype H5T_STD_REF_OBJ describes object references
28. Reference to Object
h5dump -d "/OBJECT_REFERENCES" ref_obj.h5
DATASET "OBJECT_REFERENCES" {
DATATYPE H5T_REFERENCE
DATASPACE SIMPLE { ( 4 ) / ( 4 ) }
DATA {
(0): GROUP 808 /GROUP1 , GROUP 1848 /GROUP1/GROUP2 ,
(2): DATASET 2808 /INTEGERS , DATATYPE 3352 /MYTYPE
}
29. Reference to Object
• Create a reference to a group object
H5Rcreate(&ref[1], fileid, "/GROUP1/GROUP2", H5R_OBJECT, -1);
• Write references to a dataset
H5Dwrite(dsetr_id, H5T_STD_REF_OBJ, H5S_ALL, H5S_ALL, H5P_DEFAULT, ref);
• Read references back with H5Dread and find the object a reference points to
type_id = H5Rdereference(dsetr_id, H5R_OBJECT, &ref_out[3]);
name_size = H5Rget_name(dsetr_id, H5R_OBJECT, &ref_out[3], (char*)buf, 10);
30. Saving Selected Region in a File
Need to select and access the same
elements of a dataset
31. Reference Datatype
• Reference to a dataset region (or to a selection)
• Pointer to the dataspace selection
• Predefined datatype H5T_STD_REF_DSETREG describes regions
32. Reference to Dataset Region
[Diagram: file REF_REG.h5 with a root group containing a "Matrix" dataset and a "Region References" dataset; the matrix rows shown are 1 1 2 3 3 4 5 5 6 and 1 2 2 3 4 4 5 6 6.]
33. Reference to Dataset Region
Example
dsetr_id = H5Dcreate(file_id, "REGION_REFERENCES", H5T_STD_REF_DSETREG, …);
H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, …);
H5Rcreate(&ref[0], file_id, "MATRIX", H5R_DATASET_REGION, space_id);
H5Dwrite(dsetr_id, H5T_STD_REF_DSETREG, H5S_ALL, H5S_ALL, H5P_DEFAULT, ref);
34. Reference to Dataset Region
HDF5 "REF_REG.h5" {
GROUP "/" {
DATASET "MATRIX" {
……
}
DATASET "REGION_REFERENCES" {
DATATYPE H5T_REFERENCE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): DATASET /MATRIX {(0,3)-(1,5)},
(1): DATASET /MATRIX {(0,0), (1,6), (0,8)}
}
}
}
}
35. The HDF Group
Part II
Working with subsets
36. Collect data one way ….
Array of images (3D)
37. Display data another way …
Stitched image (2D array)
38. Data is too big to read….
39. Refer to a region…
Need to select and access the same
elements of a dataset
40. HDF5 Library Features
• HDF5 Library provides capabilities to
• Describe subsets of data and perform write/read operations on subsets
• Hyperslab selections and partial I/O
• Store descriptions of the data subsets in a file
• Object references
• Region references
• Use efficient storage mechanisms to achieve good performance while writing/reading subsets of data
• Chunking, compression
41. The HDF Group
Partial I/O in HDF5
42. How to Describe a Subset in HDF5?
• Before writing and reading a subset of data, one has to describe it to the HDF5 Library.
• HDF5 APIs and documentation refer to a subset as a "selection" or "hyperslab selection".
• If specified, the HDF5 Library will perform I/O on the selection only and not on all elements of a dataset.
43. Types of Selections in HDF5
• Two types of selections
• Hyperslab selection
• Regular hyperslab
• Simple hyperslab
• Result of set operations on hyperslabs (union, difference, …)
• Point selection
• Hyperslab selection is especially important for doing parallel I/O in HDF5 (see the Parallel HDF5 Tutorial)
44. Regular Hyperslab
Collection of regularly spaced blocks of equal size
46. Hyperslab Selection
Result of union operation on three simple hyperslabs
47. Hyperslab Description
• Start - starting location of a hyperslab (1,1)
• Stride - number of elements that separate each block (3,2)
• Count - number of blocks (2,6)
• Block - block size (2,1)
• Everything is "measured" in number of elements
48. Simple Hyperslab Description
• Two ways to describe a simple hyperslab
• As several blocks
• Stride – (1,1)
• Count – (2,6)
• Block – (2,1)
• As one block
• Stride – (1,1)
• Count – (1,1)
• Block – (4,6)
There is no performance penalty for choosing one description over the other
49. H5Sselect_hyperslab Function
• space_id – identifier of the dataspace
• op – selection operator: H5S_SELECT_SET or H5S_SELECT_OR
• start – array with the starting coordinates of the hyperslab
• stride – array specifying which positions along a dimension to select
• count – array specifying how many blocks to select from the dataspace, in each dimension
• block – array specifying the size of the element block (NULL indicates a block size of a single element in a dimension)
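The NULL-default behavior of the stride and block parameters can be modeled in a few lines. This is a sketch of the parameter semantics only, not a call into the library; `normalize` is a hypothetical helper.

```python
def normalize(start, stride, count, block):
    """Model the NULL-default rule of H5Sselect_hyperslab: a NULL (None)
    stride or block is treated as 1 in every dimension."""
    rank = len(start)
    if stride is None:
        stride = (1,) * rank
    if block is None:
        block = (1,) * rank
    return start, stride, count, block

# With block=NULL each "block" is a single element, so count alone
# determines how many elements are picked per dimension.
start, stride, count, block = normalize((0, 0), None, (2, 3), None)
print(stride, block)   # → (1, 1) (1, 1)
n_selected = count[0] * block[0] * count[1] * block[1]
print(n_selected)      # → 6
```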
50. Reading/Writing Selections
Programming model for reading from a dataset in a file:
1. Open a dataset.
2. Get the file dataspace handle of the dataset and specify the subset to read from.
a. H5Dget_space returns the file dataspace handle; the file dataspace describes the array stored in the file (number of dimensions and their sizes).
b. H5Sselect_hyperslab selects the elements of the array that participate in the I/O operation.
3. Allocate a data buffer of an appropriate shape and size.
51. Reading/Writing Selections
Programming model (continued):
4. Create a memory dataspace and specify the subset to write to.
a. The memory dataspace describes the data buffer (its rank and dimension sizes).
b. Use the H5Screate_simple function to create the memory dataspace.
c. Use H5Sselect_hyperslab to select the elements of the data buffer that participate in the I/O operation.
5. Issue H5Dread or H5Dwrite to move the data between the file and the memory buffer.
6. Close the file dataspace and memory dataspace when done.
52. Example: Reading Two Rows
[Figure: the data in the file is a 4x6 matrix holding the values 1–24; the buffer in memory is a 1-dim array of length 14, initialized to -1. Two selected rows of the matrix (12 elements) are read into the buffer; the remaining buffer elements keep the value -1.]
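The effect of the two-rows example can be simulated in plain Python (a sketch of the data movement only; a real program would use the H5Dget_space / H5Sselect_hyperslab / H5Dread sequence from the previous slides, and the choice of rows here is hypothetical):

```python
# The file holds a 4x6 matrix with the values 1..24 in row-major order.
file_data = [[r * 6 + c + 1 for c in range(6)] for r in range(4)]

rows_to_read = (1, 2)    # hypothetical: read the second and third rows
buffer = [-1] * 14       # memory buffer: 1-dim array of length 14, pre-filled with -1

# Memory "selection": the first 12 buffer elements receive the file data;
# the remaining two keep their initial value.
i = 0
for r in rows_to_read:
    for c in range(6):
        buffer[i] = file_data[r][c]
        i += 1

print(buffer)  # → [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, -1, -1]
```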
56. Things to Remember
• Number of elements selected in a file and in a
memory buffer must be the same
• H5Sget_select_npoints returns number of
selected elements in a hyperslab selection
• HDF5 partial I/O is tuned to move data between
selections that have the same dimensionality;
avoid choosing subsets that have different ranks
(as in example above)
• Allocate a buffer of an appropriate size when
reading data; use H5Tget_native_type and
H5Tget_size to get the correct size of the data
element in memory.
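The first rule above — equal element counts in the file and memory selections — can be checked before issuing I/O, which is what `H5Sget_select_npoints` is for. Below is a pure-Python model of that count for a single hyperslab (the `npoints` helper is hypothetical, not the library call):

```python
def npoints(count, block):
    """Model H5Sget_select_npoints for a single hyperslab: the number of
    selected elements is the product of count*block over all dimensions."""
    n = 1
    for c, b in zip(count, block):
        n *= c * b
    return n

# File selection: 2 rows x 6 columns of a 2-D dataset.
file_npoints = npoints(count=(2, 6), block=(1, 1))
# Memory selection: the first 12 elements of a 1-D buffer.
mem_npoints = npoints(count=(12,), block=(1,))

# The I/O request is valid only when the two selections have the same
# number of elements (even though their ranks differ here).
print(file_npoints == mem_npoints)  # → True
```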
57. The HDF Group
Thank You!
58. Acknowledgements
This work was supported by cooperative agreement
number NNX08AO77A from the National
Aeronautics and Space Administration (NASA).
Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the author[s] and do not necessarily reflect
the views of the National Aeronautics and Space
Administration.
Dumping a dataset with references may be slow, since the library has to dereference each element; this is on our to-do list.
H5Sselect_hyperslab:
1: dataspace identifier
2: selection operator that determines how the new selection is combined with the already existing selection for the dataspace. Currently only the H5S_SELECT_SET operator is supported, which replaces the existing selection with the parameters from this call. Overlapping blocks are not supported with H5S_SELECT_SET.
3: start - starting coordinates; 0 means the beginning
4: stride - how many elements to move in each dimension. A stride of zero is not supported; NULL is equivalent to 1.
5: count - determines how many blocks to select in each dimension
6: block - determines the size of the element block; if NULL, it defaults to a single element in each dimension.
H5Sselect_elements:
1: dataspace identifier
2: H5S_SELECT_SET (see above)
3: num_elements - number of elements selected
4: coord - 2-D array of size (dataspace rank) by (number of elements). The order of the element coordinates in the coord array also specifies the order in which the array elements are iterated through when I/O is performed.