Dimension Scales for HDF-EOS2 and HDF-EOS5 field dimensions were added to the new release of HDF-EOS. The new APIs will be presented and sample outputs will be shown. The need for new APIs to handle Dimension Scales will also be discussed.
This document provides examples of adding Climate and Forecast (CF) metadata attributes to HDF5 files using different programming languages and interfaces. It summarizes how to add CF attributes like units, long_name, and coordinates to datasets in HDF5 files using C, Fortran, Python, netCDF4, HDF-EOS5, and HDFView. It also briefly mentions using HDF5 dimension scales to associate coordinate variables with datasets.
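As a rough illustration of the C route that summary mentions, the short sketch below (the file and dataset names are hypothetical) attaches CF-style "units" and "long_name" string attributes to an existing dataset using the HDF5 Lite helper H5LTset_attribute_string:

#include "hdf5.h"
#include "hdf5_hl.h"

int main(void)
{
    /* Open an existing file read/write; "example.h5" and "/Temperature" are hypothetical */
    hid_t fid = H5Fopen("example.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* CF-style attributes on the dataset */
    H5LTset_attribute_string(fid, "/Temperature", "units", "K");
    H5LTset_attribute_string(fid, "/Temperature", "long_name", "Brightness Temperature");

    H5Fclose(fid);
    return 0;
}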
The NPOESS program uses the Unified Modeling Language (UML) to describe the format of the HDF5 files produced. For each unique type of data product, the HDF5 storage organization and the means to retrieve the data are the same. This provides a consistent data retrieval interface for manual and automated users of the data; without it, custom development and cumbersome maintenance would be required. The data formats are described using UML to provide a profile of HDF5 files.
This poster will show each unique data type so far produced by NPOESS, and the contents of the files. We will also have overhead snapshots of the data contents.
This document provides an introduction to exploring and visualizing data using the R programming language. It discusses the history and development of R, introduces key R packages like tidyverse and ggplot2 for data analysis and visualization, and provides examples of reading data, examining data structures, and creating basic plots and histograms. It also demonstrates more advanced ggplot2 concepts like faceting, mapping variables to aesthetics, using different geoms, and combining multiple geoms in a single plot.
This tutorial is designed for HDF5 users with some HDF5 experience. It will cover advanced features of the HDF5 library for achieving better I/O performance and efficient storage. The following HDF5 features will be discussed: partial I/O, compression and other filters including the new n-bit and scale+offset filters, and data storage options. Significant time will be devoted to the discussion of complex HDF5 datatypes such as strings, variable-length, array, and compound datatypes. Participants will work with the tutorial examples and exercises during the hands-on sessions.
Midway in our life's journey, I went astray from the straight imperative road and woke to find myself alone in a dark declarative wood.
My guide out of this dark declarative wood was a familiar friend, SQL, who showed me the way to wrap a context of a window to push through using Window Functions to escape the Inferno.
Next I found myself somewhere in between, running uphill with one foot in front of the other so that the leading foot was always above the ground. Running with my friend LINQ, I was able to wrap the context of a collection around my data and advance my journey through Purgatorio.
My last guide, into the blinding brilliant light of Paradiso, was from the Dutch Caribbean, who taught me how to wrap my computations into a context and move my data through it, leading me into brilliant bliss.
Join me on my divine data comedy.
This document provides an overview of the visualization lifecycle process, including assessing data, parsing, cleaning, and visualizing data. It discusses exploring data, parsing and normalizing data, data cleansing techniques, feature selection, and choosing appropriate visualization tools and libraries. Key steps include parsing raw data into a structured format, filtering and aggregating data, loading data into databases, and iterating on visual transformations to create effective visualizations. A variety of open source tools and JavaScript libraries for data visualization are also presented.
AfterGlow is a script that assists with the visualization of log data. It reads CSV files and converts them into a graph description. Check out http://afterglow.sf.net for more information.
This short presentation gives an overview of AfterGlow and outlines the features and capabilities of the tool. It discusses some of the harder to understand features by showing some configuration examples that can be used as a starting point for some more sophisticated setups.
AfterGlow is one of the most downloaded security visualization tools, with over 17,000 downloads.
This document discusses the implementation of HDF5 and HDF-EOS5 data formats in the NCAR Command Language (NCL). It describes how NCL handles HDF5 files differently than other file formats by reading data on demand and representing the hierarchical structure of HDF5 groups, datasets, attributes and links. It provides sample NCL scripts to read HDF5 data files, print out file information and variable summaries, and access HDF5 data variables.
This document discusses interoperability between HDF5 files and the netCDF-4 format. It begins with background on netCDF-3, netCDF-4, and the Climate and Forecast (CF) metadata conventions. It then demonstrates different use cases for accessing HDF5 data via netCDF-4, including when the HDF5 file follows the netCDF data model and CF conventions compared to when it does not. The document shares experiences working with HDF-EOS5 and JPSS data products in HDF5 format through this netCDF-4 interface. In particular, it finds that following the netCDF data model and CF conventions improves visualization of HDF5 data in tools like IDV that expect netCDF files.
An HDF-EOS file contains one or more HDF-EOS data objects along with metadata. It provides geolocation information coupled to data, standardized subsetting by geography or time, and ECS metadata for archiving and distribution. The difference between HDF and HDF-EOS is at the object level within a file, not the file type itself. A file can contain both HDF and HDF-EOS objects.
The NCAR Command Language (NCL) is an interpreted language designed for scientific data analysis and visualization with high-quality graphics, especially for atmospheric science. NCL has supported NetCDF 3/4, GRIB 1/2, HDF-SDS, HDF_EOS, shapefiles, binary, and ASCII files for years. HDF-EOS5 support is now in the released version, and HDF5 support is in the beta-test stage.
The NCL team is now developing NCL to write HDF5 files and to read HDF-EOS5 data with OPeNDAP.
The NCL team will share their experience visualizing and analyzing HDF-EOS5 and HDF5 data.
The document provides an overview of the HDF-EOS5 file format including:
- HDF-EOS5 files contain coremetadata, archivemetadata, and StructMetadata global attributes that provide information on the file structure and contents.
- Files can contain Grid, Swath, Point, Zonal Average, and Profile data structures with no size limits.
- Swath data is organized by time or track with irregular spacing, storing geolocation, time, and data in arrays. Grid data is organized by regular geographic spacing specified by projection parameters with data and geolocation information separated. Point data specifies locations but with no organization of the data.
This document discusses two HDF5-based file formats for storing Earth observation data:
1. The Sorted Pulse Data (SPD) format stores laser scanning data including pulses and point data with attributes. It was created in 2008 and updated to version 4 to improve flexibility.
2. The KEA image file format implements the GDAL raster data model in HDF5, allowing large raster datasets and attribute tables to be stored together with compression. It was created in 2012 to address limitations of other formats.
Both formats take advantage of HDF5 features like compression; the document also discusses some limitations and lessons learned for effectively designing scientific data formats.
This document provides an introduction to HDF5 for users familiar with HDF4 by outlining some key principles and examples for common HDF4 objects in HDF5. It discusses how SDS, Vgroups, Vdatas (tables), GR images and palettes, attributes, and data types in HDF4 map to HDF5 datasets, groups, compound datatypes, datasets with attributes, and HDF5's more extensive datatype support. Examples are given for each object type to illustrate how to accomplish common HDF4 tasks in HDF5. Resources for additional help with the HDF5 format and its high level APIs are also listed.
This tutorial is designed for HDF5 users with some HDF5 experience.
It will cover advanced features of the HDF5 library for achieving better I/O performance and efficient storage. The following HDF5 features will be discussed: partial I/O, chunked storage layout, compression and other filters including new n-bit and scale+offset filters. Significant time will be devoted to the discussion of complex HDF5 datatypes such as strings, variable-length datatypes, array and compound datatypes.
This document discusses using Python with the H5py module to interact with HDF5 files. Some key points made include:
- H5py allows HDF5 files to be manipulated as if they were Python dictionaries, with dataset names as keys and arrays as values.
- NumPy provides array manipulation capabilities to work with the dataset values retrieved from HDF5 files.
- Examples demonstrate reading and writing HDF5 datasets, comparing contents of datasets between files, and recursively listing contents of an HDF5 file.
- Using Python with H5py is more concise than other languages like C/Fortran, reducing development time and potential for errors.
HDF5 is a powerful and feature-rich creature, and getting the most out of it requires powerful tools. The MathWorks provides a "low-level" interface to the HDF5 library that closely corresponds to the C API and exposes much of its richness. This short tutorial will present ways to use the low-level MATLAB interface to build those tools and tackle such topics as subsetting, chunking, and compression.
This document provides an overview of parallel HDF5 and performance tuning in the HDF5 library. It discusses the design and implementation of parallel HDF5 (PHDF5), including requirements, layers, programming restrictions, examples of APIs, and support for creating/accessing files and datasets in parallel. It also provides examples of writing and reading datasets in parallel using hyperslabs, including by rows, columns, pattern, and chunks. The goal of PHDF5 is to enable efficient parallel I/O while maintaining compatibility with serial HDF5 files.
This document provides an overview and outline of topics related to advanced features in HDF5, including:
- HDF5 supports various datatypes like atomic, compound, array, and variable-length datatypes. It allows creation of complex user-defined datatypes.
- Partial I/O in HDF5 allows reading and writing subsets of datasets using hyperslab selections, which describe subsets through properties like start point, stride, count, and block size.
- Chunking and compression can be used to improve performance and reduce storage needs when working with subsets of large HDF5 datasets.
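To make the hyperslab idea above concrete, here is a minimal C sketch (file and dataset names are hypothetical) that reads a 2 x 3 subset of a 2-D dataset by describing it with start, stride, count, and block:

#include "hdf5.h"

int main(void)
{
    hid_t   fid    = H5Fopen("example.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t   dset   = H5Dopen2(fid, "/Temperature", H5P_DEFAULT);
    hid_t   fspace = H5Dget_space(dset);

    hsize_t start[2]  = {10, 20};   /* where the subset begins            */
    hsize_t stride[2] = {1, 1};     /* take every element                 */
    hsize_t count[2]  = {2, 3};     /* number of blocks in each dimension */
    hsize_t block[2]  = {1, 1};     /* size of each block                 */

    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, block);

    hsize_t mdims[2] = {2, 3};
    hid_t   mspace = H5Screate_simple(2, mdims, NULL);
    double  buf[2][3];

    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

    H5Sclose(mspace); H5Sclose(fspace);
    H5Dclose(dset);   H5Fclose(fid);
    return 0;
}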
HDF5 is designed to work well on high performance parallel systems and clusters. This tutorial will review the high performance features of HDF5, including:
o Design of Parallel HDF5 Library
o Parallel HDF5 Programming Model and Environment
Participants should be familiar with MPI and MPI I/O and have a basic knowledge of the sequential HDF5 library. The lecture will prepare them for the Parallel I/O hands-on session.
1. The data is loaded from a file into relation 'divs' with specified data types
2. A filter is applied to 'divs' to only keep records where the symbol field matches the regular expression 'CM.*'
3. The filtered relation is stored in 'startswithcm'
The script loads data from a file, applies a regular expression filter to select records where the symbol starts with "CM", and stores the filtered relation. It performs a basic extract, filter, and store workflow in Pig Latin.
NetCDF and HDF5 are data formats and software libraries used for scientific data. NetCDF began in 1989 and allows for array-oriented data with dimensions, variables, and attributes. NetCDF-4 introduced new features while maintaining backward compatibility. It uses HDF5 for data storage and can read HDF4/HDF5 files. NetCDF provides APIs for C, Fortran, Java, and is widely used for earth science and climate data. It supports conventions, parallel I/O, and reading many data formats.
This tutorial is designed for HDF5 users with some HDF5 experience. It will cover properties of HDF5 objects that affect I/O performance and file sizes. The following HDF5 features will be discussed: partial I/O, chunking and compression, and complex HDF5 datatypes such as strings, variable-length arrays, and compound datatypes.
We will also discuss references to objects and dataset regions and how they can be used for indexing. Participants will work with the tutorial examples and exercises during the hands-on sessions.
This Tutorial gives a brief introduction to HDF5 for people who have never used it. It covers the HDF5 Data Model including HDF5 objects and their properties. It also briefly describes the HDF5 Programming Model and prepares participants for further self-study of HDF5 and hands-on sessions.
This tutorial is designed for new HDF5 users. We will go over a brief history of HDF and HDF5 software, and will cover basic HDF5 Data Model objects and their properties; we will give an overview of the HDF5 libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples, and the Java tool HDFView, will be used to illustrate HDF5 concepts.
This tutorial is designed for users who have exposure to MPI I/O and the basic concepts of HDF5 and would like to learn about the Parallel HDF5 library. The tutorial will cover the Parallel HDF5 design and programming model. Several C and Fortran examples will be used to illustrate the basic ideas of the Parallel HDF5 programming model. Some performance issues, including collective chunked I/O, will be discussed. Participants will work with the tutorial examples and exercises during the hands-on sessions.
The document discusses pointers and user spaces in RPG IV. It explains that pointers contain memory addresses and allow fields to be based on and dynamically allocated based on the pointer value. Pointers are used with parameter passing, multiple occurrence data structures, C functions, dynamic memory allocation, and user spaces. The document provides examples of using pointers with parameter lists, accessing trigger buffers, and dynamic memory allocation.
This document discusses MATLAB support for scientific data formats and analytics workflows. It provides an overview of MATLAB's capabilities for accessing, exploring, and preprocessing large scientific datasets. These include built-in support for HDF5, NetCDF, and other file formats. It also describes datastore objects that allow loading large datasets incrementally for analysis. The document concludes with an example that uses a FileDatastore to access and summarize HDF5 data from NASA ice sheet surveys in a MapReduce workflow.
Data produced by the Ozone PEATE from the Ozone Mapping and Profiler Suite (OMPS) instruments are to be stored in HDF5, not HDF-EOS, but will still need some features similar to those in HDF-EOS. In particular, a mechanism for handling dimension names will be needed. This poster proposes a method to handle dimension names for arrays in HDF5 in a manner commensurate with HDF-EOS5.
Hive is a data warehouse system built on top of Hadoop that allows users to query large datasets using SQL. It is used at Facebook to manage over 15TB of new data added daily across a 300+ node Hadoop cluster. Key features include using SQL for queries, extensibility through custom functions and file formats, and optimizations for performance like predicate pushdown and partition pruning.
HDF5 and Zarr are data formats that can be used to store and access scientific data. This presentation discusses approaches to translating between the two formats. It describes how HDF5 files were translated to the Zarr format by creating a separate Zarr store to hold HDF5 file chunks, and storing chunk location metadata. It also discusses an implementation that translates Zarr data to the HDF5 format by using a special chunking layout and storing chunk information in an HDF5 compound dataset. Limitations of the translations include lack of support for some HDF5 dataset properties in Zarr, and lack of support for some Zarr compression methods in the HDF5 implementation.
Similar to Dimension Scales in HDF-EOS2 and HDF-EOS5 (20)
This document discusses how to optimize HDF5 files for efficient access in cloud object stores. Key optimizations include using large dataset chunk sizes of 1-4 MiB, consolidating internal file metadata, and minimizing variable-length datatypes. The document recommends creating files with paged aggregation and storing file content information in the user block to enable fast discovery of file contents when stored in object stores.
This document provides an overview of HSDS (Highly Scalable Data Service), which is a REST-based service that allows accessing HDF5 data stored in the cloud. It discusses how HSDS maps HDF5 objects like datasets and groups to individual cloud storage objects to optimize performance. The document also describes how HSDS was used to improve access performance for NASA ICESat-2 HDF5 data on AWS S3 by hyper-chunking datasets into larger chunks spanning multiple original HDF5 chunks. Benchmark results showed that accessing the data through HSDS provided over 2x faster performance than other methods like ROS3 or S3FS that directly access the cloud storage.
This document summarizes the current status and focus of the HDF Group. It discusses that the HDF Group is located in Champaign, IL and is a non-profit organization focused on developing and maintaining HDF software and data formats. It provides an overview of recent HDF5, HDF4 and HDFView releases and notes areas of focus for software quality improvements, increased transparency, strengthening the community, and modernizing HDF products. It invites support and participation in upcoming user group meetings.
This document provides an overview of HSDS (HDF Server and Data Service), which allows HDF5 files to be stored and accessed from the cloud. Key points include:
- HSDS maps HDF5 objects like datasets and groups to individual cloud storage objects for scalability and parallelism.
- Features include streaming support, fancy indexing for complex queries, and caching for improved performance.
- HSDS can be deployed on Docker, Kubernetes, or AWS Lambda depending on needs.
- Case studies show HSDS is used by organizations like NREL and NSF to make petabytes of scientific data publicly accessible in the cloud.
This document discusses creating cloud-optimized HDF5 files by rearranging internal structures for more efficient data access in cloud object stores. It describes cloud-native and cloud-optimized storage formats, with the latter involving storing the entire HDF5 file as a single object. The benefits of cloud-optimized HDF5 include fast scanning and using the HDF5 library. Key aspects covered include using optimal chunk sizes, compression, and minimizing variable-length datatypes.
This document discusses updates and performance improvements to the HDF5 OPeNDAP data handler. It provides a history of the handler since 2001 and describes recent updates including supporting DAP4, new data types, and NetCDF data models. A performance study showed that passing compressed HDF5 data through the handler without decompressing/recompressing led to speedups of around 17-30x by leveraging HDF5 direct I/O APIs. This allows outputting HDF5 files as NetCDF files much faster through the handler.
This document provides instructions for using the Hyrax software to serve scientific data files stored on Amazon S3 using the OPeNDAP data access protocol. It describes how to generate ancillary metadata files called DMR++ files using the get_dmrpp tool that provide information about the data file structure and locations. The document explains how to run get_dmrpp inside a Docker container to process data files on S3 and generate customized DMR++ files that the Hyrax server can use to serve the files to clients.
This document provides an overview and examples of accessing cloud data and services using the Earthdata Login (EDL), Pydap, and MATLAB. It discusses some common problems users encounter, such as being unable to access HDF5 data on AWS S3 using MATLAB or read data from OPeNDAP servers using Pydap. Solutions presented include using EDL to get temporary AWS tokens for S3 access in MATLAB and providing code examples on the HDFEOS website to help users access S3 data and OPeNDAP services. The document also notes some limitations, such as tokens being valid for only 1 hour, and workarounds like requesting new tokens or using the MATLAB HDF5 API instead of the netCDF API.
The HDF5 Roadmap and New Features document outlines upcoming changes and improvements to the HDF5 library. Key points include:
- HDF5 1.13.x releases will include new features like selection I/O, the Onion VFD for versioned files, improved VFD SWMR for single-writer multiple-reader access, and subfiling for parallel I/O.
- The Virtual Object Layer allows customizing HDF5 object storage and introduces terminal and pass-through connectors.
- The Onion VFD stores versions of HDF5 files in a separate onion file for versioned access.
- VFD SWMR improves on legacy SWMR by implementing single-writer multiple-reader capabilities
This document discusses user analysis of the HDFEOS.org website and plans for future improvements. It finds that the majority of the site's 100 daily users are "quiet", not posting on forums or other interactive elements. The main user types are locators, who search for examples or data; mergers, who combine or mosaic datasets; and converters, who change file formats. The document outlines recent updates focused on these user types, like adding Python examples for subsetting and calculating latitude and longitude. It proposes future work on artificial intelligence/machine learning uses of HDF files and examples for processing HDF data in the cloud.
This document summarizes a presentation about the current status and future directions of the Hierarchical Data Format (HDF) software. It provides updates on recent HDF5 releases, development efforts including new compression methods and ways to access HDF5 data, and outreach resources. It concludes by inviting the audience to share wishes for future HDF development.
The document describes H5Coro, a new C++ library for reading HDF5 files from cloud storage. H5Coro was created to optimize HDF5 reading for cloud environments by minimizing I/O operations through caching and efficient HTTP requests. Performance tests showed H5Coro was 77-132x faster than the previous HDF5 library at reading HDF5 data from Amazon S3 for NASA's SlideRule project. H5Coro supports common HDF5 elements but does not support writing or some complex HDF5 data types and messages to focus on optimized read-only performance for time series data stored sequentially in memory.
This document summarizes MathWorks' work to modernize MATLAB's support for HDF5. Key points include:
1) MATLAB now supports HDF5 1.10.7 features like single-writer/multiple-reader access and virtual datasets through new and updated low-level functions.
2) Performance benchmarks show some improvements but also regressions compared to the previous HDF5 version, and work continues to optimize code and support future versions.
3) There are compatibility considerations for Linux filter plugins, but interim solutions are provided until MathWorks can ship a single HDF5 version.
HSDS provides HDF as a service through a REST API that can scale across nodes. New releases will enable serverless operation using AWS Lambda or direct client access without a server. This allows HDF data to be accessed remotely without managing servers. HSDS stores each HDF object separately, making it compatible with cloud object storage. Performance on AWS Lambda is slower than a dedicated server but has no management overhead. Direct client access has better performance but limits collaboration between clients.
The document discusses HDF for the cloud, including new features of the HDF Server and what's next. Key points:
- HDF Server uses a "sharded schema" that maps HDF5 objects to individual storage objects, allowing parallel access and updates without transferring entire files.
- Implementations include HSDS software that uses the sharded schema with an API and SDKs for different languages like h5pyd for Python.
- New features of HSDS 0.6 include support for POSIX, Azure, AWS Lambda, and role-based access control.
- Future work includes direct access to storage without a server intermediary for some use cases.
This document compares different methods for accessing HDF and netCDF files stored on Amazon S3, including Apache Drill, THREDDS Data Server (TDS), and HDF5 Virtual File Driver (VFD). A benchmark test of accessing a 24GB HDF5/netCDF-4 file on S3 from Amazon EC2 found that TDS performed the best, responding within 2 minutes, while Apache Drill failed after 7 minutes. The document concludes that TDS 5.0 is the clear winner based on performance and support for role-based access control and HDF4 files, but the best solution depends on use case and software.
This document discusses STARE-PODS, a proposal to NASA/ACCESS-19 to develop a scalable data store for earth science data using the SpatioTemporal Adaptive Resolution Encoding (STARE) indexing scheme. STARE allows diverse earth science data to be unified and indexed, enabling the data to be partitioned and stored in a Parallel Optimized Data Store (PODS) for efficient analysis. The HDF Virtual Object Layer and Virtual Data Set technologies can then provide interfaces to access the data in STARE-PODS in a familiar way. The goal is for STARE-PODS to organize diverse data for alignment and parallel/distributed storage and processing to enable integrative analysis at scale.
This document provides an overview and update on HDF5 and its ecosystem. Key points include:
- HDF5 1.12.0 was recently released with new features like the Virtual Object Layer and external references.
- The HDF5 library now supports accessing data in the cloud using connectors like S3 VFD and REST VOL without needing to modify applications.
- Projects like HDFql and H5CPP provide additional interfaces for querying and working with HDF5 files from languages like SQL, C++, and Python.
- The HDF5 community is moving development to GitHub and improving documentation resources on the HDF wiki site.
This document summarizes new features in HDF5 1.12.0, including support for storing references to objects and attributes across files, new storage backends using a virtual object layer (VOL), and virtual file drivers (VFDs) for Amazon S3 and HDFS. It outlines the HDF5 roadmap for 2019-2022, which includes continued support for HDF5 1.8 and 1.10, and new features in future 1.12.x releases like querying, indexing, and provenance tracking.
The document discusses leveraging cloud resources like Amazon Web Services to improve software testing for the HDF group. Currently HDF software is tested on various in-house systems, but moving more testing to the cloud could provide better coverage of operating systems and distributions at a lower cost. AWS spot instances are being used to run HDF5 build and regression tests across different Linux distributions in around 30 minutes for approximately $0.02 per hour.
Dimension Scales in HDF-EOS2 and HDF-EOS5
1. Dimension Scales in HDF-EOS2 & HDF-EOS5
Abe Taaheri, Raytheon IIS
HDF & HDF-EOS Workshop XIV
Champaign, IL
Sep. 29, 2010
Page 1
2. • What is a Dimension Scale?
• Dimension Scales & Metadata
• Dimension Scales APIs
– he2
– he5
• Code example
– Writing (he2, he5)
– Reading (he2, he5)
• Sample he2 and he5 files with Dimension Scales
Page 2
3. What is a Dimension Scale?
It is a sequence of numbers placed along a dimension to demarcate intervals along it.
• HDF4
- It is an array with size and name similar to its assigned dimension
- Stored using a structure similar to the SDS array
- One scale is assigned per dimension
Page 3
4. What is a Dimension Scale? *
• HDF5
– An HDF5 dataset
– With additional metadata that identifies the dataset as a Dimension Scale
– Typically Dimension Scales are logically associated with the dimensions of HDF5 Datasets
– The meaning of the association is left to applications
* Pedro Vicente talk, HDF/HDF-EOS Workshop IX
Page 4
5. Example: 3D dataset
[Figure: a 3D array dataset with 5 x 7 x 10 dimensions and 3 Dimension Scale datasets, one per dimension]
Page 5
6. More on Dimension Scale in HDF5
• A dimension scale is not required to be a 1-D array, or to have a specific datatype
• A dataset dimension can have more than 1 associated dimension scale
• A Dimension Scale can be shared by two or more Dataset dimensions
Page 6
7. Example: 3D dataset in HDF5
[Figure: a 3D array dataset with 5 x 7 x 10 dimensions associated with several Dimension Scale datasets]
Page 7
8. HDF5 Dimension Scale Metadata
• When the Dimension Scale is associated with a dimension of a Dataset, the association is represented by attributes of the two datasets.
• The following dataset attributes are used to describe dimension scale datasets:
– Attribute named “CLASS” with the value “DIMENSION_SCALE”
– Optional attribute named “NAME”
– Attribute references to any associated Dataset
Page 8
9. • HDF4
– Since Version 4.0 ?
– SDsetdimscale()
– SDgetdimscale()
– SDsetdimstrs(): label, unit, format
– SDgetdimstrs()
• HDF5
– Since Version 1.8
– H5DSset_scale()
– H5DSattach_scale(), H5DSdetach_scale()
– H5DSset_label(), H5DSget_label()
– A few more APIs
Page 9
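A minimal C sketch of the HDF5 side of this list, using the H5DS high-level routines; the file, field, and scale names are made up for illustration:

#include "hdf5.h"
#include "hdf5_hl.h"   /* H5DS dimension scale routines */

int main(void)
{
    hid_t   fid = H5Fcreate("dimscale.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* A 2-D data field, 5 x 7 */
    hsize_t dims[2] = {5, 7};
    hid_t   space = H5Screate_simple(2, dims, NULL);
    hid_t   dset  = H5Dcreate2(fid, "Temperature", H5T_NATIVE_FLOAT, space,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* A 1-D dataset that will act as the scale for dimension 0 */
    hsize_t sdims[1] = {5};
    hid_t   sspace = H5Screate_simple(1, sdims, NULL);
    hid_t   sdset  = H5Dcreate2(fid, "Latitude", H5T_NATIVE_FLOAT, sspace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Mark it as a dimension scale, attach it, and label the dimension */
    H5DSset_scale(sdset, "Latitude");
    H5DSattach_scale(dset, sdset, 0);
    H5DSset_label(dset, 0, "YDim");

    H5Dclose(sdset); H5Sclose(sspace);
    H5Dclose(dset);  H5Sclose(space);
    H5Fclose(fid);
    return 0;
}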
11. • HDF-EOS
– Added a few routines to HDF-EOS2 to create dimension scales like HDF4 (Version 2.17)
– Added a few routines to HDF-EOS5 to create dimension scales like those added by the HDF Augmentation Tool (Version 1.13)
Page 11
19. • Name: HE5_SWsetdimscale
• Signature:
– herr_t HE5_SWsetdimscale(hid_t swathID, char *fieldname, char *dimname, const hsize_t dimsize, hid_t numbertype, void *data)
• Purpose:
– Sets dimension scale for a field dimension within the swath
Page 19
20. • Name: HE5_SWgetdimscale
• Signature:
– long HE5_SWgetdimscale(hid_t swathID, char *fieldname, char *dimname, hsize_t *dimsize, hid_t *numbertype, void *data)
• Purpose:
– Gets dimension scale for a field dimension within the swath
Page 20
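A sketch of how the two swath routines above might be paired, using the signatures shown; the file, swath, field, and dimension names are hypothetical, the swath is assumed to already define a "Temperature" field with a "GeoTrack" dimension of size 10, and HE5T_NATIVE_FLOAT is assumed to be the matching HDF-EOS5 number type:

#include <HE5_HdfEosDef.h>

int main(void)
{
    hid_t   fid, swathID, ntype;
    hsize_t dimsize = 10;
    float   scale[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    float   readback[10];

    /* File and swath are assumed to exist already */
    fid     = HE5_SWopen("swath_dimscale.he5", H5F_ACC_RDWR);
    swathID = HE5_SWattach(fid, "Swath1");

    /* Attach a scale to the "GeoTrack" dimension of the "Temperature" field */
    HE5_SWsetdimscale(swathID, "Temperature", "GeoTrack",
                      dimsize, HE5T_NATIVE_FLOAT, scale);

    /* Read the scale back */
    HE5_SWgetdimscale(swathID, "Temperature", "GeoTrack",
                      &dimsize, &ntype, readback);

    HE5_SWdetach(swathID);
    HE5_SWclose(fid);
    return 0;
}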
21. • Name: HE5_SWwritedscaleattr
• Signature:
– herr_t HE5_SWwritedscaleattr(hid_t swathID, const char *dimname, const char *attrname, hid_t ntype, hsize_t count[], void *datbuf)
• Purpose:
– Writes/updates a dimension scale attribute in a specific swath
Page 21
22. • Name: HE5_SWreaddscaleattr
• Signature:
– herr_t HE5_SWreaddscaleattr(hid_t swathID, const char *dimname, const char *attrname, void *datbuf)
• Purpose:
– Reads a dimension scale attribute from a specific dimension
Page 22
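A small sketch of writing and reading back a dimension scale attribute with the two routines above; the "GeoTrack" dimension and "units" value are hypothetical, and HE5T_NATIVE_CHAR is assumed to be the HDF-EOS5 character number type:

#include <stdio.h>
#include <string.h>
#include <HE5_HdfEosDef.h>

/* Write a "units" attribute on the dimension scale of the "GeoTrack"
   dimension, then read it back.  swathID comes from HE5_SWattach(). */
void write_and_read_units(hid_t swathID)
{
    char    units[] = "degrees_north";
    char    buf[32] = "";
    hsize_t count[1];

    count[0] = strlen(units);
    HE5_SWwritedscaleattr(swathID, "GeoTrack", "units",
                          HE5T_NATIVE_CHAR, count, units);

    HE5_SWreaddscaleattr(swathID, "GeoTrack", "units", buf);
    printf("units = %s\n", buf);
}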
23. • Name: HE5_SWinqdscaleattrs
• Signature:
– long HE5_SWinqdscaleattrs(hid_t swathID, const char *dimname, char *attrnames, long *strbufsize)
• Purpose:
– Retrieves information about the attributes defined for a dimension scale
Page 23
24. • Name: HE5_SWdscaleattrinfo
• Signature:
– herr_t HE5_SWdscaleattrinfo(hid_t swathID, const char *dimname, const char *attrname, hid_t *ntype, hsize_t *count)
(count: number of attribute elements)
• Purpose:
– Returns information about attribute(s) in a specific dimension scale
Page 24
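A short sketch combining the two inquiry routines above to inspect a dimension scale's attributes; the "GeoTrack" and "units" names are hypothetical, and a fixed-size name buffer is used for simplicity:

#include <stdio.h>
#include <HE5_HdfEosDef.h>

/* List the attributes attached to the "GeoTrack" dimension scale,
   then query the element count of its "units" attribute. */
void inspect_dscale_attrs(hid_t swathID)
{
    char    attrnames[256] = "";
    long    strbufsize = 0;
    long    nattrs;
    hid_t   ntype;
    hsize_t count;

    nattrs = HE5_SWinqdscaleattrs(swathID, "GeoTrack", attrnames, &strbufsize);
    printf("%ld attribute(s): %s\n", nattrs, attrnames);

    HE5_SWdscaleattrinfo(swathID, "GeoTrack", "units", &ntype, &count);
    printf("\"units\" has %llu element(s)\n", (unsigned long long)count);
}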
25. • Similar APIs for Dimension Scales in
– Grid
– Zonal Average
Page 25