HDF Status and Development

An update on HDF, including a status report on The HDF Group, an overview of recent changes to the HDF4 and HDF5 libraries and tools, plans for future releases, HDF Group projects and collaborations, and future plans.

  • Why
    Increasing need for support, services, quick response
    Not a good model for a University R&D project
    Who
    11 software engineers and several students: develop, maintain HDF software, work on special projects, manage projects
    3 tech support staff: helpdesk, doc, sysadmin.
    Management team
    President
    Director of Technical Services and Operations
    Director of Software Development
    Director of Business Operations
    Managers responsible for tools, applications
    Other THG staff include seven full-time software engineers, who develop and maintain the HDF software and work on special projects, and three technical support staff, who provide helpdesk support, documentation, and system administration. The HDF Group also generally employs students from the University Computer Science and Engineering departments.
  • The R&D mission
    Maintain and evolve HDF for high end science apps
    Maintain HDF4 and HDF5 and tools at supercomputing centers, TeraGrid
    Support academic science
    Cutting edge data management research
    Adapt to leading edge, experimental architectures
    Integrate with new middleware technologies, parallel file systems
    The “Support and Sustain” mission
    Maintain, evolve for communities, sponsors
    Provide proprietary consulting, tuning, development
    Sustain for long term, maintain data access over time
  • HDF5 maintenance releases are issued every six months and HDF4 maintenance releases yearly, i.e., the next maintenance releases of HDF5 1.6 and 1.8 will be in May 2009, and of HDF4 in November 2009
  • Options to dump data into ASCII (compatible with h5import and Excel)
  • As long as the file system is POSIX compliant
    Other processes can be on other systems (as long as shared file system is POSIX compliant)
  • Store Partial Edge Chunks More Efficiently
    Allow application to control whether partially used chunks at edges of datasets are compressed and/or allocated as full chunks in file.
    Persistent File Free Space tracking
    No more “forgetting where all the free space in the file is” when the file is closed
    Allow a group’s heaps (which store link info) to be compressed
  • HDF Status and Development

    1. HDF Update • Mike Folk, The HDF Group • The 13th HDF and HDF-EOS Workshop (HDF/HDF-EOS Workshop XIII), November 3-5, 2009 • www.hdfgroup.org
    2. Topics
    3. What’s up with The HDF Group?
    4. What is The HDF Group, and why does it exist?
    5. The HDF Group • Established in 1988 • 18 years at University of Illinois National Center for Supercomputing Applications • 4 years as an independent non-profit company, “The HDF Group” • The HDF Group owns HDF4 and HDF5 • Basic HDF4 and HDF5 formats, libraries and tools are open and free
    6. Data challenges addressed by HDF • Our ability to organize complex collections of data • Efficient and scalable data storage and access • A growing need to integrate a wide variety of types of data • Long term preservation of data
    7. The HDF Group Mission • To ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.
    8. Goals • Maintain and evolve HDF for sponsors and communities that depend on it • Provide support to the HDF communities through consulting, training, tuning, development, research • Sustain The HDF Group for the long term to assure data access over time
    9. The HDF Group Services • Helpdesk and Mailing Lists: available to all users as a first level of support • Standard Support: rapid issue resolution and advice • Consulting: needs assessment, troubleshooting, design reviews, etc. • Training: tutorials and hands-on practical experience • Enterprise Support: supporting many HDF activities across organizations • Special Projects: adapting customer applications to HDF • New features and tools • Research and Development
    10. Members of the HDF support community • NASA – EOS • NOAA/NASA/Riverside Tech – NPOESS • Army Geospatial Center • A leading U.S. aerospace company • NIH/Geospiza (bio software company) • University of Illinois/NCSA • Sandia National Laboratory (2) • Lawrence Berkeley National Lab • Projects for petroleum industry, vehicle testing, weapons research, others • “In kind” support
    11. Some areas of increased recent interest • Improvements: concurrent access, parallel I/O performance, real-time write performance, high level language support • Life sciences: sequencing, biomedical imaging • Database integration • Microsoft products (HPC, .NET, others)
    12. Cool recent application • Imageworks’ Field3D • Spiderman 3 • The Polar Express
    13. Topics
    14. Basic Library Releases • HDF5 • HDF4
    15. Timeline of the HDF library releases
    16. HDF5 1.8.3 minor release (May 09) • New functions • Improve flexibility when traversing external links • Validate object identifier • Enabled data chunk cache properties to be set per dataset (per file in previous releases) • Forward/backward compatibility issues • Modified library to be able to open files with corrupt root group symbol table messages • Also corrects corruption errors if found.
    17. HDF5 1.8.4 minor release (Nov 09) • Modified configure and make process to properly preserve user's CFLAGS and similar environment variables. • Corrected a problem where the library would rewrite the superblock in a file opened for R/W access, even when no changes were made to the file.
    18. HDF5 1.6 minor releases • 1.6.9 May 09 • Minor bug fixes • Same tools improvements as in 1.8.3 • 1.6.10 Nov 09 • Minor bug fixes • Ability to embed library information in executable binaries • This is the last release of the 1.6 series • Announced in May 2009 – no response • This is your last chance!
    19. HDF 4r2.4 minor release (Feb 09) • Minor bug fixes, enhancements • New routines to get size of compressed data • Support for C shared libraries • Support for 32-bit version on Mac Intel • Updated docs in HTML and PDF
    20. HDF 4r2.5 minor release (Feb 10) • Minor bug fixes, enhancements • Support for 64-bit version on Mac Intel • Restructured and cleaned up source code for easier maintenance • Changes in versioning • Improves ability to maintain • Becomes similar to how HDF5 versioning works • Will use major, minor, release and sub-release suffix in the names of the source tarballs • E.g., hdf-4.2.5, hdf-4.2.5-snap0 • Library string will include suffix • E.g., "HDF Version 4.2 Release 4-snap3, October 18, 2009"
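The naming scheme on this slide (major, minor, release, optional sub-release suffix) can be sketched as a small parser. This is only an illustration built from the two example names on the slide; the `HdfVersion` type, the `parse_tarball_name` function, and the exact pattern are assumptions, not part of any HDF tool.

```python
import re
from typing import NamedTuple, Optional


class HdfVersion(NamedTuple):
    major: int
    minor: int
    release: int
    sub_release: Optional[str]  # e.g. "snap0"; None when no suffix is present


# Assumed pattern, inferred from the slide's examples "hdf-4.2.5" and "hdf-4.2.5-snap0".
_NAME = re.compile(r"^hdf-(\d+)\.(\d+)\.(\d+)(?:-([A-Za-z0-9]+))?$")


def parse_tarball_name(name: str) -> HdfVersion:
    """Split a source tarball base name into its version components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized name: {name!r}")
    major, minor, release, suffix = match.groups()
    return HdfVersion(int(major), int(minor), int(release), suffix)
```

Because the result is a tuple of integers followed by the suffix, the numeric fields of two parsed names can be compared directly to order releases.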
    21. H4-H5 Conversion Software 2.1 (Feb 09) • Based on HDF4r2.4 and HDF5-1.8.2 • h4toh5 utility • Recognizes HDF-EOS2 files (--with-hdfeos2 configuration option) • Can generate HDF5 files that can be read by netCDF-4 • h4toh5 library • Bug fixes • Performance improvements • http://hdfgroup.org/h4toh5/
    22. H4-H5 Conversion Software 2.2 (Feb 10) • Based on HDF4r2.5 and HDF5-1.8.4
    23. Topics
    24. Major Improvements for Existing Tools • h5dump additions • Ability to show data pointed to by dataset region references. • More options for dumping data into ASCII • Compatible with MS Excel • Compatible with h5import • h5diff • Improvements in accuracy, flexibility, and performance • Some new flags • Report non-comparable objects • Avoid NaN detection • Option to use system epsilon to compare floating-point numbers • Compares for strict equality first to improve performance • Treats two INFINITY values as equal • Fixed segmentation fault problem on variable length strings.
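The h5diff comparison order listed above (strict equality first, equal infinities, optional NaN detection, system epsilon) can be sketched roughly as follows. This is not h5diff's source code; the function name, its parameters, and the choice to treat two NaNs as equal are assumptions made for the illustration.

```python
import math
import sys


def values_differ(a: float, b: float,
                  epsilon: float = sys.float_info.epsilon,
                  detect_nan: bool = True) -> bool:
    """Return True when a and b should be reported as different."""
    # Strict equality first: cheap, and it already treats inf == inf as equal.
    if a == b:
        return False
    if detect_nan:
        a_nan, b_nan = math.isnan(a), math.isnan(b)
        if a_nan or b_nan:
            # Assumption: two NaNs count as equal, NaN versus a number differs.
            return not (a_nan and b_nan)
    # With detect_nan=False the caller asserts that the data contains no NaNs.
    return abs(a - b) > epsilon
```

Skipping the NaN check saves two calls per element, which is why a flag to avoid it can matter on large datasets; the strict-equality shortcut helps for the same reason when most values match.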
    25. Major Improvements for Existing Tools • h5stat • Fixed incorrect statistics on EOS big data files with corrupted headers. • h5repack • Added ability to preserve group creation order • When chunk size is not specified, uses heuristics to set chunk size • Fixed a problem where 1.8 failed on a file created with 1.6.
    26. Tool activities in the works • New tool: h5tail • Display new records appended to a dataset • Improved code quality and testing • Tools library: general purpose APIs for tools • Tools library currently only for our developers • Want to make it public so that people can use it in their products
    27. Conversion Tools • Please send us your comments and requests regarding HDF5 conversion tools, such as • HDF4 to HDF5 • HDF5 to jpeg • HDF5 to XML • HDF5 to other formats?
    28. Topics
    29. HDF-Java 2.6 is on the way • Includes all HDF java products • Java Wrapper API • Java Object API • HDFView • Adds new features, such as better support for dataset region references • Improves performance • Release schedule • Beta 1: end of Nov. 09 • Full release: end of Dec. 09
    30. Full support of HDF5 1.8.x in hdf-java • Full HDF5 1.8 support will be added to the release after version 2.6. • We are looking for input • RFC: http://www.hdfgroup.uiuc.edu/RFC/HDF5/hdf-java/ • Java wrapper will be completed March 2010 • Object API and HDFView update to come later
    31. Topics
    32. Single-Writer/Multiple-Reader Access • Situation: A long-running process is modifying an HDF5 file and simultaneously other processes want to inspect data in the file. • Solution: Single-Writer/Multiple-Reader (SWMR) File Access. • Allows simultaneous reading of HDF5 file while the file is being modified by another process • No inter-process coordination necessary
    33. Improved Multi-Threaded Concurrency • Converting from “big lock” on code (entire library) to locks on internal library data structures • Will improve ability to have multiple threads performing HDF5 operations simultaneously
    34. Other Library Features • Saving space • Store Partial Edge Chunks More Efficiently • Persistent File Free Space tracking/recovery • Allow a group’s link info to be compressed • Saving time • Aggregate neighboring metadata for faster metadata cache I/O
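To see why partial edge chunks are worth storing more efficiently, the following sketch counts the elements that are allocated but never used when every chunk, including those overhanging the dataset edge, is stored at full size. The function name and the example sizes are illustrative assumptions; the arithmetic is plain chunk-grid geometry.

```python
from math import prod


def edge_chunk_overhead(dims, chunk):
    """Elements allocated beyond the dataset extent when all chunks are full size."""
    # Ceiling division: number of chunks needed to cover each dimension.
    chunks_per_dim = [-(-d // c) for d, c in zip(dims, chunk)]
    allocated = prod(chunks_per_dim) * prod(chunk)
    used = prod(dims)
    return allocated - used
```

For a 1000 x 1000 dataset with 300 x 300 chunks, a 4 x 4 grid of chunks is needed, so 1,440,000 elements are allocated for 1,000,000 used, an overhead of 440,000 elements. With 250 x 250 chunks the grid fits exactly and the overhead is zero.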
    35. New chunk indexing methods (dataset type: index type; space improvements; speed improvements) • No unlimited dimensions, no filters, no missing chunks: “implicit” (no actual chunk index); same storage space as contiguous dataset storage (no index); constant time lookups, faster parallel I/O • No unlimited dimensions: “fixed sized” (smaller chunk index); smaller index overhead; constant time lookups • 1 unlimited dimension: “extensible array”; smaller index overhead; constant time lookups and appends • 2+ unlimited dimensions: improved B-tree*; smaller index overhead; faster
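The “implicit” row is the easiest to illustrate: when every chunk exists, has the same size, and is unfiltered, a chunk's location can be computed rather than looked up. The sketch below assumes chunks are laid out consecutively in row-major chunk order starting at a base address; that layout, the function name, and the parameters are assumptions for the example, not a description of the HDF5 file format.

```python
from math import prod


def implicit_chunk_address(base, dims, chunk, elem_size, coords):
    """Byte address of the chunk that holds the element at `coords`."""
    # Number of chunks along each dimension (ceiling division).
    chunks_per_dim = [-(-d // c) for d, c in zip(dims, chunk)]
    # Which chunk the element falls in, per dimension.
    chunk_coords = [x // c for x, c in zip(coords, chunk)]
    # Row-major linear index of that chunk within the chunk grid.
    linear = 0
    for count, index in zip(chunks_per_dim, chunk_coords):
        linear = linear * count + index
    return base + linear * prod(chunk) * elem_size
```

The cost depends only on the number of dimensions, not on the number of chunks, which is what “constant time lookups” means here, and no index structure has to be stored in the file.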
    36. Parallel I/O Improvements • Project with Lawrence Berkeley Nat’l Lab to improve HDF5 performance on parallel applications • Up to 6x performance improvements on certain applications (so far)
    37. Topics
    38. HDF-EOS library
    39. EOS support • HDF-EOS2 and HDF-EOS5 • Automatic configuration with szip enabled/disabled • Now tested daily with HDF4 and HDF5 development code • Updated the HDF-EOS website
    40. HDF-EOS5/netCDF-4 Augmentation Tool • Accessing HDF-EOS5 files via netCDF-4 API
    41. The Main Challenge • Would like netCDF-4 applications to be able to read and understand HDF-EOS5 files • Problem: NetCDF-4 model follows the HDF5 dimension scale model but HDF-EOS5 does not. • [Diagram: HDFEOS / GRIDS / CloudFractionAndPressure / Data Fields / CloudFraction, CloudPressure; no HDF5 dimension scales are associated with either variable]
    42. Our Solution – Augmentation • Provide dimensions required by netCDF-4 • [Diagram: HDFEOS / GRIDS / CloudFractionAndPressure, with Data Fields CloudFraction[XDim][YDim] and CloudPressure[XDim][YDim], plus dimensions XDim and YDim]
    43. Special values in HDF5 • There are cases where a user may wish to specify more than one “special” value to describe non-standard data. • We provide several examples (C, Fortran, IDL) on how to store special values • http://www.hdfgroup.org/pubs/rfcs/
    44. OPeNDAP
    45. OPeNDAP • HDF5-OPeNDAP handler • Served OMI Swath data • HDF4-OPeNDAP handler • Tested with some AIRS data and some MODIS data • More information in the Thursday morning session
    46. Swath to Grid conversion Tool • Request from NASA GES DISC • Convert Swath to Grid • Support both HDF-EOS2 and TRMM data • Still in development • [Figure: MODIS Swath and Converted Grid]
    47. Support for NPP/NPOESS by The HDF Group
    48. Priorities for 2008-2009 • Data accessibility and usability • Developed library of high level APIs to support NPP/NPOESS data management • Modified h5dump to display region references • Modified HDFView to view object and region references and quality flags • System maintenance • User support
    49. NPOESS Project Information • Project Web site • http://www.hdfgroup.org/projects/npoess/
    50. HDF4 LAYOUT MAPS
    51. HDF4 Layout Map Project • Problem • Long-term readability of HDF data depends on long-term availability of software • Proposed solution • Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data
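The idea of a “simple reader” driven by a layout map can be sketched as follows: if the map records where an object's bytes live in the file and how to interpret them, the data can be recovered without the HDF4 library. The map entry here is a hypothetical Python dictionary with invented keys (`byte_ranges`, `byte_order`, `format`); the slide does not specify the map's actual format.

```python
import struct


def read_mapped_array(path, entry):
    """Read a numeric array using only the byte ranges listed in a map entry."""
    raw = bytearray()
    with open(path, "rb") as f:
        # Each range is (offset, length) in bytes; ranges are concatenated in order.
        for offset, length in entry["byte_ranges"]:
            f.seek(offset)
            raw += f.read(length)
    # Element size for the declared type, e.g. ">" + "i" is a big-endian 32-bit int.
    item = struct.calcsize(entry["byte_order"] + entry["format"])
    count = len(raw) // item
    layout = f"{entry['byte_order']}{count}{entry['format']}"
    return list(struct.unpack(layout, bytes(raw[:count * item])))
```

A call such as `read_mapped_array("file.hdf", {"byte_ranges": [(8, 8), (20, 8)], "byte_order": ">", "format": "i"})` would return four integers gathered from two separate regions of the file. The reader needs nothing beyond file seeks and a type decoder, which is the property that makes the map useful for long-term access.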
    52. A Project with the Army Geospatial Center • TRANSFORMING THE GEOCOMPUTATIONAL BATTLESPACE FRAMEWORK WITH HDF5
    53. Data Challenges • [Figure: Military Decision Making; labels: Wide variety, Large scale, High res., High efficiency, Accuracy, Time; Satellite, Buckeye, Culture, Stream]
    54. NIH STTR with Geospiza, Seattle WA • BIOHDF: TOWARD SCALABLE BIOINFORMATICS INFRASTRUCTURES
    55. Next Generation DNA Sequencing • NGS is Powerful • “Transforms today’s biology” • “Democratizing genomics” • “Genome center in a mail room” • “Changing the landscape”
    56. … And Daunting • “Prepare for the deluge” • “Byte-ing off more than you can chew”
    57. BioHDF Project • Goal: Move bioinformatics problems from organizing and structuring data to asking questions and visualizing data • Develop data models and tools to work with NGS data in HDF5 • Create HDF5 domain-specific extensions and library modules to support the unique aspects of NGS data (BioHDF) • Integrate BioHDF technologies into Geospiza products • Deliver core BioHDF technologies to the community as open-source software
    58. Thank You All and Thank You NASA!
    59. Acknowledgements • This report is based on work supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). • Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.
    60. Questions/comments?
