HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop X
November 29, 2006
HDF
Outline
• Organizational info
• HDF Software Update
• Other Activities of Interest
Organizational info
“The HDF Group” = “THG”

Founded Dec. 2006

Went solo July 15, 2006
Non-profit
THG mission
To support the vast community of HDF
users and to ensure the sustainable
development of HDF technologies and
t...
The HDF Team
Frank Baker
Christian Chilan
Peter Cao
Vailin Choi
Mike Folk
Anne Jennings
Barbara Jones
Quincey Koziol
James...
HDF Software Update
HDF4 update
Platforms to be dropped
• Operating systems
• HPUX 11.00
• Crays SV1 and TS IEEE
• AIX 5.1 and 5.2
• SGI IRIX64-6...
Platforms to be added
• Systems
• MAC OSX 10.4 (Intel)
• Solaris 2.* on Intel
• Cray XT3
• Windows 64-bit (?)
• Linux...
New features
• Configuration
• Switched to use F77_FUNC macro for better
Fortran support (no hard-coded compilers
anymore!...
Bug fixes
• Tools
• A lot of improvements to the hdp, hrepack,
hdiff and hdfimport utilities based on users’
feedback

• L...
HDF5 update
No new releases!
• Focus on HDF5 release 1.8
• HDF5-1.8.0 Alpha 5 release is available from:
hdf.ncsa.uiuc.edu/HDF5/releas...
Platforms to be dropped
• Operating systems
• HPUX 11.00
• MAC OS 10.3
• AIX 5.1 and 5.2
• SGI IRIX64-6.5
• Linux 2.4
• S...
Platforms to be added
• Systems
• Alpha Open VMS
• MAC OSX 10.4 (Intel)
• Solaris 2.* on Intel (?)
• Cray XT3
• Windows 6...
New Features
in HDF5 1.8
HDF5 1.8 new library features
• Datatype and dataspace features
• Serialized dataspaces and datatypes
• Abili...
HDF5 1.8 – new library features
• Group revisions
• Creation order access
• Compact groups – small groups take less s...
HDF5 1.8 – new library features
• Link improvements
• External links -- can refer to objects in another file
• User define...
HDF5 1.8 – new library features
• Support for Unicode UTF-8 character set
• Shared header info – duplicate header info
sha...
HDF5 1.8– new APIs
• New extendible error-handling API
• New APIs to copy objects between files fast
• Dimension scale mod...
HDF5 1.8 – backward and
forward compatibility
HDF5 1.8 vs. 1.6.5
• Differences between 1.8 vs. 1.6.5
• Some file format changes
• Several new routines added
• Old APIs ...
Principle of
“Maximum file format compatibility”
Unless instructed otherwise, the HDF5 library will
write objects using th...
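The compatibility principle on this slide (write each object with the earliest format version that can describe it) can be pictured with a small model. This is only a sketch under stated assumptions: the labels "1.6" and "1.8" stand in for file format generations, the feature table is hypothetical and borrows feature names from the slides, and none of it is the HDF5 library's actual API or version numbering.

```python
# Hypothetical model of "maximum file format compatibility".
# Labels stand in for format generations, ordered oldest first.
FORMAT_ORDER = ["1.6", "1.8"]

# Illustrative table: the oldest generation able to describe each feature.
# Feature names follow the slides; the mapping itself is an assumption.
FEATURE_MIN_FORMAT = {
    "plain_dataset": "1.6",
    "plain_group": "1.6",
    "external_link": "1.8",
    "creation_order": "1.8",
    "utf8_name": "1.8",
}


def earliest_format(features):
    """Return the earliest format label that can describe every feature."""
    needed = 0  # start from the oldest format and move up only when forced
    for feature in features:
        needed = max(needed, FORMAT_ORDER.index(FEATURE_MIN_FORMAT[feature]))
    return FORMAT_ORDER[needed]


def readable_by(library, features):
    """True if a library of the given generation knows every feature used."""
    return FORMAT_ORDER.index(library) >= FORMAT_ORDER.index(earliest_format(features))
```

In this model an object that uses only old features stays readable by the older library even when a newer library writes it, while an object with an external link does not.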
Command line tools
New features for old tools
• h5dump
• Dump data in binary format

• h5diff
• Compare dataset regions

• Parallel h5diff (p...
New HDF5 Tools
• h5copy
• Copies a group, dataset or named datatype from
one location to another location
• Copies within...
HDF Java Products
HDFView changes
• Quality improvements for HDF-java package
• Full documentation of hdf-java object package
• Test suite f...
Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView 2.4 with bug fixes/new
features with HD...
Website Development for
HDF-EOS Tools &
Information Center
Website for HDF-EOS Tools
• THG now manages HDF-EOS web site
• Registered domain names: hdfeos.net/.org/.com
• Re-imp...
Website for HDF-EOS Tools
Other Activities of
Interest
Performance R&D
HDF5 - PnetCDF performance comparison
Flash I/O Benchmark (Checkpoint files)
[Chart series: PnetCDF, HDF5 collective, HDF5 independent]
PnetCDF4 - PnetCDF comparison

[Chart: bandwidth (MB/s), axis range 0 to 160; series PNetCDF collective and NetCDF4 collective]
...
Collective I/O improvements
• HDF5 supports collective IO for non-regular
selections
• Collective IO for chunked storage i...
DOE Labs
Sandia National Laboratory
Lawrence Livermore National Laboratory
DOE ASC* and Others
• Support HDF5 on major systems at Sandia &
Lawrence Livermore National Laboratories
• R&D efforts und...
Flight test
Flight test – collect, then process
Boeing HDF5 for flight test data
• Boeing 787 active archive
• 10 TB per flight-test day

• Must handle raw, real-time dat...
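The two access patterns named on this slide, ingest by "packet" and post-processing by "time-history", can be illustrated by regrouping packet records into per-channel series. This is a minimal sketch, assuming a packet is a (timestamp, {channel: value}) pair; the function name and data layout are hypothetical and are not the HDFpacket or HDFtime_history APIs.

```python
from collections import defaultdict


def packets_to_time_history(packets):
    """Regroup (timestamp, {channel: value}) packets into per-channel series."""
    history = defaultdict(list)
    for timestamp, samples in packets:
        # One packet carries samples for several channels at one instant.
        for channel, value in samples.items():
            history[channel].append((timestamp, value))
    # Sort each series by time in case packets arrived out of order.
    return {channel: sorted(series) for channel, series in history.items()}
```

Ingest appends whole packets as they arrive, while analysis usually wants one channel over a time range, which is why the two layouts differ.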
Product data
STEP
Bioinformatics
caacaagccaaaactcgtacaa
Cgagatatctcttggaaaaact
gctcacaatattgacgtacaag
gttgttcatgaaactttcggta
Acaatcgttgacatt...
C# HDF5 API
for Agilent
Agilent C# project
• Why?
• Heavy use of C# at Agilent
• Compatibility with Matlab
• Other interest in HDF5 at Agilent

• ...
HDF5 Software
Tools & Applications

Fortran C++ Java C#
C API
HDF I/O Library

HDF File
NetCDF 4
NetCDF 4 project
• Enhanced NetCDF-4 Interface to HDF5
• Combine features of netCDF and HDF5
• Take advantage of their sep...
NetCDF-4 Architecture
[Diagram: netCDF-3 applications; netCDF files; netCDF-4 HDF5 files; HDF5 fil...]
Archival formats
• Proposal to NOAA Scientific Data
Stewardship program
• Will investigate use of OAIS “Archive
Informatio...
Asymmetries between
collecting and accessing data
• Huge streams of data
collected …

• To be accessed in little
bits…
Challenge – efficient remote access
• How do we efficiently find and access data
from distributed repositories, when the d...
Example – Storage resource broker
• Storage Resource Broker – repository for
heterogeneous data collections
• Simplifies s...
Normal SRB configuration

[Diagram: client; HDF5; HDF5 File (whole file or a sequence of bytes); SRB Server; MCAT]
OPeNDAP-HDF5 project
• OPeNDAP
• Powerful protocol for remote querying and
subsetting of scientific data
• Replaces direct...
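Remote subsetting with OPeNDAP can be sketched as building a DAP2 request URL in which a constraint expression selects a hyperslab of one variable. This is an illustration only: the server URL and variable name are made up, the helper function is hypothetical, and the sketch assumes the usual DAP2 conventions of a response suffix (such as `.ascii` or `.dods`) and `[start:stride:stop]` index ranges with an inclusive stop.

```python
def dap2_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices holds one (start, stride, stop) tuple per dimension; stop is
    inclusive, following the DAP2 constraint expression convention.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    # Some HTTP clients require the brackets to be percent-encoded.
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset and variable: first 10 rows, every 2nd of 20 columns.
url = dap2_subset_url(
    "http://example.org/opendap/data.h5",
    "temperature",
    [(0, 1, 9), (0, 2, 19)],
)
```

Only the selected elements travel over the network, which is the sense in which remote query and access replaces direct file access.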
OPeNDAP – HDF5 Project
• A NASA ROSES NRA project
• Tasks
• HDF5-DAP2 server (now a prototype)
• HDF5-DAP4 server
• DAP...
SQL Server and HDF5
with Microsoft
SQL Server and HDF5
• Microsoft “dream environment for scientists”
• Combine data management, computing
• SQL Server 2005 ...
HDF5 in SQL server
[Diagram: Visualization; Libraries (MATLAB,…); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; ...]
Thank you all
and
Thank you NASA!
Acknowledgement
This report is based upon work supported in part by a
Cooperative Agreement with NASA under NASA
NNG05GC60...
Questions/comments?
Information Sources
• HDF website
http://hdfgroup.org/

• HDF5 Information Center
http://hdfgroup.org/HDF5/

• HDF Helpdes...
Update on HDF, including recent changes to the software, upcoming releases, collaborations, future plans. Will include an overview of the upcoming HDF5 1.8 release, and updates on the netCDF4/HDF5 merge, HDF5 support for indexing, BioHDF, the HDF5-Storage Resource Broker project, the NPOESS BAA, HDF5-OPeNDAP project, HDF-EOS library and website supports and the HDF spin-off THG.

  • Investigate integrated DAP-aware HDF5 library, that could provide seamless access to both local and remote data
    1. 1. HDF Update Mike Folk The HDF Group HDF and HDF-EOS Workshop X November 29, 2006 HDF
    2. 2. Outline • Organizational info • HDF Software Update • Other Activities of Interest
    3. 3. Organizational info
    4. 4. “The HDF Group” = “THG” Founded Dec. 2006 Went solo July 15, 2006 Non-profit
    5. 5. THG mission To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.
    6. 6. The HDF Team Frank Baker Christian Chilan Peter Cao Vailin Choi Mike Folk Anne Jennings Barbara Jones Quincey Koziol James Laird Raymond Lu John Mainzer Matthew Needham Pedro Nunes Tammi O’Neill Elena Pourmal Binh-minh Ribler Randy Ribler Rishi Sinha Kent Yang And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.
    7. 7. HDF Software Update
    8. 8. HDF4 update
    9. 9. Platforms to be dropped • Operating systems • HPUX 11.00 • Crays SV1 and TS IEEE • AIX 5.1 and 5.2 • SGI IRIX64-6.5 • Linux 2.4 • Solaris 2.7, 2.8, 2.9 • Windows 2000 • MAC OSX 10.3 • Compilers • GNU C compilers older than 3.4 (Linux) • Intel 8.* • PGI V. 5.*, 6.0
    10. 10. Platforms to be added • Systems • MAC OSX 10.4 (Intel) • Solaris 2.* on Intel • Cray XT3 • Windows 64-bit (?) • Linux 2.6 • HPUX 11.23 • IBM Power 5 • Compilers • g95 • PGI V. 6.1 • Intel 9.*
    11. 11. New features • Configuration • Switched to use F77_FUNC macro for better Fortran support (no hard-coded compilers anymore!) • Support for shared libraries • Library • No hard-coded limit on number of opened files • New APIs to control number of files opened by application • Fortran support for SZIP compression
    12. 12. Bug fixes • Tools • A lot of improvements to the hdp, hrepack, hdiff and hdfimport utilities based on users’ feedback • Library • Data corruption bug for several opened unlimited dimension SDSs • Better handling of SDSs with duplicated names in SDgetdimscale and more
    13. 13. HDF5 update
    14. 14. No new releases! • Focus on HDF5 release 1.8 • HDF5-1.8.0 Alpha 5 release is available from: hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html
    15. 15. Platforms to be dropped • Operating systems • HPUX 11.00 • MAC OS 10.3 • AIX 5.1 and 5.2 • SGI IRIX64-6.5 • Linux 2.4 • Solaris 2.8 and 2.9 • Compilers • GNU C compilers older than 3.4 (Linux) • Intel 8.* • PGI V. 5.*, 6.0 • MPICH 1.2.5 http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
    16. 16. Platforms to be added • Systems • Alpha Open VMS • MAC OSX 10.4 (Intel) • Solaris 2.* on Intel (?) • Cray XT3 • Windows 64-bit (32-bit binaries) • Linux 2.6 • BG/L • Compilers • g95 • PGI V. 6.1 • Intel 9.* • MPICH 1.2.7 • MPICH2
    17. 17. New Features in HDF5 1.8
    18. 18. HDF5 1.8 new library features • Datatype and dataspace features • Serialized dataspaces and datatypes • Ability to create data type from text description • Integer to float conversions during I/O • Revised exception handling during type conversion • Compact storage for N-bit data types • Offset+size storage filter, saving space • “Null” dataspace – datasets with no elements • Data transformation filter
    19. 19. HDF5 1.8 – new library features • Group revisions • Creation order access • Compact groups – small groups take less space • Large group storage improvements • Intermediate group creation
    20. 20. HDF5 1.8 – new library features • Link improvements • External links -- can refer to objects in another file • User defined links – apps create own kinds of links • Attribute improvements • Storage improvements for large numbers of attributes • Iterate or look up by creation order
    21. 21. HDF5 1.8 – new library features • Support for Unicode UTF-8 character set • Shared header info – duplicate header info shared, possibly saving space • Metadata cache improvements – faster I/O on files with many objects • Data transformation filter • Stackable Virtual File Drivers • Better UNIX/Linux portability
    22. 22. HDF5 1.8– new APIs • New extendible error-handling API • New APIs to copy objects between files fast • Dimension scale model and API • “HDFpacket” – API to read/write packets efficiently
    23. 23. HDF5 1.8 – backward and forward compatibility
    24. 24. HDF5 1.8 vs. 1.6.5 • Differences between 1.8 vs. 1.6.5 • Some file format changes • Several new routines added • Old APIs deprecated -- removed in later release • Consequences • Application requiring 1.8 format changes will write objects that 1.6.5 library cannot read • To exploit 1.8 changes, apps need to be rewritten
    25. 25. Principle of “Maximum file format compatibility” Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information. Assures forward compatibility with the older versions whenever possible – objects in new files can be read with old libraries if those objects are “known” to the old libraries.
    26. 26. Command line tools
    27. 27. New features for old tools • h5dump • Dump data in binary format • h5diff • Compare dataset regions • Parallel h5diff (ph5diff) • Compare two files in MPI parallel environment • h5repack • Efficient data copy using H5Gcopy() • Able to handle big datasets
    28. 28. New HDF5 Tools • h5copy • Copies a group, dataset or named datatype from one location to another location • Copies within a file or across files • h5check • Verifies an HDF5 file against the defined HDF5 File Format Specification • h5stat • Reports statistics about a file and objects in a file
    29. 29. HDF Java Products
    30. 30. HDFView changes • Quality improvements for HDF-java package • Full documentation of hdf-java object package • Test suite for hdf-java object package • Support 64-bit Java on Linux and Solaris • Many new features, including • Change font size easily • Grab and move image • Create new table (compound dataset) from template • Filter out fill value for image creation • -geometry option for very high resolution displays
    31. 31. Future work for Java • Update HDF5 JNI APIs for HDF5 1.8 release • Release HDFView 2.4 with bug fixes/new features with HDF5 1.8 release • New GUI features dealing with table, image and animation • Writing capability for HDF5-SRB model
    32. 32. Website Development for HDF-EOS Tools & Information Center
    33. 33. Website for HDF-EOS Tools • THG now manages HDF-EOS web site • Registered domain names: hdfeos.net/.org/.com • Re-implemented major topic areas • Re-designed interface • Registered google search • Will continue maintenance • Phase two • Host mailing list • Support simple forum features
    34. 34. Website for HDF-EOS Tools
    35. 35. Other Activities of Interest
    36. 36. Performance R&D
    37. 37. HDF5 - PnetCDF performance comparison Flash I/O Benchmark (Checkpoint files) [Chart: bandwidth in MB/s (0 to 2500) versus number of processors (10 to 310) on Power 5; series PnetCDF, HDF5 collective, HDF5 independent.] I/O performance of PnetCDF is comparable with parallel HDF5 when the libraries are used in similar manners.
    38. 38. PnetCDF4 - PnetCDF comparison [Chart: bandwidth in MB/s (0 to 160) versus number of processors (0 to 144); series PNetCDF collective, NetCDF4 collective.] I/O performance of parallel NetCDF4 is comparable with PnetCDF, about 15% slower on average for the output of the ROMS history file.
    39. 39. Collective I/O improvements • HDF5 supports collective IO for non-regular selections • Collective IO for chunked storage is not trivial. • Non-regular selection performance optimizations: • Added IO options to achieve good collective IO performance • Added APIs for applications to participate in the optimization process • See the poster
    40. 40. DOE Labs Sandia National Laboratory Lawrence Livermore National Laboratory
    41. 41. DOE ASC* and Others • Support HDF5 on major systems at Sandia & Lawrence Livermore National Laboratories • R&D efforts underway • File recovery after a crash • Very fast write speed – goal is 300 MB/sec • Read-while-writing capability • Java library and HDFView improvements * Advanced Scientific Computing project
    42. 42. Flight test
    43. 43. Flight test – collect, then process
    44. 44. Boeing HDF5 for flight test data • Boeing 787 active archive • 10 TB per flight-test day • Must handle raw, real-time data • High speed ingest, by “packet” • Post-processing, by “time-history” • Boeing High Level API’s • HDFpacket – released with HDF5 1.8 • HDFtime_history – new, open version likely
    45. 45. Product data STEP
    46. 46. Bioinformatics caacaagccaaaactcgtacaa Cgagatatctcttggaaaaact gctcacaatattgacgtacaag gttgttcatgaaactttcggta Acaatcgttgacattgcgacct aatacagcccagcaagcagaat Managing genomic data
    47. 47. C# HDF5 API for Agilent
    48. 48. Agilent C# project • Why? • Heavy use of C# at Agilent • Compatibility with Matlab • Other interest in HDF5 at Agilent • What? • Prototype API in C# for Windows XP • Basic functions to create, open, close, read, write • Limited datatypes, no partial I/O • When? • March 2007
    49. 49. HDF5 Software Tools & Applications Fortran C++ Java C# C API HDF I/O Library HDF File
    50. 50. NetCDF 4
    51. 51. NetCDF 4 project • Enhanced NetCDF-4 Interface to HDF5 • Combine features of netCDF and HDF5 • Take advantage of their separate strengths • Collaboration between NCSA, THG, Unidata • Currently in Alpha Release • Waiting for beta release
    52. 52. NetCDF-4 Architecture [Diagram components: netCDF-3 applications; netCDF-4 applications; HDF5 applications; netCDF-3 Interface; netCDF-4 Library; HDF5 Library; netCDF files; netCDF-4 HDF5 files; HDF5 files.] • Supports access to netCDF files and HDF5 files created through netCDF-4 interface
    53. 53. Archival formats • Proposal to NOAA Scientific Data Stewardship program • Will investigate use of OAIS “Archive Information Package” standard with HDF5 • PI: Ruth Duerr (NSIDC) and Kent Yang OAIS: Open Archival Information System
    54. 54. Asymmetries between collecting and accessing data
    55. 55. • Huge streams of data collected … • To be accessed in little bits…
    56. 56. Challenge – efficient remote access • How do we efficiently find and access data from distributed repositories, when the data are big and complex? • Storage Resource Broker (SRB) • Efficient access to HDF5 objects in repository • OPeNDAP • Powerful protocol for remote querying and subsetting of scientific data
    57. 57. Example – Storage resource broker • Storage Resource Broker – repository for heterogeneous data collections • Simplifies storage, query and access to massive amounts of scientific data • Has data in HDF5, netCDF, other formats
    58. 58. Normal SRB configuration client HDF5 HDF5 File (whole file or a sequence of bytes) SRB Server MCAT
    59. 59. OPeNDAP-HDF5 project • OPeNDAP • Powerful protocol for remote querying and subsetting of scientific data • Replaces direct file access with remote query and access • Widely used in Earth Sciences
    60. 60. OPeNDAP – HDF5 Project • A NASA ROSES NRA project • Tasks • HDF5-DAP2 server (now a prototype) • HDF5-DAP4 server • DAP4 to HDF5 conversion utility • Investigate integrated DAP-aware HDF5 library
    61. 61. SQL Server and HDF5 with Microsoft
    62. 62. SQL Server and HDF5 • Microsoft “dream environment for scientists” • Combine data management, computing • SQL Server 2005 solution • Combine RDBMS with scientific analysis tools, together in one integrated system. • HDF5 & other formats manage scientific objects
    63. 63. HDF5 in SQL server Visualization Libraries (MATLAB,…) Web Services (XML, REST, RSS) OLAP and Data Mining Reporting .NET Languages with Language Integrated Query Entity Framework (EDM, eSQL, O-R mapping) HDF5 EDM model SQL Server HDF5 HDF5 TVFs Index HDF5 type HDF5 FS blob HDF5 files
    64. 64. Thank you all and Thank you NASA!
    65. 65. Acknowledgement This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
    66. 66. Questions/comments?
    67. 67. Information Sources • HDF website http://hdfgroup.org/ • HDF5 Information Center http://hdfgroup.org/HDF5/ • HDF Helpdesk hdfhelp@hdfgroup.org • HDF users mailing list hdfnews@ncsa.uiuc.edu coming soon: news@hdfgroup.org
