HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop XI
November 7, 2007

02/18/14

The HDF Group

Outline
• What is The HDF Group?
• HDF Software Update
• Other Activities of Interest

What is The HDF Group (THG)?

THG, the Company
• Spun-off from University of Illinois July 2006
• Non-profit
• 20+ scientific, technology, professional staff
• Intellectual property:
− THG owns HDF4 and HDF5
− HDF formats and libraries to remain open
− Libraries have BSD-type license
• Continue ties to U of I and NCSA

The mission of The HDF Group
is to ensure long-term
accessibility of HDF data through
sustainable development and
support of HDF technologies.

Goals
• Maintain, evolve HDF for sponsors and
communities that depend on it
• Do consulting, training, tuning, development,
research
• Sustain The HDF Group for long term to assure
data access over time

THG Services
• Helpdesk and Mailing Lists
− Available to all users as a first level of support
• Standard Support
− Rapid issue resolution support
• Consulting
− Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
− Coordinating HDF activities across divisions
• Special Projects
− Adapting customer applications to HDF
− New features and tools, with changes normally incorporated into open source product
− Research and Development
• Training
− Tutorials and hands-on practical experience

HDF Software Update

HDF4 update

HDF 4.2r2
Released in October

New features and changes
• New APIs added to the SD and GR interfaces:
− SDreset_maxopenfiles, SDget_maxopenfiles: Modify and report the
maximum allowable number of open files
− SDget_numopenfiles: Gets number of open files
− SDgetcompinfo, GRgetcompinfo: Get compression info
− SDgetfilename: Retrieves name of file, given its ID
− SDgetnamelen: Retrieves length of object name, given its ID

• SZIP compression
− Now can be invoked by Fortran API
− Now available for raster images via GR interface

• SDS, Vgroup names no longer limited to 64 characters

New features and changes
• HDF configuration changes
− --enable-netcdf flag introduced
− Autotools versions updated

• Many bug fixes made to hrepack and hdiff
• See RELEASE.txt for a full list of changes

Platforms to drop/add next release
• Drop
− Windows XP with MSVC++ 6.0
− Linux 2.4
− IRIX64 6.5
− SunOS 5.8, 5.9
• Add
− Windows 64-bit (32 and 64-bit binaries)

Platforms tested
• Systems
− AIX 5.3 (32-bit, 64-bit)
− FreeBSD 6.2 (32-bit, 64-bit)*
− HP-UX B.11.23 (32-bit, 64-bit)*
− IRIX 64 v6.5 (32-bit, 64-bit)
− Linux 2.4, 2.6*
− Linux ia64
− Linux x86_64
− SunOS 5.8, 5.10* (32-bit, 64-bit)
− SunOS 5.10 on Intel
− Windows XP, Vista
− Mac OS X Intel*
• Compilers
− IBM C and Fortran compilers
− GNU gcc 3.4* and GNU Fortran
− HPUX C and Fortran compilers
− GNU gcc 3.4 and 4.*
− Intel C and Fortran versions 9.1 and 10.00
− SUN WorkShop C and Fortran
− Visual Studio .NET and 2005 and Intel Fortran
− Visual Studio 2005 (no Fortran)
− GNU gcc 4.0.1 with gfortran and g95

* New platforms
For detailed info, see RELEASE.txt

HDF5 Update

HDF5 1.6.6

HDF5 1.6.6 release
• Primarily a bug-fix release
• Some tool changes (see later slide)
• http://hdfgroup.org/HDF5/release/obtain5.html

Platforms dropped
• Operating systems
− AIX 5.3
− Solaris 2.8 and 2.9
− OSF1
− Windows XP with MSVC++ 6.0
• Compilers
− PGI 6.5-*

http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

Platforms added
• Systems
− Alpha Open VMS
− MAC OSX 10.4 (Intel)
− Solaris 2.* on Intel
− Cray XT3
− Windows 64-bit (32 and 64-bit)
− BG/L
• Compilers
− PGI V. 7.*
− Intel 10.*
− MPICH 1.2.7
− MPICH2

HDF5 1.8

HDF5 1.8 new library features
• Datatype and dataspace features
− Create datatype from text description
− Integer to float conversions during I/O
− Compact storage for N-bit datatypes
− Offset+size storage filter, saving space
− “Null” dataspace – datasets with no elements
− Data transformation filter

HDF5 1.8 – new library features
• Group improvements
− Creation order access
− Compact groups – small groups take less space
− Large group storage improvements
− Intermediate group creation

• Link improvements
− Unicode names allowed
− External links – to objects in another file
− User defined links – create own kinds of links

HDF5 1.8 – new library features
• Attribute improvements
− Improved storage for large number of attributes
− Iterate or look up by creation order
− Unicode names allowed

• Support for Unicode UTF-8 character set
• Shared header information, possibly saving space
• Metadata cache improvements – faster I/O on
files with many objects
• Better UNIX/Linux portability

HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files quickly
• Dimension scale model and API
• “HDFpacket” API, to read/write packets efficiently

HDF5 1.8 – Backward and
Forward Compatibility

HDF5 1.8 and 1.6
• Differences between 1.8 and 1.6.x
− Some file format changes
− Several new routines added
− Old APIs deprecated – may be removed in later
release

• Consequences
− Applications requiring 1.8 format changes will
generate objects that cannot be read by 1.6 library
− To exploit 1.8 changes, applications need to be
rewritten

“The art of progress is to
preserve order amid change, and
to preserve change amid order.”
Alfred North Whitehead

Principle of
Maximum File Format Compatibility
Unless instructed otherwise, the HDF5 library will write objects
using the earliest version of the format possible for describing
the information.
Assures older library versions are forward compatible whenever
possible:
− Objects in new files can be read with old versions of the library,
if the objects are “known” to the old libraries.
− New versions of the library can always read objects in files
written with older versions.
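
The principle above can be pictured as a simple selection rule. The sketch below is a toy model, not the HDF5 library's actual logic: the feature names and version numbers in `FEATURE_MIN_VERSION` are hypothetical and chosen only for illustration.

```python
# Toy model of "write with the earliest format version that can describe
# the object". Feature names and version numbers are hypothetical.
FEATURE_MIN_VERSION = {
    "basic_dataset": 1,
    "creation_order": 2,
    "external_link": 2,
}


def earliest_version(features, use_latest=False):
    """Return the lowest format version able to describe all features."""
    if use_latest:
        # The caller explicitly asked for the newest format.
        return max(FEATURE_MIN_VERSION.values())
    return max((FEATURE_MIN_VERSION[f] for f in features), default=1)


def readable_by(library_max_version, object_version):
    """An old library can read an object only if it knows its version."""
    return object_version <= library_max_version


if __name__ == "__main__":
    plain = earliest_version(["basic_dataset"])
    linked = earliest_version(["basic_dataset", "external_link"])
    print(plain, readable_by(1, plain))    # object an old library knows
    print(linked, readable_by(1, linked))  # object an old library does not know
```

In this model an application that never uses a newer feature keeps producing objects that the older library can still read, which is the behavior the slide describes.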

Command Line Tools

New features for existing tools
• -V option for all tools
− Prints HDF5 library version number used by tool

• h5repack: -L option
− Use latest version of file format to create objects

• h5dump: dumps groups/attributes in creation or
name order
− -q Q, --sort_by=Q Sort groups and attributes by index Q
− -z Z, --sort_order=Z Sort groups and attributes by order Z
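
To illustrate what a sort index (Q) and a sort order (Z) control, here is a small sketch. It does not call h5dump; the member list is made up, and the value names `name`, `creation_order`, `ascending` and `descending` are assumptions used for the illustration.

```python
# Illustrative only: mimics the effect of a sort index (Q) and a sort
# order (Z) on the members of a group. The member list is made up.
def sort_members(members, sort_by="name", sort_order="ascending"):
    """Sort (name, creation_index) tuples by the chosen index and order."""
    key_pos = {"name": 0, "creation_order": 1}[sort_by]
    reverse = {"ascending": False, "descending": True}[sort_order]
    return sorted(members, key=lambda m: m[key_pos], reverse=reverse)


if __name__ == "__main__":
    group = [("zeta", 0), ("alpha", 1), ("mid", 2)]
    print([n for n, _ in sort_members(group)])
    print([n for n, _ in sort_members(group, "creation_order", "descending")])
```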

New command line tools
• h5mkgrp
− Creates new groups and group hierarchies in an HDF5 file

• h5stat
− Provides statistics regarding the file, such as number of
objects per group, sizes of datasets, amount of free space in
file

• h5copy
− Copies objects within a file or across files

• h5check
− Verifies an HDF5 file against the defined HDF5 File Format
Specification
− Completed for 1.6.
− In progress for 1.8

Tool work in the pipeline
• Export numeric data formatted in several different
ways (such as MS Excel, XML, etc.)
• Import ASCII data that conforms to a certain format
• Use a common text format for h5import and
h5dump
• Support NaN in tools such as h5diff.
Challenges:
− NaN is platform specific
− NaN can have different values for the same
machine
− Checking NaN can be a performance hit
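
The NaN challenges listed above can be reproduced with the standard library alone. The sketch below is not h5diff code; the helper names are hypothetical. It shows that two different bit patterns both decode to NaN, that NaN never compares equal to itself, and that a NaN-aware comparison needs an extra check per element, which is where the performance cost comes from.

```python
import math
import struct


def bits_to_float(bits):
    """Reinterpret a 64-bit integer pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def values_equal(a, b):
    """Compare two floats, treating any NaN as equal to any other NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


# Two different bit patterns that both decode to NaN.
NAN_A = bits_to_float(0x7FF8000000000000)
NAN_B = bits_to_float(0x7FF8000000000001)

if __name__ == "__main__":
    print(math.isnan(NAN_A), math.isnan(NAN_B))
    print(NAN_A == NAN_A)              # plain equality fails for NaN
    print(values_equal(NAN_A, NAN_B))  # NaN-aware comparison succeeds
```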

HDF Java Products

HDF5 Java is Growing UP

HDFView changes
• HDFView 2.4 released
• Many new features, such as
− Support for compound datatypes of 2D+ arrays
− Support for "filtering fill value" in Image Viewer
− Effective handling of large 3D images
− Support large fonts in GUI components
− New autogain algorithm for image Brightness/Contrast

• New platforms
− Mac Intel
− Linux 64-bit AMD
− Solaris 64-bit

Other Java products
• 36 new enhancements and 44 bugs fixed
• Test suite (using junit testing framework)
− Tests all public methods in the object package
− Added “make check” to run the test suite

• Enhanced documentation
− All public methods in the object package are fully
documented

Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView with bug fixes/new features
with HDF5 1.8 release
• Port HDF5-SRB model to HDF5-iRODS model
• Writing capability for HDF5-iRODS model

Other Activities of Interest

New THG Website

HDF Performance
Framework

Goals
• A framework for performance regression testing
• A tool for
− Testing on multiple platforms
− Testing different versions
− Long term regression testing
− Assistance in debugging

Solution

[Diagram. Labels: HDF5 1.6, HDF5 1.8, cron, a user’s benchmark, performance library, database, PHP web server (www), graph/text output]

Sample Usage
H5Perf_startTimer(&time);
for (i = 0; i < 1000; i++) {
    /* Add groups */
    H5Gcreate(fileid, group_name, (size_t)0);
}
H5Perf_endTimer(&time);
H5Perf_addInstance(db_host, date, time);

Cron entry (runs daily at 21:00):
00 21 * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh

Sample database record (fields labeled on the slide: Timestamp, Instance Name, Version, Platform, Time):
| 178820 | 2007-08-17 21:51:14 | 10000 groups | creating 10000 empty groups | 1.8.0 | hdfdap | 0.670198 | 4384 |

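The `H5Perf_*` calls on this slide belong to the framework itself, and their exact signatures are not given here. As a rough analogue of the same flow (time a workload, then store one record per run), here is a sketch under stated assumptions: the function names and the table schema are hypothetical, and an in-memory SQLite database stands in for the framework's database.

```python
import sqlite3
import time


def open_results_db():
    """Create an in-memory results table (the schema is hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE runs (stamp TEXT, instance TEXT, version TEXT,"
        " platform TEXT, seconds REAL)"
    )
    return db


def timed(workload):
    """Run workload() and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start


def add_instance(db, instance, version, platform, seconds):
    """Store one timing record, stamped with the current local time."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    db.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
        (stamp, instance, version, platform, seconds),
    )
    db.commit()


if __name__ == "__main__":
    db = open_results_db()
    # Stand-in workload; a real benchmark would create HDF5 groups here.
    elapsed = timed(lambda: [str(i) for i in range(1000)])
    add_instance(db, "building 1000 names", "demo", "localhost", elapsed)
    print(db.execute("SELECT instance, seconds FROM runs").fetchall())
```

Running such a script from cron on each platform and library version would accumulate the history that regression testing needs.
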
Improved Crash
Survivability
in the HDF5 Library

Crash Survivability in HDF5
• Problem:
− Data in HDF5 files susceptible to corruption in the
event of an application or system crash.
− Corruption possible if structural metadata is being
written when the crash occurs.

• Initial Objective:
− Guarantee an HDF5 file with consistent metadata
can be reconstructed in the event of a crash.
− No guarantee on state of raw data – contains
whatever made it to disk prior to crash.

Crash Survivability in HDF5
• Approach: Metadata Journaling
− When a piece of metadata is modified and in a
consistent state, make a journal note.
− If the application crashes, a recovery program can
replay the journal by applying in order all metadata
writes until the end of the last completed
transaction written to the journal file.
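
The replay step can be sketched in a few lines. This is a minimal model of the idea, not the HDF5 implementation: the journal record layout (`begin`, `write`, `end` tuples) is hypothetical, and metadata is reduced to a dictionary.

```python
def replay(journal):
    """Apply journaled metadata writes up to the last completed transaction.

    journal is a list of tuples:
      ("begin", txn_id), ("write", txn_id, key, value), ("end", txn_id)
    Writes from a transaction with no "end" record are discarded.
    """
    metadata = {}
    pending = {}
    for record in journal:
        kind, txn = record[0], record[1]
        if kind == "begin":
            pending[txn] = []
        elif kind == "write":
            pending[txn].append((record[2], record[3]))
        elif kind == "end":
            # The transaction is complete: apply its writes in order.
            for key, value in pending.pop(txn):
                metadata[key] = value
    return metadata


if __name__ == "__main__":
    # Transaction 2 never reached its "end" record (simulated crash).
    journal = [
        ("begin", 1), ("write", 1, "group_count", 1), ("end", 1),
        ("begin", 2), ("write", 2, "group_count", 2),
    ]
    print(replay(journal))
```

Because the incomplete transaction is dropped as a whole, the recovered metadata is consistent even though it may be older than what the application last wrote.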

Faster HDF5 Data Appends

Fast Data Appends
• Problem: Metadata operations limit the rate at
which HDF5 can append data to datasets.
• Solution: new data structure for indexing chunks:
− Allows constant time extend, shrink and lookup of
chunks in datasets with single unlimited dimension
− # of metadata I/O operations to append to dataset
is independent of # of chunks
− Allows single-writer/multiple-reader access

• Details at:
http://www.hdfgroup.uiuc.edu/RFC/HDF5/SkipListChunkIndex/SkipListChunkIndex.html
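
To see why a single unlimited dimension permits constant-time chunk operations, note that the chunk number follows directly from the element offset. The sketch below is a toy flat-array index written for illustration only; it is not the data structure described in the RFC, and the class name and file addresses are hypothetical.

```python
class FlatChunkIndex:
    """Toy index: position i holds the file address of chunk i."""

    def __init__(self, chunk_len):
        self.chunk_len = chunk_len  # elements per chunk
        self.addresses = []         # one entry per allocated chunk

    def append_chunk(self, address):
        """Extend the dataset by one chunk (amortized constant time)."""
        self.addresses.append(address)

    def shrink(self):
        """Drop the last chunk in constant time and return its address."""
        return self.addresses.pop()

    def lookup(self, element_offset):
        """Map an element offset to (chunk address, offset within chunk)."""
        chunk_no, within = divmod(element_offset, self.chunk_len)
        return self.addresses[chunk_no], within


if __name__ == "__main__":
    index = FlatChunkIndex(chunk_len=100)
    for address in (4096, 8192, 12288):
        index.append_chunk(address)
    print(index.lookup(250))
```

None of the three operations depends on how many chunks already exist, which is the property the slide claims for the new index.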

netCDF-4

netCDF-4 Project
• Enhanced NetCDF-4 Interface to HDF5
− Combine features of netCDF and HDF5
− Take advantage of their separate strengths

• Collaboration between NCSA, THG, Unidata
• Currently in beta release
• Will be released after HDF5 1.8

NetCDF-4 Architecture
[Architecture diagram. Labels: netCDF-3 applications, netCDF-4 applications, HDF5 applications; netCDF-3 interface, netCDF-4 library, HDF5 library; netCDF files, netCDF-4 HDF5 files, HDF5 files]

• Supports access to netCDF files and HDF5
files created through netCDF-4 interface

HDF5 OPeNDAP
Project

Project description
• Investigate integrated DAP-aware HDF5 library
that can provide seamless access to both
local and remote data
• A NASA ROSES NRA project
• See Kent Yang’s talk and poster

NOAA – Science Data
Stewardship

NOAA – Science Data Stewardship
• Use HDF5 Archival Information Package (AIP) to
archive HDF-EOS2 data
• A collaboration between NSIDC and THG
• See Ruth Duerr and Kent Yang’s poster

HDF5 and .NET
Framework

Why .NET?
• The Microsoft .NET framework is used by most
new applications created for Windows.
− Makes it easier to develop applications
− Reduces application vulnerability to security threats

• Supports development in multiple programming
languages, in particular C#.
• Increased level of interest in .NET from users of
HDF5.

HDF and .NET Status
• Received funding to implement prototype .NET
wrapper API for Windows XP
− Based on HDF5 C API
− Focus on C# binding
− Functionality limited to subset of API routines

• If funded, we would like to move beyond the
prototype to
− Create .NET wrappers for all HDF C functions
− Offer full support for .NET wrappers with HDF5 1.8

Bioinformatics
caacaagccaaaactcgtacaa
Cgagatatctcttggaaaaact
gctcacaatattgacgtacaag
gttgttcatgaaactttcggta
Acaatcgttgacattgcgacct
aatacagcccagcaagcagaat

Managing genomic data

Electron tomography

25-80Å resolution
4k x 4k x 500 images now
8k x 8k x 1k images soon (256 GB)
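
The 256 GB figure can be checked with a short calculation. It works out exactly if "k" means 1024 and each voxel takes 4 bytes; both are assumptions, since the slide does not state the sample size.

```python
K = 1024          # assumed meaning of "k" on the slide
GIB = 1024 ** 3   # bytes per binary gigabyte


def volume_bytes(nx, ny, nz, bytes_per_voxel=4):
    """Raw size in bytes of an nx x ny x nz volume (4-byte voxels assumed)."""
    return nx * ny * nz * bytes_per_voxel


if __name__ == "__main__":
    now = volume_bytes(4 * K, 4 * K, 500)
    soon = volume_bytes(8 * K, 8 * K, 1 * K)
    print(now / GIB, soon / GIB)
```

Under the same assumptions, the current 4k x 4k x 500 volumes come to about 31 GB each.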

Sequencing

• Next Gen Sequencing platforms produce ~1500 X more data than
CE (Sanger)
• A single Next Gen instrument can produce 20 times more data in a
single run than a day’s operation of a genome center with 100 CE
instruments

An email on Sept 21…

“… A little background, we're doing genetic
association studies, these result in large 2-d matrices
(40K x 1M before applying threshholds). Each of
the cells in this matrix has ~10 numerical
statistics (e.g. some sort of pvalue)… ”
40K x 1M x 10 x 4 = 1,600,000,000,000 (1.6 TB)

Product Data
STEP

Product data
• HDF5 proposed to ISO as binary representation
for product data representation and exchange
• Would be a binary option to the STEP format
• ISO/NWI-CD 10303-026, STEP Part 26

SQL Server and HDF5

SQL Server and HDF5
• THG discussing possible project with Microsoft
• Microsoft envisions a dream environment for
scientists that would encompass both computing
and data management
• Possible SQL Server solution
− Combine RDBMS and scientific analysis tools in a
single integrated system
− Use HDF5 to manage scientific objects not handled
well by traditional database

HDF5 in SQL server
[Architecture diagram. Labels: Visualization; Libraries (MATLAB, …); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; .NET Languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server; HDF5 TVFs; Index; HDF5 type; HDF5 FS blob; HDF5 files]

Thank You All
and
Thank You NASA!

Acknowledgement
This report is based upon work supported in part by a
Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and conclusions
or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the
views of the National Aeronautics and Space
Administration.

Questions/comments?

Information Sources
• HDF website
http://hdfgroup.org/

• HDF5 Information Center
http://hdfgroup.org/HDF5/

• HDF Helpdesk
hdfhelp@hdfgroup.org

• HDF users mailing list
hdfnews@ncsa.uiuc.edu
coming soon: news@hdfgroup.org



Editor's Notes

  • #4 Why: Increasing need for support, services, quick response; not a good model for a University R&D project. Who: 11 software engineers and several students (develop, maintain HDF software, work on special projects, manage projects); 3 tech support staff (helpdesk, doc, sysadmin). Management team: President; Director of Technical Services and Operations; Director of Software Development; Director of Business Operations; managers responsible for tools, applications. Other THG staff include seven full-time software engineers who develop and maintain the HDF software, as well as working on special projects, and three technical support staff who provide helpdesk support, documentation, and system administration. The HDF group also generally employs students from the University Computer Science and Engineering departments.
  • #6 The R&D mission: Maintain and evolve HDF for high end science apps. Maintain HDF4 and HDF5 and tools at supercomputing centers, TeraGrid. Support academic science. Cutting edge data management research. Adapt to leading edge, experimental architectures. Integrate with new middleware technologies, parallel file systems. The “Support and Sustain” mission: Maintain, evolve for communities, sponsors. Provide proprietary consulting, tuning, development. Sustain for long term, maintain data access over time.
  • #29 I get all mixed up with the terms backward & forward compatibility. I did a little investigation on the definitions and use in talking with Frank about his compatibility matrix a while back and still don’t have a good grasp of what is meant… my conclusion was there is no consistent use. It seems most, like MathWorks, use “compatibility” without the forward/backward words. I made a change here… is this what you meant in the original? And, I don’t know if it’s worth saying but: new versions can always read objects in files written with older versions (unless there’s a bug in the writer!). Then we’ll offer the best solution we can.
  • #50 Maybe Objective bullets do belong on later slide… not sure.
  • #53 Is it only limited for unlimited / chunked datasets? Or is it that way for all but we’re just fixing it for limited / unchunked cases? Contrasts with B-tree index: - B-tree has O(log n) extend, shrink and lookup of chunks - B-tree has ~logarithmic # of metadata I/O operations as chunks appended Will be optimizing chunked dataset indexing for datasets with no unlimited dimensions (with array index) and multiple unlimited dimensions (with v2 B-tree) as part of project in the next year also.
  • #63 I’ve changed this considerably. I don’t think it’s necessary to say who has funded work to date, exactly what that entails, or that the prototype is available. The important message (to me) is we have experience & interest in this area. And, willing to do more if it’s funded. If not, then that’s the end of the story.
  • #71 First bullet – let them know it may or may not happen… not a done deal. Not sure I got the “translation” from first version of text to this one right… Dropped “& other formats” (let them give those presentations)