HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop XI
November 7, 2007

02/18/14

The HDF Group

Update on HDF, including recent changes to the software, new releases, THG collaborations, and future plans. The session will include an overview of the HDF 4.2r2, HDF5 1.6.6, and HDF5 1.8.0 releases, as well as updates on completed and ongoing THG projects, including crash-proofing HDF5, efficient appends to HDF5 datasets, and indexing in HDF5.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
277
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

Speaker notes
  • Why
    Increasing need for support, services, quick response
    Not a good model for a university R&D project
    Who
    11 software engineers and several students: develop, maintain HDF software, work on special projects, manage projects
    3 tech support staff: helpdesk, doc, sysadmin.
    Management team
    President
    Director of Technical Services and Operations
    Director of Software Development
    Director of Business Operations
    Managers responsible for tools, applications
    Other THG staff include seven full-time software engineers who develop and maintain the HDF software and work on special projects, as well as three technical support staff who provide helpdesk support, documentation, and system administration. The HDF Group also generally employs students from the University's Computer Science and Engineering departments.
  • The R&D mission
    Maintain and evolve HDF for high end science apps
    Maintain HDF4 and HDF5 and tools at supercomputing centers, TeraGrid
    Support academic science
    Cutting edge data management research
    Adapt to leading edge, experimental architectures
    Integrate with new middleware technologies, parallel file systems
    The “Support and Sustain” mission
    Maintain, evolve for communities, sponsors
    Provide proprietary consulting, tuning, development
    Sustain for long term, maintain data access over time
  • I get all mixed up with the terms backward & forward compatibility. I did a little investigation on the definitions and their use while talking with Frank about his compatibility matrix a while back, and I still don’t have a good grasp of what is meant… my conclusion was that there is no consistent use. It seems most, like MathWorks, use “compatibility” without the forward/backward words. I made a change here… is this what you meant in the original?
    And I don’t know if it’s worth saying, but new versions can always read objects in files written with older versions (unless there’s a bug in the writer!). Then we’ll offer the best solution we can.
  • Maybe Objective bullets do belong on later slide… not sure.
  • Is it only limited for unlimited / chunked datasets? Or is it that way for all but we’re just fixing it for limited / unchunked cases?
    Contrasts with B-tree index:
    - B-tree has O(log n) extend, shrink and lookup of chunks
    - B-tree has ~logarithmic # of metadata I/O operations as chunks appended
    Will also be optimizing chunked dataset indexing for datasets with no unlimited dimensions (with an array index) and with multiple unlimited dimensions (with a v2 B-tree) as part of the project next year.
  • I’ve changed this considerably. I don’t think it’s necessary to say who has funded work to date, exactly what that entails, or that the prototype is available. The important message (to me) is that we have experience & interest in this area and are willing to do more if it’s funded. If not, then that’s the end of the story.
  • First bullet – let them know it may or may not happen… not a done deal
    Not sure I got the “translation” from first version of text to this one right…
    Dropped “& other formats” (let them give those presentations)
Outline
• What is The HDF Group?
• HDF Software Update
• Other Activities of Interest

What is The HDF Group (THG)?

THG, the Company
• Spun off from the University of Illinois in July 2006
• Non-profit
• 20+ scientific, technology, and professional staff
• Intellectual property:
− THG owns HDF4 and HDF5
− HDF formats and libraries to remain open
− Libraries have a BSD-type license
• Continue ties to U of I and NCSA

The mission of The HDF Group is to ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.

Goals
• Maintain and evolve HDF for sponsors and communities that depend on it
• Do consulting, training, tuning, development, and research
• Sustain The HDF Group for the long term to assure data access over time

THG Services
• Helpdesk and Mailing Lists
− Available to all users as a first level of support
• Standard Support
− Rapid issue resolution support
• Consulting
− Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
− Coordinating HDF activities across divisions
• Special Projects
− Adapting customer applications to HDF
− New features and tools, with changes normally incorporated into the open source product
− Research and development
• Training
− Tutorials and hands-on practical experience

HDF Software Update

HDF4 Update

HDF 4.2r2
Released in October

New features and changes
• New APIs added to the SD and GR interfaces:
− SDreset_maxopenfiles, SDget_maxopenfiles: modify and report the maximum allowable number of open files
− SDget_numopenfiles: gets the number of open files
− SDgetcompinfo, GRgetcompinfo: get compression information
− SDgetfilename: retrieves the name of a file, given its ID
− SDgetnamelen: retrieves the length of an object name, given its ID
• SZIP compression
− Can now be invoked through the Fortran API
− Now available for raster images via the GR interface
• SDS and Vgroup names no longer limited to 64 characters

New features and changes
• HDF configuration changes
− --enable-netcdf flag introduced
− Autotools versions updated
• Many bug fixes made to hrepack and hdiff
• See RELEASE.txt for a full list of changes

Platforms to drop/add next release
• Drop
− Windows XP with MSVC++ 6.0
− Linux 2.4
− IRIX64 6.5
− SunOS 5.8, 5.9
• Add
− Windows 64-bit (32- and 64-bit binaries)

Platforms tested
• Systems
− AIX 5.3 (32-bit, 64-bit)
− FreeBSD 6.2 (32-bit, 64-bit)*
− HP-UX B.11.23 (32-bit, 64-bit)*
− IRIX64 6.5 (32-bit, 64-bit)
− Linux 2.4, 2.6*
− Linux ia64
− Linux x86_64
− SunOS 5.8, 5.10* (32-bit, 64-bit)
− SunOS 5.10 on Intel
− Windows XP, Vista
− Mac OS X Intel*
• Compilers
− IBM C and Fortran compilers
− GNU gcc 3.4* and GNU Fortran
− HP-UX C and Fortran compilers
− GNU gcc 3.4 and 4.*
− Intel C and Fortran versions 9.1 and 10.00
− Sun WorkShop C and Fortran
− Visual Studio .NET and 2005, and Intel Fortran
− Visual Studio 2005 (no Fortran)
− GNU gcc 4.0.1 with gfortran and g95
* New platforms
For detailed information, see RELEASE.txt

HDF5 Update

HDF5 1.6.6

HDF5 1.6.6 release
• Primarily a bug-fix release
• Some tool changes (see later slide)
• http://hdfgroup.org/HDF5/release/obtain5.html

Platforms dropped
• Operating systems
− AIX 5.3
− Solaris 2.8 and 2.9
− OSF1
− Windows XP with MSVC++ 6.0
• Compilers
− PGI 6.5-*
http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

Platforms added
• Systems
− Alpha OpenVMS
− Mac OS X 10.4 (Intel)
− Solaris 2.* on Intel
− Cray XT3
− Windows 64-bit (32- and 64-bit)
− BG/L
• Compilers
− PGI V. 7.*
− Intel 10.*
− MPICH 1.2.7
− MPICH2

HDF5 1.8

HDF5 1.8 new library features
• Datatype and dataspace features
− Create datatype from text description
− Integer to float conversions during I/O
− Compact storage for N-bit datatypes
− Offset+size storage filter, saving space
− “Null” dataspace – datasets with no elements
− Data transformation filter

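Compact N-bit storage keeps only the significant bits of each value instead of a full machine word. The C sketch below illustrates that idea by packing unsigned values of `nbits` bits (at most 32) into a byte buffer and unpacking them again. The function names and the least-significant-bit-first layout are assumptions made for illustration; this is not the on-disk layout produced by the HDF5 N-bit filter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to hold n values of nbits bits each (rounded up). */
size_t nbit_packed_size(size_t n, unsigned nbits)
{
    return (n * nbits + 7) / 8;
}

/* Pack the low nbits bits (nbits <= 32) of each value, least significant
 * bit first; higher bits are ignored. `out` must hold
 * nbit_packed_size(n, nbits) bytes. */
void nbit_pack(const uint32_t *vals, size_t n, unsigned nbits, uint8_t *out)
{
    size_t bitpos = 0;
    memset(out, 0, nbit_packed_size(n, nbits));
    for (size_t i = 0; i < n; i++) {
        for (unsigned b = 0; b < nbits; b++, bitpos++) {
            if ((vals[i] >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(1u << (bitpos % 8));
        }
    }
}

/* Inverse of nbit_pack: rebuild n values of nbits bits each. */
void nbit_unpack(const uint8_t *in, size_t n, unsigned nbits, uint32_t *vals)
{
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned b = 0; b < nbits; b++, bitpos++) {
            if ((in[bitpos / 8] >> (bitpos % 8)) & 1u)
                v |= (uint32_t)1u << b;
        }
        vals[i] = v;
    }
}
```

With `nbits = 12`, four samples occupy 6 bytes instead of the 16 bytes needed to store them as `uint32_t`.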
HDF5 1.8 – new library features
• Group improvements
− Creation order access
− Compact groups – small groups take less space
− Large group storage improvements
− Intermediate group creation
• Link improvements
− Unicode names allowed
− External links – to objects in another file
− User-defined links – create your own kinds of links

HDF5 1.8 – new library features
• Attribute improvements
− Improved storage for large numbers of attributes
− Iterate or look up by creation order
− Unicode names allowed
• Support for the Unicode UTF-8 character set
• Shared header information, possibly saving space
• Metadata cache improvements – faster I/O on files with many objects
• Better UNIX/Linux portability

HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files quickly
• Dimension scale model and API
• “HDFpacket” API, to read/write packets efficiently

HDF5 1.8 – Backward and Forward Compatibility

HDF5 1.8 and 1.6
• Differences between 1.8 and 1.6.x
− Some file format changes
− Several new routines added
− Old APIs deprecated – may be removed in a later release
• Consequences
− Applications requiring 1.8 format changes will generate objects that cannot be read by the 1.6 library
− To exploit 1.8 changes, applications need to be rewritten

“The art of progress is to preserve order amid change, and to preserve change amid order.”
Alfred North Whitehead

Principle of Maximum File Format Compatibility
Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information.
This assures that older library versions are forward compatible whenever possible:
− Objects in new files can be read with old versions of the library, if the objects are “known” to the old libraries.
− New versions of the library can always read objects in files written with older versions.

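The principle can be modeled as a small decision rule: each feature an object uses has a minimum format version, and the library writes the oldest version that covers all of them unless the caller asks for the latest format. In the 1.8 C API that request is made on the file access property list with `H5Pset_libver_bounds`, and the `h5repack -L` option listed among the tool changes does the same for repacked files. The feature flags and version constants below are hypothetical and exist only to illustrate the rule.

```c
#include <stdbool.h>

/* Hypothetical feature flags for an object being written. */
enum {
    FEAT_CREATION_ORDER = 1 << 0,  /* assumed to need the 1.8 format */
    FEAT_UTF8_NAMES     = 1 << 1   /* assumed to need the 1.8 format */
};

/* Hypothetical format versions. */
#define FORMAT_V16    0            /* readable by 1.6 and later */
#define FORMAT_V18    1            /* requires 1.8 or later     */
#define FORMAT_LATEST FORMAT_V18

/* Return the format version an object would be written with. */
int choose_format_version(unsigned features, bool use_latest)
{
    int version = FORMAT_V16;      /* start from the earliest format */

    if (use_latest)
        return FORMAT_LATEST;      /* caller opted for the newest format */
    if (features & (FEAT_CREATION_ORDER | FEAT_UTF8_NAMES))
        version = FORMAT_V18;      /* bump only when a feature requires it */
    return version;
}
```

Objects that use no new features stay readable by 1.6; only objects that need a 1.8 feature (or whose writer asked for the latest format) become 1.8-only, which is the consequence described on the "HDF5 1.8 and 1.6" slide.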
Command Line Tools

New features for existing tools
• -V option for all tools
− Prints the HDF5 library version number used by the tool
• h5repack: -L option
− Use the latest version of the file format to create objects
• h5dump: dumps groups/attributes in creation or name order
− -q Q, --sort_by=Q: sort groups and attributes by index Q
− -z Z, --sort_order=Z: sort groups and attributes by order Z

New command line tools
• h5mkgrp
− Creates new groups and group hierarchies in an HDF5 file
• h5stat
− Provides statistics about the file, such as the number of objects per group, sizes of datasets, and amount of free space in the file
• h5copy
− Copies objects within a file or across files
• h5check
− Verifies an HDF5 file against the defined HDF5 File Format Specification
− Completed for 1.6
− In progress for 1.8

Tool work in the pipeline
• Export numeric data formatted in several different ways (such as MS Excel, XML, etc.)
• Import ASCII data that conforms to a certain format
• Use a common text format for h5import and h5dump
• Support NaN in tools such as h5diff. Challenges:
− NaN is platform specific
− NaN can have different values on the same machine
− Checking for NaN can be a performance hit

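One way a diff tool can cope with the NaN challenges above is to test for NaN with `isnan()` instead of relying on `==`, which is false for any NaN (even compared with itself), and to treat two NaNs as equal regardless of their bit patterns. The function name, the tolerance parameter, and the "NaN equals NaN" policy are assumptions for illustration, not h5diff's actual behavior.

```c
#include <math.h>
#include <stddef.h>

/* Count positions where a and b differ by more than tol.
 * Policy (assumed): two NaNs compare equal, whatever their bit patterns;
 * a NaN against a number counts as a difference. */
size_t count_diffs(const double *a, const double *b, size_t n, double tol)
{
    size_t diffs = 0;

    for (size_t i = 0; i < n; i++) {
        int a_nan = isnan(a[i]) != 0;
        int b_nan = isnan(b[i]) != 0;

        if (a_nan || b_nan) {
            if (a_nan != b_nan)
                diffs++;                  /* exactly one side is NaN */
        } else if (fabs(a[i] - b[i]) > tol) {
            diffs++;                      /* ordinary numeric difference */
        }
    }
    return diffs;
}
```

The two `isnan()` calls on every element are the kind of per-value cost the slide flags as a possible performance hit.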
HDF Java Products

HDF5 Java is Growing Up

HDFView changes
• HDFView 2.4 released
• Many new features, such as
− Support for compound datatypes of 2D+ arrays
− Support for "filtering fill value" in Image Viewer
− Effective handling of large 3D images
− Support for large fonts in GUI components
− New autogain algorithm for image brightness/contrast
• New platforms
− Mac Intel
− Linux 64-bit AMD
− Solaris 64-bit

Other Java products
• 36 new enhancements and 44 bugs fixed
• Test suite (using the JUnit testing framework)
− Tests all public methods in the object package
− Added “make check” to run the test suite
• Enhanced documentation
− All public methods in the object package are fully documented

Future work for Java
• Update HDF5 JNI APIs for the HDF5 1.8 release
• Release HDFView with bug fixes/new features with the HDF5 1.8 release
• Port the HDF5-SRB model to an HDF5-iRODS model
• Add writing capability to the HDF5-iRODS model

Other Activities of Interest

New THG Website

HDF Performance Framework

Goals
• A framework for performance regression testing
• A tool for
− Testing on multiple platforms
− Testing different versions
− Long-term regression testing
− Assistance in debugging

Solution
[Diagram components: HDF5 1.6, HDF5 1.8, cron, a user’s benchmark, Performance Library, database, PHP web server (www), graph/text output]

Sample Usage
    H5Perf_startTimer(&time);
    for (i = 0; i < 1000; i++) {
        /* Add groups: each needs a unique name (this naming scheme is assumed) */
        sprintf(group_name, "group_%d", i);
        /* 1.6-style three-argument call (H5Gcreate1 in the 1.8 API) */
        gid = H5Gcreate(fileid, group_name, (size_t)0);
        H5Gclose(gid);   /* release the group handle */
    }
    H5Perf_endTimer(&time);
    H5Perf_addInstance(db_host, date, time);
Scheduled with cron (21:00 daily):
    00 21 * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh
Sample database row (slide callouts label the Timestamp, Instance Name, Version, Platform, and Time fields):
    | 178820 | 2007-08-17 21:51:14 | 10000 groups | creating 10000 empty groups | 1.8.0 | hdfdap | 0.670198 | 4384 |

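The point of storing every timing in a database is to notice when a new run is slower than earlier ones. A minimal sketch of that comparison step, assuming the baseline is the median of earlier timings for the same benchmark and a 10% tolerance (both hypothetical choices, not documented behavior of the HDF performance framework; the function names are also hypothetical):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *x, const void *y)
{
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Median of n > 0 earlier timings; sorts a copy so the input is untouched.
 * Returns -1.0 if the temporary copy cannot be allocated. */
double baseline_median(const double *times, size_t n)
{
    double *tmp = malloc(n * sizeof *tmp);
    double m;

    if (tmp == NULL)
        return -1.0;
    memcpy(tmp, times, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    m = (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
    free(tmp);
    return m;
}

/* A run counts as a regression if it is slower than the baseline by more
 * than the relative tolerance tol (e.g. 0.10 for 10%). */
bool is_regression(double new_time, double baseline, double tol)
{
    return new_time > baseline * (1.0 + tol);
}
```

A median is used here rather than the previous run alone so that a single noisy measurement does not shift the baseline.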
Improved Crash Survivability in the HDF5 Library

Crash Survivability in HDF5
• Problem:
− Data in HDF5 files is susceptible to corruption in the event of an application or system crash.
− Corruption is possible if structural metadata is being written when the crash occurs.
• Initial Objective:
− Guarantee that an HDF5 file with consistent metadata can be reconstructed in the event of a crash.
− No guarantee on the state of raw data – it contains whatever made it to disk prior to the crash.

Crash Survivability in HDF5
• Approach: Metadata Journaling
− When a piece of metadata is modified and in a consistent state, make a journal note.
− If the application crashes, a recovery program can replay the journal by applying, in order, all metadata writes up to the end of the last completed transaction written to the journal file.

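A toy model of the replay step: the journal is a sequence of metadata writes plus commit markers that close a transaction. Recovery applies, in order, every write that precedes the last commit marker and discards the tail of an unfinished transaction. The record layout, the single-byte writes, and the assumption that transactions are journaled one after another (never interleaved) are simplifications made for illustration; this is not the HDF5 journal format.

```c
#include <stddef.h>

typedef enum { REC_WRITE, REC_COMMIT } rec_kind;

/* Hypothetical journal record: one metadata byte written at `offset`,
 * or a commit marker closing transaction `txn`. */
typedef struct {
    rec_kind      kind;
    unsigned      txn;
    size_t        offset;
    unsigned char value;
} journal_rec;

/* Replay onto `file` every write that precedes the last commit marker.
 * Returns the number of writes applied (0 if no transaction completed). */
size_t journal_replay(const journal_rec *j, size_t n, unsigned char *file)
{
    size_t last_commit = n;            /* n means "no commit seen" */
    size_t applied = 0;

    for (size_t i = 0; i < n; i++)
        if (j[i].kind == REC_COMMIT)
            last_commit = i;
    if (last_commit == n)
        return 0;                      /* nothing safe to replay */

    for (size_t i = 0; i < last_commit; i++) {
        if (j[i].kind == REC_WRITE) {
            file[j[i].offset] = j[i].value;   /* apply in journal order */
            applied++;
        }
    }
    return applied;
}
```

Writes belonging to a transaction whose commit marker never reached the journal are simply not applied, so the recovered metadata reflects the last consistent state.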
Faster HDF5 Data Appends

Fast Data Appends
• Problem: Metadata operations limit the rate at which HDF5 can append data to datasets.
• Solution: a new data structure for indexing chunks:
− Allows constant-time extend, shrink, and lookup of chunks in datasets with a single unlimited dimension
− The number of metadata I/O operations needed to append to a dataset is independent of the number of chunks
− Allows single-writer/multiple-reader access
• Details at: http://www.hdfgroup.uiuc.edu/RFC/HDF5/SkipListChunkIndex/SkipListChunkIndex.html

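The practical difference is how index work grows with the number of chunks: a B-tree index does O(log n) work per append, while the new index aims for constant time. As a swapped-in, in-memory illustration (not the skip-list design described in the RFC, and ignoring how the index is stored in the file), a growable array keyed by chunk number gives amortized O(1) append and O(1) lookup and shrink for a dataset with one unlimited dimension. All type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory index: addr[i] is the file address of chunk i
 * along the single unlimited dimension. */
typedef struct {
    uint64_t *addr;
    size_t    nchunks;
    size_t    capacity;
} chunk_index;

void chunk_index_init(chunk_index *ix)
{
    ix->addr = NULL;
    ix->nchunks = ix->capacity = 0;
}

/* Append one chunk; amortized O(1) because capacity doubles when full.
 * Returns 0 on success, -1 on allocation failure. */
int chunk_index_append(chunk_index *ix, uint64_t file_addr)
{
    if (ix->nchunks == ix->capacity) {
        size_t newcap = ix->capacity ? 2 * ix->capacity : 8;
        uint64_t *p = realloc(ix->addr, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        ix->addr = p;
        ix->capacity = newcap;
    }
    ix->addr[ix->nchunks++] = file_addr;
    return 0;
}

/* O(1) lookup; 0 stands in (by assumption) for "chunk not allocated". */
uint64_t chunk_index_lookup(const chunk_index *ix, size_t chunk)
{
    return chunk < ix->nchunks ? ix->addr[chunk] : 0;
}

/* O(1) shrink: forget chunks at or beyond new_nchunks (memory is kept). */
void chunk_index_shrink(chunk_index *ix, size_t new_nchunks)
{
    if (new_nchunks < ix->nchunks)
        ix->nchunks = new_nchunks;
}

void chunk_index_free(chunk_index *ix)
{
    free(ix->addr);
    chunk_index_init(ix);
}
```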
netCDF-4

netCDF-4 Project
• Enhanced netCDF-4 interface to HDF5
− Combine features of netCDF and HDF5
− Take advantage of their separate strengths
• Collaboration between NCSA, THG, and Unidata
• Currently in beta release
• Will be released after HDF5 1.8

NetCDF-4 Architecture
[Diagram components: netCDF-3 applications, netCDF-4 applications, and HDF5 applications; netCDF-3 interface; netCDF-4 Library; HDF5 Library; netCDF files, netCDF-4 HDF5 files, and HDF5 files]
• Supports access to netCDF files and to HDF5 files created through the netCDF-4 interface

HDF5 OPeNDAP Project

Project description
• Investigate an integrated DAP-aware HDF5 library that can provide seamless access to both local and remote data
• A NASA ROSES NRA project
• See Kent Yang’s talk and poster

NOAA – Science Data Stewardship
• Use the HDF5 Archival Information Package (AIP) to archive HDF-EOS2 data
• A collaboration between NSIDC and THG
• See Ruth Duerr and Kent Yang’s poster

HDF5 and .NET Framework

Why .NET?
• The Microsoft .NET framework is used by most new applications created for Windows.
− Makes it easier to develop applications
− Reduces application vulnerability to security threats
• Supports development in multiple programming languages, in particular C#.
• Increased level of interest in .NET from users of HDF5.

HDF and .NET Status
• Received funding to implement a prototype .NET wrapper API for Windows XP
− Based on the HDF5 C API
− Focus on a C# binding
− Functionality limited to a subset of API routines
• If funded, we would like to move beyond the prototype to
− Create .NET wrappers for all HDF C functions
− Offer full support for .NET wrappers with HDF5 1.8

Bioinformatics
[Image: DNA sequence fragments]
Managing genomic data

Electron tomography
• 25–80 Å resolution
• 4k x 4k x 500 images now
• 8k x 8k x 1k images soon (256 GB)

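Assuming 1k = 1024 and 4-byte (32-bit) voxels, which is what makes the slide's 256 GB figure come out exactly (as 256 GiB), the two volume sizes work out as follows; the voxel size is an assumption, not stated on the slide.

```latex
\begin{align*}
8192 \times 8192 \times 1024 \times 4\ \text{B}
  &= 2^{13} \cdot 2^{13} \cdot 2^{10} \cdot 2^{2}\ \text{B}
   = 2^{38}\ \text{B} = 256\ \text{GiB} \\
4096 \times 4096 \times 500 \times 4\ \text{B}
  &= 33{,}554{,}432{,}000\ \text{B} = 31.25\ \text{GiB}
\end{align*}
```

So the planned 8k volumes are about eight times larger than the current 4k ones.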
Sequencing
• Next Gen Sequencing platforms produce ~1500 X more data than CE (Sanger)
• A single Next Gen instrument can produce 20 times more data in a single run than a day’s operation of a genome center with 100 CE instruments

An email on Sept 21…
“… A little background, we're doing genetic association studies, these result in large 2-d matrices (40K x 1M before applying threshholds). Each of the cells in this matrix has ~10 numerical statistics (e.g. some sort of pvalue)… ”
40K x 1M x 10 statistics x 4 bytes = 1,600,000,000,000 bytes (1.6 TB)

Product Data (STEP)

Product data
• HDF5 proposed to ISO as a binary representation for product data representation and exchange
• Would be a binary option to the STEP format
• ISO/NWI-CD 10303-026, STEP Part 26

SQL Server and HDF5
• THG discussing a possible project with Microsoft
• Microsoft envisions a dream environment for scientists that would encompass both computing and data management
• Possible SQL Server solution
− Combine RDBMS and scientific analysis tools in a single integrated system
− Use HDF5 to manage scientific objects not handled well by traditional databases

HDF5 in SQL Server
[Diagram components: visualization libraries (MATLAB, …); web services (XML, REST, RSS); OLAP and data mining; reporting; .NET languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server with HDF5 TVFs, HDF5 index, and HDF5 type; HDF5 files; HDF5 FS blob]

Thank You All and Thank You NASA!

Acknowledgement
This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Questions/comments?

Information Sources
• HDF website
http://hdfgroup.org/
• HDF5 Information Center
http://hdfgroup.org/HDF5/
• HDF Helpdesk
hdfhelp@hdfgroup.org
• HDF users mailing list
hdfnews@ncsa.uiuc.edu (coming soon: news@hdfgroup.org)
