HDF Update

Update on HDF, including recent changes to the software, new releases, THG collaborations, and future plans. The session will include an overview of the HDF 4.2r2, HDF5 1.6.6, and 1.8.0 releases, as well as updates on completed and ongoing THG projects, including crash-proofing HDF5, efficient appends to HDF5 datasets, and indexing in HDF5.

Notes
  • Why
    Increasing need for support, services, quick response
    Not a good model for a University R&D project
    Who
    11 software engineers and several students: develop, maintain HDF software, work on special projects, manage projects
    3 tech support staff: helpdesk, doc, sysadmin.
    Management team
    President
    Director of Technical Services and Operations
    Director of Software Development
    Director of Business Operations
    Managers responsible for tools, applications
    Other THG staff include seven full-time software engineers who develop and maintain the HDF software, as well as working on special projects, and three technical support staff who provide helpdesk support, documentation, and system administration. The HDF group also generally employs students from the University Computer Science and Engineering departments.
  • The R&D mission
    Maintain and evolve HDF for high end science apps
    Maintain HDF4 and HDF5 and tools at supercomputing centers, TeraGrid
    Support academic science
    Cutting edge data management research
    Adapt to leading edge, experimental architectures
    Integrate with new middleware technologies, parallel file systems
    The “Support and Sustain” mission
    Maintain, evolve for communities, sponsors
    Provide proprietary consulting, tuning, development
    Sustain for long term, maintain data access over time
  • I get all mixed up with the terms backward & forward compatibility. I did a little investigation on the definitions and their use when talking with Frank about his compatibility matrix a while back, and still don’t have a good grasp of what is meant… my conclusion was that there is no consistent use. It seems most, like MathWorks, use “compatibility” without the forward/backward words. I made a change here… is this what you meant in the original?
    And, I don’t know if it’s worth saying, but new versions can always read objects in files written with older versions (unless there’s a bug in the writer!). Then we’ll offer the best solution we can.
  • Maybe Objective bullets do belong on later slide… not sure.
  • Is it limited only to unlimited/chunked datasets? Or is it that way for all datasets, and we’re just fixing it for the limited/unchunked cases?
    Contrasts with B-tree index:
    - B-tree has O(log n) extend, shrink and lookup of chunks
    - B-tree has ~logarithmic # of metadata I/O operations as chunks are appended
    We will also be optimizing chunked dataset indexing for datasets with no unlimited dimensions (with an array index) and with multiple unlimited dimensions (with a v2 B-tree) as part of the project in the next year.
  • I’ve changed this considerably. I don’t think it’s necessary to say who has funded work to date, exactly what that entails, or that the prototype is available. The important message (to me) is that we have experience & interest in this area, and are willing to do more if it’s funded. If not, then that’s the end of the story.
  • First bullet – let them know it may or may not happen… not a done deal
    Not sure I got the “translation” from first version of text to this one right…
    Dropped “& other formats” (let them give those presentations)
  • Transcript

    • 1. HDF Update Mike Folk The HDF Group HDF and HDF-EOS Workshop XI November 7, 2007 02/18/14 The HDF Group 1
    • 2. Outline
        • What is The HDF Group?
        • HDF Software Update
        • Other Activities of Interest
    • 3. What is The HDF Group (THG)?
    • 4. THG, the Company
        • Spun off from the University of Illinois in July 2006
        • Non-profit
        • 20+ scientific, technology, and professional staff
        • Intellectual property:
          − THG owns HDF4 and HDF5
          − HDF formats and libraries to remain open
          − Libraries have a BSD-type license
        • Continue ties to U of I and NCSA
    • 5. The mission of The HDF Group is to ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.
    • 6. Goals
        • Maintain and evolve HDF for sponsors and communities that depend on it
        • Do consulting, training, tuning, development, and research
        • Sustain The HDF Group for the long term to assure data access over time
    • 7. THG Services
        • Helpdesk and Mailing Lists
          − Available to all users as a first level of support
        • Standard Support
          − Rapid issue resolution support
        • Consulting
          − Needs assessment, troubleshooting, design reviews, etc.
        • Enterprise Support
          − Coordinating HDF activities across divisions
        • Special Projects
          − Adapting customer applications to HDF
          − New features and tools, with changes normally incorporated into the open source product
          − Research and development
        • Training
          − Tutorials and hands-on practical experience
    • 8. HDF Software Update
    • 9. HDF4 update
    • 10. HDF 4.2r2 Released in October
    • 11. New features and changes
        • New APIs added to the SD and GR interfaces:
          − SDreset_maxopenfiles, SDget_maxopenfiles: modify and report the maximum allowable number of open files
          − SDget_numopenfiles: gets the number of open files
          − SDgetcompinfo, GRgetcompinfo: get compression information
          − SDgetfilename: retrieves the name of a file, given its ID
          − SDgetnamelen: retrieves the length of an object name, given its ID
        • SZIP compression
          − Can now be invoked through the Fortran API
          − Now available for raster images via the GR interface
        • SDS and Vgroup names are no longer limited to 64 characters
    • 12. New features and changes
        • HDF configuration changes
          − --enable-netcdf flag introduced
          − Autotools versions updated
        • Many bug fixes made to hrepack and hdiff
        • See RELEASE.txt for a full list of changes
    • 13. Platforms to drop/add in the next release
        • Drop
          − Windows XP with MSVC++ 6.0
          − Linux 2.4
          − IRIX64 6.5
          − SunOS 5.8, 5.9
        • Add
          − Windows 64-bit (32- and 64-bit binaries)
    • 14. Platforms tested (* = new platform; for detailed info, see RELEASE.txt)
        • Systems
          − AIX 5.3 (32-bit, 64-bit)
          − FreeBSD 6.2 (32-bit, 64-bit)*
          − HP-UX B.11.23 (32-bit, 64-bit)*
          − IRIX64 v6.5 (32-bit, 64-bit)
          − Linux 2.4, 2.6*
          − Linux ia64
          − Linux x86_64
          − SunOS 5.8, 5.10* (32-bit, 64-bit)
          − SunOS 5.10 on Intel
          − Windows XP, Vista
          − Mac OS X Intel*
        • Compilers
          − IBM C and Fortran compilers
          − GNU gcc 3.4* and GNU Fortran
          − HP-UX C and Fortran compilers
          − GNU gcc 3.4 and 4.*
          − Intel C and Fortran versions 9.1 and 10.00
          − Sun WorkShop C and Fortran
          − Visual Studio .NET and 2005, and Intel Fortran
          − Visual Studio 2005 (no Fortran)
          − GNU gcc 4.0.1 with gfortran and g95
    • 15. HDF5 Update
    • 16. HDF5 1.6.6
    • 17. HDF5 1.6.6 release
        • Primarily a bug-fix release
        • Some tool changes (see later slide)
        • http://hdfgroup.org/HDF5/release/obtain5.html
    • 18. Platforms dropped
        • Operating systems
          − AIX 5.3
          − Solaris 2.8 and 2.9
          − OSF1
          − Windows XP with MSVC++ 6.0
        • Compilers
          − PGI 6.5-*
        • http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
    • 19. Platforms added
        • Systems
          − Alpha OpenVMS
          − Mac OS X 10.4 (Intel)
          − Solaris 2.* on Intel
          − Cray XT3
          − Windows 64-bit (32- and 64-bit)
          − BG/L
        • Compilers
          − PGI v7.*
          − Intel 10.*
          − MPICH 1.2.7
          − MPICH2
    • 20. HDF5 1.8
    • 21. HDF5 1.8 new library features
        • Datatype and dataspace features
          − Create a datatype from a text description
          − Integer-to-float conversions during I/O
          − Compact storage for N-bit datatypes
          − Offset+size storage filter, saving space
          − “Null” dataspace: datasets with no elements
          − Data transformation filter
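The offset+size storage idea listed above can be illustrated with a short C sketch: store the minimum value once, then store each element as an unsigned offset from it, using only as many bits as the value range requires. This is a hypothetical illustration of the concept (`bits_needed` and `offset_size_encode` are made-up names). It is not the HDF5 filter's implementation or on-disk layout, and it stops before the bit-packing step a real filter would perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to store any value in [0, range]. */
static unsigned bits_needed(uint64_t range)
{
    unsigned bits = 0;
    while (range > 0) {          /* count significant bits */
        bits++;
        range >>= 1;
    }
    return bits;                 /* 0 when all values are equal */
}

/* Hypothetical encoder (requires n >= 1): stores the minimum in *min_out,
 * writes each element's offset from the minimum into `offsets`, and
 * returns the number of bits each offset needs. */
static unsigned offset_size_encode(const int32_t *data, size_t n,
                                   int32_t *min_out, uint32_t *offsets)
{
    int32_t min = data[0], max = data[0];
    for (size_t i = 1; i < n; i++) {
        if (data[i] < min) min = data[i];
        if (data[i] > max) max = data[i];
    }
    for (size_t i = 0; i < n; i++)
        offsets[i] = (uint32_t)((int64_t)data[i] - min);
    *min_out = min;
    return bits_needed((uint64_t)((int64_t)max - min));
}
```

For values 1000, 1003, 1007, 1001, the sketch stores 1000 once, and each offset fits in 3 bits instead of 32.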
    • 22. HDF5 1.8: new library features
        • Group improvements
          − Creation order access
          − Compact groups: small groups take less space
          − Large group storage improvements
          − Intermediate group creation
        • Link improvements
          − Unicode names allowed
          − External links: links to objects in another file
          − User-defined links: create your own kinds of links
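Compact groups and the large-group storage improvements imply that a group's link storage changes form as the group grows. The sketch below models that switch with two thresholds, so a group hovering near the boundary does not convert back and forth on every insert or delete. The names and threshold values are illustrative assumptions. In the HDF5 1.8 library, the corresponding per-group settings are made with `H5Pset_link_phase_change`. The real dense form uses indexed on-disk structures, which this sketch reduces to a flag.

```c
#include <stdbool.h>

/* Illustrative thresholds (assumed): go dense when the link count exceeds
 * MAX_COMPACT; go back to compact only when it drops below MIN_DENSE. */
enum { MAX_COMPACT = 8, MIN_DENSE = 6 };

/* Hypothetical in-memory model of one group's link storage. */
typedef struct {
    unsigned nlinks;
    bool dense;          /* false: links kept compactly in the object header */
} group_state;

/* Re-evaluate the storage form after a link is added or removed. */
static void update_storage(group_state *g)
{
    if (!g->dense && g->nlinks > MAX_COMPACT)
        g->dense = true;         /* too many links for compact storage */
    else if (g->dense && g->nlinks < MIN_DENSE)
        g->dense = false;        /* small enough to return to compact */
}

static void add_link(group_state *g)
{
    g->nlinks++;
    update_storage(g);
}

static void remove_link(group_state *g)
{
    if (g->nlinks > 0)
        g->nlinks--;
    update_storage(g);
}
```

The gap between the two thresholds is the design choice: with 7 links a dense group stays dense, so alternating adds and removes near the limit do not force repeated conversions.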
    • 23. HDF5 1.8: new library features
        • Attribute improvements
          − Improved storage for large numbers of attributes
          − Iterate or look up by creation order
          − Unicode names allowed
        • Support for the Unicode UTF-8 character set
        • Shared header information, possibly saving space
        • Metadata cache improvements: faster I/O on files with many objects
        • Better UNIX/Linux portability
    • 24. HDF5 1.8: new APIs
        • New extendible error-handling API
        • New APIs to copy objects between files quickly
        • Dimension scale model and API
        • “HDFpacket” API, to read/write packets efficiently
    • 25. HDF5 1.8: Backward and Forward Compatibility
    • 26. HDF5 1.8 and 1.6
        • Differences between 1.8 and 1.6.x
          − Some file format changes
          − Several new routines added
          − Old APIs deprecated; they may be removed in a later release
        • Consequences
          − Applications requiring 1.8 format changes will generate objects that cannot be read by the 1.6 library
          − To exploit 1.8 changes, applications need to be rewritten
    • 27. “The art of progress is to preserve order amid change, and to preserve change amid order.” Alfred North Whitehead
    • 28. Principle of Maximum File Format Compatibility
        • Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information.
        • Assures older library versions are forward compatible whenever possible:
          − Objects in new files can be read with old versions of the library, if the objects are “known” to the old libraries.
          − New versions of the library can always read objects in files written with older versions.
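The principle of maximum file format compatibility amounts to a version-selection rule. Use the earliest format version that can describe everything an object uses, unless the caller asks for the latest format. The h5repack -L option asks for the latest format, and in the C library `H5Pset_libver_bounds` plays a similar role. The feature list and version numbers below are hypothetical; the sketch shows only the selection rule, not how HDF5 encodes versions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical format versions for one kind of object metadata. */
enum { FMT_V0 = 0, FMT_V1 = 1, FMT_V2 = 2, FMT_LATEST = FMT_V2 };

/* Hypothetical object features, each needing a minimum format version. */
typedef enum { FEAT_BASIC, FEAT_UTF8_NAME, FEAT_CREATION_ORDER, FEAT_COUNT } feature;

static const int min_version[FEAT_COUNT] = {
    [FEAT_BASIC]          = FMT_V0,
    [FEAT_UTF8_NAME]      = FMT_V1,
    [FEAT_CREATION_ORDER] = FMT_V2,
};

/* Return the earliest version able to describe every feature in `used`,
 * or the latest version if the caller explicitly requested it. */
static int choose_format_version(const feature *used, size_t n, bool use_latest)
{
    if (use_latest)
        return FMT_LATEST;
    int v = FMT_V0;
    for (size_t i = 0; i < n; i++)
        if (min_version[used[i]] > v)
            v = min_version[used[i]];
    return v;
}
```

An object that uses only basic features is written in version 0, so an older library that knows version 0 can still read it. Tracking creation order forces the newer version.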
    • 29. Command Line Tools
    • 30. New features for existing tools
        • -V option for all tools
          − Prints the HDF5 library version number used by the tool
        • h5repack: -L option
          − Use the latest version of the file format to create objects
        • h5dump: dumps groups/attributes in creation or name order
          − -q Q, --sort_by=Q: sort groups and attributes by index Q
          − -z Z, --sort_order=Z: sort groups and attributes by order Z
    • 31. New command line tools
        • h5mkgrp
          − Creates new groups and group hierarchies in an HDF5 file
        • h5stat
          − Provides statistics about the file, such as the number of objects per group, sizes of datasets, and amount of free space in the file
        • h5copy
          − Copies objects within a file or across files
        • h5check
          − Verifies an HDF5 file against the defined HDF5 File Format Specification
          − Completed for 1.6
          − In progress for 1.8
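h5check validates an entire file against the format specification. One of the first things any such checker must do is locate the superblock. The superblock begins with a fixed 8-byte signature and sits at offset 0, or, when a user block precedes it, at 512, 1024, 2048, and so on. The sketch below performs only that first step, on an in-memory buffer. The function name is hypothetical, and this is not h5check's code.

```c
#include <stddef.h>
#include <string.h>

/* The 8-byte HDF5 format signature that begins the superblock. */
static const unsigned char HDF5_SIGNATURE[8] =
    {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

/* Hypothetical helper: return the offset of the superblock signature in
 * `buf`, or -1 if absent. Only offsets 0, 512, 1024, 2048, ... are checked. */
static long find_superblock(const unsigned char *buf, size_t len)
{
    for (size_t off = 0; off + sizeof HDF5_SIGNATURE <= len;
         off = (off == 0) ? 512 : off * 2) {
        if (memcmp(buf + off, HDF5_SIGNATURE, sizeof HDF5_SIGNATURE) == 0)
            return (long)off;
    }
    return -1;
}
```

A full checker would go on to decode the superblock fields and follow every object reachable from the root group; those steps are not attempted here.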
    • 32. Tool work in the pipeline
        • Export numeric data formatted in several different ways (such as MS Excel, XML, etc.)
        • Import ASCII data that conforms to a certain format
        • Use a common text format for h5import and h5dump
        • Support NaN in tools such as h5diff. Challenges:
          − NaN is platform specific
          − NaN can have different values on the same machine
          − Checking for NaN can be a performance hit
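Part of the NaN difficulty for a tool like h5diff is that an IEEE 754 NaN compares unequal to everything, including itself. A plain `a == b` therefore reports every NaN as a difference. The sketch below assumes IEEE floating point and that two NaNs should count as a match. `isnan` recognizes any NaN bit pattern, which sidesteps the "different NaN values" issue. It does nothing about the performance cost of the extra checks. The function names are hypothetical, not h5diff's internals.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Compare two values within `tolerance`, treating NaN as equal to NaN. */
static bool values_match(double a, double b, double tolerance)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);   /* a NaN matches only another NaN */
    if (a == b)
        return true;                   /* also covers equal infinities */
    double diff = a > b ? a - b : b - a;
    return diff <= tolerance;
}

/* Count mismatching elements between two arrays, h5diff-style. */
static size_t count_differences(const double *x, const double *y, size_t n,
                                double tolerance)
{
    size_t diffs = 0;
    for (size_t i = 0; i < n; i++)
        if (!values_match(x[i], y[i], tolerance))
            diffs++;
    return diffs;
}
```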
    • 33. HDF Java Products
    • 34. HDF5 Java is Growing Up
    • 35. HDFView changes
        • HDFView 2.4 released
        • Many new features, such as
          − Support for compound datatypes of 2D+ arrays
          − Support for "filtering fill value" in Image Viewer
          − Effective handling of large 3D images
          − Support for large fonts in GUI components
          − New autogain algorithm for image brightness/contrast
        • New platforms
          − Mac Intel
          − Linux 64-bit AMD
          − Solaris 64-bit
    • 36. Other Java products
        • 36 new enhancements and 44 bugs fixed
        • Test suite (using the JUnit testing framework)
          − Tests all public methods in the object package
          − Added “make check” to run the test suite
        • Enhanced documentation
          − All public methods in the object package are fully documented
    • 37. Future work for Java
        • Update HDF5 JNI APIs for the HDF5 1.8 release
        • Release HDFView with bug fixes/new features with the HDF5 1.8 release
        • Port the HDF5-SRB model to an HDF5-iRODS model
        • Writing capability for the HDF5-iRODS model
    • 38. Other Activities of Interest
    • 39. New THG Website
    • 40. New THG Website
    • 41. HDF Performance Framework
    • 42. Goals
        • A framework for performance regression testing
        • A tool for
          − Testing on multiple platforms
          − Testing different versions
          − Long-term regression testing
          − Assistance in debugging
    • 43. Solution
        [Diagram components: HDF5 1.6, HDF5 1.8, cron, a user’s benchmark, performance library, database, PHP web server (www), graph/text output]
    • 44. Sample Usage
        H5Perf_startTimer(&time);
        for (i = 0; i < 1000; i++) {
            /* group_name must differ on each iteration */
            H5Gcreate(fileid, group_name, (size_t)0);   /* add groups */
        }
        H5Perf_endTimer(&time);
        H5Perf_addInstance(db_host, date, time);
        Cron entry: 00 21 * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh
        Sample database row (slide labels: Timestamp, Instance Name, Version, Platform, Time):
        | 178820 | 2007-08-17 21:51:14 | 10000 groups | creating 10000 empty groups | 1.8.0 | hdfdap | 0.670198 | 4384 |
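The H5Perf_* calls in the sample above belong to THG's performance framework, whose API is not documented in this deck. The hypothetical stand-in below illustrates the same start/stop/record pattern. It times a region with the standard C `clock()` function, which measures processor time rather than wall-clock time. It then formats a result row loosely modeled on the sample database row. None of these names come from the framework.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the framework's timer (not the real H5Perf API). */
typedef struct {
    clock_t start;
    double  seconds;     /* elapsed processor time after perf_stop() */
} perf_timer;

static void perf_start(perf_timer *t)
{
    t->start = clock();
}

static void perf_stop(perf_timer *t)
{
    t->seconds = (double)(clock() - t->start) / CLOCKS_PER_SEC;
}

/* Format one result row: | instance | version | platform | time |
 * Returns the length snprintf reports. */
static int perf_format_row(char *out, size_t size, const char *instance,
                           const char *version, const char *platform,
                           const perf_timer *t)
{
    return snprintf(out, size, "| %s | %s | %s | %f |",
                    instance, version, platform, t->seconds);
}
```

In a real harness, the row would be inserted into the results database instead of a string buffer, and a nightly cron job would run the benchmark against each library version.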
    • 45. Improved Crash Survivability in the HDF5 Library
    • 46. Crash Survivability in HDF5
        • Problem:
          − Data in HDF5 files is susceptible to corruption in the event of an application or system crash.
          − Corruption is possible if structural metadata is being written when the crash occurs.
        • Initial objective:
          − Guarantee that an HDF5 file with consistent metadata can be reconstructed in the event of a crash.
          − No guarantee on the state of raw data: it contains whatever made it to disk prior to the crash.
    • 47. Crash Survivability in HDF5
        • Approach: metadata journaling
          − When a piece of metadata is modified and in a consistent state, make a journal note.
          − If the application crashes, a recovery program can replay the journal by applying, in order, all metadata writes up to the end of the last completed transaction written to the journal file.
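The recovery rule on the journaling slide can be sketched as follows. The sketch assumes a simplified journal in which transactions do not interleave and each transaction is closed by an end record. Writes are applied in journal order up to the last end record. Anything after that record belongs to a transaction that never completed and is discarded. The record layout and names are hypothetical and unrelated to the actual journal file format.

```c
#include <stddef.h>

/* Hypothetical journal record kinds. */
typedef enum { JR_BEGIN, JR_WRITE, JR_END } jr_kind;

typedef struct {
    jr_kind kind;
    size_t  addr;    /* metadata location (JR_WRITE only) */
    int     value;   /* new contents       (JR_WRITE only) */
} journal_record;

/* Replay writes, in order, up to the end of the last completed transaction.
 * Writes from a trailing transaction with no JR_END record are ignored.
 * Returns the number of writes applied to `metadata`. */
static size_t replay_journal(const journal_record *j, size_t n, int *metadata)
{
    size_t last_end = 0;             /* one past the last JR_END record */
    for (size_t i = 0; i < n; i++)
        if (j[i].kind == JR_END)
            last_end = i + 1;

    size_t applied = 0;
    for (size_t i = 0; i < last_end; i++) {
        if (j[i].kind == JR_WRITE) {
            metadata[j[i].addr] = j[i].value;
            applied++;
        }
    }
    return applied;
}
```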
    • 48. Faster HDF5 Data Appends
    • 49. Fast Data Appends
        • Problem: metadata operations limit the rate at which HDF5 can append data to datasets.
        • Solution: a new data structure for indexing chunks that
          − Allows constant-time extend, shrink, and lookup of chunks in datasets with a single unlimited dimension
          − Makes the number of metadata I/O operations needed to append to a dataset independent of the number of chunks
          − Allows single-writer/multiple-reader access
        • Details at: http://www.hdfgroup.uiuc.edu/RFC/HDF5/SkipListChunkIndex/SkipListChunkIndex.html
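The constant-time properties claimed for the new chunk index follow from the fact that, with a single unlimited dimension, chunks can be numbered along that dimension. The sketch below uses a plain growable array in memory, not the skip list described in the RFC. It shows amortized constant-time append, constant-time lookup, and constant-time shrink. It does not model file I/O or concurrent readers, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical index: chunk number along the unlimited dimension -> file address. */
typedef struct {
    uint64_t *addr;
    size_t    nchunks;
    size_t    capacity;
} chunk_index;

/* Append the address of a new chunk; amortized O(1). Returns 0 on success. */
static int chunk_index_append(chunk_index *ix, uint64_t addr)
{
    if (ix->nchunks == ix->capacity) {
        size_t cap = ix->capacity ? ix->capacity * 2 : 16;
        uint64_t *p = realloc(ix->addr, cap * sizeof *p);
        if (p == NULL)
            return -1;
        ix->addr = p;
        ix->capacity = cap;
    }
    ix->addr[ix->nchunks++] = addr;
    return 0;
}

/* Look up a chunk by its position along the unlimited dimension; O(1). */
static int chunk_index_lookup(const chunk_index *ix, size_t chunk, uint64_t *addr)
{
    if (chunk >= ix->nchunks)
        return -1;
    *addr = ix->addr[chunk];
    return 0;
}

/* Shrink to `n` chunks; O(1). Storage is kept for later appends. */
static void chunk_index_shrink(chunk_index *ix, size_t n)
{
    if (n < ix->nchunks)
        ix->nchunks = n;
}
```

By contrast, a B-tree index (mentioned in the notes) pays O(log n) per append and lookup, because each operation walks from the root.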
    • 50. netCDF-4
    • 51. netCDF-4 Project
        • Enhanced netCDF-4 interface to HDF5
          − Combine features of netCDF and HDF5
          − Take advantage of their separate strengths
        • Collaboration between NCSA, THG, and Unidata
        • Currently in beta release
        • Will be released after HDF5 1.8
    • 52. netCDF-4 Architecture
        [Diagram components: netCDF-3 applications, netCDF-4 applications, HDF5 applications, netCDF-3 interface, netCDF-4 library, HDF5 library, netCDF files, netCDF-4 HDF5 files, HDF5 files]
        • Supports access to netCDF files and HDF5 files created through the netCDF-4 interface
    • 53. HDF5 OPeNDAP Project
    • 54. Project description
        • Investigate an integrated DAP-aware HDF5 library that can provide seamless access to both local and remote data
        • A NASA ROSES NRA project
        • See Kent Yang’s talk and poster
    • 55. NOAA: Science Data Stewardship
    • 56. NOAA: Science Data Stewardship
        • Use the HDF5 Archival Information Package (AIP) to archive HDF-EOS2 data
        • A collaboration between NSIDC and THG
        • See Ruth Duerr and Kent Yang’s poster
    • 57. HDF5 and .NET Framework
    • 58. Why .NET?
        • The Microsoft .NET framework is used by most new applications created for Windows.
          − Makes it easier to develop applications
          − Reduces application vulnerability to security threats
        • Supports development in multiple programming languages, in particular C#.
        • Increased level of interest in .NET from users of HDF5.
    • 59. HDF and .NET Status
        • Received funding to implement a prototype .NET wrapper API for Windows XP
          − Based on the HDF5 C API
          − Focus on the C# binding
          − Functionality limited to a subset of API routines
        • If funded, we would like to move beyond the prototype to
          − Create .NET wrappers for all HDF C functions
          − Offer full support for .NET wrappers with HDF5 1.8
    • 60. Bioinformatics: managing genomic data
        [Image: sample DNA sequence text]
    • 61. Electron tomography
        • 25-80 Å resolution
        • 4k x 4k x 500 images now
        • 8k x 8k x 1k images soon (256 GB)
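The 256 GB figure for the next generation of tomography volumes can be checked under two assumptions. First, "8k" and "1k" mean 8192 and 1024. Second, each voxel is stored in 4 bytes (for example, a 32-bit float). The 4-byte size is an assumption, not stated on the slide.

```latex
\begin{aligned}
8192 \times 8192 \times 1024 &= 2^{13} \cdot 2^{13} \cdot 2^{10} = 2^{36} \approx 6.9 \times 10^{10}\ \text{voxels} \\
2^{36}\ \text{voxels} \times 4\ \text{bytes} &= 2^{38}\ \text{bytes} = 256\ \text{GiB} \\
4096 \times 4096 \times 500 \times 4\ \text{bytes} &= 125 \cdot 2^{28}\ \text{bytes} = 31.25\ \text{GiB}
\end{aligned}
```

Under the same assumption, today's 4k x 4k x 500 volumes are about 31 GiB each, so the planned size is roughly an eightfold increase.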
    • 62. Sequencing
        • Next Gen sequencing platforms produce ~1500× more data than CE (Sanger) sequencing
        • A single Next Gen instrument can produce 20 times more data in a single run than a day’s operation of a genome center with 100 CE instruments
    • 63. An email on Sept 21…
        “… A little background, we're doing genetic association studies, these result in large 2-d matrices (40K x 1M before applying thresholds). Each of the cells in this matrix has ~10 numerical statistics (e.g. some sort of p-value)… ”
        40K x 1M x 10 x 4 = 1,600,000,000,000 (1.6 TB)
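The 1.6 TB estimate above follows directly if each statistic is stored in 4 bytes (for example, as single-precision floats). The per-value size is an assumption implied by the factor of 4.

```latex
4 \times 10^{4}\ \text{rows} \times 10^{6}\ \text{columns} \times 10\ \text{statistics} \times 4\ \text{bytes}
  = 1.6 \times 10^{12}\ \text{bytes} \approx 1.6\ \text{TB}\ (\approx 1.46\ \text{TiB})
```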
    • 64. Product Data: STEP
    • 65. Product data
        • HDF5 proposed to ISO as a binary representation for product data representation and exchange
        • Would be a binary option to the STEP format
        • ISO/NWI-CD 10303-026, STEP Part 26
    • 66. SQL Server and HDF5
    • 67. SQL Server and HDF5
        • THG discussing a possible project with Microsoft
        • Microsoft envisions a dream environment for scientists that would encompass both computing and data management
        • Possible SQL Server solution
          − Combine RDBMS and scientific analysis tools in a single integrated system
          − Use HDF5 to manage scientific objects not handled well by a traditional database
    • 68. HDF5 in SQL Server
        [Diagram components: visualization libraries (MATLAB, …); web services (XML, REST, RSS); OLAP and data mining; reporting; .NET languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server; HDF5 TVFs; index; HDF5 type; HDF5 files; HDF5 FS blob]
    • 69. Thank You All and Thank You NASA!
    • 70. Acknowledgement
        This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
    • 71. Questions/comments?
    • 72. Information Sources
        • HDF website: http://hdfgroup.org/
        • HDF5 Information Center: http://hdfgroup.org/HDF5/
        • HDF Helpdesk: hdfhelp@hdfgroup.org
        • HDF users mailing list: hdfnews@ncsa.uiuc.edu (coming soon: news@hdfgroup.org)
