HDF

Source: http://hdfeos.org/workshops/ws03/presentations/MikeIII.ppt


Published in: Technology
  • ASCI’s DMF Group is currently supporting HDF work with the idea of possibly adopting HDF as a standard. They want to share data and software among the three labs (Livermore, Sandia, Los Alamos), and would prefer a “non-invented-here,” open standard with publicly available software.
    HDF Requirements. ASCI’s needs overlap with those of EOSDIS, but with some important differences:
    ASCI deals largely with simulations on massively parallel machines, and hence requires very high performance in doing I/O. Only a parallel version of the library will satisfy ASCI’s needs.
    ASCI data deals with meshes, whereas EOS deals largely with remotely sensed data. Many types of meshes can be much more complex than remotely sensed data is, and typically require indexed access.
    Because of these requirements, the current official version of HDF (HDF4) is not adequate for the ASCI project. Fortunately, with support from NASA, we have been developing a completely new version of HDF designed to address these kinds of requirements. This is HDF5. More about HDF5 later.
    A mesh data repository is being developed by the group to standardize the data models and terminology used by the three labs. This will allow them to share resources much better than is currently the case. There is also the hope that the mesh standard adopted by ASCI will be adopted by others, further expanding the leverage of the standard.
  • The HDF group has several Java-based projects. Java’s platform independence supports the need to be able to work with HDF on many platforms. Java’s graphical interface features support the creation of platform-independent HDF browsing and visualization software. And Java’s network awareness facilitates the development of software for remote access to HDF data.
    A Java HDF Interface (JHI). JHI provides an interface to essentially all the functions of the NCSA HDF 4.1r2 library. The JHI is analogous to the FORTRAN interface already provided as part of the HDF library release.
    Basis for tools that access HDF. Any Java application can use the interface classes to read and write HDF. This package “wraps” the standard HDF 4.1r2 library, which is called from Java through “native” methods.
    A Java HDF Viewer. This is a tool to provide basic viewing capabilities for HDF.
    HDF browser/visualizer. With this tool you can open an HDF file, look at images, arrays, tables and attributes, and do some simple visualization.
    Template for other Java viz apps. It isn’t meant as an all-encompassing visualization tool for HDF. That is left to others, including commercial vendors. Rather, it is meant as a template for people to use to build more sophisticated tools.
    Java Scientific Data Server Prototype. We are experimenting with remote access to HDF. This project examines different ways Java can be used to provide remote access to HDF.
    Lessons learned about scientific data servers. We are learning a great deal about Java’s remote access capabilities: servlets, RMI, etc.
    Template for other Java server apps. Again, we hope this technology will help others who want to do similar things, or to build products out of our prototypes.
  • HDF5 is a new, experimental version of HDF that addresses limitations of the current version (HDF4) and the requirements of modern systems and applications. HDF5 is a completely new format and I/O library, not an incrementally new version of HDF4. An HDF5 prototype was released in Feb. 1998. Although incomplete, this library shows the basic features of HDF5. A full release is scheduled for Summer 1998.
    Why HDF5? HDF5 is motivated by severe limitations in the HDF4 format and library. HDF5 retains most features of HDF4, but addresses these limitations, including:
    Large array and file support. A single HDF4 file cannot store more than 20,000 complex objects, and cannot be larger than 2 GB. HDF5 will be able to store virtually any number of objects of virtually any size.
    Simple, comprehensive data model. HDF4 has more object types than necessary, and datatypes are too restricted. HDF5 uses a simpler, more comprehensive data model that includes only two basic structures: a multidimensional array of record structures, and a grouping structure. All HDF4 structures can be derived from these.
    New library, with emphasis on parallel I/O. The HDF4 library is old, overly complex, does not support parallel access well, and is not thread tolerant. HDF5 provides a better-engineered library and API, with improved support for parallel I/O, threads, and other requirements imposed by modern systems and applications.
    Collaborations. HDF5 was motivated by the needs of many different users, but two projects in particular are driving HDF5 development:
    ASCI: mesh data standard for ASCI physics. The DMF’s ASCI mesh standard initiative described in an earlier slide is providing most of the support for HDF5.
    Digital Library Initiative (DLI): integrate with commercial object store. A DLI project at the U. of Illinois is using HDF5 for data access in combination with a commercial object store. This project requires very efficient parallel I/O.
  • Here is an example of a basic HDF5 object.
    Notice that each element in the 3D array is a record with four values in it.
  • Like HDF4, HDF5 has a grouping structure.
    The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.
  • Transcript

    • 1. HDF HDF/HDF-EOS Workshop III Sept. 14-16, 1999 Mike Folk, HDF Group http://hdf.ncsa.uiuc.edu/ National Center for Supercomputing Applications University of Illinois at Urbana-Champaign NCSA/Univ of Illinois at Urbana-Champaign HDF 1
    • 2. Topics: I. Overview; II. NCSA HDF Activities; III. HDF5; IV. HDF4 vs. HDF5
    • 3. I. HDF Overview
    • 4. HDF Mission: To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
    • 5. What is HDF? • Scientific data file format & supporting software • For images, arrays, tables, other structures • Features – Portability across architectures (I/O library, files) – Efficient I/O – Efficient storage
    • 6. Why use HDF? • Manage data • Share data • Use software that understands HDF • Improve I/O performance • Improve storage efficiency • Use an open standard
    • 7. An HDF File: A Collection of Scientific Data Objects. [Figure: HDF file containing four 3-D arrays]
    • 8. Mixing HDF Objects in One File. [Figure: an HDF file containing a group, two 3-D arrays, two raster images, a palette, and a table with columns Lat, lon, temp and rows (12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6), (16, 35, 5.7)]
    • 9. HDF Software. [Figure: software layers] • General Applications: utilities and applications for manipulating, viewing, and analyzing data • HDF I/O library – Application Programming Interfaces: high-level, object-specific APIs – Low-level Interface: low-level API for I/O to files, etc. • HDF file: file or other data source
    • 10. HDF Applications Software • Free software – NCSA HDF library and utilities – Other software • Commercial/other software that “understands” – all of HDF (Noesys, IDL, HDF Explorer) – certain HDF objects (MATLAB, WebWinds) – certain HDF applications (SHARP, WIM) • http://hdf.ncsa.uiuc.edu/tools.html
    • 11. What platforms does HDF run on? • Sun: Solaris • SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E • HP9000, HP-Convex Exemplar • IBM: RS6000, SP2 • DEC: Alpha (Digital UNIX, OpenVMS), VAX (OpenVMS) • Intel: Solaris x86, Linux, FreeBSD, Windows NT/98 • PowerPC: Mac-OS
    • 12. A Sampling of HDF Users • NCSA-affiliated science teams: visualization, data exchange, fast I/O, ... • Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of vis and data analysis software • Boeing: space-time change detection in images • Distributed Oceanographic Data System (DODS): remote access to earth science data • Army Research Lab: network distributed global memory • Center for Analysis & Prediction of Storms: fast parallel I/O, portability, multi-resolution grids • TRAPPIST (Euro consortium): exchange, analysis & visualization of non-destructive testing data
    • 13. Major User #1: EOSDIS • ESDIS Project – open standard exchange format and I/O library for EOSDIS – EOS applications • HDF requirements – Earth science data types (HDF-EOS, etc.) – User support for scientists, data producers, etc. – Library and file structure improvements – HDF tools, utilities, access software – Software maintenance and QA
    • 14. Major User #2: ASCI • ASCI Data Models and Formats (DMF) Group – open standard exchange format and I/O library for ASCI – DOE tri-lab ASCI applications • HDF requirements – large datasets (> a terabyte) – ASCI data types, especially meshes – good performance in massively parallel environments – primarily HDF5
    • 15. II. NCSA HDF Activities
    • 16. Java applications • HDF APIs – Basis for tools that access HDF • HDF Viewers – HDF browser/visualizer • HDF4 Data Server Prototype – Lessons learned about remote access to
    • 17. Remote Data Access • The SDB: Web-based Server-side Data Browser • Java for remote access • WP-ESIP: DODS project • Computational Grids (Globus/GASS)
    • 18. HDF Standardization • To share files, users must organize them similarly. • HDF user groups create standard profiles – Ways to organize data in HDF files – Metadata – API • Examples: HDF-EOS, ASCI DMF
    • 19. HDF-EOS software layers. [Figure: layered diagram with the labels HDF-EOS Applications, HDF-EOS profiles, General Applications, HDF-EOS API, Application Programming Interfaces, Low-level Interface, HDF file]
    • 20. “HDF Configuration Record” (HCR) • To simplify the tasks of defining, comparing, and producing HDF-EOS files • Formal (ODL) descriptions of HDF-EOS objects
    • 21. HCR of Swath
        /* Project XYZ */
        /* First version defined on June 10th, 1998 */
        OBJECT = SWATH
          NAME = SCAN1
          OBJECT = Dimension
            NAME = GeoTrack
            Size = 1200
          END_OBJECT = Dimension
          OBJECT = Dimension
            NAME = GeoCrossTrack
            Size = 205
          END_OBJECT = Dimension
          OBJECT = Dimension
            NAME = DataX
            Size = 2410
          END_OBJECT = Dimension
        END_OBJECT = SWATH
        END
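The nesting on slide 21 (OBJECT ... END_OBJECT pairs wrapping NAME and Size fields) can be read with a small stack-based parser. The sketch below is only an illustration in plain Python: `parse_hcr` and the dictionary layout are hypothetical names, not part of the HCR utilities, and the grammar is assumed to be exactly the `KEY = VALUE` lines shown on the slide rather than full ODL.

```python
# The HCR text from slide 21, one statement per line.
HCR_TEXT = """
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""


def parse_hcr(text):
    """Parse the simplified KEY = VALUE / OBJECT nesting into nested dicts.

    Assumption: comments occupy whole lines and values contain no '='.
    """
    root = {"type": "ROOT", "fields": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*") or line == "END":
            continue
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":
            node = {"type": value, "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            closed = stack.pop()
            if closed["type"] != value:
                raise ValueError(f"END_OBJECT = {value} closes {closed['type']}")
        else:
            stack[-1]["fields"][key] = value
    return root


swath = parse_hcr(HCR_TEXT)["children"][0]
dimensions = {d["fields"]["NAME"]: int(d["fields"]["Size"]) for d in swath["children"]}
```

Here `dimensions` maps each dimension name of the swath to its size, which is the kind of information a comparison between an HCR and an HDF-EOS file would need.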
    • 22. HCR • HCR Utilities: – Converters: HCR ↔ HDF-EOS – Edit HCR and HDF-EOS – Compare HCR with HDF-EOS file • Current projects: – Extend HCR converters to all of HDF4 – Similar work with HDF5 – XML too
    • 23. III. HDF5
    • 24. Why HDF5? • HDF shortcomings exposed by EOSDIS, ASCI and others... – Limits on object & file size (<2GB) – Limited number of objects (<20K) – Rigid data models – I/O performance – Aging software infrastructure (code entropy)
    • 25. • … new demands ... – Bigger, faster machines and storage systems (massive parallelism, parallel file systems; teraflop speeds, terabyte storage) – Greater complexity (complex data structures; complex subsetting) – More emphasis on remote & distributed access
    • 26. • … and ASCI requirements – Compatibility with vector bundle model – Compatibility with MPI-IO – Ability to transform data between memory & storage – Parallel file systems: PIOFS, HPSS, etc.
    • 27. New HDF5 Features • More scalable – Larger arrays and files – More objects • Improved data model – New datatypes – Single comprehensive dataset object • Improved software – More flexible, robust library – More flexible API – More I/O options
    • 28. HDF5 data model • Two primary objects • Dataset – multidimensional array of elements – rich variety of datatypes • Group – directory-like structure – contains datasets, groups, other objects
    • 29. Dataset components • multidimensional array • header with metadata – datatype – dataspace – attributes – storage properties
    • 30. Simple datatypes • The usual scalars: integer & float • user-defined scalars (e.g. 13-bit integers) • variable length (e.g. strings) • pointers to objects or regions of datasets • enumeration • opaque
    • 31. Compound datatypes • User-defined • Comparable to C structs • Members can be simple or compound types • Members can be multidimensional
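Slide 31 compares compound datatypes to C structs. As a rough illustration of what an "array of records" looks like as bytes, the sketch below packs a few records with Python's standard `struct` module. It does not use the HDF5 library; the field layout (two 16-bit integers and one 32-bit float, little-endian, unpadded) and the sample values are assumptions chosen for the example.

```python
import struct

# Hypothetical record layout (not taken from the slides): two int16 members
# followed by one float32 member, little-endian, with no padding.
RECORD = struct.Struct("<hhf")

# Sample records; the float values are exactly representable in float32.
rows = [(12, 23, 3.5), (15, 24, 4.25), (17, 21, 3.75)]

# A 1-D array of records stored as one contiguous byte buffer.
buffer = b"".join(RECORD.pack(*row) for row in rows)

# Reading the buffer back yields one tuple per record.
decoded = list(RECORD.iter_unpack(buffer))
```

Each record occupies `RECORD.size` bytes, so element `i` of the array starts at byte offset `i * RECORD.size`, which is what makes indexed access to an array of fixed-size records cheap.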
    • 32. Data Spaces • How data are organized to form a dataset – rank – dimensions • Subsetting during I/O operations – What subset of data is to be moved – In-memory organization of data – In-file organization of data
    • 33. HDF5 dataset: array of records. [Figure: an array of records. Datatype: record with fields int8, int4, int16, float32. Dimensionality: 5 x 3]
    • 34. Dataspaces: Reading Dataset into Memory from File. [Figure: a read operation; labels: File, Memory, 2D array of integers, 3D array of floats]
    • 35. Selection: Examples of mappings between file selections and memory selections. (a) A hyperslab from a 2D array to the corner of a smaller 2D array. (b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array. (c) A sequence of points from a 2D array to a sequence of points in a 3D array. (d) Union of slabs in file to union of slabs in memory. The number of elements must be equal.
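Case (a) on slide 35, a hyperslab copied into the corner of a smaller array, can be mimicked with nested Python lists. This is a conceptual sketch only: `hyperslab_indices` and `copy_selection` are hypothetical helpers, not HDF5 calls, and the start/stride/count parameters are an assumed way of describing a regular selection.

```python
def hyperslab_indices(start, stride, count):
    """Yield (row, col) pairs for a regular 2-D selection in row-major order."""
    for i in range(count[0]):
        for j in range(count[1]):
            yield (start[0] + i * stride[0], start[1] + j * stride[1])


def copy_selection(src, src_sel, dst, dst_sel):
    """Copy selected elements of src into the selected positions of dst."""
    src_idx = list(src_sel)
    dst_idx = list(dst_sel)
    # Both selections must pick the same number of elements.
    if len(src_idx) != len(dst_idx):
        raise ValueError("file and memory selections differ in size")
    for (sr, sc), (dr, dc) in zip(src_idx, dst_idx):
        dst[dr][dc] = src[sr][sc]


# "File" array: 5 x 6, element value encodes its position (10*row + col).
file_array = [[10 * r + c for c in range(6)] for r in range(5)]
# "Memory" array: a smaller 3 x 3 buffer, initially zero.
memory = [[0] * 3 for _ in range(3)]

# Move a 2 x 2 hyperslab starting at (1, 2) into the top-left corner of memory.
copy_selection(
    file_array, hyperslab_indices((1, 2), (1, 1), (2, 2)),
    memory, hyperslab_indices((0, 0), (1, 1), (2, 2)),
)
```

The size check mirrors the slide's constraint that the file selection and the memory selection must contain the same number of elements, even when their shapes differ.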
    • 36. Attributes • Named pieces of data • Stored in a dataset or group header • Operations are scaled-down versions of the dataset operations – Not extendible – No compression – No partial I/O
    • 37. Property list • Properties of objects or operations • Describe how to create, store, access and transfer data
    • 38. Some Properties • chunked: better subsetting access time; extendable • compressed: improves storage efficiency, transmission speed • extendable: datasets can be extended in any direction • split file: metadata in one file, raw data in another. [Figure: dataset “Fred” with metadata for Fred in File A and data for Fred in File B]
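Slide 38 says chunking gives better subsetting access time. One way to see why is to count how many chunks a subset overlaps, since a chunk is the unit that has to be read. The sketch below is a back-of-the-envelope model, not library code: `chunks_touched` is a hypothetical helper, and it assumes a regular chunk grid anchored at the origin and a contiguous rectangular selection.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the chunk grid coordinates overlapped by a contiguous selection."""
    ranges = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c               # chunk holding the first selected index
        last = (s + n - 1) // c      # chunk holding the last selected index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Assumed layout: a 100 x 100 dataset stored in 10 x 10 chunks.
block_read = chunks_touched((5, 20), (10, 10), (10, 10))   # a 10 x 10 block
column_read = chunks_touched((0, 3), (100, 1), (10, 10))   # one full column
```

In this model the 10 x 10 block overlaps two chunks and the single column overlaps ten, that is 1,000 stored elements for 100 requested. Whether that beats a contiguous row-major layout depends on the access pattern and the chunk shape chosen.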
    • 39. Dataset components. [Figure: a dataset made up of metadata and data. Metadata: Dataspace (Rank=2, Dim_1=5, Dim_2=4, Dim_3=2); Datatype (int16); Attributes (time = 32.4, pressure = 987, temp = 56); Storage properties (chunked; compressed)]
    • 40. Groups • Structures for organizing the file • Like Vgroups in HDF4 • Like directories in hierarchical file system • Every file starts with a root group • Groups have attributes
    • 41. Groups • A mechanism for collections of related objects • Every file starts with a root group • Can have attributes • Like directories in Unix, but a graph, rather than a tree. [Figure: group graph starting at “root”]
    • 42. Groups: groups and members of groups can be shared. [Figure: group graph starting at root]
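Slides 41 and 42 describe groups as a graph rather than a tree, with members that can be shared. A toy model in plain Python makes the difference concrete. The `Group` class, `resolve` function, and path names below are hypothetical and only imitate the idea; they are not the HDF5 API.

```python
class Group:
    """Toy group: named links to members, plus attributes."""

    def __init__(self):
        self.members = {}   # link name -> Group or dataset-like object
        self.attrs = {}

    def link(self, name, obj):
        """Add a named link to obj; the same obj may be linked many times."""
        self.members[name] = obj
        return obj


def resolve(root, path):
    """Follow a '/'-separated path of link names starting at the root group."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node.members[part]
    return node


root = Group()                        # every file starts with a root group
a = root.link("a", Group())
b = root.link("b", Group())
temps = a.link("temps", [3.1, 4.2, 3.6])   # a stand-in for a dataset
b.link("shared_temps", temps)               # second link to the same object
```

Both `/a/temps` and `/b/shared_temps` lead to the same object, so the structure is a graph: an object has no single parent directory, unlike a file in a strict tree.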
    • 43. Mounting. [Figure: File A and File B, each with its own root group, joined by a mount]
    • 44. Reading & writing with HDF5 • Set properties • Describe the data – datatypes – rank and dimensions – mapping between file and memory • Read/write
    • 45. Files needn’t be files: Virtual File Layer • VFL: a public API for writing I/O drivers. [Figure: an hid_t “file” handle sits above the VFL (Virtual File I/O Layer), which uses I/O drivers (stdio, mpio, memory, network) to reach “storage” (files, memory, network)]
    • 46. HDF5 tools • Current – hdf5ls: lists contents of HDF5 file – h5dumper: higher level view – hdf5 hdf4 converter • Future – Convert HDF5 ↔ ascii, binary, GIFF, etc. – Convert HDF4 HDF5 – Java tools: VisAD, etc. – File/code generation from DDL description – Talking to vendors
    • 47. Other HDF5 activities • Performance tuning • Object model • Fortran and C++ API • Thread-safe HDF5
    • 48. IV. HDF4 vs. HDF5
    • 49. HDF4 vs. HDF5 • HDF4 – Original format and library – Compatible with all earlier versions – 6 primary objects: multidim. array of scalars; raster image; palette; table; annotation; group – Biggest current user: Earth Observing System Data and Info System (EOSDIS) • HDF5, successor to HDF4 – New format and library – Not compatible with earlier versions – 2 primary objects: multidim. array of records; group – Biggest current user: Accelerated Strategic Computing Initiative (ASCI)
    • 50. HDF4 object types can be derived from HDF5 datasets and groups. [Figure: HDF4 objects shown as HDF5 objects. Labels: HDF5 group, HDF5 dataset, HDF4 Vgroup, HDF4 Vdata (a table with columns lat, lon, temp), 1-dim array of records, HDF4 SDS, n-dim array of scalars, 2-dim array of multi-component scalars, HDF4 8-bit raster, HDF4 24-bit raster, and an annotation reading “March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”]
    • 51. Status of HDF4 vs. HDF5 • HDF4 is still an EOS standard • HDF5 likely also • HDF4 maintenance – Maintained as long as EOS needs it – Minimal new features • New applications: use HDF5 if possible! – New features, performance improvements, etc.
    • 52. HDF Information • HDF Information Center – http://hdf.ncsa.uiuc.edu/ • HDF Help email address – hdfhelp@ncsa.uiuc.edu • HDF users mailing list – hdfnews@ncsa.uiuc.edu
