www.hdfgroup.org
The HDF Group
HDF5 and The HDF Group
May 2014
THE HDF GROUP
Mission
To provide high-quality software for managing large, complex data, to provide outstanding services for users of these technologies, and to ensure effective management of data throughout the data life cycle.
Goals of The HDF Group
• To create, maintain, and evolve software and services that enable society to manage large, complex data at every stage of the data life cycle.
• To establish and maintain a sustainable organization with a highly skilled and committed team devoted to accomplishing the first goal.
The HDF Group
• 1988-2006: Software group at the University of Illinois National Center for Supercomputing Applications (NCSA)
• 2005-present: Non-profit company in Champaign, IL
• Passionate about managing large, complex, heterogeneous data throughout its life cycle
• Creators and stewards of HDF4 and HDF5
• Own HDF4 and HDF5
• Formats, libraries, and tools are open and free
• Committed to high quality and reliability
• Currently employ 33 staff
Current projects of The HDF Group
• NASA – Earth Observing System (EOS)
• The basis for global climate research
• HDF is the standard archive and distribution format for EOS
• Hundreds of data products, 8-petabyte archive and growing
• NOAA/NASA – JPSS
• Next-generation weather satellite system and EOS
• HDF5 is the primary distribution format (6 TB/day)
• Sandia National Laboratories
• High-throughput, multi-stream satellite image management
• Synchrotron community
• Scalable solutions for high-throughput data acquisition and management
• ExaHDF5 (Lawrence Berkeley National Lab)
• High-end scientific simulations
• Tuning HDF5 for high-performance parallel I/O
• FastForward Computing (DOE)
• Solving I/O challenges for exascale computing
The HDF Group Services
• Helpdesk and Mailing Lists
• Available to all users as a first level of support
• Priority Support
• Rapid issue resolution and advice
• Consulting
• Needs assessment, troubleshooting, design reviews, etc.
• Training
• Tutorials and hands-on practical experience
• Enterprise Support
• Coordinating HDF activities across departments
• Special Projects
• Adapting customer applications to HDF
• New features and tools
• Research and Development
WHO USES HDF5?
Who uses HDF5?
• Applications that deal with large or complex data
• Over 200 different application areas
• >2 million data product users worldwide
• Academia, government agencies, industry
Members of the HDF support community
• NASA – Earth Observing System
• NOAA/NASA/Riverside Tech – NPOESS
• A large financial institution
• DOE – projects w/LBNL & PNNL, ANL & ORNL
• Lawrence Livermore National Lab
• Army Geospatial Center
• NIH/Geospiza (bio software company)
• Lawrence Berkeley National Lab
• University of Illinois/NCSA
• Sandia National Lab
• A leading U.S. aerospace company
• Projects in the petroleum industry, vehicle testing, weapons research, and others
• “In kind” support
New Areas We’re Exploring
• Fusion research data storage
• Submitted proposal for the ITER project’s data management w/a large industrial fusion partner
• Astronomy
• Submitted NSF SI2 grant proposal w/NRAO
• Working toward a new standard for radio astronomy data storage
• Electron Microscopy
• Submitted NSF SI2 grant proposal w/LSU, et al.
• Proposing a new standard for storing imaging data
• Synthesis of HDF5 and cloud storage w/Microsoft
• Developing a “RESTful” API for accessing HDF5 data in the Azure cloud
HDF5 SCIENCE APPLICATIONS
NASA EOS Remote Sensed Data
• HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
• Petabytes of data stored in HDF4 and HDF5 to support the Global Climate Change Research Program.
What is JPSS?
• JPSS (Joint Polar Satellite System) is the next generation of NOAA's polar-orbiting environmental satellites.
• JPSS observations enable forecasting of severe weather such as hurricanes, tornadoes, and blizzards, and assessment of environmental hazards such as droughts, forest fires, poor air quality, and harmful coastal waters.
• JPSS will provide continuity of critical global Earth observations, including of the atmosphere, oceans, and land, through 2025.
• During Hurricane Sandy in October 2012, JPSS data helped forecasters and scientists accurately predict Sandy's hurricane track and infamous 'left hook' landfall in New York and New Jersey more than five days in advance.
CFD General Notation System
What is CFD?
Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows.
This computer-generated CFD image shows a model of the Space Shuttle. CFD has taken the place of wind tunnels for many aircraft evaluations, and as computing power increases and computer models become more sophisticated, CFD is expected to largely replace wind tunnels.
What is CGNS?
• Standard Interface Data Structures (SIDS)
– A collection of conventions and definitions that define the intellectual content of CFD-related data
• SIDS to ADF Mapping
– Defines how the SIDS is represented in ADF (Advanced Data Format)
• SIDS to HDF5 Mapping
– Defines how the SIDS is represented in HDF5
• CGNS Mid-Level Library (MLL)
– Application Programming Interface (API) that conforms to the SIDS
– Built on top of ADF or HDF5, which perform the actual I/O operations
CGNS and HDF5*
• CGNS was originally built using the ADF format.
• However, ADF lacks parallel I/O and data compression capabilities, and does not have the support and tools that HDF5 offers.
• HDF5 has rapidly grown to become a worldwide format standard for scientific data.
• HDF5 has parallel capability as well as a broader support base than ADF.
• Therefore, CGNS has adopted HDF5 as the default (official) data storage mechanism.
* Paraphrased from http://cgns.sourceforge.net/hdf5.html.
ENZO
• An adaptive mesh refinement (AMR), grid-based hybrid code designed to simulate cosmological structure formation.
Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman
What is ENZO for?
• At UC San Diego, the ENZO cosmology code is used to simulate the universe from first principles, starting near the Big Bang.
• Researchers using ENZO have conducted the most detailed simulations ever of a region of the universe more than 1.5 billion light-years across.
• “We need to zoom in on these dense regions to capture the key physical processes, including gravitation, flows of normal and ‘dark’ matter, and shock heating and radiative cooling of the gas,” said Mike Norman. “This requires ENZO’s ‘adaptive mesh refinement’ capability.”
• “AMR codes begin with a coarse grid spacing, and then spawn more detailed subgrids as needed to track key processes in higher-density regions.”
• “We achieved unprecedented detail by reaching seven levels of subgrids throughout the survey volume, something never done before, producing more than 400,000 subgrids,” said SDSC computational scientist Robert Harkness.
• “Norman is one of the largest users of supercomputing time in the world, with 16 million computing hours at the TACC, and millions more on TeraGrid systems at SDSC, PSC, and NCSA.”
• “The HDF Group provided important support for handling the output, and SDSC’s data storage environment allowed the researchers to efficiently store and manage the massive data.”
NeXus
What is NeXus?
• In recent years, scientists and programmers at neutron and synchrotron facilities around the world concluded that a common data format would fulfill a valuable function in the scattering community.
• As instrumentation becomes more complex and data visualization more challenging, scientists find it difficult to keep up with new developments.
• A common data format makes it easier to exchange experimental results and to exchange ideas about how to analyze them. It promotes greater cooperation in software development and stimulates the design of more sophisticated visualization tools.
• The NeXus data format has been developed in response to these needs.
HDF5 TECHNOLOGIES
Data challenges addressed by HDF5
• Ability to organize complex collections of data
• Efficient and scalable data storage and access
• A growing need to integrate a wide variety of
types of data
• The evolution of data technologies
• Long term preservation of data
HDF is…
• HDF stands for ‘Hierarchical Data Format’
• A file format for storing any kind of data
• Software system to manage data in the format
• Designed for high volume or complex data
• Designed for every size and type of system
• Open format, software library, and tools
• There are two HDFs: HDF4 and HDF5
• Here we focus on HDF5
HDF5 Technology Platform
• HDF5 data model: the “building blocks” for data organization
• HDF5 software: library, language interfaces, tools
• HDF5 file format: byte-level organization of data
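As a small, concrete illustration of the byte-level format (a sketch, not part of the HDF5 library): every HDF5 file has a superblock that begins with the 8-byte signature `\x89HDF\r\n\x1a\n`, and the HDF5 File Format Specification places that superblock at offset 0, or at 512, 1024, 2048, ... bytes when a user block precedes it. The function name `find_superblock` and its `max_offset` search limit are hypothetical choices made for this example.

```python
import os
import tempfile
from typing import Optional

# Format signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(path: str, max_offset: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the HDF5 superblock, or None if not found.

    Checks offset 0, then 512, 1024, 2048, ... (doubling each time).
    max_offset is an arbitrary search limit chosen for this sketch.
    """
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Fake file: a 512-byte user block followed by the signature.
        path = os.path.join(tmp, "fake.h5")
        with open(path, "wb") as fh:
            fh.write(b"\0" * 512 + HDF5_SIGNATURE)
        print(find_superblock(path))
```

A real reader would go on to parse the superblock version and the root group address that follow the signature; the sketch stops at locating it.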
Professionally managed
• Source under version control, public access
• Automatic daily testing
• 200+ configurations
• Performance, backward/forward compatibility
• C, C++, Fortran, Java, Python APIs
• Build supports Autotools (configure) and CMake
• Sound development and coding practices
• Maintenance releases every May and November
Professionally supported
• Helpdesk
• Forum and mailing lists
• Extensive web documentation: User’s Guide, Reference Manual, examples, tutorials, other docs
• Community friendly
• Integrate contributions from external developers
• Solicit feedback on new features and pre-releases
• Collaborate on projects, especially in testing
HDF5 file
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
An HDF5 file is a container that holds data objects.
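A minimal sketch of storing the table above as one dataset of compound (record) values, assuming the third-party h5py and NumPy packages (the slides list Python among the supported APIs but do not name a binding). The dataset name `table`, the file name, and the field types (32-bit integers for lat/lon, a 32-bit float for temp) are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Compound datatype mirroring the lat | lon | temp table.
# Field types are assumptions: integer lat/lon, 32-bit float temp.
record_dtype = np.dtype([("lat", "i4"), ("lon", "i4"), ("temp", "f4")])

rows = np.array([(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)], dtype=record_dtype)


def write_table(path):
    """Store the table as a single 1-D dataset of compound records."""
    with h5py.File(path, "w") as f:
        f.create_dataset("table", data=rows)


def read_table(path):
    """Read the whole dataset back into a NumPy structured array."""
    with h5py.File(path, "r") as f:
        return f["table"][...]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.h5")  # hypothetical file name
        write_table(path)
        table = read_table(path)
        print(table["temp"])  # a single column, returned as a NumPy array
```

Storing the rows as one compound dataset keeps each record together on disk; an alternative layout would be three separate 1-D datasets, one per column.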
HDF5 file organization
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
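[Figure: an HDF5 file with root group “/” and group “SimOutViz”, linking the lat/lon/temp table and the objects “Experiment Notes” (Serial Number: 99378920, Date: 3/13/09, Config: Standard 3), “Parameters” (10;100;1000), and “Timestep” (36,000).]
HDF5 groups and links organize data objects.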
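One way to express this organization in code, again assuming h5py and NumPy. The placement of each object is a guess, since the figure layout did not survive: `SimOutViz` becomes a group, `Parameters` and `Timestep` become datasets inside it, and the experiment notes become attributes on the group. The two extra links, with hypothetical names `params_alias` and `latest_timestep`, show a hard link and a soft link.

```python
import os
import tempfile

import h5py
import numpy as np


def build_file(path):
    """Create a group, datasets, attributes, and links in a new file."""
    with h5py.File(path, "w") as f:  # the File object is the root group "/"
        viz = f.create_group("SimOutViz")
        viz.create_dataset("Parameters", data=np.array([10, 100, 1000]))
        viz.create_dataset("Timestep", data=36000)  # scalar dataset

        # Small descriptive metadata is stored as attributes on an object.
        viz.attrs["Serial Number"] = 99378920
        viz.attrs["Date"] = "3/13/09"
        viz.attrs["Config"] = "Standard 3"

        # Hard link: a second name for the same dataset object.
        f["params_alias"] = viz["Parameters"]
        # Soft link: stores a path that is resolved when it is accessed.
        f["latest_timestep"] = h5py.SoftLink("/SimOutViz/Timestep")


def read_summary(path):
    """Read back root link names, one attribute, and data via both links."""
    with h5py.File(path, "r") as f:
        return {
            "root_links": sorted(f),
            "date": f["SimOutViz"].attrs["Date"],
            "params": f["params_alias"][...].tolist(),
            "timestep": int(f["latest_timestep"][()]),
        }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "organized.h5")  # hypothetical file name
        build_file(path)
        print(read_summary(path))
```

The two link kinds behave differently: an object stays in the file as long as at least one hard link names it, while a soft link only stores a path and can dangle if its target is removed.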
A single platform with multiple uses
• One general data model
• One general format
• One library
• Adaptable for almost any kind of data
• Works on almost any architecture
• Ability to interact well with other technologies
• Attention to past, present, future compatibility
HDF5 Philosophy
HDF5 Software Layers & Storage
• High Level APIs; Tools: h5dump tool, HDFView tool, h5repack tool, …
• Language Interfaces: C, Fortran, C++
• HDF5 Library
– HDF5 Data Model: Groups, Datasets, Attributes, …
– Internals: Datatype Conversion, Data Compression, Chunked Storage, Version Compatibility, and so on…
– I/O Drivers: POSIX I/O, Split Files, Parallel I/O, Custom
• HDF5 File Format: File, Split Files, File on Parallel Filesystem, Other
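Chunked storage, compression filters, and I/O drivers from this layer diagram are all chosen when a file or dataset is created. The sketch below assumes h5py and NumPy: the `core` driver keeps the file in memory (`backing_store=False` means nothing is written to disk), and the dataset is stored in 100-by-100 chunks, each passed through the shuffle and gzip (deflate) filters. The dataset name, array size, chunk shape, and compression level are arbitrary illustrative choices.

```python
import h5py
import numpy as np


def chunked_compressed_roundtrip():
    """Write a chunked, gzip-compressed dataset and read back a small slab."""
    # The "core" driver holds the whole file in memory; with
    # backing_store=False the named file is never created on disk.
    with h5py.File("in_memory.h5", "w", driver="core", backing_store=False) as f:
        data = np.arange(1_000_000, dtype="f8").reshape(1000, 1000)
        dset = f.create_dataset(
            "grid",
            data=data,
            chunks=(100, 100),     # stored as 10 x 10 = 100 separate chunks
            shuffle=True,          # byte-shuffle filter, often improves gzip's ratio
            compression="gzip",    # deflate filter, applied chunk by chunk
            compression_opts=4,    # gzip level, 0 (fastest) to 9 (smallest)
        )
        # Only the chunks overlapping this slab are read and decompressed;
        # here that is a single chunk (rows and columns 200-299).
        block = dset[250:260, 250:260]
        return dset.chunks, dset.compression, block


if __name__ == "__main__":
    chunks, compression, block = chunked_compressed_roundtrip()
    print(chunks, compression, block[0, 0])
```

Chunk shape is a trade-off: chunks matching the typical read pattern minimize how much data is decompressed per access, while very small chunks add per-chunk indexing overhead.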
HDF ecosystem
[Figure: HDF ecosystem. Components shown: Storage; HDF Library; HDF-EOS Library with EOS domain data objects (Swath, Grid, Point, etc.); applications (EOS applications, MATLAB, IDL); HDF tools.]
Other Software
• The HDF Group
• HDFView – an HDF4 & HDF5 browser
• Command-line utilities
• Regression and performance testing software
• 3rd Party
• NetCDF-4, IDL, MATLAB, Mathematica,
PyTables, Pandas
• Communities
• EOS, ASC, CGNS, Energistics, NeXus
• Integration with other software
• iRODS, OPeNDAP, MPI
HDF5 and The HDF Group
www.hdfgroup.org
www.hdfgroup.org
HDF5 and The HDF Group

More Related Content

What's hot

Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)The HDF-EOS Tools and Information Center
 
Using Archivemedia to preserve research data
Using Archivemedia to preserve research dataUsing Archivemedia to preserve research data
Using Archivemedia to preserve research dataARDC
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsThe HDF-EOS Tools and Information Center
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 

What's hot (20)

HDF and netCDF Data Support in ArcGIS
HDF and netCDF Data Support in ArcGISHDF and netCDF Data Support in ArcGIS
HDF and netCDF Data Support in ArcGIS
 
Improved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the MassesImproved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the Masses
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
 
HDF
HDFHDF
HDF
 
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
 
HDF Product Designer
HDF Product DesignerHDF Product Designer
HDF Product Designer
 
Using Archivemedia to preserve research data
Using Archivemedia to preserve research dataUsing Archivemedia to preserve research data
Using Archivemedia to preserve research data
 
Status of HDF-EOS, Related Software and Tools
 Status of HDF-EOS, Related Software and Tools Status of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and Tools
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5
 
HDF OPeNDAP Project Update and Demo
HDF OPeNDAP Project Update and DemoHDF OPeNDAP Project Update and Demo
HDF OPeNDAP Project Update and Demo
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
GES DISC Eexperiences with HDF Formats for MEaSUREs ProjectsGES DISC Eexperiences with HDF Formats for MEaSUREs Projects
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
 
GDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS ProjectGDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS Project
 
Multidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGISMultidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGIS
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
Using IDL with Suomi NPP VIIRS Data
Using IDL with Suomi NPP VIIRS DataUsing IDL with Suomi NPP VIIRS Data
Using IDL with Suomi NPP VIIRS Data
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 

Similar to HDF5 and The HDF Group

ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarFAIRDOM
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorialtutorialsruby
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorialtutorialsruby
 

Similar to HDF5 and The HDF Group (20)

RFCs for HDF5 and HDF-EOS5 Status Update
RFCs for HDF5 and HDF-EOS5 Status UpdateRFCs for HDF5 and HDF-EOS5 Status Update
RFCs for HDF5 and HDF-EOS5 Status Update
 
Plans for Enhanced NetCDF-4 Interface to HDF5 Data
Plans for Enhanced NetCDF-4 Interface to HDF5 DataPlans for Enhanced NetCDF-4 Interface to HDF5 Data
Plans for Enhanced NetCDF-4 Interface to HDF5 Data
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF
HDFHDF
HDF
 
HDF Software Process - Lessons Learned & Success Factors
HDF Software Process - Lessons Learned & Success FactorsHDF Software Process - Lessons Learned & Success Factors
HDF Software Process - Lessons Learned & Success Factors
 
Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
HDF Updae
HDF UpdaeHDF Updae
HDF Updae
 
HDF Status and Development
HDF Status and DevelopmentHDF Status and Development
HDF Status and Development
 
HDF Town Hall
HDF Town HallHDF Town Hall
HDF Town Hall
 
Hdg geo discussion
Hdg geo discussionHdg geo discussion
Hdg geo discussion
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorial
 
LCI2009-Tutorial
LCI2009-TutorialLCI2009-Tutorial
LCI2009-Tutorial
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
SEEDS Standards Process
SEEDS Standards ProcessSEEDS Standards Process
SEEDS Standards Process
 
Parallel HDF5 Developments
Parallel HDF5 DevelopmentsParallel HDF5 Developments
Parallel HDF5 Developments
 

More from The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

More from The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx

HDF5 and The HDF Group

  • 1. www.hdfgroup.org The HDF Group HDF5 and The HDF Group May 2014
  • 3. Mission
    To provide high-quality software for managing large, complex data; to provide outstanding services for users of these technologies; and to ensure effective management of data throughout the data life cycle.
  • 4. Goals of The HDF Group
    • To create, maintain, and evolve software and services that enable society to manage large, complex data at every stage of the data life cycle.
    • To establish and maintain a sustainable organization with a highly skilled and committed team devoted to accomplishing the first goal.
  • 5. The HDF Group
    • 1988-2006: Software group at the University of Illinois National Center for Supercomputing Applications (NCSA)
    • 2005-present: Non-profit company in Champaign, IL
    • Passionate about managing large, complex, heterogeneous data throughout its life cycle
    • Creators and stewards of HDF4 and HDF5
    • Own HDF4 and HDF5
    • Formats, libraries, and tools are open and free
    • Committed to high quality and reliability
    • Currently employs 33 staff
  • 6. Current project list for The HDF Group
    • NASA – Earth Observing System (EOS)
      • The basis for global climate research
      • HDF is the standard archive and distribution format for EOS
      • Hundreds of data products; 8-petabyte archive and growing
    • NOAA/NASA – JPSS
      • Next-generation weather satellite system and EOS
      • HDF5 is the primary distribution format (6 TB/day)
    • Sandia National Laboratories
      • High-throughput, multi-stream satellite image management
    • Synchrotron community
      • Scalable solutions for high-throughput data acquisition and management
    • ExaHDF5 (Lawrence Berkeley National Lab)
      • High-end scientific simulations
      • Tuning HDF5 for high-performance parallel I/O
    • FastForward Computing (DOE)
      • Solving I/O challenges for exascale computing
  • 7. The HDF Group Services
    • Helpdesk and mailing lists: available to all users as a first level of support
    • Priority Support: rapid issue resolution and advice
    • Consulting: needs assessment, troubleshooting, design reviews, etc.
    • Training: tutorials and hands-on practical experience
    • Enterprise Support: coordinating HDF activities across departments
    • Special Projects: adapting customer applications to HDF; new features and tools; research and development
  • 9. Who uses HDF5?
    • Applications that deal with large or complex data
    • Over 200 different application areas
    • More than 2 million data product users worldwide
    • Academia, government agencies, industry
  • 10. Members of the HDF support community
    • NASA – Earth Observing System
    • NOAA/NASA/Riverside Tech – NPOESS
    • A large financial institution
    • DOE – projects w/ LBNL & PNNL, ANL & ORNL
    • Lawrence Livermore National Lab
    • Army Geospatial Center
    • NIH/Geospiza (bio software company)
    • Lawrence Berkeley National Lab
    • University of Illinois/NCSA
    • Sandia National Lab
    • A leading U.S. aerospace company
    • Projects for the petroleum industry, vehicle testing, weapons research, and others
    • “In kind” support
  • 11. New Areas We’re Exploring
    • Fusion research data storage
      • Submitted a proposal for the ITER project’s data management w/ a large industrial fusion partner
    • Astronomy
      • Submitted an NSF SI2 grant w/ NRAO
      • Working toward a new standard for radio astronomy data storage
    • Electron microscopy
      • Submitted an NSF SI2 grant w/ LSU et al.
      • Proposing a new standard for storing imaging data
    • Synthesis of HDF5 and cloud storage w/ Microsoft
      • Developing a “RESTful” API for accessing HDF5 data in the Azure cloud
  • 13. NASA EOS Remote Sensed Data
    • HDF is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
    • Petabytes of data are stored in HDF4 and HDF5 to support the Global Climate Change Research Program.
  • 15. What is JPSS?
    • JPSS (Joint Polar Satellite System) is the next generation of NOAA's polar-orbiting environmental satellites.
    • JPSS observations enable forecasting of severe weather such as hurricanes, tornadoes, and blizzards, and assessment of environmental hazards such as droughts, forest fires, poor air quality, and harmful coastal waters.
    • JPSS will provide continuity of critical global Earth observations, including the atmosphere, oceans, and land, through 2025.
    • During Hurricane Sandy in October 2012, JPSS data helped forecasters and scientists accurately predict Sandy's hurricane track and its infamous 'left hook' landfall into New York and New Jersey more than five days in advance.
  • 16. CFD General Notation System (CGNS)
  • 17. What is CFD?
    Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows.
  • 18. This computer-generated CFD image shows a model of the Space Shuttle. CFD has taken the place of wind tunnels for many aircraft evaluations and, as computing power increases and computer models become more sophisticated, CFD will largely replace wind tunnels.
  • 19. What is CGNS?
    • Standard Interface Data Structures (SIDS)
      • A collection of conventions and definitions that defines the intellectual content of CFD-related data
    • SIDS to ADF mapping
      • Advanced Data Format (ADF)
    • SIDS to HDF5 mapping
      • Defines how the SIDS is represented in HDF5
    • CGNS Mid-Level Library (MLL)
      • Application Programming Interface (API) that conforms to the SIDS
      • Built on top of ADF/HDF5, which perform the I/O operations
  • 20. CGNS and HDF5*
    • CGNS was originally built using the ADF format.
    • However, ADF has neither parallel I/O nor data compression capabilities, and it lacks the support and tools that HDF5 offers.
    • HDF5 has rapidly grown to become a worldwide format standard for scientific data.
    • HDF5 has parallel I/O capability as well as a broader support base than ADF.
    • Therefore, CGNS has adopted HDF5 as its default (official) data storage mechanism.
    * Paraphrased from http://cgns.sourceforge.net/hdf5.html.
  • 21. ENZO: an adaptive mesh refinement (AMR), grid-based hybrid code designed to simulate cosmological structure formation.
  • 22. Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman
  • 23. What is ENZO for?
    • At UC San Diego, the ENZO cosmology code is used to simulate the universe from first principles, starting near the Big Bang.
    • Researchers using ENZO have conducted the most detailed simulations ever of a region of the universe more than 1.5 billion light years across.
    • “We need to zoom in on these dense regions to capture the key physical processes -- including gravitation, flows of normal and ‘dark’ matter, and shock heating and radiative cooling of the gas,” said Mike Norman. “This requires ENZO’s ‘adaptive mesh refinement’ capability.”
  • 24.
    • “AMR codes begin with a coarse grid spacing, and then spawn more detailed subgrids as needed to track key processes in higher density regions.”
    • “We achieved unprecedented detail by reaching seven levels of subgrids throughout the survey volume -- something never done before -- producing more than 400,000 subgrids,” said SDSC computational scientist Robert Harkness.
    • “Norman is one of the largest users of supercomputing time in the world, with 16 million computing hours at the TACC, and millions more on TeraGrid systems at SDSC, PSC, and NCSA.”
    • “The HDF Group provided important support for handling the output, and SDSC’s data storage environment allowed the researchers to efficiently store and manage the massive data.”
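The AMR idea in the first quote on slide 24 (start with a coarse grid, then spawn finer subgrids only where density is high) can be illustrated with a toy one-dimensional sketch. This is not ENZO's algorithm: the density field, the split-in-two rule, the sampling-based refinement criterion, and every name in it (`Cell`, `refine`, `build_grid`, `leaves`) are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A cell covering [lo, hi) in a hypothetical 1-D AMR hierarchy."""
    lo: float
    hi: float
    level: int
    children: list = field(default_factory=list)


def density(x: float) -> float:
    # Hypothetical density field: a narrow peak centered at x = 0.3.
    return math.exp(-((x - 0.3) / 0.02) ** 2)


def needs_refinement(cell: Cell, threshold: float) -> bool:
    # Sample five evenly spaced points; flag the cell if any exceeds the threshold.
    width = cell.hi - cell.lo
    samples = (cell.lo + width * k / 4 for k in range(5))
    return max(density(x) for x in samples) > threshold


def refine(cell: Cell, threshold: float, max_level: int) -> None:
    """Recursively split a flagged cell in two until max_level is reached."""
    if cell.level >= max_level or not needs_refinement(cell, threshold):
        return
    mid = 0.5 * (cell.lo + cell.hi)
    cell.children = [Cell(cell.lo, mid, cell.level + 1),
                     Cell(mid, cell.hi, cell.level + 1)]
    for child in cell.children:
        refine(child, threshold, max_level)


def build_grid(n_coarse: int, threshold: float, max_level: int) -> list:
    """Start from a uniform coarse grid on [0, 1) and refine only where needed."""
    coarse = [Cell(i / n_coarse, (i + 1) / n_coarse, 0) for i in range(n_coarse)]
    for cell in coarse:
        refine(cell, threshold, max_level)
    return coarse


def leaves(cells: list) -> list:
    """Return the finest cells, left to right; together they tile the domain."""
    out = []
    for cell in cells:
        out.extend(leaves(cell.children) if cell.children else [cell])
    return out


grid = leaves(build_grid(n_coarse=8, threshold=0.5, max_level=4))
for cell in grid:
    print(f"level {cell.level}: [{cell.lo:.4f}, {cell.hi:.4f})")
```

Cells around the density peak at x = 0.3 end up four levels finer than the coarse grid, while cells far from it stay at level 0: resolution is spent only where the field varies sharply, which is the saving AMR is after.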
  • 25. NeXus
  • 26. What is NeXus?
    • In recent years, scientists and programmers at neutron and synchrotron facilities around the world concluded that a common data format would fulfill a valuable function in the scattering community.
    • As instrumentation becomes more complex and data visualization more challenging, scientists find it difficult to keep up with new developments.
    • A common data format makes it easier to exchange experimental results and ideas about how to analyze them. It promotes greater cooperation in software development and stimulates the design of more sophisticated visualization tools.
    • The NeXus data format was developed in response to these needs.
  • 28. Data challenges addressed by HDF5
    • Ability to organize complex collections of data
    • Efficient and scalable data storage and access
    • A growing need to integrate a wide variety of data types
    • The evolution of data technologies
    • Long-term preservation of data
  • 29. HDF is…
    • HDF stands for ‘Hierarchical Data Format’
    • A file format for storing any kind of data
    • A software system to manage data in the format
    • Designed for high-volume or complex data
    • Designed for every size and type of system
    • Open format, software library, and tools
    • There are two HDFs: HDF4 and HDF5
    • Here we focus on HDF5
  • 30. HDF5 Technology Platform
    • HDF5 data model: the “building blocks” for data organization
    • HDF5 software: library, language interfaces, tools
    • HDF5 file format: byte-level organization of data
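As a small illustration of the "byte-level organization" layer, the sketch below looks for the 8-byte signature that opens an HDF5 superblock. Per the HDF5 File Format Specification, the superblock sits at byte 0 or, when the file starts with a user block, at 512, 1024, 2048, and so on. The sketch uses only the Python standard library and checks nothing beyond the signature; the helper name `find_superblock` is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Signature bytes that begin every HDF5 superblock: \x89 'H' 'D' 'F' \r \n \x1a \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(path: str) -> Optional[int]:
    """Return the byte offset of the HDF5 signature, or None if absent.

    Only the offsets the format allows are probed: 0, then 512 and each
    successive power of two, since a user block (if any) occupies the
    bytes before the superblock.
    """
    size = Path(path).stat().st_size
    sig_len = len(HDF5_SIGNATURE)
    offset = 0
    with open(path, "rb") as f:
        while offset + sig_len <= size:
            f.seek(offset)
            if f.read(sig_len) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None
```

For real files, the third-party h5py package offers `h5py.is_hdf5(path)`, which asks the HDF5 library itself rather than probing bytes by hand.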
  • 31. Professionally managed
    • Source under version control, with public access
    • Automated daily testing
      • 200+ configurations
      • Performance, backward/forward compatibility
    • C, C++, Fortran, Java, Python APIs
    • Build supports Autotools (configure) and CMake
    • Sound development and coding practices
    • Maintenance releases every May and November
  • 32. Professionally supported
    • Helpdesk
    • Forum and mailing lists
    • Extensive web documentation: User’s Guide, Reference Manual, examples, tutorials, other docs
    • Community friendly
      • Integrate contributions from external developers
      • Solicit feedback on new features and pre-releases
      • Collaborate on projects, especially in testing
  • 33. HDF5 file
    An HDF5 file is a container that holds data objects, such as this table:
      lat | lon | temp
      ----|-----|-----
      12  | 23  | 3.1
      15  | 24  | 4.2
      17  | 21  | 3.6
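To make the container idea concrete, here is a minimal sketch that stores the slide 33 table as one HDF5 dataset and reads it back. It assumes the third-party h5py and NumPy packages; the file name `example.h5` and dataset name `readings` are hypothetical, and storing the rows as a compound (structured) type is only one option (three separate 1-D datasets would also work).

```python
import h5py
import numpy as np

# The lat/lon/temp table as a NumPy structured array; HDF5 stores each row
# as a compound value with named fields lat, lon, and temp.
readings = np.array(
    [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
    dtype=[("lat", "i4"), ("lon", "i4"), ("temp", "f8")],
)

# Write the table into a new file (hypothetical name), replacing any old copy.
with h5py.File("example.h5", "w") as f:
    f.create_dataset("readings", data=readings)

# Reopen the file and read the whole dataset back into memory.
with h5py.File("example.h5", "r") as f:
    table = f["readings"][()]

print(table["temp"])
```

Because the field names travel with the data, a reader can select a column such as `table["temp"]` without knowing the byte layout.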
  • 34. HDF5 file organization
    HDF5 groups and links organize data objects.
    [Figure: objects in one file, including the root group "/", groups SimOut and Viz, the lat/lon/temp table, Experiment Notes (Serial Number: 99378920; Date: 3/13/09; Config: Standard 3), Parameters (10;100;1000), and Timestep (36,000)]
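A minimal sketch of how groups, links, and attributes fit together, again assuming the third-party h5py and NumPy packages. The group names SimOut and Viz and the metadata values come from the slide 34 diagram, but which object each value is attached to is a guess, since the diagram's layout did not survive; the file name `organized.h5` and the attribute keys are hypothetical.

```python
import h5py
import numpy as np

with h5py.File("organized.h5", "w") as f:
    # Groups work like directories; the file itself is the root group "/".
    sim = f.create_group("SimOut")
    viz = f.create_group("Viz")

    # A dataset inside a group, addressed by a path such as "/SimOut/temp".
    temps = sim.create_dataset("temp", data=np.array([3.1, 4.2, 3.6]))

    # Attributes: small named metadata values attached to a group or dataset.
    temps.attrs["SerialNumber"] = 99378920
    temps.attrs["Date"] = "3/13/09"
    sim.attrs["Parameters"] = [10, 100, 1000]
    sim.attrs["Timestep"] = 36000

    # Assigning an existing object to a new name creates a hard link,
    # so the same dataset is also reachable as "/Viz/temp".
    viz["temp"] = temps

with h5py.File("organized.h5", "r") as f:
    print(list(f["Viz"].keys()))
```

A hard link is simply another name for the same object, so no data is copied; reading `/Viz/temp` returns the values written through `/SimOut/temp`.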
  • 35. HDF5 Philosophy: a single platform with multiple uses
    • One general data model
    • One general format
    • One library
    • Adaptable for almost any kind of data
    • Works on almost any architecture
    • Ability to interact well with other technologies
    • Attention to past, present, and future compatibility
  • 36. HDF5 Software Layers & Storage
    • Tools and high-level APIs: High Level APIs, h5dump tool, h5repack tool, HDFView tool, …
    • Language interfaces: C, Fortran, C++
    • HDF5 Library
      • HDF5 data model: groups, datasets, attributes, …
      • Internals: datatype conversion, data compression, chunked storage, version compatibility, and so on…
      • I/O drivers: POSIX I/O, split files, parallel I/O, custom
    • HDF5 file format: a single file, split files, a file on a parallel filesystem, other
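Two of the library internals listed on slide 36, chunked storage and data compression, are visible directly from the application side. The sketch below assumes the third-party h5py and NumPy packages; the file name, dataset name, chunk shape, and gzip level are arbitrary illustrative choices, not recommendations.

```python
import h5py
import numpy as np

# A 1000 x 1000 float32 array (about 4 MB uncompressed).
data = np.arange(1_000_000, dtype="f4").reshape(1000, 1000)

with h5py.File("chunked.h5", "w") as f:
    dset = f.create_dataset(
        "field",
        data=data,
        chunks=(100, 100),    # stored as independent 100 x 100 tiles
        compression="gzip",   # deflate filter, applied to each chunk
        compression_opts=4,   # gzip level (0-9)
        shuffle=True,         # byte-shuffle filter, often improves compression
    )
    chunk_shape = dset.chunks
    filter_name = dset.compression
    stored_bytes = dset.id.get_storage_size()  # bytes actually written to disk

print(chunk_shape, filter_name, stored_bytes, data.nbytes)

with h5py.File("chunked.h5", "r") as f:
    # A partial read only needs the chunks that overlap the selection.
    block = f["field"][200:300, 500:600]
```

Chunk shape is a trade-off: tiles that match the typical read pattern keep partial reads cheap, while very small chunks add per-chunk indexing and filter overhead.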
  • 37. HDF ecosystem
    • Storage
    • HDF Library
    • HDF-EOS Library: EOS domain data objects (Swath, Grid, Point, etc.)
    • Applications: EOS applications, MATLAB, IDL, HDF tools
  • 38. Other Software
    • The HDF Group
      • HDFView: an HDF4 & HDF5 browser
      • Command-line utilities
      • Regression and performance testing software
    • 3rd party
      • NetCDF-4, IDL, MATLAB, Mathematica, PyTables, Pandas
    • Communities
      • EOS, ASC, CGNS, Energistics, NeXus
    • Integration with other software
      • iRODS, OPeNDAP, MPI