HDF Software Process
Lessons Learned & Success Factors
Mike Folk, Elena Pourmal, Bob McGrath
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
NOBUGS 2004
HDF-EOS Workshop VIII

HDF
Outline
• What is HDF? and Who is HDF?
• HDF “Architecture”
• Some statistics
• How do we measure success?
• How can we achieve success?
• Group practices
• Summing up – strengths, weaknesses, needs

What is HDF?
Who is HDF?

HDF in a nutshell – what it is
• File format and I/O Libraries for storing,
managing and archiving large complex
scientific and other data
• Tools and utilities
• Open source, free for any use (U of I license)
• Well maintained and supported

• From HDF group, NCSA Univ of Illinois
• http://hdf.ncsa.uiuc.edu

HDF in a nutshell - features
• General
– simple and flexible data model
• Flexible
– store data of diverse origins, sizes, types
– supports complex data structures and types
• Portable
– available for many operating systems and machines
• Scalable
– works in high end computing environments
– accommodates data of any size or multiplicity
• Efficient
– fast access, including parallel I/O
– stores big data efficiently

HDF in a nutshell - users
• Apps in industry, academia, government
– More than 200 distinct applications

• Large user base
– E.g. NASA estimates 1.6 million users

• Underlying format for community standards
– E.g. HDF-EOS, SAF, CGNS, NPOESS, NeXus

Example of HDF file: mixing and grouping objects
[Figure: one HDF file whose groups hold a mix of objects: a text annotation, a 3-D array, a 2-D array, a table, a palette, and two raster images. Labels visible in the figure: foo, a, b, c, x, z, _foo_y, 1GB.]
Text: This file was created as a part of… see http://hdf.ncsa.uiuc.edu
Table:
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
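The grouping idea on this slide can be sketched in plain Python. This is not the HDF API: the `Group` and `Dataset` classes, the `lookup` and `walk` helpers, and the placement of objects under particular group names are all assumptions made for illustration. Only the table values are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Dataset:
    """A leaf object, e.g. an array, table, palette, or raster image."""
    kind: str
    data: object = None


@dataclass
class Group:
    """A container that can hold datasets and other groups."""
    children: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[Group, Dataset]


def lookup(root: Group, path: str) -> Node:
    """Resolve a slash-separated path such as '/a/table' from the root group."""
    node: Node = root
    for part in path.strip("/").split("/"):
        if not part:
            continue
        if not isinstance(node, Group):
            raise KeyError(path)
        node = node.children[part]
    return node


def walk(group: Group, prefix: str = "") -> List[str]:
    """List every object path below a group, in sorted order."""
    paths: List[str] = []
    for name, child in sorted(group.children.items()):
        full = f"{prefix}/{name}"
        paths.append(full)
        if isinstance(child, Group):
            paths.extend(walk(child, full))
    return paths


# Hypothetical layout: which object sits in which group is assumed.
root = Group({
    "foo": Dataset("3-D array"),
    "a": Group({
        "table": Dataset("table", [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)]),
        "palette": Dataset("palette"),
    }),
    "b": Group({"image": Dataset("raster image")}),
})
```

Calling `walk(root)` lists every object path, and `lookup(root, "/a/table").data` returns the rows of the table, which mirrors how objects of different kinds can sit side by side in one file.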

HDF “Architecture”

HDF “Architecture”
• Tools & Applications
– Utilities and applications for managing, manipulating, viewing, & analyzing data.
• HDF I/O library (HDF5 Application Programming Interface, low level interface)
– High-level, object-specific APIs.
– Low-level API for I/O to files, etc.
• File or other data source

User-controlled I/O and “storage”
• Data pipeline (between the HDF I/O Library and the HDF “File”)
– Data transformation
– Compression
– Encryption
– Storage layout
• Virtual file options
– Stdio (normal file)
– Split file
– MPI-IO & other parallel
– Network
– Memory
– Custom
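The data pipeline idea can be sketched as a chain of reversible stages that data passes through on its way to storage. This is a minimal sketch using only the Python standard library, not the HDF I/O library itself: the stage functions, `write_through`, `read_through`, and `to_chunks` are hypothetical names, delta encoding stands in for "data transformation", and fixed-size chunking stands in for "storage layout". Encryption is left out.

```python
import zlib
from typing import Callable, List, Tuple

# Each stage is a (forward, inverse) pair of bytes -> bytes functions.
Stage = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def delta_encode(raw: bytes) -> bytes:
    """Data transformation: store differences between neighbouring bytes."""
    prev = 0
    out = bytearray()
    for b in raw:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_decode(encoded: bytes) -> bytes:
    """Inverse of delta_encode."""
    prev = 0
    out = bytearray()
    for d in encoded:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


# Assumed stage order: transform first, then compress.
PIPELINE: List[Stage] = [
    (delta_encode, delta_decode),
    (zlib.compress, zlib.decompress),
]


def write_through(data: bytes, pipeline: List[Stage] = PIPELINE) -> bytes:
    """Apply every forward stage in order (the write path)."""
    for forward, _ in pipeline:
        data = forward(data)
    return data


def read_through(stored: bytes, pipeline: List[Stage] = PIPELINE) -> bytes:
    """Apply every inverse stage in reverse order (the read path)."""
    for _, inverse in reversed(pipeline):
        stored = inverse(stored)
    return stored


def to_chunks(stored: bytes, chunk_size: int) -> List[bytes]:
    """Storage layout: split the stored bytes into fixed-size chunks."""
    return [stored[i:i + chunk_size] for i in range(0, len(stored), chunk_size)]
```

Reading reverses the stages in the opposite order, so `read_through(write_through(data))` returns the original bytes regardless of which stages the user enables.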

Supported languages and compilers
• C
• Wrappers:
– C++
– Fortran90
– Java

• Vendors’ compilers (SUN, IBM, HP, etc.)
• PGI and Absoft (Fortran)
• GNU C (e.g. gcc 3.3.2)

Supported Machines and OS
• Solaris 2.7, 2.8 (32/64-bit)
• IRIX6.5, IRIX64-6.5
• HPUX 11.00
• AIX 5.1 (32/64-bit modes)
• OSF1
• FreeBSD
• Linux (SuSe, RH8, RH9), including 64-bit
• Altix (SGI Linux)
• IA-32 and IA-64
• Windows 2000, XP
• MAC OS X
• Crays (T3E, SV1, T90IEEE)
• DOE National Labs machines
• Linux Clusters

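Architecture in context
[Figure: layered diagram, top to bottom.]
• Tools & Applications
• Language interfaces: C, C++, F90, Java
• HDF5 Application Programming Interface
• Low level Interface
• Platforms: IA32 (Linux RH), SGI (IRIX32), Wintel (XP), Cray (SV1)
• I/O modes: Serial, Parallel
• File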

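Architecture in context
[Figure: the same layered diagram, with community formats added above the language interfaces.]
• Tools & Applications
• Community formats: HDF-EOS, SAF, CGNS
• Language interfaces: C, C++, F90, Java
• HDF5 Application Programming Interface
• Low level Interface
• Platforms: IA32 (Linux RH), SGI (IRIX32), Wintel (XP), Cray (SV1)
• I/O modes: Serial, Parallel
• File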

The testing challenge
Machines × operating systems
× compilers × languages
× serial and parallel
× compression options
× configuration options
× virtual file options
× backward compatibility

= a large number
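To see how quickly this product grows, here is a minimal sketch that counts and enumerates test configurations. The option values in `dimensions` are hypothetical placeholders, since the slide names the dimensions but not their values, and some dimensions (configuration options, backward compatibility) are left out for brevity.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List

# Hypothetical option lists for a few of the dimensions named on the slide.
dimensions: Dict[str, List[str]] = {
    "platform": ["linux-ia32", "solaris", "irix", "aix"],
    "compiler": ["vendor", "gcc"],
    "language": ["C", "C++", "Fortran90", "Java"],
    "mode": ["serial", "parallel"],
    "compression": ["none", "gzip"],
    "virtual_file": ["stdio", "split", "mpi-io", "memory"],
}


def count_configurations(dims: Dict[str, List[str]]) -> int:
    """Number of combinations is the product of the sizes of all dimensions."""
    return prod(len(values) for values in dims.values())


def enumerate_configurations(dims: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield every combination as a dict mapping dimension name to chosen value."""
    names = list(dims)
    for combo in product(*(dims[name] for name in names)):
        yield dict(zip(names, combo))
```

Even with these small made-up lists, `count_configurations(dimensions)` is 4 × 2 × 4 × 2 × 2 × 4 = 512, which is why exhaustive testing of every combination is impractical and a selected subset has to be tested instead.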

“Diversity makes our code better…”
Todd Smith, Geospiza

Some statistics

HDF Statistics
• HDF Group
– 15 FTE + 3-5 students
– $2.1 million annual budget

• HDF5 source code distribution
– 2073 files
– 917,186 lines of code

• HDF Project
– HDF5, HDF4, H4toH5, H5Lite, Java
– 3,000,000 lines of code (estimate)

HDF5 source distribution by categories (lines of code)
• Docs: 33%
• Libraries: 30%
• Configure: 15%
• Library tests: 13%
• Tools: 4%
• Tools tests: 4%
• Examples: 1%
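As a rough worked example, the chart percentages can be combined with the 917,186 lines reported for the HDF5 source distribution to estimate lines per category. The category labels are a reading of the chart (for example "Library tests" for the 13% slice), the percentages are rounded on the slide, and `estimate_lines` is a hypothetical helper, so the results are approximations only.

```python
from typing import Dict

TOTAL_LINES = 917_186  # HDF5 source distribution, from the statistics slide

# Percent of lines per category, as read from the chart.
shares: Dict[str, int] = {
    "Docs": 33,
    "Libraries": 30,
    "Configure": 15,
    "Library tests": 13,
    "Tools": 4,
    "Tools tests": 4,
    "Examples": 1,
}


def estimate_lines(total: int, percent_shares: Dict[str, int]) -> Dict[str, int]:
    """Approximate line count per category from rounded percentages."""
    return {name: round(total * pct / 100) for name, pct in percent_shares.items()}


estimates = estimate_lines(TOTAL_LINES, shares)

# Share of the distribution devoted to testing (library tests plus tools tests).
test_share = shares["Library tests"] + shares["Tools tests"]
```

Read this way, the shares add up to 100%, and test code accounts for about 17% of the distribution, more than half the size of the library code itself.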

HDF5 staff investment
• Code dev.: 33%
• User's support: 14%
• Docs, design, consult: 14%
• Peer-to-peer comm.: 12%
• Meetings, etc.: 9%
• Porting/release testing: 9%
• Test writing: 7%
• Comm. with users: 2%

How do we measure success?

How do we measure success?
• Mission
• Goals and objectives
• Strong and continuing relationships with users
• High quality software
• Strong committed development team
• Great working environment
• Adequate funding

Mission, goals and objectives
• Mission
– To develop, promote, deploy, and support open and
free technologies that facilitate scientific data
exchange, access, analysis, archiving and discovery

• Goals (examples)
– Innovate and evolve the technologies in concert with a
changing world of technologies
– Maintain a high level of quality and reliability
– Collaborate and build communities
– Build a team

Mission, goals and objectives
• Objectives - how we reach the goal
• Example:
– Goal
• Maintain a high level of quality and reliability

– Objectives
• Improve testing
• Implement a program to ensure excellent software
engineering practices
• Develop and execute a plan to meet
quality/reliability standards

Users
• Number of users
• Happy users
• Unhappy users
• Users achieve their goals by using HDF technologies
• Users coming back with new needs
• Financial support from users

Software
• Technology that addresses users’ needs and demands (current and future)
– E.g. big files, parallel access, multiple objects
• Usability
– Number and types of applications
– Appropriate APIs and data models
– Available tools
– Interoperability with other software
• E.g. IDL, MATLAB, Mathematica

Software
• Stability
– Can data be shared?
– Can software run on needed platforms?

• Sustainability
– Can we read data written 15 years ago on an obsolete platform?
– Will software be available in 15 years?

• Acceptability
– De facto standard
• Open standard for exchange of remote-sensed data
• Over 3,000,000,000,000,000 bytes (about 3 petabytes) stored in HDF and HDF-EOS

How can we achieve success?

How can we achieve success?
• Maintain strong, responsible, and continuing
relationships with users
• An approach to needs identification, software
design, and software implementation based on
sound principles of software engineering
• Effective technical processes for developing,
testing, integrating and maintaining software
• Business and social processes based on sound
group management principles

Stages of software development at HDF
• Getting started
• Creating an implementation approach
• Implementation and maintenance
• Relations with users and sponsors
• Group practices

Getting started
• Discover a need
• Identify a sponsor
• Clarify the need, its role, and its importance
• Enter task into the project plan
– Make initial estimate of time and resources for the task
– Give it a priority
– Identify task’s lead
– Identify a person who will work on the task

Creating implementation approach
• Write up a needs/approach RFC (Request For
Comment)
– Actively solicit feedback from developers/sponsors
– Revise until satisfied

• Write up a design/approach RFC
– Get feedback from developers/sponsors
– Revise until satisfied

• Revise project plan according to RFC results
• Archive RFC

Implementation and maintenance
• Identify validation plan (needs improvement)
• Implement
– Library or tool
– Tests
– Documentation

• Ask sponsor and friendly users for feedback
• Review results and repeat appropriate steps above as
needed
• Clean up (documentation, Web, etc.) and announce
• Support (debug, fix, add more tests, advertise)

Relations with users and sponsors
• Who are our sponsors?
– Organizations and communities with
institutional and financial commitment to HDF
• NCSA, NASA, DOE ASCI, Boeing, …

– Agencies supporting R&D
• NCSA, NASA, DOE, NSF, …

– Collaborators who make in-kind contributions
• Cactus, PyTables, NeXus, CGNS …

– HDF group members

Relations with users and sponsors
• Each task is associated with a sponsor
• Each task has a priority, which should be
confirmed with sponsor
• Each task falls into one of these categories
– Research
– R&D (research, possibly integrate into product)
– Development
• Technology infusion
• Library or tools enhancement

Group practices

Group practices - technical
• Source code management: CVS
• Bug tracking: Bugzilla
– Bugs entered by support staff and developers
– Prioritized by staff
– Easy bugs fixed “on the fly”

Group practices - technical
• The testing challenge
• Code testing
– Testing before code check-in
– Regression testing
– Remote testing
– Different configurations testing
– Backward compatibility testing

Daily test report
From: HDF group system admin <hdfadmin@ncsa.uiuc.edu>
To: hdf5lib@ncsa.uiuc.edu
Subject: HDF5_Daily_Tests_FAILED!!!
*** HDF5 Tests on 041022 ***
=============================
Watchers List
=============================
HDF5 Daily test features/platforms watchers and procedure
--------------------------------------------------------
Procedure:
The watcher will investigate and report
the cause of failure by 11am.
The developer who checked in the error code
may report so by then too.
The watcher or the developer should get the
failure fixed and report it by 3pm.
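The deadlines in this procedure can be sketched as a small decision function. This is only an illustration of the rule stated in the report (cause reported by 11am, fix reported by 3pm); the function name `next_action`, its parameters, and the returned strings are hypothetical and are not part of the HDF group's tooling.

```python
from datetime import time

REPORT_DEADLINE = time(11, 0)  # cause of failure should be reported by 11am
FIX_DEADLINE = time(15, 0)     # failure should be fixed and reported by 3pm


def next_action(now: time, cause_reported: bool, fixed: bool) -> str:
    """Return what the watcher or developer still owes for a failed daily test."""
    if fixed:
        return "done"
    if not cause_reported:
        if now <= REPORT_DEADLINE:
            return "report cause"
        return "report cause (overdue)"
    if now <= FIX_DEADLINE:
        return "fix failure"
    return "fix failure (overdue)"
```

For example, at 10:30 with nothing reported the function returns "report cause", and at 15:30 with the cause reported but no fix it returns "fix failure (overdue)".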

Group practices - technical
• Release levels
– Development release
– Official release
– Past releases

Group practices - technical
• Coding standards
• Maintaining platform-independence
• Maintaining time-independence
• Rules for changing APIs
• Documentation
• Rapid prototyping

Group practices – business and social
• Staff breakdown
– User support
– Documentation
– QA
– Software development
– Testing
– Team leadership
– System administration
[Figure: the HDF Project divided into four teams: Support, doc, QA, maintenance; Basic library development; Tools and Java; Parallel I/O, Grid, big machines.]
• Team lead for each team
• Most staff in two or more teams
• Staff relationships
– Complement each other
– Overlap each other
– Keep each other honest

Group practices – business and social
• Accountability of everyone to the whole process
• Help desk
• Approaches to carrying out tasks
– Paying attention to technical proposals
– Weekly HDF5 developers’ meetings
– HDF seminars

• Management and administration
– Performance reviews with emphasis on goals, development
– Critical to success
– That’s another talk

Summing up
Strengths, weaknesses, needs

Strengths
• User support
• Staff
– High quality, diverse staff with good morale
– Staff commitment and enthusiasm

• Ability to address all aspects of product development
– Emphasis on quality control
– Fast bug fixing and frequent releases
– Ability to focus on a single product over a long term

• High level of support from sponsors
• Project’s visibility through NCSA, NASA, DOE, users

Weaknesses
• Software development team
– Library expertise still concentrated among too few developers
– Team communication is challenging
• Processes
– Release/maintenance take too much time and resources
– Configuration and porting are a huge time sink
– We don’t do enough prototyping
– Hard to keep up with new technologies
– Parallel I/O hard to support

More weaknesses & challenges
• Usability
– Software too hard to use for casual users
– Insufficient documentation
– Insufficient tools for high level users
– Insufficient interoperability with common tools and formats

• Marketing
– Marketing effort is inadequate
– Need to connect better with users and potential users

• Viable long-term support

Most immediate needs
• Configuration and build
• Testing and prototyping
• Marketing
• Reporting
– Performance reports
– General reports to users
– HDF book

• Sustainable business model

Thank you

TrustArc Webinar - 2024 Global Privacy Survey
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 

HDF Software Process - Lessons Learned & Success Factors

  • 1. HDF Software Process - Lessons Learned & Success Factors. Mike Folk, Elena Pourmal, Bob McGrath. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. NOBUGS 2004. HDF-EOS Workshop VIII -1- HDF
  • 2. Outline • What is HDF? and Who is HDF? • HDF “Architecture” • Some statistics • How do we measure success? • How can we achieve success? • Group practices • Summing up – strengths, weaknesses, needs -2- HDF
  • 3. What is HDF? Who is HDF? -3- HDF
  • 4. HDF in a nutshell – what it is • File format and I/O Libraries for storing, managing and archiving large complex scientific and other data • Tools and utilities • Open source, free for any use (U of I license) • Well maintained and supported • From HDF group, NCSA Univ of Illinois • http://hdf.ncsa.uiuc.edu -4- HDF
  • 5. HDF in a nutshell - features • General – simple and flexible data model • Flexible – store data of diverse origins, sizes, types – supports complex data structures and types • Portable – available for many operating systems and machines • Scalable – works in high end computing environments – accommodates data of any size or multiplicity • Efficient – fast access, including parallel i/o – stores big data efficiently -5- HDF
  • 6. HDF in a nutshell - users • Apps in industry, academia, government – More than 200 distinct applications • Large user base – E.g. NASA estimates 1.6 million users • Underlying format for community standards – E.g. HDF-EOS, SAF, CGNS, NPOESS, NeXus -6- HDF
  • 7. Example of HDF file: mixing and grouping objects [Figure: a tree of named objects (foo, a, b, c, x, z, _foo_y) holding a text annotation (“This file was created as a part of… see http://hdf.ncsa.uiuc.edu”), a 3-D array, a 2-D array, a palette, two raster images, and a table with columns lat, lon, temp and rows (12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6); one object is labeled 1GB] -7- HDF
  • 9. HDF “Architecture” Tools & Applications HDF5 Applications Programming Interface Low level Interface Utilities and applications for managing, manipulating, viewing, & analyzing data. HDF I/O library – High-level, object-specific APIs. – Low-level API for I/O to files, etc. File or other data source File -9- HDF
  • 10. User-controlled I/O and “storage” • Data pipeline (between the HDF I/O Library and the HDF “File”) – Data transformation – Compression – Encryption – Storage layout • Virtual file options – Stdio (normal file) – Split file – MPI-IO & other parallel – Network – Memory – Custom - 10 - HDF
  • 11. Supported languages and compilers • C • Wrappers: – C++ – Fortran90 – Java • Vendors’ compilers (SUN, IBM, HP, etc.) • PGI and Absoft (Fortran) • GNU C (e.g. gcc 3.3.2) - 11 - HDF
  • 12. Supported Machines and OS • Solaris 2.7, 2.8 (32/64-bit) • IRIX6.5, IRIX64-6.5 • HPUX 11.00 • AIX 5.1 (32/64-bit modes) • OSF1 • FreeBSD • Linux (SuSe, RH8, RH9) including 64-bit • Altix (SGI Linux) • IA-32 and IA-64 • Windows 2000, XP • MAC OS X • Crays (T3E, SV1, T90IEEE) • DOE National Labs machines • Linux Clusters - 12 - HDF
  • 13. Architecture in context [Figure: layered stack, top to bottom: Tools & Applications; C, C++, F90, Java; HDF5 Applications Programming Interface; Low level Interface; platform labels IA32, SGI, Wintel, Cray and Linux RH, IRIX32, XP, SV1; File; Serial and Parallel] - 13 - HDF
  • 14. Architecture in context [Figure: the same layered stack with HDF-EOS, SAF, CGNS added above the language layer: Tools & Applications; HDF-EOS, SAF, CGNS; C, C++, F90, Java; HDF5 Applications Programming Interface; Low level Interface; platform labels IA32, SGI, Wintel, Cray and Linux RH, IRIX32, XP, SV1; File; Serial and Parallel] - 15 - HDF
  • 15. The testing challenge Machines × operating systems × compilers × languages × serial and parallel × compression options × configuration options × virtual file options × backward compatibility = a large number - 16 - HDF
  • 16. “Diversity makes our code better…” Todd Smith, Geospiza - 17 - HDF
  • 18. HDF Statistics • HDF Group – 15 FTE + 3-5 students – $2.1million annual budget • HDF5 source code distribution – 2073 files – 917,186 Lines of code • HDF Project – HDF5, HDF4, H4toH5, H5Lite, Java – 3,000,000 lines of code (estimate) - 19 - HDF
  • 19. HDF5 source distribution by categories (lines of code): Libraries 30%, Library tests 13%, Tools 4%, Tools tests 4%, Configure 15%, Docs 33%, Examples 1% - 20 - HDF
  • 20. HDF5 staff investment: Code dev. 33%, User's support 14%, Docs, design, consult 14%, Peer-to-peer comm. 12%, Meetings, etc. 9%, Porting/release testing 9%, Test writing 7%, Comm. with users 2% - 21 - HDF
  • 21. How do we measure success? - 22 - HDF
  • 22. How do we measure success? • Mission • Goals and objectives • Strong and continuing relationships with users • High quality software • Strong committed development team • Great working environment • Adequate funding - 23 - HDF
  • 23. Mission, goals and objectives • Mission – To develop, promote, deploy, and support open and free technologies that facilitate scientific data exchange, access, analysis, archiving and discovery • Goals (examples) – Innovate and evolve the technologies in concert with a changing world of technologies – Maintain a high level of quality and reliability – Collaborate and build communities – Build a team - 24 - HDF
  • 24. Mission, goals and objectives • Objectives - how we reach the goal • Example: – Goal • Maintain a high level of quality and reliability – Objectives • Improve testing • Implement a program to ensure excellent software engineering practices • Develop and execute a plan to meet quality/reliability standards - 25 - HDF
  • 25. Users • Number of users • Happy users ☺ • Unhappy users ☹ • Users achieve their goals by using HDF technologies • Users coming back with new needs • Financial support from users - 26 - HDF
  • 26. Software • Technology that addresses users’ needs and demands (current and future) – E.g. big files, parallel access, multiple objects • Usability – Number and types of applications – Appropriate APIs and data models – Available tools – Interoperability with other software • E.g. IDL, MatLab, Mathematica - 27 - HDF
  • 27. Software • Stability – Can data be shared? – Can software run on needed platforms? • Sustainability – Can read data written 15 years ago on obsolete platform – Is software available in 15 years? • Acceptability – De facto standard • Open standard for exchange of remote-sensed data • Over 3,000,000,000,000,000 bytes stored in HDF and HDF-EOS - 28 - HDF
  • 28. How can we achieve success? - 29 - HDF
  • 29. How can we achieve success? • Maintain strong, responsible, and continuing relationships with users • An approach to needs identification, software design, and software implementation based on sound principles of software engineering • Effective technical processes for developing, testing, integrating and maintaining software • Business and social processes based on sound group management principles - 30 - HDF
  • 30. Stages of software development at HDF • Getting started • Creating an implementation approach • Implementation and maintenance • Relations with users and sponsors • Group practices - 31 - HDF
  • 31. Getting started • Discover a need • Identify a sponsor • Clarify the need, its role, and its importance • Enter task into the project plan – Make initial estimate of time and resources for the task – Give it a priority – Identify task’s lead – Identify a person who will work on the task - 32 - HDF
  • 32. Creating implementation approach • Write up a needs/approach RFC (Request For Comment) – Actively solicit feedback from developers/sponsors – Revise until satisfied • Write up a design/approach RFC – Get feedback from developers/sponsors – Revise until satisfied • Revise project plan according to RFC results • Archive RFC - 33 - HDF
  • 33. Implementation and maintenance • Identify validation plan (needs improvement) • Implement – Library or tool – Tests – Documentation • Ask sponsor and friendly users for feedback • Review results and repeat appropriate steps above as needed • Clean up (documentation, Web, etc.) and announce • Support (debug, fix, add more tests, advertise) - 34 - HDF
  • 34. Relations with users and sponsors • Who are our sponsors? – Organizations and communities with institutional and financial commitment to HDF • NCSA, NASA, DOE ASCI, Boeing, … – Agencies supporting R&D • NCSA, NASA, DOE, NSF, … – Collaborators who make in-kind contributions • Cactus, PyTables, NeXUS, CGNS … – HDF group members - 35 - HDF
  • 35. Relations with users and sponsors • Each task is associated with a sponsor • Each task has a priority, which should be confirmed with sponsor • Each task falls into one of these categories – Research – R&D (research, possibly integrate into product) – Development • Technology infusion • Library or tools enhancement - 36 - HDF
  • 37. Group practices - technical • Source code management: CVS • Bug tracking: Bugzilla – Bugs entered by support staff and developers – Prioritized by staff – Easy bugs fixed “on the fly” - 38 - HDF
  • 38. Group practices - technical • The testing challenge • Code testing – Testing before code check-in – Regression testing – Remote testing – Different configurations testing – Backward compatibility testing - 39 - HDF
  • 39. Daily test report From: HDF group system admin <hdfadmin@ncsa.uiuc.edu> To: hdf5lib@ncsa.uiuc.edu Subject: HDF5_Daily_Tests_FAILED!!! *** HDF5 Tests on 041022 *** ============================= Watchers List ============================= HDF5 Daily test features/platforms watchers and procedure -------------------------------------------------------- Procedure: The watcher will investigate and report the cause of failure by 11am. The developer who checked in the error code may report so by then too. The watcher or the developer should get the failure fixed and report it by 3pm. - 41 - HDF
  • 40. Group practices - technical • Release levels – Development release – Official release – Past releases - 42 - HDF
  • 41. Group practices - technical • Coding standards • Maintaining platform-independence • Maintaining time-independence • Rules for changing APIs • Documentation • Rapid prototyping - 43 - HDF
  • 42. Group practices – business and social • Staff breakdown – User support – Documentation – QA – Software development – Testing – Team leadership – System administration [Diagram: HDF Project, divided into four teams: Support, doc, QA, maintenance; Basic library development; Tools and Java; Parallel I/O, Grid, big machines] • Team lead for each team • Most staff in two or more teams • Staff relationships – Complement each other – Overlap each other – Keep each other honest - 44 - HDF
  • 43. Group practices – business and social • Accountability of everyone to the whole process • Help desk • Approaches to carrying out tasks – Paying attention to technical proposals – Weekly HDF5 developers’ meetings – HDF seminars • Management and administration – Performance reviews with emphasis on goals, development – Critical to success – That’s another talk - 45 - HDF
  • 45. Strengths • User support • Staff – High quality, diverse staff with good morale – Staff commitment and enthusiasm • Ability to address all aspects of product development – Emphasis on quality control – Fast bug fixing and frequent releases – Ability to focus on a single product over a long term • High level of support from sponsors • Project’s visibility through NCSA, NASA, DOE, users - 47 - HDF
  • 46. Weaknesses • Software development team – Library expertise still concentrated among too few developers – Team communication is challenging • Processes – Release/maintenance take too much time and resources – Configuration and porting are a huge time sink – We don’t do enough prototyping – Hard to keep up with new technologies – Parallel I/O hard to support - 48 - HDF
  • 47. More weaknesses & challenges • Usability – Software too hard to use for casual users – Insufficient documentation – Insufficient tools for high level users – Insufficient interoperability with common tools and formats • Marketing – Marketing effort is inadequate – Need to connect better with users and potential users • Viable long-term support - 49 - HDF
  • 48. Most immediate needs • Configuration and build • Testing and prototyping • Marketing • Reporting – Performance reports – General reports to users – HDF book • Sustainable business model - 50 - HDF
  • 49. Thank you - 51 - HDF
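
The multiplication on the "testing challenge" slide can be made concrete with a small sketch. The dimension names follow the slides, but the particular values listed here are assumptions chosen for illustration (only the languages and the virtual file options are taken from the deck), and the dimensions are a subset of those on the slide, so the result is not a count of what the HDF group actually tests.

```python
from itertools import product
from math import prod

# Illustrative test dimensions. "language" and "virtual_file" come from the
# slides; the other value lists are assumptions made for this sketch.
dimensions = {
    "machine": ["IA32", "SGI", "Wintel", "Cray"],
    "language": ["C", "C++", "Fortran90", "Java"],
    "mode": ["serial", "parallel"],
    "compression": ["off", "on"],  # hypothetical two-way option
    "virtual_file": ["stdio", "split", "mpi-io", "network", "memory", "custom"],
}


def count_configurations(dims):
    """Number of combinations: the product of the sizes of all dimensions."""
    return prod(len(values) for values in dims.values())


def iter_configurations(dims):
    """Yield every combination as a dict mapping dimension name to value."""
    names = list(dims)
    for combo in product(*dims.values()):
        yield dict(zip(names, combo))


total = count_configurations(dimensions)
print(total)  # 4 * 4 * 2 * 2 * 6 = 384
```

Even with five small dimensions the count reaches 384, and each added dimension multiplies it again. Not every combination is meaningful (a parallel driver in a serial build, for example), so this is an upper bound on the matrix rather than a test plan.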

Editor's Notes

  1. Format and software for scientific data: HDF5 is a different format from earlier versions of HDF, as is the library. Stores images, multidimensional arrays, tables, etc.: you can construct all of these different kinds of structures and store them in HDF5, and you can mix and match them in HDF5 files according to your needs. Emphasis on storage and I/O efficiency: both the library and the format are designed to address this. Free and commercial software support: as far as HDF5 goes, this is just a goal now. There is commercial support for HDF4, but little if any for HDF5 at this time. We are working with vendors to change this. Emphasis on standards: you can store data in HDF5 in a variety of ways, so we try to work with users to encourage them to organize HDF5 files in standard ways. Users come from many engineering and scientific fields.
  2. Like HDF4, HDF5 has a grouping structure. The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.
  3. It is useful to think about HDF software in terms of layers. At the bottom layer is the HDF5 file or other data source. Above that are two layers corresponding to the HDF library. First there is a low-level interface that concentrates on basic I/O: opening and closing files, reading and writing bytes, seeking, etc. HDF5 provides a public API at this level so that people can write their own drivers for reading and writing to places other than those already provided with the library. Those that are already provided include UNIX stdio and MPI-IO. Then comes the high-level, object-specific interface. This is the API that most people who develop HDF5 applications use. This is where you create a dataset or group, read and write datasets and subsets, etc. At the top are applications, or perhaps APIs used by applications. Examples of the latter are the HDF-EOS API that supports NASA’s EOSDIS datatypes, and the DSL API that supports the ASCI data models.
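
The layering described in the last note can be sketched as a toy model. This is not the HDF5 API: the names `Driver`, `MemoryDriver`, `StdioDriver`, and `ToyFile` are hypothetical, and the "format" is just packed 64-bit integers. It only illustrates the idea that a high-level, object-oriented layer can sit on a small byte-level driver interface, so that storage back ends can be swapped without touching application code.

```python
import io
import os
import struct
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Low-level layer (hypothetical): moves raw bytes to and from storage."""

    @abstractmethod
    def write_at(self, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def read_at(self, offset: int, size: int) -> bytes: ...


class MemoryDriver(Driver):
    """Keeps all bytes in an in-memory buffer."""

    def __init__(self) -> None:
        self._buf = io.BytesIO()

    def write_at(self, offset: int, data: bytes) -> None:
        self._buf.seek(offset)
        self._buf.write(data)

    def read_at(self, offset: int, size: int) -> bytes:
        self._buf.seek(offset)
        return self._buf.read(size)


class StdioDriver(Driver):
    """Stores bytes in an ordinary file on disk."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "w+b")

    def write_at(self, offset: int, data: bytes) -> None:
        self._f.seek(offset)
        self._f.write(data)

    def read_at(self, offset: int, size: int) -> bytes:
        self._f.seek(offset)
        return self._f.read(size)

    def close(self) -> None:
        self._f.close()


class ToyFile:
    """High-level layer (hypothetical): named datasets of 64-bit integers."""

    ITEM_SIZE = 8  # bytes per little-endian signed 64-bit integer

    def __init__(self, driver: Driver) -> None:
        self._driver = driver
        self._index = {}  # dataset name -> (byte offset, element count)
        self._end = 0     # next free byte offset

    def create_dataset(self, name: str, values: list) -> None:
        data = struct.pack(f"<{len(values)}q", *values)
        self._driver.write_at(self._end, data)
        self._index[name] = (self._end, len(values))
        self._end += len(data)

    def read_dataset(self, name: str, start: int = 0, count=None) -> list:
        """Read a whole dataset, or the subset [start, start + count)."""
        offset, n = self._index[name]
        if count is None:
            count = n - start
        if start < 0 or count < 0 or start + count > n:
            raise ValueError("subset out of range")
        raw = self._driver.read_at(offset + start * self.ITEM_SIZE,
                                   count * self.ITEM_SIZE)
        return list(struct.unpack(f"<{count}q", raw))


# The same high-level calls work unchanged on either driver.
with tempfile.TemporaryDirectory() as tmp:
    disk = StdioDriver(os.path.join(tmp, "toy.bin"))
    for driver in (MemoryDriver(), disk):
        f = ToyFile(driver)
        f.create_dataset("temp", [31, 42, 36])
        print(type(driver).__name__, f.read_dataset("temp", start=1, count=2))
    disk.close()
```

The design point is the narrow interface: `ToyFile` never learns where its bytes live, which mirrors the reason the note gives for exposing a public low-level API so that users can write their own drivers.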