The HDF group has experienced remarkable successes, producing high-quality open source software that is widely used throughout the world. This talk presents a collection of thoughts on the HDF Project's approach to software engineering. In writing these notes we have come to realize that any success the group has had is due to several factors, including:
o Strong, responsible, and continuing relationships with users
o An approach to needs identification, software design, and software implementation based on sound principles of software engineering
o Effective technical processes for developing, testing, integrating and maintaining software
o Business and social processes based on sound group management principles
These factors are little more than platitudes, however. The manner in which they are successfully applied can only be understood by examining the details. In this talk, we describe some of the details, emphasizing mostly those areas in which we have had success.
HDF Software Process - Lessons Learned & Success Factors
1. HDF Software Process
Lessons Learned & Success Factors
Mike Folk, Elena Pourmal, Bob McGrath
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
NOBUGS 2004
HDF-EOS Workshop VIII
2. Outline
• What is HDF? and Who is HDF?
• HDF “Architecture”
• Some statistics
• How do we measure success?
• How can we achieve success?
• Group practices
• Summing up – strengths, weaknesses, needs
4. HDF in a nutshell – what it is
• File format and I/O libraries for storing, managing and archiving large, complex scientific and other data
• Tools and utilities
• Open source, free for any use (U of I license)
• Well maintained and supported
• From the HDF group, NCSA, University of Illinois
• http://hdf.ncsa.uiuc.edu
-4-
HDF
5. HDF in a nutshell - features
• General
  – simple and flexible data model
• Flexible
  – stores data of diverse origins, sizes, and types
  – supports complex data structures and types
• Portable
  – available for many operating systems and machines
• Scalable
  – works in high-end computing environments
  – accommodates data of any size or multiplicity
• Efficient
  – fast access, including parallel I/O
  – stores big data efficiently
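To make the efficiency bullets concrete, here is a minimal sketch of creating a chunked, compressed dataset, using the HDF5 1.6-era C API that was current at the time of this talk; the file name "example.h5" and dataset name "/temperature" are illustrative, not from the slides.

#include "hdf5.h"

int main(void)
{
    hsize_t dims[2]  = {1024, 1024};   /* a 2-D array of floats */
    hsize_t chunk[2] = {64, 64};       /* stored and compressed chunk by chunk */

    hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);

    /* Chunking enables partial I/O and is required for compression. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);
    H5Pset_deflate(dcpl, 6);           /* gzip compression, level 6 */

    hid_t dset = H5Dcreate(file, "/temperature", H5T_NATIVE_FLOAT, space, dcpl);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

Chunked storage is also what lets the library read and write subsets of a large array efficiently, which matters for the scalability claims above.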
6. HDF in a nutshell - users
• Apps in industry, academia, government
– More than 200 distinct applications
• Large user base
– E.g. NASA estimates 1.6 million users
• Underlying format for community standards
– E.g. HDF-EOS, SAF, CGNS, NPOESS, NeXus
7. Example of HDF file: mixing and grouping objects
[Figure: a single HDF file whose groups (foo, a, b, c, x, y, z) mix diverse objects: a 1 GB 3-D array, a 2-D array, two raster images with a palette, a small table (lat | lon | temp), and a text annotation: “This file was create as a part of… see http://hdf.ncsa.uiuc.edu”.]
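As a hypothetical illustration of mixing object types in one file, the sketch below stores the lat/lon/temp table from the figure as a 1-D dataset of a compound datatype inside a group; the names are taken loosely from the figure, and the API is again the 1.6-era C interface.

#include "hdf5.h"

typedef struct {
    int   lat;
    int   lon;
    float temp;
} record_t;

int main(void)
{
    record_t table[3] = { {12, 23, 3.1f}, {15, 24, 4.2f}, {17, 21, 3.6f} };
    hsize_t  dim = 3;

    hid_t file = H5Fcreate("mixed.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t grp  = H5Gcreate(file, "/foo", 0);  /* a group, as in the figure */

    /* Describe the C struct layout to HDF5 as a compound datatype. */
    hid_t rec = H5Tcreate(H5T_COMPOUND, sizeof(record_t));
    H5Tinsert(rec, "lat",  HOFFSET(record_t, lat),  H5T_NATIVE_INT);
    H5Tinsert(rec, "lon",  HOFFSET(record_t, lon),  H5T_NATIVE_INT);
    H5Tinsert(rec, "temp", HOFFSET(record_t, temp), H5T_NATIVE_FLOAT);

    hid_t space = H5Screate_simple(1, &dim, NULL);
    hid_t dset  = H5Dcreate(grp, "table", rec, space, H5P_DEFAULT);
    H5Dwrite(dset, rec, H5S_ALL, H5S_ALL, H5P_DEFAULT, table);

    H5Dclose(dset);
    H5Sclose(space);
    H5Tclose(rec);
    H5Gclose(grp);
    H5Fclose(file);
    return 0;
}

Raster images, palettes, and arrays could be added to the same file alongside the table, which is the point of the slide.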
11. Supported languages and compilers
• C
• Wrappers:
– C++
– Fortran90
– Java
• Vendors’ compilers (SUN, IBM, HP, etc.)
• PGI and Absoft (Fortran)
• GNU C (e.g. gcc 3.3.2)
12. Supported Machines and OS
• Solaris 2.7, 2.8 (32/64-bit)
• IRIX 6.5, IRIX64 6.5
• HP-UX 11.00
• AIX 5.1 (32/64-bit modes)
• OSF1
• FreeBSD
• Linux (SuSE, RH8, RH9), including 64-bit
• Altix (SGI Linux)
• IA-32 and IA-64
• Windows 2000, XP
• Mac OS X
• Crays (T3E, SV1, T90 IEEE)
• DOE National Labs machines
• Linux Clusters
13. Architecture in context
[Diagram: a layered stack. At the top, tools and applications (HDF5 applications in C, C++, F90, Java); below them, the HDF5 programming interface; beneath that, the low-level interface with serial and parallel I/O paths; at the bottom, the file itself on platforms such as IA32/Linux RH, SGI/IRIX32, Wintel/XP, and Cray/SV1.]
14. Architecture in context
[Diagram: the same layered stack as the previous slide, with community APIs (HDF-EOS, SAF, CGNS) added between the tools and applications layer and the HDF5 programming interface.]
15. The testing challenge
Machines × operating systems
× compilers × languages
× serial and parallel
× compression options
× configuration options
× virtual file options
× backward compatibility
= a large number
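To see how quickly that product grows, take purely illustrative counts (not figures from the talk): 10 machine/OS combinations × 5 compilers × 4 languages × 2 (serial/parallel) × 3 compression options × 4 configuration options × 3 virtual file options × 2 backward-compatibility modes = 28,800 distinct configurations to test.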
22. How do we measure success?
• Mission
• Goals and objectives
• Strong and continuing relationships with users
• High quality software
• Strong committed development team
• Great working environment
• Adequate funding
23. Mission, goals and objectives
• Mission
  – To develop, promote, deploy, and support open and free technologies that facilitate scientific data exchange, access, analysis, archiving and discovery
• Goals (examples)
  – Innovate and evolve the technologies in concert with a changing world of technologies
  – Maintain a high level of quality and reliability
  – Collaborate and build communities
  – Build a team
24. Mission, goals and objectives
• Objectives - how we reach the goal
• Example:
  – Goal
    • Maintain a high level of quality and reliability
  – Objectives
    • Improve testing
    • Implement a program to ensure excellent software engineering practices
    • Develop and execute a plan to meet quality/reliability standards
25. Users
• Number of users
• Happy users
• Unhappy users
• Users achieve their goals by using HDF technologies
• Users coming back with new needs
• Financial support from users
26. Software
• Technology that addresses users’ needs and demands (current and future)
  – E.g. big files, parallel access, multiple objects
• Usability
  – Number and types of applications
  – Appropriate APIs and data models
  – Available tools
  – Interoperability with other software
    • E.g. IDL, MatLab, Mathematica
27. Software
• Stability
  – Can data be shared?
  – Can software run on needed platforms?
• Sustainability
  – Can data written 15 years ago on an obsolete platform still be read?
  – Will the software be available in 15 years?
• Acceptability
  – De facto standard
    • Open standard for exchange of remote-sensed data
    • Over 3,000,000,000,000,000 bytes (3 petabytes) stored in HDF and HDF-EOS
29. How can we achieve success?
• Maintain strong, responsible, and continuing relationships with users
• An approach to needs identification, software design, and software implementation based on sound principles of software engineering
• Effective technical processes for developing, testing, integrating and maintaining software
• Business and social processes based on sound group management principles
30. Stages of software development at HDF
• Getting started
• Creating an implementation approach
• Implementation and maintenance
• Relations with users and sponsors
• Group practices
31. Getting started
• Discover a need
• Identify a sponsor
• Clarify the need, its role, and its importance
• Enter task into the project plan
  – Make initial estimate of time and resources for the task
  – Give it a priority
  – Identify task’s lead
  – Identify a person who will work on the task
32. Creating implementation approach
• Write up a needs/approach RFC (Request For Comment)
  – Actively solicit feedback from developers/sponsors
  – Revise until satisfied
• Write up a design/approach RFC
  – Get feedback from developers/sponsors
  – Revise until satisfied
• Revise project plan according to RFC results
• Archive RFC
33. Implementation and maintenance
• Identify validation plan (needs improvement)
• Implement
  – Library or tool
  – Tests
  – Documentation
• Ask sponsor and friendly users for feedback
• Review results and repeat appropriate steps above as needed
• Clean up (documentation, Web, etc.) and announce
• Support (debug, fix, add more tests, advertise)
34. Relations with users and sponsors
• Who are our sponsors?
  – Organizations and communities with institutional and financial commitment to HDF
    • NCSA, NASA, DOE ASCI, Boeing, …
  – Agencies supporting R&D
    • NCSA, NASA, DOE, NSF, …
  – Collaborators who make in-kind contributions
    • Cactus, PyTables, NeXus, CGNS …
  – HDF group members
35. Relations with users and sponsors
• Each task is associated with a sponsor
• Each task has a priority, which should be confirmed with the sponsor
• Each task falls into one of these categories
  – Research
  – R&D (research, possibly integrate into product)
  – Development
    • Technology infusion
    • Library or tools enhancement
37. Group practices - technical
• Source code management: CVS
• Bug tracking: Bugzilla
– Bugs entered by support staff and developers
– Prioritized by staff
– Easy bugs fixed “on the fly”
38. Group practices - technical
• The testing challenge
• Code testing
  – Testing before code check-in
  – Regression testing
  – Remote testing
  – Testing of different configurations
  – Backward compatibility testing
39. Daily test report
From: HDF group system admin <hdfadmin@ncsa.uiuc.edu>
To: hdf5lib@ncsa.uiuc.edu
Subject: HDF5_Daily_Tests_FAILED!!!

*** HDF5 Tests on 041022 ***
=============================
Watchers List
=============================
HDF5 Daily test features/platforms watchers and procedure
---------------------------------------------------------
Procedure:
The watcher will investigate and report the cause of failure by 11am.
The developer who checked in the error code may report so by then too.
The watcher or the developer should get the failure fixed and report it by 3pm.
40. Group practices - technical
• Release levels
– Development release
– Official release
– Past releases
42. Group practices – business and social
• Staff breakdown
  – User support
  – Documentation
  – QA
  – Software development
  – Testing
  – Team leadership
  – System administration
[Diagram: the HDF Project divided into teams: support, documentation, QA, and maintenance; basic library development; tools and Java; and parallel I/O, Grid, and big machines.]
• Team lead for each team
• Most staff in two or more teams
• Staff relationships
  – Complement each other
  – Overlap each other
  – Keep each other honest
43. Group practices – business and social
• Accountability of everyone to the whole process
• Help desk
• Approaches to carrying out tasks
  – Paying attention to technical proposals
  – Weekly HDF5 developers’ meetings
  – HDF seminars
• Management and administration
  – Performance reviews with emphasis on goals and development
  – Critical to success
  – That’s another talk
45. Strengths
• User support
• Staff
– High quality, diverse staff with good morale
– Staff commitment and enthusiasm
• Ability to address all aspects of product development
– Emphasis on quality control
– Fast bug fixing and frequent releases
– Ability to focus on a single product over a long term
• High level of support from sponsors
• Project’s visibility through NCSA, NASA, DOE, users
46. Weaknesses
• Software development team
  – Library expertise still concentrated among too few developers
  – Team communication is challenging
• Processes
  – Release/maintenance take too much time and resources
  – Configuration and porting are a huge time sink
  – We don’t do enough prototyping
  – Hard to keep up with new technologies
  – Parallel I/O hard to support
47. More weaknesses & challenges
• Usability
  – Software too hard to use for casual users
  – Insufficient documentation
  – Insufficient tools for high level users
  – Insufficient interoperability with common tools and formats
• Marketing
  – Marketing effort is inadequate
  – Need to connect better with users and potential users
• Viable long-term support
48. Most immediate needs
• Configuration and build
• Testing and prototyping
• Marketing
• Reporting
  – Performance reports
  – General reports to users
  – HDF book
• Sustainable business model
Format and software for scientific data. HDF5 is a different format from earlier versions of HDF, as is the library.
Stores images, multidimensional arrays, tables, etc. That is, you can construct all of these different kinds of structures and store them in HDF5. You can also mix and match them in HDF5 files according to your needs.
Emphasis on storage and I/O efficiency. Both the library and the format are designed to address this.
Free and commercial software support. As far as HDF5 goes, this is just a goal now. There is commercial support for HDF4, but little if any for HDF5 at this time. We are working with vendors to change this.
Emphasis on standards. You can store data in HDF5 in a variety of ways, so we try to work with users to encourage them to organize HDF5 files in standard ways.
Users from many engineering and scientific fields
Like HDF4, HDF5 has a grouping structure.
The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.
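A small sketch (hypothetical file and group names, 1.6-era C API) of what that grouping structure looks like in code: the root group "/" exists as soon as the file does, and other groups hang off it.

#include "hdf5.h"

int main(void)
{
    hid_t file = H5Fcreate("grouped.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* "/" already exists; build a small hierarchy beneath it. */
    hid_t foo = H5Gcreate(file, "/foo", 0);
    hid_t bar = H5Gcreate(foo, "bar", 0);   /* i.e. /foo/bar */

    H5Gclose(bar);
    H5Gclose(foo);
    H5Fclose(file);
    return 0;
}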
It is useful to think about HDF software in terms of layers.
At the bottom layer is the HDF5 file or other data source.
Above that are two layers corresponding to the HDF library.
First there is a low-level interface that concentrates on basic I/O: opening and closing files, reading and writing bytes, seeking, etc. HDF5 provides a public API at this level so that people can write their own drivers for reading and writing to places other than those already provided with the library. The drivers already provided include UNIX stdio and MPI-IO.
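For example, here is a sketch of choosing between those two shipped drivers through a file access property list; the parallel branch assumes the library was built with MPI support (the H5_HAVE_PARALLEL macro), and the file name is illustrative.

#include "hdf5.h"
#ifdef H5_HAVE_PARALLEL
#include <mpi.h>
#endif

int main(int argc, char **argv)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

#ifdef H5_HAVE_PARALLEL
    MPI_Init(&argc, &argv);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);  /* MPI-IO driver */
#else
    (void)argc; (void)argv;
    H5Pset_fapl_stdio(fapl);                                /* UNIX stdio driver */
#endif

    hid_t file = H5Fcreate("driver.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Fclose(file);
    H5Pclose(fapl);
#ifdef H5_HAVE_PARALLEL
    MPI_Finalize();
#endif
    return 0;
}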
Then comes the high-level, object-specific interface. This is the API that most people who develop HDF5 applications use. This is where you create a dataset or group, read and write datasets and subsets, etc.
At the top are applications, or perhaps APIs used by applications. Examples of the latter are the HDF-EOS API that supports NASA’s EOSDIS datatypes, and the DSL API that supports the ASCI data models.
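To illustrate that object-specific layer, here is a sketch that opens the hypothetical dataset created in the earlier example and reads a 2×3 subset of it (a hyperslab), again with the 1.6-era C API.

#include "hdf5.h"

int main(void)
{
    float   subset[2][3];
    hsize_t start[2] = {4, 10};   /* where the subset begins in the file */
    hsize_t count[2] = {2, 3};    /* how many elements to read           */

    hid_t file   = H5Fopen("example.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen(file, "/temperature");
    hid_t fspace = H5Dget_space(dset);

    /* Select the subset in the file and describe the memory buffer. */
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(2, count, NULL);

    H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, subset);

    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}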