
FITSIO, HDF4, NetCDF, PDB and HDF5 Performance - Some Benchmark Results

Source: http://hdfeos.org/workshops/ws05/presentations/Pourmal/05-2c-HDF5_Performance.ppt

  1. FITSIO, HDF4, NetCDF, PDB and HDF5 Performance: Some Benchmark Results
     Elena Pourmal
     Science Data Processing Workshop, February 27, 2002
  2. Benchmark Environment (software)
     • Software
       – HDF4 r1.5
       – HDF5 1.4.3
       – NetCDF 3.5
       – FITSIO version 2.2
       – PDB version 8_7_01
       – The “System” benchmark uses the UNIX open, write, read and close functions
     • Each measurement was taken 10 times; the best times were collected
  3. Benchmark Environment (hardware)
     • 2 × 550 MHz Pentium III Xeon (Linux 2.2.18smp)
       – 1 GB memory
     • NCSA O2K (IRIX64-6.5)
       – 195 MHz MIPS R10000
       – 14 GB memory
       – Peak performance 390 MFLOPS
  4. Benchmarks
     • Creating and writing a contiguous dataset; sizes vary from 2 MB to 512 MB
     • Reading a contiguous dataset; sizes vary from 2 MB to 256 MB
     • Reading a contiguous hyperslab; sizes vary from 1 MB to 64 MB
     • Reading every second element of the hyperslab; selection sizes vary from 0.25 MB to 16 MB
     • Creating and writing up to 1000 datasets of 1 MB each; reading back the dataset created last
  5. Some remarks
     • “Dataset” denotes an array stored in a FITS, HDF4, HDF5, PDB, NetCDF or UNIX binary file, i.e. “dataset” means
       – “primary array” and “extension” for FITSIO
       – “variable” for NetCDF
       – “SDS”, or “scientific data set”, for HDF4
       – “HDF5 dataset” for HDF5
       – “PDB variable” for PDB
       – raw data stored in a UNIX binary file
     • The gettimofday function was used to measure time
     • Speed was calculated as data buffer size divided by time
  6. Creating and Writing Contiguous Dataset
     • In this test we created a file and stored a two-dimensional array of short unsigned integers; the array size varied from 2 MB up to 512 MB
     • We measured
       – Total time to
         • create a file
         • create a dataset
         • write a dataset
         • close the dataset and the file
       – Time to write the dataset only
  7. Creating and Writing Dataset on IRIX (total time)
     [Chart. Y axis: speed in MB/sec (0 to 100). X axis: data buffer size in MB (16 to 256). Series: FITSIO, HDF4, NetCDF, PDB, HDF5, System.]
  8. Creating and Writing Dataset on IRIX (write time only)
     [Chart. Y axis: speed in MB/sec (0 to 100). X axis: data buffer size in MB (16 to 256). Series: FITSIO, HDF4, NetCDF, PDB, HDF5, System.]
  9. Speed ratios for writing dataset on IRIX

     LIB (library name)   HDF5/LIB (total time ratios)   HDF5/LIB (write time ratios)
     System               80% – 100%                     89% – 100%
     FITSIO               1.8 – 2.07                     2.0 – 2.5
     HDF4                 1.5 – 2.5                      1.5 – 1.9
     NetCDF               1.2 – 1.7                      1.5 – 1.7
     PDB                  1.3 – 1.6                      1.4 – 1.9
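One plausible reading of the HDF5/LIB columns, inferred from the "speed ratios" slide titles and the "times faster" wording of the conclusions rather than stated outright, is the HDF5 speed divided by the other library's speed for the same data buffer size. With speed defined as buffer size over time, the buffer size cancels and the ratio reduces to the other library's time over the HDF5 time. The function name below is hypothetical.

```c
#include <assert.h>

/* Speed ratio HDF5/LIB for one buffer size, assuming
 * speed = size / time, so that
 *   (size / t_hdf5) / (size / t_lib) = t_lib / t_hdf5.
 * A value above 1 means HDF5 was faster; a value such as 0.8
 * corresponds to the "80%" style entries in the System row. */
double speed_ratio(double t_hdf5, double t_lib)
{
    assert(t_hdf5 > 0.0 && t_lib > 0.0);
    return t_lib / t_hdf5;
}
```

For example, with illustrative times of 2 seconds for HDF5 and 5 seconds for another library, the ratio is 2.5.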
  10. Creating and Writing Dataset on Linux (total time)
      [Chart. Y axis: speed in MB/sec (0 to 140). X axis: data buffer size in MB (16 to 256). Series: FITSIO, HDF4, NetCDF, PDB, HDF5 with datatype conversion, HDF5, System.]
  11. Creating and Writing Dataset on Linux (write time only)
      [Chart. Y axis: speed in MB/sec (0 to 140). X axis: data buffer size in MB (16 to 256). Series: FITSIO, HDF4, NetCDF, PDB, HDF5 with datatype conversion, HDF5, System.]
  12. Speed ratios for writing dataset on Linux

      LIB (library name)   HDF5/LIB (total time ratios)   HDF5/LIB (write time ratios)
      System               94% – 100%                     98% – 100%
      FITSIO               3.5 – 3.6                      3.7 – 3.8
      HDF4                 4.7                            4.8 – 5.1
      NetCDF               2.0 – 2.2                      2.1 – 2.2
      PDB                  1.6                            1.7
  13. Reading Contiguous Dataset
      • In this test we created a two-dimensional array of short unsigned integers, then read it back; the array size varied from 2 MB up to 512 MB
      • We measured
        – Total time to
          • open a file
          • open a dataset
          • read a dataset
          • close the dataset and the file
        – Time to read the dataset only
  14. Reading Contiguous Dataset on IRIX (total time)
      [Chart. Y axis: speed in MB/sec (0 to 140). X axis: data buffer size in MB (32 to 512). Series: FITSIO, HDF4, NetCDF, PDB, HDF5, System.]
  15. Reading Contiguous Dataset on IRIX (read time only)
      [Chart. Y axis: speed in MB/sec (0 to 140). X axis: data buffer size in MB (32 to 512). Series: FITSIO, HDF4, NetCDF, PDB, HDF5, System.]
  16. Speed ratios for reading dataset on IRIX

      LIB (library name)   HDF5/LIB (total time ratios)   HDF5/LIB (read time ratios)
      System               74% – 100%                     91% – 100%
      FITSIO               1.9 – 2.7                      1.8 – 2.6
      HDF4                 1.8 – 2.5                      1.8 – 2.5
      NetCDF               1.5 – 2.2                      1.7 – 2.7
      PDB                  1.9 – 3.1                      1.9 – 3.7
  17. Reading Contiguous Dataset on Linux (total time)
      [Chart. Y axis: speed in MB/sec (0 to 180). X axis: data buffer size in MB (16 to 256). Series: FITSIO, HDF4, NetCDF, PDB, HDF5 with datatype conversion, HDF5, System.]
  18. Reading Contiguous Dataset on Linux (read time only)
      [Chart. Y axis: speed in MB/sec (0 to 200). X axis: data buffer size in MB (16 to 256). Series: FITSIO, HDF4, NetCDF, PDB, HDF5 with datatype conversion, HDF5, System.]
  19. Speed ratios for reading dataset on Linux

      LIB (library name)   HDF5/LIB (total time ratios)   HDF5/LIB (read time ratios)
      System               97% – 100%                     100%
      FITSIO               4.0 – 4.5                      4.0 – 4.5
      HDF4                 5.2 – 5.5                      5.5 – 5.7
      NetCDF               1.8 – 1.9                      1.9
      PDB                  2.6 – 2.8                      2.7
  20. Reading Contiguous Hyperslab of the Dataset
      • In this test we created a two-dimensional array of short unsigned integers and then read a contiguous hyperslab of the dataset; the dataset size was up to 256 MB and the hyperslab size varied from 1 MB up to 64 MB
      • We measured
        – Total time to open a file and a dataset, select and read the hyperslab, and close the dataset and the file
        – Time to read the hyperslab only
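To make the notion of a contiguous hyperslab concrete, the sketch below reads a rectangular block of rows and columns from a two-dimensional array of unsigned short stored row by row in a raw binary file. It is an illustration only: the hyperslab charts on the slides do not include a raw-file case, the row-major layout is an assumption, and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read rows [row0, row0 + nrows) and columns [col0, col0 + ncols) of a
 * row-major 2-D array of unsigned short stored raw in `fp`.
 * `total_cols` is the full row width of the stored array.
 * `out` must hold nrows * ncols elements. Returns 0 on success, -1 on error. */
int read_hyperslab(FILE *fp, size_t total_cols,
                   size_t row0, size_t col0,
                   size_t nrows, size_t ncols,
                   unsigned short *out)
{
    assert(fp != NULL && out != NULL);
    assert(col0 + ncols <= total_cols);

    for (size_t r = 0; r < nrows; r++) {
        /* Byte offset of element (row0 + r, col0) in the file. */
        long offset = (long)(((row0 + r) * total_cols + col0)
                             * sizeof(unsigned short));
        if (fseek(fp, offset, SEEK_SET) != 0)
            return -1;
        /* Each selected row segment is contiguous in the file. */
        if (fread(out + r * ncols, sizeof(unsigned short), ncols, fp) != ncols)
            return -1;
    }
    return 0;
}
```

The format libraries compared on the slides expose comparable selections through their own interfaces; this sketch only shows the underlying index arithmetic.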
  21. Reading Contiguous Hyperslab on IRIX (total time)
      [Chart. Y axis: speed in MB/sec (0 to 100). X axis: hyperslab size in MB (16 to 128). Series: FITSIO, HDF4, NetCDF, PDB, HDF5.]
  22. Reading Contiguous Hyperslab on IRIX (read time only)
      [Chart. Y axis: speed in MB/sec (0 to 100). X axis: hyperslab size in MB (16 to 128). Series: FITSIO, HDF4, NetCDF, PDB, HDF5.]
  23. Speed ratios for reading contiguous hyperslab on IRIX

      LIB (library name)   HDF5/LIB (total time ratios)   HDF5/LIB (read time ratios)
      FITSIO               1.7 – 2.1                      1.8
      HDF4                 6.4 – 7.4                      6.4 – 8
      NetCDF               0.8                            0.8 – 0.9
      PDB                  7 – 8                          7.6 – 9.5
  24. Reading Every Second Element in the Hyperslab
      • In this test we created a 256 MB two-dimensional array of short unsigned integers, then read back every second element of the selected hyperslab
      • We measured
        – Total time to open a file and a dataset, select and read every second element of the hyperslab, and close the file and the dataset
        – Time to read the selection only
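Selecting every second element corresponds to a strided selection. The sketch below gathers such a selection from an in-memory, row-major array. It assumes the stride applies in both dimensions, which the slides do not specify; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Gather a strided selection from a row-major 2-D array.
 * Starting at (row0, col0), take count_rows x count_cols elements,
 * stepping `stride` elements in each dimension (stride 2 selects every
 * second element). `total_cols` is the full row width of `src`.
 * Returns the number of elements copied into `dst`. */
size_t gather_strided(const unsigned short *src, size_t total_cols,
                      size_t row0, size_t col0,
                      size_t count_rows, size_t count_cols,
                      size_t stride, unsigned short *dst)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    assert(stride >= 1);

    for (size_t i = 0; i < count_rows; i++) {
        size_t row = row0 + i * stride;
        for (size_t j = 0; j < count_cols; j++) {
            size_t col = col0 + j * stride;
            assert(col < total_cols);
            dst[n++] = src[row * total_cols + col];
        }
    }
    return n;
}
```

Because the selected elements are no longer adjacent in storage, a library either issues many small reads or reads a larger block and discards the gaps, which is consistent with the much lower speeds on the following charts.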
  25. Reading Every Second Element of the Hyperslab on IRIX (total time)
      [Chart. Y axis: speed in MB/sec (0 to 6). X axis: data buffer size in MB (0.25 to 16). Series: FITSIO, HDF4, NetCDF, PDB, HDF5.]
  26. Reading Every Second Element on IRIX (total time)
      [Chart. Y axis: speed in MB/sec (0.01 to 0.0125). X axis: data buffer size in MB (0.25, 0.5). Series: HDF4, PDB.]
  27. Reading Every Second Element of the Hyperslab on IRIX (read time only)
      [Chart. Y axis: speed in MB/sec (0 to 6). X axis: data buffer size in MB (1 to 16). Series: FITSIO, NetCDF, HDF5.]
  28. Speed ratios for reading every second element of contiguous hyperslab on IRIX

      LIB (library name)   HDF5/LIB (total time ratios)   HDF5/LIB (read time ratios)
      FITSIO               0.3                            0.3
      HDF4                 130 – 136                      130 – 136
      NetCDF               3.4                            3.4
      PDB                  147                            147
  29. Creating and Writing Multiple Datasets
      • In this test we created up to 1000 two-dimensional datasets of short unsigned integers, 1 MB each; then we read the last created dataset
      • We measured
        – Time to
          • create a file
          • create and write N datasets
          • close all datasets and the file
        – Time to open the file, read the N-th dataset and close the file
  30. Creating and Writing Datasets on IRIX
      [Chart. Y axis: time in seconds (0 to 30). X axis: number of datasets in the file (100 to 1000). Series: FITSIO, HDF4, HDF5, NetCDF.]
  31. Reading Dataset from the Files
      [Chart. Y axis: time in seconds (0 to 3.5). X axis: number of datasets in the file (100 to 900). Series: FITSIO, HDF4, HDF5, NetCDF.]
  32. Reading a dataset from the file on IRIX
      [Chart. Y axis: time in seconds (0 to 0.05). X axis: number of datasets (100 to 2900). Series: NetCDF, HDF5.]
  33. Conclusions
      • HDF5 is 1.6 to 8 times faster when it performs native write/read
      • HDF5 needs some tuning when datatype conversion is used
      • When contiguous subsetting is used, HDF5 performs several times better than FITSIO, HDF4 and PDB and achieves about 80% of NetCDF performance
      • HDF5 is an order of magnitude faster in accessing datasets within a file with many objects
