This work considers the merits of HDF5 and HDF-EOS for storing a variety of geospatial data types. It involves mapping many different geospatial data formats to HDF5 and/or HDF-EOS, converting sample files, developing visualization tools for examining these files, and analyzing performance. The work is supported by the National Archives and Records Administration and the Illinois State Geological Survey. HDF5 and HDF-EOS are seen to have potential value for the long-term preservation of geospatial data, as well as for efficient storage and access in active geospatial data repositories.
Using HDF5 to Store Geospatial Vector Data
1. Using HDF5 for Geospatial Vector Data
Question: How suitable is a general-purpose format like HDF5 for storing and accessing geospatial feature data?
2. Using HDF5 for Geospatial Vector Data
Feature (vector) data example:
*ESRI: Environmental Systems Research Institute, Inc
3. Using HDF5 for Geospatial Vector Data
Test case: ESRI* Shapefiles
•Store geometry and attribute information for spatial features as
shapes with vector coordinates.
•Support point, line, and area features.
•Widely used file format for geospatial feature data.
HDF5 example (1 file)
Shapefile format (3 files): .shp, .shx, .dbf
4. Using HDF5 for Geospatial Vector Data
Shapefiles tested
Shapefile   Size (M bytes, .shp + .shx)   Total # shapes   Total # vertices   Max. # vertices for a shape
A           0.001                         1                66                 66
B           0.01                          44               191                12
C           0.2                           219              9,397              1,632
D           3.0                           2,253            179,106            38,725
E           12.3                          11,576           721,123            500
F           18.8                          8,877            1,140,460          500
5. Using HDF5 for Geospatial Vector Data
• Ragged array – 1-D array of variable-length data types, one element per shape, each with its own metadata.
• 2-D array – one shape per row, multiple arrays when shape sizes vary.
• Index – array of offsets to data values in a single linear array. Similar to Shapefiles.
[Figure: Distribution of # vertices per shape for shapefile F, with shapes sorted in ascending order of vertex count. Horizontal axis: shape # (8877 shapes). Vertical axis: number of vertices, logarithmic scale from 1 to 1000 (maximum 500).]
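The three layouts on this slide can be sketched in plain Python lists, without any HDF5 calls. This is only an illustration under stated assumptions: the sample shapes, the function names, the x, y coordinate ordering, and the NaN fill value are all hypothetical, and the 2-D version pads everything into a single array rather than splitting shapes across multiple arrays as the slide describes.

```python
# Three hypothetical shapes, each a list of (x, y) vertices (not from the slides).
shapes = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(2.0, 2.0), (3.0, 2.0), (3.0, 3.0)],
    [(5.0, 5.0)],
]


def flatten(shape):
    """Turn [(x, y), ...] into [x, y, x, y, ...]."""
    return [coord for vertex in shape for coord in vertex]


def to_ragged(shapes):
    """Ragged layout: one variable-length row of coordinates per shape."""
    return [flatten(shape) for shape in shapes]


def to_padded_2d(shapes, fill=float("nan")):
    """2-D layout: one shape per row, padded to the longest shape.

    Simplification: a single array is used here, whereas the slide mentions
    multiple arrays when shape sizes vary.
    """
    width = 2 * max(len(shape) for shape in shapes)
    rows = []
    for shape in shapes:
        row = flatten(shape)
        rows.append(row + [fill] * (width - len(row)))
    return rows


def to_indexed(shapes):
    """Index layout: one flat coordinate array plus per-shape start offsets."""
    coords, offsets = [], [0]
    for shape in shapes:
        coords.extend(flatten(shape))
        offsets.append(len(coords))  # end of this shape = start of the next
    return coords, offsets


def read_shape(coords, offsets, i):
    """Recover shape i from the index layout as a list of (x, y) tuples."""
    flat = coords[offsets[i]:offsets[i + 1]]
    return list(zip(flat[0::2], flat[1::2]))
```

With the index layout, reading one shape costs a single contiguous slice of the linear array, which is the same idea as the offset records a Shapefile keeps in its .shx file.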
6. Results: Comparing Shapefile and HDF5
File size
•Overhead for variable-length structures (ragged array) is high.
•HDF5 linear array with index is comparable to Shapefile.
•Compression:
  •HDF5 linear array with index saves up to 40% vs. Shapefile.
  •HDF5 2-D arrays are comparable to Shapefile when compression is used. Without compression, HDF5 files are much larger.
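One plausible contributor to the size of uncompressed 2-D arrays is padding: if every row is as wide as the longest shape, most slots hold fill values. The arithmetic below uses the shape and vertex counts from the "Shapefiles tested" slide and assumes the worst case of a single 2-D array per file. That assumption is ours; the slides say multiple arrays were used when shape sizes vary, so the real overhead was presumably smaller. The function names are hypothetical.

```python
# (total shapes, total vertices, max vertices per shape), copied from the
# "Shapefiles tested" slide.
SHAPEFILES = {
    "A": (1, 66, 66),
    "B": (44, 191, 12),
    "C": (219, 9_397, 1_632),
    "D": (2_253, 179_106, 38_725),
    "E": (11_576, 721_123, 500),
    "F": (8_877, 1_140_460, 500),
}


def padded_slots(n_shapes, max_vertices):
    """Vertex slots in one 2-D array with a row per shape, padded to the longest."""
    return n_shapes * max_vertices


def padding_ratio(n_shapes, n_vertices, max_vertices):
    """Allocated vertex slots divided by vertices actually stored."""
    return padded_slots(n_shapes, max_vertices) / n_vertices


if __name__ == "__main__":
    for name, (n_shapes, n_vertices, max_vertices) in SHAPEFILES.items():
        slots = padded_slots(n_shapes, max_vertices)
        ratio = padding_ratio(n_shapes, n_vertices, max_vertices)
        print(f"{name}: {slots:,} slots for {n_vertices:,} vertices ({ratio:.1f}x)")
```

Under this worst-case assumption, shapefile D would allocate 87,247,425 vertex slots for 179,106 real vertices. Long runs of identical fill values also compress well, which would be consistent with the 2-D layout becoming comparable to Shapefile once compression is enabled.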
Access time
•Variable-length and compound types significantly slow access in HDF5.
•This can be improved considerably by turning off internal free lists.
•When compound and variable-length types are not used, HDF5 access time is comparable to Shapefile access.