Using HDF5 for Geospatial Vector Data

Question: How suitable is a general-purpose
format like HDF5 for storing and accessing
geospatial feature data?
[Figure: feature (vector) data example]

*ESRI: Environmental Systems Research Institute, Inc
Test case: ESRI* Shapefiles
• Store geometry and attribute information for spatial features as
shapes with vector coordinates.
• Support point, line, and area features.
• Widely used file format for geospatial feature data.
HDF5 example: 1 file
Shapefile format: 3 files (.shp for geometry, .shx for the record index, .dbf for attributes)
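The .shx file is what gives a shapefile record-level random access into the .shp: after a 100-byte header, it holds one fixed-size 8-byte entry per shape with the record's offset and content length, both big-endian integers counted in 16-bit words. The sketch below reads such an index with Python's standard `struct` module, following the layout in ESRI's Shapefile Technical Description. `build_shx` and the record values are hypothetical test fixtures, not data from the shapefiles tested here.

```python
import struct

HEADER_BYTES = 100   # .shx and .shp share the same 100-byte header layout
FILE_CODE = 9994     # magic number stored big-endian at byte 0

def build_shx(records, shape_type=5):
    """Build a minimal in-memory .shx (hypothetical fixture for testing).

    records: list of (offset, content_length) pairs in 16-bit words.
    shape_type 5 is Polygon in the shapefile spec.
    """
    length_words = (HEADER_BYTES + 8 * len(records)) // 2
    header = struct.pack(">i20xi", FILE_CODE, length_words)  # bytes 0-27, big-endian
    header += struct.pack("<ii", 1000, shape_type)            # version, shape type (little-endian)
    header += struct.pack("<8d", *([0.0] * 8))                # bounding box, unused here
    body = b"".join(struct.pack(">ii", off, n) for off, n in records)
    return header + body

def read_shx(data):
    """Return (shape_type, [(byte_offset, content_bytes), ...]) from .shx bytes."""
    file_code, length_words = struct.unpack_from(">i20xi", data, 0)
    if file_code != FILE_CODE:
        raise ValueError("not a shapefile index")
    _version, shape_type = struct.unpack_from("<ii", data, 28)
    n_records = (2 * length_words - HEADER_BYTES) // 8
    entries = [struct.unpack_from(">ii", data, HEADER_BYTES + 8 * i)
               for i in range(n_records)]
    # Convert 16-bit-word units to bytes within the companion .shp file.
    # Content length excludes each record's 8-byte record header.
    return shape_type, [(2 * off, 2 * n) for off, n in entries]

# The first record starts right after the 100-byte header (50 words); the next
# follows its 4-word record header plus 20 words of content.
shape_type, index = read_shx(build_shx([(50, 20), (74, 30)]))
print(shape_type, index)  # 5 [(100, 40), (148, 60)]
```

Because every entry has the same size, the offset of shape k can be read directly at byte 100 + 8k without scanning the .shp.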
Shapefiles tested

Shapefile   Size (MB, .shp + .shx)   Total shapes   Total vertices   Max. vertices per shape
A           0.001                    1              66               66
B           0.01                     44             191              12
C           0.2                      219            9,397            1,632
D           3.0                      2,253          179,106          38,725
E           12.3                     11,576         721,123          500
F           18.8                     8,877          1,140,460        500

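One quick way to read the "Shapefiles tested" table is to compare the average number of vertices per shape with the maximum, since that gap governs how much padding any fixed-width layout would need. The arithmetic below uses only the table's own figures; for shapefile F, 1,140,460 vertices over 8,877 shapes works out to about 128 vertices per shape against a maximum of 500.

```python
# Figures transcribed from the "Shapefiles tested" table.
# name: (size_mb, total_shapes, total_vertices, max_vertices_per_shape)
SHAPEFILES = {
    "A": (0.001, 1, 66, 66),
    "B": (0.01, 44, 191, 12),
    "C": (0.2, 219, 9_397, 1_632),
    "D": (3.0, 2_253, 179_106, 38_725),
    "E": (12.3, 11_576, 721_123, 500),
    "F": (18.8, 8_877, 1_140_460, 500),
}

def mean_vertices(name):
    """Average number of vertices per shape for one test file."""
    _size, shapes, vertices, _max = SHAPEFILES[name]
    return vertices / shapes

for name, (_size, _shapes, _vertices, vmax) in SHAPEFILES.items():
    mean = mean_vertices(name)
    print(f"{name}: mean {mean:8.2f}  max {vmax:6d}  max/mean {vmax / mean:6.1f}")
```

Shapefile D stands out: its mean is under 80 vertices per shape, but one shape has 38,725.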
[Diagram: five example shapes, each with metadata and a variable number of x/y coordinate pairs, arranged in the three storage layouts listed on this slide.]

• Ragged array – 1-D array of variable-length data types.
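A plain-Python sketch of the ragged layout (not HDF5 code) makes the structure concrete: each array element pairs a shape's metadata with its own variable-length run of interleaved x/y values, so reading shape k is a single lookup. In HDF5 this corresponds to a 1-D dataset whose element type combines metadata with a variable-length sequence. The `shape_id` field, the helper names, and the five example shapes (2, 1, 3, 1, and 2 vertices) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RaggedRecord:
    shape_id: int   # hypothetical metadata; real attributes would come from the .dbf
    coords: list    # variable-length, interleaved x, y values: [x0, y0, x1, y1, ...]

def build_ragged(shapes):
    """One variable-length record per shape, in shape order."""
    return [RaggedRecord(i, [c for xy in pts for c in xy])
            for i, pts in enumerate(shapes)]

def vertices(record):
    """Recover (x, y) pairs from a record's interleaved coordinates."""
    c = record.coords
    return list(zip(c[0::2], c[1::2]))

shapes = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0)],
    [(3.0, 3.0), (4.0, 3.0), (4.0, 4.0)],
    [(5.0, 5.0)],
    [(6.0, 6.0), (7.0, 7.0)],
]
ragged = build_ragged(shapes)
print([len(r.coords) // 2 for r in ragged])  # [2, 1, 3, 1, 2]
print(vertices(ragged[2]))                   # [(3.0, 3.0), (4.0, 3.0), (4.0, 4.0)]
```

The convenience is that no padding or offset bookkeeping is visible to the caller; the cost, per the results slide, is that HDF5 must manage every element as a separate variable-length object.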

• 2-D array – one shape per row; multiple arrays when shape sizes vary.
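The 2-D layout can be sketched the same way. The slide does not say how shapes were grouped into multiple arrays, so the power-of-two size classes below are an assumption for illustration, as are the example shapes and helper names. The last line reuses the table's figures for shapefile F: a single array padded to 500 vertices per row would leave roughly three quarters of its slots empty.

```python
def size_class(n):
    """Hypothetical grouping rule: round a vertex count up to a power of two."""
    width = 1
    while width < n:
        width *= 2
    return width

def build_2d_arrays(shapes, pad=(0.0, 0.0)):
    """Group shapes into 2-D arrays (one shape per row), one array per size class.

    Rows are padded with `pad` up to the class width; the true vertex
    counts are kept alongside so readers can strip the padding.
    """
    arrays, counts = {}, {}
    for pts in shapes:
        width = size_class(len(pts))
        arrays.setdefault(width, []).append(list(pts) + [pad] * (width - len(pts)))
        counts.setdefault(width, []).append(len(pts))
    return arrays, counts

def fill_fraction(total_vertices, n_shapes, width):
    """Share of slots holding real vertices when every shape is padded to `width`."""
    return total_vertices / (n_shapes * width)

shapes = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0)],
    [(3.0, 3.0), (4.0, 3.0), (4.0, 4.0)],
    [(5.0, 5.0)],
    [(6.0, 6.0), (7.0, 7.0)],
]
arrays, counts = build_2d_arrays(shapes)
print(sorted(arrays))  # [1, 2, 4]
# Worst case for shapefile F: one array padded to its 500-vertex maximum.
print(f"{fill_fraction(1_140_460, 8_877, 500):.1%}")  # 25.7%
```

Splitting into several arrays reduces padding, at the price of more datasets to open and a side table of true lengths.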

• Index – array of offsets to data values in a single linear array. Similar to Shapefiles.

[Diagram: index layout with the x/y pairs of shapes 1 to 5 stored in one linear array, per-shape metadata, and offsets 0, 2, 3, 6, 7 marking where each shape begins.]

[Figure: distribution of the number of vertices per shape for shapefile F (8,877 shapes), sorted in ascending order; x-axis: shape #; y-axis: number of vertices on a log scale from 1 to 1,000 (maximum 500).]
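The index layout puts all coordinates into one linear array and keeps a separate offsets array recording where each shape starts, much as the .shx file records where each record starts in the .shp. Using a hypothetical five-shape example with 2, 1, 3, 1, and 2 vertices reproduces the offsets 0, 2, 3, 6, 7 from the diagram. In HDF5 the two arrays would be two plain 1-D datasets; the function names here are illustrative only.

```python
from itertools import accumulate

def build_index(shapes):
    """Index layout: one linear coordinate array plus per-shape start offsets.

    Offsets count vertices (x, y pairs), not individual values.
    """
    coords = [c for pts in shapes for xy in pts for c in xy]   # x0, y0, x1, y1, ...
    counts = [len(pts) for pts in shapes]
    offsets = list(accumulate(counts, initial=0))[:-1]        # start vertex of each shape
    return coords, offsets

def read_shape(coords, offsets, k):
    """Slice shape k out of the linear array using its offset and the next one."""
    start = offsets[k]
    end = offsets[k + 1] if k + 1 < len(offsets) else len(coords) // 2
    flat = coords[2 * start: 2 * end]
    return list(zip(flat[0::2], flat[1::2]))

shapes = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0)],
    [(3.0, 3.0), (4.0, 3.0), (4.0, 4.0)],
    [(5.0, 5.0)],
    [(6.0, 6.0), (7.0, 7.0)],
]
coords, offsets = build_index(shapes)
print(offsets)                         # [0, 2, 3, 6, 7]
print(read_shape(coords, offsets, 2))  # [(3.0, 3.0), (4.0, 3.0), (4.0, 4.0)]
```

Both arrays hold only fixed-size numeric values, which is consistent with the results slide's finding that this layout avoids the variable-length overhead.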
Results: Comparing Shapefile and HDF5
File size
• Overhead for variable-length structures (ragged array) is high.
• HDF5 linear array with index is comparable in size to Shapefile.
• Compression:
  • HDF5 linear array with index saves up to 40% vs. Shapefile.
  • HDF5 2-D arrays are comparable to Shapefile when compression is used. Without compression, these HDF5 files are much larger.
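The compression results follow from how DEFLATE treats padding. HDF5's gzip filter uses DEFLATE via zlib, so Python's `zlib` module can illustrate the mechanism without HDF5: long runs of zero padding in a fixed-width 2-D layout nearly vanish once compressed, while the raw padded array stays far larger than the unpadded linear one. This sketch ignores HDF5 chunking and file overhead, and its vertex counts and coordinates are random placeholders, so the printed sizes are not the measurements behind this slide.

```python
import random
import struct
import zlib

WIDTH = 500  # vertices per row in a single padded 2-D array (shapefile F's maximum)

def pack_doubles(values):
    """Little-endian float64 bytes, the coordinate encoding shapefiles use."""
    return struct.pack(f"<{len(values)}d", *values)

rng = random.Random(42)
# Hypothetical data: 200 shapes with 1 to 500 vertices and random coordinates.
counts = [rng.randint(1, WIDTH) for _ in range(200)]
shapes = [[rng.uniform(-180.0, 180.0) for _ in range(2 * n)] for n in counts]

# Index layout: coordinates back to back, no padding.
linear = pack_doubles([c for s in shapes for c in s])
# Single 2-D layout: every row zero-padded to WIDTH vertices (2 * WIDTH values).
padded = pack_doubles([c for s in shapes for c in s + [0.0] * (2 * WIDTH - len(s))])

sizes = {
    name: (len(raw), len(zlib.compress(raw, 6)))
    for name, raw in (("linear", linear), ("padded 2-D", padded))
}
for name, (raw_size, packed_size) in sizes.items():
    print(f"{name:>10}: raw {raw_size:>9,} bytes  deflated {packed_size:>9,} bytes")
```

Random coordinates compress poorly, so the deflated linear size stays close to its raw size; the padded array's advantage comes almost entirely from the zero runs.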

Access time
• Variable-length and compound types significantly slow access in HDF5.
• Access can be improved considerably by turning off HDF5's internal free lists.
• When compound and variable-length types are not used, HDF5 access time is comparable to Shapefile access.
