SESIP-0720-JL
Using Apache Drill and Unidata
TDS* for NASA HDF-EOS on S3
ESIP 2020 Summer / HDF-EOS Workshop XXIII
This work was supported by NASA/GSFC under Raytheon Technologies contract number NNG15HZ39C.
This document does not contain technology or Technical Data controlled under either the U.S. International Traffic
in Arms Regulations or the U.S. Export Administration Regulations.
H. Joe Lee
EED-2 / The HDF Group / Software Engineer
hyoklee@hdfgroup.org
*THREDDS Data Server
Hierarchical Data Format - Earth Observing System (HDF-EOS)
• HDF4
– HDF-EOS2
• HDF5
– HDF-EOS5
– netCDF-4
HDF-EOS on S3
• HDF4?
– No elegant solution other than GDAL*
– Not so elegant: h4mapwriter / s3fs
• HDF5?
– Many OK solutions exist: HDF5 VFD** / HSDS*** / GDAL / Hyrax DMR++**** / etc.
– But “just OK is not OK.”
*Geospatial Data Abstraction Library
**Virtual File Driver
***Highly Scalable Data Service
****Dataset Metadata Response
Apache Drill
• Supports a variety of storage: Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files.
• Data agility: query the raw data in situ.
• Tables: an in-memory, shredded columnar representation for complex data.
• Works with BI tools and exposes a REST API.
Apache Drill 1.18 (beta)
• Collection of HDF5 files on S3
• ANSI SQL
• Geoprocessing?
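Drill's REST API accepts SQL over HTTP, which is one way to query a collection of HDF5 files on S3 from a script. The sketch below assumes a Drill server on `localhost` with the default web port 8047, a storage plugin hypothetically named `s3` that points at the bucket, and a placeholder object key `granule.h5`. It only builds and sends a `POST /query.json` request carrying a `{"queryType": "SQL", "query": ...}` body, using the Python standard library.

```python
import json
import urllib.request

# Drill's web/REST server listens on port 8047 by default; the host is an
# assumption for this sketch.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical query: "s3" is an assumed storage-plugin name configured to
# point at the bucket, and "granule.h5" is a placeholder object key.
SQL = "SELECT * FROM s3.`granule.h5` LIMIT 10"


def build_query_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Wrap a SQL string in the JSON body expected by Drill's /query.json."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, url: str = DRILL_URL, timeout: float = 600.0) -> list:
    """POST the query and return the "rows" list from Drill's JSON reply.

    The generous default timeout is an arbitrary choice for large files.
    """
    request = build_query_request(sql, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp).get("rows", [])


if __name__ == "__main__":
    # Only inspect the request; run_query(SQL) needs a live Drill server.
    req = build_query_request(SQL)
    print(req.get_method(), req.full_url)
```

Keeping the request builder separate from `run_query` lets the payload be checked without a running server.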
THREDDS Data Server 5.0 (beta)
It supports S3!
• Both HDF4 and HDF5
• NcML?
• Catalog for collections of files?
netCDF-Java
• This is the core library.
• THREDDS, Panoply, and IDV share it.
• toolsUI is a generic GUI tool based on netCDF-Java.
• As with GDAL, once netCDF-Java works with S3, the tools built on it get S3 support almost for free.
toolsUI - HDF4 on S3
Benchmark: TerraFusion on S3
• Test file size: 24 GB
• Format: HDF5 / netCDF-4 CF
• One orbit of data from five Terra sensors
• S3 access from EC2 (m4.xlarge)
Apache Drill fails after 7 minutes.
read on s3a://basicterrafusion/TERRA_BF_L1B_O53557_20100112014327_F000_V001.h5:
com.amazonaws.AbortedException:
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
org.apache.drill.exec.store.hdf5.HDF5BatchReader.convertInputStreamToFile(HDF5BatchReader.java:356)
TDS responds within 2 minutes.
Float32 /MOPITT/granule_20100112/Geolocation/Latitude[ntrack_1 = 46][nstare = 29][npixels = 4];
Float32 /MOPITT/granule_20100112/Geolocation/Longitude[ntrack_1 = 36][nstare = 29][npixels = 4];
Float64 /MOPITT/granule_20100112/Geolocation/Time[ntrack_1 = 436];
} s3-test/TERRA_BF_L1B_O53557_20100112014327_F000_V001.h5;
real 1m47.065s
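The excerpt above reads like the tail of an OPeNDAP DDS (Dataset Descriptor Structure) fetched from TDS and timed with the shell's `time` builtin. A minimal sketch of handling such a request and its output in Python follows. It assumes TDS's usual `/thredds/dodsC/` OPeNDAP path; the server address is a placeholder and the dataset path simply mirrors the name in the DDS. The parser handles only simple array declarations like the ones shown, not `Grid` or `Structure` blocks.

```python
import re

# One simple DAP2 array declaration, e.g.
#   Float64 /MOPITT/granule_20100112/Geolocation/Time[ntrack_1 = 436];
DECL = re.compile(r"^\s*(\w+)\s+([^\s\[]+)((?:\[[^\]]*\])*);\s*$")
DIM = re.compile(r"\[\s*(\w+)\s*=\s*(\d+)\s*\]")
# The wall-clock line printed by bash's `time`, e.g. "real 1m47.065s".
REAL = re.compile(r"real\s+(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def dds_url(server: str, dataset_path: str) -> str:
    """DDS URL for a dataset behind TDS's OPeNDAP (dodsC) service."""
    return f"{server.rstrip('/')}/thredds/dodsC/{dataset_path}.dds"


def parse_dds_declarations(dds_text: str) -> list:
    """Return (type, name, {dimension: size}) for each simple declaration."""
    decls = []
    for line in dds_text.splitlines():
        m = DECL.match(line)
        if m:
            dims = {dim: int(size) for dim, size in DIM.findall(m.group(3))}
            decls.append((m.group(1), m.group(2), dims))
    return decls


def real_seconds(time_output: str) -> float:
    """Convert the `real` line of `time` output to seconds."""
    m = REAL.search(time_output)
    if m is None:
        raise ValueError("no 'real' timing found")
    return int(m.group(1) or 0) * 60 + float(m.group(2))


if __name__ == "__main__":
    # Placeholder server address.
    print(dds_url("http://localhost:8080",
                  "s3-test/TERRA_BF_L1B_O53557_20100112014327_F000_V001.h5"))
    sample = (
        "Float64 /MOPITT/granule_20100112/Geolocation/Time[ntrack_1 = 436];\n"
        "} s3-test/TERRA_BF_L1B_O53557_20100112014327_F000_V001.h5;\n"
        "real 1m47.065s"
    )
    print(parse_dds_declarations(sample))
    print(real_seconds(sample))
```

Converting the `time` output to seconds makes it easier to line up the TDS figure with the minute-level timings reported for Drill and h5ls.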
h5ls responds in 2.5 minutes.
• Uses the HDF5 read-only S3 (ROS3) Virtual File Driver (VFD).
• Requires building HDF5 with the --enable-ros3-vfd configure option.
It takes twice as long (5 minutes) outside AWS.
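From Python, the same ROS3 driver can be reached through h5py's `driver="ros3"` option, provided h5py is built against an HDF5 configured with `--enable-ros3-vfd`. The driver is given an `https://` object URL, so this sketch first turns `s3://bucket/key` into AWS's virtual-hosted-style URL. The `us-east-1` default region is an assumption, and because no credentials are passed the request is anonymous, which only works for publicly readable objects.

```python
from urllib.parse import urlparse


def s3_to_https(s3_url: str, region: str = "us-east-1") -> str:
    """Turn s3://bucket/key into a virtual-hosted-style HTTPS object URL.

    The default region is an assumption; pass the bucket's actual region.
    """
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    key = parsed.path.lstrip("/")
    return f"https://{parsed.netloc}.s3.{region}.amazonaws.com/{key}"


def list_root_members(s3_url: str, region: str = "us-east-1") -> list:
    """Open the object read-only via the ROS3 driver and list root members.

    Needs h5py built against an HDF5 configured with --enable-ros3-vfd.
    No credentials are passed, so access is anonymous and only works
    for publicly readable objects.
    """
    import h5py  # lazy import so s3_to_https works without h5py installed

    with h5py.File(s3_to_https(s3_url, region), "r", driver="ros3") as f:
        return list(f.keys())


if __name__ == "__main__":
    # Bucket and key from the Drill error message above; the region is assumed.
    print(s3_to_https(
        "s3://basicterrafusion/TERRA_BF_L1B_O53557_20100112014327_F000_V001.h5"))
```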
Role-based Access Control (RBAC)
Tool:   Drill    THREDDS   HDF5 VFD
RBAC:   Always   Yes       No
• RBAC eliminates the need for access keys and tokens.
• Objects are accessed as s3://bucket/key.h5 (no https:// URL).
• S3 buckets and objects can remain private.
THREDDS 5.0 Is the Clear Winner
Based on our benchmark results:
• Performance is good: TDS returned the DDS of the 24 GB test file in under 2 minutes.
• It supports HDF4.
• RBAC is supported.
• Existing netCDF-Java / OPeNDAP-based software works seamlessly.
However, Use Case Still Matters
There are many (read-only) solutions for HDF-EOS on S3:
• SQL user? Try Drill after sanitization.
– Good for collections of HDF5 files with 2D grids.
– Use AWS Lambda (with Cumulus) for sanitization.
• Java user? Try netCDF-Java.
• Python user? Try GDAL's /vsis3/ virtual file system for HDF5 and /vsicurl/ for HDF4.
• OPeNDAP user? Try THREDDS 5.0 beta.
• HDF5 C/Fortran user? Try the HDF5 VFD.
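For the Python route, GDAL addresses remote objects through virtual file system prefixes: `/vsis3/bucket/key` for S3 (credentials come from the environment, AWS config files, or an EC2 instance role) and `/vsicurl/` followed by an HTTP(S) URL. The sketch below only constructs those paths and, if GDAL's Python bindings are installed, lists a file's subdatasets. Whether a particular GDAL build's HDF4 and HDF5 drivers accept these virtual paths depends on its version and build options, so treat that as something to verify.

```python
from urllib.parse import urlparse


def vsis3_path(s3_url: str) -> str:
    """s3://bucket/key -> /vsis3/bucket/key (GDAL's S3 virtual file system)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    return f"/vsis3/{parsed.netloc}/{parsed.path.lstrip('/')}"


def vsicurl_path(http_url: str) -> str:
    """https://... -> /vsicurl/https://... (GDAL's generic HTTP reader)."""
    if not http_url.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP(S) URL: {http_url!r}")
    return "/vsicurl/" + http_url


def list_subdatasets(vsi_path: str) -> list:
    """Open a file through GDAL and return its subdataset names.

    HDF4/HDF5 files usually expose their arrays as subdatasets.
    """
    from osgeo import gdal  # lazy import: path helpers work without GDAL

    gdal.UseExceptions()
    ds = gdal.Open(vsi_path)
    return [name for name, _description in ds.GetSubDatasets()]


if __name__ == "__main__":
    print(vsis3_path(
        "s3://basicterrafusion/TERRA_BF_L1B_O53557_20100112014327_F000_V001.h5"))
```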
This work was supported by NASA/GSFC under
Raytheon Technologies contract number
NNG15HZ39C.
in partnership with
