© Hatfield Consultants. All Rights Reserved.
STAC, ZARR, COG, K8S and Data
Cubes: The brave new world of satellite
EO analytics in the cloud
Jason Suwala
Nov 2019
Who Am I?
 UVic Engineering Grad
 Partner at Hatfield Consultants
 Director of Environmental Information
Systems
 Lots of different hats
 Digital development
 Knowledge management
 Environmental data management
 System of Systems
 CGDI
 European and Canadian Space Agency Projects
 “Bringing Science and People Together”
Why are we here?
Nov 5, 2019: “Canada must become a leader in
using space data to improve our society”
– CSA President Sylvain Laporte
NASA EOSDIS Data Growth
ESA’s Data Growth
[Figure: ESA EO data archive volume in petabytes (axis 0 to 110), 2000 to 2026, by mission group: Sentinel missions operated by ESA, Earth Explorer missions, Heritage missions, Third Party & Contributing Missions. Ref: European Space Agency, 2018]
Time-series Analysis over Large Areas?
Total number of archived Landsat images acquired for Canada,
by year and sensor. (Wulder 2018)
Digital Ecosystem to Monitor the Planet
› “Digital Twins”
› Towards real-time
acquisition and analysis
Traditional Approaches Obsolete
› The traditional download approach is obsolete
Innovation Solutions Canada
Working with the Public Health Agency of Canada to address this problem
www.GEOAnalytics.ca
“Advancing Canadian Satellite Earth Observation Analytics”
Brief Primer on Cloud Native Geospatial
Cloud Native Geospatial
› Simply moving a server to be hosted in
the cloud is not “cloud native”
› Cloud native:
› Horizontally scalable on commodity
hardware
› Always available
› Always current
› Virtualized resource sharing
+ Geospatial:
› Optimized file formats (COG/ZARR)
› Web-crawlable (STAC)
Data goes together with compute
› Bring your algorithm to the data, not
the other way around
› Always co-locate your compute with
the data
› Above all else, minimize data
downloading
› Infrastructure options: HPC or Cloud
File Formats
› “how you store your data can have an enormous effect
on performance.”
› Dr. Philip Austin, UBC
December Mosaic of the Bahamas, Image ©2017 Planet Labs,
Inc.
Raster File Formats: COG
› COG = “Cloud Optimized GeoTIFF”
› https://www.cogeo.org/
› “A Cloud Optimized GeoTIFF (COG) is a
regular GeoTIFF file, aimed at being
hosted on a HTTP file server, with an
internal organization that enables more
efficient workflows on the cloud. It does
this by leveraging the ability of clients
issuing HTTP GET range requests to ask
for just the parts of a file they need
instead of downloading the whole file.”
› COG-aware software can stream just the
portion of data that it needs
› Supported by GDAL, Rasterio and others
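The range-request idea above can be sketched with the Python standard library alone. The snippet builds (but does not send) an HTTP GET request that asks for a single block of bytes. The URL, offset, and length are hypothetical placeholders; a real COG reader would take tile offsets and byte counts from the TIFF header it fetched first.

```python
from urllib.request import Request


def range_header(offset: int, length: int) -> str:
    """Return an HTTP Range header value covering `length` bytes from `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends, hence the "- 1".
    return f"bytes={offset}-{offset + length - 1}"


def build_tile_request(url: str, offset: int, length: int) -> Request:
    """Build (without sending) a GET request for one byte range of a remote file."""
    return Request(url, headers={"Range": range_header(offset, length)})


# Hypothetical values standing in for one tile's position inside the file.
req = build_tile_request("https://example.com/scene.tif", 16384, 65536)
print(req.get_header("Range"))
```

Libraries such as GDAL and Rasterio issue requests of this kind internally, so user code rarely has to build them by hand.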
COG versus GeoTIFF
› Vincent Sarago
› Storage size: 1.5 GB vs 69 MB
COG versus JPEG2000
                   JPEG2000       COG
Size               25 TB          50 TB
Storage            $575/month     $1150/month
Data access        $440           $20
Processing time    $76.81         $25.60
Total cost         $1091.81       $1195.60
› If you only care about storage cost, JPEG2000 is your best option,
but if someone has to pay to access or process the data, COG is
the better option
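The trade-off in the table can be checked with a few lines of arithmetic. The figures below are copied from the table; the `passes` parameter is an assumption added for illustration, namely that access and processing costs repeat in full each time the archive is read in a month.

```python
# Monthly figures copied from the slide's table (currency as given there).
costs = {
    "JPEG2000": {"storage": 575.00, "access": 440.00, "processing": 76.81},
    "COG": {"storage": 1150.00, "access": 20.00, "processing": 25.60},
}


def monthly_total(fmt: str, passes: int = 1) -> float:
    """Storage plus `passes` rounds of access and processing.

    Assumption (not from the slide): access and processing costs
    repeat in full for every pass over the archive.
    """
    c = costs[fmt]
    return round(c["storage"] + passes * (c["access"] + c["processing"]), 2)


for passes in (1, 2):
    print(passes, monthly_total("JPEG2000", passes), monthly_total("COG", passes))
```

With a single pass the totals match the table (1091.81 versus 1195.60). Under the stated assumption, a second pass already tips the balance toward COG (1608.62 versus 1241.20).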
Raster File Formats: NetCDF + HDF Problems
› The most common multidimensional data formats are NetCDF and
HDF
› Supercomputer simulations (like a large climate model) produce a
few petabytes of HDF files.
› Planned NASA satellite missions will produce hundreds of
petabytes a year of HDF files.
› The layout of HDF files makes them difficult to query efficiently on
cloud storage systems
› “slowdown is significant because the HDF library makes many small
4kB reads in order to gather the metadata necessary to pull out a chunk
of data. Each of those tiny reads made sense when the data was local,
but now that we’re sending out a web request each time. This means
that users can sit for minutes just to open a file.”
NetCDF+HDF: store byte layout map?
› NASA proposes to use OPeNDAP Server to proxy NetCDF + HDF
files stored on S3
› The OPeNDAP server stores a map (“Byte layout map” in
illustration) of how the S3 bucket is organized, so it knows which
bytes to retrieve from the file stored in the S3 bucket based on what
the client’s application is requesting.
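A toy version of the byte layout map idea is sketched below. It is not OPeNDAP's actual format or API: the chunk names, offsets, and the in-memory `blob` are all hypothetical, and `read_range` only stands in for a ranged GET against the bucket.

```python
# Toy stand-in for a file in an object store: three "chunks" back to back.
blob = b"AAAA" + b"BBBBBB" + b"CC"

# Hypothetical byte layout map: chunk name -> (offset, length) within the file.
layout = {
    "temperature/0": (0, 4),
    "temperature/1": (4, 6),
    "temperature/2": (10, 2),
}


def read_range(data: bytes, offset: int, length: int) -> bytes:
    """Stand-in for a ranged GET against object storage."""
    return data[offset:offset + length]


def fetch_chunk(name: str) -> bytes:
    """Look up the chunk's byte range in the map, then read only those bytes."""
    offset, length = layout[name]
    return read_range(blob, offset, length)


print(fetch_chunk("temperature/1"))
```

Because the map is consulted first, the many small metadata reads described earlier are replaced by one lookup followed by one ranged read.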
Replace NetCDF with Zarr
› In tests run by CNES, Zarr was more than ten times faster than
NetCDF at reading data (link)
› Zarr makes large datasets easily accessible to distributed computing
› In Zarr datasets, the arrays are divided into chunks and
compressed.
› These individual chunks can be stored as files on a filesystem or as
objects in a cloud storage bucket.
› The metadata are stored in lightweight .json files.
› Zarr works well on both local filesystems and cloud-based object
stores.
› Existing NetCDF and HDF datasets can easily be converted to Zarr
via xarray’s Zarr functions.
› 12 June 2019: Zarr support is coming to the standard netCDF
library. (link)
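The chunk-plus-JSON layout described above can be illustrated without the `zarr` library. The sketch below is a simplified toy, not the Zarr specification: the grid, chunk size, and metadata fields are hypothetical, and a plain dictionary stands in for a directory or bucket. The key names loosely follow the Zarr v2 convention of dot-separated chunk indices and a `.zarray` metadata entry.

```python
import json
import struct
import zlib

CHUNK = 2  # chunk edge length (hypothetical)

# A 4 x 4 grid of integers, as a list of rows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

store = {}  # stands in for a directory or an object-store bucket

# Lightweight JSON metadata describing the array (simplified).
store[".zarray"] = json.dumps(
    {"shape": [4, 4], "chunks": [CHUNK, CHUNK], "dtype": "<i4"}
)

# Cut the grid into CHUNK x CHUNK blocks; compress and store each on its own.
for i in range(0, 4, CHUNK):
    for j in range(0, 4, CHUNK):
        block = [grid[r][c] for r in range(i, i + CHUNK) for c in range(j, j + CHUNK)]
        raw = struct.pack(f"<{len(block)}i", *block)  # little-endian int32
        store[f"{i // CHUNK}.{j // CHUNK}"] = zlib.compress(raw)


def read_chunk(key: str) -> list:
    """Decompress a single chunk without touching any other key."""
    raw = zlib.decompress(store[key])
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


print(sorted(store))
print(read_chunk("1.0"))
```

Each chunk is an independent object, which is why separate workers can read different chunks in parallel. For real data, xarray's `Dataset.to_zarr` method writes this kind of store for you.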
Operating Systems: Linux wins
› Auro: Windows costs ~2.75x more per hour than Linux
› GCE: Windows costs ~2x more per hour than Linux
› 2017: All Top500 ranked supercomputers run Linux
Data Storage: Object Storage
› S3 = “Simple Storage Service”
› Not just on Amazon: S3-compatible or comparable object storage is
offered by OpenStack Swift, MinIO, Azure, Google Cloud, etc.
› “provides object storage through a web service interface”
› Organized using Buckets and keys
› Geographically replicated for redundancy
› Supported by GDAL, Rasterio, GeoServer
› On Linux S3 can be mounted as a user-mode file system (S3FS)
› Windows file-system access possible through rclone mount
› Auro: CAD$0.05/GB/month. AWS: USD$0.025/GB/month
(CAD$0.033)
Data Storage: Object Storage
› GDAL support through network-based virtual file systems
› /vsicurl/ (http/https/ftp files: random access)
› /vsicurl_streaming/ (http/https/ftp files: streaming)
› /vsis3/ (AWS S3 files: random reading)
› /vsis3_streaming/ (AWS S3 files: streaming)
› /vsigs/ (Google Cloud Storage files: random reading)
› /vsigs_streaming/ (Google Cloud Storage files: streaming)
› /vsiaz/ (Microsoft Azure Blob files: random reading)
› /vsiaz_streaming/ (Microsoft Azure Blob files: streaming)
› /vsioss/ (Alibaba Cloud OSS files: random reading)
› /vsioss_streaming/ (Alibaba Cloud OSS files: streaming)
› /vsiswift/ (OpenStack Swift Object Storage: random reading)
› /vsiswift_streaming/ (OpenStack Swift Object Storage: streaming)
› Streaming drivers allow on-the-fly sequential reading without prior download of
the entire file
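As a small illustration of how these prefixes are used, the helper below turns a cloud URL into the matching GDAL virtual file system path. The function name, the scheme table, and the example URLs are hypothetical; only the random-access prefixes from the list above are covered.

```python
from urllib.parse import urlparse

# URL scheme -> GDAL virtual file system prefix (random-access variants).
VSI_PREFIX = {"s3": "/vsis3/", "gs": "/vsigs/"}


def to_vsi_path(url: str) -> str:
    """Translate a cloud URL into a GDAL virtual file system path."""
    parts = urlparse(url)
    if parts.scheme in ("http", "https", "ftp"):
        # /vsicurl/ takes the full URL, scheme included.
        return "/vsicurl/" + url
    if parts.scheme in VSI_PREFIX:
        # Bucket-style handlers take "bucket/key" after the prefix.
        return VSI_PREFIX[parts.scheme] + parts.netloc + parts.path
    raise ValueError(f"no handler known for scheme {parts.scheme!r}")


print(to_vsi_path("s3://my-bucket/landsat/scene.tif"))
print(to_vsi_path("https://example.com/data/scene.tif"))
```

The resulting string can be passed to GDAL-based tools wherever a file name is expected, so the remote object is read in place rather than downloaded first.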
MetaData + Searching
› OGC Existing Standards: CSW and OpenSearch
› Considerable work to implement and consume
› XML based, not JSON
› Not easily crawled by search engines
› Not RESTful
› Hard to consume
› Ideal for geospatial experts, but no one else
Source: Michael Smith (Harris Geospatial), Dec 2018 presentation to the OGC (link)
MetaData + Searching: STAC
› STAC aims to define a simple universal API for geospatial data
discovery
› The core of STAC is very general and simple
› STAC appeals to non-geospatial specialists
› All metadata specific to a modality or domain is defined as an
extension. Current STAC extensions:
› Datacube
› EO
› Point cloud
› SAR
› DOI
› Working to align STAC with OGC’s “Web Feature Services version
3” (WFS v3) specification
› NASA is indexing all of its AWS data using STAC
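To show how simple the core is, here is a minimal STAC-style Item written as a Python dictionary, with a bounding-box search over it. Every value (the id, coordinates, asset URL, and the version string) is a hypothetical placeholder, and the `bbox_intersects` helper is an illustration rather than part of any STAC library.

```python
import json

# Minimal STAC-style Item: a GeoJSON Feature plus a few STAC fields.
# All values (id, coordinates, URL, version string) are hypothetical.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "bbox": [-115.0, 50.0, -113.0, 52.0],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-115.0, 50.0], [-113.0, 50.0], [-113.0, 52.0],
                         [-115.0, 52.0], [-115.0, 50.0]]],
    },
    "properties": {"datetime": "2019-11-05T18:00:00Z"},
    "links": [],
    "assets": {"visual": {"href": "https://example.com/scene.tif"}},
}


def bbox_intersects(a, b) -> bool:
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# A crawler or client only needs plain JSON handling to search items.
area_of_interest = [-114.5, 50.5, -114.0, 51.5]
hits = [i["id"] for i in [item] if bbox_intersects(i["bbox"], area_of_interest)]
print(json.dumps(hits))
```

Because an Item is ordinary GeoJSON with a few extra fields, it can be crawled and consumed by tools that have no geospatial knowledge at all.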
Kubernetes
› Kubernetes (K8s) is an open-source system for automating
deployment, scaling, and management of containerized
applications.
› Execution is done in parallel, on many worker nodes
› Can horizontally scale dynamically to use new compute nodes
based on metrics (such as CPU usage, HTTP requests, etc.)
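The metric-based scaling mentioned above follows a simple ratio rule in Kubernetes' Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below applies that rule to hypothetical numbers and leaves out the tolerance band and stabilization behaviour of the real controller.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Replica count suggested by the Horizontal Pod Autoscaler ratio rule."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical load: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

The same rule scales down as well: when the observed metric falls below the target, the ratio is less than one and the suggested replica count shrinks.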
Kubernetes uses Docker Containers
Kubernetes is a cluster manager
Data Cubes
› A data cube is an “n-dimensional array”
› Latitude
› Longitude
› Time
› Data variables
› Requires Analysis Ready Data (ARD)
› Each pixel is stored as calibrated and corrected measurement
› Allows time-series analysis
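A data cube's structure can be shown with nested lists before reaching for a dedicated library. The toy cube below is indexed as time, row, column for a single data variable; the dates and values are hypothetical, and in practice a tool such as Xarray would hold the array together with its coordinate labels.

```python
# Toy cube indexed as cube[time][row][col]; values are hypothetical
# reflectances for one data variable on a 2 x 2 grid over 3 dates.
times = ["2019-06-01", "2019-07-01", "2019-08-01"]
cube = [
    [[0.25, 0.5], [0.75, 0.5]],
    [[0.5, 0.5], [0.75, 0.25]],
    [[0.75, 0.5], [0.75, 0.0]],
]


def pixel_series(cube, row, col):
    """Values of one pixel across every time step."""
    return [layer[row][col] for layer in cube]


series = pixel_series(cube, 0, 0)
for t, v in zip(times, series):
    print(t, v)
print("mean:", sum(series) / len(series))
```

Slicing along the time axis like this is only meaningful because every layer holds calibrated, comparable measurements, which is what the Analysis Ready Data requirement provides.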
Data Cubes
› Non-trivial to create and work with
› Example implementations:
› Xarray
› Open Data Cube
› Xcube
› Rasdaman
› Apache Spark + GeoTrellis
Why are we here?
Nov 5, 2019: “Canada must become a leader in
using space data to improve our society”
– CSA President Sylvain Laporte
Conclusion
› Bring your algorithm to the data, not the other way around
› Let’s embrace change, together
› Ensure we don’t forget marginalized and data-poor communities
› Canada was a leader in GIS; now we are a follower of our peers: Europe, Australia and the US
› Let’s talk about opportunities to work together to move Canadian EO analytic capabilities forward in this new era.
www.GEOAnalytics.ca
Thank You!
jsuwala@hatfieldgroup.com

 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 

STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO analytics in the cloud - GeoAlberta 2019

  • 1. © Hatfield Consultants. All Rights Reserved. STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO analytics in the cloud Jason Suwala Nov 2019 Version #
  • 2. Who Am I? › UVic Engineering Grad › Partner at Hatfield Consultants › Director of Environmental Information Systems › Lots of different hats › Digital development › Knowledge management › Environmental data management › System of Systems › CGDI › European and Canadian Space Agency Projects › “Bringing Science and People Together”
  • 3. Why are we here? Nov 5, 2019: “Canada must become a leader in using space data to improve our society” – CSA President Sylvain Laporte
  • 4. Why are we here?
  • 5. NASA EOSDIS Data Growth
  • 6. ESA’s Data Growth [chart: ESA EO data archive in petabytes (0 to 110), 2000 to 2026, by mission group: Sentinel missions operated by ESA, Earth Explorer missions, Heritage missions, Third Party & Contributing Missions] Ref: European Space Agency, 2018
  • 7. Timeseries Analysis over Large Areas? Total number of archived Landsat images acquired for Canada, by year and sensor. (Wulder 2018)
  • 8. Digital Ecosystem to Monitor the Planet › “Digital Twins” › Towards real-time acquisition and analysis
  • 9. Traditional Approaches Obsolete › The traditional download approach is obsolete
  • 10. Innovation Solutions Canada Working with the Public Health Agency of Canada to address this problem
  • 11. www.GEOAnalytics.ca “Advancing Canadian Satellite Earth Observation Analytics”
  • 12. Brief Primer on Cloud Native Geospatial
  • 13. Cloud Native Geospatial › Simply moving a server to be hosted in the cloud is not “cloud native” › Cloud native: › Horizontally scalable on commodity hardware › Always available › Always current › Virtualized resource sharing + Geospatial: › Optimized file formats (COG/ZARR) › Web-crawlable (STAC)
  • 14. Data goes together with compute › Bring your algorithm to the data, not the other way around › Always co-locate your compute with the data › Above all else, minimize data downloading › Infrastructure options: HPC or Cloud
  • 15. File Formats › “how you store your data can have an enormous effect on performance.” (Dr. Philip Austin, UBC) › Image: December Mosaic of the Bahamas, ©2017 Planet Labs, Inc.
  • 16. Raster File Formats: COG › COG = “Cloud Optimized GeoTIFF” › https://www.cogeo.org/ › “A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need instead of downloading the whole file.” › COG-aware software can stream just the portion of data that it needs › Supported by GDAL, Rasterio and others
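The range-request idea on this slide can be sketched with the Python standard library alone. This is a minimal illustration, not how GDAL or Rasterio implement it: the URL, the byte offset and the tile length are hypothetical, and a real COG reader would first read the file header to learn where each tile lives.

```python
from urllib.request import Request


def build_range_request(url: str, offset: int, length: int) -> Request:
    """Build an HTTP GET request for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # The Range header is inclusive at both ends, hence the "- 1".
    last_byte = offset + length - 1
    headers = {"Range": f"bytes={offset}-{last_byte}"}
    return Request(url, headers=headers, method="GET")


# Hypothetical file URL and tile location, for illustration only.
request = build_range_request(
    "https://example.com/scene.tif", offset=16384, length=65536
)
print(request.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` would fetch only those 64 KiB, provided the server honours range requests (it answers with status 206 when it does).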
  • 17. COG versus GeoTiff › Vincent Sarago
  • 18. COG versus GeoTiff › storage size: 1.5 Gb vs 69 Mb
  • 19. COG versus JPEG2000
                         JPEG2000        COG
        Size             25TB            50TB
        Storage          $575/month      $1150/month
        Data access      $440            $20
        Processing Time  $76.81          $25.60
        Cost             $1091.81        $1195.60
      › If you just care about storage cost, JPEG2000 is your best option, but if someone will have to pay to access/process the data, COG is a better option
  • 20. Raster File Formats: NetCDF + HDF Problems › The most common multidimensional data formats are NetCDF and HDF › Supercomputer simulations (like a large climate model) produce a few petabytes of HDF files. › Planned NASA satellite missions will produce hundreds of petabytes a year of HDF files. › The layout of HDF files makes them difficult to query efficiently on cloud storage systems › “slowdown is significant because the HDF library makes many small 4kB reads in order to gather the metadata necessary to pull out a chunk of data. Each of those tiny reads made sense when the data was local, but now that we’re sending out a web request each time. This means that users can sit for minutes just to open a file.”
  • 21. NetCDF+HDF: store byte layout map? › NASA proposes using an OPeNDAP server to proxy NetCDF + HDF files stored on S3 › The OPeNDAP server stores a map (“Byte layout map” in illustration) of how the S3 bucket is organized, so it knows which bytes to retrieve from the file stored in the S3 bucket based on what the client’s application is requesting.
  • 22. Replace NetCDF with ZARR › In tests run by CNES, Zarr is more than ten times faster for reading data than NetCDF (link) › Makes large datasets easily accessible to distributed computing › In Zarr datasets, the arrays are divided into chunks and compressed. › These individual chunks can be stored as files on a filesystem or as objects in a cloud storage bucket. › The metadata are stored in lightweight .json files. › Zarr works well on both local filesystems and cloud-based object stores. › Existing NetCDF and HDF datasets can easily be converted to Zarr via xarray’s zarr functions. › 12 June 2019: Zarr support is coming to the standard netCDF library. (link)
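To make the chunking point concrete, the sketch below works out which chunk objects a reader would need for a small spatial window. It assumes the Zarr version 2 layout, where each chunk is stored under a key made of its dot-separated chunk indices; the chunk shape and the window are invented for illustration, and the function is a plain-Python stand-in rather than part of the zarr library.

```python
from itertools import product


def chunk_keys(chunks, start, stop):
    """List chunk keys that overlap the half-open index box [start, stop).

    `chunks` is the chunk shape; `start` and `stop` are per-axis array indices.
    """
    per_axis = [
        range(lo // size, (hi - 1) // size + 1)
        for size, lo, hi in zip(chunks, start, stop)
    ]
    # One key per combination of chunk indices, e.g. "10.1.0".
    return [".".join(str(i) for i in index) for index in product(*per_axis)]


# Hypothetical cube: axes (time, row, col), chunked as one time step
# by 1000 x 1000 pixels.
chunks = (1, 1000, 1000)

# Read time step 10, rows 500..1499, cols 0..999.
keys = chunk_keys(chunks, start=(10, 500, 0), stop=(11, 1500, 1000))
print(keys)
```

Only the listed objects have to be fetched from the bucket, which is why chunked storage suits parallel workers that each read a different window.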
  • 23. Operating Systems: Linux wins › Auro: Windows costs ~2.75x more/hour than Linux › GCE: Windows costs ~2x more/hour than Linux › 2017: All Top500 ranked supercomputers run Linux
  • 24. Data Storage: Object Storage › S3 = “Simple Storage Service” › Not just on Amazon: Implemented by OpenStack Swift, MinIO, Azure, Google Cloud, etc. › “provides object storage through a web service interface” › Organized using Buckets and keys › Geographically replicated for redundancy › Supported by GDAL, Rasterio, GeoServer › On Linux S3 can be mounted as a user-mode file system (S3FS) › Windows file-system access possible through rclone mount › Auro: CAD$0.05/GB/month. AWS: USD$0.025/GB/month (CAD$0.033)
  • 25. Data Storage: Object Storage › GDAL support through network-based virtual file systems › /vsicurl/ (http/https/ftp files: random access) › /vsicurl_streaming/ (http/https/ftp files: streaming) › /vsis3/ (AWS S3 files: random reading) › /vsis3_streaming/ (AWS S3 files: streaming) › /vsigs/ (Google Cloud Storage files: random reading) › /vsigs_streaming/ (Google Cloud Storage files: streaming) › /vsiaz/ (Microsoft Azure Blob files: random reading) › /vsiaz_streaming/ (Microsoft Azure Blob files: streaming) › /vsioss/ (Alibaba Cloud OSS files: random reading) › /vsioss_streaming/ (Alibaba Cloud OSS files: streaming) › /vsiswift/ (OpenStack Swift Object Storage: random reading) › /vsiswift_streaming/ (OpenStack Swift Object Storage: streaming) › Streaming drivers allow on-the-fly sequential reading without prior download of the entire file
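As a rough illustration of how these prefixes are used, the helper below turns an ordinary URL into the path string GDAL expects for random-access reading. The helper name `to_vsi_path` and the example URLs are hypothetical, and treating `s3://` and `gs://` as bucket URLs follows the usual command-line convention rather than anything stated on the slide. Only three of the listed prefixes are covered.

```python
from urllib.parse import urlparse

# Bucket-style URL schemes mapped to GDAL's random-reading prefixes.
BUCKET_PREFIXES = {"s3": "/vsis3/", "gs": "/vsigs/"}


def to_vsi_path(url: str) -> str:
    """Translate a URL into a GDAL virtual file system path."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https", "ftp"):
        # /vsicurl/ keeps the full URL after the prefix.
        return "/vsicurl/" + url
    if parsed.scheme in BUCKET_PREFIXES:
        # Bucket stores use <prefix><bucket>/<key>.
        return f"{BUCKET_PREFIXES[parsed.scheme]}{parsed.netloc}{parsed.path}"
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


# Hypothetical locations, for illustration only.
print(to_vsi_path("https://example.com/data/scene.tif"))
print(to_vsi_path("s3://my-bucket/landsat/scene.tif"))
```

The resulting string can be passed wherever GDAL-based tools accept a filename; credentials for private buckets are configured separately.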
  • 26. MetaData + Searching › OGC Existing Standards: CSW and OpenSearch › Considerable work to implement and consume › XML based, not JSON › Not easily crawled by search engines › Not RESTful › Hard to consume › Ideal for geospatial experts, but no one else Source: Michael Smith’s/Harris Geospatial Dec 2018 presentation to the OGC - link
  • 27. MetaData + Searching: STAC
  • 28. MetaData + Searching: STAC › STAC aims to define a simple universal API for geospatial data discovery › The core of STAC is very general and simple › STAC appeals to non-geospatial specialists › All metadata specific to a modality or domain is defined as an extension. Current STAC extensions: › Datacube › EO › Point cloud › SAR › DOI › Working to align STAC with OGC’s “Web Feature Services version 3” (WFS v3) specification › NASA is indexing all of its AWS data using STAC
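To show how small the STAC core is, here is a sketch of a single Item built as a plain Python dictionary and serialised to JSON. The field layout follows the STAC 1.0.0 Item specification, which is later than this talk, so the version string is an assumption; the identifier, bounding box, date and asset URL are all invented.

```python
import json


def make_item(item_id, bbox, datetime_utc, cog_href):
    """Build a minimal STAC Item (a GeoJSON Feature) for one COG asset."""
    west, south, east, north = bbox
    # Closed polygon ring tracing the bounding box.
    ring = [[west, south], [east, south], [east, north],
            [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",  # assumed version, later than the talk
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "visual": {
                "href": cog_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            }
        },
    }


# Hypothetical scene over southern Alberta, for illustration only.
item = make_item(
    "example-scene-001",
    bbox=(-114.5, 50.5, -113.5, 51.5),
    datetime_utc="2019-11-05T18:30:00Z",
    cog_href="https://example.com/scenes/example-scene-001.tif",
)
print(json.dumps(item, indent=2))
```

Because an Item is ordinary JSON served over HTTP, it can be crawled and consumed without specialist geospatial tooling, which is the contrast the preceding slides draw with CSW.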
  • 29. Kubernetes › Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. › Execution is done in parallel, on many worker nodes › Can horizontally scale dynamically to use new compute nodes based on metrics (such as CPU usage, HTTP requests, etc.)
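Metric-based horizontal scaling can be illustrated with the replica formula given in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric). The sketch below is a simplified stand-in, not the controller itself; it omits the tolerance band and stabilisation behaviour, and the replica counts and CPU figures are made up.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Simplified autoscaler rule: scale in proportion to metric pressure."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    wanted = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds (bounds here are assumed defaults).
    return max(min_replicas, min(max_replicas, wanted))


# 4 workers averaging 90% CPU against a 60% target: scale out.
scale_out = desired_replicas(4, current_metric=90, target_metric=60)

# 6 workers averaging 20% CPU against the same target: scale in.
scale_in = desired_replicas(6, current_metric=20, target_metric=60)

print(scale_out, scale_in)
```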
  • 30. Kubernetes uses Docker Containers
  • 31. Kubernetes is a cluster manager
  • 32. Data Cubes › A data cube is an “n-dimensional array” › Latitude › Longitude › Time › Data variables › Requires Analysis Ready Data (ARD) › Each pixel is stored as calibrated and corrected measurement › Allows time-series analysis
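The n-dimensional array idea can be shown with a toy cube in plain Python. This is a sketch only: a real data cube would use a library such as xarray with labelled latitude, longitude and time coordinates, and the integer values below are invented stand-ins for a calibrated measurement.

```python
from statistics import mean

# Toy cube indexed as cube[time][row][col]: 3 dates, 2 x 2 pixels.
# Values are made-up scaled measurements for one data variable.
cube = [
    [[20, 40], [60, 80]],
    [[40, 40], [60, 20]],
    [[60, 40], [60, 50]],
]


def pixel_series(cube, row, col):
    """Return the time series for one pixel by slicing along the time axis."""
    return [layer[row][col] for layer in cube]


def temporal_mean(cube):
    """Collapse the time axis, giving one mean value per pixel."""
    rows, cols = len(cube[0]), len(cube[0][0])
    return [
        [mean(pixel_series(cube, r, c)) for c in range(cols)]
        for r in range(rows)
    ]


series = pixel_series(cube, 0, 0)
mean_image = temporal_mean(cube)
print(series)
print(mean_image)
```

Slicing along time only makes sense when every layer is aligned to the same grid and calibrated the same way, which is why the slide lists Analysis Ready Data as a requirement.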
  • 33. Data Cubes › Non-trivial to create and work with › Example implementations: › Xarray › Open Data Cube › Xcube › Rasdaman › Apache Spark + GeoTrellis
  • 34. Conclusion
  • 35. Why are we here? Nov 5, 2019: “Canada must become a leader in using space data to improve our society” – CSA President Sylvain Laporte
  • 36. Conclusion › Bring your algorithm to the data, not the other way around › Let’s embrace change, together › Ensure we don’t forget marginalized and data-poor communities › Canada was a leader in GIS, now we are a follower of our peers: Europe, Australia and US › Let’s talk about opportunities to work together to move Canadian EO analytic capabilities forward in this new era.
  • 37. www.GEOAnalytics.ca
  • 38. Thank You! jsuwala@hatfieldgroup.com

Editor's Notes

  1. https://earthdata.nasa.gov/cmr-and-esdc-in-cloud https://earthdata.nasa.gov/eosdis/cloud-evolution
  2. Landsat-8: ~22,500 Landsat-8 OLI images per year, or more than 60 per day over Canada With > 430 Landsat-7 per-day, ~1200 Landsat 8/7 images over Canada/day https://medium.com/@mikewulder/landsat-data-record-for-canada-an-update-38b176f49a4f
  3. https://medium.com/pangeo/step-by-step-guide-to-building-a-big-data-portal-e262af1c2977 https://medium.com/planet-stories/cng-part-5-cloud-native-geospatial-architecture-defined-193d5ffdd681
  4. https://medium.com/pangeo/step-by-step-guide-to-building-a-big-data-portal-e262af1c2977
  5. Quote: https://clouds.eos.ubc.ca/~phil/courses/parallel_python/02_xarray_zarr.html#Some-challenges-with-netcdf Image: https://medium.com/planet-stories/cng-part-7-a-vision-for-the-cloud-native-geospatial-ecosystem-7a55ae782690
  6. https://medium.com/planet-stories/cloud-native-geospatial-part-2-the-cloud-optimized-geotiff-6b3f15c696ed https://www.eclipse.org/community/eclipse_newsletter/2018/december/geotrellis.php
  7. https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f
  8. https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f
  9. https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f
  10. http://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud NASA statement: https://earthdata.nasa.gov/cmr-and-esdc-in-cloud
  11. https://earthdata.nasa.gov/eosdis-data-in-the-cloud-user-requirements
  12. https://pangeo.io/data.html
  13. https://medium.com/descarteslabs-team/thunder-from-the-cloud-40-000-cores-running-in-concert-on-aws-bf1610679978
  14. Rclone mount: https://rclone.org/commands/rclone_mount/
  15. https://gdal.org/user/virtual_file_systems.html#network-based-file-systems
  16. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  17. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  18. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  19. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  20. https://towardsdatascience.com/why-you-should-care-about-docker-9622725a5cb8
  21. https://towardsdatascience.com/machine-learning-with-big-data-86bcb39f2f0b
  22. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view
  23. https://drive.google.com/file/d/1Xf6Ix6pnMVpUAFh-bVw0EmkHls_WPUS9/view