The OpenTopography facility was funded by the National Science Foundation (NSF) in 2009 to provide efficient online access to Earth science-oriented high-resolution lidar topography data, online processing tools, and derivative products.
OpenTopography - Scalable Services for Geosciences Data
www.opentopography.org
[Figure: canopy height map, in feet]
@opentopography
info@opentopography.org
DOI / OGC
CSW
DATA USAGE ANALYTICS
HPC & CLOUD INTEGRATION
CYBERINFRASTRUCTURE
Spatiotemporal variations in data access illustrate that certain regions of a dataset can be "cold", while others are "hot". OT collects analytics which include user data selections through time.
We have developed tools that allow us to mine and visualize this information, and are exploring how to utilize these analytics to develop storage optimizations based on data value and cost.
For the hottest data, fast I/O and scalable access are required. In these cases, data stored on SSD and accessible through HPC systems such as Gordon are desirable. For "cooler" data
that sees less frequent access, cheaper (and slower) storage systems such as the cloud can be used to lower data facility operating costs. A tiered storage system offers the potential
to dynamically manage data storage and associated system performance based on real analytical information about usage.
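The activity-based tiering described above can be sketched as a simple policy that maps recent access counts to storage tiers. This is a minimal illustration: the thresholds, tier names, and 30-day window are assumptions, not OT's actual policy.

```python
# Hypothetical sketch: rank datasets by recent access activity and map them
# to storage tiers (SSD -> HPC disk -> cloud object storage). Thresholds
# and tier names are illustrative assumptions.

TIERS = [
    (1000, "ssd"),      # "hot": frequent access, keep on flash
    (100, "hpc_disk"),  # "warm": occasional access
    (0, "cloud"),       # "cold": rare access, cheapest storage
]

def assign_tier(accesses_last_30_days: int) -> str:
    """Pick the first tier whose activity threshold the dataset meets."""
    for threshold, tier in TIERS:
        if accesses_last_30_days >= threshold:
            return tier
    return "cloud"

def rank_datasets(access_counts: dict) -> list:
    """Return (dataset, tier) pairs ordered hottest-first."""
    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, assign_tier(n)) for name, n in ranked]
```

In practice the access counts would come from the mined usage analytics, and tier moves would be batched to amortize transfer costs.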
In the case of topographic data, events such as earthquakes, floods, landslides, and other geophysical events are likely to cause an increase in demand for data that intersect the spatial
extent of the event. External feeds (e.g., USGS NEIC) could be monitored to proactively move data into high performance storage in anticipation of increased demand.
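The event-driven prefetching idea can be sketched as a bounding-box intersection test between a dataset catalog and a buffered event epicenter. The buffer radius and the catalog contents below are illustrative assumptions; a real implementation would poll the USGS NEIC feed for event coordinates.

```python
# Sketch of event-driven prefetching: when an external feed (e.g. USGS NEIC)
# reports an earthquake, promote any dataset whose bounding box intersects a
# buffer around the epicenter. Buffer radius and catalog are illustrative.

def bbox_intersects(bbox, lon, lat, buffer_deg=1.0):
    """True if (lon, lat) padded by buffer_deg overlaps bbox = (w, s, e, n)."""
    w, s, e, n = bbox
    return (lon + buffer_deg >= w and lon - buffer_deg <= e and
            lat + buffer_deg >= s and lat - buffer_deg <= n)

def datasets_to_prefetch(catalog, event_lon, event_lat):
    """Select datasets likely to see increased demand after the event."""
    return [name for name, bbox in catalog.items()
            if bbox_intersects(bbox, event_lon, event_lat)]
```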
Activity-based data ranking and tiered cloud & HPC integrated storage
1. On-demand job execution on Gordon (XSEDE HPC Resource)
OT has a dedicated Gordon I/O node XSEDE allocation with 48 GB memory / 4.8 TB flash memory, plus 16 compute nodes (256 cores) with 64 GB memory and a QDR InfiniBand interconnect.
Performance tests using a DEM generation use case showed 20x job speed-ups when four concurrent jobs are executed on Gordon vs. OT's standard compute cluster. Test case: 208 million lidar returns gridded to a 20 cm grid.
USE CASE: TauDEM hydrologic analysis of DEMs
TauDEM is an open-source hydrologic analysis toolkit developed by David Tarboton (USU): http://www.engineering.usu.edu/dtarb/
As part of OT's CyberGIS collaboration, we implemented TauDEM (MPI) on Gordon. We dynamically scale the number of cores allocated to the job as a function of the size of the input DEM.
2. Integration of cloud-based on-demand geospatial processing services
OT received a Microsoft Azure for Research Award (allocated $40k in Azure resources) to explore integration of cloud resources into our existing infrastructure.
A prototype OT image on the Azure VM depot allows us (or others) to quickly deploy the OT software stack on an appropriately sized resource.
Data can be pulled from OT's storage on the SDSC Cloud for processing in Azure.
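Scaling MPI core count with DEM size, as done for the TauDEM runs on Gordon, can be sketched as below. The cells-per-core target and the 256-core ceiling (the dedicated 16-node allocation) are illustrative assumptions, not OT's actual scheduling parameters.

```python
# Illustrative sketch: choose an MPI core count as a function of input DEM
# size, clamped to the size of the dedicated allocation. CELLS_PER_CORE is
# an assumed per-core workload target, not a measured OT value.

CELLS_PER_CORE = 4_000_000   # assumed target cells per core
MAX_CORES = 256              # 16 nodes x 16 cores on the Gordon allocation

def cores_for_dem(rows: int, cols: int) -> int:
    """Scale cores with DEM size (ceiling division), clamped to MAX_CORES."""
    cells = rows * cols
    cores = max(1, (cells + CELLS_PER_CORE - 1) // CELLS_PER_CORE)
    return min(cores, MAX_CORES)
```

The resulting count would then feed the `-np` argument of an `mpirun` invocation of a TauDEM tool such as `pitremove`.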
The OpenTopography cyberinfrastructure employs a multi-tier service-oriented architecture (SOA) that is highly scalable, permitting upgrades to the infrastructure tier and
corresponding algorithms without the need to update the APIs and clients. The SOA has enabled the integration of compute-intensive algorithms, such as the TauDEM hydrology
suite running on the Gordon XSEDE resource, as services made available to the OpenTopography user community. The pluggable services architecture allows researchers
to integrate their own algorithms into the OpenTopography processing workflow. OpenTopography also interoperates with other CI systems such as the NSF-funded CyberGIS
viewshed analysis application and NASA SSARA.
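One way to picture the pluggable services architecture is a stable interface that researcher-contributed algorithms implement, so the infrastructure tier behind it (local cluster, Gordon, cloud) can change without touching clients. This is a hedged sketch: the class and method names below are hypothetical, not OT's actual API.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a "pluggable" processing-service contract in a
# multi-tier SOA. Names (ProcessingService, validate, run) are illustrative.

class ProcessingService(ABC):
    """Contract a researcher-contributed algorithm must satisfy."""

    @abstractmethod
    def validate(self, params: dict) -> bool:
        """Check user-supplied parameters before submitting a job."""

    @abstractmethod
    def run(self, input_uri: str, params: dict) -> str:
        """Process the data at input_uri; return a URI for the result."""

class HillshadeService(ProcessingService):
    """Toy plug-in standing in for a real algorithm like TauDEM."""

    def validate(self, params):
        return 0 <= params.get("azimuth", 315) < 360

    def run(self, input_uri, params):
        # A real implementation would submit a job to the chosen backend
        # (standard cluster, Gordon, or a cloud VM) and return its output.
        return input_uri.replace(".tif", "_hillshade.tif")
```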
OpenTopography implements a Catalogue Service for the Web (CSW)
using the ISO 19115 metadata standard, which can be federated with
other environments, e.g., NSF EarthCube, Thomson Reuters Web of
Science, etc. All datasets served via OpenTopography are assigned a
DOI, which provides a persistent identifier for the dataset and enables formal data citation.
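A federated catalog client would typically query such a service with a CSW 2.0.2 GetRecords request. The sketch below builds a key-value-pair request URL following the OGC CSW 2.0.2 specification; the endpoint URL is a placeholder, not OT's actual CSW address.

```python
from urllib.parse import urlencode

# Sketch of a CSW 2.0.2 GetRecords KVP request, asking for full ISO 19139
# (ISO 19115 encoding) metadata records. Endpoint is a placeholder.

def getrecords_url(endpoint: str, start: int = 1, max_records: int = 10) -> str:
    """Build a GetRecords URL per the OGC CSW 2.0.2 KVP binding."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "outputSchema": "http://www.isotc211.org/2005/gmd",  # ISO metadata
        "elementSetName": "full",
        "startPosition": start,
        "maxRecords": max_records,
    }
    return endpoint + "?" + urlencode(params)
```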
The cover image of Science featured a 0.25 m digital elevation model
(DEM) and hillshade of offset channels along the San Andreas
Fault in the Carrizo Plain, produced by OpenTopography.
Currently, OpenTopography serves 183 high-resolution lidar (Light Detection and Ranging) point cloud datasets with over 820 billion returns covering approximately 179,153 sq. km of important geologic
features, such as the San Andreas Fault, Yellowstone, the Tetons, and Yosemite National Park, to a growing user community. Information collected from over 42,250 custom point cloud jobs, which have processed
upwards of 1.4 trillion lidar returns, and over 19,800 custom raster data jobs is being analyzed to prioritize future development based on usage insights and to identify novel approaches to managing
the exponential growth in data.
Collaboration Opportunities
- Analysis of user behavior and data usage for optimizing data location in deep storage/memory hierarchies
- Pluggable services framework: tracking software provenance, framework security
- New data types: full-waveform lidar, hyperspectral imagery
- New processing algorithms: change detection, difference analysis, and time series analysis; algorithm optimization/parallelization
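The change-detection and difference-analysis item can be illustrated with a minimal DEM-of-difference sketch: subtract a pre-event DEM from a post-event DEM and zero out change below a noise threshold. The threshold value is an illustrative assumption, not a recommended default.

```python
import numpy as np

# Minimal sketch of DEM differencing (a basic form of topographic change
# detection). The 0.15 m noise threshold is an illustrative assumption.

def dem_of_difference(pre: np.ndarray, post: np.ndarray,
                      threshold: float = 0.15) -> np.ndarray:
    """Return elevation change (post - pre), with sub-threshold change zeroed."""
    dod = post - pre
    dod[np.abs(dod) < threshold] = 0.0
    return dod
```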