Sky Arrays - ArrayDB in action for Sky View Factor Computation
1. ArrayDB in action for Sky View Factor computation
Andrea Pagani – KNMI DataLab
Luca Trani – KNMI R&D Seismology and Acoustics
Array Databases for Research Communities Workshop – EUDAT Conference
Porto, 22nd January 2018
2. Agenda
Use case introduction
Why ArrayDB
Old vs. New
Results analysis
Comparison
Lessons learned
Suggestions
Open issues/questions
Problems encountered
Conclusions
3. Use case description
Computation of the sky view factor at high resolution (1m grid) for the
entire Netherlands
Definition:
“The sky view factor (SVF) denotes the ratio between radiation received by a planar surface and that from
the entire hemispheric radiating environment and is calculated as the fraction of sky visible from the ground
up. SVF is a dimensionless value that ranges from 0 to 1. A SVF of 1 means that the sky is completely
visible, for example, in a flat terrain. When a location has buildings and trees, it will cause the SVF to
decrease proportionally.”
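One common isotropic-sky approximation in the literature computes SVF from the horizon elevation angles γᵢ sampled in n azimuth directions as SVF ≈ 1 − (1/n) Σ sin γᵢ: a flat horizon (γᵢ = 0 everywhere) gives SVF = 1, as the definition above requires. A minimal Python sketch of that approximation (an illustration only; the project's actual computation used the R horizon package):

```python
import math

def sky_view_factor(horizon_angles_deg):
    """Approximate SVF from horizon elevation angles (degrees),
    one per azimuth direction, assuming an isotropic sky:
    SVF ~= 1 - mean(sin(gamma_i))."""
    sines = [math.sin(math.radians(g)) for g in horizon_angles_deg]
    return 1.0 - sum(sines) / len(sines)

# Flat terrain: horizon at 0 degrees in all 36 sampled directions -> SVF = 1
print(sky_view_factor([0.0] * 36))

# Obstacles raising the horizon to 30 degrees all around -> SVF ~ 0.5
print(sky_view_factor([30.0] * 36))
```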
4. Sky view factor usage
Heat stress
Road temperature
Fog formation
5. Why ArrayDB
Initial solution:
● 1.5 TB of point cloud data (object heights) for the Netherlands in raw LAZ format
● 40,000+ LAZ files (i.e., tiles) with geo metadata encoded in the filename
● Computation to be performed in R (a convenient library for the purpose: horizon)
Issues:
● Gridding is very memory intensive
● Keeping track of tile geolocations by filename
● Ad hoc logic to merge/subset tiles
● High memory requirements to process multiple tiles
Computation:
● Tested on a highly parallel machine (24 CPUs / 128 GB RAM)
● Distributed on Amazon AWS EC2 (80 CPUs)
6. Computation, the old way
Data flow: LAZ files → master/slaves → grid files
Master tasks:
• Divide work
• Coordinate slaves
Slave tasks:
• LAS load
• Rasterization
• Merge tile neighbours
• Call SVF computation
• Write result grid
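The master's "divide work" step amounts to partitioning the 40,000+ tile list across the slaves. A minimal Python sketch of such a partitioning (hypothetical names; the project's own master/slave code was written in R):

```python
def partition(tiles, n_workers):
    """Split a list of tile names into n_workers near-equal chunks,
    as a master would before dispatching work to its slaves."""
    chunks = [[] for _ in range(n_workers)]
    for i, tile in enumerate(tiles):
        chunks[i % n_workers].append(tile)  # round-robin assignment
    return chunks

tiles = [f"tile_{i:05d}.laz" for i in range(10)]
work = partition(tiles, 3)
print([len(c) for c in work])  # [4, 3, 3]
```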
7. New workflow
Consult with an expert on populating the data cube
Pre-processing phase: off-line rasterization of LAZ files to geoTiff
Initial ingestion phase
Computation nodes querying the system
Web coverage service to retrieve the data
Query with subsetting for a 2 km × 2 km region
Compute Sky View Factor in R
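Each subsetting query is a WCS 2.0.1 GetCoverage request. A Python sketch of building such a request URL (the endpoint, coverage id, and subset syntax are taken from the error logs later in this deck; the project issued these requests from R):

```python
from urllib.parse import urlencode

def getcoverage_url(base, coverage_id, x0, x1, y0, y1):
    """Build a WCS 2.0.1 GetCoverage URL with X/Y subsetting,
    mirroring the request format used against the Rasdaman endpoint."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("subset", f"X({x0},{x1})"),
        ("subset", f"Y({y0},{y1})"),
        ("format", "image/tiff"),
    ]
    # keep (), and / literal, as in the logged requests
    return base + "?" + urlencode(params, safe="(),/")

url = getcoverage_url("http://10.100.253.10:8080/rasdaman/ows",
                      "HeightCoverage", 58900, 61100, 564150, 566350)
print(url)
```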
8. New setup
Resources provided by the Dutch high-performance computing centre SURFsara via an EUDAT project grant
1 machine (4 cores, 16 GB) with the Rasdaman arrayDB installed
19 machines (2 cores each) for computation
1 machine (2 cores) acting as super peer for computation and cluster coordination
New dataset: object heights as a GeoTIFF raster at 0.5 m resolution, available from a government institution
9. Computation, the new way
Data flow: Rasdaman server → master/slaves → geoTiff files on net-mounted storage
Master tasks:
• Divide work
• Coordinate slaves
Slave tasks:
• Call WCS service
• Call SVF computation
• Write result grid
Rasdaman tasks:
• Expose web service
• Interact with underlying DB
• Reply to queries
Future plan: write the SVF results back into the arrayDB
11. Comparison
The new solution requires an initial investment in understanding and installing the arrayDB technology,
but it makes the interaction more standardized, easier, and less error prone.
Cost-benefit analysis: depends on usage, the need to share, and the target user group
Old → New
• File-based interaction → Web-based query
• Assembling and subsetting via ad hoc logic → Initial ingestion and query-based subsetting
• Understanding georeferencing from filenames → Ingestion recipe
• Rasterization from raw data on the fly → Pre-processed raster
• Data access via distributed file system → Data transferred via HTTP response to a query
• Custom coding in R → Installation of arrayDB
12. Lessons learned
+
ArrayDB helps make your life easier
Standardized access to data
Less error prone for subsetting/assembling
Flexible access to relevant data partition
-
Input data have to be perfect, which is unfortunately not always the case in real life
Care is still needed with DB queries when many processes ask concurrently
(a distributed installation might solve this)
Caveat: interaction with an arrayDB engineer is essential to share the knowledge needed for a working solution
13. Suggestions
- Data preparation tools (e.g., to handle imperfect datasets)
- Documentation and support for users/engineers
- Collaborative tools to facilitate user/arrayDB-engineer interaction
- Promote standard interfaces/APIs, e.g., to avoid technology lock-in and to foster decoupling and software portability
14. Open issues/questions
● Reliability and scalability of the community version
● RRasdaman installation is cumbersome (from the typical R user's perspective)
● Multi-layer queries
● Support for point data/gridding
● Query logic for heterogeneous data
15. Problems encountered
● RASDAMAN configuration
rasmgr.conf: change IP → localhost
User configuration: petascope, rasadmin, rasuser
● wcst_import tool
○ The default map_mosaic recipe has problems importing files with different tile dimensions and slightly imperfect overlapping
○ Alternative custom recipe → complex
● java.lang.RuntimeException: Deadline Exceeded
Catched an exception:
org.odmg.TransactionNotInProgressException: Could not execute OQL-Query: no open transaction
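For context, a wcst_import ingestion is driven by a JSON "ingredients" file; a minimal one for the default mosaic recipe might look like the sketch below (all paths and option values are illustrative assumptions, not the project's actual configuration, and the custom recipe mentioned above replaces the `recipe` section):

```json
{
  "config": {
    "service_url": "http://localhost:8080/rasdaman/ows",
    "tmp_directory": "/tmp/",
    "automated": true
  },
  "input": {
    "coverage_id": "HeightCoverage",
    "paths": ["/data/tiles/*.tif"]
  },
  "recipe": {
    "name": "map_mosaic",
    "options": {
      "tiling": "ALIGNED [0:511, 0:511]"
    }
  }
}
```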
16. Problems encountered
2018-01-05 14:23:12 ERROR::packer-ubuntu-16-1263--error writing file /home/ubuntu/temp/58900_61100--564150_566350-temp.tiff error
Cannot create a RasterLayer object from this file. (file does not exist)
coverage:
http://10.100.253.10:8080/rasdaman/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=HeightCoverage&subset=X(58900,61100)&subset=Y(564150,566350)&format=image/tiff
The raster is somehow corrupted after the request
17. Problems encountered
2018-01-15 22:30:13 ERROR::packer-ubuntu-16-1194--response status server 500 coverage http://10.100.253.10:8080/rasdaman/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=HeightCoverage&subset=X(34900,37100)&subset=Y(392150,394350)&format=image/tiff not available
2018-01-15 22:30:13 ERROR::packer-ubuntu-16-1207--response status server 500 coverage http://10.100.253.10:8080/rasdaman/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=HeightCoverage&subset=X(46900,49100)&subset=Y(526150,528350)&format=image/tiff not available
Server temporarily unavailable; repeating the same coverage request later succeeds
Workaround: repeat the request after a timeout, implemented in the R code
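The timeout-and-repeat workaround can be sketched as a small retry wrapper around the coverage request. The project implemented this in its R code; below is an illustrative Python equivalent (function names are hypothetical):

```python
import time

def with_retry(request_fn, max_attempts=5, wait_seconds=30):
    """Retry a flaky coverage request, sleeping between attempts,
    mirroring the timeout-and-repeat workaround described above."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception as err:          # e.g. HTTP 500 from the WCS endpoint
            last_error = err
            if attempt < max_attempts:
                time.sleep(wait_seconds)  # back off before retrying
    raise RuntimeError(
        f"coverage still unavailable after {max_attempts} attempts"
    ) from last_error

# Usage sketch: wrap the actual WCS call, e.g.
# grid = with_retry(lambda: fetch_coverage(url), max_attempts=5, wait_seconds=60)
```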
18. Conclusions
Overall a positive experience
Other users can benefit from the ingested dataset (and the ingested
results, future work)
DataCube (a standards-based platform) rather than ArrayDB (a technology)
A hosted, ready-to-use data-cube-as-a-service would be of great help for scientific communities (with many datasets pre-ingested)
Relieve the burden of installation, setup, configuration, etc.
Shared added-value services to save investment and effort across communities
Accounting model to be discussed
- How are point data supported? Can some gridding methods be supported natively?
- Given several layers with different spatial and/or temporal resolutions, what does querying them for a specific point and time return? Is there a default logic? Can a specific logic be embedded?