SC5 Hangout2 pilot 1 description

BASIC AIM

To demonstrate what can be achieved through the BDE platform in:
▪ Managing large volumes of numerical climate/weather data
▪ Ingesting and exporting data
▪ Analytics
▪ Data lineage

Downscaling
▪ Downscaling of climatic and/or meteorological data:
  o Essential first step for any further analysis, assessment or processing in climate and related domains

BDE SC5 Pilot I - Architecture
[Architecture diagram: SC5 Pilot components
▪ Cassandra: metadata & data lineage
▪ Hive/Hadoop: raw data & analytics
▪ WRF model: institutional resource connectors
▪ NetCDF: interfaces and visualisation]
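
The diagram suggests a division of labour between the two stores: Hive/Hadoop holds the raw numerical data, while Cassandra holds metadata and lineage. The following is a minimal in-memory sketch of how ingestion and export (selection of variables/time slices) might be split across them. `RawStore`, `MetadataStore`, `ingest` and `export` are hypothetical names; plain dictionaries stand in for Hive and Cassandra, and the NetCDF payload is assumed to be already decoded into Python lists, so no NetCDF, Hive or Cassandra library is involved.

```python
from dataclasses import dataclass, field


@dataclass
class RawStore:
    """Stand-in for Hive/Hadoop: (dataset_id, variable, time_index) -> grid values."""
    cells: dict = field(default_factory=dict)


@dataclass
class MetadataStore:
    """Stand-in for Cassandra: dataset_id -> metadata record."""
    datasets: dict = field(default_factory=dict)


def ingest(raw, meta, dataset_id, variables, origin):
    """Store raw grids and register metadata for an already-decoded NetCDF payload.

    `variables` maps a variable name to a list of grids, one per time step
    (an assumed layout, not a NetCDF API).
    """
    n_steps = {len(series) for series in variables.values()}
    if len(n_steps) != 1:
        raise ValueError("variables must be non-empty and share one time axis")
    for name, series in variables.items():
        for t, grid in enumerate(series):
            raw.cells[(dataset_id, name, t)] = list(grid)
    meta.datasets[dataset_id] = {
        "variables": sorted(variables),
        "time_steps": n_steps.pop(),
        "origin": origin,  # e.g. "manual" (bootstrapping) or "downscaling"
    }


def export(raw, meta, dataset_id, variables, t_start, t_stop):
    """Return the selected variables over the half-open time slice [t_start, t_stop)."""
    info = meta.datasets[dataset_id]
    unknown = set(variables) - set(info["variables"])
    if unknown:
        raise KeyError(f"unknown variables: {sorted(unknown)}")
    stop = min(t_stop, info["time_steps"])
    return {
        name: [raw.cells[(dataset_id, name, t)] for t in range(t_start, stop)]
        for name in variables
    }


if __name__ == "__main__":
    raw, meta = RawStore(), MetadataStore()
    # Made-up values; T2 and Q2 follow WRF variable naming.
    ingest(raw, meta, "run-001",
           {"T2": [[290.1, 291.0], [289.5, 290.2], [288.9, 289.7]],
            "Q2": [[0.008, 0.009], [0.008, 0.008], [0.007, 0.008]]},
           origin="manual")
    print(export(raw, meta, "run-001", ["T2"], 1, 3))
    # {'T2': [[289.5, 290.2], [288.9, 289.7]]}
```

Keeping the variable list and time-axis length in the metadata store lets an export request be validated before any raw data is read.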

Current status
▪ Operations
  o Data ingestion (NetCDF files)
    ▪ Both manually, for bootstrapping, and after downscaling
  o Data export (NetCDF files)
    ▪ Selection of variables/time slices
  o Launching and monitoring WRF-based downscaling on institutional resources
    ▪ If the requested results already exist, they are retrieved
    ▪ If not, a WRF run is started
  o Maintaining data lineage records on the BDE platform
    ▪ For monitoring and further analysis
    ▪ Based on a subset of W3C PROV (http://www.w3.org/TR/prov-overview)
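
The retrieve-or-run behaviour described above amounts to caching keyed on the request parameters. A minimal sketch, assuming a downscaling request can be described by a JSON-serialisable dictionary of parameters; `DownscalingService`, `request_key` and `fake_wrf` are hypothetical, the WRF submission is a stub, and monitoring of a running job is not modelled.

```python
import hashlib
import json


def request_key(params):
    """Deterministic key for a downscaling request: SHA-256 of key-sorted JSON."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DownscalingService:
    """Retrieve stored results if they exist; otherwise start a (stubbed) WRF run."""

    def __init__(self, run_wrf):
        self._run_wrf = run_wrf  # hypothetical callable: params -> result reference
        self._results = {}       # request key -> result; stands in for the BDE store
        self.runs_started = 0

    def get(self, params):
        key = request_key(params)
        if key in self._results:          # requested results already exist
            return self._results[key], "retrieved"
        self.runs_started += 1            # not found: start WRF
        result = self._run_wrf(params)
        self._results[key] = result
        return result, "computed"


def fake_wrf(params):
    """Placeholder for submitting WRF to institutional resources and waiting."""
    return f"result_{params['domain']}_{params['start']}_{params['end']}"  # hypothetical name


if __name__ == "__main__":
    svc = DownscalingService(fake_wrf)
    req = {"domain": "d02", "start": "2010-01-01", "end": "2010-01-31"}
    print(svc.get(req))                          # ('result_d02_2010-01-01_2010-01-31', 'computed')
    print(svc.get(dict(reversed(req.items()))))  # same request, keys reordered: 'retrieved'
    print(svc.runs_started)                      # 1
```

Hashing a canonical (key-sorted) JSON form means requests that differ only in parameter order map to the same stored result, so WRF is not started twice for the same product.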

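A lineage record for a downscaling run can be expressed with a handful of W3C PROV concepts: the run is an activity, input and output data are entities, and WRF is an agent, linked by `used`, `wasGeneratedBy`, `wasAssociatedWith` and `wasDerivedFrom`. The sketch below is an illustrative in-memory log, not a PROV serialiser and not the pilot's actual schema; `LineageLog`, its methods, and the run and data names in the usage example are hypothetical. It also shows why lineage stays useful for data that are no longer present: the generating activity and its parameters can still be looked up after the output is deleted.

```python
from dataclasses import dataclass, field


@dataclass
class LineageLog:
    """In-memory lineage log using a subset of PROV terms (entity, activity, agent)."""
    entities: dict = field(default_factory=dict)
    activities: dict = field(default_factory=dict)
    agents: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (relation, subject, object) triples

    def record_run(self, run_id, inputs, outputs, params, agent):
        """Record one downscaling run as a PROV-style activity."""
        self.activities[run_id] = {"type": "downscaling", "params": dict(params)}
        self.agents.setdefault(agent, {"type": "prov:SoftwareAgent"})
        self.relations.append(("wasAssociatedWith", run_id, agent))
        for src in inputs:
            self.entities.setdefault(src, {"present": True})
            self.relations.append(("used", run_id, src))
        for out in outputs:
            self.entities[out] = {"present": True}
            self.relations.append(("wasGeneratedBy", out, run_id))
            for src in inputs:
                self.relations.append(("wasDerivedFrom", out, src))

    def mark_deleted(self, entity):
        """The data are gone, but the lineage record is kept."""
        self.entities[entity]["present"] = False

    def how_was_it_made(self, entity):
        """Return (activity, params, inputs) for an entity, or None if not generated here."""
        for rel, subj, obj in self.relations:
            if rel == "wasGeneratedBy" and subj == entity:
                inputs = [o for r, s, o in self.relations if r == "used" and s == obj]
                return obj, self.activities[obj]["params"], inputs
        return None


if __name__ == "__main__":
    log = LineageLog()
    log.record_run("wrf-run-42", inputs=["driving_data_2010-01"],
                   outputs=["downscaled_d02_2010-01"],
                   params={"domain": "d02", "period": "2010-01"}, agent="WRF")
    log.mark_deleted("downscaled_d02_2010-01")
    print(log.how_was_it_made("downscaled_d02_2010-01"))
    # ('wrf-run-42', {'domain': 'd02', 'period': '2010-01'}, ['driving_data_2010-01'])
```
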
Current status
o Support for basic analytics on the BDE platform
  ▪ Hive queries
o Console-based UI
  ▪ Python/Jupyter interface for demonstration
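
Hive queries are SQL-like aggregations over tables. As a runnable local illustration, the sketch below uses Python's built-in `sqlite3` as a stand-in for Hive and computes the monthly minimum of daily maximum temperature, one of the sample analytics proposed for the pilot. The table name `daily`, its columns and the function `monthly_txn` are assumptions; the query uses only `substr`, `MIN` and `GROUP BY`, which Hive also provides, but it has not been checked against a Hive installation.

```python
import sqlite3

# Monthly minimum of daily maximum temperature, written as a plain SQL
# aggregation; sqlite3 stands in for Hive so the example runs locally.
QUERY = """
SELECT substr(day, 1, 7) AS month, MIN(tmax) AS txn
FROM daily
GROUP BY substr(day, 1, 7)
ORDER BY month
"""


def monthly_txn(rows):
    """rows: iterable of (ISO date string 'YYYY-MM-DD', daily maximum temperature)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE daily (day TEXT, tmax REAL)")
        conn.executemany("INSERT INTO daily VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("2010-01-01", 3.2), ("2010-01-02", -0.5),
              ("2010-02-01", 5.0), ("2010-02-02", 4.1)]  # made-up values
    print(monthly_txn(sample))  # [('2010-01', -0.5), ('2010-02', 4.1)]
```
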

Sample analytics
▪ Climate-change indices/analytics (indicative)
  o Number of summer days and frost days
  o Tropical nights
  o Monthly minimum value of daily maximum temperature
  o Precipitation-based statistics
  o Etc.
▪ Analytics for other applications
  o Comfort indices (temperature – humidity)
  o Risk of forest fires (wind speed – temperature – humidity)
  o Atmospheric pollution (wind speed – vertical temperature gradient – heat fluxes)
  o Etc.
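
The count-based indices in the list reduce to threshold counts over daily series. A minimal sketch, assuming daily maximum and minimum temperatures have already been extracted as Python lists; the thresholds are the widely used ETCCDI definitions (summer day TX > 25 °C, frost day TN < 0 °C, tropical night TN > 20 °C), which the slides themselves do not specify. Because WRF writes 2 m temperature (`T2`) in kelvin, a conversion helper is included. The function names are hypothetical.

```python
def kelvin_to_celsius(values):
    """WRF writes 2 m temperature (T2) in kelvin."""
    return [v - 273.15 for v in values]


def count_indices(tmax_c, tmin_c):
    """Threshold-count indices over daily TX/TN series in degrees Celsius.

    Thresholds follow the ETCCDI definitions (an assumption; the slides give none):
      SU  summer days      TX > 25
      FD  frost days       TN < 0
      TR  tropical nights  TN > 20
    """
    if len(tmax_c) != len(tmin_c):
        raise ValueError("tmax_c and tmin_c must cover the same days")
    return {
        "SU": sum(1 for tx in tmax_c if tx > 25.0),
        "FD": sum(1 for tn in tmin_c if tn < 0.0),
        "TR": sum(1 for tn in tmin_c if tn > 20.0),
    }


if __name__ == "__main__":
    tmax = [24.0, 26.5, 31.0, 18.0]   # made-up daily maxima (deg C)
    tmin = [-1.0, 15.0, 21.5, 20.0]   # made-up daily minima (deg C)
    print(count_indices(tmax, tmin))  # {'SU': 2, 'FD': 1, 'TR': 1}
```

ETCCDI defines these as annual counts; the function simply counts over whatever period it is given, so the caller chooses the aggregation window.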

Further pilot development
▪ Investigation of transparent transformation of climate NetCDF data tailored to the WRF model, using the BDE integrator (esp. Spark)
▪ Testing and further development of data lineage, and of downscaling parameterisation and execution

Expected added value
▪ Scalability and ease of managing large data sets
▪ Efficient use of institutional resources for downscaling computations
  o Avoiding computation of products when they are not needed
▪ Data lineage
  o Both for data still in the database and for data no longer present
  o Reproducibility

Hands-on
▪ The Jupyter notebook is accessible at:
  o https://143.233.226.108
    ▪ (please bypass the browser warnings)
