SC5 1st
Pilot Hangout
Basic aim
To demonstrate what can be achieved through the BDE platform in:
 Managing large volumes of climate / weather numerical data
 Ingestion / exporting of data
 Analytics potential
 Data lineage
Downscaling
 Downscaling of climatic and / or meteorological
data:
o Essential first step for any further analysis,
assessment or processing in climate and related
domains
BDE SC5 Pilot I - Architecture
[Architecture diagram. Components of the SC5 Pilot:]
 Cassandra: metadata & data lineage
 Hive/Hadoop: raw data & analytics
 WRF Model: institutional resource connectors
 NetCDF: interfaces and visualisation
Current status
 Operations
o Data ingestion (NetCDF files)
 Both manually, for bootstrapping, as well as after downscaling
o Data export (NetCDF files)
 Selection of variables / time slices
o Start and monitor WRF-based downscaling on institutional
resources
 If requested results already exist, they are retrieved
 If not, WRF is started
o Maintain data lineage records on BDE platform
 Monitoring and further analysis
 Subset of W3C PROV, http://www.w3.org/TR/prov-overview
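The retrieve-or-run behaviour above can be sketched in a few lines of Python. This is only an illustration, not the pilot's actual code: `get_downscaled`, `request_key`, the in-memory `store` and `lineage` containers, and the record fields (loosely named after W3C PROV terms) are all hypothetical, and `run_model` stands in for whatever starts WRF on the institutional resources.

```python
import hashlib
import json
from datetime import datetime, timezone


def request_key(params):
    """Derive a stable key from the downscaling parameters (hypothetical scheme)."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_downscaled(params, store, lineage, run_model):
    """Return (result, ran_model): reuse a stored result or run the model once.

    store     -- dict-like stand-in for the result storage
    lineage   -- list standing in for the lineage record store
    run_model -- callable standing in for the WRF launcher
    """
    key = request_key(params)
    if key in store:
        # Requested results already exist: retrieve them, do not recompute.
        return store[key], False

    # Otherwise run the model, keep the result, and record where it came from.
    result = run_model(params)
    store[key] = result
    lineage.append({
        "entity": key,
        "wasGeneratedBy": "downscaling",
        "used": dict(params),
        "endedAtTime": datetime.now(timezone.utc).isoformat(),
    })
    return result, True
```

Keying on a canonical form of the parameters means that two requests differing only in parameter order map to the same stored result, so the model is started once.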
Current status
o Support basic analytics on BDE
 Hive queries
o Console-based UI
 Python/Jupyter interface for demonstration
Sample analytics
 Climate-change indices / analytics (indicative)
o Number of summer days, frost days
o Tropical nights
o Monthly minimum value of daily maximum temperature
o Precipitation-based statistics
o Etc.
 Analytics for other applications
o Comfort indices (temperature – humidity)
o Risk for forest fires (wind speed – temperature – humidity)
o Atmospheric pollution (wind speed – vertical gradient of
temperature – heat fluxes)
o Etc.
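As an illustration of the climate-change indices listed above, the following minimal Python sketch counts summer days, frost days and tropical nights and computes the monthly minimum of daily maximum temperature. The thresholds follow the commonly used ETCCDI definitions (daily maximum above 25 °C, daily minimum below 0 °C, daily minimum above 20 °C); the function names, the record layout and the sample values are made up for the example, and the pilot itself runs its analytics as Hive queries rather than this code.

```python
from datetime import date

# Thresholds in degrees Celsius, following the usual ETCCDI definitions.
SUMMER_DAY_TMAX_C = 25.0
FROST_DAY_TMIN_C = 0.0
TROPICAL_NIGHT_TMIN_C = 20.0


def count_indices(records):
    """Count threshold-based indices over (day, tmin_c, tmax_c) records."""
    counts = {"summer_days": 0, "frost_days": 0, "tropical_nights": 0}
    for _day, tmin, tmax in records:
        if tmax > SUMMER_DAY_TMAX_C:
            counts["summer_days"] += 1
        if tmin < FROST_DAY_TMIN_C:
            counts["frost_days"] += 1
        if tmin > TROPICAL_NIGHT_TMIN_C:
            counts["tropical_nights"] += 1
    return counts


def monthly_min_of_daily_max(records):
    """Map (year, month) to the lowest daily maximum temperature in that month."""
    result = {}
    for day, _tmin, tmax in records:
        key = (day.year, day.month)
        if key not in result or tmax < result[key]:
            result[key] = tmax
    return result


# Made-up daily values for one grid point: (day, tmin, tmax) in degrees Celsius.
sample = [
    (date(2015, 1, 10), -3.0, 4.0),
    (date(2015, 1, 11), 1.0, 6.5),
    (date(2015, 7, 20), 21.5, 33.0),
    (date(2015, 7, 21), 18.0, 27.0),
]

print(count_indices(sample))
print(monthly_min_of_daily_max(sample))
```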
Further pilot development
 Investigation regarding transparent climate
NetCDF transformation tailored to the WRF
model, using the BDE integrator (esp. Spark)
 Testing and further development regarding
data lineage and downscaling
parameterisation and execution
Expected added value
 Scalability and ease in managing large data
sets
 Efficient use of institutional resources in
performing downscaling computations
o Avoiding calculating products when not needed
 Data lineage
o either for existing data in the database, or for data
that are not present anymore
o reproducibility
Hands-on
 The Jupyter notebook is accessible at:
o https://143.233.226.108
 (please bypass the warnings)
