THE RATIONALE AND
METHODOLOGY OF THE
BDE 2ND SC5 PILOT
NCSR “Demokritos”, 20-Dec.-16
Framework
• Computational modelling of atmospheric dispersion of hazardous pollutants
• How can the BigDataEurope Integrator tools help perform computational tasks related to the atmospheric dispersion of hazardous pollutants more efficiently?
11-oct.-16www.big-data-europe.eu
Purposes and means
• Air pollution abatement / early warning / countermeasures
  o Anthropogenic emissions: routine, accidental (nuclear, chemical), malevolent (terrorist) – unannounced releases
  o Natural emissions (e.g., volcanic eruptions)
• Measurements (from earth or space)
• Mathematical modelling
• Combination of the above → “forward” or “inverse” modelling through “data assimilation”
Input data for dispersion modelling
• Meteorology
• “Source term”: knowledge of the source(s) of the emitted pollutant(s): location, quantity and conditions of release, timing
• Terrain characteristics, geometry of buildings, etc.
• Depending on the available input and measurement data: “forward” or “inverse” modelling
Cases of “inverse” computations
• The pollutant emission sources are NOT known: location and/or quantity of emitted substances
  o Technological accidents (e.g., chemical, nuclear), natural disasters (e.g., volcanoes): known location, unknown emission
  o Unannounced technological accidents (e.g., Chernobyl), malevolent intentional releases (terrorism), nuclear tests
• Inverse “source-term” estimation techniques
Inverse source-term estimation
• Available information:
  o Measurements indicating the presence of an air pollutant
  o Meteorological data for the present and the recent past
• Mathematical techniques that blend the above with the results of dispersion models to infer the position and strength of the emitting source
  o Special attention: the problem may admit multiple solutions
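The idea of blending measurements with dispersion-model results can be illustrated with a minimal sketch. It assumes (this is not stated in the slides) that station concentrations scale linearly with the release strength, so each candidate source is described by its modelled station response to a unit release. The function name `rank_candidate_sources`, the `tolerance` parameter and the site names are hypothetical.

```python
import numpy as np


def rank_candidate_sources(unit_responses, measurements, tolerance=0.1):
    """Rank hypothetical candidate sources against station measurements.

    unit_responses: dict mapping a candidate name to the modelled station
        concentrations for a unit release from that candidate (assumed to
        come from a dispersion model run).
    measurements: observed concentrations at the same stations.
    tolerance: candidates whose misfit is within this absolute margin of
        the best misfit are reported as alternative solutions.
    """
    y = np.asarray(measurements, dtype=float)
    results = []
    for name, response in unit_responses.items():
        g = np.asarray(response, dtype=float)
        gg = float(g @ g)
        if gg == 0.0:
            continue  # this candidate never reaches the stations
        # Least-squares release strength, clipped to be non-negative.
        strength = max(float(g @ y) / gg, 0.0)
        misfit = float(np.linalg.norm(y - strength * g))
        results.append((name, strength, misfit))
    results.sort(key=lambda item: item[2])
    if not results:
        return [], []
    best_misfit = results[0][2]
    near_best = [r for r in results if r[2] - best_misfit <= tolerance]
    return results, near_best


# Illustrative values only: two candidate sites, three stations.
responses = {"site_A": [1.0, 0.5, 0.0], "site_B": [0.0, 0.5, 1.0]}
results, near_best = rank_candidate_sources(responses, [2.0, 1.0, 0.0])
```

Returning every candidate whose misfit is close to the best one, rather than a single winner, is one simple way to keep the "multiple solutions" issue visible.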
Introducing the 2nd BDE SC5 Pilot
• The mathematical techniques mentioned previously require long computing times
• Purpose: fast estimation of the source location in emergencies
• Proposed solution: pre-calculate a large number of scenarios, store them, and at the time of an emergency select the “most appropriate” one
• BDE will provide the tools to perform this functionality efficiently
Structure of the 2nd BDE SC5 Pilot
• Geographic area: Europe
• Cases of interest: accidents at nuclear power plants (NPPs)
• Weather calculations:
  o Re-analysis data for 20 years
  o Clustering → “typical” weather circulation patterns
  o Downscaling through WRF for the “typical” weather circulation patterns
Structure of the 2nd BDE SC5 Pilot
• Dispersion calculations:
  o Calculation of dispersion patterns from NPPs for the above downscaled typical weather circulation patterns
  o Dispersion results: gridded and (optionally) at monitoring stations
Structure of the 2nd BDE SC5 Pilot
• In the event of radiation signals at some stations:
  o Match the current and recent weather to the closest typical circulation pattern
  o From the stored dispersion results pertaining to the matched weather circulation pattern, select the one that most closely matches the monitoring data
  o The matched dispersion pattern will reveal the most probable emission source
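The two matching steps above can be sketched as a lookup over pre-computed results. This is a minimal illustration, not the pilot's implementation: it assumes weather patterns are stored as flattened centroid vectors and that dispersion results are stored per (pattern, site) as values at the monitoring stations. Cosine similarity is used here as one possible matching measure because the release magnitude is unknown; the function names and site names are hypothetical.

```python
import numpy as np


def closest_pattern(centroids, recent_weather):
    """Index of the typical circulation pattern nearest to the recent weather."""
    c = np.asarray(centroids, dtype=float)
    w = np.asarray(recent_weather, dtype=float)
    return int(np.argmin(np.linalg.norm(c - w, axis=1)))


def rank_release_origins(stored_dispersions, pattern, station_readings):
    """Rank stored dispersion results for one pattern by cosine similarity.

    stored_dispersions: dict mapping (pattern_index, site_name) to the
        pre-computed concentrations at the monitoring stations.
    """
    y = np.asarray(station_readings, dtype=float)
    scores = []
    for (p, site), values in stored_dispersions.items():
        if p != pattern:
            continue  # only results for the matched weather pattern
        v = np.asarray(values, dtype=float)
        norm = np.linalg.norm(v) * np.linalg.norm(y)
        similarity = float(v @ y / norm) if norm > 0 else 0.0
        scores.append((site, similarity))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Illustrative values only.
centroids = [[0.0, 0.0], [10.0, 10.0]]
stored = {
    (1, "NPP_1"): [5.0, 1.0, 0.0],
    (1, "NPP_2"): [0.0, 1.0, 5.0],
    (0, "NPP_1"): [1.0, 1.0, 1.0],
}
pattern = closest_pattern(centroids, [9.0, 11.0])
ranking = rank_release_origins(stored, pattern, [10.0, 2.0, 0.0])
```

The top entry of `ranking` is the candidate release origin; keeping the full ranking allows several candidates to be presented when the scores are close.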
So far …
• Preliminary clustering studies on a limited amount of re-analysis data (while waiting for the full download)
  o On the basis of different variables at different pressure levels
• Dispersion calculations for a selected NPP for the resulting weather classes
So far …
• Selected a random date, taken as the “true” accident day
• Matched the “true” day’s weather data to the closest weather class from the clustering procedure
• Ran dispersion calculations with the weather data of the “true” day
• Compared the dispersion results based on the “true” and the matched weather data
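The comparison in the last step could also be quantified. The sketch below uses an overlap score (the area where both fields exceed a threshold divided by the area where either does), often called the figure of merit in space in dispersion-model evaluation. Using this metric is an assumption made for illustration; the slides do not say which comparison measure was used. The function name, the threshold and the convention for two empty footprints are likewise illustrative choices.

```python
import numpy as np


def figure_of_merit_in_space(field_a, field_b, threshold):
    """Overlap of two gridded dispersion fields above a threshold.

    Returns intersection area / union area, a value in [0, 1].
    """
    a = np.asarray(field_a, dtype=float) > threshold
    b = np.asarray(field_b, dtype=float) > threshold
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Neither field exceeds the threshold anywhere; treating two
        # empty footprints as identical is a convention chosen here.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


# Illustrative 2x2 grids: "true"-day dispersion vs. matched-class dispersion.
true_day = [[1.0, 1.0], [0.0, 0.0]]
matched = [[1.0, 0.0], [0.0, 0.0]]
score = figure_of_merit_in_space(true_day, matched, threshold=0.5)
```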
Workflow
• Batch processing
  o ECMWF: weather reanalysis data (20+ years)
  o WRF: pre-processed weather data
  o Clustering: predominant weather patterns
  o DIPCOT: dispersions for weather patterns, for a number of fixed nuclear sites
• Interactive workflow
  o Detector: detection of dangerous release
  o Weather service: recent weather (e.g. 3 days)
  o Comparison: candidate release origins
Data
• ECMWF reanalysis data
• NCAR-UCAR archive
  o Better compatibility with WPS/WRF
• 20-30 years
  o Approx. 6 TB in total
• GRIB2 format, again for better compatibility with WRF
  o NetCDF via WPS
• Many variables at multiple geopotential heights
Architectural Overview
Possible additions as BDE pilot components:
(1) PostGIS
(2) DIPCOT
Clustering
• Traditional methods
  o Agglomerative hierarchical
  o K-means
• Soon to be implemented
  o NN-based feature extraction (e.g., autoencoders, convolutional networks)
  o (Possibly) followed by k-means
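As an illustration of the k-means option, the following is a minimal NumPy sketch of Lloyd's algorithm applied to weather snapshots. It assumes each snapshot (for example a gridded field at one time step) has already been flattened into a vector, e.g. with `fields.reshape(n_snapshots, -1)`. The function name, the random initialisation and the toy data are illustrative and not taken from the pilot.

```python
import numpy as np


def kmeans(snapshots, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on flattened weather snapshots.

    snapshots: array of shape (n_snapshots, n_features).
    Returns (labels, centroids).
    """
    x = np.asarray(snapshots, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct snapshots.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


# Toy data: two well-separated groups of "snapshots".
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(toy, k=2)
```

For the real data volume a library implementation would be the natural choice; the sketch only shows the assignment and update steps that produce the "typical" patterns (the centroids).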
Evaluation
• Incremental
  o Clustering outcome
  o Closeness of the constituent weather within clusters / distance between clusters
  o Dispersion characteristics
  o Different cluster descriptors for
    ▪ Creating cluster-based dispersions
    ▪ Matching “real data” to clusters
• Complete
  o Compare cluster-based dispersion against “real data” dispersion
    ▪ For a number of hypothetical scenarios
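The "closeness within clusters / distance between clusters" criterion can be made concrete with a small sketch. The two quantities below (mean distance of members to their centroid, and the smallest distance between any two centroids) are one simple choice of measures assumed here for illustration; the slides do not specify the exact formulas. The function name is hypothetical, and at least two clusters are required.

```python
import numpy as np


def cluster_separation(snapshots, labels):
    """Return (mean within-cluster distance, min between-centroid distance)."""
    x = np.asarray(snapshots, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([x[labels == j].mean(axis=0) for j in ids])
    # Average distance of the members of each cluster to their centroid.
    within = float(np.mean([
        np.linalg.norm(x[labels == j] - c, axis=1).mean()
        for j, c in zip(ids, centroids)
    ]))
    # Pairwise centroid distances; keep the closest pair of clusters.
    pairwise = np.linalg.norm(centroids[:, None] - centroids[None], axis=2)
    between = float(pairwise[np.triu_indices(len(ids), k=1)].min())
    return within, between


# Toy data with two clusters.
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
within, between = cluster_separation(toy, [0, 0, 1, 1])
```

A small `within` relative to `between` indicates compact, well-separated weather classes.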
Preliminary results
• Clustering over a 2-year period (1986, 1987)
  o K=6 clusters
• Multiple geopotentials
• Other variables – notably wind speed – at different heights
• “Visual comparison” against “real data” dispersions
• Incrementally combining more variables
Cluster quality / GHT 500hPa
• 1986, 1987
• Resolution=
• Items (6-hr snapshots) =
• K-means, for K=6
• Geopotential height at 500 hPa
• Dispersions well differentiated for a specific hypothetical origin
• Real data:
Different Clustering Algorithms
Immediate Future Work
• Feature extraction
  o Taking into account multiple variables
  o At more heights
• Automatic evaluation
  o For a number of pre-selected scenarios
• Dockerisation and inclusion into the BDE architecture
Thank you for your attention!
