On-Prem Solution for the Selection of Wind Energy Models

The renewable energy industry has only recently started to rely on data-driven models for applications that have traditionally required complex physical solutions. In this talk, we show how we leverage Spark, Keras and (in our case, on-prem) high performance computing (HPC) infrastructure to tackle common and interesting problems in the wind industry, potentially saving hours of CPU-intensive simulations.

We use:

  • Apache Spark and Hive for data preparation and for combining different data sources (some of them at petabyte scale).
  • Keras for model training/generation.
  • HPC for coordination and node-wide training of hyperparameters.

On-Prem Solution for the Selection of Wind Energy Models

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Ana M. Martinez, Vestas Wind Systems A/S
     On-Prem Solution for the Selection of Wind Energy Models
     #UnifiedDataAnalytics #SparkAISummit
  3. Is this a good piece of art? (pixabay.com) Classification: Public
  4. Is this a good piece of art? (pixabay.com)
  5. Is this a good site?
  6. Is this a good site?
  7. SiteHunt®
     • Enables early identification of potential wind farms
     SiteHunt® FirstView: 3km resolution
  8. SiteHunt®
     • Enables early identification of potential wind farms
     SiteHunt® FirstView: 3km resolution
     SiteHunt® CloseUp: 1km – 300m resolution
  9. SiteHunt®
     • Enables early identification of potential wind farms
     SiteHunt® FirstView: 3km resolution
     SiteHunt® CloseUp: 1km – 300m resolution
     SiteHunt® DeepDive: 10 – 25m resolution
  10. Wind resources enrichment
      Existing modelling options:
      – Physical modelling leads to time-consuming simulations.
      – Sub-optimal geostatistical approaches.
  11. Motivation
      • DL technology has recently proven successful on similar tasks with data that has a hierarchical structure.
      • What tools and data do we have at Vestas for this task?
      • What is missing?
  12. HPC is our primary tool
      • 650 compute nodes (Lenovo).
      • ~16,000 CPU cores.
      • Total memory > 100 TB.
      • > 5 PB HDD storage (EMC Isilon).
      • 56 Gb/s IB.
      • ~500 TFLOPS.
      • 20 GPU nodes.
      • Sun Grid Engine scheduler.
  13. Data is our spine
      • Vestas Climate Library (petabyte scale).
        – Hourly wind resource data from 2000-01-01 to present in 3km horizontal resolution.
        – More than 50 parameters.
        – From ground level to beyond 500m.
        – ORC database, started in 2012.
      • Elevation database.
      • Roughness database.
  14. US average wind speed at 80m
  15. US: average 80m wind speed, terrain below 1500m, wind speed > 3 m/s
  16. US: Exclude National Parks, protected areas, national forests and federal land
  17. US: Remove urban areas and airports
  18. High-voltage grid proximity (up to 30 km from the grid)
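
The screening steps on slides 14 to 18 can be read as a chain of filters over candidate grid cells. The sketch below is only an illustration of that reading: the `GridCell` class, its field names and the sample values are hypothetical, and the slides do not say how the actual screening is implemented.

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    """One candidate cell. All field names are hypothetical."""
    mean_wind_speed_80m: float   # m/s, long-term average at 80 m
    elevation_m: float           # terrain height in metres
    is_protected: bool           # national park, protected area, national forest or federal land
    is_urban_or_airport: bool
    distance_to_grid_km: float   # distance to the nearest high-voltage line


def is_candidate(cell: GridCell) -> bool:
    """Apply the thresholds named on the slides, in slide order."""
    return (
        cell.mean_wind_speed_80m > 3.0
        and cell.elevation_m < 1500.0
        and not cell.is_protected
        and not cell.is_urban_or_airport
        and cell.distance_to_grid_km <= 30.0
    )


# Made-up sample cells, one per rejection reason.
cells = [
    GridCell(7.2, 400.0, False, False, 12.0),   # passes every filter
    GridCell(2.5, 400.0, False, False, 12.0),   # not enough wind
    GridCell(7.2, 1800.0, False, False, 12.0),  # terrain too high
    GridCell(7.2, 400.0, True, False, 12.0),    # protected land
    GridCell(7.2, 400.0, False, False, 45.0),   # too far from the grid
]
candidates = [c for c in cells if is_candidate(c)]
print(len(candidates))
```
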
  19. Siting
      • Improve siting by not relying on point estimates from meteorological masts.
      • Wind resources in higher resolution.
  20. Technical Solution
      Pipeline diagram: Data Extraction, Data Preparation, Model selection, Model Training & Evaluation, Model deployment, Hyperparameter search.
  21. Wind resource downscale: PoC Example
  22. Data Extraction & Preparation
      • Wind data (HR/LR): orc format, ~1.5PB
      • Elevation data (VHR): hgt format, ~400GB
      • Roughness (HR): GeoTIFF format
      • Tools: Apache Spark* (pyspark), Apache Hive*, python
      • Derived features (vector field): curl, divergence, laplacian
      * All product names, logos and brands are property of their respective owners. All company, product and service names used in this document are for identification purposes only. Use of these names, logos and brands does not imply endorsement.
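
As a rough illustration of the derived features named on slide 22, the following sketch computes divergence, the vertical component of curl and a Laplacian with central finite differences on a regular grid. The grid layout (`f[i][j]` with `i` along y and `j` along x), the function names and the choice of applying the Laplacian to a scalar field are assumptions; the slides do not describe how the features were actually computed.

```python
def ddx(f, i, j, dx):
    """Central difference of f along x (column index j)."""
    return (f[i][j + 1] - f[i][j - 1]) / (2.0 * dx)


def ddy(f, i, j, dy):
    """Central difference of f along y (row index i)."""
    return (f[i + 1][j] - f[i - 1][j]) / (2.0 * dy)


def divergence(u, v, i, j, dx, dy):
    """du/dx + dv/dy at interior grid point (i, j)."""
    return ddx(u, i, j, dx) + ddy(v, i, j, dy)


def curl_z(u, v, i, j, dx, dy):
    """dv/dx - du/dy at interior grid point (i, j)."""
    return ddx(v, i, j, dx) - ddy(u, i, j, dy)


def laplacian(f, i, j, dx, dy):
    """Second-order five-point Laplacian of a scalar field f."""
    d2x = (f[i][j + 1] - 2.0 * f[i][j] + f[i][j - 1]) / dx ** 2
    d2y = (f[i + 1][j] - 2.0 * f[i][j] + f[i - 1][j]) / dy ** 2
    return d2x + d2y


# Synthetic 3x3 test fields with unit spacing: x = j, y = i.
n = 3
u = [[-float(i) for j in range(n)] for i in range(n)]       # u = -y
v = [[float(j) for j in range(n)] for i in range(n)]        # v = x (solid-body rotation)
h = [[float(i * i + j * j) for j in range(n)] for i in range(n)]  # h = x^2 + y^2

div_c = divergence(u, v, 1, 1, 1.0, 1.0)
curl_c = curl_z(u, v, 1, 1, 1.0, 1.0)
lap_c = laplacian(h, 1, 1, 1.0, 1.0)
print(div_c, curl_c, lap_c)
```

On real data the spacing would be the grid resolution in metres rather than 1.0, and boundary points would need one-sided differences.
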
  23. Data Extraction & Preparation
      VCL 3km point – global coverage (19 years).
  24. Data Extraction & Preparation
      VCL 3km point – global coverage (19 years).
      VCL 1km point – Saudi Arabia coverage (1 year).
  25. Data Extraction & Preparation
      VCL 3km point – global coverage (19 years).
      VCL 1km point – Saudi Arabia coverage (1 year).
      Terrain data - SRTM (very high resolution, up to 30m).
  26. Data Extraction & Preparation
      Each red point generates 1 row per timestamp in the dataset.
      VCL 3km point – global coverage (19 years).
      VCL 1km point – Saudi Arabia coverage (1 year).
      Terrain data - SRTM (very high resolution, up to 30m).
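
One way to picture "1 row per timestamp" is the sketch below. It assumes that each red point is a 1km target point and that it is paired with its closest 3km point; the slides do not confirm either assumption. The coordinates, timestamps and dictionary keys are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_lr_point(hr_point, lr_points):
    """Return the low-resolution point closest to a high-resolution point."""
    return min(lr_points, key=lambda p: haversine_km(hr_point[0], hr_point[1], p[0], p[1]))


def build_rows(hr_points, lr_points, timestamps):
    """Emit one row per (high-resolution point, timestamp) pair."""
    rows = []
    for hr in hr_points:
        lr = nearest_lr_point(hr, lr_points)
        for ts in timestamps:
            rows.append({"timestamp": ts, "hr_point": hr, "lr_point": lr})
    return rows


# Illustrative coordinates only, not taken from the Vestas Climate Library.
lr_points = [(24.00, 45.00), (24.00, 45.03), (24.03, 45.00)]
hr_points = [(24.001, 45.001), (24.001, 45.028)]
timestamps = ["t0", "t1"]

rows = build_rows(hr_points, lr_points, timestamps)
print(len(rows))
```

At the data volumes quoted on the slides a brute-force search like this would not scale; it only shows the shape of the resulting rows.
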
  27. Data Extraction & Preparation
      DNN
      – INPUT: swdown, xhour/yhour, temperature, u_LR, v_LR, heights_HR, roughness_HR
      – TARGET: u_HR, v_HR
      BASELINE
      – INPUT: heights, u_LR, v_LR
      – TARGET: u_HR, v_HR
      Diagram labels: u, v, wind speed, q
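
The targets on slide 27 are the wind components u and v, and the diagram also labels the wind speed. The standard relation between them is sketched below. The angle helper uses the plain mathematical convention (counter-clockwise from the +x axis), which is an assumption: the slides do not state which direction convention is used, and meteorological conventions differ.

```python
import math


def wind_speed(u, v):
    """Magnitude of the horizontal wind vector (u, v)."""
    return math.hypot(u, v)


def wind_angle_deg(u, v):
    """Mathematical angle of the vector, counter-clockwise from the +x axis, in degrees."""
    return math.degrees(math.atan2(v, u))


def components(speed, angle_deg):
    """Inverse of the two functions above: recover (u, v) from speed and angle."""
    a = math.radians(angle_deg)
    return speed * math.cos(a), speed * math.sin(a)


speed = wind_speed(3.0, 4.0)
angle = wind_angle_deg(3.0, 4.0)
u_back, v_back = components(speed, angle)
print(speed, angle, u_back, v_back)
```
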
  28. Feed Forward neural network
      Diagram labels: input parameters, hidden layers, output parameters, first_neuron (width), shape (brick).
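
A small sketch of what the `first_neuron` and `shape` labels might control, assuming that "brick" means every hidden layer has the same width. The function names are hypothetical, and the input count of 8 assumes xhour/yhour are two separate features; neither is stated on the slides.

```python
def brick_layer_widths(n_inputs, first_neuron, hidden_layers, n_outputs):
    """Layer widths for a 'brick' network: all hidden layers share one width."""
    return [n_inputs] + [first_neuron] * hidden_layers + [n_outputs]


def count_parameters(widths):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(widths, widths[1:]))


# 8 inputs and 2 outputs (u_HR, v_HR) are assumptions for illustration.
widths = brick_layer_widths(n_inputs=8, first_neuron=56, hidden_layers=5, n_outputs=2)
n_params = count_parameters(widths)
print(widths, n_params)
```
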
  29. Hyperparameter selection (48 combinations)

      #neurons  #hidden layers  #epochs  dropout
      12        1               200      0.2
      56        1               200      0.2
      128       1               200      0.2
      256       1               200      0.2
      12        5               200      0.2
      56        5               200      0.2
      …         …               …        …
      256       10              400      0.5
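
The table on slide 29 shows only some of its rows. If the search space is the full Cartesian product of the values visible in those rows (an inference, not something the slides state), it yields exactly 48 combinations, as this sketch checks. The dictionary keys are illustrative names.

```python
from itertools import product

# Value sets read off the visible rows of the slide; treating them as the
# complete search space is an assumption.
search_space = {
    "neurons": [12, 56, 128, 256],
    "hidden_layers": [1, 5, 10],
    "epochs": [200, 400],
    "dropout": [0.2, 0.5],
}


def expand_grid(space):
    """Return one dict per combination of hyperparameter values."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


grid = expand_grid(search_space)
print(len(grid))
```
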
  30. Hyperparameter search
      Existing tools not directly applicable:
      – Talos.
      – KubeFlow.
      – MLflow.
      – Elephas.
  31. Model Selection
      Diagram labels: Job Scheduler (qsub array), docker containers, talos, TensorBoard.
      Input: Configuration + train/val data
      Output: keras_model.h5, Tensorboard logs, params.json
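
With a scheduler array job, each task typically picks its own configuration from the task index. The sketch below assumes the job is submitted as a Grid Engine array (for example with `qsub -t 1-48`) and reads the `SGE_TASK_ID` environment variable; the search-space values and all function names are assumptions, and the slides do not show how Vestas maps tasks to configurations.

```python
import json
import os
from itertools import product

# Value sets inferred from the hyperparameter slide (an assumption).
SEARCH_SPACE = {
    "neurons": [12, 56, 128, 256],
    "hidden_layers": [1, 5, 10],
    "epochs": [200, 400],
    "dropout": [0.2, 0.5],
}


def build_grid(space):
    """Expand the search space into a list of configurations."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


def read_task_id(environ):
    """Return the array task id, falling back to 1 when unset or not an integer."""
    raw = environ.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else 1


def config_for_task(task_id, grid):
    """Task ids are treated as 1-based, so task k gets grid entry k - 1."""
    if not 1 <= task_id <= len(grid):
        raise ValueError(f"task id {task_id} is outside 1..{len(grid)}")
    return grid[task_id - 1]


def params_json(environ=os.environ):
    """Serialise this task's configuration, e.g. for a params.json output file."""
    grid = build_grid(SEARCH_SPACE)
    return json.dumps(config_for_task(read_task_id(environ), grid), sort_keys=True)


print(params_json({"SGE_TASK_ID": "48"}))
```

Each task would then train one model with its configuration and write its own outputs, so tasks never need to coordinate with each other.
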
  32. Model Training & Evaluation
      Evaluation measures:
      – MAE, RMSE, BIAS, STDE
      – Riemann sum between the CDF* differences (CDF diff.)
      – Pearson correlation coefficient (Pearson’s r)
      Data split: Test, Validation, Train.
      Steps: train baseline & winning DNN model; learn hyperparameters; learn candidate models.
      Time-consecutive data kept to evaluate.
      * Cumulative distribution function
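
The evaluation measures on slide 32 could be computed along these lines. MAE, RMSE, bias, STDE and Pearson's r follow their usual definitions. The `cdf_diff` function is only one plausible reading of "Riemann sum between the CDF differences": it integrates the absolute difference of two empirical CDFs with a midpoint rule, and the number of steps and the absolute value are assumptions.

```python
import math


def _errors(pred, obs):
    return [p - o for p, o in zip(pred, obs)]


def mae(pred, obs):
    e = _errors(pred, obs)
    return sum(abs(x) for x in e) / len(e)


def bias(pred, obs):
    e = _errors(pred, obs)
    return sum(e) / len(e)


def rmse(pred, obs):
    e = _errors(pred, obs)
    return math.sqrt(sum(x * x for x in e) / len(e))


def stde(pred, obs):
    """Standard deviation of the errors (population form)."""
    e = _errors(pred, obs)
    b = sum(e) / len(e)
    return math.sqrt(sum((x - b) ** 2 for x in e) / len(e))


def pearson_r(pred, obs):
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def empirical_cdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def cdf_diff(pred, obs, steps=200):
    """Midpoint Riemann sum of |F_pred(x) - F_obs(x)| over the joint value range."""
    lo = min(min(pred), min(obs))
    hi = max(max(pred), max(obs))
    if hi == lo:
        return 0.0
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += abs(empirical_cdf(pred, x) - empirical_cdf(obs, x)) * dx
    return total


# Made-up numbers for illustration.
pred = [1.0, 2.0, 3.0, 4.0]
obs = [1.5, 2.5, 2.5, 4.5]
print(mae(pred, obs), rmse(pred, obs), bias(pred, obs), stde(pred, obs))
print(pearson_r(pred, obs), cdf_diff(pred, obs))
```

With these definitions RMSE² equals bias² plus STDE², which is a convenient sanity check on any implementation.
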
  33. Do we downscale? PoC Example
  34-41. (Figure-only slides.)
  42. Quantitative results

      Method             RMSE    Bias     Pearson’s R  CDF diff.
      closest 3km point  0.0305  -0.0063  0.9827       0.0077
      Linear regression  0.0315  0.0080   0.9836       0.0114
      DNN                0.0294  0.0022   0.9853       0.0093
  43-49. (Figure-only slides.)
  50. Quantitative results

      Method             RMSE    Bias     Pearson’s R  CDF diff.
      closest 3km point  0.0752  -0.021   0.9223       0.0218
      Linear regression  0.0663  0.0009   0.9236       0.0178
      DNN                0.0538  0.0021   0.9459       0.0075
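
To put the RMSE column of slide 50 in relative terms, the short calculation below computes how much lower the DNN's RMSE is than each baseline's. The numbers are copied from the slide; the helper name is illustrative.

```python
# RMSE values as listed on slide 50.
rmse_by_method = {
    "closest 3km point": 0.0752,
    "Linear regression": 0.0663,
    "DNN": 0.0538,
}


def relative_reduction(baseline, model):
    """Fraction by which `model` is lower than `baseline`."""
    return (baseline - model) / baseline


reductions = {
    name: relative_reduction(value, rmse_by_method["DNN"])
    for name, value in rmse_by_method.items()
    if name != "DNN"
}
for name, value in reductions.items():
    print(f"{name}: {value:.1%}")
```

On these figures the DNN's RMSE is roughly 28% below the closest-3km-point baseline and roughly 19% below linear regression.
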
  51. Do we downscale? PoC Example
  52. Ongoing work
      • Use of convolutional + recurrent neural networks.
      • Test different evaluation scenarios.
      • Test higher resolution terrain information.
      • Connect and automate the end-to-end cycle.
  53. Potential
      • ML importance across the whole value chain.
        – Power forecasting.
        – Long-term correction of wind measurements.
        – Wind resources enrichment.
        – Troubleshooting turbine errors.
        – Condition monitoring.
        – Wind farm control.
        – Wind Turbine Surface Damage Detection.
        – …
  54. Vestas Team
      Ana M. Martinez, Hjalte Vinther Kiefer, Hans Harhoff Andersen, Tiago Miguel da Costa Luna
  55. DON’T FORGET TO RATE AND REVIEW THE SESSIONS. SEARCH SPARK + AI SUMMIT
