Operational ocean wave
ensemble forecasts: state-of-the-
art validation and high resolution
forecasts
Final Year Project report towards the achievement of a Graduate Engineering
Diploma in Hydrography at ENSTA Bretagne
Loïc Stouff
Tutor at ACTIMAR: M. Cyril FRELIN
Tutor at ENSTA Bretagne: Mme. Amandine NICOLE
2012 – 2013
CITEPH – Ocean Wave Ensemble Forecasts
STOUFF Loïc - 16/08/2013
ABSTRACT
1 INTRODUCTION
1-1 OUTLINE OF THE REPORT
1-2 INTRODUCTION TO ENSEMBLE FORECAST
1-3 MAIN OCEAN WAVE ENSEMBLE PREDICTION CENTERS
NCEP
FNMOC
ECMWF
Norwegian Meteorological Institute
China National Meteorological Centre
1-4 POSSIBLE IMPROVEMENTS OF EXISTING MODELS
2 METHODOLOGY
2-1 BUOYS LOCATION AND DATA PROCESSING
2-2 DEFINITION OF WW3 STUDY AREAS
2-3 OVERVIEW OF MATHEMATICAL AND VISUALIZATION TOOLS
3 VALIDATION OF STATE-OF-THE-ART WAVE ENSEMBLE FORECAST
3-1 IN THE NORTH SEA
3-2 IN TAIWAN
3-3 WIND AND WAVE VARIABILITY
4 IMPROVEMENT OF WAVE ENSEMBLE FORECASTS
4-1 IMPROVEMENT OF STATE-OF-THE-ART FORECASTS
Linear Shift
Empirical Orthogonal functions
4-2 HIGH RESOLUTION WAVE ENSEMBLE FORECAST
High resolution wave ensemble forecast validation
Clustering
5 CONCLUSIONS
BIBLIOGRAPHY
Abstract
Ocean wave ensemble forecasts, that is, predictions based on several runs of the same model with
different initial and boundary conditions, are progressively replacing deterministic forecasts in
operational contexts. The objectives of this study are, after reviewing various tools and theoretical
matters, to quantify the performance of state-of-the-art wave ensemble forecasts and to analyze the
possibility of running higher resolution ensemble forecasts at low computational cost. It appears
that ensemble forecasts perform, in most cases, better than higher resolution
deterministic forecasts, but their performance still seems insufficient for particularly sensitive
applications such as high-risk offshore operations.
Several statistical methods, such as clustering and ensemble generation by Empirical Orthogonal
Functions (EOFs), were investigated and proved efficient at reducing computational cost,
thus making it possible to run higher resolution ensemble forecasts.
The performance of very high resolution forecasts was then studied, but several limitations related to
wind fields and model parameters made these predictions slightly disappointing. Despite a
higher variability, which represents a strong improvement, the predictions sometimes differ too
much from observations. Further studies therefore have to be conducted on this matter, especially to
analyze the influence of improvements in bathymetry and wind forcing fields.
1 Introduction
1-1 Outline of the report
This study focuses on the possibility of running very high resolution wave ensemble
forecasts at the lowest computational cost. As operational ensemble forecasts are attracting growing
interest in both the public and private sectors, the question of their application as a decision support
tool in high-risk maritime operations is raised.
This master thesis is part of a research and development (R&D) project funded by the CITEPH
program (Consultation for Technological Innovation in Exploration and Production of
Hydrocarbons), which expects practical answers and results.
In addition to technical tasks including data processing and visualization, an interpretation
of model outputs was necessary to evaluate the performance of forecasts, coupled with extensive
research on ways to effectively reduce computational costs.
This study intends to answer the following questions:
- Is the current performance of state-of-the-art wave ensemble forecasts reliable enough to
be used in decision support tools?
- Do higher resolution forecasts improve this performance?
- Can computational time be reduced without degrading performance?
Five sections make up this report:
- Section 1: Overview of ensemble forecasts. Useful scientific notions and reminders are
briefly defined, followed by a succinct presentation of the main wave ensemble forecast
centers.
- Section 2: Methodology. This section provides information about data sources and the data
processing methods used, as well as about statistical and numerical tools.
- Section 3: Validation of state-of-the-art ensemble forecast. The performance of the NCEP
wave ensemble forecast is validated against buoy data.
- Section 4: Improvements of Wave Ensemble Forecasts. Findings of the project with
regard to ensemble forecasts are developed in this section and discussed in the light of
the previously mentioned questions.
- Section 5: Conclusion and outlook of the project. Outcomes of the project are given and
further research directions are highlighted.
1-2 Introduction to ensemble forecast
Ensemble predictions originally come from meteorology and aim at taking into account the
effect of uncertainty in the initial conditions of a model. Indeed, in meteorological studies, initial
conditions represent our knowledge of the atmosphere’s state; due to the scarcity of observations
and to the presence of inherent errors in measurements, this knowledge is imperfect.
Figure 1-1: Principle of ensemble and deterministic forecasts
Due to the non-linearity of fluid mechanics equations [5], medium to long range results can
vary drastically if errors are present in the initial conditions. It is nowadays commonly agreed that
errors in observations are unavoidable, and therefore that these errors in initial conditions and
forcing have to be taken into account. Considering this fact, the principle of a single deterministic
solution of the model’s governing equations can be questioned. Generating an ensemble of
solutions derived from various initial conditions that reflect the observations’ uncertainty
provides more information on the long-term behavior of predictions. In addition to improving the
reliability of predictions, ensemble prediction systems (EPS) estimate the probabilities associated with
different possible states.
Unlike atmospheric models, ocean wave models are not very sensitive to errors in initial conditions
after the first 24 hours. Uncertainties in wind forcing fields, however, are the main source of
error in wave models, which makes it possible to produce ensemble forecasts based on perturbations
of these wind fields. The methods used to produce these perturbed wind forcing fields are relatively
well documented in the literature [12] and are beyond the scope of this section; they will therefore
not be treated here.
All forecasts mentioned in this paper are based on third generation wave models. These
models are governed by the action balance equation which describes the evolution of the wave
energy spectrum F forced by specific source terms. The resolution of the equation requires the
knowledge of the spectrum at a given time and surface winds for all time integration intervals. The
equation’s formulation is given below.
$$P(F) = \frac{DF}{Dt} = S_{in} + S_{nl} + S_{ds}$$
where $\frac{D}{Dt}$ represents the Lagrangian derivative, which can be written:
$$\frac{D}{Dt} = \frac{\partial}{\partial t} + \mathbf{c}_g \cdot \nabla$$
• The right-hand side of the action balance equation represents the source terms: Sin is the
wind-related input, Sds the dissipation term and Snl the nonlinear wave-wave
interaction term [1].
• Further details about the action balance equation and source terms are available in Komen et
al. (1994) [9].
The main quantity used in this paper is the significant wave height Hs, defined below
(Ochi, 1998 [10]), which is consistent with the average height of the highest third of the waves (H1/3)
derived from measurements.
$$H_s = 4\sqrt{E}$$
where E, the wave energy, is given by:
$$E = \int_{0}^{2\pi}\int_{0}^{\infty} F(f,\theta)\,df\,d\theta$$
For further information on third generation wave models, please refer, among others, to [5], [6]
and [7].
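As an illustration of the two definitions above, the following sketch integrates a discretized spectrum F(f, θ) numerically and derives Hs from it. The function names, the grids and the flat synthetic spectrum are assumptions made for this example only; they are not taken from the wave models discussed here, and the infinite upper frequency bound is simply truncated at the last frequency of the grid.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y, for sample points x."""
    y = np.asarray(y, dtype=float)
    dx = np.diff(np.asarray(x, dtype=float))
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dx, axis=-1)

def significant_wave_height(spectrum, freqs, thetas):
    """Hs = 4 * sqrt(E), E being the integral of F(f, theta) over f and theta.

    spectrum has shape (len(freqs), len(thetas)).
    """
    energy_per_freq = trapezoid(spectrum, thetas)  # integrate over direction
    energy = trapezoid(energy_per_freq, freqs)     # then over frequency
    return 4.0 * np.sqrt(energy)

# Synthetic check: a flat spectrum whose integral is 1 should give Hs close to 4.
freqs = np.linspace(0.0, 1.0, 51)
thetas = np.linspace(0.0, 2.0 * np.pi, 73)
flat_spectrum = np.full((freqs.size, thetas.size), 1.0 / (2.0 * np.pi))
hs = significant_wave_height(flat_spectrum, freqs, thetas)
```

Operational models usually output Hs directly, so a computation of this kind is mostly useful for checking spectral outputs.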
1-3 Main ocean wave ensemble prediction centers
This section presents the main centers providing ocean wave ensemble forecasts, which are
still marginal in comparison to atmospheric ensemble forecasts. The ensemble prediction systems
presented in this section are based on third generation wave models. All ensemble members use the
undisturbed analysis as initial conditions, and the members’ entire spread results from wind forcing
perturbations. Indeed, the influence of spectral initial conditions has been shown to be negligible in
third generation models.
NCEP
The US National Centers for Environmental Prediction (NCEP) developed in 2004 and
operationally implemented in 2006 a wave ensemble forecast system called the Global Ensemble Ocean
Wave Forecast System (GEOWaFS).
The version of GEOWaFS currently used is an updated version running with the NOAA (National
Oceanic and Atmospheric Administration) multi-grid WAVEWATCH III, which replaces the prior
GEOWaFS based on NOAA WAVEWATCH III (Cao et al., 2009 [3]).
The model ranges from 78°S to 78°N with a 1° x 1° spatial resolution.
The current version of GEOWaFS consists of 20+1 members generated by separate runs of NWW3
based on perturbed wind fields obtained from NOAA/NCEP Global Ensemble Forecast System
(GEFS) bias-corrected 10 m winds updated every 3 hours. Perturbations of the wind fields were
generated using the breeding of growing modes method described in Toth and Kalnay (1993 [12],
1997 [13]).
The initial wave field comes from the deterministic NWW3 forecast. Operational GEOWaFS is run 4
times a day (at 00, 06, 12 and 18 UTC).
Several studies (Chen, 2006 [6]; Cao et al., 2009 [3]) demonstrated that GEOWaFS produces more
realistic and reliable predictions than the current operational global deterministic wave forecast
system, NWW3.
FNMOC
The US Navy Fleet Numerical Meteorology and Oceanography Center (FNMOC) also
provides global ensemble ocean wave predictions. The system consists of 20 members with a 10-day
forecast range, run twice daily (at 00 and 12 UTC) with a 1° x 1° resolution. Ensemble members are
generated from Navy Operational Global Atmospheric Prediction System (NOGAPS EFS) wind fields.
NCEP and FNMOC ensemble forecasts are sometimes combined to form an ensemble of 40+1
independent ensemble members.
The table below sums up the characteristics of these two forecasts.
Characteristic         | NCEP wave ensemble system           | FNMOC wave ensemble system
Number of members      | 20                                  | 20
Wind forcing           | Bias-corrected GEFS winds           | NOGAPS Ensemble Forecast System
Grid                   | Global spherical                    | Global spherical
Spatial resolution     | 1° × 1°                             | 1° × 1°
Geographical extension | 78°S to 78°N                        | 78°S to 78°N
Cycles per day (Z)     | 4 runs a day (00, 06, 12, 18 UTC)   | Twice daily at 00 and 12 UTC
Forecast               | 10 days                             | 10 days
Table 1-1: Summary of main characteristics of FNMOC and GEOWaFS forecasts
ECMWF
The European Centre for Medium-Range Weather Forecasts (ECMWF) provides global wave
ensemble forecasts with a spatial resolution of 0.5° x 0.5°, shallow water physics and a 15-day
forecast range. As for the previously mentioned forecasts, the initial wave field comes from an
unperturbed deterministic prediction and the spread of all 50 members only depends on
perturbations in wind fields. ECMWF EFS is run twice daily (at 00 and 12 UTC).
Norwegian Meteorological Institute
The Norwegian Meteorological Institute (met.no) runs a regional operational ensemble
prediction system for ocean waves (WAMEPS) daily. The model covers Northern Europe, the Scandinavian
Peninsula and the Nordic Seas (including the North Sea and the Barents Sea) with a 0.1° resolution and is
forced by the atmospheric limited area ensemble prediction system (LAMEPS).
More information is provided in Carrasco et al. (2008) [4].
China National Meteorological Centre
The China National Meteorological Centre also runs a global ensemble wave forecast at 1° x 1°
spatial resolution with the WW3 model. 14+1 wave fields are calculated from perturbed wind fields of
the atmospheric operational forecast of the China National Meteorological Centre. The model is run
twice daily (at 00 and 12 UTC) with a 10-day forecast range.
For the purpose of this study, NCEP and FNMOC forecasts were used as reference state-of-
the-art numerical ocean wave ensemble predictions.
The daily deterministic forecast run operationally on a global scale at ACTIMAR was also gathered as
global gridded significant wave height data over the relevant period at a 0.5° x 0.5° spatial resolution,
in order to allow direct performance comparisons between probabilistic and deterministic forecasts.
1-4 Possible improvements of existing models
The issue underlying this project is to assess whether existing operational forecasts are
reliable and accurate enough to be used as a decision support tool for high-risk maritime operations.
The next question is what opportunities exist to improve those forecasts or to
produce higher resolution ones. Two approaches are possible. The first is to identify a possible common
pattern in all predictions, a recurrent tendency characteristic of a forecast, which would make it
possible to increase its performance rapidly and effectively by a simple correction. If no such pattern
can be found, the only solution lies in running models at higher resolution, both spatially and
temporally.
Since computing power is still a fundamental issue for operational
forecasts, especially for ensemble forecasts where several dozen wave fields may have to be
generated, it is essential to find solutions that minimize the computational cost of forecasts.
The approaches retained include the generation of wind fields by empirical orthogonal
functions from one single member, and the classification of members to reproduce a similar range
with a reduced number of wave fields.
2 Methodology
2-1 Buoys location and data processing
For the purpose of comparing predicted wave fields with observations, buoy
measurements were gathered from the US National Data Buoy Center (NDBC) and the Taiwan Central
Weather Bureau (CWB). Significant wave heights were recorded hourly at 8 different locations on
which the study will focus. I selected four buoys in each relevant area, that is to say four in the
North Sea and four near Taiwan. The selection was based on several criteria including the
geographical location of the buoy, the ocean depth, the distance to the shore and the quality of data.
Despite this preliminary selection, several buoys show gaps in their time series, due to corrupted or
unavailable data at particular dates. Even if they appear from time to time on graphs, periods
covering those gaps were not taken into account when computing performance indicators of
forecasts and therefore do not induce any bias in the interpretation of results.
In the North Sea, buoys B62164, B62145, B62127 and B63113 were selected. They are all
located within a 50-100 km range from the shore in relatively deep water areas.
Near Taiwan, buoys 46699A, 46778A, 46757B and C6V27 were considered. Except for buoy
C6V27, which is located 250 km from the shore in 3000 meters of water, all the others stand in
nearshore areas with ocean depths lower than 30 meters. Both shallow and deep ocean behaviors of
the forecast could then be studied.
Although the uncertainty of measurements at these buoys is unknown, they were taken as
reference values, as the uncertainty of predicted values is assumed to be much higher.
Table 2-1 and Figure 2-1 below give the locations of these buoys.
Buoy Longitude Latitude
B62164 0.5°E 57°N
B62145 2.8°E 53.1°N
B63113 1.7°E 61°N
B62127 0.7°E 54°N
C6V27 118.8°E 21°N
46699A 121.6°E 24°N
46778A 120°E 23.1°N
46757B 120.8°E 24.8°N
Table 2-1: Geographical coordinates of selected buoy moorings
Figure 2-1: Buoy moorings locations in the North Sea and near Taiwan
An important detail has to be mentioned concerning the locations of buoys. As the spatial
resolution of ACTIMAR’s deterministic forecast did not always make it possible to directly extract
predictions at the buoys’ precise locations, it was necessary to find a solution to overcome this issue: I
linearly interpolated gridded Hs whenever it was possible in order to obtain the most accurate
prediction possible. However, at locations where the buoy was too close to the shoreline to allow
interpolation, the closest available value was chosen. Thus, some error on the Hs prediction is
inevitable due to the influence of bathymetry. Nevertheless, as the ensemble forecasts’ resolution is
twice as coarse as that of the deterministic one, inter-comparisons of performance should not be
strongly impacted.
Another issue to be raised is the difference in time steps between observations and both
ensemble and deterministic forecasts. Whereas observations are sampled hourly, ensemble forecasts
are sampled 6-hourly. Therefore, temporal interpolation had to be performed in order to make those
time-series easily comparable. Two scenarios were possible and I had to choose between
interpolating on the smaller time step (1 hour) or on the larger one (6 hours). A brief comparison of
root mean square errors (RMSE, see Section 2-3) of the ensemble mean computed from datasets
obtained from both methods showed that the differences between those RMSE values are low,
typically about 10 mm or less except for buoys B63113 and B62127 (Table 2-2). It therefore seems
reasonable to consider that both interpolation methods are equivalent for the purpose of this study.
Buoy   | RMSE of ensemble mean (m), Raw Prediction / Interpolated Obs | RMSE of ensemble mean (m), Interpolated Prediction / Raw Obs
46778A | 0.317 | 0.310
C6V27  | 0.509 | 0.520
46699A | 0.761 | 0.758
46757B | 0.460 | 0.460
B62164 | 0.499 | 0.509
B62127 | 0.328 | 0.278
B62145 | 0.276 | 0.281
B63113 | 0.537 | 0.510
Table 2-2: RMSE of ensemble mean for both temporal interpolation methods considered
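The comparison behind Table 2-2 can be sketched as follows. This is only an illustration on synthetic data: the function names and the sinusoidal series are assumptions, linear interpolation is assumed for both scenarios, and gaps in the buoy records are not handled.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between two series of equal length."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def compare_interpolations(t_obs, obs, t_pred, pred):
    """Return (rmse_a, rmse_b) for the two temporal interpolation scenarios."""
    # Scenario A: raw prediction, observations interpolated onto prediction times.
    obs_on_pred = np.interp(t_pred, t_obs, obs)
    rmse_a = rmse(pred, obs_on_pred)
    # Scenario B: prediction interpolated onto observation times, raw observations.
    pred_on_obs = np.interp(t_obs, t_pred, pred)
    rmse_b = rmse(pred_on_obs, obs)
    return rmse_a, rmse_b

# Synthetic example: hourly "observations" and 6-hourly "ensemble mean" (hours).
t_obs = np.arange(0.0, 49.0, 1.0)
obs = 2.0 + np.sin(2.0 * np.pi * t_obs / 24.0)
t_pred = np.arange(0.0, 49.0, 6.0)
pred = 2.0 + np.sin(2.0 * np.pi * t_pred / 24.0) + 0.3  # constant 0.3 m bias
rmse_a, rmse_b = compare_interpolations(t_obs, obs, t_pred, pred)
```

In this synthetic case scenario A recovers the 0.3 m bias exactly, while scenario B adds the error made by linearly interpolating a 6-hourly series between samples.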
In order to make sure that both observations and predictions show phenomena of the
same frequency range, observations were interpolated on the 6-hour period of predictions. Indeed,
the other scenario would have let observations show variations of significant height that predictions
would not have been able to reproduce. However, improving the temporal resolution of ensemble
forecasts also appears to be a possible way to improve the performance of forecasts, by allowing them
to reproduce events in a larger frequency range.
Before any analysis of data could be undertaken, I had to extract all observation and
prediction time series from their various files and turn them, after compilation, into an easily
readable format (here .mat files) using UNIX and MATLAB scripts. The typical processing chain for
one data file is illustrated below:
Figure 2-2: Sketch of data processing chain
[Figure steps: Unarchiving; Extraction of data at relevant locations; Conversion netCDF4 to netCDF3;
Concatenation with previous dates; Data storage in .mat format. One annotation in the figure reads
"ACTIMAR’s archives only".]
Time series of significant height and wind speed were stored from the 20th of October 2012
to the 20th of April 2013 for GEOWaFS and ACTIMAR forecasts. Significant height time series of
the FNMOC forecast were stored from the 1st of January 2013 to the 20th of April 2013. Finally, time
series of significant height recorded by buoys were also stored at each location from the 20th of
October 2012 to the 20th of April 2013.
2-2 Definition of WW3 study areas
The project focuses on 4 different areas which had to be defined spatially before starting
simulations in order to be able to establish the forcing at the boundaries and prepare the bathymetry.
For the purpose of this master thesis, I focused first on the Indonesian area and on the North Sea in
order to benefit from a sufficient number of buoy observations, making the validation of state-of-
the-art forecasts easier.
The area in the North Sea extends from 5°W to 10°E in longitude and from 50°N to 65°N in
latitude, while the Indonesian one is much larger and extends from 90°E to 140°E in longitude and
from 5°S to 30°N in latitude. The Indonesian area is characterized by the presence of thousands of
islands with a drastic influence on sea states. The spatial resolution of the grid used in these areas is
that of the NCEP/FNMOC ensembles, that is to say 1° x 1°.
The second part of the project, on high resolution forecasts, will focus on the Tierra del Fuego area in
Argentina, which extends from 70°W to 62°W in longitude and from 58°S to 48°S in latitude. In
this area, the grid spatial resolution is 0.1° x 0.1° and the model is forced by a 1° x 1° global run.
Harsh climatic conditions can be observed, with recurrent storms and strong weather.
Concerning the input parameters of the model, such as the parameterization of bottom friction, surf
breaking, dissipation, nonlinear interactions or the choice of advection schemes, the usual set of
parameters used at ACTIMAR for operational forecasts was used. The study of the influence of
these parameters was indeed beyond the scope of this thesis.
Tuning them represents, however, a possible way to improve results, especially when dealing with
small areas at very high resolution where the influence of these parameters can be higher than usual.
2-3 Overview of mathematical and visualization tools
Many possibilities exist to compare probabilistic forecasts (Bidlot et al., 2002 [2]),
focusing on different parameters and characteristics of the forecast. In addition to the usual direct
comparison of Hs and RMS errors, scatter plots of predictions relative to observations were used,
as well as Pearson product-moment correlation coefficients and persistence analysis.
All these performance indicators were selected to estimate the quality of existing forecasts for the
purpose of offshore operation planning and were computed using MATLAB scripts written during my
master thesis.
Visual comparisons of significant heights predicted by the forecasts were made using a
box-and-whiskers representation showing the median of the ensemble (the horizontal line), the 25th
and 75th percentiles (vertical box), and the minimum and maximum values (vertical lines called
whiskers).
The dispersion of Hs among the members of the ensemble at a given time is then clearly represented.
Figure 2-3: Box-and-whiskers representation of the ensemble members of the forecasts
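The five quantities drawn in such a box-and-whiskers plot can be computed per time step as sketched below. The function name and the array layout (members along the first axis) are assumptions of this example, and NumPy's default percentile interpolation may differ slightly from the one used in the MATLAB scripts of the study.

```python
import numpy as np

def ensemble_box_stats(members):
    """Box-and-whiskers statistics of an ensemble at each time step.

    members: array of shape (n_members, n_times) holding Hs values.
    Returns a dict of 1-D arrays of length n_times.
    """
    members = np.asarray(members, dtype=float)
    q25, median, q75 = np.percentile(members, [25, 50, 75], axis=0)
    return {
        "min": members.min(axis=0),   # lower whisker
        "q25": q25,                   # bottom of the box
        "median": median,             # horizontal line
        "q75": q75,                   # top of the box
        "max": members.max(axis=0),   # upper whisker
    }

# Five hypothetical members at two time steps.
stats = ensemble_box_stats([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]])
```

The difference between `max` and `min` gives the ensemble spread discussed in Section 3.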
Root mean square errors were computed over the members of the forecasts at each given time
in order to estimate the variations in the prediction of events within the ensemble, using the following
formulation, where y represents the observation and ŷ stands for predicted values at given time t, n
being the size of the ensemble.
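With these notations, and assuming the usual definition with an index i running over the ensemble members (the index is an assumption of this sketch), the RMS error at time t can be written:

$$\mathrm{RMSE}(t) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i(t) - y(t)\right)^2}$$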
The RMS error was also computed on the entire time-series of characteristic quantities of the
ensemble, including median, maximum and minimum values, thus giving an indication of the
overall deviation of predictions from observations.
Another relevant performance indicator is the Pearson product-moment correlation
coefficient r, which gives a good estimate of the linear dependence between predictions and
observations. Its formulation in the case of a sample of paired variables X and Y is given below, X
and Y being here observation and prediction data, X̄ and Ȳ being their respective means and n the
size of the sample.
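In its standard form, assumed here to match the notation above, the coefficient reads:

$$r = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}}$$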
Thanks to this coefficient, a “best member” of the ensemble forecast was defined using a
posteriori comparisons with observations: the member of the forecast having the highest
correlation coefficient is considered the best member of the ensemble.
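A minimal sketch of this a posteriori selection is given below. The function name and the handling of missing observations (time steps with NaN observations are skipped) are assumptions of this example, not a description of the scripts used in the study.

```python
import numpy as np

def best_member(members, obs):
    """Return (index, correlations) of the member best correlated with obs.

    members: array of shape (n_members, n_times); obs: array of length n_times.
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~np.isnan(obs)  # ignore gaps in the observations
    r = np.array([np.corrcoef(m[valid], obs[valid])[0, 1] for m in members])
    return int(np.nanargmax(r)), r

# Three hypothetical members compared with five observations.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
members = [[5, 4, 3, 2, 1], [1, 2, 3, 4, 6], [2, 2, 2, 3, 2]]
index, r = best_member(members, obs)
```

Note that a high correlation does not exclude a systematic bias, so the best member in this sense is not necessarily the one with the lowest RMS error.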
In addition to those performance indicators, scatter plots were drawn. Values taken by the
observation dataset are set on the horizontal axis and predictions from the model on the vertical axis. A
linear regression was performed each time in order to estimate the deviation from observations.
This regression relies on an iteratively reweighted least squares algorithm, implemented in MATLAB
toolboxes, which gives less weight to outliers.
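The principle of iteratively reweighted least squares can be sketched as follows. The report does not state which weight function was used; bisquare weights with a tuning constant of 4.685 are assumed here, and the sketch omits refinements found in toolbox implementations (such as leverage adjustments of the residuals), so its results may differ from those of the study.

```python
import numpy as np

def robust_line_fit(x, y, tuning=4.685, n_iter=50, tol=1e-8):
    """Fit y = intercept + slope * x, down-weighting outliers (bisquare weights)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    weights = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        # Weighted least squares step.
        sw = np.sqrt(weights)
        new_beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        resid = y - design @ new_beta
        # Robust scale estimate from the median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale < 1e-12:  # essentially perfect fit on the retained points
            beta = new_beta
            break
        u = resid / (tuning * scale)
        weights = np.where(np.abs(u) < 1.0, (1.0 - u ** 2) ** 2, 0.0)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return float(beta[0]), float(beta[1])

# A straight line with one gross outlier: the robust fit should ignore it.
x = np.arange(20.0)
y = 0.5 + 2.0 * x
y[5] += 50.0
intercept, slope = robust_line_fit(x, y)
```

An ordinary least squares fit on the same data would be pulled towards the outlier, which is the behavior the reweighting is meant to avoid.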
Q-Q plots were also used; these diagrams consist in plotting the quantiles of two variables
against each other, so that probability distributions can be compared easily. The same algorithm as
for scatter plots was used to produce regression lines of datasets in Q-Q plots.
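The points of such a Q-Q plot can be obtained as sketched below. The function name and the choice of 99 evenly spaced probability levels are assumptions of this example.

```python
import numpy as np

def qq_pairs(obs, pred, n_quantiles=99):
    """Return matching quantiles of observations and predictions."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    return np.quantile(obs, probs), np.quantile(pred, probs)

# If predictions are exactly twice the observations, so are their quantiles.
obs = np.arange(101.0)
pred = 2.0 * obs
q_obs, q_pred = qq_pairs(obs, pred)
```

Plotting `q_pred` against `q_obs` gives the Q-Q diagram; points on the 1:1 line indicate identical distributions.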
In addition to those diagrams, a common analysis performed on climate-related variables is
the persistence analysis, also known as detection of climatic windows. It consists in studying the
ability of the forecast to successfully predict events during which climatic conditions (here
significant wave height) stay below a given threshold for a given period of time. This analysis,
which is essential for offshore operations, was carried out with a simple algorithm I implemented.
It consists in browsing the time-series of Hs and saving dates consistent with a climatic-window
pattern. A beginning date is a date at which Hs is below the threshold, as it is at the directly
following date, while at the preceding date Hs is above the threshold. An ending date is the last of two
successive dates at which Hs is still below the threshold while, at the following one, significant height
rises above it.
Figure 2-4: Sketch of the pattern used for persistence analysis
All climatic windows of any duration are recorded thanks to this pattern. A simple subtraction of the
serial date numbers of corresponding ending and beginning dates then makes it possible to separate
windows of various durations and to count them.
Borderline cases where no ending date could be found before the end of the time-series were
solved by imposing the end of the dataset as an arbitrary ending date. The minimum size of a detected
window is twice the temporal resolution of data (12 h window for the 6 h outputs).
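The detection pattern described above can be sketched as follows. The function name is hypothetical, values exactly equal to the threshold are treated as above it, and a series that starts below the threshold is not counted as a window since it has no preceding date above threshold; these are assumptions of the sketch, and gaps in the data are not handled.

```python
def climatic_windows(times, hs, threshold):
    """Return (begin, end) pairs of windows where hs stays below threshold."""
    windows = []
    n = len(hs)
    i = 1
    while i < n - 1:
        # Beginning date: above threshold just before, below at i and at i + 1.
        if hs[i - 1] >= threshold and hs[i] < threshold and hs[i + 1] < threshold:
            j = i + 1
            # Advance while the next sample is still below the threshold.
            while j + 1 < n and hs[j + 1] < threshold:
                j += 1
            # j is the last below-threshold sample, or the end of the series.
            windows.append((times[i], times[j]))
            i = j + 1
        else:
            i += 1
    return windows

# 6-hourly example (times in hours) with a threshold of 2 m.
times = [0, 6, 12, 18, 24, 30, 36, 42, 48]
hs = [3, 1, 1, 3, 1, 3, 1, 1, 1]
windows = climatic_windows(times, hs, threshold=2)
```

Here the isolated calm sample at 24 h is ignored, and the last window is closed at the end of the series, as in the borderline case described above.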
Along with this detection, another performance indicator of persistence analysis was computed: the
equivalent percentage uptime. It represents, as a percentage of the total number of hours of each
month, the amount of time during which the environmental parameter (here significant height) stays
below a given threshold.
In order to perform this computation, each time step at which Hs is below the threshold is
associated with a “Flag 1”, while time steps which do not satisfy the condition are given a
“Flag 0”. The percentage of “Flag 1” gives the equivalent percentage uptime as defined above.
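The flag-based computation can be sketched as follows. The function name and the month labels are assumptions of this example, and the series is assumed to be regularly sampled without gaps, so that the share of flagged time steps equals the share of hours.

```python
import numpy as np

def monthly_uptime(month_labels, hs, threshold):
    """Equivalent percentage uptime per month: share of time steps with hs < threshold."""
    month_labels = np.asarray(month_labels)
    flags = (np.asarray(hs, dtype=float) < threshold).astype(int)  # Flag 1 / Flag 0
    return {
        str(month): 100.0 * flags[month_labels == month].mean()
        for month in np.unique(month_labels)
    }

# Two hypothetical months with four time steps each and a 2 m threshold.
months = ["2013-01"] * 4 + ["2013-02"] * 4
hs = [1, 1, 3, 3, 1, 1, 1, 3]
uptime = monthly_uptime(months, hs, threshold=2)
```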
[Figure 2-4 content: Date0 (Hs > T), Date1 (Hs < T, beginning date), DateX (Hs < T, ending date),
DateY (Hs > T); T: threshold]
3 Validation of state-of-the-art wave ensemble forecast
The purpose of this section is to compare state-of-the-art NCEP and FNMOC
ensemble forecasts with buoy data and with a higher resolution deterministic forecast. Both areas
of interest, that is to say the North Sea and Taiwan regions, will be treated separately using various
performance indicators.
To avoid redundancy and limit length, only some of the relevant figures are shown in this paper.
3-1 In the North Sea
The area is characterized by high wave events with Hs reaching values higher than 7 meters.
Those events represent a particularly sensitive matter for this study and will therefore be given top
priority during the project.
Figure 3-1 illustrates the maximum, minimum and median values predicted by the ensemble forecast
over a period covering several of these events, along with observations and ACTIMAR deterministic
forecast predictions at buoy B62164.
Figure 3-1: Observations and deterministic forecast along with maximum, minimum and median values
predicted by NCEP ensemble forecast from the 10th of January 2013 to the 1st of April 2013 at buoy
B62164
High wave events with Hs exceeding 6 meters are detected on 18/01/2013, 05/02/2013, 14/02/2013
and 19/02/2013, along with lower Hs peaks. Except for the first one, at all these events both
deterministic and ensemble predictions strongly underestimated the significant height, by 1.5 to
3 meters. This tendency is confirmed at smaller Hs peaks, at which predictions often turn out to be
below observations. Furthermore, deterministic predicted values appear to be 0.5 to 1 meter lower
than the ones predicted by the ensemble forecast, that is, lower than the minimum value of the
ensemble.
Another remarkable fact is the low spread of the ensemble over most of the visualized period;
indeed, the variations between minimum and maximum values of the ensemble never exceed
0.30 meters, and the curves regularly appear superimposed (variations lower than 10 cm within the
ensemble).
During lower Hs events, predictions are generally closer to observations and show approximately
the same order of magnitude for differences between minimum and maximum values of the forecast,
with significant variations from date to date. For instance, these differences amount to more than
0.5 m on 15/01/2013 but lead to almost superimposed curves on 17/01/2013.
Figure 3-2 presents a box-and-whiskers representation of the ensemble predictions focusing on a
smaller time period still at buoy B62164 – refer to Section 2-3 for further information about this
representation.
Figure 3-2: Observations and deterministic forecast along with box-and-whiskers representation of NCEP ensemble forecast from the 1st of March 2012 to the 22nd of March 2012 at buoy B62164.
Over this period, two relatively high wave events were recorded, on 08/03/2013 and on 19/03/2013. In both cases, both deterministic and ensemble predictions were below observations. It is interesting to notice that predictions seem to be closer to observations, and to show smaller variability, while Hs decreases than while it increases. Except on two dates, boxes and whiskers are short: values in the ensemble show very little spread. On dates where high Hs values were recorded, this indicates a lack of variability, since these are the periods where the sea state is the most unpredictable and variability should therefore be at its highest.
As previously observed, the deterministic forecast predicted significant heights even lower than the ensemble forecast, typically by 0.5 meters.
A similar tendency to underestimate significant height during high wave events is observed at buoy B62145. The highest Hs values recorded by the buoy amount to approximately 5 meters while predictions often stay 1 meter below that level; many lower peaks were however well predicted.
It also appears that variability within the ensemble is higher at this buoy, as minimum and maximum predicted values are generally further from one another, as shown on Figure 3-3.
Figure 3-3: Observations and deterministic forecast along with box-and-whiskers representation of NCEP ensemble forecast from 22nd of November 2012 to the 2nd of December 2012 at buoy B62145.
Both boxes and whiskers are longer, especially during higher wave events. Higher ensemble variability is noticed on 25/11/2012 and 28/11/2012, which is consistent with the Hs peaks of 5.2 and 3 meters recorded by the buoy.
Other buoys show intermediate behaviors in terms of variability. The underestimating tendency is nevertheless recurrent at all locations. Scatter plots (Figure 3-4) make it possible to quantify this tendency by plotting predicted Hs values against observations. The regression line computed on the dataset gives an indication of the deviation from measurements.
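As an illustration of how such a regression line can be obtained, the sketch below fits an ordinary least-squares line to pairs of observed and predicted Hs. The function name and the sample values are hypothetical, and ordinary least squares is assumed here since the report does not specify the fitting method.

```python
def fit_regression_line(observed, predicted):
    """Least-squares line: predicted = intercept + slope * observed."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sum of squared deviations of observations and cross-products with predictions.
    s_xx = sum((o - mean_obs) ** 2 for o in observed)
    s_xy = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    slope = s_xy / s_xx
    intercept = mean_pred - slope * mean_obs
    return slope, intercept


# Hypothetical Hs pairs (meters); high waves are deliberately under-predicted.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.0, 1.9, 2.8, 3.6, 4.3, 5.0]
slope, intercept = fit_regression_line(observed, predicted)
```

With these values the fitted line has a slope below 1, so it falls below the identity line at high Hs, which is the signature of the underestimating tendency discussed here.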
The first characteristic of the scatter plots to be mentioned is the position of the regression line, which falls below the identity line at high Hs values. Hence, observations are statistically higher than predictions during high wave events. Furthermore, although the best member and the median value give similar results, the median value of the forecast appears closer to observations than the other indicators, as its correlation coefficient is higher. Its regression line is also closer to the identity line.
The maximum value sometimes gives good results for 0-24h predictions, but always falls far from observations at higher lead times.
For this reason, the median value of the ensemble is considered the most effective way to characterize the forecast for the purpose of this study. The table in Appendix B gathers all RMSE and correlation coefficient values at all buoys and lead times and illustrates well the effectiveness of the median value compared to the other indicators.
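The two scores used in this comparison can be sketched as follows. The ensemble and observation values are hypothetical, and the Pearson correlation coefficient is assumed, since the report does not state which correlation coefficient was used.

```python
import math
import statistics


def rmse(predicted, observed):
    """Root mean square error between two series of equal length."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def correlation(predicted, observed):
    """Pearson correlation coefficient (assumed definition)."""
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical data: ensemble[t] holds the Hs (meters) of every member at time step t.
ensemble = [[1.0, 1.2, 1.1], [2.0, 2.4, 2.1], [3.1, 3.6, 3.2], [2.2, 2.9, 2.4]]
observed = [1.1, 2.2, 3.4, 2.3]

median_series = [statistics.median(members) for members in ensemble]
maximum_series = [max(members) for members in ensemble]

rmse_median = rmse(median_series, observed)
rmse_maximum = rmse(maximum_series, observed)
corr_median = correlation(median_series, observed)
```

Each indicator (median, maximum, best member) is reduced to one time series first, so that the same two scores can be compared across indicators, buoys and lead times.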
At all buoys, the regression lines of the median value dataset are slightly below the identity line, with correlation coefficients still higher than for the other indicators.
The underestimating tendency is confirmed, even if its impact does not seem as strong on scatter plots as on time series.
Figure 3-4: Scatter plots of best member, median value and maximum value of ensemble forecast (0-24h) against
observations at buoy B62145.
Using the median value as the characteristic quantity of the ensemble performance, comparisons with deterministic predictions were performed. Scatter plots of both median and deterministic predictions are drawn on Figures 3-5 & 3-6.
The median value is systematically better at high Hs at all lead times. Even if deterministic forecasts seem to show lower scatter than the median value, they systematically underestimate high Hs values, and median values appear to be more appropriate for medium to long range predictions.
Figure 3-5: Scatter plots of median value of ensemble and deterministic forecast (0-24h & 48-72h) against
observations at buoy B62127 and B62164.
Results of the persistence analysis performed at two buoys in the North Sea are plotted below on Figure 3-6. The number of detected windows represents the total number of 12h, 24h, 36h, 48h and 72h windows.
The probability density function of the number of windows was computed from the values given by all members of the forecast, using a kernel smoothing density estimator implemented in Matlab toolboxes.
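For readers unfamiliar with kernel smoothing, a minimal Gaussian kernel density estimate is sketched below. It is not the Matlab implementation used in the study: the Gaussian kernel, the rule-of-thumb bandwidth 1.06 σ n^(-1/5) and the window counts are all assumptions made for illustration.

```python
import math


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth 1.06 * std * n^(-1/5) (assumed choice)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated probability density at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical number of detected windows for each ensemble member.
window_counts = [410, 425, 430, 440, 455, 460, 470]
bandwidth = normal_reference_bandwidth(window_counts)
density = gaussian_kde(window_counts, bandwidth)
```

The resulting `density` function can be evaluated on a regular grid of window counts to draw a smooth curve, against which the observed and deterministic values are then positioned.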
Results vary from buoy to buoy, both in terms of number of windows and of quality of the prediction relative to observations. Ensemble predictions are however in good agreement with observations, except at buoy B62127 where the predicted total number was approximately 100 windows lower than the observed value.
The number of windows given by the deterministic forecast is also shown along with observation and ensemble values. At all buoys in the North Sea, the deterministic forecast gives a number of windows much higher than the observations and is further from them than the ensemble forecasts.
Tables with results for all buoys are given in Appendix B. Equivalent percentage uptime, as defined in Section 2, is given for each threshold and window duration for observations, the deterministic forecast and the minimum, median and maximum values of the ensemble forecast.
Figure 3-6: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoys
B62145 and B62127.
3-2 In Taiwan
In the area near Taiwan, significant heights are lower than in the North Sea, although Hs peaks happen more frequently.
Figures 3-7 & 3-8 show the maximum, minimum and median values predicted by the NCEP ensemble forecast over a period covering several of these events, along with the observations and ACTIMAR’s deterministic forecast at buoys C6V27 and 46699A.
Figure 3-7: Observations and deterministic forecast along with maximum, minimum and median values predicted by NCEP ensemble forecast from the 20th of October 2012 to the 29th of November 2012 at buoy C6V27
Figure 3-8: Observations and deterministic forecast along with maximum, minimum and median values predicted by NCEP ensemble forecast from the 30th of November 2012 to 9th of January 2013 at buoy 46699A
According to these figures, the behavior of predictions varies a lot from one buoy to another. A tendency to overestimate significant wave height may be noticed but is not clear on all time series and therefore has to be confirmed. Moreover, unlike in the North Sea, this tendency does not appear to be limited to high wave events but covers the whole dataset, and thus may originate from a systematic bias either in observations or in predictions.
Concerning the variability of the area in terms of Hs, peaks of higher waves can be observed every 3-5 days, from 1 meter above the average to more than 3 meters. The average significant wave height is relatively high, approximately 2 meters, and Hs seems to hardly ever fall below a 1 m threshold.
Figure 3-9 shows the observations, the deterministic prediction and the ensemble forecast as a box-and-whiskers representation at buoy 46757B.
Figure 3-9: Observations and deterministic forecast along with box-and-whiskers representation of NCEP ensemble forecast from 22nd of November 2012 to the 2nd of December 2012 at buoy 46757B.
Just as in the North Sea, variability within the ensemble is low and only increases during short periods. Unlike previously, Hs peaks occasionally seem to be shifted in time, in addition to the errors in the predicted height.
Variability also seems to be higher while Hs increases, which is consistent with the tendency observed in the North Sea.
Predictions alternately fall above and below observations, and no global overestimating tendency can be confirmed. The deterministic forecast stays very close to ensemble predictions except on dates when Hs peaks are observed, where the deterministic prediction is systematically higher.
The time series give no evidence as to which forecast is closer to observations.
The median value of the ensemble was taken as the characteristic quantity of the forecast, as it gives statistically better results than the other indicators.
Figure 3-10: Scatter plots of median value of ensemble and deterministic forecast (0-24h & 48-72h) against
observations at buoy 46778A and C6V27.
According to the scatter plots on Figure 3-10, the ensemble forecast in Taiwan seems to give better results at both high and low Hs, with a higher correlation coefficient than the deterministic forecast.
Predictions tend to overestimate Hs values below a given threshold and to underestimate values above it. The threshold value varies with buoys and lead times, from 0.5 m to more than 3 meters.
Figure 3-11a: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoy
C6V27.
The persistence analysis performed on buoys in Taiwan follows the same method as the one used for the North Sea.
The total number of windows varies with the position of the buoys, as they are not exposed to the same sea states and ocean depths.
At all locations, results are slightly further from observations than they were in the North Sea but still in relatively good agreement. In terms of persistence analysis, the deterministic forecast regularly gives results similar to ensemble predictions in Taiwan.
Tables with results for all buoys are given in Appendix A. Equivalent percentage uptime, as defined in Section 2, is given for each threshold and window duration for observations, the deterministic forecast and the minimum, median and maximum values of the ensemble forecast.
Figure 3-11b: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoy
46757B.
3-3 Wind and wave variability
In both areas, ensemble predictions are characterized by a weak variability during high Hs events, which does not prevent them from predicting these events well compared to the deterministic prediction. Persistence analysis shows that the variability of the areas is well predicted.
In Taiwan, high Hs are often better predicted than lower ones, while the opposite applies in the North Sea. Considering all indicators, ensemble predictions are promising in an operational context as they regularly give better results than the higher resolution deterministic forecast. The lack of ensemble variability and the tendency to underestimate high significant heights however remain major obstacles to be overcome.
The origin of the low ensemble spread was studied in both areas. The main factor likely to underlie this trend is the existence of a similar tendency in the ensemble of wind fields. Time series of wave and wind ensembles were plotted together in a box-and-whiskers representation to qualitatively assess a possible link.
Figure 3-12: Wind magnitude, ensemble median along with observations and deterministic forecast at buoy
C6V27 from 12/12/2012 to 24/12/2012.
The lack of variability within the ensembles can be noticed on both wave and wind fields during high wave events. Even if a correlation seems to exist, it must be moderate, as an increase in the spread of the wind ensemble does not systematically lead to an equivalent increase in the wave ensemble, such as on the 16th of December 2012 at buoy C6V27 or on the 23rd of October 2012 (Figures 3-12 & 3-13).
Apart from such occasional events, which occur regularly in the time series, wind and wave ensemble spreads seem to be relatively well correlated in the North Sea as well as in Taiwan.
Figure 3-13: Wind magnitude, ensemble median along with observations and deterministic forecast at buoy
B62127 from 20/10/2012 to 30/10/2012.
To quantify the correlation between both ensembles, scatter plots and QQ-plots of the wind and
wave normalized amplitudes were drawn as illustrated on Figure 3-14.
Normalization of amplitudes used the standard score formulation as given below:
$$Z = \frac{X - \bar{X}}{\sigma}$$
where $Z$ is the normalized value, $X$ represents the raw value of the population, $\bar{X}$ is the mean of the population and $\sigma$ its standard deviation.
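A minimal sketch of this normalization is given below. The function name and the sample amplitudes are hypothetical, and the population standard deviation (division by n) is assumed, following the wording of the definition.

```python
import math


def standard_scores(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0:
        raise ValueError("standard deviation is zero; scores are undefined")
    return [(v - mean) / sigma for v in values]


# Hypothetical amplitudes; their mean is 5 and their population standard deviation is 2.
amplitudes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standard_scores(amplitudes)
```

Normalizing wind and wave amplitudes in this way puts both on the same dimensionless scale, which is what makes a scatter plot or QQ-plot of one against the other meaningful.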
Figure 3-14: Scatter plot and QQ-plot of normalized wave amplitude on normalized wind amplitude at buoy
B62145.
These scatter plots and quantile-quantile diagrams were similar at all buoys despite small variations in the computed regression coefficients.
In all cases the regression line of the scatter plot falls below the identity line, with a slope varying from 0.3 to 0.6, hence indicating that the spread of the wave ensemble is lower than that of the wind ensemble.
The distributions of amplitudes computed from both ensembles are however well correlated, as values in the QQ-plot gather around the identity line except for the highest values.
Variability of wind ensembles may then be a major factor underlying the weak variability observed in significant wave height ensembles; however, it does not fully explain this low spread, as the correlation exists but is moderate.
Ensemble forecasts are promising in an operational context, but their resolution does not seem sufficient to best predict high wave events or structures with small temporal and spatial extent. Wind forcing resolution also appears to be critical for wave forecasts.
It also appeared that the median value of the forecast is effective for characterizing its performance for medium to long range predictions (48h, 72h and 96h).
Ensemble forecast performance has thus proven insufficient for extremely precise operational forecasting: high wave events are underestimated and the ensemble shows less variability than expected, which may lead to the loss of one of the greatest strengths of ensemble forecasting.
Thus, ways to improve these forecasts without increasing computational cost have to be found, either by acting directly on these forecasts or by running higher resolution ones.
4 Improvement of Wave Ensemble Forecasts
4-1 Improvement of State-of-the-art forecasts
Ensemble forecasting comes at a very high computational cost as it can involve up to several dozen simulations. Indeed, each member of a wave ensemble forecast originates from an independent perturbed wind forcing field and an initial sea state. These wind fields often come from the outputs of an atmospheric ensemble forecast run in parallel in the same forecast center. Hence, each operational ensemble forecast normally needs twice as many simulations as it has members. However, some solutions exist to reduce this number of simulations: either improving existing forecasts directly or generating wind forcing fields faster.
Linear Shift
As predictions at all locations considered tend to underestimate significant height during high wave events, a common pattern was sought in the distributions of all Hs time series in order to find a simple linear transformation applicable to all datasets.
QQ-plots of predicted Hs values against observations were drawn at all buoys for 0-24h, 24-48h, 48-72h and 72-96h lead times, and a linear regression was performed on each dataset.
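The construction of the points of such a QQ-plot can be sketched as follows. The function name, the number of quantiles and the Hs values are hypothetical; the sketch relies on `statistics.quantiles` from the Python standard library (Python 3.8 or later) rather than on the Matlab tools used in the study.

```python
import statistics


def qq_pairs(predicted, observed, n_quantiles=10):
    """Pair matching quantiles of the two samples: (observed, predicted)."""
    q_pred = statistics.quantiles(predicted, n=n_quantiles, method="inclusive")
    q_obs = statistics.quantiles(observed, n=n_quantiles, method="inclusive")
    return list(zip(q_obs, q_pred))


# Hypothetical Hs samples (meters). The predictions are 10 % lower than the
# observations and given in a different order: only distributions are compared.
observed = [0.8, 1.1, 1.5, 1.9, 2.4, 2.6, 3.0, 3.7, 4.2, 5.1, 5.8]
predicted = [0.9 * hs for hs in reversed(observed)]
pairs = qq_pairs(predicted, observed)
```

Unlike a scatter plot, the pairs do not depend on the time ordering of the samples, so a regression line fitted to them describes how the predicted distribution is stretched or shifted relative to the observed one.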
Figure 4-1: QQ-plot of wave ensemble quantiles on observation data quantiles for 0-24h, 24-48h, 48-72h, 72-96h
predictions at buoy 46778A.
The shape of the quantile-quantile diagram at buoy 46778A suggests that the predicted distribution is non-linearly related to the distribution of observations, and characterized by a positive skew according to the observed concavity.
Linear regression was nevertheless performed on the dataset in order to fit the distribution of observations as closely as possible with the simplest transformation.
Intercepts of the regression lines at buoy 46778A vary from -0.0051 to 0.11 and slopes from 0.72 to 0.94. Thus, no single simple transformation can be applied to predictions at all lead times.
The same exercise was done at other buoys including B62164. The concavity observed at buoy B62164 is opposite to that at 46778A; the correlation between predicted and observed distributions is however higher, as the point cloud is gathered closer to the identity line.
Slopes of the regression lines vary from 0.89 to 0.94 and intercepts from 0.051 to 0.23. All values for regression lines at all lead times and buoys are given in Table 4-1.
Figure 4-2: QQ-plot of wave ensemble quantiles on observation data quantiles for 0-24h, 24-48h, 48-72h, 72-96h predictions at buoy B62164.
As the coefficients needed to shift all datasets closer to observations vary a lot with location and lead time of the forecast, it seems impossible to easily improve the performance of state-of-the-art forecasts with a simple linear transformation.
Buoy     1 day lead time       2 days lead time      3 days lead time      4 days lead time
B62145   Y=0.057 + X*0.940     Y=0.003 + X*1.000     Y=-0.007 + X*1.000    Y=-0.021 + X*1.100
B62164   Y=0.080 + X*0.890     Y=0.051 + X*0.920     Y=0.058 + X*0.940     Y=0.230 + X*0.900
B62127   Y=0.062 + X*1.000     Y=0.025 + X*1.100     Y=0.015 + X*1.100     Y=-0.067 + X*1.200
B63113   Y=0.240 + X*1.000     Y=0.260 + X*1.000     Y=0.058 + X*1.100     Y=-0.096 + X*1.200
46778A   Y=-0.005 + X*0.940    Y=0.056 + X*0.810     Y=0.081 + X*0.760     Y=0.110 + X*0.720
46699A   Y=0.140 + X*1.400     Y=0.240 + X*1.200     Y=0.280 + X*1.200     Y=0.280 + X*1.200
46757B   Y=-0.110 + X*1.000    Y=-0.054 + X*0.980    Y=-0.053 + X*0.940    Y=-0.048 + X*0.960
C6V27    Y=0.290 + X*0.950     Y=0.290 + X*0.940     Y=0.260 + X*0.950     Y=0.270 + X*0.970
Table 4-1: Slopes and intercepts of regression lines from QQ-plots for J0 to J3 at all buoys
This method may be applied for local conditions when enough time is available to perform a climatological analysis; this is however not generally the case in an operational context. Therefore, the solution was not retained.
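To make the dependence on lead time concrete, the sketch below inverts the regression lines of buoy B62164 from Table 4-1 to correct a predicted Hs. It assumes that Y stands for the predicted quantile and X for the observed one, which the table does not state explicitly; the function name and the 5 m input value are hypothetical.

```python
# (intercept, slope) of Y = intercept + X * slope for buoy B62164, taken from
# Table 4-1, indexed by lead time in days.
B62164_FITS = {
    1: (0.080, 0.890),
    2: (0.051, 0.920),
    3: (0.058, 0.940),
    4: (0.230, 0.900),
}


def corrected_hs(predicted_hs, lead_day, fits=B62164_FITS):
    """Invert the regression line, assuming Y is predicted and X is observed."""
    intercept, slope = fits[lead_day]
    return (predicted_hs - intercept) / slope


# The same 5 m prediction corrected with the coefficients of each lead time.
corrections = {day: corrected_hs(5.0, day) for day in B62164_FITS}
```

Under this assumption, the same 5 m prediction is corrected to noticeably different values depending on the lead time, and other buoys would require yet other coefficients, which illustrates why no single transformation fits all cases.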
Empirical Orthogonal Functions
An alternative way of creating wind ensembles, based on Empirical Orthogonal Functions (EOF) and requiring far less computational time, was also considered. Unfortunately, there was not enough time during my master’s thesis to fully investigate this method and assess its performance. Only theoretical aspects were studied.
EOFs are orthogonal basis functions of a signal or a dataset, each accounting for as much variance as possible. They are typically obtained by computing the eigenvectors of the covariance matrix of the dataset.
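As an illustration of this definition, the sketch below computes the leading EOF of a small dataset by power iteration on its covariance matrix. The data values are hypothetical, and power iteration is only one simple way of obtaining the dominant eigenvector; a full eigendecomposition would be used to extract all EOFs.

```python
import math


def covariance_matrix(samples):
    """Sample covariance between grid points; one row of `samples` per time step."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def mat_vec(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def leading_eof(cov, iterations=200):
    """Power iteration: (eigenvalue, unit eigenvector) of the largest variance."""
    p = len(cov)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        product = mat_vec(cov, vec)
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    eigenvalue = sum(v * x for v, x in zip(vec, mat_vec(cov, vec)))
    return eigenvalue, vec


# Hypothetical anomalies at 3 grid points over 6 time steps.
samples = [[2.0, 2.0, 0.1], [-2.0, -2.0, -0.1], [1.0, 1.0, 0.0],
           [-1.0, -1.0, 0.0], [0.0, 0.0, 0.2], [0.0, 0.0, -0.2]]
cov = covariance_matrix(samples)
eigenvalue, eof = leading_eof(cov)
```

In this example the first two grid points vary together, so the leading EOF has nearly equal weights on them and accounts for almost all of the total variance (the sum of the diagonal of the covariance matrix).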
With this method, perturbed wind fields are generated from one single unperturbed wind field. The spatial structure of perturbations is given by the EOFs, thus maintaining most spatial properties of the undisturbed field throughout the process. The method is often known as geographically weighted principal component analysis (PCA) in geophysics.
The principle of the method developed in this study is to run the atmospheric model in the chosen area over a long period, typically one or several years, once and for all. This initial run would give the overall structure of errors in the area, taking into account every situation encountered within the period considered.
An ensemble representation of the error covariance is thus provided and analyzed in order to extract the eigenvectors and eigenvalues of the error covariance matrix. Each eigenvector represents a direction of the spatial structure of errors, and the higher the associated eigenvalue, the more variance the direction accounts for. By simply introducing random errors along the directions given by the EOFs in a single wind field, it is then possible to obtain up to several dozen wind ensemble members without any additional computational cost.
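The generation step could look like the sketch below, which was not implemented in the study. The EOFs, eigenvalues and base field are hypothetical, and drawing each amplitude from a normal distribution with variance equal to the eigenvalue is an assumption about how the random errors would be scaled.

```python
import math
import random


def perturb_field(base_field, eofs, eigenvalues, rng):
    """Add one random perturbation along each EOF to a flattened wind field."""
    perturbed = list(base_field)
    for eof, eigenvalue in zip(eofs, eigenvalues):
        # Assumed scaling: amplitude ~ N(0, eigenvalue), so that directions
        # carrying more error variance receive larger perturbations.
        amplitude = rng.gauss(0.0, math.sqrt(eigenvalue))
        for i, weight in enumerate(eof):
            perturbed[i] += amplitude * weight
    return perturbed


# Hypothetical unperturbed wind speed (m/s) at 4 grid points and two
# orthonormal EOFs with their eigenvalues.
base_field = [8.0, 9.0, 10.0, 11.0]
eofs = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
eigenvalues = [1.0, 0.25]

rng = random.Random(0)
members = [perturb_field(base_field, eofs, eigenvalues, rng) for _ in range(20)]
```

Each member costs only a few additions per grid point, compared with a full atmospheric simulation per member in a conventional ensemble.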
Combined with the clustering algorithm, this method may make it possible to create even more accurate reduced forecasts. Indeed, EOF based forecasts can easily reach more than 20 members, so that more members are merged together in the clustering phase, which makes the choice of clusters more efficient.
4-2 High resolution wave ensemble forecast
In the previous sections, the performance of existing forecasts was investigated, along with statistical methods to reduce the computational cost of ensemble forecasts. Still, the question of the performance of high resolution operational ensemble forecasts remains unanswered. This question was studied with predictions made in the Tierra del Fuego area as defined in Section 2-2. The main aim of this section is to study how forecasts are improved when wind forcing fields and bathymetry are refined.
High resolution wave ensemble forecast validation
Validation procedures were based on satellite wave observations of the area measured by Jason-2 during March and April 2013.
So far, results of the WW3 runs performed at high resolution are not satisfactory, as can be observed on Figures 4-3 and 4-4, which show the main characteristics of the comparison of the GEOWaFS and High Resolution forecasts respectively with the observations from a swath acquired around the 10th of March 2013.
Figure 4-3: Satellite measurement and GEOWaFS forecast around the 10th of March 2013 in Tierra del Fuego.
Figure 4-4: Satellite measurement and ACTIMAR’s forecast around the 10th of March 2013 in Tierra del Fuego.
From these figures, two points can be made. On the swath considered, the ensemble predicted by GEOWaFS seems to be in better agreement with the Jason-2 satellite observations: its mean RMSE and mean correlation coefficient with measurements are 0.20 and 0.65 respectively, against 0.39 and 0.62 for ACTIMAR’s high resolution forecast. Therefore, so far, the high resolution predictions are not accurate enough to outperform state-of-the-art forecasts. The causes of these disappointing results may be various and are discussed further in this thesis.
The second point is the higher variability observed in the high resolution ensemble. As was shown previously, a sufficient spread is needed to ensure good quality of results, and this condition is satisfied. In this respect, the high resolution forecast fulfils our expectations. The process used to run these forecasts nevertheless has to be revised.
The limits of the high resolution forecasts are various. The choice of model parameters may be partly responsible. Indeed, the parameters used in the WaveWatch III model are the same as those used for the global deterministic wave forecast at ACTIMAR; more accurate results may be obtained by tuning these parameters to best fit the Tierra del Fuego area on the nested high resolution grid. A complete study of the influence of these parameters should then be performed to find the best possible combination.
The input wind fields may be another limiting factor. As their resolution is lower than that of the grid, many phenomena might be badly predicted. The validity of the wind ensemble also has to be checked against in-situ observations and against NOAA’s wind fields. A bias in the wind fields could easily explain the poor quality of results.
The lack of time did not permit a full investigation of all aspects of high resolution forecasts. Directions for further research include, amongst others, studying separately the influence of improving the wind forcing and the bathymetry.
Clustering
In the context of high resolution ensemble forecasts, computational times, already high in all ensemble forecasts, become critical and must be reduced to ensure operational viability.
As mentioned previously in this paper, ensemble members are chosen in order to reflect the uncertainty in the observations. Considering how important the variability of wind forcing is in this respect, it makes sense to try to maintain the widest range of wind magnitude values in the different input wind fields.
Therefore, a way to reduce the number of wind fields without deteriorating wind variability was investigated. Based on classification methods, it consists in merging members with similar variability together.
The k-means clustering method was retained amongst many other possibilities as it was easy to adapt to the context of ensemble forecasting and is known to converge quickly with the right heuristic algorithms despite its computational complexity. The principle of this method is very simple, which makes it easy to study and use.
The Matlab algorithm used for the purpose of this study consists of two iterative steps:
• An assignment step, which consists in assigning each observation to the cluster whose mean, called the centroid, is the closest. The squared Euclidean distance was used to compute distances in the dataset.
• An update step, during which the centroids of the clusters are computed again by including the newly added points in the averaging.
The algorithm stops and is said to have converged when the assignment no longer changes.
In other words, the algorithm tends to minimize the sum of within-cluster distances from points to the centroids.
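The two steps above can be sketched as follows. This is a generic illustration and not the Matlab routine used in the study: the function names and data are hypothetical, and the initial centroids are assumed to be drawn at random among the data points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iterations=100):
    """Single k-means run; returns (assignment, centroids, cost)."""
    # Assumed initialization: k distinct data points taken as first centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: the assignment no longer changes
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Sum of within-cluster squared distances, the quantity being minimized.
    cost = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, cost


# Hypothetical data: two well separated groups of three points.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
assignment, centroids, cost = kmeans(points, k=2, rng=random.Random(1))
```

In the context of this study, each "point" would be one ensemble member described by all its wind values, so that members with similar wind fields end up in the same cluster.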
The clustering is performed by heuristic algorithms which do not always find the best solution but an approximate one. A local minimum of the sum of within-cluster distances can be found instead of the global one, the convergence towards one solution or another being mostly related to the initial partition of the dataset. As described in the literature [Gong and Richman 1995], k-means algorithms are indeed very sensitive to the initial partition, that is, the first guess of the cluster partition of the dataset.
An easy way to address this problem, made possible by the quick convergence of the algorithm, is to run it several times with different initial partitions and keep the lowest local minimum. 2000 runs of the k-means algorithm were thus performed on the dataset in order to best approximate the global minimum. The best solution amongst all these local minima is then considered as the best partition of the data.
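The restart strategy can be sketched as below. The compact k-means routine, the data and the number of restarts (50 rather than the 2000 used in the study) are illustrative assumptions; initial centroids are drawn at random among the data points.

```python
import random


def run_kmeans(points, k, rng, max_iterations=100):
    """One k-means run from a random initial guess; returns (cost, assignment)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(
        sum((x - y) ** 2 for x, y in zip(p, centroids[a]))
        for p, a in zip(points, assignment)
    )
    return cost, assignment


def best_of_restarts(points, k, n_restarts, seed=0):
    """Repeat k-means and keep the run with the lowest within-cluster cost."""
    rng = random.Random(seed)
    runs = [run_kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[0]), runs


# Hypothetical 1-D data made of three pairs of close points.
points = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
(best_cost, best_assignment), all_runs = best_of_restarts(points, k=3, n_restarts=50)
```

A run initialized with two centroids in the same pair can stay trapped in a partition that merges two pairs, which is the kind of local minimum that the restarts are meant to discard.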
In the context of operational ensemble forecasting, the clustering should be performed for each day of simulation and take into account predictions at all lead times in order to best adapt to spatial and temporal variability. To meet this need, the dataset on which the clustering is performed consists of a 2D table where each row holds the wind magnitudes of one component for one member of the wind ensemble. Each column stands for one grid point at a given date. Figure 4-5 illustrates this layout.
Figure 4-5: Layout of input dataset for clustering. X and Y stand for the number of grid points in longitude and
latitude respectively, U(i , j) represents predicted wind magnitude in zonal direction at grid point (x=i , y=j).
(J0,T0) to (J4,T0) are the lead times of the prediction with (J0,T0) = 00h00, (J0,T1) = 06h00, …, (J1,T0) = 24h00.
For the purpose of this study, the wind fields were taken on 24 points in latitude and 34 points in longitude, with a spatial resolution of 1°x1° and a temporal resolution of 6 hours. Predictions were used at lead times up to 4 days (0-96h). 20 different wind fields were available from WRF simulations.
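The layout of Figure 4-5 amounts to flattening, for each member, the successive wind fields into one long row. The sketch below uses a hypothetical nested-list structure and tiny dimensions; the order in which grid points are flattened within a time step is an assumption.

```python
def build_clustering_table(wind_fields):
    """Flatten wind_fields[member][lead_time][lat][lon] into one row per member."""
    table = []
    for member in wind_fields:
        row = []
        for field in member:       # successive lead times: (J0,T0), (J0,T1), ...
            for lat_row in field:  # grid points of this lead time, latitude by latitude
                row.extend(lat_row)
        table.append(row)
    return table


# Hypothetical toy ensemble: 3 members, 2 lead times, 2 x 3 grid points.
n_members, n_times, n_lat, n_lon = 3, 2, 2, 3
wind_fields = [
    [[[float(m + t + j + i) for i in range(n_lon)] for j in range(n_lat)]
     for t in range(n_times)]
    for m in range(n_members)
]
table = build_clustering_table(wind_fields)
```

With the dimensions used in the study, each lead time contributes 24 x 34 = 816 columns and the table has 20 rows per wind component.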
Another detail to be mentioned concerning the k-means clustering method is that the number of clusters to be computed is an input parameter. Thus, preliminary runs had to be made in order to estimate the optimal number of clusters, balancing the future computational cost against the accuracy of the solution.
From the initial ensemble of 20 wind fields, it rapidly appeared that creating fewer than 7 clusters led to a significant lack of wind variability. With more than 11 clusters, each additional member added only little information to the overall ensemble, as the distribution of amplitude values was already close to that of the initial members.
[Figure 4-5 layout: rows are the ensemble members (1st member, 2nd member, ...); columns are successive blocks of X * Y grid points at (J0,T0), (J0,T1), (J0,T2), (J0,T3), ..., each block holding the values U(i , j), U(i , j+n), U(i+n , j), U(i+n , j+n), ...]
Figure 4-6: Probability Density Function of the amplitude of 7 and 10 clusters ensembles and correlation
coefficient with initial ensemble of 20 members.
It appeared that the within-ensemble wind amplitudes obtained from the ensemble of 7 clusters are significantly lower than those of both the ensemble of 10 clusters and the 20 members, as illustrated by the distributions on Figure 4-6, the 10-clusters ensemble being slightly closer to the initial ensemble as shown by the higher correlation coefficient. Considering only wind distributions, the gain in performance using the 10-clusters ensemble seems to outweigh the additional computational cost.
The same analysis was however performed after computation of the respective wave predictions.
Significant wave height predictions of the model using from 7 to 11 clusters were thus compared in
order to estimate the gain in performances, keeping computational cost in mind.
Figure 4-5 presents a box-and-whiskers representation of waves ensemble composed of 20
members, 7 clusters and 10 clusters respectively on a J0 to J3 prediction.
Unlike what could be inferred from wind distributions, wave forecasts are very close to one another
when computed from 7 or 10 clusters. Moreover, they are also close to the initial 20-members
forecast except from a few dates at the end of the forecast around the 5th
of July 2013.
Unfortunately, as this analysis took place relatively late in my master's thesis, it covers only a
very short time period. Still, it gives a good preview of the expected performance of such a method,
which is very satisfactory.
In the remainder of this study, 7-cluster ensembles were used, as they require less computation and
seem to give good results.
Figure 4-7: Box-and-whiskers representation of 7-cluster, 10-cluster and 20-member ensemble forecasts from
the 1st of July 2013 to the 6th of July 2013.
As mentioned previously in this study, the median value has proven to be the most effective
characteristic figure of the forecasts. It is therefore necessary to study the impact of
clustering on the median of the ensembles. Figure 4-8 presents the results for a 7-cluster ensemble.
The median values of the initial forecast from the 1st of July 2013 to the 6th of July 2013
are plotted along with the median of a 7-cluster ensemble and of a weighted 7-cluster ensemble. The
latter is built by replicating each value of the 7-cluster ensemble as many times as the
number of initial members merged into the corresponding cluster. In other words, if the 1st cluster
gathers 4 different members, its value is repeated 4 times in the weighted ensemble. The result is
a 20-member ensemble filled with cluster values, which takes into account the probability of
each cluster value occurring.
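The construction of the weighted ensemble can be sketched as below. The Hs values and cluster sizes are hypothetical and chosen only so that the sizes sum to 20; the function name is also an assumption.

```python
import statistics

def weighted_ensemble(cluster_values, cluster_sizes):
    """Repeat each cluster value as many times as the cluster has members."""
    expanded = []
    for value, size in zip(cluster_values, cluster_sizes):
        expanded.extend([value] * size)
    return expanded

# Hypothetical Hs values (m) of 7 clusters at one time step, and the number
# of initial members merged into each cluster (the sizes sum to 20).
hs_clusters = [1.6, 1.9, 2.0, 2.2, 2.4, 2.7, 3.1]
sizes = [1, 4, 5, 4, 3, 2, 1]

hs_weighted = weighted_ensemble(hs_clusters, sizes)
median_plain = statistics.median(hs_clusters)      # median of the 7 cluster values
median_weighted = statistics.median(hs_weighted)   # median of the 20 replicated values

print(len(hs_weighted), median_plain, round(median_weighted, 3))
```

In this example the two medians differ slightly because the largest clusters sit below the central cluster value; when cluster sizes are roughly balanced around the centre, the two medians coincide.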
Figure 4-8 shows that the median values computed from the initial ensemble and from a cluster
ensemble are much the same; the clustering therefore seems to preserve the variability of the
ensemble well throughout the process. The median value can thus also be used to characterize a
forecast composed of a cluster ensemble.
Figure 4-8: Median values of 20-member, 7-cluster and weighted 7-cluster ensembles from 2013/07/01 to
2013/07/06.
Once the number of clusters was set, the question of how best to use the output of the clustering
had to be addressed. The centroid positions and the list of ensemble members making up each
cluster are the main outputs of the method, along with the centroid-to-point distances,
which were used as an additional check of the results.
I investigated two methods:
In the first method, all members of a cluster were replaced by the cluster centroid.
Thus, each member had an influence on the final “cluster ensemble” weighted by its distance from
the centroid.
The main drawback of this method is that the final “cluster ensemble” appeared to be slightly
smoothed by the within-cluster averaging; the range of amplitudes (maximum of the ensemble minus
minimum of the ensemble) was therefore reduced. Moreover, this method required several data
conversions, from Matlab format to NetCDF and then to the specific format read by the wave
model (.wnd).
The second method consists of replacing all members of a cluster by the member closest to the
centroid, thus avoiding any averaging between members and any data conversion.
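The second method can be sketched as follows, assuming each wind field has been flattened into a vector and that cluster labels are already available. The function names and the toy data are assumptions, and the Euclidean distance is used here as the distance measure.

```python
import math

def centroid(members):
    """Point-wise mean of a list of equally long vectors."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]

def representative_members(fields, labels):
    """For each cluster label, return the index of the member nearest to its centroid."""
    representatives = {}
    for label in sorted(set(labels)):
        indices = [i for i, lab in enumerate(labels) if lab == label]
        c = centroid([fields[i] for i in indices])
        representatives[label] = min(indices, key=lambda i: math.dist(fields[i], c))
    return representatives

# Toy data: 6 "wind fields" reduced to 2 grid points each, grouped in 2 clusters.
fields = [
    [1.0, 1.0], [1.2, 0.8], [2.0, 2.0],   # cluster 0
    [5.0, 5.0], [5.5, 4.5], [5.2, 4.8],   # cluster 1
]
labels = [0, 0, 0, 1, 1, 1]

reps = representative_members(fields, labels)
print(reps)
```

Because the representative is an actual member of the initial ensemble, its existing forcing file can be reused as is, which is what removes the averaging and the format conversions of the first method.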
Several limitations of this clustering technique can be pointed out. The computational power
and memory required to run the algorithm on very large datasets do not allow clusters to be computed
over a long time period, so the algorithm must be run separately for each date. As a result, the
selected members may change from one run to the next.
In order to avoid artefacts in the time series, a gap was intentionally left at the beginning of each run
and a simple interpolation was performed to connect both sides of the gap. The loss of information
is thus kept small, at low computational cost.
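The gap-and-interpolation step might look like the sketch below. The report does not specify the interpolation scheme or the gap length, so a linear ramp and a gap of two time steps are assumed here; the function name and the values are hypothetical, and both runs are assumed to share the same time step and to follow each other without overlap.

```python
def bridge_gap(previous_run, next_run, gap_steps):
    """Join two consecutive runs, replacing the first gap_steps values of
    next_run by a linear ramp between the neighbouring retained values."""
    if gap_steps < 1 or gap_steps >= len(next_run):
        raise ValueError("gap_steps must be between 1 and len(next_run) - 1")
    start = previous_run[-1]      # last value before the gap
    end = next_run[gap_steps]     # first retained value after the gap
    ramp = [start + (end - start) * k / (gap_steps + 1)
            for k in range(1, gap_steps + 1)]
    return previous_run + ramp + next_run[gap_steps:]

# Hypothetical Hs series (m) of one cluster representative in two consecutive runs.
previous_run = [2.0, 2.2, 2.4]
next_run = [3.5, 3.4, 3.0, 3.1]

series = bridge_gap(previous_run, next_run, gap_steps=2)
print([round(x, 2) for x in series])
```

The first values of the new run, where a change of selected member would create a jump, are discarded and replaced by the ramp, so the joined series stays continuous.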
Another remaining limitation is the dependence of the algorithm on the initial partition of the
dataset. Despite the large number of runs performed to reduce its influence, this
dependence leads to slightly different clusters from one run to another on the same dataset.
However, as these variations mostly occur between members predicting relatively close Hs
values, the overall performance of the forecast is not significantly affected. The members
furthest from the others are systematically selected, as they represent limit cases.
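The practice of repeating the clustering from several initial partitions can be sketched as follows. This is a plain K-means (Lloyd's algorithm) written for illustration, not the Matlab routine used in the study; the function names, the number of restarts, the selection of the run with the lowest within-cluster sum of squares, and the toy data are all assumptions.

```python
import math
import random

def kmeans(points, k, rng, max_iter=100):
    """One K-means run starting from k randomly chosen points as centroids."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:            # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia

def best_of_restarts(points, k, n_restarts=20, seed=0):
    """Repeat K-means from different initial partitions; keep the lowest inertia."""
    rng = random.Random(seed)
    runs = (kmeans(points, k, rng) for _ in range(n_restarts))
    return min(runs, key=lambda run: run[2])

# Toy data: two well-separated groups of flattened "wind fields".
points = [[0.0, 0.2], [0.3, 0.0], [0.1, 0.4],
          [5.0, 5.1], [5.3, 4.9], [4.8, 5.2]]

labels, centroids, inertia = best_of_restarts(points, k=2)
print(labels, round(inertia, 3))
```

Keeping the best of several runs reduces, but does not remove, the dependence on the initial partition: two runs with nearly equal inertia can still group close members differently.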
A quality check is also systematically performed after the computation of clusters to make sure the
final ensemble is not too far from the initial one.
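The report does not detail the criteria of this quality check. As a purely hypothetical sketch, one could compare the median and the range of amplitudes of the reduced ensemble with those of the initial ensemble; the function names and the two thresholds below are assumptions, not values from the study.

```python
import statistics

def spread(values):
    """Range of amplitudes: ensemble maximum minus ensemble minimum."""
    return max(values) - min(values)

def quality_check(initial, reduced, max_median_diff=0.1, min_range_ratio=0.8):
    """Return (passed, median_diff, range_ratio) for one time step.

    The thresholds are illustrative assumptions, not values from the study."""
    median_diff = abs(statistics.median(reduced) - statistics.median(initial))
    initial_spread = spread(initial)
    range_ratio = spread(reduced) / initial_spread if initial_spread > 0 else 1.0
    passed = median_diff <= max_median_diff and range_ratio >= min_range_ratio
    return passed, median_diff, range_ratio

# Hypothetical Hs values (m) at one time step.
initial = [1.6, 1.8, 1.9, 2.0, 2.0, 2.1, 2.2, 2.3, 2.6, 3.0]
selected = [1.6, 1.9, 2.0, 2.2, 3.0]    # representative members, extremes kept
averaged = [1.9, 2.0, 2.1, 2.2]         # smoothed values, extremes lost

print(quality_check(initial, selected))
print(quality_check(initial, averaged))
```

With these values the ensemble of selected members passes, while the smoothed ensemble fails on the range criterion even though its median is unchanged.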
The probabilities inherent to ensemble forecasting, which indicate the most probable sea state, are
not lost, as the number of members represented by each cluster is systematically stored.
The most probable sea state can thus easily be determined from the probability density function
of the ensemble values.
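As a simple illustration, the stored member counts can be turned into probabilities, and the cluster with the largest share of members taken as a stand-in for the mode of the probability density function. The values, sizes and function names are hypothetical.

```python
def cluster_probabilities(cluster_sizes):
    """Probability of each cluster: its share of the initial members."""
    total = sum(cluster_sizes)
    return [size / total for size in cluster_sizes]

def most_probable(cluster_values, cluster_sizes):
    """Return (value, probability) of the cluster holding the most members."""
    probabilities = cluster_probabilities(cluster_sizes)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return cluster_values[best], probabilities[best]

# Hypothetical Hs values (m) of 7 clusters and the number of members in each.
hs_clusters = [1.6, 1.9, 2.0, 2.2, 2.4, 2.7, 3.1]
sizes = [1, 4, 5, 4, 3, 2, 1]

value, probability = most_probable(hs_clusters, sizes)
print(value, probability)
```

For a smoother estimate, the weighted values could instead be binned into a histogram and the modal bin reported, at the cost of choosing a bin width.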
Despite the short time period covered by this analysis, clustering methods prove very
effective at reducing computational cost while preserving the quality of the forecast as far as
possible. Studies over longer periods will be needed to assess performance more
precisely and to check its consistency over time.
5 Conclusions
State-of-the-art wave ensemble forecasts proved more effective at predicting significant
wave height than higher resolution deterministic forecasts, as they take into account the
uncertainty in the initial conditions. In both the North Sea and near Taiwan, in shallow and deep
water, better results are observed, especially at longer lead times. The median
value of the forecasts also characterizes their performance well. However, a lack of variability
sometimes appears within the ensemble, mainly related to a lack of variability within the wind
forcing ensemble. These forecasts may therefore not be sufficient for very sensitive marine operations.
Several statistical methods were investigated to produce ensemble forecasts at lower
computational cost. Clustering methods were studied in particular, as they proved to be an effective
way to group correlated ensemble members, making it possible to reduce the size of the ensemble
with the smallest loss of information. Preliminary results with the simplest method, K-means
clustering, are very encouraging. The generation of wind ensembles via Empirical Orthogonal
Functions (EOFs) was also studied, in a purely theoretical way, and may represent a direction for
further research.
High resolution ensemble forecasts represent a possible improvement on existing forecasts.
Indeed, the high resolution forecast run in the Tierra del Fuego area shows a higher variability than
lower resolution ensemble forecasts, whose lack of variability is their main limitation. However, a
non-negligible bias sometimes appears in these forecasts, probably related to poorly tuned model
parameters and to wind forcing fields that are not accurate enough to reproduce all phenomena well.
An in-depth study of the influence of these parameters, along with the influence of improvements to
wind forcing fields and bathymetry, should be conducted in the future to improve the quality of high
resolution forecasts.
Bibliography
[1]: Les vagues: un compartiment important du système terre
Ardhuin, F, 2012 (Course)
[2]: Intercomparison of the performance of operational ocean wave forecasting systems with
buoy data
Bidlot, J.R., D.J. Holmes, P.A. Wittmann, R.L. Lalbeharry, and H.S. Chen / Weather and
Forecasting, 2002, 17, 287-310.
[3]: Performance of the ocean wave ensemble forecast system at NCEP
Cao, D., H.L. Tolman, H.S. Chen, A. Chawla and V.M. Gerald / MMAB contribution
No.279, 2009 (available at http://polar.ncep.noaa.gov/mmab/papers/tn279/mmab279.pdf)
[4]: A limited area wave ensemble prediction system for the Nordic seas and the North Sea.
Carrasco, A. and O. Saetra / Report No.22/2008, Meteorology and oceanography, ISSN:
1503-8017, Dec.2008
[5]: Wave modeling – The state of the art
Cavaleri, L. et al / Progress in Oceanography 75 (2007) 603-674
[6]: Ensemble Prediction of Ocean Waves at NCEP
Chen, H.S. / Proceedings of the 28th Ocean Engineering Conference in Taiwan, NSYSU, 2006
[7]: On ensemble prediction of ocean waves
Farina, L. / Tellus - Series A: Dynamic Meteorology and Oceanography (2002), Vol. 54,
Issue: 2, Pages: 148-158.
[8]: On the Application of Cluster Analysis to Growing Season Precipitation Data in North
America East of the Rockies.
Gong, Xiaofeng, Michael B. Richman / J. Climate, 1995, 8, 897–931.
[9]: Dynamics and Modeling of Ocean Waves
Komen, G.J., L. Cavaleri, M. Donelan, K. Hasselmann, S. Hasselmann and P.A.E.M. Janssen
/ Cambridge University Press 1994, 532pp.
[10]: Ocean Waves: The Stochastic Approach
Ochi, M.K. / Cambridge University Press 1998, 319pp.
[11]: Forecasting wave height probabilities with numerical weather prediction model
Roulston, M.S, J. Ellepola, J. von Hardenberg, L.A. Smith / Ocean Engineering 32 (2005)
1841-1863
[12]: Ensemble forecasting at NMC: The generation of perturbations
Toth, Z. and E. Kalnay / Bulletin of the American Meteorological Society Vol. 74, No. 12,
Dec. 1993
[13]: Ensemble Forecasting at NMC and the Breeding Method
Toth, Z. and E. Kalnay / Monthly Weather Review, AMS, pp.3297-3319, Dec. 1997
[14]: Statistical Methods in the Atmospheric Sciences
Wilks, D.S / International Geophysics Series, Vol.100, 676pp.
[15]: A perturbation method for hurricane ensemble predictions
Zhang, Z and T. N. Krishnamurti / Monthly Weather Review, 1999, 127, 447-469
stouffl_hyo13rapport

  • 1. Operational ocean wave ensemble forecasts: state-of-the- art validation and high resolution forecasts Final Year Project report towards the achievement of a Graduate Engineering Diploma in Hydrography at ENSTA Bretagne Loïc Stouff Tutor at ACTIMAR: M. Cyril FRELIN Tutor at ENSTA Bretagne: Mme. Amandine NICOLE 2012 – 2013
  • 2. CITEPH – Ocean Wave Ensemble Forecasts STOUFF Loïc - 16/08/2013 2
  • 3. CITEPH – Ocean Wave Ensemble Forecasts STOUFF Loïc - 16/08/2013 3 ABSTRACT ......................................................................................................................................................................4 1 INTRODUCTION...................................................................................................................................................5 1-1 OUTLINE OF THE REPORT ..................................................................................................................................5 1-2 INTRODUCTION TO ENSEMBLE FORECAST .........................................................................................................6 1-3 MAIN OCEAN WAVE ENSEMBLE PREDICTION CENTERS ......................................................................................7 NCEP........................................................................................................................................................................8 FNMOC....................................................................................................................................................................8 ECMWF....................................................................................................................................................................9 Norwegian Meteorological Institute ........................................................................................................................9 China National Meteorological Centre ...................................................................................................................9 1-4 POSSIBLE IMPROVEMENTS OF EXISTING MODELS..............................................................................................9 2 METHODOLOGY................................................................................................................................................ 
11 2-1 BUOYS LOCATION AND DATA PROCESSING....................................................................................................... 11 2-2 DEFINITION OF WW3 STUDY AREAS ...............................................................................................................14 2-3 OVERVIEW OF MATHEMATICAL AND VISUALIZATION TOOLS............................................................................15 3 VALIDATION OF STATE-OF-THE-ART WAVE ENSEMBLE FORECAST................................................18 3-1 IN THE NORTH SEA .........................................................................................................................................18 3-2 IN TAIWAN......................................................................................................................................................25 3-3 WIND AND WAVE VARIABILITY........................................................................................................................29 4 IMPROVEMENT OF WAVE ENSEMBLE FORECASTS...............................................................................33 4-1 IMPROVEMENT OF STATE-OF-THE-ART FORECASTS .........................................................................................33 Linear Shift.............................................................................................................................................................33 Empirical Orthogonal functions............................................................................................................................35 4-2 HIGH RESOLUTION WAVE ENSEMBLE FORECAST..............................................................................................36 High resolution wave ensemble forecast validation..............................................................................................36 Clustering 
...............................................................................................................................................................38 5 CONCLUSIONS ...................................................................................................................................................44 BIBLIOGRAPHY...........................................................................................................................................................45
  • 4. CITEPH – Ocean Wave Ensemble Forecasts STOUFF Loïc - 16/08/2013 4 Abstract Ocean Wave Ensemble forecasts – predictions based on several runs of the same model with different initial and boundary conditions– progressively replace deterministic forecasts in operational contexts. The objectives of this study are, after reviewing various tools and theoretical matters, to quantify the performances of state-of-the-art wave ensemble forecasts and analyze the possibility of running higher resolution ensemble forecasts at low computational cost. It appears that performances of ensemble forecasts are, in most cases, better than higher resolution deterministic forecasts but seems still insufficient for particularly sensitive applications such as high risks offshore operations. Several statistical methods such as clustering and ensemble generation by Empirical Orthogonal Functions – EOFs – were investigated and revealed to be efficient for reducing computational cost; thus allowing to run higher resolution ensemble forecasts. Performances of very high resolution forecasts were then studied but several limits related with wind fields and model parameters caused these predictions to be slightly disappointing. Despite a higher variability which represents a strong improvement, the predictions sometimes differ too much from observations. Further studies have then to be conducted on this matter especially to analyze the influence of improvements on bathymetry and wind forcing fields.
  • 5. CITEPH – Ocean Wave Ensemble Forecasts STOUFF Loïc - 16/08/2013 5 1 Introduction 1-1Outline of the report This study focuses on the possibility of running very high resolution wave ensemble forecasts at the lowest computational costs. As operational ensemble forecasts are shown growing interest in both public and private sectors, the question of its application as a decision support tool in high risks maritime operations is raised. This master thesis is part of a research and development (R & D) project founded by the CITEPH program (Consultation for Technological Innovation in Exploration and Production of Hydrocarbons) which expects practical answers and results. In addition to technical tasks including data-processing and visualization, an interpretation of model outputs was necessary to evaluate performances of forecasts coupled with extensive research on ways to reduce effectively computational costs. This study intends to answer following questions: - Are actual performances of state-of-the-art wave ensemble forecasts reliable enough to be used in decision support tools? - Do higher resolution forecasts increase these performances? - Can computational time be reduced without deteriorating performances? Five sections make up this report: - Section 1: Overview of ensemble forecasts. Useful scientific notions and reminders are briefly defined followed by a succinct presentation of main wave ensemble forecast centers. - Section 2: Methodology. This section provides information about data sources, data processing methods used as well as about statistical and numerical tools. - Section 3: Validation of state-of-the-art ensemble forecast. The performances of NCEP wave ensemble forecast are validated against buoy data. - Section 4: Improvements of Wave Ensemble Forecasts. Findings of the project in regards to ensemble forecasts are developed in this section and discussed in the light of previously mentioned questions. 
- Section 5: Conclusion and outlook of the project. Outcomes of the project are given and further research directions are highlighted.
  • 6. CITEPH – Ocean Wave Ensemble Forecasts STOUFF Loïc - 16/08/2013 6 1-2Introduction to ensemble forecast Ensemble predictions originally come from meteorology and aim at taking into account the effect of uncertainty in the initial conditions of a model. Indeed, in meteorological studies, initial conditions represent our knowledge of the atmosphere’s state; due to the scarcity of observations and to the presence of inherent errors in measurements, this knowledge is imperfect. Figure 1-1: Principle of ensemble and deterministic forecasts Due to the non-linearity of flood mechanics equations [5], medium to long range results can vary drastically if errors are present in the initial conditions. It is nowadays commonly agreed that errors in observations are unavoidable, and therefore that these errors in initial conditions and forcing have to be taken into account. Considering this fact, the principle of a single deterministic solution of model’s governing equations can be questioned. The generation of an ensemble of solutions derivate from various initial conditions reflecting the observation’s uncertainty provides then more information on long-term behavior of predictions. In addition to improving reliability in predictions, ensemble prediction systems (EPS) estimate probabilities associated with different possible states. Unlike atmospheric models, ocean wave models are not very sensitive to initial conditions’ errors after the first 24 hours. However, perturbations in wind forcing fields represent the main source of errors in wave models, giving the opportunity to produce ensemble forecasts based on these wind perturbations. The methods used to produce these perturbed wind forcing fields have been relatively well documented in the literature [12] and are beyond the scope of this section; they will therefore not be treated here. All forecasts mentioned in this paper are based on third generation wave models. 
These models are governed by the action balance equation which describes the evolution of the wave energy spectrum F forced by specific source terms. The resolution of the equation requires the
  • 7. CITEPH – Ocean Wave Ensemble Forecasts STOUFF Loïc - 16/08/2013 7 knowledge of the spectrum at a given time and surface winds for all time integration intervals. The equation’s formulation is given below. P (F) = DF Dt = Sin + Snl + Sds , where D Dt represents the Lagrangian derivative which can be written: D Dt = ∂ ∂t + cg · ∇ cg • The right-hand side of the action balance equation represents source terms: Sin being the wind-related input, Sds describing dissipation term and Snl standing for nonlinear wave- wave interactions terms [1]. • Further details about the action balance equation and source terms are available in Komen et al. (1994) [9]. The main quantity used in this paper is the significant wave height Hs defined as below (Ochi, 1998 [10]) which is consistent with the average of the third of the highest waves (H1/3) derived from measurements. Hs = 4 √E , where E, the wave energy, is given by: ∫∫ ∞ = 2π 0 0 θ).df.dθ.F(f,E For further information on third generation wave models, please refer to following papers –amongst others ([5], [6] and [7]) 1-3Main ocean wave ensemble prediction centers This section presents the main centers providing ocean wave ensemble forecasts, which are still marginal in comparison to atmospheric ensemble forecasts. The ensemble prediction systems presented in this section are based on third generation wave models, all ensemble members use the undisturbed analysis as initial conditions and the members’ entire spread results from wind forcing perturbations. Indeed, spectral initial conditions’ influence has been showed to be negligible in third generation models.
NCEP

The US National Centers for Environmental Prediction (NCEP) developed in 2004, and implemented operationally in 2006, a wave ensemble forecast system called the Global Ensemble Ocean Wave Forecast System (GEOWaFS). The version of GEOWaFS currently used is an updated version running with the NOAA (National Oceanic and Atmospheric Administration) multi-grid WAVEWATCH III, which replaces the prior GEOWaFS based on NOAA WAVEWATCH III (NWW3) (Cao et al., 2009 [3]). The model ranges from 78°S to 78°N with a 1° x 1° spatial resolution. The current version of GEOWaFS consists of 20+1 members generated by separate runs of NWW3 based on perturbed wind fields obtained from the NOAA/NCEP Global Ensemble Forecast System (GEFS) bias-corrected 10 m winds, updated every 3 hours. Perturbations of the wind fields were generated using the breeding of growing modes method described in Toth and Kalnay (1993 [12], 1997 [13]). The initial wave field comes from the deterministic NWW3 forecast. Operational GEOWaFS is run 4 times a day (at 00, 06, 12 and 18 UTC). Several studies (Chen, 2006 [6]; Cao et al., 2009 [3]) demonstrated that GEOWaFS produces more realistic and reliable predictions than the current operational global deterministic NWW3 wave forecast system.

FNMOC

The US Navy Fleet Numerical Meteorology and Oceanography Center (FNMOC) also provides global ensemble ocean wave predictions. The system consists of 20 members, each a 10-day forecast, run twice daily (at 00 and 12 UTC) with a 1° x 1° resolution. Ensemble members are generated from the wind fields of the Navy Operational Global Atmospheric Prediction System ensemble forecast system (NOGAPS EFS).

NCEP and FNMOC ensemble forecasts are sometimes combined to form an ensemble of 40+1 independent members. The table below sums up the characteristics of these two forecasts.

                         NCEP wave ensemble system        FNMOC wave ensemble system
Number of members        20                               20
Wind forcing             Bias-corrected GEFS winds        NOGAPS Ensemble Forecast System
Grid                     Global spherical                 Global spherical
Spatial resolution       1° x 1°                          1° x 1°
Geographical extension   78°S to 78°N                     78°S to 78°N
Cycles per day (UTC)     4 runs a day (00, 06, 12, 18)    Twice daily (00 and 12)
Forecast range           10 days                          10 days

Table 1-1: Summary of main characteristics of FNMOC and GEOWaFS forecasts
ECMWF

The European Centre for Medium-Range Weather Forecasts (ECMWF) provides global wave ensemble forecasts with a spatial resolution of 0.5° x 0.5°, shallow water physics and a 15-day forecast range. As for the previously mentioned forecasts, the initial wave field comes from the unperturbed deterministic prediction, and the spread of all 50 members depends only on perturbations in the wind fields. ECMWF EFS is run twice daily (at 00 and 12 UTC).

Norwegian Meteorological Institute

The Norwegian Meteorological Institute (met.no) runs daily a regional operational ensemble prediction system for ocean waves (WAMEPS). The model covers Northern Europe, the Scandinavian Peninsula and the Nordic Seas (including the North Sea and the Barents Sea) with a 0.1° resolution, and is forced by the atmospheric limited area ensemble prediction system (LAMEPS). More information is provided in Carrasco et al. (2008 [4]).

China National Meteorological Centre

The China National Meteorological Centre also runs a global ensemble wave forecast at 1° x 1° spatial resolution with the WW3 model. 14+1 wave fields are calculated from perturbed wind fields of the Centre's operational atmospheric forecast. The model is run twice daily (at 00 and 12 UTC) with a 10-day forecast range.

For the purpose of this study, NCEP and FNMOC forecasts were used as reference state-of-the-art numerical ocean wave ensemble predictions. The daily deterministic forecast run operationally on a global scale at ACTIMAR was also gathered, as global gridded significant wave height data at a 0.5° x 0.5° spatial resolution over the relevant period, in order to allow direct performance comparisons between probabilistic and deterministic forecasts.

1-4 Possible improvements of existing models
The question underlying this project is how reliable and accurate existing operational forecasts are when used as decision support tools for high-risk maritime operations. The next question is what opportunities exist to improve those forecasts or to produce higher resolution ones. Two approaches are possible. The first is to identify a possible common pattern in all predictions, a recurrent tendency characteristic of a forecast, which would allow its performance to be improved quickly and effectively by a simple correction. If no such pattern can be found, the only solution lies in running models at higher spatial and temporal resolution.

Computing power remains a fundamental issue for operational forecasts, especially for ensemble forecasts where several dozen wave fields may have to be generated, so it is essential to find solutions that minimize the computational cost of forecasts. The approaches retained include the generation of wind fields by empirical orthogonal functions from one single member, and the classification of members to reproduce a similar range with a reduced number of wave fields.
2 Methodology

2-1 Buoys location and data processing

To compare predicted wave fields with observations, buoy measurements were gathered from the US National Data Buoy Center (NDBC) and the Taiwan Central Weather Bureau (CWB). Significant wave heights were recorded hourly at the 8 locations on which the study focuses. I selected four buoys in each relevant area, that is to say four in the North Sea and four near Taiwan. The selection was based on several criteria, including the geographical location of the buoy, the ocean depth, the distance to the shore and the quality of the data. Despite this preliminary selection, several buoys show gaps in their time series, due to corrupted or unavailable data at particular dates. Even if these gaps appear from time to time on graphs, the periods they cover were not taken into account when computing the performance indicators of forecasts, and therefore do not induce any bias in the interpretation of results.

In the North Sea, buoys B62164, B62145, B62127 and B63113 were selected. They are all located within 50 to 100 km of the shore, in relatively deep water. Near Taiwan, buoys 46699A, 46778A, 46757B and C6V27 were considered. Except for buoy C6V27, which is located 250 km from the shore in 3000 m of water, all of them stand in near-shore areas with depths of less than 30 m. Both the shallow and the deep ocean behavior of the forecast could therefore be studied. Although the measurement uncertainty at these buoys is unknown, the observations were taken as reference values, since the uncertainty of predicted values is assumed to be much higher. Table 2-1 and Figure 2-1 give the locations of these buoys.

Buoy      Longitude   Latitude
B62164    0.5°E       57°N
B62145    2.8°E       53.1°N
B63113    1.7°E       61°N
B62127    0.7°E       54°N
C6V27     118.8°E     21°N
46699A    121.6°E     24°N
46778A    120°E       23.1°N
46757B    120.8°E     24.8°N

Table 2-1: Geographical coordinates of selected buoy moorings
Figure 2-1: Buoy moorings locations in the North Sea and near Taiwan

An important detail has to be mentioned concerning the locations of buoys. The spatial resolution of ACTIMAR's deterministic forecast did not always allow predictions to be extracted directly at the precise locations of the buoys. To overcome this issue, I linearly interpolated the gridded Hs whenever possible in order to obtain the most accurate prediction. However, where the buoy was too close to the shoreline to allow interpolation, the closest available value was chosen. An error on the Hs prediction is therefore inevitable, due to the influence of bathymetry. Nevertheless, as the resolution of the ensemble forecasts is half that of the deterministic one, inter-comparisons of performance should not be strongly affected.

Another issue is the difference in time step between observations and forecasts, both ensemble and deterministic. Whereas observations are sampled hourly, ensemble forecasts are sampled every 6 hours. Temporal interpolation therefore had to be performed in order to make the time series easily comparable. Two options were possible: interpolating onto the smaller time step (1 hour) or onto the larger one (6 hours). A brief comparison of the root mean square errors (RMSE, see Section 2-3) of the ensemble mean computed from the datasets obtained with both methods showed that the differences between the RMSE values are small, typically about 10 mm or less, except for buoys B63113 and B62127 (Table 2-2). It therefore seems reasonable to consider both interpolation methods equivalent for the purpose of this study.
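The comparison of the two temporal interpolation options can be sketched as below. This is a minimal illustration and not the Matlab code used in the study: the report does not state which temporal interpolation scheme was applied, so linear interpolation is an assumption, and the function names `interp_linear` and `rmse` are hypothetical.

```python
import math

def interp_linear(t_new, t, values):
    """Linearly interpolate the series (t, values) at the times t_new.

    Assumes t and t_new are ascending and t_new lies within [t[0], t[-1]].
    """
    out = []
    k = 0
    for tn in t_new:
        # advance to the interval [t[k], t[k + 1]] that contains tn
        while k < len(t) - 2 and t[k + 1] < tn:
            k += 1
        w = (tn - t[k]) / (t[k + 1] - t[k])
        out.append((1 - w) * values[k] + w * values[k + 1])
    return out

def rmse(pred, obs):
    """Root mean square error between two series of equal length."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Illustrative data only: hourly observations and a 6-hourly forecast (hours, metres)
obs_t = list(range(13))
obs_hs = [1.0 + 0.1 * h for h in obs_t]
fc_t = [0, 6, 12]
fc_hs = [1.2, 1.7, 2.1]

# Option 1: raw prediction against observations brought onto the 6 h step
rmse_6h = rmse(fc_hs, interp_linear(fc_t, obs_t, obs_hs))
# Option 2: prediction interpolated onto the 1 h step against raw observations
rmse_1h = rmse(interp_linear(obs_t, fc_t, fc_hs), obs_hs)
```

Comparing `rmse_6h` and `rmse_1h` for each buoy gives the kind of figures reported in Table 2-2.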
Buoy      Raw prediction /     Interpolated prediction /
          interpolated obs     raw obs
46778A    0.317                0.310
C6V27     0.509                0.520
46699A    0.761                0.758
46757B    0.460                0.460
B62164    0.499                0.509
B62127    0.328                0.278
B62145    0.276                0.281
B63113    0.537                0.510

Table 2-2: RMSE of ensemble mean (in m) for both temporal interpolation methods considered

In order to make sure that observations and predictions show phenomena in the same frequency range, observations were interpolated onto the 6-hour period of the predictions. The other option would have let observations show variations of significant height that predictions could not reproduce. However, improving the temporal resolution of ensemble forecasts also appears to be a possible way to improve their performance, as it would allow events to be reproduced over a larger frequency range.

Before any analysis of the data could be undertaken, I had to extract all observation and prediction time series from their various files and, after compilation, turn them into an easily readable format (here .mat files) using UNIX and Matlab scripts. The typical processing chain for one data file is illustrated below.

Figure 2-2: Sketch of data processing chain (unarchiving → extraction of data at relevant locations → conversion from netCDF4 to netCDF3 → concatenation with previous dates → data storage in .mat format; the sketch marks one step as applying to ACTIMAR's archives only)
Time series of significant height and wind speed were stored from the 20th of October 2012 to the 20th of April 2013 for the GEOWaFS and ACTIMAR forecasts. Significant height time series of the FNMOC forecast were stored from the 1st of January 2013 to the 20th of April 2013. Finally, time series of significant height recorded by the buoys were also stored at each location from the 20th of October 2012 to the 20th of April 2013.

2-2 Definition of WW3 study areas

The project focuses on 4 different areas, which had to be defined spatially before starting simulations in order to establish the forcing at the borders and prepare the bathymetry. For the purpose of this master thesis, I focused first on the Indonesian area and on the North Sea, in order to benefit from a sufficient number of buoy observations, which makes the validation of state-of-the-art forecasts easier. The North Sea area extends from 5°W to 10°E in longitude and from 50°N to 65°N in latitude, while the Indonesian one is much larger and extends from 90°E to 140°E in longitude and from 5°S to 30°N in latitude. The Indonesian area is characterized by the presence of thousands of islands, which have a drastic influence on sea states. The spatial resolution of the grid used in these areas is that of the NCEP/FNMOC ensembles, that is to say 1° x 1°.

The second part of the project, on high resolution forecasts, will focus on the Tierra del Fuego area in Argentina, which extends from 70°W to 62°W in longitude and from 58°S to 48°S in latitude. In this area, the grid has a spatial resolution of 0.1° x 0.1° and is forced by a 1° x 1° global run. Harsh climatic conditions can be observed there, with recurrent storms and strong weather.

Concerning the input parameters of the model, such as the parameterization of bottom friction, surf breaking, dissipation and nonlinear interactions, or the choice of advection schemes, the usual set of parameters used at ACTIMAR for operational forecasts was retained. Studying the influence of these parameters was beyond the scope of this thesis. It nevertheless represents a possible way to improve results, especially when dealing with small areas at very high resolution, where the influence of these parameters can be higher than usual.
2-3 Overview of mathematical and visualization tools

Many possibilities exist to compare probabilistic forecasts (Bidlot et al., 2002 [2]), focusing on different parameters and characteristics of the forecast. In addition to the usual direct comparison of Hs and RMS errors, scatter plots of predictions against observations were used, as well as Pearson product-moment correlation coefficients and persistence analysis. All these performance indicators were selected to estimate the quality of existing forecasts for the purpose of offshore operation planning, and were computed using Matlab scripts written during my master thesis.

Visual comparisons of the significant heights predicted by the forecasts were made using a box-and-whiskers representation showing the median of the ensemble (the horizontal line), the 25th and 75th percentiles (the vertical box) and the minimum and maximum values (the vertical lines, called whiskers). The dispersion of Hs among the members of the ensemble at a given time is thus clearly represented.

Figure 2-3: Box-and-whiskers representation of the ensemble members of the forecasts

Root mean square errors were computed over the members of the forecasts at each given time, in order to estimate the variations in the prediction of events within the ensemble. The formulation below was used, where y represents the observation and ŷ stands for the predicted values at a given time t, n being the size of the ensemble.

$$\mathrm{RMSE}(t) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i(t) - y(t)\right)^2}$$
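The per-time-step ensemble statistics described above (box-and-whiskers quantities and the RMSE over the members) can be sketched as follows. This is an illustration under stated assumptions rather than the author's Matlab implementation: the function names are hypothetical, and the percentile convention (Python's "inclusive" method) is an assumption that may differ slightly from the one used for the figures.

```python
import math
import statistics

def ensemble_summary(members):
    """Box-and-whiskers quantities of the Hs values predicted by all members at one time."""
    # quartile convention is an assumption; other conventions give slightly different boxes
    q25, median, q75 = statistics.quantiles(members, n=4, method="inclusive")
    return {"min": min(members), "q25": q25, "median": median,
            "q75": q75, "max": max(members)}

def ensemble_rmse(members, observation):
    """RMSE of the n member predictions against the single observation at one time."""
    n = len(members)
    return math.sqrt(sum((m - observation) ** 2 for m in members) / n)

# Illustrative values only (metres)
hs_members = [2.1, 2.3, 2.2, 2.6, 2.4]
summary = ensemble_summary(hs_members)
error = ensemble_rmse(hs_members, observation=2.8)
```

Applying both functions at every output time yields the box-and-whiskers series and the time-dependent RMSE used in the comparisons.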
The RMS error was also computed on the entire time series of characteristic quantities of the ensemble, including median, maximum and minimum values, thus giving an indication of the overall deviation of predictions from observations.

Another relevant performance indicator is the Pearson product-moment correlation coefficient r, which gives a good estimate of the linear dependence between predictions and observations. Its formulation for a sample of paired variables X and Y is given below, X and Y being here the observation and prediction data, X̄ and Ȳ their respective means, and n the size of the sample.

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

Thanks to this coefficient, a "best member" of the ensemble forecast was defined using a posteriori comparisons with observations: the member with the highest correlation coefficient is considered the best member of the ensemble.

In addition to those performance indicators, scatter plots were drawn. The values of the observation dataset are set on the horizontal axis and the predictions of the model on the vertical axis. A linear regression was performed each time in order to estimate the deviation from observations. This regression uses an iteratively reweighted least squares algorithm, implemented in Matlab toolboxes, which gives less weight to outliers. Q-Q plots were also used; these diagrams plot the quantiles of two variables against each other, so that probability distributions can be compared easily. The same algorithm as for scatter plots was used to produce the regression lines in Q-Q plots.

In addition to those diagrams, a common analysis performed on climate-related variables is the persistence analysis, also known as detection of climatic windows. It consists in studying the ability of the forecast to successfully predict events during which climatic conditions, here the significant wave height, stay below a given threshold for a given period of time. This analysis, which is essential for offshore operations, was carried out with a simple algorithm I implemented. It browses the time series of Hs and saves the dates consistent with a climatic-window pattern. A beginning date is a date at which Hs is below the threshold, as it is at the directly following date, while Hs is above the threshold at the preceding date. An ending date is the last of two successive dates at which Hs is still below the threshold, while at the following date the significant height rises above it.

Figure 2-4: Sketch of the pattern used for persistence analysis (Date0 with Hs > T, then Date1, the beginning date, with Hs < T, ..., DateX, the ending date, with Hs < T, then DateY with Hs > T; T is the threshold)

All climatic windows of any duration are recorded with this pattern. A simple subtraction of the serial date numbers of the corresponding ending and beginning dates then makes it possible to separate windows of various durations and to count them. Borderline cases where no ending date could be found before the end of the time series were solved by imposing the end of the dataset as an arbitrary ending date. The minimum size of a detected window is twice the temporal resolution of the data (a 12 h window for 6 h outputs).

Along with this detection, another performance indicator of persistence analysis was computed: the equivalent percentage uptime. It represents, as a percentage of the total number of hours of each month, the amount of time during which the environmental parameter (here the significant height) stays below a given threshold. To perform this computation, each time step at which Hs is below the threshold is assigned a "Flag 1", while time steps which do not satisfy the condition are assigned a "Flag 0". The percentage of "Flag 1" gives the equivalent percentage uptime as defined above.
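The window detection and the equivalent percentage uptime described above can be sketched as follows. This is a simplified reading of the pattern and not the Matlab script used in the study. Two points are assumptions: a window's duration is counted as the number of consecutive below-threshold samples times the output step (so that two samples at 6 h give the 12 h minimum window), and a run that starts at the very first sample is accepted even though no preceding date exists. The function names are hypothetical.

```python
def climatic_windows(hs, threshold, step_hours=6):
    """Return (start_index, end_index, duration_hours) for each run of at least
    two consecutive samples with Hs below the threshold."""
    windows = []
    start = None
    last = len(hs) - 1
    for k, value in enumerate(hs):
        below = value < threshold
        if below and start is None:
            start = k  # candidate beginning date
        if start is not None and (not below or k == last):
            # the run ends at the previous sample, or at the end of the dataset
            end = k if below else k - 1
            length = end - start + 1
            if length >= 2:
                windows.append((start, end, length * step_hours))
            start = None
    return windows

def equivalent_percentage_uptime(hs, threshold):
    """Percentage of time steps flagged 1, i.e. with Hs below the threshold."""
    flags = [1 if value < threshold else 0 for value in hs]
    return 100.0 * sum(flags) / len(flags)

# Illustrative 6-hourly series (metres)
series = [2.0, 1.0, 1.2, 2.5, 1.0, 2.0, 1.0, 1.0, 1.0]
print(climatic_windows(series, threshold=1.5))  # [(1, 2, 12), (6, 8, 18)]
```

Counting the returned windows by duration, for the observations and for each member, gives the totals compared in the persistence analysis; passing one month of samples to `equivalent_percentage_uptime` gives the monthly uptime.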
3 Validation of state-of-the-art wave ensemble forecast

The purpose of this section is to compare the state-of-the-art NCEP and FNMOC ensemble forecasts with buoy data and with a higher resolution deterministic forecast. The two areas of interest, the North Sea and the Taiwan region, are treated separately using various performance indicators. To avoid redundancy and limit length, only some of the relevant figures are shown in this paper.

3-1 In the North Sea

The area is characterized by high wave events, with Hs reaching values higher than 7 m. Those events are a particularly sensitive matter for this study and are therefore given top priority in the project. Figure 3-1 illustrates the maximum, minimum and median values predicted by the ensemble forecast over a period covering several of these events, along with the observations and the predictions of ACTIMAR's deterministic forecast at buoy B62164.

Figure 3-1: Observations and deterministic forecast along with maximum, minimum and median values predicted by NCEP ensemble forecast from the 10th of January 2013 to the 1st of April 2013 at buoy B62164
High wave events with Hs exceeding 6 m are detected on 18/01/2013, 05/02/2013, 14/02/2013 and 19/02/2013, along with lower Hs peaks. Except for the first one, both deterministic and ensemble predictions strongly underestimated the significant height at all these events, by 1.5 to 3 m. This tendency is confirmed at smaller Hs peaks, where predictions often fall below observations. Furthermore, the deterministic predicted values appear to be 0.5 to 1 m lower than the ones predicted by the ensemble forecast, that is, lower than the minimum value of the ensemble. Another remarkable fact is the low spread of the ensemble over most of the visualized period: the differences between the minimum and maximum values of the ensemble never exceed 0.30 m, and the curves regularly appear superimposed, with variations lower than 10 cm within the ensemble.

During lower Hs events, predictions are generally closer to observations, and the differences between the minimum and maximum values of the forecast are of approximately the same order of magnitude, with significant variations from date to date. For instance, these differences amount to more than 0.5 m on 15/01/2013 but lead to almost superimposed curves on 17/01/2013.

Figure 3-2 presents a box-and-whiskers representation of the ensemble predictions focusing on a shorter time period, still at buoy B62164 (refer to Section 2-3 for further information about this representation).

Figure 3-2: Observations and deterministic forecast along with box-and-whiskers representation of NCEP ensemble forecast from the 1st of March 2013 to the 22nd of March 2013 at buoy B62164.
Over this period, two relatively high wave events were recorded, on 08/03/2013 and on 19/03/2013. In both cases predictions were below observations for both the deterministic and the ensemble forecasts. It is interesting to notice that predictions seem to be closer to observations, and less variable, when Hs decreases than when it increases. Except at two dates, boxes and whiskers are short, the values in the ensemble showing very little spread. At dates where high Hs values were recorded, this indicates a lack of variability: these are the periods where the sea state is the most unpredictable, so variability should be at its highest. As previously observed, the deterministic forecast predicted significant heights even lower than the ensemble forecast, typically 0.5 m lower.

A similar tendency to underestimate significant height during high wave events is observed at buoy B62145. The highest Hs values recorded by the buoy amount to approximately 5 m, while predictions often stay 1 m below that limit; many lower peaks were however well predicted. It also appears that the variability within the ensemble is higher, as minimum and maximum predicted values are generally further from one another, as shown on Figure 3-3.

Figure 3-3: Observations and deterministic forecast along with box-and-whiskers representation of NCEP ensemble forecast from 22nd of November 2012 to the 2nd of December 2012 at buoy B62145.

Both boxes and whiskers are longer, especially during higher wave events. On 25/11/2012 and 28/11/2012 a higher ensemble variability is noticed, which is consistent with the Hs peaks of 5.2 and 3 m recorded by the buoy. Other buoys show intermediate behaviors in terms of variability. The underestimating tendency is nevertheless recurrent at all locations.
Scatter plots (Figure 3-4) make it possible to quantify this tendency by plotting predicted Hs values against observations. The regression line computed on the dataset gives an indication of the deviation from measurements.

The first characteristic of the scatter plots to be mentioned is the position of the regression line, which falls below the identity line at high Hs values. Hence, observations are statistically higher than predictions during high wave events. Furthermore, although the best member and the median value are similar, the median value of the forecast appears to be closer to observations than the other indicators, as its correlation coefficient is higher. Statistically, its regression line is also closer to the identity line. The maximum value sometimes gives good results for 0-24h predictions, but always falls far from observations at higher lead times. For this reason, the median value of the ensemble is considered the most effective way to characterize the forecast for the purpose of this study.

The table in Appendix B gathers all RMSE and correlation coefficient values at all buoys and lead times, and illustrates well the effectiveness of the median value compared to other indicators. At all buoys, the regression lines of the median value dataset are slightly below the identity line, with correlation coefficients still higher than for other indicators. The underestimating tendency is confirmed, even if its impact does not seem to be as strong on scatter plots as on time series.

Figure 3-4: Scatter plots of best member, median value and maximum value of ensemble forecast (0-24h) against observations at buoy B62145.
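The a posteriori selection of the best member used in these comparisons (defined in Section 2-3 as the member with the highest Pearson correlation coefficient) can be sketched as follows. The function names and the sample values are hypothetical, and this is only an illustration of the selection rule, not the script used to produce the figures.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_member(members, observations):
    """Return (index of the member most correlated with observations, all coefficients).

    members is a list of Hs time series, one per ensemble member, each sampled
    at the same dates as the observations.
    """
    scores = [pearson_r(observations, series) for series in members]
    best = max(range(len(members)), key=lambda i: scores[i])
    return best, scores

# Illustrative values only (metres)
obs = [1.0, 2.0, 3.0, 4.0]
ensemble = [[1.0, 1.0, 2.0, 1.0], [1.1, 1.9, 3.2, 3.9], [4.0, 3.0, 2.0, 1.0]]
index, coefficients = best_member(ensemble, obs)
```

Note that a high correlation does not rule out a bias in amplitude, which is why the regression lines and RMSE values are examined alongside it.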
Using the median value as the characteristic quantity of the ensemble's performance, comparisons with deterministic predictions were performed. Scatter plots of both median and deterministic predictions are drawn on Figures 3-5 & 3-6. The median value is systematically better at high Hs at all lead times. Even if deterministic forecasts seem to show a lower spread than the median value, they systematically underestimate high Hs values, and median values appear to be more appropriate for medium to long range predictions.

Figure 3-5: Scatter plots of median value of ensemble and deterministic forecast (0-24h & 48-72h) against observations at buoy B62127 and B62164.
Results of the persistence analysis performed at two buoys in the North Sea are plotted below on Figure 3-6. The number of detected windows represents the total number of 12h, 24h, 36h, 48h and 72h windows. The probability function of the number of windows was computed from the values given by all members of the forecast, using a kernel smoothing density estimator implemented in Matlab toolboxes. Results vary from buoy to buoy, both in the number of windows and in the quality of the prediction relative to observations. Ensemble predictions are nevertheless in good agreement with observations, except at buoy B62127, where the predicted total number was approximately 100 windows lower than the observed value. The number of windows given by the deterministic forecast is also shown along with the observation and ensemble values. At all buoys in the North Sea, the deterministic forecast gives a number of windows much higher than the observations, and is further from them than the ensemble forecasts.

Tables with results for all buoys are given in Appendix B. The equivalent percentage uptime, as defined in Section 2-3, is given for each threshold and window duration for the observations, the deterministic forecast and the minimum, median and maximum values of the ensemble forecast.
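The kernel smoothing step applied to the number of windows given by each member can be sketched as follows. This is an illustration only: the study relied on the estimator provided by Matlab toolboxes, whereas the Gaussian kernel and the normal-reference rule of thumb for the bandwidth used here are assumptions, and `gaussian_kde` is a hypothetical name.

```python
import math
import statistics

def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the probability density of the samples.

    samples: one value per ensemble member (e.g. total number of windows).
    bandwidth: kernel width; if omitted, a normal-reference rule of thumb
    is used (assumption), which requires the samples not to be all equal.
    """
    n = len(samples)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # sum of Gaussian bumps centred on each member's value
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Illustrative counts only: number of windows detected for each member
window_counts = [40, 42, 45, 47, 50]
pdf = gaussian_kde(window_counts)
curve = [(x, pdf(x)) for x in range(30, 61)]
```

The resulting curve can be plotted together with the single values obtained from the observations and from the deterministic forecast.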
Figure 3-6: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoys B62145 and B62127.
3-2 In Taiwan

In the area near Taiwan, lower significant heights are observed than in the North Sea, although Hs peaks happen more frequently. Figures 3-7 and 3-8 illustrate the maximum, minimum and median values predicted by the NCEP ensemble forecast over a period covering several of these events, along with the observations and the predictions of ACTIMAR's deterministic forecast at buoys C6V27 and 46699A.

Figure 3-7: Observations and deterministic forecast along with maximum, minimum and median values predicted by NCEP ensemble forecast from the 20th of October 2012 to the 29th of November 2012 at buoy C6V27

Figure 3-8: Observations and deterministic forecast along with maximum, minimum and median values predicted by NCEP ensemble forecast from the 30th of November 2012 to 9th of January 2013 at buoy 46699A

According to these figures, the behavior of predictions varies a lot from one buoy to another. A tendency to overestimate significant wave height may be noticed, but it is not clear on all time series and will therefore have to be confirmed. Moreover, unlike in the North Sea, this tendency is not limited to high wave events but covers the whole dataset, and may thus originate from a systematic bias either in observations or in predictions. Concerning the variability of the area in terms of Hs, peaks of higher waves can be observed every 3 to 5 days, from 1 m above the average to more than 3 m. The average significant wave height is relatively high, approximately 2 m, and Hs seems to hardly ever fall below a 1 m threshold. Figure 3-9 shows the observations, the deterministic prediction and the ensemble forecast in a box-and-whiskers representation at buoy 46757B.

Figure 3-9: Observations and deterministic forecast along with box-and-whiskers representation of NCEP ensemble forecast from 22nd of November 2012 to the 2nd of December 2012 at buoy 46757B.

Just as in the North Sea, variability within the ensemble is low and only increases during short periods. Unlike previously, Hs peaks occasionally suffer from time shifts in addition to errors in the predicted height. Variability also seems to be higher while Hs increases, which is consistent with the tendency observed in the North Sea. Predictions alternately fall above and below observations, and no global overestimating tendency can be confirmed. The deterministic forecast stays very close to the ensemble predictions, except at dates when Hs peaks are observed, when the deterministic prediction is systematically higher. The time series give no evidence as to which forecast is closer to observations.

The median value of the ensemble was taken as the characteristic quantity of the forecast, as it gives statistically better results than the others.
Figure 3-10: Scatter plots of median value of ensemble and deterministic forecast (0-24h & 48-72h) against observations at buoy 46778A and C6V27.

According to the scatter plots on Figure 3-10, the ensemble forecast in Taiwan seems to give better results than the deterministic forecast at both high and low Hs, with a higher correlation coefficient. Predictions tend to overestimate Hs values below a certain threshold, while values above it are underestimated. The threshold value varies with buoys and lead times, from 0.5 m to more than 3 m.
Figure 3-11a: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoy C6V27.

The persistence analysis performed on buoys in Taiwan follows the same method as the one used for the North Sea. The total number of windows varies with the position of the buoy, as the buoys are not exposed to the same sea states and ocean depths. At all locations, results are slightly further from observations than they were in the North Sea, but still in relatively good agreement. In terms of persistence analysis, the deterministic forecast regularly gives results similar to ensemble predictions in Taiwan.

Tables with results for all buoys are given in Appendix A. The equivalent percentage uptime, as defined in Section 2-3, is given for each threshold and window duration for the observations, the deterministic forecast and the minimum, median and maximum values of the ensemble forecast.
Figure 3-11b: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoy 46757B.

3-3 Wind and wave variability

In both areas, ensemble predictions are characterized by a weak variability during high Hs events, which does not prevent them from predicting these events well compared to the deterministic prediction. Persistence analysis demonstrates that the variability of the areas is well predicted. In Taiwan, high Hs values are often better predicted than lower ones, while the opposite applies in the North Sea. Considering all indicators, ensemble predictions are promising in an operational context, as they regularly give better results than the higher resolution deterministic forecast. The lack of ensemble variability and the tendency to underestimate high significant heights nevertheless represent a major obstacle to be overcome.

Considering both areas, the origin of the low ensemble spread was studied. The main factor likely to underlie this trend is the existence of a similar tendency in the ensemble of wind fields. Time series of wave and wind ensembles were plotted simultaneously in a box-and-whiskers representation to qualitatively estimate a possible link.
  • 30. Figure 3-12: Wind magnitude, ensemble median along with observations and deterministic forecast at buoy C6V27 from 12/12/2012 to 24/12/2012.

The lack of variability within the ensembles can be noticed in both wave and wind fields during high wave events. Even if a correlation seems to exist, it must be moderate, as an increase in the spread of the wind ensemble does not systematically lead to an equivalent increase in the spread of the wave ensemble, for example on the 16th of December 2012 at buoy C6V27 or on the 23rd of October 2012 (Figures 3-12 and 3-13). Apart from such events, which recur throughout the time series, wind and wave ensemble spreads seem to be relatively well correlated in the North Sea as well as in Taiwan.
  • 31. Figure 3-13: Wind magnitude, ensemble median along with observations and deterministic forecast at buoy B62127 from 20/10/2012 to 30/10/2012.

To quantify the correlation between both ensembles, scatter plots and QQ-plots of the normalized wind and wave amplitudes were drawn, as illustrated in Figure 3-14. Amplitudes were normalized with the standard score formulation:

$$Z = \frac{X - \bar{X}}{\sigma}$$

where $Z$ is the normalized value, $X$ is the raw value of the population, $\bar{X}$ is the mean of the population and $\sigma$ its standard deviation.
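As a minimal illustration of the standard score above, the following Python sketch normalizes a list of amplitudes. The function name `standard_score` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the report does not say which was used.

```python
from statistics import fmean, pstdev


def standard_score(values):
    """Return Z = (X - mean) / sigma for every X in `values`.

    Assumption: sigma is the population standard deviation.
    """
    mean = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standard score is undefined for a constant series")
    return [(x - mean) / sigma for x in values]


# Hypothetical amplitudes, for illustration only.
z = standard_score([1.0, 2.0, 3.0])
```

After this transformation both the wind and the wave amplitudes have zero mean and unit standard deviation, which is what allows them to be compared on the same scatter plot or QQ-plot.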
  • 32. Figure 3-14: Scatter plot and QQ-plot of normalized wave amplitude on normalized wind amplitude at buoy B62145.

These scatter plots and quantile-quantile diagrams were similar at all buoys, despite small variations in the computed regression coefficients. In all cases the regression line of the scatter plot falls below the identity line, with a slope varying from 0.3 to 0.6, indicating that the spread of the wave ensemble is lower than that of the wind ensemble. The distributions of amplitudes computed from both ensembles are nevertheless well correlated, as values in the QQ-plot gather around the identity line except for the highest values. The variability of the wind ensembles may therefore be a major factor underlying the weak variability observed in the significant wave height ensembles. It does not fully explain this low spread, however, as the correlation exists but is moderate.

Ensemble forecasts are promising in an operational context, but their resolution does not seem sufficient to predict high wave events, or structures with a small temporal and spatial extent, as well as possible. The resolution of the wind forcing also appears to be critical for wave forecasts. The median value of the forecast also proved efficient for characterizing its performance for medium to long range predictions (48h, 72h and 96h).

Ensemble forecast performance has thus proved insufficient for extremely precise operational forecasting, as the ensembles underestimate high wave events and show less variability than expected, which may lead to the loss of one of the greatest strengths of ensemble forecasting. Ways to improve these forecasts without increasing computational cost therefore have to be found, either by acting directly on these forecasts or by running higher resolution ones.
  • 33. 4 Improvement of Wave Ensemble Forecasts

4-1 Improvement of State-of-the-art forecasts

Ensemble forecasting carries an exceptional computational cost, as it can involve up to several dozen simulations. Each member of a wave ensemble forecast originates from an independent perturbed wind forcing field and an initial sea state. These wind fields often come from the outputs of an atmospheric ensemble forecast run in parallel in the same forecast center. Hence, each operational ensemble forecast normally needs twice as many simulations as it has members. Some solutions exist to reduce this number of simulations, either by directly improving existing forecasts or by generating wind forcing fields faster.

Linear Shift

As predictions at all locations considered tend to underestimate significant wave height during high wave events, a common pattern was sought in the distribution of all Hs time series in order to find a simple linear transformation to apply to all datasets. QQ-plots of predicted Hs values against observations were drawn at all buoys for 0-24h, 24-48h, 48-72h and 72-96h lead times, and a linear regression was performed on each dataset.

Figure 4-1: QQ-plot of wave ensemble quantiles on observation data quantiles for 0-24h, 24-48h, 48-72h, 72-96h predictions at buoy 46778A.
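The QQ-plot regression described above can be sketched in Python as follows. This is an illustrative reconstruction, not the code used in the study: the names `qq_fit` and `linear_shift`, the number of quantiles and the convention that observed quantiles are on the X axis and predicted quantiles on the Y axis are assumptions.

```python
from statistics import fmean, quantiles


def qq_fit(predicted, observed, n=20):
    """Least-squares line through the QQ-plot of predicted vs observed Hs.

    Assumption: X = observed quantiles, Y = predicted quantiles, so the
    fitted line reads Y = intercept + slope * X.
    """
    q_pred = quantiles(predicted, n=n, method="inclusive")
    q_obs = quantiles(observed, n=n, method="inclusive")
    mean_x, mean_y = fmean(q_obs), fmean(q_pred)
    sxx = sum((x - mean_x) ** 2 for x in q_obs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(q_obs, q_pred))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def linear_shift(predicted, slope, intercept):
    """Invert the fitted line to move predictions towards observations."""
    return [(y - intercept) / slope for y in predicted]


# Synthetic data, for illustration only: predictions that underestimate
# the observations by a known linear relation.
observed = [0.5 + 0.1 * i for i in range(50)]
predicted = [0.1 + 0.9 * x for x in observed]
slope, intercept = qq_fit(predicted, observed)
corrected = linear_shift(predicted, slope, intercept)
```

A single pair of coefficients is only useful if it holds across buoys and lead times, which is the condition the regression results are meant to check.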
  • 34. The shape of the quantile-quantile diagram at buoy 46778A suggests that the predicted distribution is non-linearly related to the distribution of observations, and characterized by a positive skew according to the observed concavity. A linear regression was nevertheless performed on the dataset in order to fit the distribution of observations as closely as possible with the simplest transformation. Intercepts of the regression lines at buoy 46778A vary from -0.0051 to 0.11 and slopes from 0.72 to 0.94. Thus, no single simple transformation can be applied to predictions at all lead times.

The same exercise was done at other buoys, including B62164. The concavity at buoy B62164 is opposite to that at 46778A, but the agreement between predicted and observed distributions is better, as the point cloud gathers closer to the identity line. Slopes of regression lines vary from 0.89 to 0.94 and intercepts from 0.051 to 0.23. All values for regression lines of all lead times and buoys are given in Table 4-1.

Figure 4-2: QQ-plot of wave ensemble quantiles on observation data quantiles for 0-24h, 24-48h, 48-72h, 72-96h predictions at buoy B62164.

As coefficients needed to perform a transformation shifting all datasets closer to observations vary a
  • 35. lot with location and lead time of the forecast, it seems impossible to easily improve the performance of state-of-the-art forecasts with a simple linear transformation.

Buoy     1 day lead time       2 days lead time      3 days lead time      4 days lead time
B62145   Y=0.057 + X*0.940     Y=0.003 + X*1.000     Y=-0.007 + X*1.000    Y=-0.021 + X*1.100
B62164   Y=0.080 + X*0.890     Y=0.051 + X*0.920     Y=0.058 + X*0.940     Y=0.230 + X*0.900
B62127   Y=0.062 + X*1.000     Y=0.025 + X*1.100     Y=0.015 + X*1.100     Y=-0.067 + X*1.200
B63113   Y=0.240 + X*1.000     Y=0.260 + X*1.000     Y=0.058 + X*1.100     Y=-0.096 + X*1.200
46778A   Y=-0.005 + X*0.940    Y=0.056 + X*0.810     Y=0.081 + X*0.760     Y=0.110 + X*0.720
46699A   Y=0.140 + X*1.400     Y=0.240 + X*1.200     Y=0.280 + X*1.200     Y=0.280 + X*1.200
46757B   Y=-0.110 + X*1.000    Y=-0.054 + X*0.980    Y=-0.053 + X*0.940    Y=-0.048 + X*0.960
C6V27    Y=0.290 + X*0.950     Y=0.290 + X*0.940     Y=0.260 + X*0.950     Y=0.270 + X*0.970

Table 4-1: Slopes and intercepts of regression lines from QQ-plots for J0 to J3 at all buoys

This method may be applied for local conditions when enough time is available to perform a climatological analysis, which is generally not the case in an operational context. Therefore, the solution cannot be retained.

Empirical Orthogonal functions

An alternative way of creating a wind ensemble, which requires far less computational time and relies on Empirical Orthogonal Functions (EOFs), was also considered. Unfortunately, there was not enough time during my master thesis to investigate this method fully and to assess its performance. Only theoretical matters were studied.

EOFs are orthogonal basis functions of a signal or dataset, each accounting for as much variance as possible. They are typically obtained by computing the eigenvectors of the covariance matrix of the dataset. With this method, perturbed wind fields are generated from one single unperturbed wind field. 
The spatial structure of the perturbations is given by the EOFs, which maintains most spatial properties of the undisturbed field throughout the process. In geophysics, the method is the counterpart of principal component analysis (PCA), often applied with geographical weighting.

The principle of the method developed in this study is to run the atmospheric model over the chosen area for a long period, typically one or several years, once and for all. This initial run would give the overall structure of errors in the area, taking into account every situation encountered within the period considered. An ensemble representation of the error covariance is thus provided and analyzed in order to extract the eigenvectors and eigenvalues of the error covariance matrix. Each eigenvector represents a direction of the spatial structure of errors, and the higher the associated eigenvalue, the more variance that direction accounts for. By simply introducing random errors along the directions given by the EOFs in a single wind field, it is then possible to obtain up to several dozen wind ensemble members at almost no additional computational cost.

Combined with the clustering algorithm, this method may make it possible to create even more accurate reduced forecasts. EOF-based forecasts can easily exceed 20 members, so more members are merged together in the clustering phase, which makes the choice of clusters more efficient.
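One possible way of writing the perturbation step described above is sketched here. The notation is introduced only for this illustration, and the scaling of each direction by the square root of its eigenvalue and the Gaussian random coefficients are assumptions, since the report only studied the method theoretically.

```latex
% E: N x P matrix of centred error samples (N dates, P grid points)
C = \frac{1}{N-1}\, E^{\mathsf{T}} E ,
\qquad
C\, e_k = \lambda_k\, e_k ,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge 0

% member m built from the single unperturbed field u_0, keeping K EOFs
u^{(m)} = u_0 + \sum_{k=1}^{K} \sqrt{\lambda_k}\; \xi_k^{(m)}\, e_k ,
\qquad
\xi_k^{(m)} \sim \mathcal{N}(0, 1) \ \text{independent}

% resulting covariance of the perturbations
\operatorname{Cov}\!\left( u^{(m)} - u_0 \right)
  = \sum_{k=1}^{K} \lambda_k\, e_k e_k^{\mathsf{T}}
```

Under these assumptions the perturbations reproduce the error covariance $C$ truncated to its $K$ leading EOFs, and each new member only costs the drawing of $K$ random numbers.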
  • 36. 4-2 High resolution wave ensemble forecast

In the previous sections, the performance of existing forecasts was investigated, along with statistical methods to reduce the computational cost of ensemble forecasts. The question of the performance of high resolution operational ensemble forecasts remains unanswered. It was studied with predictions made in the Tierra del Fuego area, as defined in Section 2-2. The aim of this section is to study how forecasts improve when wind forcing fields and bathymetry are refined.

High resolution wave ensemble forecast validation

Validation procedures were based on satellite wave observations of the area measured by Jason 2 during March and April 2013. So far, results of the WW3 runs performed at high resolution are not satisfactory, as can be observed in Figures 4-3 and 4-4, which compare the GEOWaFS and high resolution forecasts respectively with the observations from a swath acquired around the 10th of March 2013.

Figure 4-3: Satellite measurement and GEOWaFS forecast around the 10th of March 2013 in Tierra del Fuego.
  • 37. Figure 4-4: Satellite measurement and ACTIMAR’s forecast around the 10th of March 2013 in Tierra del Fuego.

Two points can be drawn from these figures. First, on the swath considered, the ensemble predicted by GEOWaFS is in better agreement with the Jason 2 satellite observations: it shows a mean RMSE of 0.20 and a mean correlation coefficient of 0.65 with the measurements, against a mean RMSE of 0.39 and a mean correlation coefficient of 0.62 for ACTIMAR’s high resolution forecast. So far, therefore, the high resolution predictions are not accurate enough to outperform state-of-the-art forecasts. The possible causes of these disappointing results are various and are discussed further below.

The second point is the higher variability observed in the high resolution ensemble. As demonstrated previously, a sufficient spread is needed to ensure good quality results, and this condition is satisfied. In this respect, the high resolution forecast fulfils our expectations. The process used to run these forecasts nevertheless has to be revised.

The limits of the high resolution forecasts are various. The choice of model parameters may be partly responsible. The parameters used in the WaveWatch III model are the same as those used for the global deterministic wave forecast at ACTIMAR, and more accurate results may be obtained by tuning these parameters to best fit the Tierra del Fuego area on the nested high resolution grid. A complete study of the influence of these parameters should then be performed to
  • 38. find the best possible combination. The input wind fields may be another limiting factor. As their resolution is lower than that of the grid, many phenomena might be poorly predicted. The validity of the wind ensemble also has to be checked against in-situ observations and against NOAA’s wind fields. A bias in the wind fields could easily explain the poor quality of the results.

Lack of time did not allow all aspects of high resolution forecasts to be fully investigated. Directions for further research include, amongst others, studying separately the influence of improved wind forcing and of improved bathymetry.

Clustering

In the context of high resolution ensemble forecasts, computational times, already high in all ensemble forecasts, become critical and must be reduced to ensure operational viability. As mentioned previously in this report, ensemble members are chosen in order to reflect the uncertainty in the observations. Considering how important the variability of the wind forcing is in this respect, it makes sense to try to maintain the widest range of wind magnitude values in the different input wind fields. Therefore, a way to reduce the number of wind fields without deteriorating wind variability was investigated. It relies on classification methods and consists in merging members with similar variability together.

The k-means clustering method was retained amongst many other possibilities, as it was easy to adapt to the context of ensemble forecasting and is known to converge quickly with the right heuristic algorithms despite its computational complexity. The principle of this method is very simple, which makes it easy to study and use. The Matlab algorithm used for this study consists of two iterative steps:

    • An assignment step, which consists in assigning each observation to the cluster whose mean, called the centroid, is the closest. 
The squared Euclidean distance was used to compute distances in the dataset.

    • An update step, during which the centroids of the clusters are recomputed by including the newly added points in the averaging.

The algorithm stops and is said to have converged when the assignment no longer changes. In other words, the algorithm tends to minimize the sum of within-cluster distances from the points to the centroids. The clustering is performed by heuristic algorithms, which do not always find the best solution but an approximate one. A local minimum of the sum of within-cluster distances can be found instead of the global one, and the convergence towards one solution or another depends mostly on the initial partition of the dataset. As described in the literature [Gong and Richman 1995], k-means algorithms are indeed very sensitive to the initial partition, that is, the first guess of how the dataset is split into clusters. An easy way to mitigate this problem, made possible by the quick convergence of the algorithm, is to run it several times with different initial partitions and keep the lowest local minimum. 2000 runs of
  • 39. the k-means algorithm were then performed on the dataset in order to approximate the minimum as closely as possible. The best solution amongst all these local minima is then considered to be the best partition of the data.

In the context of operational ensemble forecasting, the clustering should be performed for each day of simulation and take into account predictions at all lead times in order to adapt to spatial and temporal variability. To meet this need, the dataset on which the clustering is performed is a 2D table in which each row holds the wind magnitudes of one component for one member of the wind ensemble. Each column stands for one grid point at a given date. Figure 4-5 illustrates this layout.

Figure 4-5: Layout of input dataset for clustering. X and Y stand for the number of grid points in longitude and latitude respectively, U(i , j) represents predicted wind magnitude in zonal direction at grid point (x=i , y=j). (J0,T0) to (J4,T0) are the lead times of the prediction with (J0,T0) = 00h00, (J0,T1) = 06h00, …, (J1,T0) = 24h00.

For this study, 24 points in latitude and 34 points in longitude were taken for the wind fields, with a spatial resolution of 1°x1° and a temporal resolution of 6 hours. Predictions were used at lead times of up to 4 days (0-96h). 20 different wind fields were available from WRF simulations.

Another point concerning the k-means clustering method is that the number of clusters to be computed is an input parameter. Preliminary runs therefore had to be made in order to estimate the optimal number of clusters, balancing the future computational cost against the accuracy of the solution. From the initial ensemble of 20 wind fields, it rapidly appeared that creating fewer than 7 clusters led to a significant lack of wind variability. 
With more than 11 clusters, each additional member added only little information to the overall ensemble, as the distribution of amplitude values was already close to that of the initial members.

[Figure 4-5 layout: one row per ensemble member (1st member, 2nd member, ...); columns are the X * Y grid points at J0,T0, then at J0,T1, J0,T2, J0,T3, ..., each cell holding a value U(i , j).]
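The two-step procedure and the repeated runs with different initial partitions can be sketched in pure Python as below. This is an illustrative reimplementation, not the Matlab routine used in the study: the names `kmeans_once` and `kmeans_best`, the random choice of initial centroids among the members and the handling of empty clusters are assumptions. Each row of `rows` is meant to be one ensemble member flattened over grid points and lead times, as in the Figure 4-5 layout.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(rows, k, rng, max_iter=100):
    """One k-means run from a random initial partition."""
    # Assumption: initial centroids are k distinct members drawn at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each member goes to the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # converged: the assignment did not change
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, centroids, inertia


def kmeans_best(rows, k, n_runs=2000, seed=0):
    """Repeat k-means and keep the run with the lowest within-cluster sum."""
    rng = random.Random(seed)
    runs = (kmeans_once(rows, k, rng) for _ in range(n_runs))
    return min(runs, key=lambda result: result[2])


# Six hypothetical two-value "members", for illustration only.
rows = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]]
labels, centroids, inertia = kmeans_best(rows, k=3, n_runs=50, seed=1)
```

Keeping the run with the lowest within-cluster sum is what limits the sensitivity to the initial partition. It does not guarantee the global minimum, only the best of the local minima that were visited.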
  • 40. Figure 4-6: Probability Density Function of the amplitude of 7 and 10 clusters ensembles and correlation coefficient with initial ensemble of 20 members.

The within-ensemble wind amplitudes obtained from the 7-cluster ensemble are significantly lower than those of both the 10-cluster ensemble and the 20-member ensemble, as illustrated by the distributions in Figure 4-6. The 10-cluster ensemble is slightly closer to the initial ensemble, as shown by its higher correlation coefficient. Considering wind distributions only, the gain in performance from using the 10-cluster ensemble seems to outweigh the additional computational cost.
  • 41. The same analysis was then performed after computing the corresponding wave predictions. Significant wave height predictions of the model using from 7 to 11 clusters were compared in order to estimate the gain in performance, keeping computational cost in mind. Figure 4-7 presents a box-and-whiskers representation of wave ensembles composed of 20 members, 7 clusters and 10 clusters respectively, for a J0 to J3 prediction.

Unlike what could be inferred from the wind distributions, wave forecasts computed from 7 or 10 clusters are very close to one another. Moreover, they are also close to the initial 20-member forecast, except for a few dates at the end of the forecast, around the 5th of July 2013. Unfortunately, as this analysis took place relatively late during my master thesis, it only covers a very short time period. Still, it gives a good preview of the expected performance of such a method, which is very satisfactory. In the rest of this study, 7-cluster ensembles were used, as they cost less to compute and seem to give good results.

Figure 4-7: Box-and-whiskers representation of 7-clusters, 10-clusters and 20-members ensemble forecasts from the 1st of July 2013 to the 6th of July 2013.
  • 42. As mentioned previously in this study, the median value has proven to be the most effective characteristic figure of the forecasts. It is therefore necessary to study the impact of clustering on the median of the ensembles. Figure 4-8 presents results for a 7-cluster ensemble. The median values of the initial forecast from the 1st of July 2013 to the 6th of July 2013 are plotted along with the median of a 7-cluster ensemble and of a weighted 7-cluster ensemble. The latter is built by replicating each value of the 7-cluster ensemble as many times as the number of initial members that were merged into that cluster. In other words, if the 1st cluster gathers 4 different members, its value is repeated 4 times in the weighted ensemble. We thus obtain a 20-member ensemble filled with cluster values, which takes into account the probability of each cluster value occurring.

Figure 4-8 shows that the median values computed from the initial ensemble and from a cluster ensemble are much the same, so the clustering seems to maintain the variability of the ensemble well throughout the process. The median value can therefore also be used to characterize a forecast composed of a cluster ensemble.

Figure 4-8: Median values of 20-members, 7-cluster and weighted 7-clusters ensembles from 2013/07/01 to 2013/07/06.

Once the number of clusters was set, the question of how to use the output of the clustering properly had to be raised. The centroid positions and the ensemble members constituting each cluster are the main outputs of the method, along with the centroid-to-point distances, which were used as an additional check of the results. I investigated two methods. In the first, all constituting members of the clusters were replaced by their respective centroids. 
Thus, each member had an influence on the final “cluster ensemble” weighted by its distance from the centroid. The main drawbacks of such a method are that the final “cluster ensemble” appeared to be slightly
  • 43. smoothed by the within-cluster averaging, so that the range of amplitudes (maximum of the ensemble minus minimum of the ensemble) was reduced. Moreover, this method meant performing several data conversions, from Matlab format to NetCDF and then to the specific format read by the wave model (.wnd). The second method consists in replacing all constituting members of a cluster by the member closest to the centroid, thus avoiding any averaging between members and any data conversion.

Several limits of this clustering technique can be pointed out. The computational power and memory required to run the algorithm on huge datasets do not allow a long time period to be taken into account when computing clusters, so the algorithm must be run separately for each date, and the selected members may change from one run to the next. In order to avoid artefacts in the time series, a gap was intentionally left at the beginning of each run and a simple interpolation was performed to connect both sides of the gap. The loss of information is thus minimized at low computational cost.

Another remaining limit is the dependence of the algorithm on the initial partition of the dataset. Despite the high number of runs made in order to reduce its influence, this dependence leads to slightly different clusters from one execution to another on the same dataset. However, as these variations mostly occur between members predicting relatively close Hs values, the overall performance of the forecast is not really affected. Members which are the furthest from the others are systematically selected, as they reproduce limit cases. A quality check is also systematically performed after the computation of the clusters to make sure the final ensemble is not too far from the initial one. 
The probabilistic information inherent to ensemble forecasting, which indicates the most probable sea state, is not lost, as the number of members for which each cluster accounts is systematically stored. The most probable sea state can then easily be determined from the probability density function of the ensemble values.

Despite the short time period covered by this analysis, clustering methods prove very effective at reducing computational cost while preserving the quality of the forecast as far as possible. Studies over longer periods will have to be performed in order to assess the performance more precisely and check its consistency over time.
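Two of the post-processing steps described in this section, selecting the member closest to each centroid and computing the median of the weighted cluster ensemble, can be sketched in Python as follows. The function names `closest_members` and `weighted_median` and the toy values are hypothetical, and the sketch assumes that cluster labels and centroids are already available from a clustering run.

```python
from statistics import median


def closest_members(rows, labels, centroids):
    """Map each cluster label to the index of the member nearest its centroid."""
    best = {}  # label -> (squared distance, member index)
    for index, (row, label) in enumerate(zip(rows, labels)):
        dist = sum((x - c) ** 2 for x, c in zip(row, centroids[label]))
        if label not in best or dist < best[label][0]:
            best[label] = (dist, index)
    return {label: index for label, (dist, index) in best.items()}


def weighted_median(cluster_values, cluster_sizes):
    """Median of the ensemble in which each cluster value is repeated
    as many times as the number of members merged into that cluster."""
    expanded = []
    for value, size in zip(cluster_values, cluster_sizes):
        expanded.extend([value] * size)
    return median(expanded)


# Hypothetical one-value members and clusters, for illustration only.
members = [[0.0], [1.0], [4.0], [10.0]]
member_labels = [0, 0, 0, 1]
cluster_centroids = [[5.0 / 3.0], [10.0]]
representatives = closest_members(members, member_labels, cluster_centroids)

# Three cluster Hs values accounting for 4, 1 and 1 initial members.
hs_weighted = weighted_median([1.0, 2.0, 3.0], [4, 1, 1])
hs_unweighted = median([1.0, 2.0, 3.0])
```

In this toy case the unweighted median is 2.0 while the weighted one is 1.0, which shows why storing the number of members per cluster matters when the clusters have very different sizes.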
  • 44. 5 Conclusions

State-of-the-art wave ensemble forecasts proved more effective at predicting significant wave height than the higher resolution deterministic forecast, as they take into account the uncertainty in the initial conditions. In both the North Sea and near Taiwan, in shallow and deep water, better results are obtained, especially at longer lead times. The median value of the forecasts also characterizes their performance well. However, a lack of variability sometimes appears within the ensemble, mainly related to a lack of variability within the wind forcing ensemble. These forecasts may therefore not be sufficient for very sensitive marine operations.

Several statistical methods were investigated to produce ensemble forecasts at lower computational cost. Clustering methods were studied in particular, as they proved to be an effective way to gather correlated ensemble members, which makes it possible to reduce the size of the ensemble with the smallest loss of information. Preliminary results with the simplest k-means clustering method are very encouraging. The generation of wind ensembles via Empirical Orthogonal Functions (EOFs) was also studied, in a purely theoretical way, and may be a direction for further research.

High resolution ensemble forecasts are a possible improvement on existing forecasts. The high resolution forecast run in the Tierra del Fuego area shows a higher variability than lower resolution ensemble forecasts, whose lack of variability is their main limit. However, a non-negligible bias sometimes appears in these forecasts, probably related to poorly tuned model parameters and to wind forcing fields that are not accurate enough to reproduce all phenomena well. An in-depth study of the influence of these parameters, along with the influence of improved wind forcing fields and bathymetry, should be conducted in the future to improve the quality of high resolution forecasts.
  • 45. Bibliography

[1]: Ardhuin, F. Les vagues: un compartiment important du système terre. 2012 (Course).
[2]: Bidlot, J.R., D.J. Holmes, P.A. Wittmann, R.L. Lalbeharry and H.S. Chen. Intercomparison of the performance of operational ocean wave forecasting systems with buoy data. Weather Forecasting, 2002, 17, 287-310.
[3]: Cao, D., H.L. Tolman, H.S. Chen, A. Chawla and V.M. Gerald. Performance of the ocean wave ensemble forecast system at NCEP. MMAB contribution No.279, 2009 (available at http://polar.ncep.noaa.gov/mmab/papers/tn279/mmab279.pdf).
[4]: Carrasco, A. and O. Saetra. A limited area wave ensemble prediction system for the Nordic seas and the North Sea. Report No.22/2008, Meteorology and oceanography, ISSN: 1503-8017, Dec. 2008.
[5]: Cavaleri, L. et al. Wave modeling – The state of the art. Progress in Oceanography 75 (2007) 603-674.
[6]: Chen, H.S. Ensemble Prediction of Ocean Waves at NCEP. Proceedings of the 28th Ocean Engineering Conference in Taiwan, NSYSU, 2006.
[7]: Farina, L. On ensemble prediction of ocean waves. Tellus - Series A: Dynamic Meteorology and Oceanography (2002), Vol. 54, Issue 2, 148-158.
[8]: Gong, X. and M.B. Richman. On the Application of Cluster Analysis to Growing Season Precipitation Data in North America East of the Rockies. J. Climate, 1995, 8, 897-931.
[9]: Komen, G.J., L. Cavaleri, M. Donelan, K. Hasselmann, S. Hasselmann and P.A.E.M. Janssen. Dynamics and Modeling of Ocean Waves. Cambridge University Press, 1994, 532pp.
[10]: Ochi, M.K. Ocean Waves: The Stochastic Approach. Cambridge University Press, 1998, 319pp.
[11]: Roulston, M.S., J. Ellepola, J. von Hardenberg and L.A. Smith. Forecasting wave height probabilities with numerical weather prediction model. Ocean Engineering 32 (2005) 1841-1863.
[12]: Toth, Z. and E. Kalnay. Ensemble forecasting at NMC: The generation of perturbations. Bulletin of the American Meteorological Society, Vol. 74, No. 12, Dec. 1993.
  • 46. [13]: Toth, Z. and E. Kalnay. Ensemble Forecasting at NMC and the Breeding Method. Monthly Weather Review, AMS, pp.3297-3319, Dec. 1997.
[14]: Wilks, D.S. Statistical Methods in the Atmospheric Sciences. International Geophysics Series, Vol.100, 676pp.
[15]: Zhang, Z. and T.N. Krishnamurti. A perturbation method for hurricane ensemble predictions. Monthly Weather Review, 1999, 127, 447-469.