
Operational ocean wave ensemble forecasts: state-of-the-art validation and high resolution forecasts
Final Year Project report towards the achievement of a Graduate Engineering
Diploma in Hydrography at ENSTA Bretagne
Loïc Stouff
Tutor at ACTIMAR: M. Cyril FRELIN
Tutor at ENSTA Bretagne: Mme. Amandine NICOLE
2012 – 2013

CITEPH – Ocean Wave Ensemble Forecasts
STOUFF Loïc - 16/08/2013
ABSTRACT
1 INTRODUCTION
1-1 OUTLINE OF THE REPORT
1-2 INTRODUCTION TO ENSEMBLE FORECAST
1-3 MAIN OCEAN WAVE ENSEMBLE PREDICTION CENTERS
    NCEP
    FNMOC
    ECMWF
    Norwegian Meteorological Institute
    China National Meteorological Centre
1-4 POSSIBLE IMPROVEMENTS OF EXISTING MODELS
2 METHODOLOGY
2-1 BUOYS LOCATION AND DATA PROCESSING
2-2 DEFINITION OF WW3 STUDY AREAS
2-3 OVERVIEW OF MATHEMATICAL AND VISUALIZATION TOOLS
3 VALIDATION OF STATE-OF-THE-ART WAVE ENSEMBLE FORECAST
3-1 IN THE NORTH SEA
3-2 IN TAIWAN
3-3 WIND AND WAVE VARIABILITY
4 IMPROVEMENT OF WAVE ENSEMBLE FORECASTS
4-1 IMPROVEMENT OF STATE-OF-THE-ART FORECASTS
    Linear Shift
    Empirical Orthogonal functions
4-2 HIGH RESOLUTION WAVE ENSEMBLE FORECAST
    High resolution wave ensemble forecast validation
    Clustering
5 CONCLUSIONS
BIBLIOGRAPHY

Abstract
Ocean wave ensemble forecasts (predictions based on several runs of the same model with
different initial and boundary conditions) are progressively replacing deterministic forecasts in
operational contexts. The objectives of this study are, after reviewing various tools and theoretical
matters, to quantify the performance of state-of-the-art wave ensemble forecasts and to analyze the
possibility of running higher resolution ensemble forecasts at low computational cost. It appears
that ensemble forecasts perform, in most cases, better than higher resolution deterministic
forecasts, but their performance still seems insufficient for particularly sensitive applications such
as high-risk offshore operations.
Several statistical methods, such as clustering and ensemble generation by Empirical Orthogonal
Functions (EOFs), were investigated and proved efficient at reducing computational cost, thus
making it possible to run higher resolution ensemble forecasts.
The performance of very high resolution forecasts was then studied, but several limitations related
to wind fields and model parameters made these predictions slightly disappointing. Despite a
higher variability, which represents a strong improvement, the predictions sometimes differ too
much from observations. Further studies are therefore needed on this matter, especially to
analyze the influence of improved bathymetry and wind forcing fields.

1 Introduction
1-1 Outline of the report
This study focuses on the possibility of running very high resolution wave ensemble
forecasts at the lowest computational cost. As operational ensemble forecasts are attracting growing
interest in both the public and private sectors, the question of their application as a decision
support tool in high-risk maritime operations is raised.
This master thesis is part of a research and development (R&D) project funded by the CITEPH
program (Consultation for Technological Innovation in Exploration and Production of
Hydrocarbons), which expects practical answers and results.
In addition to technical tasks including data processing and visualization, an interpretation
of model outputs was necessary to evaluate the performance of forecasts, coupled with extensive
research on ways to effectively reduce computational costs.
This study intends to answer the following questions:
- Is the current performance of state-of-the-art wave ensemble forecasts reliable enough for
use in decision support tools?
- Do higher resolution forecasts increase this performance?
- Can computational time be reduced without deteriorating performance?
Five sections make up this report:
- Section 1: Overview of ensemble forecasts. Useful scientific notions and reminders are
briefly defined, followed by a succinct presentation of the main wave ensemble forecast
centers.
- Section 2: Methodology. This section provides information about the data sources and data
processing methods used, as well as about the statistical and numerical tools.
- Section 3: Validation of state-of-the-art ensemble forecast. The performance of the NCEP
wave ensemble forecast is validated against buoy data.
- Section 4: Improvements of Wave Ensemble Forecasts. Findings of the project with
regard to ensemble forecasts are developed in this section and discussed in the light of
the previously mentioned questions.
- Section 5: Conclusion and outlook of the project. Outcomes of the project are given and
further research directions are highlighted.

1-2 Introduction to ensemble forecast
Ensemble predictions originally come from meteorology and aim at taking into account the
effect of uncertainty in the initial conditions of a model. Indeed, in meteorological studies, initial
conditions represent our knowledge of the atmosphere’s state; due to the scarcity of observations
and to the presence of inherent errors in measurements, this knowledge is imperfect.
Figure 1-1: Principle of ensemble and deterministic forecasts
Due to the non-linearity of the fluid mechanics equations [5], medium to long range results can
vary drastically if errors are present in the initial conditions. It is now commonly agreed that
errors in observations are unavoidable, and therefore that errors in initial conditions and
forcing have to be taken into account. Considering this, the principle of a single deterministic
solution of the model's governing equations can be questioned. Generating an ensemble of
solutions derived from various initial conditions that reflect the observation uncertainty then
provides more information on the long-term behavior of predictions. In addition to improving the
reliability of predictions, ensemble prediction systems (EPS) estimate the probabilities associated
with different possible states.
Unlike atmospheric models, ocean wave models are not very sensitive to errors in initial conditions
after the first 24 hours. Uncertainties in wind forcing fields, however, are the main source of
error in wave models, which gives the opportunity to produce ensemble forecasts based on
perturbations of these wind fields. The methods used to produce the perturbed wind forcing fields
are relatively well documented in the literature [12] and are beyond the scope of this section; they
are therefore not treated here.
All forecasts mentioned in this paper are based on third generation wave models. These
models are governed by the action balance equation, which describes the evolution of the wave
energy spectrum F forced by specific source terms. Solving the equation requires knowledge of the
spectrum at a given time and of the surface winds for all time integration intervals. The
equation's formulation is given below.
$$ P(F) = \frac{DF}{Dt} = S_{in} + S_{nl} + S_{ds} $$
where D/Dt represents the Lagrangian derivative, which can be written:
$$ \frac{D}{Dt} = \frac{\partial}{\partial t} + \mathbf{c}_g \cdot \nabla $$
with c_g the group velocity.
• The right-hand side of the action balance equation contains the source terms: Sin is the
wind-related input, Sds the dissipation term and Snl the nonlinear wave-wave interaction
term [1].
• Further details about the action balance equation and source terms are available in Komen et
al. (1994) [9].
The main quantity used in this paper is the significant wave height Hs, defined below
(Ochi, 1998 [10]), which is consistent with the average height of the highest third of the waves
(H1/3) derived from measurements.
$$ H_s = 4\sqrt{E} $$
where E, the wave energy, is given by:
$$ E = \int_0^{2\pi} \int_0^{\infty} F(f,\theta)\, df\, d\theta $$
For further information on third generation wave models, please refer to the following papers,
among others: [5], [6] and [7].
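As a minimal sketch of the two definitions above, the double integral can be approximated numerically on a discretized spectrum. The function names, the trapezoidal rule and the nested-list layout are assumptions made for illustration and are not taken from the wave models themselves; the frequency integration is also truncated at the highest available bin instead of extending to infinity.

```python
import math

def trapezoid(y, x):
    """Trapezoidal integral of samples y taken at abscissae x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

def significant_wave_height(freqs, thetas, spectrum):
    """Hs = 4 * sqrt(E), with spectrum[i][j] = F(freqs[i], thetas[j]) in m^2/Hz/rad."""
    # Integrate over direction first, then over frequency.
    per_frequency = [trapezoid(row, thetas) for row in spectrum]
    energy = trapezoid(per_frequency, freqs)
    return 4.0 * math.sqrt(energy)

# Illustrative (not measured) spectrum: 0.5 m^2/Hz spread evenly over all
# directions between 0.05 Hz and 0.25 Hz, so that E = 0.5 * 0.2 = 0.1 m^2.
freqs = [0.05 + 0.01 * k for k in range(21)]
thetas = [2.0 * math.pi * k / 36 for k in range(37)]
spectrum = [[0.5 / (2.0 * math.pi) for _ in thetas] for _ in freqs]
hs = significant_wave_height(freqs, thetas, spectrum)
```

For this flat test spectrum the trapezoidal rule is exact, so `hs` equals 4√0.1, roughly 1.26 m.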
1-3 Main ocean wave ensemble prediction centers
This section presents the main centers providing ocean wave ensemble forecasts, which are
still marginal in comparison with atmospheric ensemble forecasts. The ensemble prediction systems
presented in this section are based on third generation wave models. All ensemble members use the
undisturbed analysis as initial conditions, and the entire spread of the members results from wind
forcing perturbations. Indeed, the influence of spectral initial conditions has been shown to be
negligible in third generation models.

NCEP
The US National Centers for Environmental Prediction (NCEP) developed in 2004, and
implemented operationally in 2006, a wave ensemble forecast system called the Global Ensemble
Ocean Wave Forecast System (GEOWaFS).
The version of GEOWaFS currently in use is an updated version running with the NOAA (National
Oceanic and Atmospheric Administration) multi-grid WAVEWATCH III, replacing the prior
GEOWaFS based on NOAA WAVEWATCH III (Cao et al., 2009 [3]).
The model ranges from 78°S to 78°N with a 1° x 1° spatial resolution.
The current version of GEOWaFS consists of 20+1 members generated by separate runs of NWW3
based on perturbed wind fields obtained from the NOAA/NCEP Global Ensemble Forecast System
(GEFS) bias-corrected 10 m winds, updated every 3 hours. Perturbations of the wind fields were
generated using the breeding of growing modes method described in Toth and Kalnay (1993 [12],
1997 [13]).
The initial wave field comes from the deterministic NWW3 forecast. Operational GEOWaFS is run 4
times a day (at 00, 06, 12 and 18 UTC).
Several studies (Chen, 2006 [6]; Cao et al., 2009 [3]) demonstrated that GEOWaFS produces more
realistic and reliable predictions than the current operational global deterministic wave forecast
system NWW3.
FNMOC
The US Navy Fleet Numerical Meteorology and Oceanography Center (FNMOC) also
provides a global ensemble ocean wave prediction. It consists of 20 members of 10-day forecasts,
run twice daily (at 00 and 12 UTC) with a 1° x 1° resolution. Ensemble members are generated from
the wind fields of the Navy Operational Global Atmospheric Prediction System ensemble forecast
system (NOGAPS EFS).
The NCEP and FNMOC ensemble forecasts are sometimes combined to form an ensemble of 40+1
independent members.
The table below sums up the characteristics of these two forecasts.
                         NCEP wave ensemble system          FNMOC wave ensemble system
Number of members        20                                 20
Wind forcing             Bias-corrected GEFS winds          NOGAPS Ensemble Forecast System
Grid                     Global spherical                   Global spherical
Spatial resolution       1° x 1°                            1° x 1°
Geographical extension   78°S to 78°N                       78°S to 78°N
Cycles per day (Z)       4 runs a day (00, 06, 12, 18 UTC)  Twice daily (00 and 12 UTC)
Forecast                 10 days                            10 days
Table 1-1: Summary of main characteristics of FNMOC and GEOWaFS forecasts

ECMWF
The European Centre for Medium-Range Weather Forecasts (ECMWF) provides global wave
ensemble forecasts with a spatial resolution of 0.5° x 0.5°, shallow water physics and a 15-day
forecast range. As for the previously mentioned forecasts, the initial wave field comes from the
unperturbed deterministic prediction, and the spread of all 50 members depends only on
perturbations in wind fields. The ECMWF EFS is run twice daily (at 00 and 12 UTC).
Norwegian Meteorological Institute
The Norwegian Meteorological Institute (met.no) runs a daily regional operational ensemble
prediction system for ocean waves (WAMEPS). The model covers Northern Europe, the Scandinavian
Peninsula and the Nordic Seas (including the North Sea and the Barents Sea) with a 0.1° resolution
and is forced by the atmospheric limited area ensemble prediction system (LAMEPS).
More information is provided in Carrasco et al. (2008 [4]).
China National Meteorological Centre
The China National Meteorological Centre also runs a global ensemble wave forecast at 1° x 1°
spatial resolution with the WW3 model. 14+1 wave fields are calculated from perturbed wind fields of
the centre's operational atmospheric forecast. The model is run twice daily (at 00 and 12 UTC)
with a 10-day forecast range.
For the purpose of this study, the NCEP and FNMOC forecasts were used as reference
state-of-the-art numerical ocean wave ensemble predictions.
The daily deterministic forecast run operationally on a global scale at ACTIMAR was also gathered
as global gridded significant wave height data over the relevant period at a 0.5° x 0.5° spatial
resolution, in order to allow direct performance comparisons between probabilistic and
deterministic forecasts.
1-4 Possible improvements of existing models
The issue underlying this project is to assess whether existing operational forecasts are
reliable and accurate enough to be used as decision support tools for high-risk maritime operations.
The next question is what opportunities exist to improve those forecasts or to produce higher
resolution ones. Two approaches are possible. The first is to identify a possible common
pattern in all predictions, a recurrent tendency characteristic of a forecast, which would allow its
performance to be increased rapidly and effectively by a simple correction. If no such pattern can
be found, the only solution lies in running models at higher spatial and temporal resolution.

Computing power remains a fundamental issue for operational forecasts, especially for
ensemble forecasts where several dozen wave fields may have to be generated; it is therefore
essential to find solutions that minimize the computational cost of forecasts.
The approaches retained include the generation of wind fields by empirical orthogonal
functions from one single member, and the classification of members to reproduce a similar range
with a reduced number of wave fields.

2 Methodology
2-1 Buoys location and data processing
For the purpose of comparing predicted wave fields with observations, buoy
measurements were gathered from the US National Data Buoy Center (NDBC) and the Taiwan Central
Weather Bureau (CWB). Significant wave heights were recorded hourly at the 8 locations on
which the study focuses. I selected four buoys in each relevant area, that is to say four in the
North Sea and four near Taiwan. The selection was based on several criteria including the
geographical location of the buoy, the ocean depth, the distance to the shore and the quality of data.
Despite this preliminary selection, several buoys show gaps in their time series, due to corrupted or
unavailable data at particular dates. Even if they appear from time to time on graphs, periods
covering those gaps were not taken into account when computing performance indicators of
forecasts and therefore do not induce any bias in the interpretation of results.
In the North Sea, buoys B62164, B62145, B62127 and B63113 were selected. They are all
located within 50 to 100 km of the shore in relatively deep water areas.
Near Taiwan, buoys 46699A, 46778A, 46757B and C6V27 were considered. Except for buoy
C6V27, which is located 250 km from the shore in 3000 m of water, all stand in
near-shore areas with ocean depths lower than 30 m. Both the shallow and deep ocean behaviors of
the forecast could then be studied.
Although the measurement uncertainty at these buoys is unknown, the measurements were taken as
reference values, as the uncertainty of predicted values is assumed to be much higher.
Table 2-1 and Figure 2-1 below give the locations of these buoys.
Buoy Longitude Latitude
B62164 0.5°E 57°N
B62145 2.8°E 53.1°N
B63113 1.7°E 61°N
B62127 0.7°E 54°N
C6V27 118.8°E 21°N
46699A 121.6°E 24°N
46778A 120°E 23.1°N
46757B 120.8°E 24.8°N
Table 2-1: Geographical coordinates of selected buoy moorings

Figure 2-1: Buoy moorings locations in the North Sea and near Taiwan
An important detail has to be mentioned concerning the locations of the buoys. As the spatial
resolution of ACTIMAR's deterministic forecast did not always allow predictions to be extracted
directly at the buoys' precise locations, I linearly interpolated the gridded Hs whenever possible
in order to obtain the most accurate prediction. However, at locations where the buoy was too
close to the shoreline to allow interpolation, the closest available value was chosen. Some error
on the Hs prediction is thus inevitable due to the influence of bathymetry. Nevertheless, as the
resolution of the ensemble forecasts is twice as coarse as that of the deterministic one,
inter-comparisons of performance should not be strongly impacted.
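The extraction rule described above (linear interpolation where possible, closest value otherwise) can be sketched as follows. This is only an illustration under stated assumptions: the function name `hs_at_buoy`, the use of `None` to mark land cells, the restriction of the fallback to the four surrounding grid nodes and the distance measured directly in degrees are all hypothetical choices, not the scripts used in the project.

```python
import bisect
import math

def hs_at_buoy(lons, lats, grid, lon, lat):
    """Bilinear interpolation of gridded Hs; nearest valid corner if a corner is land.

    lons, lats: ascending grid coordinates; grid[j][i] is Hs at (lats[j], lons[i]),
    or None where the model has no sea point.
    """
    i = min(max(bisect.bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    j = min(max(bisect.bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    corners = [(j, i), (j, i + 1), (j + 1, i), (j + 1, i + 1)]
    values = [grid[jj][ii] for jj, ii in corners]
    if all(v is not None for v in values):
        tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
        ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
        v00, v01, v10, v11 = values
        return (v00 * (1 - tx) * (1 - ty) + v01 * tx * (1 - ty)
                + v10 * (1 - tx) * ty + v11 * tx * ty)
    # Fallback: closest sea corner (distance in degrees, a simplification).
    valid = [(math.hypot(lons[ii] - lon, lats[jj] - lat), grid[jj][ii])
             for jj, ii in corners if grid[jj][ii] is not None]
    if not valid:
        return None
    return min(valid, key=lambda pair: pair[0])[1]

# Illustrative 0.5-degree grid with one land cell (values are made up).
lons = [0.0, 0.5, 1.0]
lats = [57.0, 57.5]
grid = [[2.0, 3.0, None],
        [4.0, 5.0, 6.0]]
interpolated = hs_at_buoy(lons, lats, grid, 0.25, 57.25)   # all four corners at sea
near_shore = hs_at_buoy(lons, lats, grid, 0.9, 57.4)       # one corner is land
```

In this made-up example the first point lies at the center of a cell whose four corners are at sea, so the result is their average; the second point touches a land cell and takes the value of the nearest sea node.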
Another issue is the difference in time steps between observations and both
ensemble and deterministic forecasts. Whereas observations are sampled hourly, ensemble forecasts
are sampled 6-hourly. Temporal interpolation therefore had to be performed in order to make those
time series easily comparable. Two scenarios were possible: interpolating on the smaller time
step (1 hour) or on the larger one (6 hours). A brief comparison of the root mean square errors
(RMSE, see Section 2-3) of the ensemble mean computed from the datasets obtained with both
methods showed that the differences between those RMSE values are low: typically about 10 mm or
less, except at buoys B63113 and B62127 (Table 2-2). It therefore seems reasonable to consider
that both interpolation methods are equivalent for the purpose of this study.

Buoy      RMSE of ensemble mean (m)              RMSE of ensemble mean (m)
          (Raw Prediction/Interpolated Obs)      (Interpolated Prediction/Raw Obs)
46778A    0.317                                  0.310
C6V27     0.509                                  0.520
46699A    0.761                                  0.758
46757B    0.460                                  0.460
B62164    0.499                                  0.509
B62127    0.328                                  0.278
B62145    0.276                                  0.281
B63113    0.537                                  0.510
Table 2-2: RMSE of ensemble mean for both temporal interpolation methods considered
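The gap between the two interpolation methods can be checked directly from the figures of Table 2-2. The short script below is an illustrative sketch that assumes the tabulated values are in meters; the variable names are hypothetical.

```python
# RMSE of the ensemble mean from Table 2-2, assumed to be in meters:
# (raw prediction / interpolated obs, interpolated prediction / raw obs)
rmse = {
    "46778A": (0.317, 0.310),
    "C6V27": (0.509, 0.520),
    "46699A": (0.761, 0.758),
    "46757B": (0.460, 0.460),
    "B62164": (0.499, 0.509),
    "B62127": (0.328, 0.278),
    "B62145": (0.276, 0.281),
    "B63113": (0.537, 0.510),
}

# Absolute difference between both methods, rounded to the nearest millimeter.
diff_mm = {buoy: round(abs(a - b) * 1000) for buoy, (a, b) in rmse.items()}

# Buoys where the two methods differ by more than 10 mm.
above_10mm = sorted(buoy for buoy, d in diff_mm.items() if d > 10)
```

With these values, the difference exceeds 10 mm at B62127 (50 mm) and B63113 (27 mm), and only marginally at C6V27 (11 mm); the five other buoys stay at or below 10 mm.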
In order to make sure that both observations and predictions show phenomena of the
same frequency range, observations were interpolated on the 6-hour period of the predictions.
Indeed, the other scenario would have let observations show variations of significant height that
predictions would not have been able to reproduce. However, improving the temporal resolution of
ensemble forecasts also appears as a possible way to improve forecast performance, by allowing
events in a larger frequency range to be reproduced.
Before any analysis of the data could be undertaken, I had to extract all observation and
prediction time series from their various files and turn them, after compilation, into an easily
readable format (here .mat files) using UNIX and Matlab scripts. The typical processing chain for
one data file is illustrated below:
Figure 2-2: Sketch of data processing chain
Stages shown in the sketch:
- Unarchiving
- Extraction of data at relevant locations
- Conversion netCDF4 to netCDF3
- Concatenation with previous dates
- Data storage in .mat format
(one stage of the sketch is labelled "ACTIMAR's archives only")

Time series of significant height and wind speed were stored from the 20th of October 2012
to the 20th of April 2013 for the GEOWaFS and ACTIMAR forecasts. Significant height time series of
the FNMOC forecast were stored from the 1st of January 2013 to the 20th of April 2013. Finally,
time series of significant height recorded by the buoys were also stored at each location from the
20th of October 2012 to the 20th of April 2013.
2-2 Definition of WW3 study areas
The project focuses on 4 different areas, which had to be defined spatially before starting
simulations in order to establish the boundary forcing and prepare the bathymetry.
For the purpose of this master thesis, I focused first on the Indonesian area and on the North Sea
in order to benefit from a sufficient number of buoy observations, making the validation of
state-of-the-art forecasts easier.
The North Sea area extends from 5°W to 10°E in longitude and from 50°N to 65°N in
latitude, while the Indonesian one is much larger and extends from 90°E to 140°E in longitude and
from 5°S to 30°N in latitude. The Indonesian area is characterized by the presence of thousands of
islands with a drastic influence on sea states. The spatial resolution of the grid used in these
areas is that of the NCEP/FNMOC ensembles, that is to say 1° x 1°.
The second part of the project, on high resolution forecasts, focuses on the Tierra del Fuego area
in Argentina, which extends from 70°W to 62°W in longitude and from 58°S to 48°S in latitude. In
this area, the grid has a spatial resolution of 0.1° x 0.1° and is forced by a 1° x 1° global run.
Harsh climatic conditions can be observed, with recurrent storms and severe weather.
Concerning the input parameters of the model, such as the parameterization of bottom friction, surf
breaking, dissipation and nonlinear interactions or the choice of advection schemes, the usual set
of parameters used at ACTIMAR for operational forecasts was adopted, since studying the influence
of these parameters was beyond the scope of this thesis.
Tuning them nevertheless represents a possible way to improve results, especially when dealing
with small areas at very high resolution, where the influence of these parameters can be higher
than usual.

2-3 Overview of mathematical and visualization tools
Many methods exist to compare probabilistic forecasts (Bidlot et al., 2002 [2]),
focusing on different parameters and characteristics of the forecast. In addition to the usual
direct comparison of Hs and RMS errors, scatter plots of predictions against observations were
used, as well as Pearson product-moment correlation coefficients and persistence analysis.
All these performance indicators were selected to estimate the quality of existing forecasts for the
purpose of offshore operation planning and were computed using Matlab scripts written during my
master thesis.
Visual comparisons of the significant heights predicted by the forecasts were made using a
box-and-whiskers representation showing the median of the ensemble (the horizontal line), the 25th
and 75th percentiles (the vertical box), and the minimum and maximum values (the vertical lines,
called whiskers).
The dispersion of Hs among the members of the ensemble at a given time is then clearly represented.
Figure 2-3: Box-and-whiskers representation of the ensemble members of the forecasts
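The five values drawn for each date can be computed as in the sketch below. The function names are hypothetical, and the linear-interpolation percentile convention is an assumption: plotting tools (including Matlab) may use a slightly different convention, which changes the quartiles marginally for small ensembles.

```python
def percentile(sorted_values, p):
    """Percentile p (0 to 100) of an ascending list, by linear interpolation."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def box_stats(members):
    """Box-and-whiskers summary of the ensemble members' Hs at one date."""
    s = sorted(members)
    return {
        "min": s[0],                  # lower whisker
        "q25": percentile(s, 25),     # bottom of the box
        "median": percentile(s, 50),  # horizontal line
        "q75": percentile(s, 75),     # top of the box
        "max": s[-1],                 # upper whisker
    }

# Illustrative Hs values (m) for a 5-member ensemble at one date.
stats = box_stats([2.0, 2.2, 2.1, 2.4, 2.3])
```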
Root mean square errors were computed over the members of the forecasts at each given time,
in order to estimate the variations in the prediction of events within the ensemble, using the
usual formulation, where y represents the observation and ŷ stands for the predicted values at a
given time t, n being the size of the ensemble.
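With the symbols defined above, the usual root mean square error over the ensemble members at time t can be written as follows. The member index i is a notation introduced here for illustration; the expression is the standard definition and is assumed to match the one used in the project.

```latex
\mathrm{RMSE}(t) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i(t) - y(t) \right)^2}
```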

RMS error was also computed on the entire time-series of characteristic quantities of the
ensemble - including median, maximum and minimum values, thus giving an indication of the
overall deviation of predictions from observations.
Another relevant performance indicator is the Pearson product-moment correlation
coefficient r, which gives a good estimate of the linear dependence between predictions and
observations. It is defined for a sample of paired variables X and Y (here observation and
prediction data), with X̄ and Ȳ their respective means and n the size of the sample.
Thanks to this coefficient, a “best member” of the ensemble forecast was defined using a
posteriori comparisons with observations: the member of the forecast with the highest
correlation coefficient is considered the best member of the ensemble.
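The selection of the best member can be sketched as below. The function names and the sample values are hypothetical; the sketch assumes that all time series are aligned on the same dates and contain no gaps.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_member(observations, members):
    """Index and correlation of the member most correlated with observations."""
    scores = [pearson_r(observations, member) for member in members]
    best = max(range(len(members)), key=scores.__getitem__)
    return best, scores[best]

# Illustrative values: the second member follows the observations with a constant offset.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
members = [
    [2.0, 2.0, 3.0, 3.0, 2.0],
    [1.5, 2.5, 3.5, 4.5, 5.5],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
index, r = best_member(obs, members)
```

Because r measures linear dependence only, a member with a constant bias (such as the second one here) can still be selected as the best member.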
In addition to those performance indicators, scatter plots were drawn. Values taken by the
observation dataset are set on the horizontal axis and predictions from the model on the vertical
axis. A linear regression was performed each time in order to estimate the deviation from
observations. This regression uses an iteratively reweighted least squares algorithm, implemented
in Matlab toolboxes, which gives less weight to outliers.
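The principle of such a regression can be illustrated with the sketch below. The source only states that an iteratively reweighted least squares routine from the Matlab toolboxes was used; the bisquare weight function, the tuning constant 4.685, the scale estimate based on the median absolute residual and the function names are assumptions chosen for illustration, and degenerate inputs (for example all weights falling to zero) are not handled.

```python
import statistics

def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def robust_line_fit(x, y, tuning=4.685, iterations=50):
    """Iteratively reweighted least squares with bisquare weights (assumed choice)."""
    slope, intercept = weighted_line_fit(x, y, [1.0] * len(x))  # ordinary fit first
    for _ in range(iterations):
        residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        scale = statistics.median(abs(r) for r in residuals) / 0.6745
        if scale < 1e-12:  # residuals essentially zero: nothing left to reweight
            break
        # Bisquare weights: close to 1 for small residuals, 0 beyond tuning * scale.
        w = [max(0.0, 1.0 - (r / (tuning * scale)) ** 2) ** 2 for r in residuals]
        slope, intercept = weighted_line_fit(x, y, w)
    return slope, intercept

# Illustrative data: points on y = 2x + 1 with one outlier in the middle.
x = [float(k) for k in range(11)]
y = [2.0 * xi + 1.0 for xi in x]
y[5] += 10.0
slope, intercept = robust_line_fit(x, y)
```

On this example the outlier receives a zero weight after the first reweighting, so the fitted line returns to the underlying relation, whereas an ordinary least squares fit would keep a shifted intercept.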
Q-Q plots were also used; these diagrams consist in plotting the quantiles of two variables
against each other, so that probability distributions can be compared easily. The same algorithm
as for scatter plots was used to produce the regression lines of datasets in Q-Q plots.
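A minimal sketch of how the points of a Q-Q plot can be obtained is given below. The function names, the number of quantiles and the linear-interpolation quantile convention are assumptions made for illustration.

```python
def quantile(sorted_values, q):
    """Quantile q (0 to 1) of an ascending list, by linear interpolation."""
    k = (len(sorted_values) - 1) * q
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def qq_points(observations, predictions, n_points=11):
    """Pairs (observed quantile, predicted quantile) at evenly spaced probabilities."""
    obs, pred = sorted(observations), sorted(predictions)
    probs = [k / (n_points - 1) for k in range(n_points)]
    return [(quantile(obs, q), quantile(pred, q)) for q in probs]

# Illustrative data: predictions distributed like the observations scaled by 0.8,
# in a different time order (the order does not matter for a Q-Q plot).
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [4.0, 0.8, 2.4, 1.6, 3.2]
points = qq_points(obs, pred, n_points=5)
```

Points lying below the 1:1 line, as in this example, indicate a predicted distribution shifted towards lower values than the observed one.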
In addition to those diagrams, a common analysis performed on climate-related variables is
the persistence analysis, also known as detection of climatic windows. It consists in studying the
ability of the forecast to successfully predict events during which climatic conditions (here
significant wave height) stay below a given threshold for a given period of time. This analysis,
which is essential for offshore operations, was carried out with a simple algorithm I implemented.
It browses the time series of Hs and saves the dates consistent with a climatic-window
pattern: a beginning date is a date at which Hs is below the threshold, as it is at the directly
following date, while at the preceding date Hs is above the threshold. An ending date is the last
of two successive dates at which Hs is still below the threshold, while at the following date the
significant height rises above it.

Figure 2-4: Sketch of the pattern used for persistence analysis
All climatic windows of any duration are recorded thanks to this pattern; a simple subtraction of
the serial date numbers of the corresponding ending and beginning dates then makes it possible to
separate windows of various durations and to count them.
Borderline cases where no ending date could be found before the end of the time series were
solved by imposing the end of the dataset as an arbitrary ending date. The minimum size of a
detected window is twice the temporal resolution of the data (a 12 h window for the 6 h outputs).
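The detection pattern can be sketched as follows. This is an illustration, not the Matlab implementation used in the project: the function name is hypothetical, a window already open at the start of the series is accepted in the same way as the end-of-series borderline case, and the duration is counted as the number of below-threshold samples times the time step so that the shortest window is 12 h for 6-hourly data. Gaps in the data are not handled.

```python
def climatic_windows(hs, threshold, step_hours=6):
    """Return (begin_index, end_index, duration_hours) for each climatic window.

    A window is a run of at least two successive samples with Hs below the threshold.
    """
    windows = []
    start = None
    for i, value in enumerate(hs):
        below = value < threshold
        if below and start is None:
            start = i                      # candidate beginning date
        elif not below and start is not None:
            if i - start >= 2:             # at least two successive dates below
                windows.append((start, i - 1, (i - start) * step_hours))
            start = None
    # Borderline case: the end of the dataset is imposed as the ending date.
    if start is not None and len(hs) - start >= 2:
        windows.append((start, len(hs) - 1, (len(hs) - start) * step_hours))
    return windows

# Illustrative 6-hourly Hs series (m) and a 2 m threshold.
series = [3.0, 1.5, 1.2, 1.8, 2.5, 1.0, 2.6, 1.1, 1.3]
windows = climatic_windows(series, threshold=2.0)
```

In this example the isolated sample at index 5 is below the threshold for a single date only and is therefore not recorded as a window.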
Along with this detection, another performance indicator of persistence analysis was computed:
the equivalent percentage uptime. It represents, as a percentage of the total number of hours of
each month, the amount of time during which the environmental parameter (here significant height)
stays below a given threshold.
To perform this computation, each time step at which Hs is below the threshold is
associated with a “Flag 1”, while time steps which do not satisfy the condition are given a
“Flag 0”. The percentage of “Flag 1” gives the equivalent percentage uptime as defined above.
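The flag-based computation can be sketched per month as below. The function name and the sample values are hypothetical, and the sketch assumes regularly sampled data without gaps, so that the share of flagged time steps equals the share of hours.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def monthly_uptime(times, hs, threshold):
    """Equivalent percentage uptime per (year, month): share of steps with Hs below threshold."""
    counts = defaultdict(lambda: [0, 0])   # (year, month) -> [number of flag 1, total steps]
    for t, value in zip(times, hs):
        flag = 1 if value < threshold else 0
        key = (t.year, t.month)
        counts[key][0] += flag
        counts[key][1] += 1
    return {key: 100.0 * ones / total for key, (ones, total) in counts.items()}

# Illustrative 6-hourly series spanning the end of January and the start of February 2013.
times = [datetime(2013, 1, 31) + timedelta(hours=6 * k) for k in range(8)]
hs = [1.0, 1.0, 3.0, 3.0, 1.0, 3.0, 3.0, 3.0]
uptime = monthly_uptime(times, hs, threshold=2.0)
```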
Sketch for Figure 2-4 (T: threshold):
Date      Date0      Date1             DateX          DateY…
Hs        Hs > T     Hs < T            Hs < T         Hs > T…
Role                 Beginning date    Ending date

3 Validation of state-of-the-art wave ensemble forecast
The purpose of this section is to compare the state-of-the-art NCEP and FNMOC
ensemble forecasts with buoy data and with a higher resolution deterministic forecast. Both areas
of interest, that is to say the North Sea and Taiwan regions, are treated separately using various
performance indicators.
To avoid redundancy and limit length, only some of the relevant figures are shown in this paper.
3-1 In the North Sea
The area is characterized by high wave events with Hs reaching values higher than 7 m.
Those events represent a particularly sensitive matter for this study and are therefore given top
priority in the project.
Figure 3-1 illustrates the maximum, minimum and median values predicted by the ensemble forecast
over a period covering several of these events, along with the observations and the predictions of
ACTIMAR's deterministic forecast at buoy B62164.
Figure 3-1: Observations and deterministic forecast along with maximum, minimum and median values
predicted by NCEP ensemble forecast from the 10th of January 2013 to the 1st of April 2013 at buoy
B62164

High wave events with Hs exceeding 6 m are detected on 18/01/2013, 05/02/2013, 14/02/2013
and 19/02/2013, along with lower Hs peaks. Except for the first one, at all these events both
deterministic and ensemble predictions strongly underestimated the significant height, by 1.5 to
3 m. This tendency is confirmed at smaller Hs peaks, at which predictions often prove to be
below observations. Furthermore, deterministic predicted values appear to be 0.5 to 1 m lower
than those predicted by the ensemble forecast, and even lower than the minimum value of the
ensemble.
Another remarkable fact is the low spread of the ensemble over most of the visualized period:
there, the difference between the minimum and maximum values of the ensemble does not exceed
0.30 m, and the two curves regularly appear superimposed (variations lower than 10 cm within the
ensemble).
During lower Hs events, predictions are generally closer to observations, and the differences
between minimum and maximum values of the forecast keep approximately the same order of
magnitude, with significant variations from date to date. For instance, these differences amount
to more than 0.5 m on 15/01/2013 but lead to almost superimposed curves on 17/01/2013.
Figure 3-2 presents a box-and-whiskers representation of the ensemble predictions focusing on a
smaller time period still at buoy B62164 – refer to Section 2-3 for further information about this
representation.
Figure 3-2: Observations and deterministic forecast along with box-and-whiskers representation of NCEP ensemble forecast from the 1st of March 2012 to the 22nd of March 2012 at buoy B62164.

Over this period, two relatively high wave events were recorded, on 08/03/2013 and on 19/03/2013. In both cases, deterministic and ensemble predictions were below observations. Interestingly, predictions seem to be closer to observations, with smaller variability, while Hs decreases than while it increases. Except on two dates, boxes and whiskers are short, meaning that the values in the ensemble show very little spread. On dates where high Hs values were recorded, this indicates a lack of variability: these are the periods when the sea state is the most unpredictable, so variability should be at its highest.
As previously observed, the deterministic forecast predicted significant heights even lower than the ensemble forecast, typically 0.5 meters lower.
A similar tendency to underestimate significant height during high wave events is observed at buoy B62145. The highest Hs values recorded by the buoy amount to approximately 5 meters, while predictions often stay 1 meter below that level. Many lower peaks were however well predicted.
It also appears that variability within the ensemble of the forecast is higher, as minimum and
maximum predicted values are generally further from one another as shown on Figure 3-3.
Figure 3-3: [...] ensemble forecast from the 22nd of November 2012 to the 2nd of December 2012 at buoy B62145.
Both the boxes and the whiskers are longer, especially during higher wave events. Higher ensemble variability is noticed on 25/11/2012 and 28/11/2012, which is consistent with the Hs peaks of 5.2 and 3 meters recorded by the buoy.
Other buoys show intermediate behaviors in terms of variability. The underestimating tendency is nevertheless recurrent at all locations. Scatter plots (Figure 3-4) make it possible to quantify this tendency by plotting predicted Hs values against observations. The regression line computed on the dataset gives an indication of the deviation from measurements.
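The indicators read from such scatter plots (RMSE, correlation coefficient, regression line) can be sketched as follows. This is a minimal illustration in Python rather than the Matlab code used in the study; the function name `scatter_statistics` and the Hs values are hypothetical, and the regression is assumed to be an ordinary least-squares fit of predictions on observations.

```python
import math

def scatter_statistics(observed, predicted):
    """Return RMSE, Pearson correlation and the least-squares line
    predicted = intercept + slope * observed (assumed convention)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    slope = s_op / s_oo
    intercept = mean_p - slope * mean_o
    correlation = s_op / math.sqrt(s_oo * s_pp)
    return {"rmse": rmse, "correlation": correlation,
            "slope": slope, "intercept": intercept}

# Hypothetical Hs values in meters: high waves are underestimated
obs = [1.0, 2.0, 3.0, 4.0, 6.0]
pred = [1.1, 2.0, 2.8, 3.5, 4.9]
stats = scatter_statistics(obs, pred)
```

With data of this kind, a slope below 1 reflects a regression line falling below the identity line at high Hs values.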
The first characteristic of the scatter plots worth mentioning is the position of the regression line, which falls below the identity line at high Hs values. Hence, observations are statistically higher than predictions during high wave events. Furthermore, although the best member and the median value are similar, the median value of the forecast is closer to observations than the other indicators, as its correlation coefficient is higher. Its regression line is also closer to the identity line.
The maximum value sometimes gives good results for 0-24h predictions, but always falls far from observations at longer lead times.
For this reason, the median value of the ensemble will be considered as the most effective way to characterize the forecast for the purpose of this study. The table in Appendix B gathers the RMSE and correlation coefficient values at all buoys and lead times, and illustrates well the effectiveness of the median value compared to the other indicators.
At all buoys, the regression lines of median value’s dataset are slightly below the identity line with
correlation coefficients still higher than for other indicators.
The underestimating tendency is confirmed even if its impact does not seem to be as strong on
scatter plots as it seems on time series.
Figure 3-4: Scatter plots of best member, median value and maximum value of ensemble forecast (0-24h) against
observations at buoy B62145.

Using the median value as the characteristic quantity of the ensemble performance, comparisons with deterministic predictions were performed. Scatter plots of both median and deterministic predictions were drawn on Figures 3-5 & 3-6.
The median value is systematically better at high Hs at all lead times. Even if deterministic forecasts seem to show lower spread than the median value, they systematically underestimate high Hs values, and median values appear to be more appropriate for medium to long range predictions.
Figure 3-5: Scatter plots of median value of ensemble and deterministic forecast (0-24h & 48-72h) against
observations at buoy B62127 and B62164.

Results of the persistence analysis performed at two buoys in the North Sea are plotted
below on Figure 3-6. The number of detected windows represents the total number of 12h, 24h,
36h, 48h and 72h windows.
The probability function of the number of windows was computed using a kernel smoothing density
estimator implemented in Matlab toolboxes from values given by all members of the forecast.
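As an illustration of the kernel smoothing step, a minimal Gaussian kernel density estimator is sketched below in Python. It is not the Matlab toolbox routine used in the study: the function name `gaussian_kde`, the normal-reference bandwidth rule and the window counts are assumptions made for the example.

```python
import math

def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the probability density of `samples`
    with Gaussian kernels. If no bandwidth is given, a normal-reference
    rule of thumb (1.06 * std * n^(-1/5)) is used (assumption)."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    if bandwidth is None:
        mean = sum(samples) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        bandwidth = 1.06 * std * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of one Gaussian bump per sample, scaled to integrate to 1
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density

# Hypothetical number of detected windows given by 8 ensemble members
window_counts = [410, 425, 398, 432, 441, 415, 420, 407]
density = gaussian_kde(window_counts)
```

The resulting `density` function can be evaluated on a regular grid to draw the smoothed distribution next to the observed and deterministic values.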
Results vary from buoy to buoy, both in terms of number of windows and of quality of the prediction relative to observations. Ensemble predictions are however in good agreement with observations, except at buoy B62127, where the predicted total number was approximately 100 windows lower than the observed value.
The result of the deterministic forecast in terms of number of windows is also given along with observation and ensemble values. At all buoys in the North Sea, the deterministic forecast shows a number of windows consistently much higher than the observations, and is further from them than the ensemble forecasts.
Tables with results for all buoys are given in Appendix B. Equivalent percentage uptime – as
defined in Section 2 - for given threshold and window duration are given for observations,
deterministic forecast as well as minimum, median and maximum value of ensemble forecast.
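The counting of windows can be sketched as follows. The exact definition used in the study is given in Section 2 and is not reproduced here, so this Python example relies on assumptions: a window is counted each time Hs stays below the threshold for the full duration, windows do not overlap, and each sample represents one time step. The function name `count_windows` and the series are hypothetical.

```python
def count_windows(hs_series, threshold, window_hours, step_hours=1.0):
    """Count non-overlapping windows of `window_hours` during which every
    Hs sample stays below `threshold` (assumed definition)."""
    steps_needed = int(round(window_hours / step_hours))
    if steps_needed < 1:
        raise ValueError("window shorter than one time step")
    count = 0
    run = 0  # number of consecutive samples below the threshold
    for hs in hs_series:
        if hs < threshold:
            run += 1
            if run == steps_needed:
                count += 1
                run = 0  # start looking for the next window
        else:
            run = 0
    return count

# Hypothetical hourly series: 30 h of calm, 5 h of rough sea, 13 h of calm
hs = [1.0] * 30 + [2.0] * 5 + [1.2] * 13
durations = [12, 24, 36, 48, 72]
total = sum(count_windows(hs, 1.5, d) for d in durations)
```

Applying the same function to each ensemble member gives the sample of window counts on which a probability density can then be estimated.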

Figure 3-6: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoys
B62145 and B62127.

3-2 In Taiwan
In the area near Taiwan, lower significant heights are observed than in the North Sea although Hs
peaks happen more frequently.
Figures 3-7 & 3-8 illustrate the maximum, minimum and median predicted values of the NCEP ensemble forecast over a period covering several of these events, along with observations and ACTIMAR deterministic forecast predictions, at buoys C6V27 and 46699A.
Figure 3-7: [...] of October 2012 to the 29th of November 2012 at buoy C6V27
Figure 3-8: [...] of November 2012 to the 9th of January 2013 at buoy 46699A
According to these figures, the behavior of predictions varies a lot from one buoy to another. A tendency to overestimate significant wave height may be noticed, but it is not clear on all time series and will therefore have to be confirmed. Moreover, unlike in the North Sea, this tendency does not appear to be limited to high wave events but covers the whole dataset, and may thus originate from a systematic bias either in observations or in predictions.
Concerning the variability of the area in terms of Hs, peaks of higher waves can be observed every 3-5 days, ranging from 1 meter above the average to more than 3 meters. The average significant wave height is relatively high, approximately 2 meters, and Hs seems to hardly ever fall below a 1 m threshold.
Figure 3-9 shows the observations, the deterministic prediction and the ensemble forecast as a box-and-whiskers representation at buoy 46757B.
Figure 3-9: [...] ensemble forecast from the 22nd of November 2012 to the 2nd of December 2012 at buoy 46757B.
Just as in the North Sea, variability within the ensemble is low and only increases during short periods. Unlike previously, Hs peaks occasionally suffer from time shifts in addition to the errors in the predicted height.
Variability also seems to be higher during increasing Hs events, which is consistent with the tendency observed in the North Sea.
Predictions alternately fall above and below observations, and no global overestimating tendency can be confirmed. The deterministic forecast stays very close to ensemble predictions, except on dates when Hs peaks are observed, where the deterministic prediction is systematically higher.
No evidence can be given to determine which forecast is closer to observations on time series.
The median value of the ensemble was taken as a characteristic quantity of the forecast as it gives
statistically better results than others.

Figure 3-10: Scatter plots of median value of ensemble and deterministic forecast (0-24h & 48-72h) against
observations at buoy 46778A and C6V27.
According to the scatter plots on Figure 3-10, the ensemble forecast in Taiwan seems to give better results at both high and low Hs, with a higher correlation coefficient than the deterministic forecast.
Predictions tend to overestimate Hs values below a given threshold and to underestimate values above it. The threshold value varies with buoys and lead times from 0.5 m to more than 3 meters.

Figure 3-11a: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoy
C6V27.
The persistence analysis performed on buoys in Taiwan follows the same methods as the one used
for the North Sea.
The total number of windows varies with the position of the buoy as they are not exposed to the
same sea states and ocean depths.
At all locations, results are slightly further from observations than they were in the North Sea, but still in relatively good agreement. In terms of persistence analysis, the deterministic forecast regularly gives results similar to ensemble predictions in Taiwan.
Tables with results for all buoys are given in Appendix A. Equivalent percentage uptime – as
defined in Section 2 - for given threshold and window duration are given for observations,
deterministic forecast as well as minimum, median and maximum value of ensemble forecast.

Figure 3-11b: Total number of detected windows of 12h, 24h, 36h, 48h and 72h below the 1.5m threshold at buoy
46757B.
3-3 Wind and wave variability
In both areas, ensemble predictions are characterized by a weak variability during high Hs events, which does not prevent them from predicting these events well compared to the deterministic prediction. Persistence analysis demonstrates that the variability of the areas is well predicted.
In Taiwan, high Hs values are often better predicted than lower ones, while the opposite applies in the North Sea. Considering all indicators, ensemble predictions are promising in an operational context, as they regularly give better results than the higher resolution deterministic forecast. The lack of ensemble variability and the tendency to underestimate high significant heights however represent a major obstacle to be overcome.
Considering both areas, the origin of the low ensemble spread was studied. The main factor which
is likely to underlie this trend is the existence of a similar tendency in the wind fields’ ensemble.
Time series of wave and wind ensembles were plotted simultaneously in a box-and-whiskers
representation to qualitatively estimate a possible link.

Figure 3-12: Wind magnitude, ensemble median along with observations and deterministic forecast at buoy
C6V27 from 12/12/2012 to 24/12/2012.
The lack of variability within the ensembles can be noticed on both wave and wind fields during high wave events. Even if a correlation seems to exist, it must be moderate, as an increase in the spread of the wind ensemble does not systematically lead to an equivalent increase in the wave ensemble, such as on the 16th of December 2012 at buoy C6V27 or on the 23rd of October 2012 (Figures 3-12 & 3-13).
Apart from such occasional events, which occur regularly in the time series, wind and wave ensemble spreads seem to be relatively well correlated in the North Sea as well as in Taiwan.

Figure 3-13: Wind magnitude, ensemble median along with observations and deterministic forecast at buoy
B62127 from 20/10/2012 to 30/10/2012.
To quantify the correlation between both ensembles, scatter plots and QQ-plots of the wind and
wave normalized amplitudes were drawn as illustrated on Figure 3-14.
Normalization of amplitudes used the standard score formulation as given below:
$$Z = \frac{X - \bar{X}}{\sigma}$$
where Z is the normalized value, X represents the raw value of the population, $\bar{X}$ is the mean of the population and σ its standard deviation.
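The standard score above can be illustrated with a short Python sketch. The function name `standard_scores` and the sample values are hypothetical, and the population form of the standard deviation (division by the number of values) is assumed.

```python
import math

def standard_scores(values):
    """Normalize values as Z = (X - mean) / sigma, using the population
    standard deviation (assumption)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0:
        raise ValueError("standard deviation is zero, scores are undefined")
    return [(v - mean) / sigma for v in values]

# Hypothetical ensemble amplitudes: mean 5, standard deviation 2
amplitudes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standard_scores(amplitudes)
```

Applying the same normalization to wind and wave amplitudes puts both on a dimensionless scale, so that they can be compared on the same scatter plot or QQ-plot.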

Figure 3-14: Scatter plot and QQ-plot of normalized wave amplitude on normalized wind amplitude at buoy
B62145.
These scatter plots and quantile-quantile diagrams were similar at all buoys, despite small variations in the computed regression coefficients.
In all cases the regression line of the scatter falls below the identity line, with a slope varying from 0.3 to 0.6, indicating that the spread of the wave ensemble is lower than that of the wind ensemble. The distributions of amplitudes computed from both ensembles are however well correlated, as values in the QQ-plot gather around the identity line except for the highest values.
The variability of wind ensembles may then represent a major factor underlying the weak variability observed in significant wave height ensembles. However, it does not fully explain this low spread, as the correlation exists but is moderate.
Ensemble forecasts are promising in an operational context, but their resolution does not seem sufficient to best predict high wave events or structures with small temporal and spatial extent. Wind forcing resolution also appears to be critical when considering wave forecasts.
It also appeared that the median value of the forecast is effective for characterizing its performance for medium to long range predictions (48h, 72h and 96h).
Ensemble forecast performance has thus proved insufficient for extremely precise operational forecasting, as the forecasts underestimate high wave events and show less variability within the ensemble than expected, which may lead to the loss of one of the greatest strengths of ensemble forecasting.
Thus, ways to improve these forecasts without increasing computational cost have to be found, either by acting directly on these forecasts or by running higher resolution ones.

4 Improvement of Wave Ensemble Forecasts
4-1 Improvement of State-of-the-art forecasts
Ensemble forecasting requires exceptional computational resources, as it can involve up to several dozen simulations. Indeed, each member of a wave ensemble forecast originates from an independent perturbed wind forcing field and an initial sea state. These wind fields often come from the outputs of an atmospheric ensemble forecast run in parallel in the same forecast center.
Hence, each operational ensemble forecast normally needs twice as many simulations as it has members to be effective. However, some solutions exist to reduce this number of simulations, either by directly improving existing forecasts or by generating wind forcing fields faster.
Linear Shift
As predictions at all locations considered tend to underestimate significant height during high wave events, a common pattern was sought in the distribution of all Hs time series in order to find a simple linear transformation applicable to all datasets.
QQ-plots of predicted Hs values against observations were drawn at all buoys for 0-24h, 24-48h, 48-72h and 72-96h lead times, and a linear regression was performed on each dataset.
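The construction of such a QQ-plot and of its regression line can be sketched in Python as follows. The functions `quantiles` and `qq_regression`, the choice of 99 quantile levels and the convention of regressing predicted quantiles (Y) on observed quantiles (X) are assumptions made for the illustration, as are the data.

```python
def quantiles(values, probabilities):
    """Empirical quantiles by linear interpolation between sorted samples."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in probabilities:
        position = p * (n - 1)
        lower = int(position)
        upper = min(lower + 1, n - 1)
        fraction = position - lower
        result.append(ordered[lower] + fraction * (ordered[upper] - ordered[lower]))
    return result

def qq_regression(observed, predicted, n_quantiles=99):
    """Least-squares line Y = intercept + slope * X through the QQ-plot,
    with X the observed quantiles and Y the predicted ones (assumption)."""
    probs = [(i + 1) / (n_quantiles + 1) for i in range(n_quantiles)]
    q_obs = quantiles(observed, probs)
    q_pred = quantiles(predicted, probs)
    mean_x = sum(q_obs) / n_quantiles
    mean_y = sum(q_pred) / n_quantiles
    s_xx = sum((x - mean_x) ** 2 for x in q_obs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(q_obs, q_pred))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: predictions compress the observed distribution
observed = [0.5 + 0.05 * i for i in range(100)]
predicted = [0.1 + 0.9 * o for o in observed]
intercept, slope = qq_regression(observed, predicted)
```

A slope below 1 combined with a small positive intercept corresponds to a forecast that overestimates low values and underestimates high ones.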
Figure 4-1: QQ-plot of wave ensemble quantiles on observation data quantiles for 0-24h, 24-48h, 48-72h, 72-96h
predictions at buoy 46778A.

The shape of the quantile-quantile diagram at buoy 46778A suggests that the predicted distribution is non-linearly related to the distribution of observations, and characterized by a positive skew according to the observed concavity.
Linear regression was nevertheless performed on the dataset in order to fit the distribution of observations as closely as possible with the simplest transformation.
Intercepts of the regression lines at buoy 46778A vary from -0.0051 to 0.11 and slopes from 0.72 to 0.94. Thus, no single simple transformation can be applied to predictions at all lead times.
The same exercise was done at other buoys, including B62164. The concavity observed at buoy B62164 is opposite to the one at 46778A; the correlation between predicted and observed distributions is however higher, as the point cloud gathers closer to the identity line.
Slopes of regression lines vary from 0.89 to 0.94 and intercepts from 0.051 to 0.23. All values for
regression lines of all lead times and buoys are given in Table 4-1.
Figure 4-2: QQ-plot of wave ensemble quantiles on observation data quantiles for 0-24h, 24-48h, 48-72h, 72-96h predictions at buoy B62164.
As the coefficients needed to perform a transformation shifting all datasets closer to observations vary a lot with location and lead time of the forecast, it seems impossible to easily improve the performance of state-of-the-art forecasts with a simple linear transformation.
Buoy    1 day lead time      2 days lead time     3 days lead time     4 days lead time
B62145  Y=0.057 + X*0.940    Y=0.003 + X*1.000    Y=-0.007 + X*1.000   Y=-0.021 + X*1.100
B62164  Y=0.080 + X*0.890    Y=0.051 + X*0.920    Y=0.058 + X*0.940    Y=0.230 + X*0.900
B62127  Y=0.062 + X*1.000    Y=0.025 + X*1.100    Y=0.015 + X*1.100    Y=-0.067 + X*1.200
B63113  Y=0.240 + X*1.000    Y=0.260 + X*1.000    Y=0.058 + X*1.100    Y=-0.096 + X*1.200
46778A  Y=-0.005 + X*0.940   Y=0.056 + X*0.810    Y=0.081 + X*0.760    Y=0.110 + X*0.720
46699A  Y=0.140 + X*1.400    Y=0.240 + X*1.200    Y=0.280 + X*1.200    Y=0.280 + X*1.200
46757B  Y=-0.110 + X*1.000   Y=-0.054 + X*0.980   Y=-0.053 + X*0.940   Y=-0.048 + X*0.960
C6V27   Y=0.290 + X*0.950    Y=0.290 + X*0.940    Y=0.260 + X*0.950    Y=0.270 + X*0.970
Table 4-1: Slopes and intercepts of regression lines from QQ-plots for J0 to J3 at all buoys
This method may be applied to local conditions when a lot of time is available to perform a climatological analysis. This is however generally not the case in an operational context. Therefore, the solution cannot be retained.
Empirical Orthogonal functions
An alternative solution for creating wind ensembles with far less computational time, using Empirical Orthogonal Functions (EOF), was also considered. Unfortunately, there was not enough time during my master thesis to fully investigate this method and assess its performance. Only theoretical aspects were studied.
EOFs represent orthogonal basis functions of a signal or a dataset accounting each for as much
variance as possible. They are typically obtained by computing eigenvectors of the covariance
matrix of the dataset.
Thanks to this method, perturbed wind fields are generated from one single unperturbed wind field.
The spatial structure of perturbations is given by EOFs, thus maintaining most spatial properties of
the undisturbed field throughout the process. The method is often known as geographically
weighted principal component analysis (PCA) in geophysics.
The principle of the method developed in this study is to run an atmospheric model in the chosen area over a long period, typically one or several years, once and for all. This initial run would give the overall structure of errors in the area, taking into account every situation encountered within the period considered.
An ensemble representation of the error covariance is thus provided and analyzed in order to extract
eigenvectors and eigenvalues of error covariance matrix. Each eigenvector represents a direction of
the spatial structure of errors and the higher the associated eigenvalue is, the more variance the
direction will account for. By simply introducing random errors following the directions given by
the EOFs in a single wind field, it is then possible to obtain up to several dozens of wind ensemble
members without any additional computational cost.
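Since this method was only studied theoretically, the following Python sketch is purely illustrative. It extracts the leading EOF of a small set of error fields by power iteration and perturbs a single wind field along that direction. The function names, the restriction to the leading EOF, the Gaussian random amplitudes scaled by the square root of the eigenvalue and the data are all assumptions, not the procedure of the study.

```python
import math
import random

def remove_mean(fields):
    """Subtract the mean at each grid point (rows = samples, columns = points)."""
    n = len(fields)
    means = [sum(column) / n for column in zip(*fields)]
    return [[value - mean for value, mean in zip(row, means)] for row in fields]

def leading_eof(anomalies, iterations=500):
    """Leading eigenvector and eigenvalue of the sample covariance matrix,
    obtained by power iteration."""
    n_samples = len(anomalies)
    n_points = len(anomalies[0])
    covariance = [[sum(row[j] * row[k] for row in anomalies) / (n_samples - 1)
                   for k in range(n_points)] for j in range(n_points)]
    vector = [1.0 + 0.1 * j for j in range(n_points)]  # arbitrary start
    norm = math.sqrt(sum(x * x for x in vector))
    vector = [x / norm for x in vector]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(covariance[j][k] * vector[k] for k in range(n_points))
                   for j in range(n_points)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
        eigenvalue = norm
    return vector, eigenvalue

def perturbed_members(base_field, eof, eigenvalue, n_members, seed=0):
    """Add random perturbations along the EOF to a single wind field."""
    rng = random.Random(seed)
    amplitude = math.sqrt(eigenvalue)
    members = []
    for _ in range(n_members):
        coefficient = rng.gauss(0.0, 1.0) * amplitude
        members.append([b + coefficient * e for b, e in zip(base_field, eof)])
    return members

# Hypothetical error fields on 3 grid points, all following one pattern
errors = [[c * 1 + 3.0, c * 2 + 4.0, c * 2 + 5.0] for c in (-1.5, -0.5, 0.5, 1.5)]
eof, eigenvalue = leading_eof(remove_mean(errors))
base = [8.0, 9.0, 10.0]  # hypothetical unperturbed wind field
members = perturbed_members(base, eof, eigenvalue, n_members=5, seed=1)
```

A fuller version would retain several EOFs and draw one random coefficient per retained direction; the long initial model run is needed only once to estimate the covariance.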
Combined with the clustering algorithm, this method may make it possible to create even more accurate reduced forecasts. Indeed, EOF-based forecasts can easily reach more than 20 members, so that more members are merged together in the clustering phase, making the choice of clusters more efficient.

4-2 High resolution wave ensemble forecast
In the previous sections, the performance of existing forecasts was investigated, along with statistical methods to reduce the computational cost of ensemble forecasts. Still, the question of the performance of high resolution operational ensemble forecasts remains unanswered. This question was studied with predictions realized in the Tierra del Fuego area as defined in Section 2-2. The main purpose of this section is to study how forecasts are improved when wind forcing fields and bathymetry are refined.
High resolution wave ensemble forecast validation
Validation procedures were based on satellite wave observations on the area measured by
Jason 2 during March and April 2013.
So far, results of the WW3 runs performed at high resolution are not satisfactory, as can be observed on Figures 4-3 and 4-4, which show the main characteristics of the comparison of the GEOWaFS and High Resolution forecasts respectively with the observations from a swath acquired around the 10th of March 2013.
Figure 4-3: Satellite measurement and GEOWaFS forecast around the 10th of March 2013 in Tierra del Fuego.

Figure 4-4: Satellite measurement and ACTIMAR’s forecast around the 10th of March 2013 in Tierra del Fuego.
From these figures, two points can be made. First, on the swath considered, the ensemble predicted by GEOWaFS seems to be in better agreement with the Jason 2 satellite observations: it shows a mean RMSE of 0.20 and a mean correlation coefficient of 0.65 with the measurements, against a mean RMSE of 0.39 and a mean correlation coefficient of 0.62 for ACTIMAR’s high resolution forecast. Therefore, so far, the high resolution predictions are not accurate enough to outperform state-of-the-art forecasts. The causes of these disappointing results may be various and will be developed further in this thesis.
The second point is the higher variability observed in the ensemble. As was demonstrated previously, a sufficient spread is needed to ensure good quality of results, and this condition is satisfied. In this regard, the high resolution forecast fulfils our expectations. The process used to run these forecasts nevertheless has to be revised.
The limits of the high resolution forecasts are various. The choice of model parameters may be partly responsible. Indeed, the parameters used in the WaveWatch III model are the same as the ones used for the global deterministic wave forecast at ACTIMAR; more accurate results may be obtained by tuning these parameters to best fit the Tierra del Fuego area on the nested high resolution grid. A complete study of the influence of these parameters should then be performed to find the best possible combination.
The input wind fields may be another limiting factor. As their resolution is lower than that of the grid, many phenomena might be badly predicted. The validity of the wind ensemble also has to be checked against in-situ observations and against NOAA’s wind fields. A bias in the wind fields could easily explain the poor quality of results.
The lack of time did not make it possible to fully investigate all aspects of high resolution forecasts. Directions for further research will include, amongst others, studying the respective influence of improving the wind forcing and the bathymetry.
Clustering
In the context of high resolution ensemble forecasts, computational times, already high in all ensemble forecasts, become critical and must be reduced to ensure operational viability.
As mentioned previously in this paper, ensemble members are chosen in order to reflect the uncertainty in the observations. Considering how important the variability of wind forcing is in this regard, it is common sense to try to maintain the widest range of wind magnitude values in the different input wind fields.
Therefore, a way to reduce the number of wind fields without deteriorating wind variability was investigated. Based on classification methods, it consists in merging members with similar variability together.
The k-means clustering method was retained amongst many other possibilities, as it was easy to adapt to the context of ensemble forecasting and is known for converging quickly with the right heuristic algorithms despite its computational complexity. The principle of this method is very simple, which makes it easy to study and use.
The Matlab algorithm used for the purpose of this study consists of two iterative steps:
• An assignment step, which consists in assigning each observation to the cluster whose mean, called the centroid, is the closest. The squared Euclidean distance was used to compute distances in the dataset.
• An update step, during which the centroids of the clusters are computed again by including the newly added points in the averaging.
The algorithm stops and is said to have converged when the assignment no longer changes.
In other words, the algorithm tends to minimize the sum of within-cluster distances from points to the centroids.
The clustering is performed by heuristic algorithms which do not always find the best solution but an approximate one. A local minimum of the sum of within-cluster distances can be found instead of the global one, the convergence towards one or another solution being mostly related to the initial partition of the dataset. As described in the literature [Gong and Richman 1995], k-means algorithms are indeed very sensitive to the initial partition, i.e. the first guess of the cluster partition of the dataset.
An easy way to address this problem, made possible by the quick convergence of the algorithm, is to run it several times with different initial partitions and keep the lowest local minimum. 2000 runs of the k-means algorithm were therefore performed on the dataset in order to best approximate the global minimum. The best solution amongst all these local minima is then considered as the best partition of the data.
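The assignment and update steps, together with the restart strategy described above, can be sketched in Python as follows. This is not the Matlab routine used in the study: the function names, the initialization from randomly chosen points (rather than a random partition), the iteration cap and the toy dataset are assumptions made for the illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iterations=100):
    """One k-means run started from k randomly chosen points (assumption)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its closest centroid
        new_assignment = [min(range(k),
                              key=lambda c: squared_distance(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: the assignment no longer changes
        assignment = new_assignment
        # Update step: recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(squared_distance(p, centroids[a])
               for p, a in zip(points, assignment))
    return cost, assignment, centroids

def kmeans_restarts(points, k, n_runs=20, seed=0):
    """Repeat k-means with different starts and keep the lowest cost."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        result = kmeans_once(points, k, rng)
        if best is None or result[0] < best[0]:
            best = result
    return best

# Hypothetical dataset: two well separated groups of three points
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
cost, assignment, centroids = kmeans_restarts(points, k=2)
```

The returned cost is the sum of within-cluster squared distances, i.e. the quantity whose lowest local minimum is retained among all runs.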
In the context of operational ensemble forecasting, the clustering should be performed for each day of simulation and take into account predictions at all lead times in order to best adapt to spatial and temporal variability. To meet this need, the dataset on which the clustering is performed consists of a 2D table where each row holds the wind magnitudes of one component for one member of the wind ensemble. Each column stands for one grid point at a given date. Figure 4-5 illustrates this layout.
Figure 4-5: Layout of input dataset for clustering. X and Y stand for the number of grid points in longitude and
latitude respectively, U(i , j) represents predicted wind magnitude in zonal direction at grid point (x=i , y=j).
(J0,T0) to (J4,T0) are the lead times of the prediction with (J0,T0) = 00h00, (J0,T1) = 06h00, …, (J1,T0) = 24h00.
For the purpose of this study, 24 points in latitude and 34 points in longitude were taken for the wind fields, with a spatial resolution of 1°x1° and a temporal resolution of 6 hours. Predictions were used at lead times up to 4 days (0-96h). 20 different wind fields were available from WRF simulations.
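The size of the resulting table can be worked out with a short Python sketch. It assumes that both the 0h and 96h lead times are included (17 time steps), that one block of columns is stored per lead time and that grid points are ordered by latitude then longitude within a block; the helper `column_index` is hypothetical.

```python
N_LAT, N_LON = 24, 34        # grid points in latitude and longitude
STEP_HOURS = 6               # temporal resolution
MAX_LEAD_HOURS = 96          # lead times from 0h to 96h
N_MEMBERS = 20               # wind fields available from WRF

def column_index(time_index, lat_index, lon_index, n_lat=N_LAT, n_lon=N_LON):
    """Column of a grid point at a given lead-time index, assuming one
    block of n_lat * n_lon columns per lead time (assumed ordering)."""
    return time_index * n_lat * n_lon + lat_index * n_lon + lon_index

n_times = MAX_LEAD_HOURS // STEP_HOURS + 1   # 17 if 0h and 96h are both kept
n_points = N_LAT * N_LON                     # 816 grid points
n_columns = n_times * n_points               # 13872 columns per row
n_rows = N_MEMBERS                           # one row per member and component
```

Under these assumptions each row holds 13872 values, which gives an idea of the dimension of the points handled by the clustering.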
Another detail to be mentioned concerning the k means clustering method is that the number of
clusters to be computed is an input parameter. Thus, preliminary runs were to be made in order to
estimate the optimal number of clusters balancing the future computational cost and the accuracy of
the solution.
From the initial ensemble of 20 wind fields, it rapidly appeared that creating fewer than 7 clusters led to a significant lack of wind variability. With more than 11 clusters, each additional member added only little information to the overall ensemble, as the distribution of amplitude values was already close to that of the initial members.
[Figure 4-5 layout: one row per member (1st member, 2nd member, ...); columns grouped in successive blocks of X * Y grid points at (J0,T0), (J0,T1), (J0,T2), (J0,T3), ...; entries within a block: U(i , j), U(i , j+n), U(i+n , j+n), U(i+n , j), ...]

Figure 4-6: Probability Density Function of the amplitude of 7 and 10 clusters ensembles and correlation
coefficient with initial ensemble of 20 members.
It appeared that the within-ensemble wind amplitudes obtained from the ensemble of 7 clusters are significantly lower than those of both the ensemble of 10 clusters and the 20 members, as illustrated by the distributions on Figure 4-6. The 10-clusters ensemble is slightly closer to the initial ensemble, as shown by its higher correlation coefficient. Considering only wind distributions, the gain in performance using the 10-clusters ensemble seems to outweigh the additional computational cost.

The same analysis was however performed after computation of the respective wave predictions.
Significant wave height predictions of the model using from 7 to 11 clusters were thus compared in
order to estimate the gain in performances, keeping computational cost in mind.
Figure 4-7 presents a box-and-whiskers representation of wave ensembles composed of 20 members, 7 clusters and 10 clusters respectively, on a J0 to J3 prediction.
Unlike what could be inferred from wind distributions, wave forecasts are very close to one another when computed from 7 or 10 clusters. Moreover, they are also close to the initial 20-members forecast, except on a few dates at the end of the forecast, around the 5th of July 2013.
Unfortunately, as this analysis took place relatively late during my master thesis, it only considers a very short time period. Still, it gives a good preview of the expected performance of such a method, which is very satisfactory.
In the rest of this study, 7-clusters ensembles were considered, as they require less computation and seem to give good results.
Figure 4-7: Box-and-whiskers representation of 7-clusters, 10-clusters and 20-members ensemble forecasts from the 1st of July 2013 to the 6th of July 2013.

As mentioned previously in this study, the median value has proven to be the most effective characteristic figure of the forecasts. Therefore, it seems necessary to study the impact of clustering on the median of ensembles. Results for a 7-clusters ensemble are presented on Figure 4-8. The median values of the initial forecast from the 1st of July 2013 to the 6th of July 2013 are plotted along with the median of a 7-clusters ensemble and of a weighted 7-clusters ensemble. This latter ensemble consists in replicating each value within the 7-clusters ensemble as many times as the number of initial members which were merged in the corresponding cluster. In other words, if the 1st cluster gathers 4 different members, its value will be repeated 4 times in the weighted ensemble. Thus, we obtain a 20-members ensemble filled with cluster values, taking into account the probability of each cluster value occurring.
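The construction of the weighted ensemble and its effect on the median can be illustrated with a short Python sketch. The function name `weighted_cluster_ensemble`, the Hs values and the cluster sizes are hypothetical.

```python
import statistics

def weighted_cluster_ensemble(cluster_values, cluster_sizes):
    """Repeat each cluster value as many times as the number of initial
    members merged into that cluster."""
    weighted = []
    for value, size in zip(cluster_values, cluster_sizes):
        weighted.extend([value] * size)
    return weighted

# Hypothetical Hs values (m) of 7 clusters at one date, and the number of
# initial members gathered in each cluster (20 members in total)
cluster_hs = [1.8, 2.1, 2.4, 2.6, 3.0, 3.3, 4.1]
cluster_sizes = [4, 5, 3, 3, 2, 2, 1]

weighted = weighted_cluster_ensemble(cluster_hs, cluster_sizes)
plain_median = statistics.median(cluster_hs)     # median of the 7 clusters
weighted_median = statistics.median(weighted)    # median of the 20 values
```

In this made-up case the weighted median is lower than the plain one because the most populated clusters carry the lowest values, which is the kind of difference the weighting is meant to capture.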
It is clear on Figure 4-8 that the median values computed from the initial ensemble and from a cluster ensemble are much the same. The clustering thus seems to maintain the variability of the ensemble well throughout the process. The median value can therefore also be used to characterize a forecast composed of a cluster ensemble.
Figure 4-8: Median values of 20-members, 7-cluster and weighted 7-clusters ensembles from 2013/07/01 to
2013/07/06.
Once the number of clusters was set, the question of the proper use of the output information given by the clustering had to be raised. Centroid positions, as well as the ensemble members constituting each cluster, represent the main outputs of the method, along with centroid-to-point distance values, which were used for an additional check of the results.
I investigated two methods:
In the first method, all constituting members of the clusters were replaced by their respective centroids. Thus, each member had an influence on the final “cluster ensemble” weighted by its distance from the centroid.
The main drawback of such a method is that the final “cluster ensemble” appeared to be slightly smoothed by the within-cluster averaging; the range of amplitudes (maximum of the ensemble minus minimum of the ensemble) was therefore reduced. Moreover, this method meant performing several data conversions, from Matlab format to NetCDF and then to the specific format read by the wave model (.wnd).
The second method consists in replacing all constituting members of a cluster by the member closest to the centroid, thus avoiding any averaging between members and any data conversion.
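The second method can be sketched in Python as follows. The function name `representative_members` and the one-dimensional toy members are hypothetical; the squared Euclidean distance is assumed, as in the clustering itself.

```python
def representative_members(members, assignment, k):
    """For each cluster, return the index of the member closest to the
    cluster centroid (None for an empty cluster)."""
    representatives = []
    for c in range(k):
        indices = [i for i, a in enumerate(assignment) if a == c]
        if not indices:
            representatives.append(None)
            continue
        # Centroid: mean of the members gathered in the cluster
        centroid = [sum(col) / len(indices)
                    for col in zip(*(members[i] for i in indices))]
        closest = min(indices,
                      key=lambda i: sum((x - y) ** 2
                                        for x, y in zip(members[i], centroid)))
        representatives.append(closest)
    return representatives

# Hypothetical members described by a single value, grouped in 2 clusters
members = [[0.0], [1.0], [5.0], [10.0], [11.0], [12.0]]
assignment = [0, 0, 0, 1, 1, 1]
selected = representative_members(members, assignment, k=2)
```

Because an existing member is selected rather than an averaged field, the corresponding wind files can be used as they are, without any format conversion.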
Several limitations of this clustering technique can be pointed out. The computational power
and memory required to run the algorithm on very large datasets do not allow a long time period
to be taken into account when computing the clusters. The algorithm must therefore be run
separately for each date, so the selected members may change from one run to the next.
In order to avoid artefacts in the time series, a gap was intentionally left at the beginning of each run
and a simple interpolation was performed to connect both sides of the gap. The loss of information
is thus minimized at low computational cost.
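The report does not specify which interpolation was used. As an illustration only, the sketch
below assumes a linear interpolation in time across the gap; the function name, the time axis and
the Hs values are hypothetical.

```python
def fill_gap_linear(times, values):
    """Fill missing values (None) by linear interpolation in time.

    Only gaps bounded by a known value on both sides are filled.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (times[i] - times[left]) / (times[right] - times[left])
            filled[i] = (1 - w) * values[left] + w * values[right]
    return filled


# Hypothetical 3-hourly Hs series (m): the previous run ends at t = 6 h and
# the first two steps of the next run (t = 9 h and t = 12 h) are left out.
times = [0, 3, 6, 9, 12, 15, 18]
hs = [2.0, 2.1, 2.2, None, None, 2.8, 2.9]
hs_filled = fill_gap_linear(times, hs)
print(hs_filled)
```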
Another limitation that remains is the dependence of the algorithm on the initial partition of the
dataset. Despite the high number of runs made in order to reduce its influence, this
dependence leads to slightly different clusters from one computation to another on the same dataset.
However, as these variations mostly occur between members predicting relatively close Hs
values, the overall performance of the forecast is not significantly affected. The members which
are the furthest from the others are systematically selected, as they represent extreme cases.
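The use of repeated runs to reduce the influence of the initial partition can be illustrated with
the sketch below. It is a plain Python version of the standard K-means (Lloyd) iteration with
random restarts, keeping the run with the smallest within-cluster sum of squares. It is not the
implementation used in the study, and all names and data are hypothetical.

```python
import math
import random


def kmeans(points, k, rng, max_iter=100):
    """One K-means run starting from k randomly chosen points."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            group = [p for p, lab in zip(points, labels) if lab == j]
            if group:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(c) / len(group) for c in zip(*group)]
    # Within-cluster sum of squared distances, used to compare runs.
    inertia = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia


def kmeans_restarts(points, k, n_restarts=20, seed=0):
    """Run K-means several times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])


# Hypothetical Hs values (m) predicted by six members at one time step.
points = [[1.0], [1.1], [1.2], [2.0], [2.1], [3.5]]
best_labels, best_centroids, best_inertia = kmeans_restarts(points, k=3)
print(best_labels, best_inertia)
```

Restarts reduce the dependence on the initial partition but do not remove it, which is consistent
with the small differences reported above.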
A quality check is also systematically performed after the computation of the clusters to make sure
the final ensemble does not depart too far from the initial one.
The probabilistic information inherent to ensemble forecasting, which indicates the most probable
sea state, is not lost, as the number of members that each cluster accounts for is systematically
stored. The most probable sea state can thus easily be determined from the probability density
function of the ensemble values.
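As a simple illustration, the stored member counts can be converted into discrete probabilities,
and the most probable cluster value read off directly. The sketch below works on a discrete
distribution rather than a fitted probability density function, and its names and numbers are
hypothetical.

```python
def cluster_probabilities(cluster_sizes):
    """Probability of each cluster = share of initial members it contains."""
    total = sum(cluster_sizes)
    return [size / total for size in cluster_sizes]


def most_probable_value(cluster_values, cluster_sizes):
    """Return the value of the most populated cluster and its probability."""
    probabilities = cluster_probabilities(cluster_sizes)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return cluster_values[best], probabilities[best]


# Hypothetical Hs values (m) of 7 clusters and their member counts.
cluster_hs = [1.2, 1.5, 1.7, 2.0, 2.3, 2.8, 3.4]
cluster_sizes = [4, 5, 3, 3, 2, 2, 1]

hs_mode, p_mode = most_probable_value(cluster_hs, cluster_sizes)
print(hs_mode, p_mode)
```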
Despite the short time period covered by this analysis, clustering methods prove very effective at
reducing the computational cost while preserving the quality of the forecast as far as possible.
Studies over longer periods will have to be performed in order to assess their performance more
precisely and check its consistency in time.

5 Conclusions
State-of-the-art wave ensemble forecasts proved to be more skilful in predicting
significant wave height than higher resolution deterministic forecasts, as they take into account
the uncertainty in the initial conditions. In both the North Sea and near Taiwan, in shallow and
deep water, better results are observed, especially at longer lead times. It also appeared that the
median value of the forecasts characterizes their performance well. However, a lack of variability
sometimes appears within the ensemble, mainly related to a lack of variability within the wind
forcing ensemble. These forecasts may therefore not be sufficient for very sensitive marine operations.
Several statistical methods were investigated to produce ensemble forecasts at lower
computational cost. Clustering methods were studied in particular, as they proved to be an effective
way to group correlated ensemble members and thus reduce the size of the ensemble
with the smallest loss of information. Preliminary results with the simplest method, K-means
clustering, are very encouraging. The generation of wind ensembles via Empirical Orthogonal
Functions (EOFs) was also studied, on a purely theoretical level, and may represent a direction for
further research.
High resolution ensemble forecasts represent a possible improvement of existing forecasts.
Indeed, the high resolution forecast run in the Tierra del Fuego area shows a higher variability than
lower resolution ensemble forecasts, whose lack of variability is their main limitation. However, a
non-negligible bias sometimes appears in these forecasts, probably related to poorly tuned model
parameters and to wind forcing fields not accurate enough to reproduce all phenomena well. An
in-depth study of the influence of these parameters, along with the influence of improved wind
forcing fields and bathymetry, should be conducted in the future to improve the quality of high
resolution forecasts.

Bibliography
[1]: Les vagues: un compartiment important du système terre
Ardhuin, F, 2012 (Course)
[2]: Intercomparison of the performance of operational ocean wave forecasting systems with
buoy data
Bidlot, J.R., D.J. Holmes, P.A. Wittmann, R.L. Lalbeharry, and H.S. Chen / Weather
Forecasting 2002, 17, 287-310.
[3]: Performance of the ocean wave ensemble forecast system at NCEP
Cao, D., H.L. Tolman, H.S. Chen, A. Chawla and V.M. Gerald / MMAB contribution
No.279, 2009 (available at http://polar.ncep.noaa.gov/mmab/papers/tn279/mmab279.pdf)
[4]: A limited area wave ensemble prediction system for the Nordic seas and the North Sea.
Carrasco, A. and O. Saetra / Report No.22/2008, Meteorology and oceanography, ISSN:
1503-8017, Dec.2008
[5]: Wave modeling – The state of the art
Cavaleri, L. et al / Progress in Oceanography 75 (2007) 603-674
[6]: Ensemble Prediction of Ocean Waves at NCEP
Chen, H.S / Proceedings of the 28th
Ocean Engineering Conference in Taiwan, NSYSU,
2006
[7]: On ensemble prediction of ocean waves
Farina, L. / Tellus - Series A: Dynamic Meteorology and Oceanography (2002), Vol. 54,
Issue: 2, Pages: 148-158.
[8]: On the Application of Cluster Analysis to Growing Season Precipitation Data in North
America East of the Rockies.
Gong, Xiaofeng, Michael B. Richman / J. Climate, 1995, 8, 897–931.
[9]: Dynamics and Modeling of Ocean Waves
Komen, G.J., L. Cavaleri, M. Donelan, K. Hasselmann, S. Hasselmann and P.A.E.M, Jansen
/ Cambridge University Press 1994, 532pp.
[10]: Ocean Waves: The Stochastic Approach
Ochi, M.K. / Cambridge University Press 1998, 319pp.
[11]: Forecasting wave height probabilities with numerical weather prediction model
Roulston, M.S, J. Ellepola, J. von Hardenberg, L.A. Smith / Ocean Engineering 32 (2005)
1841-1863
[12]: Ensemble forecasting at NMC: The generation of perturbations

46
Toth, Z. and E. Kalnay / Bulletin of the American Meteorological Society Vol. 74, No. 12,
Dec. 1993
[13]: Ensemble Forecasting at NMC and the Breeding Method
Toth, Z. and E. Kalnay / Monthly Weather Review, AMS, pp.3297-3319, Dec. 1997
[14]: Statistical Methods in the Atmospheric Sciences
Wilks, D.S / International Geophysics Series, Vol.100, 676pp.
[15]: A perturbation method for hurricane ensemble predictions
Zhang, Z and T. N. Krishnamurti / Monthly Weather Review, 1999, 127, 447-469
