Contents lists available at ScienceDirect
Renewable and Sustainable Energy Reviews
journal homepage: www.elsevier.com/locate/rser
Forecasting long-term global solar radiation with an ANN algorithm
coupled with satellite-derived (MODIS) land surface temperature (LST) for
regional locations in Queensland
Ravinesh C. Deo a,c,⁎, Mehmet Şahin b
a School of Agricultural Computational and Environmental Sciences, Institute of Agriculture and Environment (IAg & E), International Centre for Applied Climate Sciences (ICACS), University of Southern Queensland, Springfield, QLD 4300, Australia
b Department of Electrical and Electronics Engineering, Siirt University, 56100 Siirt, Turkey
c Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, Gansu, China
ARTICLE INFO
Keywords:
Satellite-based solar model
Neural network
Multi-linear regression
ARIMA model
ABSTRACT
Forecasting solar radiation (G) is crucial for engineering applications (e.g. design of solar furnaces
and energy-efficient buildings, solar concentrators, photovoltaic systems and site selection for future
power plants). To establish long-term sustainability of solar energy, energy practitioners utilize versatile
predictive models of G as an indispensable decision-making tool. Notwithstanding this, sparsity of solar sites,
instrument maintenance, policy and fiscal issues constrain the availability of model input data that must be
used for forecasting the onsite value of G. To surmount these challenges, low-cost, readily available satellite
products accessible over large spatial domains can provide viable alternatives. In this paper, the preciseness of
artificial neural network (ANN) for predictive modelling of G is evaluated for regional Queensland, which
employed Moderate Resolution Imaging Spectroradiometer (MODIS) land-surface temperature (LST) as an
effective predictor. To couple an ANN model with the satellite-derived variable, the LST data over 2012–2014 are
acquired in seven groups, with three sites per group, where the data for the first two (2012–2013) are utilised for
model development and the third (2014) group for cross-validation. For the monthly horizon, the ANN model is
optimized by trialing 55 neuronal architectures, while for seasonal forecasting, nine neuronal architectures are
trialed with time-lagged LST. The ANN coupled with zero-lagged LST utilised the scaled conjugate gradient algorithm,
while the ANN with time-lagged LST utilised the Levenberg-Marquardt algorithm. To ascertain conclusive results,
the objective model is evaluated via multiple linear regression (MLR) and autoregressive integrated moving
average (ARIMA) algorithms. Results showed that an ANN model outperformed MLR and ARIMA models
where an analysis yielded 39% of cumulative errors in smallest magnitude bracket, whereas MLR and ARIMA
produced 15% and 25%. Superiority of an ANN model was demonstrated by site-averaged (monthly) relative
error of 5.85% compared with 10.23% (MLR) and 9.60% (ARIMA) with Willmott's Index of 0.954 (ANN), 0.899
(MLR) and 0.848 (ARIMA). This work ascertains that an ANN model coupled with satellite-derived LST data
can be adopted as a qualified stratagem for the proliferation of solar energy applications in locations that have
an appropriate satellite footprint.
1. Background review
Forecasting solar energy is an important research area for en-
gineers, energy experts, policy-makers and climate advocates since the
utilization of carbon-free energy with less environmental impacts is a
promising outlook for addressing climate change [1]. There is growing
consensus with healthy debates on the adoption of solar as a substitute
for carbon-based fuels not only from a global perspective but also in
Australia where solar has immense potential due to high insolation, low
rainfall and small fraction of cloud cover leading to less scattering of
solar radiation over large spatial domains [2,3]. The annual
solar radiation is estimated at 58 million petajoules, which is 10,000 times Australia's
energy consumption [4]. However, photovoltaic systems contribute to
7.6% of Australia's annual energy use [5] while modest utilizations are
limited to rooftop water heating systems that barely exceed 4500 MW
capacity [6,7]. In spite of the current staggering adoption of solar as a
http://dx.doi.org/10.1016/j.rser.2017.01.114
Received 22 August 2016; Received in revised form 4 December 2016; Accepted 17 January 2017
⁎ Corresponding author at: School of Agricultural Computational and Environmental Sciences, Institute of Agriculture and Environment (IAg & E), International Centre for Applied Climate Sciences (ICACS), University of Southern Queensland, Springfield, QLD 4300, Australia.
E-mail address: ravinesh.deo@usq.edu.au (R.C. Deo).
Renewable and Sustainable Energy Reviews 72 (2017) 828–848
1364-0321/ © 2017 Elsevier Ltd. All rights reserved.
renewable energy option, the solar energy utilization is projected to
increase from 7.0 (2007–08) to 24.0 petajoules (2029–30) with
electricity generation from solar power projected to rise from 0.1
(2007–08) to 4.0 terawatt hours (2029–30) [8]. Besides, the Renewable
Energy Target sanctioned by the Senate advocated that more than
23.5% of national electricity is to be derived from renewables by 2020
[9]. Government initiatives constantly support opportunities for scien-
tific research [10], firstly, to model solar prospectivity in Australia's
diverse spatial locations (metro and remote) with acceptable accuracy
by high-performance models and secondly, to harness this energy
where it is economically sustainable. This, of course, is a strategic
measure to position Australia's clean energy demands and to contribute
to grid supply in the future [11]. In light of this need, in this paper the
forecasting of long-term (monthly and seasonal) global solar radiation
is undertaken for regional sites in Queensland, Australia.
For practical utilization of freely available solar energy in photo-
voltaic systems, heaters and electro-mechanical devices that convert
global solar radiation in a consumer-usable form, quantitative knowl-
edge of solar characteristics (G) is paramount [12,13]. This is because
the extracted amount of electricity is inherently proportional to the G
values received on the earth's surface [6]. This knowledge can assist in
the assessment of solar energy projects and their sustainability [14]. In
remote locations (e.g. the study sites that are trialed in this paper)
where ground-based meteorological networks are limited, an assess-
ment of the viability of solar projects raises the serious need for an
accurate, versatile and robust predictive model that is able to assess the
sustainability of solar-fueled investments. Methods applied for solar
radiation prediction utilize two types of predictive models; the deter-
ministic (mathematical) and data-driven (black-box). Although deter-
ministic approaches have profusely been adopted as described next, the
use of data-driven models especially for the present study region has
been limited, although the research into this area has been gaining
more attention nowadays (e.g. [15,16]).
Angstrom [17] demonstrated the usefulness of deterministic mod-
els for forecasting G using solar exposure and the ratio of terrestrial to
extraterrestrial radiation. Liu and Jordan [18] modelled daily and
hourly diffuse solar radiation, Orgill and Hollands [19] correlated
hourly diffuse radiation with clearness index and Iqbal [20] correlated
clearness index and solar altitude. Spencer [21] used regression
analysis for estimating hourly diffuse solar radiation from global G
values, while Boland et al. [22] used 15 min and hourly datasets to
ascertain if the smoothing generated by hourly data made a difference
to the forecasts. Boland et al. [23] used the logistic function for an
estimation of G. Deterministic equations based on linear (Angström–
Prescott), quadratic [24], cubic [25], logarithmic [26] and exponential
[27] forms were applied for the prediction of G including decomposi-
tion models and correlations between clearness index and diffuse
fraction, diffuse coefficient and direct transmittance [19,28,29].
Huang et al. [30] combined an autoregressive (AR) and dynamical
systems model (Coupled Autoregressive Dynamical System, CARDS)
for one-hour solar radiation forecasting. Based on the required
parametrisation for AR (2) process, Lucheroni and a combination of
the two, this work noted that the CARDS model was able to reduce the
Combination model error by about 33.4%.
Despite the usefulness of deterministic models, their capacity to
extract the pertinent features in predictors that are not incorporated in
their original mathematical formulations, is limited. The model's
complexity is also expected to worsen with a larger number of
predictors when primitive equations used in deterministic models are
modified to accommodate new predictors for forecasting. The Angström
equation is also too inflexible to incorporate this type of change without a
fundamental modification of its original form, which, of course, is
achieved at the expense of increasing the model complexity [31].
Difficulties can also arise in regards to the validity of the assumptions
that are made; for example, the model parameters may be assumed to
be invariant over time (e.g. as if the same sunshine duration persists on
the same days or month), but in fact, this may not hold true. Other
deterministic models, for example, the Iqbal, Gueymard and ASHRAE
[32–34] require information on atmospheric conditions to agree with
assumptions of normality, linearity, distributions, homoscedasticity
(variance constancy). As deterministic models rely on measured data,
instrument errors in ground data compound the existing challenges or
degrade the model's predictive accuracy [35,36], as revealed in a review
on merits and challenges of Angström–Prescott models [37].
Data-driven models, as utilised in the present study, require no
information on the physical system related to solar dynamics and do
not need complex mathematical equations, and they are gaining
considerable attention [15,38–40]. In today's era, where large volumes of data
are collected from measurements and physical models and are also
enhanced through improved data products from satellites and reanalysis
projects, soft-computing algorithms offer promise for modelling
solar energy from known behaviors of solar variability. These models
require little rationalization of the physics of solar variability. The
relationship between input(s) and the objective variable is constructed
with a machine learning algorithm that implements pattern recognition tools
[36]. Besides their simplicity, data-driven models make no assumption about
the underlying data distribution, yet they demonstrate competitive
performance relative to deterministic models [15,38,41–43]. In this paper, an
artificial neural network (ANN) model [40,44] is coupled with satellite-
derived land-surface temperature (LST) and is evaluated by multiple
linear regression [45–47] and auto-regressive integrated moving
average [48] model. ANN is a computational paradigm that mimics
the neuronal structure of the brain, and learns and identifies complex data
patterns [40,49]. It is noteworthy that the ANN model has recently
captured significant attention in rainfall, streamflow, drought and
temperature forecasting problems [50–53] but its application for
global solar radiation forecasting is yet to be established, although
recent work [40,54–57] has validated the utility of an ANN model
elsewhere. Recent work [40] that reviewed theoretical, empirical,
regression and ANN models for estimation of solar radiation noted
superiority of ANN models over the former methods.
A challenge for any forecasting model is the requirement of
historical data (e.g. temperature) that must be related to an objective
variable (G). Data may not be available in all spatial regions and more
importantly, in remote sites where meteorological stations are largely
absent. For example, Typical Meteorological Year (TMY) Databank for
Australia provides long-term solar observations, but these are only for
less than 40 primary stations [58] and the hourly solar observations are
for about 18 locations [59], which are mostly limited to metropolitan
cities. One reason is that it is not possible to set up experimental
apparatus to acquire the data in all places, and even if it is so, issues of
instrumental maintenance cannot be disregarded [43,49]. Luckily,
remotely sensed data, utilised in this paper, has been identified as a
viable predictor for solar forecasting problems [54,56,60,61]. In this
view, the coupling of ANN model with satellite products is an
improvement over station-based data as the acquisition of satellite
imagery is feasible as long as a footprint is identified. It is also easy to
obtain this data from remote regions (e.g. mountainous terrain) where
meteorological stations are not built or are inaccessible. Satellite data is
in abundance over large spatial and temporal resolutions [14,62–64].
However, the coupling of satellite-derived data with a machine learning
algorithm (e.g. ANN model) in Australia is yet to be undertaken.
Due to better spatial relevance for solar mapping that can be
achieved via remotely sensed data [57], researchers have integrated
ANN algorithms with satellite-derived inputs. Şenkal and Kuleli [61]
estimated G using an ANN model coupled with satellite data to show a
mean square error between estimated and ground monthly average
daily sums of 3.94 MJ m⁻² (ANN) and 5.37 MJ m⁻² (physical model),
respectively. Şenkal [60] estimated G values using an ANN model with
a mean square error of 6.59% while the work of Rahimikhoob [65]
validated an ANN model using satellite temperature data. That study
demonstrated that ANN-models of G yielded better results than
empirical models. Linares-Rodriguez et al. [56] developed an ANN-
ensemble model for daily G using Meteosat observations with a root
mean square error (6.74%) and correlation coefficient (99%).
Rahimikhoob et al. [57] used an ANN model for an estimation of G
using Advanced Very High Resolution Radiometer. Alternatively,
multiple linear regression (MLR) models were utilised for forecasting
G by examining cause-effect relationships between a set of dependent
variables [49,66]. Despite vast attention on satellite-based solar fore-
casting, to the best of our knowledge, no study has developed an ANN
model in this study region although one study has validated a wavelet-
coupled SVM model with ground-based predictor data [15].
Considering the aforesaid, the novelty of this paper is to establish the
utility of an artificial neural network coupled with satellite-based land-
surface temperature for forecasting G in the sunshine state of
Queensland that has abundant solar resources [3,6,67]. This motiva-
tion is driven by a state-wide surge in solar investments, evidenced by
1.47 million project that integrated ground, satellite and atmospheric
data [6,7,67]. Due to surging interest in renewable energy mapping
[2,3,5,8,9,68–70], a quantitative model can ideally be employed as a
strategic ploy by decision-makers (e.g. Department of Resources,
Energy and Tourism, Geoscience Australia, Queensland Office of
Clean Energy and Clean Energy Council) [71].
The aim of this paper is: (1) To extract land surface temperature
(LST) from Moderate Resolution Imaging Spectroradiometer (MODIS)
to be used as an effective predictor for stations in regional Queensland;
(2) To develop an artificial neural network model and evaluate its
performance relative to multiple linear regression and autoregressive
integrated moving average model; (3) To cross-validate its predictive
skill for monthly and seasonal forecasting. The veracity of ANN model
coupled with LST data is established by utilizing a limited set (two
years) data (2012–2013) and forecasted over (target) 2014 period for
21 stations where 14 are used in model development and the
remainder in the cross-validation phase. In the next section, a theoretical
overview of LST and the predictive models is presented; in Section 3
the materials and methods are presented, while in Section 4 the
results are presented. Section 5 follows with a discussion and
opportunities for research, and Section 6 concludes the findings of
this paper.
2. Theoretical framework
In this section an overview of the approach for extracting the
primary predictor (LST), the predictive model (ANN) and its compara-
tive counterparts (MLR and ARIMA) are presented. Considering a set
of predictor (input) variables (e.g. original and/or time-lagged LST,
month, altitude, latitude and longitude) that are related to the objective
variable (global solar radiation, G), the datum points can be written as
a set $(X, Y) = \{(x_i, y_i)\}$, where the set has k (= 6, 7, …, 10)
predictor variables and the objective variable, G ≡ y_i (t = 1, …, N). The
inputs for the ANN and MLR models are in the form of an N × k matrix,
whereas the objective variable (G) is in the form of an N × 1 matrix. Note
that x_i(t) is the ith (i = 1, …, N) row of the input at time t acquired from the
MODIS Terra satellite sensor, as described in Section 2.1.
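
The layout of the inputs described above can be illustrated with a minimal sketch. The records, the site attributes, the column order and the helper name `build_design_matrix` are all assumptions made for illustration; none of the numbers are data from the study.

```python
# Hypothetical monthly records for one site: (month, LST in K, G in MJ m^-2).
# All numbers are placeholders, not values from the study.
records = [
    (1, 305.2, 25.1),
    (2, 303.8, 23.4),
    (3, 301.5, 21.0),
    (4, 297.9, 18.2),
    (5, 293.4, 15.3),
]
# Assumed site attributes (altitude in m, latitude and longitude in degrees)
site = {"altitude": 350.0, "latitude": -26.4, "longitude": 146.2}


def build_design_matrix(records, site, lag=1):
    """Return (X, y): X is an N x k list of rows (k = 6 here), y has N entries."""
    X, y = [], []
    for t in range(lag, len(records)):
        month, lst, g = records[t]
        lst_lagged = records[t - lag][1]  # LST observed `lag` months earlier
        X.append([lst, lst_lagged, month,
                  site["altitude"], site["latitude"], site["longitude"]])
        y.append(g)
    return X, y


X, y = build_design_matrix(records, site)
print(len(X), "rows x", len(X[0]), "predictors")
```

Each row of `X` holds the k predictors for one time step and the matching entry of `y` holds G, which mirrors the N × k and N × 1 shapes in the text. The first `lag` records are dropped because they have no lagged LST value.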
2.1. Satellite-derived land-surface temperature
To integrate an ANN model with satellite-derived predictor data,
the land surface temperature (LST) was extracted. LST can be
conveniently deduced with remote sensing tools [43,49,60,72–74] that
can investigate the radiative properties of the earth's surface from an
atmospheric window without establishing a physical connection with
the investigated objects [75]. In order to remotely sense the LST data, a
number of satellite-derived sources are available, although their
scanning regions can vary from very small time interval to a range of
electromagnetic spectrum. Satellites that acquire remotely sensed data
include: Geosynchronous Meteorological, Meteosat, Insat, Goes,
National Oceanic and Atmospheric Administration (NOAA) series,
Advanced Spaceborne Thermal Emission Reflection Radiometer,
Fengyun-1C and D and Metop satellites [76]. Among these, the
NOAA series satellites are able to examine properties on the earth
such as wind regime, cloudiness, humidity and sea surface tempera-
ture, fog or frost, glacier areas, rainfall, vegetation index, pressure and
ozone concentration. Also, a number of algorithms have been devel-
oped that can utilize the Advanced Very High Resolution Radiometer
(AVHRR) sensor from NOAA satellite series to extract the remotely
sensed data [76–80].
In this paper, the LST data were extracted from the Moderate
Resolution Imaging Spectroradiometer (MODIS), a radiometer carried on
NASA-built satellites. Note that MODIS instruments operate on two
satellite platforms: Terra and Aqua [81–83]. In general, MODIS operates within
the visible light and infrared spectrum and has 36 spectral
bands that span wavelengths between 0.4 and 14.4 µm.
However, the horizontal resolution of the MODIS sensor changes
according to the location where the data is to be acquired and it is
able to observe the earth's surface globally every 1–2 days. Due to its
wide spatial coverage, the sensor can observe and measure meteor-
ological variables such as cloud cover, energy budget, solar radiation,
aerosol optical depth, chlorophyll density and ocean, land and atmo-
spheric processes by calculating the radiation reflected from land and
the atmospheric particles [81].
In this paper, we have extracted the Terra sensor's 31st and 32nd
channel data, with a spectral band of 10.78–11.28 and 11.77–
12.27 µm, respectively [84,85]. It is imperative to note that the most
important factor generating an error in the extraction of the LST, the
atmospheric effect, was minimized by the use of the well-known split-
window formulation [86] that has been used previously in solar
modelling [72]. To reach an accurate estimation of the LST data, an
error value lower than 1 K was set as a threshold for any calculation of
LST between −10 and 50 °C. This was realized by using the Wan and
Dozier [86] algorithm, which assumes that the values of the emissivity
are known. When the LST value was checked for its validity, an error of less than
1 K was evident in regions of homogeneous land surface
terrain [81,82,87].
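The Wan and Dozier [86] algorithm is written as follows:
$\mathrm{LST} = \left[A_1 + A_2\,\frac{1-\varepsilon}{\varepsilon} + A_3\,\frac{\Delta\varepsilon}{\varepsilon^{2}}\right]\frac{T_{31}+T_{32}}{2} + \left[B_1 + B_2\,\frac{1-\varepsilon}{\varepsilon} + B_3\,\frac{\Delta\varepsilon}{\varepsilon^{2}}\right]\left[T_{31}-T_{32}\right] + C$ (1)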
Note that the terms T31 and T32 represent the brightness temperatures
obtained from MODIS Terra channels 31 and 32, respectively, and
Ai, Bi (i = 1, 2, 3) and C are constants that depend on the viewing angle
(between 0 and 65°). In Eq. (1), the constant terms are defined by the air-ground
temperature and the amount of water vapor, obtained by regression
analysis of MODIS simulation data; ε is the mean emissivity of the two
channels, while Δε is the difference between their emissivity values.
Mathematically, these are expressed as:
$\varepsilon = 0.5\,(\varepsilon_{31} + \varepsilon_{32})$ (2)
$\Delta\varepsilon = \varepsilon_{31} - \varepsilon_{32}$ (3)
where ε31 and ε32 are the emissivity of the considered channels [85,86].
It is important to note that the accuracy of the LST data (in comparison
with measured or ground-based temperature) is determined by simu-
lation data where emissivity, temperature, atmospheric temperature
and water vapor must represent for wider spaces than the real physical
space. To reduce the uncertainty and enhance the reliability of the LST
data, simulated data must include various conditions that affect its final
value [81,88].
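
As a rough illustration of how Eqs. (1)–(3) combine, the following sketch evaluates the split-window expression for a single pixel. The function name `split_window_lst` and every coefficient value are placeholders chosen for illustration; they are not the published Wan and Dozier coefficients, which depend on viewing angle and atmospheric conditions. The difference term is written as [T31 − T32], as Eq. (1) appears to give it here.

```python
def split_window_lst(t31, t32, eps31, eps32, A, B, C):
    """Evaluate Eqs. (1)-(3). A = (A1, A2, A3), B = (B1, B2, B3); temperatures in K."""
    eps = 0.5 * (eps31 + eps32)   # Eq. (2): mean emissivity of channels 31 and 32
    d_eps = eps31 - eps32         # Eq. (3): emissivity difference
    a_term = A[0] + A[1] * (1.0 - eps) / eps + A[2] * d_eps / eps ** 2
    b_term = B[0] + B[1] * (1.0 - eps) / eps + B[2] * d_eps / eps ** 2
    # Eq. (1): weighted mean brightness temperature plus weighted channel difference
    return a_term * (t31 + t32) / 2.0 + b_term * (t31 - t32) + C


# Placeholder coefficients and inputs (assumed values, NOT the published ones)
A = (1.0, 0.1, -0.3)
B = (3.0, 5.0, -10.0)
C = 0.5
lst = split_window_lst(t31=295.0, t32=293.5, eps31=0.985, eps32=0.989, A=A, B=B, C=C)
print(round(lst, 2), "K")
```

With A = (1, 0, 0), B = (0, 0, 0) and C = 0 the function reduces to the mean of the two brightness temperatures, which is a convenient sanity check on the structure of the formula.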
2.2. Artificial neural network
In this paper an ANN model coupled with satellite-derived (x)=LST
data as a primary predictor (with site-specific parameters defined by
altitude, longitude, latitude and solar periodicity (month) as supple-
mentary predictors) was adopted for forecasting the global solar
radiation (G). The aim of the ANN model was to extract patterns
(predictive features) contained in x time-series in order to forecast the
objective variable, G. Fig. 1 outlines a schematic view of the model. An
ANN model is a non-linear modelling technique with a network
architecture that mimics the biological structure of our nervous system
[44]. It has interconnected inputs that are related to the G and is able to
transmit information through weighted connections (i.e. functional
neurons) to map nonlinearly the predictor data features to a high-
dimensional hyper-plane.
This study employed a popular ANN algorithm: Feed-Forward
Back-Propagation (FFBP), which contains multilayer perceptron neurons
and has been applied in problems of forecasting solar radiation
(e.g., [49,72]). The FFBP is superior to the other category of ANN
models [89–93] where the neuronal architecture is designed to
successively validate the model's parameters (i.e., weighted connec-
tions and neuron biases) to drive the empirical error to a set tolerance
through each iteration (epoch) of the forward passing of the updated
parameters and backward propagation of errors to fine tune them.
Mathematically, the ANN algorithm can be written as [50,93,94]:
$y(x) = F\left(\sum_{i=1}^{L} w_i(t) \cdot x_i(t) + b\right)$ (4)
where x_i(t) = predictor (input) variable(s) in discrete time space t, y(x)
= forecasted G in cross-validation (test) data set, L = hidden neurons
determined iteratively, w_i(t) = weight that connects the ith neuron in
the input layer, b = neuronal bias and F(.) is the hidden transfer
function.
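
A minimal sketch of Eq. (4) is given below: a single weighted sum passed through a transfer function, then reused to form one hidden layer feeding a linear output neuron. The two-layer arrangement, the function names and all weights are assumptions for illustration, and the sketch covers only the forward pass, not the back-propagation of errors.

```python
import math


def neuron_output(x, w, b, transfer=math.tanh):
    """Eq. (4) for one neuron: F(sum_i w_i * x_i + b)."""
    return transfer(sum(wi * xi for wi, xi in zip(w, x)) + b)


def ffnn_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: L tanh hidden neurons followed by one linear output neuron."""
    hidden = [neuron_output(x, w, b) for w, b in zip(hidden_w, hidden_b)]
    return neuron_output(hidden, out_w, out_b, transfer=lambda s: s)


# Placeholder scaled inputs and weights; none of these values come from the study
x = [0.4, -0.2, 0.7]
hidden_w = [[0.5, -0.1, 0.3], [-0.4, 0.8, 0.2]]  # L = 2 hidden neurons
hidden_b = [0.0, 0.1]
out_w = [1.2, -0.7]
out_b = 0.05
print(ffnn_forward(x, hidden_w, hidden_b, out_w, out_b))
```

Training then amounts to adjusting `hidden_w`, `hidden_b`, `out_w` and `out_b` so that the output matches the observed G over the training set.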
As an ANN model is a black-box and does not identify the training
algorithm in an explicit manner without an iterative model identifica-
tion process, this study has trialed several algorithms [93] whose
performances were assessed to select the superior model. MATLAB-
based algorithms used in this paper are classified in three categories:
the quasi-Newton [95] (that utilizes trainlm and trainbfg functions),
the gradient descent (traingdx) [96] and the conjugate gradient
[97,98] (trainscg, traincgf and traincgp). Quasi-Newton method is
based on the Levenberg-Marquardt (LM) and the Broyden-Fletcher-
Goldfarb-Shanno (BFGS) algorithms [99,100] that minimize the mean
square error whereas the LM algorithm locates the minimum of an
input data that is expressed as the sum of squares of non-linear real-
valued functions [101]. Both exhibit a memory overhead issue due to
the gradient and Hessian matrix that needs to be calculated [102] and
this is especially a disadvantage for very large networks [103]. BFGS
uses the Newton's method based on a hill-climbing optimisation
approach that seeks a stationary point of (twice continuously differ-
entiable) function. This has good performance for non-smooth optimi-
zations [104] where the Hessian matrix is not evaluated directly, but
instead it is approximated by rank-one updates specified by gradient
evaluation. The scaled conjugate gradient algorithm updates weights/biases based on
conjugate directions without performing line search [105] while the
gradient backpropagation with Powell-Beale restart [106] and the
Fletcher-Reeves update [96] is able to train the network as long as
its weight, input and transfer functions have derivatives. Resilient
backpropagation [107] is generally fast, requires modest memory and
does not store the updated values of weight/bias, while the gradient
descent with momentum and adaptive learning rate (traingdx) [107]
combines adaptive learning with momentum training where momen-
tum coefficient is included as a training parameter [103]. Others use
one-step secant (trainoss) [108] and Bayesian regularization (trainbr)
function. One-step secant faces a smaller computation overhead,
without storing the Hessian matrix but rather assumes it at each
iteration as a compromise between the quasi-Newton and gradient
algorithm [108] while Bayesian regularization (trainbr) [109,110] uses
the Jacobian matrix to update weight/biases to attain the best general-
ization of the input/target dataset.
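
To make one of these update rules concrete, the sketch below illustrates the general idea behind gradient descent with momentum and an adaptive learning rate on a toy least-squares problem. It is not the MATLAB traingdx implementation; the function names, the toy data and the parameter values (momentum 0.9, rate increase 1.05, rate decrease 0.7, tolerated loss increase 1.04) are assumptions for illustration.

```python
def mse_and_grad(params, data):
    """Mean square error of y = w*x + b and its gradient with respect to (w, b)."""
    w, b = params
    n = len(data)
    err = [w * x + b - y for x, y in data]
    loss = sum(e * e for e in err) / n
    grad_w = 2.0 * sum(e * x for e, (x, _) in zip(err, data)) / n
    grad_b = 2.0 * sum(err) / n
    return loss, (grad_w, grad_b)


def train_gdx(data, epochs=200, lr=0.01, momentum=0.9,
              lr_inc=1.05, lr_dec=0.7, max_loss_inc=1.04):
    """Gradient descent with momentum and an adaptive learning rate (sketch)."""
    params, step = (0.0, 0.0), (0.0, 0.0)
    loss, grad = mse_and_grad(params, data)
    history = [loss]
    for _ in range(epochs):
        # Momentum blends the previous step with the new gradient step
        new_step = tuple(momentum * s - (1.0 - momentum) * lr * g
                         for s, g in zip(step, grad))
        trial = tuple(p + s for p, s in zip(params, new_step))
        trial_loss, trial_grad = mse_and_grad(trial, data)
        if trial_loss > max_loss_inc * loss:
            lr *= lr_dec              # reject the step and shrink the rate
            step = (0.0, 0.0)
        else:
            if trial_loss < loss:
                lr *= lr_inc          # the step helped, so grow the rate
            params, step = trial, new_step
            loss, grad = trial_loss, trial_grad
        history.append(loss)
    return params, history


# Toy data drawn from y = 2x + 1 (an assumption for illustration)
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(11)]
params, history = train_gdx(data)
print(params, history[0], history[-1])
```

Quasi-Newton and conjugate gradient methods replace the plain gradient step with search directions that use curvature information, which is why they typically need fewer epochs at the cost of more memory or computation per epoch.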
Considering the black-box nature of an ANN model, a primary task
for modelers is to determine the appropriate transfer function as this is
not known a priori. A series of equations F(.) available in the MATLAB
toolbox can be trialed [111]:
Tangent Sigmoid: $F(x) = \frac{2}{1 + \exp(-2x)} - 1$ (5.1)
Log Sigmoid: $F(x) = \frac{1}{1 + \exp(-x)}$ (5.2)
Soft Max: $F(x) = \frac{\exp(x)}{\mathrm{sum}(\exp(x))}$ (5.3)
Hard-Limit: $F(x) = 1$, if $x > 0$ (5.4)
Positive Linear: $F(x) = x$ if $x \ge 0$, or 0 otherwise (5.5)
Triangular Basis: $F(x) = 1 - \mathrm{abs}(x)$, if $-1 \le x \le 1$, or 0 otherwise (5.6)
Radial Basis: $F(x) = \exp(-x^{2})$ (5.7)
Symmetric Hardlimit: $F(x) = 1$, if $x \ge 0$ (5.8)
Saturating Linear: $F(x) = 0$, if $x \le 0$; $F(x) = x$, if $0 < x \le 1$; $F(x) = 1$, if $x \ge 1$ (5.9)
Symmetric Saturating Linear: $F(x) = -1$, if $x \le -1$; $F(x) = x$, if $-1 < x \le 1$; $F(x) = 1$, if $x > 1$ (5.10)
where x is the predictor dataset analysed in accordance with the
function F(x) that is able to map the predictive features to create a
hidden layer weight for the suitable model.
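
The transfer functions in Eqs. (5.1)–(5.10) can be sketched in a few lines each. The function names below mirror the MATLAB names for readability but are plain Python written for illustration. Where the equations list only one branch (hard-limit and symmetric hard-limit), the remaining branch (0 and −1, respectively) is an assumption.

```python
import math


def tansig(x):            # Eq. (5.1), algebraically equal to tanh(x)
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def logsig(x):            # Eq. (5.2)
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):          # Eq. (5.3), applied to a list of values
    m = max(xs)           # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hardlim(x):           # Eq. (5.4); the 0 branch is assumed
    return 1.0 if x > 0 else 0.0


def poslin(x):            # Eq. (5.5)
    return x if x >= 0 else 0.0


def tribas(x):            # Eq. (5.6)
    return 1.0 - abs(x) if -1.0 <= x <= 1.0 else 0.0


def radbas(x):            # Eq. (5.7)
    return math.exp(-x * x)


def hardlims(x):          # Eq. (5.8); the -1 branch is assumed
    return 1.0 if x >= 0 else -1.0


def satlin(x):            # Eq. (5.9)
    return min(max(x, 0.0), 1.0)


def satlins(x):           # Eq. (5.10)
    return min(max(x, -1.0), 1.0)


for f in (tansig, logsig, hardlim, poslin, tribas, radbas, hardlims, satlin, satlins):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)])
print("softmax", [round(v, 3) for v in softmax([1.0, 2.0, 3.0])])
```

Printing each function over a small grid makes the differences in range and saturation visible, which is the property a modeler compares when trialing transfer functions.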
2.3. Multiple linear regression
To evaluate the veracity of the ANN model, multiple linear
regression (MLR), a statistical technique that examines the cause and
effect relationship between objective (y ≡ G) and predictor variables
(x), was employed. MLR is an extension of the simple regression model
to the case of multiple predictors where the goal is to deduce a model to
be able to explain as much as possible the variations in the predictor
dataset to determine their corresponding regression coefficients. MLR
ensures that the predictive model leaves as little variations as possible
due to the unexplained "noise" in the predictor data. For N observations
of k predictor variables, an MLR model has a regression
Fig. 1. ANN architecture adopted for forecasted monthly and seasonal global incident
solar radiation (G).
equation of the form [112,113]:
$Y = C + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$ (6)
where Y (N × 1) is a vector of the objective variable (G), X (N × k) is a
matrix of predictor variable(s), C is the y-intercept and β is the multiple
regression coefficient for each regressor variable(s) [49,114]. Note that
the magnitude of β for each predictor variable is estimated through
least squares (e.g., [115,116]).
For forecasting purposes, the multiple linear equation is fitted to a
model with a set of Y and X matrix in the data training period. Next, the
fitted MLR model, by virtue of its coefficients and the y-intercept, is
used to generate the forecasts of Y values with an additional set of X
values in cross-validation (or testing) period. For more details on MLR
modelling process, readers can refer to the work of Draper and Smith
[113] and Montgomery and Peck [112].
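
The fit-then-forecast procedure can be sketched as follows, using the normal equations and a small Gaussian elimination so that the example needs no external library. In practice a numerical library routine would normally be preferred. The helper names and the toy data are assumptions for illustration.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def fit_mlr(X, y):
    """Least-squares estimate of [C, beta_1, ..., beta_k] in Eq. (6)."""
    Xa = [[1.0] + list(row) for row in X]  # column of ones for the intercept C
    k1 = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(k1)] for i in range(k1)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(k1)]
    return solve_linear(XtX, Xty)


def predict_mlr(coef, X):
    """Apply the fitted intercept and coefficients to new predictor rows."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]


# Toy training data generated from y = 1 + 2*x1 - 3*x2 (assumed for illustration)
X_train = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_train = [1 + 2 * a - 3 * b for a, b in X_train]
coef = fit_mlr(X_train, y_train)
print(coef)
print(predict_mlr(coef, [[3, 2]]))
```

The coefficients are estimated once from the training period and then applied unchanged to the predictor rows of the cross-validation period.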
2.4. Autoregressive integrated moving average model
This study has also adopted the Auto Regressive Integrated Moving
Average (ARIMA) model that operated via a set of (univariate)
predictor data partitioned into an input/target subset to validate the
ANN model. As an ARIMA model uses its own time-lagged information
and the respective model errors, it has an ability to identify the complex
patterns in the original G data, and therefore can be advantageous
when multiple predictor data in other models (e.g., ANN or MLR) are
not available [117].
An ARIMA modelling process is governed by parameters (p, d, q)
with p as the number of autoregressive terms, d as the number of
non-seasonal differences and q as the number of lagged errors. The steps in
developing an ARIMA model involve model identification, estimation
and forecasting. An ARIMA (p, d, q) process is thus defined as [117]:
$\psi_p(B)\,(1 - B)^{d}\,Y_t = \delta + \theta_q(B)\,\varepsilon_t$ (7)
where Yt is the original predictor dataset, εt is the random perturbation
(white noise with zero mean, zero covariance and constant variance),
B is the backshift operator, δ is the constant value, ψp is the
autoregressive parameter of order p, θq is the moving average para-
meter of order q and d is the differencing order used for the regular or
non-seasonal part of the series.
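
The differencing operator $(1 - B)^{d}$ in Eq. (7) can be illustrated with a short sketch. Because B shifts the series back by one step, $(1 - B)Y_t = Y_t - Y_{t-1}$, and applying the operator d times differences the series d times. The function name and the toy series are assumptions for illustration.

```python
def difference(series, d=1):
    """Apply (1 - B)^d to a series, i.e. difference it d times."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Toy series with a quadratic trend (placeholder values)
g = [10.0, 12.0, 15.0, 19.0, 24.0]
print(difference(g, 1))  # first differences still trend upward
print(difference(g, 2))  # second differences are constant
```

Each pass shortens the series by one observation, and the order d is chosen as the smallest value for which the differenced series looks stationary.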
In model identification, differencing parameter (d) should be
finalized by autocorrelation and partial autocorrelation that check their
‘tailing off’ trend to confirm whether a differencing is necessary in case
of non-stationary dataset. p and q terms are identified for ‘trial’ models
by analyzing maximum likelihood estimation (MLE) that determines
the parameters that maximize the probability of obtaining data using
least squares. The maximum log likelihood, that is, the logarithm of the
probability of observed data coming from estimated model is used
when finding parameter estimates. Akaike's Information Criterion
(AIC) is used to establish the ARIMA model, considering the magnitude
of MLE and the variance and correlation coefficients collectively
assessed in training data viz [118]:
$\mathrm{AIC} = -2\log(L) + 2(p + q + k + 1)$ (8)
where L is the likelihood of the data, k = 1 if c ≠ 0 and k = 0 if c = 0. The
last term in brackets is the number of parameters (including the
variance of the residual).
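
Model selection with Eq. (8) can be sketched as follows. The candidate orders and their maximised log-likelihood values are invented placeholders, and the function name `aic` is an assumption; in practice the log-likelihood comes from fitting each trial ARIMA model to the training data.

```python
def aic(log_likelihood, p, q, has_constant):
    """Eq. (8): AIC = -2 log(L) + 2(p + q + k + 1), given log(L) directly."""
    k = 1 if has_constant else 0
    return -2.0 * log_likelihood + 2.0 * (p + q + k + 1)


# Hypothetical trial models: (p, q, has_constant, maximised log-likelihood)
candidates = [
    (1, 0, True, -52.3),
    (1, 1, True, -50.9),
    (2, 1, True, -50.6),
]
scores = {(p, q): aic(ll, p, q, c) for p, q, c, ll in candidates}
best = min(scores, key=scores.get)  # the lowest AIC is preferred
print(scores)
print("selected (p, q):", best)
```

The penalty term 2(p + q + k + 1) means that an extra parameter is only worthwhile if it raises the log-likelihood by more than one unit, which discourages over-parameterised models.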
3. Materials and methods
3.1. Study region and climate data
The study sites are located in the ‘sunshine State’ of Queensland,
Australia's second largest state, covering 1.9 million km² of land and
home to 4.5 million citizens. In this state, there is an urgent need for
mapping the sustainability of solar energy projects [67,119]. A state
level effort shows that solar energy is an integral focus of sustainable
energy projects through Queensland Office of Clean Energy (OEC).
Solar Bonus Scheme, for example, contributed to 30,000 customers
installing rooftop photovoltaic system. Several initiatives require
quantitative assessment of solar radiation (e.g., a package valued at
A$115 million to fund a Virtual Solar Power Station of 500 MW in towns
and communities, the A$60 million Solar Hot Water Rebate Scheme
worth A$600–1000 per solar hot water installations, the A$5.8 million
photovoltaic systems for 420 kindergartens, the A$9.9 million Solar
Sport and Communities initiative to support sports clubs and commu-
nity organizations and the A$35.4 million for Kogan Creek Solar
Thermal systems installation). The A$60 million in Solar and Energy
Efficiency in State Schools program and the State's provision of A$5
million for Federal Government's Solar Cities Program are noteworthy
investments in (non-metropolitan) regions [119]. Thus, the develop-
ment of a solar model especially in remote Queensland, is a justified
research endeavour.
Fig. 2 shows the seven groups of training and cross-validation study
sites (named Group 1 to Group 7) whose data were used for forecasting the
monthly and seasonal global solar radiation (G). As outlined in Section
2.1, the primary predictor dataset for these stations was acquired from
the MODIS Terra sensor of the NASA satellites, while the cross-validation
(test) data were acquired from the Scientific Information for Land
Owners (SILO) archives [120]. For data quality assurance, the Australian
Bureau of Meteorology (BOM), which is the source of the SILO data,
continually assesses the reliability of its data networks to ensure that
an accurate, effective and cost-efficient mechanism is in operation [121].
The SILO database was developed by the Queensland Department of
Environment and Resource Management from observational records, with
missing values interpolated using statistical techniques [122–124].
Table 1 lists the (two) training and (one) cross-validation sites
allocated to each of the seven groups, constructed from 21 sites spread
across regional Queensland (Fig. 2). The LST data were acquired for the
period 2012–2014 and partitioned into the first two years (training) and
the remaining year (cross-validation or testing). In terms of the
agreement between the LST and G values for training and cross-validation,
the percentage difference was ≈0.41% (lowest) to 1.27% (highest) and
≈0.15% (lowest) to 5.41% (highest), respectively (Table 1), when
individual groups were analysed. The difference between the minimum,
maximum, mean and standard deviation of the training (LSTT) and
cross-validation (LSTCV) datasets for all seven groups of stations was
≈0.68%, 0.97%, 6.79% and 4.96%, respectively (Table 2).

[Fig. 2: map of the study region, longitude 140°E–152°E versus latitude 12°S–28°S, marking groups G1–G7.]
Fig. 2. The seven groups of training and cross-validation study sites (Group 1 to Group 7)
for forecasting monthly and seasonal global solar radiation (G). Each group has three
nearby sites: data from two sites (2012–2013) are used for model development and data
from one site (2014) are used for cross-validation.
R.C. Deo, M. Şahin Renewable and Sustainable Energy Reviews 72 (2017) 828–848
Fig. 3(a) plots the monthly cycle of the LST data averaged over the 14
(of 21) study sites used in model development, together with G at ground
level for the seven study sites used for model cross-validation.
Fig. 3(b) shows a scatterplot of LST versus G. The monthly pattern in the
LST data closely follows the trend observed in the respective G values,
with r² = 0.8373. However, when only the LST data for the cross-validation
study sites are compared (Fig. 4a), r² = 0.787 is obtained, whereas the
correlation of the G and LST data for the cross-validation study sites
yields an r² value of ≈0.836 (Fig. 4b). These statistics confirm a high
degree of agreement between the satellite-derived LST and the ground-based
data for the training and cross-validation study sites, where ≈88.7% of
the variance observed in the G data is explained by the variance in the
satellite-derived LST. Although the comparison is grounded on linear
assumptions, it does provide a first-order justification for using LST as
a predictor variable for forecasting solar radiation.
3.2. Predictive model development
To develop a robust ANN-based forecasting model, a cardinal task
was to optimise the architecture of the model to utilize the
cause-and-effect relationships between the inputs and the objective
variable [91]. Unlike previous works (e.g., [7,15,38,49]), where a
predictive model was developed using data partitioned into input/target
subsets for the same site at which the forecasting was performed, in this
paper the ANN and MLR models were trained on all data pooled in a single
matrix (i.e., for all 14 study sites over 2012–2013). The final model
was then applied to simulate the G values independently of this set (over
2014) for the seven cross-validation (test) sites located in close
proximity to the two training sites in each group (see Table 1). The use
of only two years of predictor data allowed an evaluation of the
parsimonious nature of the models, to validate whether they accomplished
a desired accuracy level. Applying a globally trained model rather than
individually trained models for each site can be practically useful in
forecasting G at a site where a satellite footprint may not be available
but the predictive features can be identified from nearby site(s) of
similar climate (Table 1). The use of pooled data to achieve a universal
model, rather than a series of models per site, also ensured a robust
model for spatial application of the approach. The inputs were also
scaled to avoid numerical issues caused by data attributes or large
fluctuations [125], using a normalization between 0 and 1:
$$x_{\mathrm{normalized}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{9}$$
where x = any datum (input or output), xmin = minimum value of the
entire dataset, xmax = maximum value of the entire dataset, and
xnormalized = normalized value of the datum.
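The scaling in Eq. (9) can be sketched in a few lines of Python. This is an illustrative sketch rather than the MATLAB routine used in the study: the function names are hypothetical, and the sample LST values are invented except for the two extremes, which are the training minimum and maximum listed in Table 2.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] with Eq. (9), using the min and max of the whole list."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(x - x_min) / (x_max - x_min) for x in values]


def denormalize(scaled, x_min, x_max):
    """Invert Eq. (9) to recover values in the original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]


# Hypothetical monthly LST values (K); only the extremes come from Table 2.
lst = [291.81, 300.40, 311.75, 322.32]
scaled = min_max_normalize(lst)
print(scaled[0], scaled[-1])  # 0.0 1.0
```

Forecasts produced in the normalized space need the inverse step (`denormalize`) with the same minimum and maximum before they can be compared with observed G.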
Table 1
Geographical characteristics of study sites with monthly mean satellite-derived land-surface temperature (LST) and solar radiation (G) for training (2012–2013) and cross-validation (2014). The distance between training/cross-validation sites in each group is indicated in square brackets. Training values of LST and G are listed once per group.

| Group | Role | Site [distance] | Location | Altitude (m) | LST (K) | G (MJ m−2 day−1) |
|---|---|---|---|---|---|---|
| 1 | Training (2012–2013) | Gayndah Flume TM; ID 39323 [1.3 km] | 151.60°E, 25.62°S | 85.0 | 302.43 | 19.50 |
| 1 | Training (2012–2013) | Gayndah P Office; ID 39039 [0.5 km] | 151.61°E, 25.63°S | 107.5 | | |
| 1 | Cross-validation (2014) | Gayndah; ID 39191 | 151.61°E, 25.63°S | 85.0 | 306.25 | 19.33 |
| 2 | Training (2012–2013) | Emerald Radar AL; ID 35146 [6.9 km] | 148.24°E, 23.55°S | 188.0 | 308.70 | 20.60 |
| 2 | Training (2012–2013) | Emerald Radar; ID 35277 [6.9 km] | 148.24°E, 23.55°S | 187.0 | | |
| 2 | Cross-validation (2014) | Emerald Airport; ID 35264 | 148.18°E, 23.57°S | 189.4 | 311.80 | 20.57 |
| 3 | Training (2012–2013) | Tambo P Office; ID 35069 [0.22 km] | 146.26°E, 24.88°S | 395.4 | 308.60 | 21.02 |
| 3 | Training (2012–2013) | Tambo Station; ID 35072 [2.2 km] | 146.28°E, 24.89°S | 400.0 | | |
| 3 | Cross-validation (2014) | Tambo; ID 35284 | 146.26°E, 24.88°S | 387.0 | 311.74 | 20.75 |
| 4 | Training (2012–2013) | Blackall Airport; ID 36034 [3.1 km] | 145.43°E, 24.43°S | 291.0 | 309.74 | 21.40 |
| 4 | Training (2012–2013) | Blackall Township; ID 36143 [1.1 km] | 145.47°E, 24.42°S | 284.0 | | |
| 4 | Cross-validation (2014) | Blackall; ID 36155 | 145.46°E, 24.43°S | 276.0 | 311.59 | 21.44 |
| 5 | Training (2012–2013) | 27 Mile Garden; ID 44193 [2.6 km] | 146.42°E, 26.08°S | 223.6 | 306.86 | 20.95 |
| 5 | Training (2012–2013) | 27 Mile Garden; ID 44201 [34.1 km] | 146.42°E, 26.08°S | 232.6 | | |
| 5 | Cross-validation (2014) | Barradeen; ID 44204 | 146.42°E, 26.06°S | 339.0 | 308.12 | 20.33 |
| 6 | Training (2012–2013) | Charleville Aero; ID 44021 [2.199 km] | 146.26°E, 26.41°S | 301.6 | 307.05 | 20.92 |
| 6 | Training (2012–2013) | Charleville; ID 44156 [0 km] | 146.24°E, 26.40°S | 287.8 | | |
| 6 | Cross-validation (2014) | Charleville TM; ID 44205 | 146.24°E, 26.40°S | 288.0 | 307.37 | 20.15 |
| 7 | Training (2012–2013) | St George; ID 043053 [2.263 km] | 148.56°E, 28.05°S | 200.0 | 306.70 | 20.51 |
| 7 | Training (2012–2013) | St George TM; ID 043104 [1.545 km] | 148.57°E, 28.07°S | 186.0 | | |
| 7 | Cross-validation (2014) | St George Airport; ID 043109 | 148.59°E, 28.05°S | 198.5 | 304.22 | 19.46 |
Table 2
Descriptive statistics of monthly predictor (LST) and objective (solar radiation, G) variable averaged for all 7 groups.

| Statistical property | LSTT | LSTcv | Gcv |
|---|---|---|---|
| Minimum | 291.81 K | 293.82 K | 12.90 MJ m−2 day−1 |
| Maximum | 322.32 K | 325.47 K | 28.00 MJ m−2 day−1 |
| Mean (a) | 0.503 | 0.471 | 0.486 |
| Standard deviation | 0.275 | 0.262 | 0.285 |
| Skewness | −0.167 | −0.076 | 0.020 |
| Flatness | −0.982 | −0.965 | −1.260 |

(a) denotes normalized property between [0,1] to allow comparisons between LST and Gcv. LSTT: land surface temperature for training, LSTcv: land surface temperature for cross-validation, and Gcv: global solar radiation for cross-validation.
All predictive models were developed in MATLAB using the ‘pooled’
LST data (with latitude, longitude, altitude and month) for the 14
training stations (i.e., Gayndah Flume TM, Gayndah Post Office, Emerald
Radar Al, Emerald Radar, Tambo Post Office, Tambo Station,
Blackall Airport, Blackall Township, 27 Mile Garden, 27 Mile Garden
TM, Charleville Aero, Charleville, St George, St George TM). The
models were cross-validated at the test sites (i.e., Gayndah, Emerald
Airport, Tambo, Blackall, Barradeen, Charleville TM, St George
Airport). For the MLR model development, a similar process was
adopted to acquire the regression coefficients and y-intercept for the
month, altitude, latitude, longitude and LST data. The model was then
applied at the same seven cross-validation sites, so that each
forecasting algorithm yielded a single predictive model trained on the
pooled data (Table 1).
Fig. 3. (a) Monthly cycle of main predictor (satellite-derived land-surface temperature,
LST) averaged for the group of 14 study sites used in model development phase (right
axis) and the measured G at the ground level for 7 cross-validation sites (left axis). (b)
Scatterplot of LST versus G.
Fig. 4. Comparison of the normalized satellite-derived LST with ground-based G in training (T) and cross-validation (CV) sets. (a) LSTcv versus LSTT. (b) Gcv versus LSTCV. Note:
Training data for stations in each group were averaged over each monthly period and then plotted with the respective cross-validation data; all data were normalized within [0,1].
Fig. 5. Cross-correlation coefficient (rcross) between G and predictor variable (LST) in
training period (2012–2013) for 7 groups of stations. (a) Monthly, (b) Seasonal. 95%
confidence interval for rcross is indicated in blue and only positive rcross values are plotted.
Note: Green circles indicate statistically significant lags. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of
this article.)

To determine the inputs for the ANN/MLR models, the cross correlation
between the LST and G data for all 14 training stations was examined
(Fig. 5). Specifically, the cross correlation measures the statistical
similarity between the inputs (x) and a shifted (lagged) copy of the
output (G) as a function of the lag. For any discrete signal, the
correlation between x = (x1, x2, …, xM – 1) and y = (y1, y2, …, yN – 1) is:
$$\phi_{xy,k} = \sum_{j=\max(0,k)}^{\min(M-1+k,\,N-1)} x_{j-k}\, y_{j}, \qquad k = -(M+1), \ldots, 0, \ldots, (N-1) \tag{10}$$
where the cross correlation coefficient, rcross, is:
$$r_{\mathrm{cross}}(t) = \frac{\phi_{xy}(t)}{\sqrt{\phi_{xx}(0)\,\phi_{yy}(0)}} \tag{11}$$
rcross(t) is bounded by [−1, 1], with 1 indicating that both time series
have exactly the same shape (although amplitudes can vary), while −1
indicates that both time series have the same shape but opposite signs.
M and N are the lengths of the predictor and predictand data,
respectively. When rcross(t) = 0, both signals are uncorrelated, but if
rcross(t) ≥ 0.70, a good match between the two signals can be
identified.
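The lagged cross correlation of Eqs. (10)–(11) can be illustrated with the following Python sketch. It is not the routine used in the study: the function names are hypothetical, the series are synthetic, and the means are removed before normalizing (an assumption that is not stated in the equations but keeps the coefficient within [−1, 1] for series with a large offset such as LST in kelvin).

```python
import math


def cross_corr(x, y, lag):
    """phi_xy at `lag`: sum of x[j - lag] * y[j] over the indices valid for both series."""
    start = max(0, lag)
    stop = min(len(x) - 1 + lag, len(y) - 1)
    return sum(x[j - lag] * y[j] for j in range(start, stop + 1))


def r_cross(x, y, lag):
    """Normalized cross correlation (Eq. (11)) of the mean-removed series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    denom = math.sqrt(cross_corr(xc, xc, 0) * cross_corr(yc, yc, 0))
    return cross_corr(xc, yc, lag) / denom


# Synthetic example: two years of a 12-month cycle (hypothetical predictor)
# and a target with the same shape but a different amplitude and offset.
x = [math.sin(2 * math.pi * t / 12) for t in range(24)]
y = [3.0 * v + 20.0 for v in x]
print(round(r_cross(x, y, 0), 3))  # 1.0
```

A positive `lag` pairs the predictor at month t − lag with the target at month t, which is the pairing needed when LST(t – 1), LST(t – 2), … are screened as candidate inputs.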
Evidently, the LST versus G data at lags of zero, 1 and 2 were
statistically significant at the 95% confidence level for the monthly
horizon (Fig. 5a), and the LST versus G data at zero lag were
statistically significant for the seasonal forecasting horizon (Fig. 5b).
Subsequently, for the monthly forecasting horizon, the ANN model
was developed using LST(t), month, latitude, longitude and
altitude with the lagged combinations LST(t – 1), LST(t – 2), LST(t – 3)
and LST(t – 4). Although a maximum lag of 2 was significant, predictors
at two additional lags (LST(t – 3) and LST(t – 4)) were also considered
to check whether the forecasting accuracy improved (Table 3a–b).
For seasonal forecasting, both models were developed using the LST(t)
data and the respective supplementary variables (Table 3c). A total of
45 ANN models with the LST(t) predictor and seven models with the
LST(t – n) predictors (where n varied from 1 to 4 for different lags)
were constructed for monthly forecasting (Table 3a–b), and nine models
for seasonal forecasting (Table 3c). All models were optimized by testing
the training algorithms and hidden transfer functions (Eqs. (5.1–5.10))
in different combinations, as detailed in Table 3.
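To make the lagged input combinations concrete, the sketch below assembles one input row per month from a single site's LST series. It is only an illustration: the function name, the column order and the sample values are hypothetical, and the latitude is written as a negative number for the southern hemisphere by assumption.

```python
def lagged_rows(lst, months, lat, lon, alt, n_lags):
    """Build rows [month, lat, lon, alt, LST(t), LST(t-1), ..., LST(t-n_lags)].

    The first n_lags months are skipped because their lagged values are unavailable.
    """
    rows = []
    for t in range(n_lags, len(lst)):
        lags = [lst[t - k] for k in range(n_lags + 1)]  # k = 0 is LST(t)
        rows.append([months[t], lat, lon, alt] + lags)
    return rows


# Hypothetical five-month LST record (K) for one site.
lst = [300.1, 302.4, 305.0, 308.3, 310.2]
months = [1, 2, 3, 4, 5]
rows = lagged_rows(lst, months, lat=-25.62, lon=151.60, alt=85.0, n_lags=2)
print(rows[0])  # [3, -25.62, 151.6, 85.0, 305.0, 302.4, 300.1]
```

Each additional lag adds one input column and removes one usable month from the start of the record, which matters when only two years of training data are available.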
The ANN model had a three-layer network designed with an input layer
(where the predictors were fed), a learning layer (where the transfer
function was applied to extract features and formulate a model with the
lowest mean square error) and an output space (with the forecasted G)
(Table 3). The number of input neurons (x) was 8 (denoted as x1, x2,
x3, …, x8) for the monthly and 5 for the seasonal forecasting horizon,
where 1 neuron was assigned for the month (for solar periodicity),
between 5 and 8 neurons for the LST and the remaining 3 neurons for the
latitude, longitude and altitude as the input variables. In data-driven
models, there is no set rule to attest the best transfer function, as the
predictive features are not known a priori. Following an iterative
modelling process, we trialed several transfer functions (Eqs.
(5.1–5.10)) and learning algorithms to optimise the ANN model's accuracy
(Table 3a–c).
In each trial, the number of neurons in the hidden layer was increased
at an interval of one, and the model was adjusted accordingly. The
nearly optimal number of neurons with the relevant transfer function,
determined by the mean square error, which was intrinsically different
for each combination, was adopted. As an additional measure, the
correlation coefficient (r) of the trained models was noted to verify the
model. In accordance with the lowest RMSE (≈1.577 MJ m−2 day−1) and a
high r value (≈0.941), Model T3 with n-lagged combinations of the LST
data, executed with the Levenberg–Marquardt (LM) algorithm, a
logarithmic sigmoid transfer function (Eq. (5.2)) and a linear output
function with an architecture of (n+4)–14–1, was adopted for the monthly
forecasting horizon (Table 3b). For the seasonal forecasting horizon, the
LM training algorithm with logarithmic sigmoid and linear functions and
an architecture of 5–10–1 was established (Table 3c). For the MLR model,
the regression coefficients and the y-intercept for the LST, month,
latitude, longitude and altitude were also determined, as detailed in
Table 4.

Table 3
Development of ANN model.

(a) Monthly forecasting with LST at zero lag (n=0)

| Trial model | Algorithm | Transfer function | Model architecture (Input–Hidden–Output) | Correlation coefficient (r) | RMSE (MJ m−2 day−1) |
|---|---|---|---|---|---|
| T1 | trainlm | logsig | 5–18–1 | 0.946 | 1.552 |
| T2 | trainlm | logsig | 5–24–1 | 0.942 | 1.530 |
| T3 | trainlm | logsig | 5–46–1 | 0.932 | 1.709 |
| T4 | trainlm | tansig | 5–8–1 | 0.947 | 1.583 |
| T5 | trainlm | tansig | 5–26–1 | 0.948 | 1.611 |
| T6 | trainlm | tansig | 5–28–1 | 0.948 | 1.575 |
| T7 | trainlm | tansig | 5–38–1 | 0.945 | 1.538 |
| T8 | trainlm | tansig | 5–50–1 | 0.940 | 1.634 |
| T9 | trainbfg | tansig | 5–2–1 | 0.953 | 1.584 |
| T10 | trainbfg | tansig | 5–8–1 | 0.956 | 1.539 |
| T11 | trainbfg | tansig | 5–30–1 | 0.956 | 1.640 |
| T12 | trainbfg | tansig | 5–48–1 | 0.954 | 1.641 |
| T13 | trainbfg | logsig | 5–12–1 | 0.950 | 1.634 |
| T14 | trainbfg | logsig | 5–46–1 | 0.952 | 1.634 |
| T15 | trainrp | logsig | 5–42–1 | 0.831 | 2.503 |
| T16 | trainrp | tansig | 5–28–1 | 0.778 | 2.694 |
| T17 | trainscg | tansig | 5–2–1 | 0.955 | 1.648 |
| T18 | trainscg | tansig | 5–18–1 | 0.940 | 1.674 |
| T19 | trainscg | tansig | 5–30–1 | 0.949 | 1.601 |
| T20 | trainscg | logsig | 5–14–1 | 0.955 | 1.674 |
| T21 | trainscg | logsig | 5–48–1 | 0.943 | 1.795 |
| T22 | traincgb | logsig | 5–6–1 | 0.946 | 1.735 |
| T23 | traincgb | logsig | 5–38–1 | 0.944 | 1.673 |
| T24 | traincgb | tansig | 5–4–1 | 0.951 | 1.623 |
| T25 | traincgb | tansig | 5–40–1 | 0.947 | 1.695 |
| T26 | traincgf | tansig | 5–6–1 | 0.952 | 1.555 |
| T27 | traincgf | tansig | 5–28–1 | 0.954 | 1.603 |
| T28 | traincgf | logsig | 5–18–1 | 0.944 | 1.653 |
| T29 | traincgf | logsig | 5–42–1 | 0.939 | 1.768 |
| T30 | traincgp | logsig | 5–2–1 | 0.954 | 1.508 |
| T31 | traincgp | logsig | 5–32–1 | 0.942 | 1.737 |
| T32 | traincgp | tansig | 5–6–1 | 0.950 | 1.615 |
| T33 | traincgp | tansig | 5–28–1 | 0.946 | 1.674 |
| T34 | traincgp | tansig | 5–42–1 | 0.936 | 1.679 |
| T35 | trainoss | tansig | 5–14–1 | 0.953 | 1.658 |
| T36 | trainoss | tansig | 5–18–1 | 0.946 | 1.634 |
| T37 | trainoss | tansig | 5–46–1 | 0.951 | 1.603 |
| T38 | trainoss | logsig | 5–4–1 | 0.944 | 1.682 |
| T39 | trainoss | logsig | 5–26–1 | 0.946 | 1.673 |
| T40 | traingdx | logsig | 5–12–1 | 0.931 | 1.860 |
| T41 | traingdx | logsig | 5–36–1 | 0.926 | 1.769 |
| T42 | traingdx | logsig | 5–48–1 | 0.948 | 1.699 |
| T43 | traingdx | tansig | 5–4–1 | 0.947 | 1.741 |
| T44 | traingdx | tansig | 5–6–1 | 0.941 | 1.688 |
| T45 | traingdx | tansig | 5–50–1 | 0.920 | 1.856 |

(b) Monthly forecasting with n-lagged LST combinations (n = 1, 2, 3, 4). Note that the '4' in (n + 4) represents the mandatory inputs defined by the altitude, latitude, longitude and the month (Fig. 1).

| Trial model | Algorithm | Transfer function | Model architecture | r | RMSE |
|---|---|---|---|---|---|
| T1 | trainlm | tansig | (n+4)–2–1 | 0.931 | 1.683 |
| T2 | trainlm | tansig | (n+4)–6–1 | 0.946 | 1.589 |
| T3 | trainlm | logsig | (n+4)–14–1 | 0.941 | 1.577 |
| T4 | trainscg | tansig | (n+4)–8–1 | 0.942 | 1.608 |
| T5 | traincgb | logsig | (n+4)–10–1 | 0.924 | 1.879 |
| T6 | trainoss | tansig | (n+4)–16–1 | 0.933 | 1.753 |
| T7 | trainoss | tansig | (n+4)–18–1 | 0.920 | 2.000 |

(c) Seasonal forecasting with LST at zero lag (n=0).

| Trial model | Algorithm | Transfer function | Model architecture | r | RMSE |
|---|---|---|---|---|---|
| T1 | trainlm | tansig | 5–2–1 | 0.985 | 1.191 |
| T2 | trainlm | tansig | 5–4–1 | 0.969 | 1.174 |
| T3 | trainlm | tansig | 5–10–1 | 0.981 | 1.040 |
| T4 | trainlm | tansig | 5–26–1 | 0.970 | 1.189 |
| T5 | trainlm | tansig | 5–38–1 | 0.969 | 1.289 |
| T6 | trainlm | logsig | 5–4–1 | 0.973 | 1.132 |
| T7 | trainlm | logsig | 5–8–1 | 0.969 | 1.162 |
| T8 | trainlm | logsig | 5–10–1 | 0.975 | 1.037 |
| T9 | trainlm | logsig | 5–42–1 | 0.964 | 1.166 |

Note: Optimum model is boldfaced (red); the lagged data applied for monthly
forecasting are statistically significant based on cross correlation coefficients between
the predictor (LST) and objective variable (G), while the periodicity (month), latitude,
longitude and altitude are also used as supplementary predictors in each model.
Training algorithms: trainbfg = BFGS quasi-Newton, trainrp = resilient
backpropagation, trainscg = scaled conjugate gradient, trainlm =
Levenberg–Marquardt, traincgb = conjugate gradient BP with Powell-Beale restarts,
traincgf = conjugate gradient BP with Fletcher-Reeves update, traincgp = conjugate
gradient BP with Polak-Ribière update, trainoss = one-step secant, trainbr = Bayesian
regularization, traingdx = gradient descent with momentum and adaptive learning.
Hidden transfer functions: tansig = tangent sigmoid, logsig = logarithmic sigmoid.
An ARIMA model was developed using the ‘R’ software, with its
architecture established by the iterative modelling process [91,117]
detailed in Section 2.4. Table 5 displays the ARIMA model's architecture
and the respective goodness-of-fit tests performed to construct the
forecasting model. The ground-based G data for the two study sites in
each of the seven groups over 2012–2013 were used in model development.
The first step was to examine the stationarity of G via the
autocorrelation function (ACF) and partial autocorrelation function
(PACF). Evidently, the monthly solar radiation for all seven groups was
non-stationary. The ARIMA model requires the input data to have a
constant mean, variance and autocorrelation through time [91,117],
so the data were differenced to make them stationary, which was
confirmed by running the auto.arima() function in R to yield the lowest
standard deviation/AIC and the highest likelihood (Eq. (8)). By
contrast, the seasonal data, which were already stationary, did not
require any differencing. The number of non-seasonal differences
(d) was set to 2 (monthly) and 0 (seasonal). Following this, the
autoregressive (p) and moving average (q) terms for each
cross-validation site were estimated by trialling a set of feasible
values to satisfy the training performance criteria according to the log
likelihood/AIC in conjunction with the lowest variance and highest
correlation coefficient. The model was cross-validated by generating the
monthly (12 values) and seasonal (4 values) forecasts. This required the
trained model to be applied to the 2014 ‘test’ data using the predict()
function. The training performance of ARIMA was unique for
different sites, based on the goodness-of-fit and the statistical test
parameters.
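The role of the differencing order d can be illustrated with a minimal Python sketch (the study itself used R; the function name and the synthetic series below are hypothetical). Each pass replaces the series by the change between consecutive values, so a series with a curved trend needs d = 2 before its level stops drifting.

```python
def difference(series, d=1):
    """Apply d rounds of non-seasonal (lag-1) differencing to a list of numbers."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Synthetic series with a quadratic trend: 0, 1, 4, 9, 16, 25.
trend = [t * t for t in range(6)]
print(difference(trend, d=1))  # [1, 3, 5, 7, 9]  (still trending)
print(difference(trend, d=2))  # [2, 2, 2, 2]     (constant level)
```

Each round shortens the series by one value, which is worth keeping in mind when only 24 monthly training values are available per site.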
3.3. Model performance criteria
Following a review of model evaluation metrics by the American
Society of Civil Engineers (ASCE) [126], two categories of model
evaluation measures have surfaced: visual and descriptive statistics
(i.e., comparing observed and forecasted data to cross-check the
difference in minimum, maximum, mean, variance, standard deviation,
skewness and kurtosis) and standardized metrics applied to forecasts that
are validated with observed data. To establish whether the data-driven
models were qualified for forecasting global solar radiation, a range of
statistical error criteria was adopted. They were based on the root mean
square error (RMSE), mean absolute error (MAE), correlation coefficient
(r), Willmott's Index (WI) and relative (%) error values based on
the RMSE and MAE, with their mathematical equations written as
follows [15,127,128]:
I. Correlation coefficient (r) expressed as
$$r = \frac{\sum_{i=1}^{N} (G_{OBS,i} - \bar{G}_{OBS})(G_{FOR,i} - \bar{G}_{FOR})}{\sqrt{\sum_{i=1}^{N} (G_{OBS,i} - \bar{G}_{OBS})^{2}}\,\sqrt{\sum_{i=1}^{N} (G_{FOR,i} - \bar{G}_{FOR})^{2}}} \tag{12}$$
II. Root mean square error (RMSE) expressed as
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (G_{FOR,i} - G_{OBS,i})^{2}} \tag{13}$$
III. Mean absolute error (MAE) expressed as
$$MAE = \frac{1}{N}\sum_{i=1}^{N} \left| G_{FOR,i} - G_{OBS,i} \right| \tag{14}$$
IV. Willmott's Index (WI) expressed as
$$WI = 1 - \left[ \frac{\sum_{i=1}^{N} (G_{FOR,i} - G_{OBS,i})^{2}}{\sum_{i=1}^{N} \left( \left| G_{FOR,i} - \bar{G}_{OBS} \right| + \left| G_{OBS,i} - \bar{G}_{OBS} \right| \right)^{2}} \right], \qquad 0 \le WI \le 1 \tag{15}$$
Table 4
Development of MLR model for monthly and seasonal forecasting. For LST as predictor, lag (n) varies from 0 to 4.

| Model | Predictor variable (input), x | Coefficient | Monthly forecasting (magnitude) | Seasonal forecasting (magnitude) |
|---|---|---|---|---|
| M1 | Periodicity (Month) | β1 | 0.002 | −0.191 |
| M1 | Altitude | β2 | 0.003 | 0.003 |
| M1 | Latitude | β3 | 0.332 | 0.326 |
| M1 | Longitude | β4 | 0.220 | 0.205 |
| M1 | Y-Intercept | C | −198.2 | −194.0 |
| M1 | LST (t) | β5 | 0.532 | 0.524 |
| M2 | Month | β1 | −0.041 | |
| M2 | Altitude | β2 | 0.003 | |
| M2 | Latitude | β3 | 0.331 | |
| M2 | Longitude | β4 | 0.220 | |
| M2 | Y-Intercept | C | −196.9 | |
| M2 | LST (t) | β5 | 0.572 | |
| M2 | LST (t – 1) | β6 | −0.042 | |
| M3 | Month | β1 | −0.024 | |
| M3 | Altitude | β2 | 0.003 | |
| M3 | Latitude | β3 | 0.307 | |
| M3 | Longitude | β4 | 0.228 | |
| M3 | Y-Intercept | C | −195.6 | |
| M3 | LST (t) | β5 | 0.585 | |
| M3 | LST (t – 1) | β6 | −0.074 | |
| M3 | LST (t – 2) | β7 | 0.024 | |
| M4 | Month | β1 | −0.102 | |
| M4 | Altitude | β2 | 0.003 | |
| M4 | Latitude | β3 | 0.245 | |
| M4 | Longitude | β4 | 0.218 | |
| M4 | Y-Intercept | C | −171.4 | |
| M4 | LST (t) | β5 | 0.584 | |
| M4 | LST (t – 1) | β6 | −0.121 | |
| M4 | LST (t – 2) | β7 | 0.136 | |
| M4 | LST (t – 3) | β8 | −0.110 | |
| M5 | Month | β1 | −0.257 | |
| M5 | Altitude | β2 | 0.002 | |
| M5 | Latitude | β3 | 0.054 | |
| M5 | Longitude | β4 | 0.187 | |
| M5 | Y-Intercept | C | −95.69 | |
| M5 | LST (t) | β5 | 0.562 | |
| M5 | LST (t – 1) | β6 | −0.152 | |
| M5 | LST (t – 2) | β7 | 0.070 | |
| M5 | LST (t – 3) | β8 | 0.107 | |
| M5 | LST (t – 4) | β9 | −0.245 | |

V. Relative root mean square error (RRMSE, %) expressed as
$$RRMSE = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N} (G_{FOR,i} - G_{OBS,i})^{2}}}{\frac{1}{N}\sum_{i=1}^{N} G_{OBS,i}} \times 100 \tag{16}$$
VI. Mean absolute percentage error (MAPE; %), expressed as
$$MAPE = \frac{1}{N}\sum_{i=1}^{N} \left| \frac{G_{FOR,i} - G_{OBS,i}}{G_{OBS,i}} \right| \times 100 \tag{17}$$
where G_{OBS,i} and G_{FOR,i} are the observed and forecasted ith value of G,
\bar{G}_{OBS} and \bar{G}_{FOR} are the observed and forecasted mean G in the
cross-validation (test) set and N is the number of datum points in the test set.
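The six metrics in Eqs. (12)–(17) can be collected in one small Python function, sketched below for illustration. The function name, the dictionary keys and the sample values are hypothetical, and the MAE and MAPE are computed on absolute deviations.

```python
import math


def forecast_metrics(obs, forecast):
    """Return r, RMSE, MAE, WI, RRMSE (%) and MAPE (%) for paired observed/forecasted G."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_for = sum(forecast) / n
    err = [f - o for o, f in zip(obs, forecast)]

    rmse = math.sqrt(sum(e * e for e in err) / n)
    mae = sum(abs(e) for e in err) / n
    cov = sum((o - mean_obs) * (f - mean_for) for o, f in zip(obs, forecast))
    spread = math.sqrt(sum((o - mean_obs) ** 2 for o in obs)
                       * sum((f - mean_for) ** 2 for f in forecast))
    wi_den = sum((abs(f - mean_obs) + abs(o - mean_obs)) ** 2
                 for o, f in zip(obs, forecast))
    return {
        "r": cov / spread,
        "RMSE": rmse,
        "MAE": mae,
        "WI": 1.0 - sum(e * e for e in err) / wi_den,
        "RRMSE": 100.0 * rmse / mean_obs,
        "MAPE": 100.0 / n * sum(abs(e / o) for e, o in zip(err, obs)),
    }


# Hypothetical monthly G values (MJ m-2 day-1) with a constant +1 forecast bias.
observed = [12.0, 18.0, 24.0, 20.0]
forecasted = [o + 1.0 for o in observed]
m = forecast_metrics(observed, forecasted)
print(round(m["RMSE"], 3), round(m["MAE"], 3), round(m["RRMSE"], 2))
```

In this example the constant bias leaves r at 1 while the RMSE and MAE both report the 1 MJ m−2 day−1 offset, which illustrates why r alone is not a sufficient measure of forecast skill.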
In terms of physical reasoning for the performance metrics, the
correlation coefficient, bounded by [0,1] where 0 = relatively poor and
1.0 = perfect model, describes the proportion of variance in observed
solar radiation that is explained by the ANN, MLR and ARIMA models [128].
The equation for r, however, is based on a linear relationship between
GOBS and GFOR and is therefore limited in its capacity to provide a
robust assessment, since it standardizes the observed and forecasted
means and variances. By contrast, the RMSE and MAE provide better
information about predictive skill: the RMSE measures the goodness-of-fit
relevant to high values, whereas the MAE is not weighted towards
high(er) or low(er) magnitude events but instead evaluates all deviations
from the observed values equally and regardless of sign. It is important
to note that, while the RMSE can assess the model with a higher level of
skill than the correlation coefficient, this metric is computed on
squared differences. Thus, performance assessment is biased in favour of
the peaks and high(er) magnitude events, which will in most cases exhibit
the greatest error, and is insensitive to low(er) magnitude sequences
[128]. Consequently, the RMSE can be more sensitive than other
performance metrics to occasional large errors, as the squaring process
gives disproportionate weight to very large errors [129]. To overcome
this issue, Willmott's Index (WI) [130] was computed by considering the
ratio of the mean square error instead of the square of the differences
[131], providing an advantage over the r, RMSE and MAE values.
Considering the geographic differences between the present study sites
(Table 1; Fig. 2), which can lead to differences in the distribution of
solar radiation data, the relative root mean square error (RRMSE) was
also computed [15,132] to compare and evaluate the model over
geographically diverse study sites. According to [133], a model's
precision level is excellent if RRMSE < 10%, good if 10% < RRMSE < 20%,
fair if 20% < RRMSE < 30% and poor if RRMSE > 30%.
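The RRMSE bands quoted from [133] translate directly into a small lookup, sketched here in Python. The function name is hypothetical, and because the text leaves the exact boundary values (10%, 20% and 30%) unassigned, the sketch assumes they fall into the less favourable band.

```python
def rrmse_category(rrmse_percent):
    """Map an RRMSE value (%) to the precision bands of [133].

    Assumption: boundary values (exactly 10, 20 or 30) are placed in the worse band.
    """
    if rrmse_percent < 10:
        return "excellent"
    if rrmse_percent < 20:
        return "good"
    if rrmse_percent < 30:
        return "fair"
    return "poor"


for value in (5.4, 15.0, 25.0, 35.0):
    print(value, rrmse_category(value))
```

Because the RRMSE is scaled by the mean observed G, the same band thresholds can be applied to sites whose radiation levels differ.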
4. Results
In this paper, the results attained for appraising an ANN model
coupled with satellite-derived LST data for forecasting monthly and
seasonal solar radiation over a group of seven cross-validation sites in
regional Queensland are presented. The ANN model is evaluated with
respect to the MLR and ARIMA models. To establish whether the ANN
was a parsimonious model that accomplishes a desired level of accuracy
with as few predictor variables as possible, an iterative modelling
process was applied to optimise the model's parameters using input
combinations, training algorithms and hidden transfer functions, where
the lowest mean square error for the optimum model was sought (Section
3.2). In this section, we first provide the results for monthly
forecasting and then proceed with the seasonal forecasting evaluation,
based on the statistical performance metrics described in Eqs.
(12–17).
In order to compare the forecasted and observed monthly G directly,
Fig. 6(a–g) plots the model's error, GFOR – GOBS, for each tested
month in the cross-validation (2014) period. A scatterplot showing the
goodness-of-fit line and its correlation coefficient (r), depicting the
extent of agreement between GFOR and GOBS, is included. There is
compelling evidence that the ANN outperforms the MLR and ARIMA
models for all tested months. On a month-by-month basis, the
forecasting error exhibits a much larger magnitude for the MLR model,
especially for the forecasts generated in August and September. Except
for Gayndah (Fig. 6a), the ANN model does not result in significantly
large errors relative to its counterparts for the other six groups. The
ARIMA model generates more accurate forecasts of solar radiation than
the MLR model, although its performance remains inferior to the ANN for
all tested months. Therefore, it is evident that the ANN model can
simulate G values with good accuracy, which is also verified by the
higher level of agreement (i.e., r value).

Table 5
ARIMA with structure (p, d, q) with d = differencing, p and q = order of autoregressive (AR) and moving average (MA) term. Parameters are listed in the column order AR1, AR2, AR3, MA1, MA2, MA3, with empty cells omitted.

| Horizon | Group | Training station | ARIMA structure (p, d, q) | AIC | Log likelihood | Variance | Correlation coefficient (r) | Parameters |
|---|---|---|---|---|---|---|---|---|
| Monthly | 1 | Gayndah Post Office / Gayndah Flume TM | (3, 1, 3) | 112.96 | −49.48 | 2.905 | 0.878 | 0.727, 0.681, −0.918, −1.193, −0.513, 0.770 |
| Monthly | 2 | Emerald Radar / Emerald Radar AI | (3, 2, 3) | 111.13 | −49.56 | 2.575 | 0.893 | 1.218, −0.108, −0.505, −2.874, 2.874, −1.000 |
| Monthly | 3 | Tambo Post Office / Tambo Station | (2, 2, 3) | 95.26 | −41.63 | 1.441 | 0.947 | 1.724, −0.990, −2.873, 2.873, −1.000 |
| Monthly | 4 | Blackall Airport / Blackall Township | (3, 2, 2) | 106.4 | −47.2 | 2.843 | 0.897 | 0.782, 0.553, −0.828, −1.957, 1.000 |
| Monthly | 5 | 27 Mile Garden / 27 Mile Garden TM | (3, 2, 3) | 108.55 | −47.27 | 2.464 | 0.921 | 1.373, −0.393, −0.351, −2.781, 2.778, −0.996 |
| Monthly | 6 | Charleville Aero / Charleville | (2, 2, 3) | 110.78 | −49.39 | 3.133 | 0.899 | 1.727, −0.981, −2.871, 2.871, −1.000 |
| Monthly | 7 | St George Airport / St George TM | (2, 2, 3) | 107.00 | −47.5 | 2.2546 | 0.929 | 1.727, −0.984, −2.880, 2.880, −1.000 |
| Seasonal | 1 | As above | (2, 0, 2) | 32.70 | −10.4 | 0.111 | 0.995 | 0.027, −0.988, −1.962, 0.999 |
| Seasonal | 2 | As above | (3, 0, 3) | 13.60 | 1.20 | 0.008 | 0.999 | −0.886, −0.980, −0.908, −1.516, 0.799, 0.091 |
| Seasonal | 3 | As above | (3, 0, 2) | −23.10 | 18.50 | 0.000 | 0.999 | −0.996, −0.997, −1.808, 0.998 |
| Seasonal | 4 | As above | (3, 0, 2) | 21.90 | −4.00 | 0.009 | 0.998 | −0.734, −0.915, −0.808, −1.808, 1.000 |
| Seasonal | 5 | As above | (3, 0, 3) | 18.20 | −1.10 | 0.018 | 0.997 | −0.916, −0.945, −0.973, −1.181, −0.460, 0.725 |
| Seasonal | 6 | As above | (3, 0, 3) | 10.80 | 2.60 | 0.006 | 0.995 | −0.952, −0.956, −1.074, −0.561, 0.822 |
| Seasonal | 7 | As above | (3, 0, 3) | 36.70 | −10.40 | 0.102 | 0.998 | −0.739, −0.953, −0.786, −1.337, 0.017, 0.526 |

Note: Akaike's Information Criterion (AIC) used to identify model in conjunction with log likelihood, variance and correlation coefficient.

Fig. 6. (Left) Time series of ANN, MLR and ARIMA model monthly forecasting errors and (right) the scatterplot of forecasted and observed monthly G. (a) Group 1 Gayndah, (b)
Group 2 Emerald Airport, (c) Group 3 Tambo, (d) Group 4 Blackall, (e) Group 5 Barradeen, (f) Group 6 Charleville, (g) Group 7 St George Airport. For each scatterplot, the least-squares
fitting line and its respective correlation coefficient are shown. Cumulative frequency of ANN, MLR and ARIMA model monthly forecasting errors in G for the seven groups of stations
pooled together. Note that the percentage for each bracket is shown in the respective error bracket.
In Table 6(a) we evaluate the preciseness of the ANN model in
relation to the MLR model, where the results for a set of five trial
models over the 2014 period with different combinations of predictor
variables are shown. The number of inputs in each trial model is
increased by one to yield a total of five unique models, and
performances are validated using r, RMSE and MAE. For all study sites
considered, the accuracy of the ANN model appears to have generally
improved as the number of predictor variables with lagged input
combinations of the LST data fed into the algorithm increases. This can
be verified by the notable increase in the correlation coefficient
computed between monthly observed and forecasted G and a corresponding
decrease in the model's generalization error. However, for the ANN
model, the improvement in forecasting accuracy appears to reach an
asymptotic state, with no further increase in r or reduction in
RMSE/MAE values when additional lagged LST inputs are included after the
third lag.
In the case of the optimal ANN model (i.e., M4) applied at the
Gayndah station, there is a gradual reduction in MAE (≈1.54 to 0.82
MJ m−2 day−1) and an increase in r (≈0.961 to 0.962) for predictors
defined by LST(t), LST(t – 1), LST(t – 2) and LST(t – 3) relative to the
single input, LST(t), as the predictor. A similar trend is also evident
for the Barradeen, Charleville TM and St George stations, where a total
of four predictor variables based on the original and lagged LST data are
required to attain accurate forecasts. By contrast, when the ANN
model is cross-validated for the Emerald Airport and Blackall stations,
only three predictor variables, and that for the case of Tambo site, a total of
Table 6
Evaluation of monthly forecasting models using correlation coefficient (r), root mean square error (RMSE; MJ m−2 day−1) and mean absolute error (MAE; MJ m−2 day−1). Note that the best model is boldfaced (red).
(a) ANN and MLR
Model  Input combinations  ANN (r  RMSE  MAE)  MLR (r  RMSE  MAE)
Group1: Gayndah
M1 LST (t) 0.961 1.73 1.54 0.957 2.07 1.80
M2 LST (t) + LST (t – 1) 0.939 1.88 1.28 0.806 3.21 2.58
M3 LST (t) + LST (t – 1) + LST (t – 2) 0.949 1.55 1.18 0.801 3.17 2.50
M4 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) 0.962 1.13 0.82 0.791 3.16 2.48
M5 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) + LST (t – 4) 0.964 1.38 1.12 0.790 3.02 2.53
Group2: Emerald Airport
M1 LST (t) 0.968 1.28 1.05 0.945 2.53 2.32
M2 LST (t) + LST (t – 1) 0.968 1.06 0.81 0.802 3.52 2.94
M3 LST (t) + LST (t – 1) + LST (t – 2) 0.965 1.04 0.78 0.793 3.55 2.95
M4 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) 0.943 1.42 1.05 0.761 3.59 3.04
M5 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) + LST (t – 4) 0.969 1.11 0.82 0.799 3.00 2.60
Group 3: Tambo
M1 LST (t) 0.958 1.35 1.11 0.948 2.65 2.26
M2 LST (t) + LST (t – 1) 0.926 1.70 1.37 0.868 3.10 2.51
M3 LST (t) + LST (t – 1) + LST (t – 2) 0.939 1.69 1.37 0.856 3.20 2.60
M4 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) 0.900 1.85 1.35 0.814 3.33 2.76
M5 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) + LST (t – 4) 0.958 1.39 1.00 0.823 2.99 2.61
Group 4: Blackall
M1 LST (t) 0.968 1.13 0.91 0.965 1.57 1.18
M2 LST (t) + LST (t – 1) 0.959 1.18 1.01 0.836 2.46 1.92
M3 LST (t) + LST (t – 1) + LST (t – 2) 0.975 0.93 0.62 0.828 2.53 2.01
M4 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) 0.907 1.80 1.44 0.787 2.76 2.21
M5 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) + LST (t – 4) 0.969 1.16 0.95 0.787 2.760 2.12
Group 5: Barradeen
M1 LST (t) 0.949 1.53 1.24 0.938 1.78 1.371
M2 LST (t) + LST (t – 1) 0.942 1.77 1.33 0.881 2.29 1.839
M3 LST (t) + LST (t – 1) + LST (t – 2) 0.944 1.82 1.58 0.874 2.35 1.887
M4 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) 0.953 1.31 0.98 0.853 2.50 2.014
M5 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) + LST (t – 4) 0.954 1.36 1.06 0.847 2.49 2.016
Group 6: Charleville TM
M1 LST (t) 0.944 1.72 1.40 0.919 2.05 1.69
M2 LST (t) + LST (t – 1) 0.940 1.83 1.43 0.814 2.78 2.31
M3 LST (t) + LST (t – 1) + LST (t – 2) 0.938 1.66 1.41 0.806 2.83 2.35
M4 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) 0.961 1.42 0.95 0.773 3.03 2.46
M5 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) + LST (t – 4) 0.953 1.45 0.99 0.811 2.80 2.37
Group 7: St George Airport
M1 LST (t) 0.944 1.66 1.36 0.926 0.65 0.92
M2 LST (t) + LST (t – 1) 0.963 1.36 1.12 0.880 1.66 0.923
M3 LST (t) + LST (t – 1) + LST (t – 2) 0.960 1.44 1.20 0.922 1.70 1.32
M4 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) 0.967 1.38 0.96 0.895 1.98 1.56
M5 LST (t) + LST (t – 1) + LST (t – 2) + LST (t – 3) + LST (t – 4) 0.952 1.16 1.01 0.899 2.01 1.52
Overall Average 0.963 1.23 1.02 0.943 1.90 1.65
(b) ARIMA
Cross-validation stations r RMSE MAE
Group1: Gayndah 0.937 1.56 1.35
Group2: Emerald Airport 0.961 1.13 0.87
Group 3: Tambo 0.945 1.57 1.35
Group 4: Blackall 0.823 2.81 2.29
Group 5: Barradeen 0.926 2.00 1.61
Group 6: Charleville TM 0.940 2.41 1.83
Group 7: St George Airport 0.933 2.09 1.40
Overall Average 0.924 1.94 1.53
five predictor variables, are necessary for accurate forecasting.
Accordingly, the ANN model for each study site appears to
respond differently to the prescribed set of variables, which presumably
provide different types of predictive features. In agreement with
earlier studies (e.g., [15,132]), the findings suggest that the pertinent
features in data-driven predictive models must be harvested by trialing
different combinations of appropriately chosen variables (e.g., Table 6)
in order to optimise the forecasting model. It must also be noted that
the different levels of accuracy indicate that model performance is
expected to scale differently across geographically diverse sites [15]
(e.g., Fig. 2; Table 1).
Notably, the MLR model did not yield a similar result: its
performance for the cross-validation sites deteriorated with a larger
number of predictors and the successive addition of lagged LST data
(Table 6). Taking Gayndah as an example, the correlation coefficient
between the observed and forecasted monthly G decreased from
≈0.957 to 0.790, and the root mean square and mean absolute errors
increased from ≈2.07 to 3.02 MJ m−2 and from ≈1.80 to 2.53 MJ m−2,
respectively, between models M1 and M5 as input variables were
added. While the exact cause of this is not known, it does indicate
that the MLR model has a different mechanism to model the solar
radiation data compared to an ANN model. In comparison to the ANN
model, the MLR model exhibits inferior performance for all tested
study sites. In fact, the forecasting error generated by the MLR model
is significantly larger, with a smaller degree of statistical correlation
between the forecasted and observed G, which confirms the superiority
of the ANN over the MLR model.
Table 6b lists the performance of the ARIMA model, which
operates as a univariate (i.e., single-input) model. In our study, the
ARIMA model utilised the onsite G data in the training and
evaluation phases. The ARIMA model is developed using the two
training study sites in the respective group (Table 1) with only one
predictor variable (G) over 2012–2013, and it is then applied to
generate forecasts of solar radiation for the remaining third (test)
site in the particular cross-validation group. For the seven groups
of study sites tested, the performance of the ANN model far
exceeds that of the ARIMA model. More precisely, the metrics are
clearly distinguishable, with r values that are larger by ≈2.25–18.46%
and RMSE and MAE values that are lower by ≈8.55–67.01% and
≈10.67–72.92%, respectively, across the different cross-validation
sites. The best performance of the ARIMA model is recorded for the
Emerald Airport station, where the smallest difference in the MAE and
RMSE metrics with respect to the ANN model is evident. When averaged
over all seven sites, the root mean square and the mean absolute errors
generated by the ANN model are smaller by over 35% compared to the
MLR and ARIMA models, which is also reflected in the statistical
correlation between observed and forecasted global solar radiation
(see Table 6).
Whilst the correlation coefficient and RMSE are useful goodness-of-fit
statistics, these metrics are regarded as suboptimal measures that
cannot be used to compare models between geographically (and
climatologically) diverse sites. This is because they are based on the
non-normalized values of the forecasting error, and hence may carry
little meaning where different distributions of predictor and objective
variable(s) are evident [132]. Moreover, in model evaluation using r
and RMSE, the squaring of the relatively larger forecasting errors can
outweigh the influence of the smaller errors, owing to the
sum-of-squared errors used to calculate these metrics [134]. To
overcome this insensitivity and the over-dominance of large over small
errors, the ANN model and its comparative models are assessed using
the normalized RMSE and MAE, expressed as percentages. Table 7 shows
the Willmott's Index (WI) [134,135] and relative percentage of RMSE and
MAE [15,132]. In accordance with the earlier deduction (Table 6), we
show only the best-performing ANN and MLR models with the respective
optimal combination of the LST data. For the ARIMA model, the zero-lag
G values are employed as the predictor, with appropriate differencing
and autoregressive and moving average terms added to create the
optimal model (Table 5a).
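The normalized metrics reported in Table 7 can be illustrated with a short Python sketch. It assumes the standard form of Willmott's index of agreement, an RRMSE defined as the RMSE divided by the mean of the observations, and an RMAE defined as the mean absolute percentage error; the paper's own definitions appear in an earlier section and may differ in detail, and the sample values are hypothetical.

```python
import math

def willmott_index(obs, pred):
    """Willmott's index of agreement (1 = perfect agreement).

    Undefined when every observation and forecast equals the observed mean.
    """
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den

def rrmse(obs, pred):
    """Relative RMSE (%): RMSE normalized by the mean observed value (assumed form)."""
    mean_o = sum(obs) / len(obs)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return 100.0 * rmse / mean_o

def rmae(obs, pred):
    """Relative MAE (%): mean absolute percentage error (assumed form)."""
    return 100.0 * sum(abs((p - o) / o) for o, p in zip(obs, pred)) / len(obs)

# Hypothetical observed and forecasted G (MJ m-2), for illustration only.
g_obs = [10.0, 20.0, 30.0]
g_for = [12.0, 18.0, 33.0]
print(willmott_index(g_obs, g_for), rrmse(g_obs, g_for), rmae(g_obs, g_for))
```

Because both relative errors are divided by the observed magnitude, sites with different mean radiation levels can be compared on a common percentage scale.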
An interpretation of the foregoing results is relatively straightforward.
The ANN model dramatically outperformed the MLR and ARIMA
models, which is confirmed by the marked difference in the normalized
metrics. For the Gayndah, Emerald Airport and Tambo stations, the
optimal performance is attained by the ANN model, followed by the
ARIMA and then the MLR model (i.e., WI ≈0.959, 0.962 and 0.953
compared to 0.931, 0.960 and 0.941 (ARIMA) and 0.906, 0.834 and 0.878
(MLR)). Correspondingly, the relative RMSE values are ≈5.70%, 5.02% and
6.35% (ANN), ≈8.08%, 5.53% and 7.65% (ARIMA) and ≈10.72%,
12.33% and 12.78% (MLR). While the ANN model demonstrates a high
level of superiority over the MLR and ARIMA models for the Blackall,
Charleville TM and St George Airport stations, the MLR model is
preferable over the ARIMA model. That is, the MLR
outperforms the ARIMA model by a large margin with r≈0.877–0.927
compared with ≈0.798–0.894. The ANN model is also found to be
superior when the relative percentage MAE values are compared
(Table 7). It is interesting to note that for all study sites, the
univariate ARIMA model that utilised the G data at the respective sites
is superior to the MLR model, indicating that it is a preferable option
over the multivariate (MLR) model, although the ANN remains superior
to ARIMA by all performance measures. In terms of site-averaged
performance, the ANN model yields the highest Willmott's Index
(≈0.954), exceeding by a clear margin the mean values for the MLR
model (≈0.899) and the ARIMA model (≈0.848). The site-averaged RRMSE
for the ANN model is notably lower, at 5.85% compared to the MLR
(10.23%) and ARIMA (9.60%) models. This shows that forecasts
generated by the MLR model are significantly out of phase with the
observations, as its relative error exceeded the recommended
10% threshold for an excellent model (e.g., [133]).
To establish a complete picture of the forecasting skill of the
data-driven models, Fig. 6 shows a frequency plot of the predictive
errors over the monthly horizon, with error brackets of ±0.5 MJ m−2
indicated on the abscissa. For every bar, the percentage of data points
within the cross-validation (test) set for the seven groups of study
sites pooled together is indicated. There is sound evidence that the
ANN model generated the largest proportion of forecasting errors
(≈39%) in the smallest (±0.5 MJ m−2) error range, whereas the ARIMA
model recorded about 25% and the MLR model only 15% of all
errors in this bracket. Moving rightward along the abscissa, the next
error bracket (0.50 < error ≤ 1.00) captured 27% of all ANN errors,
which contrasts with about 21% and 17% for the ARIMA and MLR models,
respectively. As a consequence of the greater proportion of errors
recorded in the smallest error brackets, the ANN model produces less
than 35% of errors with a magnitude larger than 1.0 MJ m−2. By
contrast, this proportion amounted to about 53% (for ARIMA)
and about 69% (for MLR). In concurrence with earlier results, it can be
deduced that the ANN model produces a much smaller proportion of
forecasting errors in the larger magnitude brackets, which indicates
that a more accurate level of forecasting is achieved by this
method.
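The error-bracket analysis described above can be sketched in Python as follows. The helper name, the handling of bracket edges (each bracket is open on the left and closed on the right, matching "0.50 < error ≤ 1.00") and the sample errors are assumptions made for illustration only.

```python
import math

def bracket_percentages(errors, width=0.5):
    """Percentage of forecasts whose |error| falls in each bracket.

    Bracket k covers (k*width, (k+1)*width]; zero errors go in the first one.
    """
    counts = {}
    for e in errors:
        mag = abs(e)
        k = 0 if mag == 0 else math.ceil(mag / width) - 1
        counts[k] = counts.get(k, 0) + 1
    n = len(errors)
    return {(k * width, (k + 1) * width): 100.0 * c / n
            for k, c in sorted(counts.items())}

# Hypothetical forecast errors in MJ m-2 (illustrative only).
errors = [0.2, -0.4, 0.5, 0.9, -1.2, 2.6, 0.0, -0.7]
pct = bracket_percentages(errors)

# Share of errors with magnitude above 1.0 MJ m-2, as discussed in the text.
share_above_1 = sum(v for (lower, _), v in pct.items() if lower >= 1.0)
print(pct, share_above_1)
```

Applying the same function to the pooled test-set errors of each model would give bracket shares that can be compared directly across the ANN, MLR and ARIMA forecasts.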
Fig. 7 shows boxplots of the models' forecasting errors for the seven
groups pooled together, and for the least and most accurate results
(i.e., Group 5: Barradeen and Group 4: Blackall). In each
boxplot, the outliers (indicated by +) that represent extreme values of
the forecasting error within the cross-validation set are indicated,
together with the upper quartile, median and lower quartile values.
Overall, the boxplots show that the errors for the ANN model have a
much smaller spread, with correspondingly smaller magnitudes of the
quartile statistics and median values, compared to the MLR and ARIMA
models. In fact, the net shift in the monthly forecasting error values
towards larger magnitudes for the ARIMA and MLR models is clearly
consistent with previous results (e.g., Fig. 6; Table 7). While the ANN
model remains the optimal model in terms of an error distribution
clustered towards smaller magnitudes, the MLR model performance is
significantly inferior to that of the ARIMA model for all errors pooled
together and for errors recorded at the Barradeen study site. In the
case of the most accurate forecasting results, obtained for the
Blackall station, the ARIMA model is found to be similar to the MLR
model in its ability to forecast solar radiation, although the ANN
model outperforms both forecasting models.
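For readers who wish to reproduce the quantities summarised in such a boxplot, a minimal Python sketch is given below. It assumes the common Tukey convention in which points lying more than 1.5 times the interquartile range beyond the quartiles are drawn as outliers; the paper does not state which convention was used, and the sample errors are hypothetical.

```python
import statistics

def boxplot_stats(errors, whisker=1.5):
    """Quartiles, median and outliers of a set of forecasting errors.

    Assumption: outliers lie more than whisker * IQR beyond the quartiles.
    """
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    outliers = [e for e in errors if e < lower_fence or e > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical monthly forecasting errors in MJ m-2 (illustrative only).
sample_errors = [-0.4, 0.1, 0.3, -0.2, 0.6, -0.8, 0.2, 0.0, 3.5]
print(boxplot_stats(sample_errors))
```

A narrower gap between q1 and q3, together with fewer outliers, corresponds to the tighter error spread reported for the ANN model.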
To check the predictive ability of the ANN model coupled with the
LST data for seasonal forecasting of solar radiation, we show the
normalized metrics in terms of the Willmott's Index, relative root mean
square error and mean absolute error. It is imperative to state that the
Table 7
Evaluation of monthly forecasting models using normalized errors: Willmott's Index (WI) and relative RMSE and MAE (%).
Cross-validation site | ANN inputs | ANN WI | ANN RRMSE | ANN RMAE | MLR input | MLR WI | MLR RRMSE | MLR RMAE | ARIMA input | ARIMA WI | ARIMA RRMSE | ARIMA RMAE
Group 1: Gayndah | LST (t), LST (t – 1), LST (t – 2), LST (t – 3) | 0.959 | 5.70 | 4.44 | LST (t) | 0.906 | 10.72 | 9.84 | G(t) | 0.931 | 8.08 | 7.51
Group 2: Emerald Airport | LST (t), LST (t – 1), LST (t – 2) | 0.962 | 5.02 | 3.52 | LST (t) | 0.834 | 12.33 | 12.23 | G(t) | 0.960 | 5.53 | 4.09
Group 3: Tambo | LST (t), LST (t – 1), LST (t – 2), LST (t – 3), LST (t – 4) | 0.953 | 6.35 | 4.49 | LST (t) | 0.878 | 12.78 | 11.40 | G(t) | 0.941 | 7.65 | 6.38
Group 4: Blackall | LST (t), LST (t – 1), LST (t – 2) | 0.971 | 4.37 | 2.78 | LST (t) | 0.942 | 7.46 | 5.71 | G(t) | 0.382 | 13.3 | 9.92
Group 5: Barradeen | LST (t), LST (t – 1), LST (t – 2), LST (t – 3) | 0.946 | 6.48 | 4.71 | LST (t) | 0.927 | 8.54 | 6.83 | G(t) | 0.914 | 9.86 | 7.66
Group 6: Charleville TM | LST (t), LST (t – 1), LST (t – 2), LST (t – 3) | 0.945 | 7.29 | 4.54 | LST (t) | 0.888 | 10.45 | 8.88 | G(t) | 0.907 | 12.4 | 9.39
Group 7: St George Airport | LST (t), LST (t – 1), LST (t – 2), LST (t – 3) | 0.942 | 5.79 | 4.73 | LST (t) | 0.917 | 9.33 | 8.41 | G(t) | 0.910 | 10.4 | 6.62
Overall | | 0.954 | 5.85 | 4.17 | | 0.899 | 10.23 | 9.04 | | 0.848 | 9.60 | 7.36
Fig. 7. Boxplots of the distribution of monthly forecasting error generated by the ANN, MLR and ARIMA models for monthly G-forecasting. (a) All seven groups of stations pooled together,
(b) least accurate forecasting results (Group 5), (c) most accurate forecasting results (Group 4).
primary predictor used in seasonal forecasting was the LST (t) data
(i.e., with zero lag), together with supplementary input parameters
defined in terms of latitude, longitude, altitude and the periodicity for
the ANN and the MLR models. This decision followed the cross-
correlation results (Fig. 5) where lagged combinations of the LST
values were found to be largely insignificant, and therefore, were not
applied in seasonal forecasting.
In agreement with the monthly forecasting horizon, the ANN
models tested for seasonal forecasting also demonstrated significantly
better results than the MLR and ARIMA models for all cross-validation
sites (Fig. 8a-c). This is exemplified in the scatterplots of the
forecasted data for all stations pooled together, including the
goodness-of-fit regression line, y = mx + C, based on the GOBS and GFOR
data. Note that r² and m close to 1.00 and C close to 0 should be
attained for a perfect forecasting model. Despite some degree of
scatter between the observed and forecasted G data, a reasonably linear
fit is evident, albeit with different levels of accuracy for the three
models considered. That is, r² ≈0.951 for the ANN, 0.896 for the MLR
and 0.688 for the ARIMA models, whereas the gradient of the regression
line is ≈1.06 (ANN), 0.985 (ARIMA) and 0.962 (MLR). Consequently, the
statistics provide first-order justification of the superior
performance of the ANN over the MLR and ARIMA models applied for
seasonal forecasting of global solar radiation.
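The goodness-of-fit line y = mx + C and its r² can be obtained by ordinary least squares, as in the Python sketch below. Treating the observed G as x and the forecasted G as y is an assumption (the text does not state which variable is on which axis), and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = m*x + c, returning (m, c, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx               # gradient of the regression line
    c = mean_y - m * mean_x     # intercept
    r2 = sxy ** 2 / (sxx * syy) # coefficient of determination
    return m, c, r2

# Hypothetical seasonal G values (MJ m-2), for illustration only.
g_observed = [14.2, 17.8, 21.5, 25.1]
g_forecast = [14.9, 17.1, 22.0, 24.6]
m, c, r2 = fit_line(g_observed, g_forecast)
print(m, c, r2)
```

For a perfect forecasting model the function would return m = 1, c = 0 and r2 = 1, which is the benchmark against which the values quoted for the three models are read.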
In Table 8 we evaluate the seasonal forecasting models using
normalized error metrics in terms of the Willmott's Index and the
relative (percentage) RMSE and MAE values. It is immediately obvious
that the WI for the ANN model is the highest, with a range of
0.944–0.976, and the relative RMSE and MAE are the lowest, over the
ranges of 4.60–5.80% and 0.03–4.67%, respectively, compared with those
of the MLR and ARIMA models. In accordance with [133], an RRMSE of less
than 10% corresponds to an excellent performance of the present ANN
model. By contrast, the MLR model yielded an RRMSE of ≈5.23–11.65%,
while that of the ARIMA model was between 2.93% and 26.50%.
In this context, the ARIMA model forecasts are significantly in error,
especially for the Emerald Airport (Group 2), Blackall (Group 4) and
Barradeen (Group 5) study sites, where relative RMSE values of ≈26.5%,
9.54% and 9.73%, respectively, are attained. While further exploration
of this result is needed, it can be construed that the relatively poor
performance of the ARIMA model is associated with the small number of
data points (i.e., 4 per year) utilised in the model development phase,
which may lead to an insufficient number of predictive features.
Nevertheless, the RRMSE generated by the ANN model is smaller by over
fivefold for Emerald Airport and by between 1.6- and two-fold for
Blackall and Barradeen, respectively. A similar deduction is made when
the correlation coefficient and RMAE values are checked. When the
errors are averaged over all study sites, the performance of the ANN
model remains remarkably superior to that of the MLR and ARIMA models
(Table 8).
To draw firmer conclusions about the forecasting skill
of the ANN model, one can inspect Fig. 9, where a bar graph depicts
the relative forecasting error for the most and the least accurate
study sites on a season-by-season basis. With a few
exceptions, the ANN model yields the smallest error in forecasting solar
radiation. Importantly, there is a dramatic difference in the relative
error for MAM and JJA for the Charleville station. Notwithstanding
this, the ARIMA model performs better than the ANN and MLR models in
the spring (SON) season for Charleville and the autumn (MAM) season
for Blackall. However, for all other seasons, the ANN model
remains the preferred model (with the lowest relative error).
Fig. 10 displays a bar graph of the seasonal data. Evidently, the
relative percentage error accumulated across these seasons shows a
dramatically superior result for the ANN model. However, the ANN model
error encountered in predicting the global solar radiation is on par
with that of the MLR model and is the largest for the summer (DJF)
season. In spite of this, the forecasting errors recorded for the
autumn (MAM),
Fig. 8. Scatterplot of ANN, MLR and ARIMA model performance for seasonal forecasting of G using all station data pooled together.
Table 8
Evaluation of seasonal forecasting models using normalized errors: Willmott's Index (WI) and relative RMSE and MAE (%).
Cross-validation site ANN MLR ARIMA
WI RRMSE RMAE WI RRMSE RMAE WI RRMSE RMAE
Group1: Gayndah ID 039191 0.959 4.92 4.67 0.844 9.14 9.06 0.943 7.41 0.03
Group2: Emerald Airport ID 035264 0.957 4.88 2.14 0.622 10.93 11.0 0.075 26.5 3.74
Group 3: Tambo ID 035284 0.952 5.70 3.46 0.726 11.65 10.76 0.988 2.93 0.89
Group 4: Blackall ID 036155 0.944 5.80 0.03 0.956 5.23 4.00 0.884 9.54 −0.90
Group 5: Barradeen ID 044204 0.971 4.76 2.90 0.920 7.09 5.18 0.883 9.73 4.41
Group 6: Charleville TM ID 044205 0.976 4.60 0.18 0.909 7.96 3.32 0.935 7.23 5.42
Group 7: St George Airport ID 043109 0.969 4.81 2.29 0.934 7.07 2.14 0.913 8.39 3.14
Overall 0.961 5.07 2.24 0.844 8.44 6.49 0.803 10.25 2.39
winter (JJA) and the spring (SON) seasons remain significantly
smaller, especially in respect to the MLR and ARIMA models for the
autumn (MAM) season. Accordingly, it can be argued that the ANN
model integrated with satellite-derived LST data as its predictor
variable is a preferred model for seasonal forecasting of solar radiation.
5. Discussion: limitations and opportunity for further research
To address environmental, health and economic issues associated
with climate shift by promoting more sustainable energy projects, there
is a nationwide surge in solar investments [3,6,67,119,136], consistent
with global trends. A desire to attain predictive models that
incorporate remotely sensed data for forecasting global solar radiation
is driving considerable research momentum among scientists, who are
exploring satellite data-coupled predictive models [6,136] over
deterministic models with ground-based (on-site) data [22,23]. The
precise aim of this paper was thus to model global solar radiation at
remote sites in Queensland, Australia using Moderate-Resolution Imaging
Spectroradiometer land-surface temperature (LST). By incorporating
the LST data over a relatively limited period (2012–2013), this study
has designed and validated an artificial neural network (ANN) model
and evaluated its forecasting performance with respect to multiple
linear regression (MLR) and autoregressive integrated moving average
(ARIMA) models. A validation of the simulations over the 2014 period
showed the utility of the ANN model coupled with satellite-derived data
as an advancement over deterministic models that rely wholly on
ground-based variables to simulate solar radiation [22,23,30,137].
This approach, which can be adapted to any location where a satellite
footprint is available, is an advancement over TMY datasets [58,59]
that provide quantitative estimates of solar radiation mostly limited
to certain ground-based measurement stations.
In spite of this superior performance, the ANN
model based on satellite-derived data should be trialed over a much
larger spatial area than investigated in this pilot study, as this would be
of interest to solar engineers who require quantitative evidence for
developing sustainable energy projects over continental scales. An
application of satellite data with the potential to provide instantaneous
pixel values of solar radiation with high resolution (up to 5 km) and
hourly temporal coverage [67] can help expand the scope, relevance
and practical utility of the solar forecasting models. As an additional
merit, satellites also observe areas at different wavelengths within a
small period and pixel resolution [60,61], and the estimated irradiation
is more accurate than interpolation techniques for distances to stations
that are greater than 34 km for hourly irradiation and 50 km for daily
irradiation [138]. In this paper, the ANN model coupled with
satellite-derived data was diagnostically evaluated to yield less than
6.0% uncertainty based on root mean square error and a high statistical
correlation between measured and forecasted solar radiation
(r≈0.961) averaged over cross-validation sites (Tables 7, 8). As the
forecasting error achieved was less than 10%, this corresponds to
excellent performance [133]. In a practical sense, an improved version
of the ANN model can be adopted for deciding whether or not solar energy
investments should be made in a remote region where meteorological
stations may be absent. Consequently, one may also explore the utility
of an ANN model for site selection in areas that are being considered
for solar energy investments, and for addressing solar energy
infrastructure and developmental constraints [6,7,136].
In spite of the aforesaid merits, our pilot study has shortcomings
that create opportunities for follow-up work. While we have adopted the
MODIS data over a relatively small region in Queensland, a practical
implementation of the ANN model requires its validation with
alternative datasets for a wider spatial domain. Follow-up studies can
also use the Australian Bureau of Meteorology's Geostationary
Meteorological Satellite and MTSAT satellite data operated by the Japan
Meteorological Agency and the GOES-9 satellite of the National Oceanic
and Atmospheric Administration for the Japan Meteorological Agency
[6,7]. In principle,
Fig. 9. Bar graph of the relative forecasting error (RRMSE, %) for most (Group 6, Charleville) and least accurate (Group 4, Blackall) stations.
Fig. 10. Bar graphs of relative root mean square error (RRMSE; %) for different seasons
(DJF=summer; MAM=autumn; JJA=winter; SON=spring).
satellites are able to yield hourly variables over a grid resolution of
0.05° × 0.05° [136]. Additionally, some research has shown that
reanalysis data products (e.g., ERA-Interim reanalysis of the European
Centre for Medium-Range Weather Forecasts) can be utilised when
satellite and station irradiance data are not available [139]. Generally,
the ERA-Interim data are of good temporal resolution since the variables
are available 6-hourly from 1979 to present at the highest grid
resolution of 1.5° × 1.5° [140,141].
The identification of alternative predictor data sources provides an
opportunity to combine them in such a way that the performance of the
model is better than that of a model with just the meteorological,
satellite or reanalysis data on its own [6,7]. The optimisation of
models is important as most studies rely only on meteorological
irradiance to generate solar estimates over vast topographical and
non-homogenous terrains [67,71,124,136]. The scope of this paper was
limited in terms of forecasting horizon, as we have validated the ANN
model only for long-term periods. A follow-up study could investigate
the model's skill at finer temporal resolutions (e.g., sub-hourly,
hourly and daily) with improved pre-satellite irradiance estimates,
sunshine durations and cloud cover [124]. Real-time implementation,
especially for consumer power grids, requires quantitative data for
sub-hourly, hourly and daily forecasts. Therefore, the ANN model
coupled with satellite-derived data needs to be validated for smaller
timescales. In spite of its coarse temporal resolution, the ANN model
has set a promising pathway for alternative predictors with better
resolution to be explored over larger continental scales than those
investigated in this pilot study.
We have employed a set of 21 regional sites in Queensland, of which
14 were utilised in model development and/or training and the
remainder (seven) for cross-validation (Fig. 2). A
‘universal’ model was constructed to forecast G at the cross-validation
sites. It is noteworthy that the LST data used for the cross-validation
sites were not used in developing the model (Table 1), yet good
accuracy was attained (Tables 6–8). The notion of a ‘universal’ model
for forecasting at the sites closest in distance to the training
site(s) shows the spatial relevance of our approach in terms of its
application at nearby site(s) where predictor data are not available.
Indeed, some research (e.g., [138,142]) that compared hourly site/time
global irradiances obtained via extrapolation and interpolation of
ground sites with those obtained by processing satellite images with a
statistical model found that, for hourly data, satellite-estimated
solar radiation becomes more accurate than a local ground station if
the distance from the station exceeds 34 km, down from a previously
reported 50 km range for daily irradiances; for a regularly spaced
network, the breakeven distance is estimated to be 50 km. This shows
that satellite-derived irradiance can surpass site/time specific
accuracy achievable with geostationary satellites.
Considering this, the accuracy of an ANN model applied to sites near
those where the models were trained is expected to scale with the
distance and predictive features at the cross-validation sites. A
follow-up study should investigate the spatial relevance of the model
in terms of how the predictive features at a cross-validation site
could influence the accuracy. In doing so, one must consider the nature
and source of errors at these sites emerging from satellite
determination of cloud thickness and atmospheric turbidity, etc.
Our paper found good accuracy with LST data, as evidenced by an error
magnitude of less than 10% based on root mean square error and over
96% correlation between observed and forecasted G (Table 7). However,
a more exhaustive list of predictors based on MODIS and other products
(e.g., Geostationary Meteorological Satellite, MTSAT and GOES-9
satellite) could be utilised. In a follow-up study, one could
incorporate an exhaustive list of carefully screened predictors.
Atmospheric properties such as cloud cover (including optical
thickness, effective particle radius, thermodynamic phase, cloud top
altitude, temperature, cirrus reflectance), precipitable water and
temperature profiles can be incorporated to establish the relevant
inputs [81,84,88]. To increase the number of predictors and add
features or predictive patterns to the model, reanalysis products
[140,141], which are yet to be explored for solar modelling in
Australia, can also be used.
It is acknowledged that, in this paper, we utilised a single predictor
(i.e., LST) without a pre-processing algorithm to pre-select the
predictive features. Data-driven models encounter challenges
with non-stationarity, jumps, periodicity and stochastic behaviour in
input variables [143]. Despite the flexibility of the ANN model in
handling non-linear data, its accuracy is affected by chaotic behaviour
in the training variables if features prevail over a wide range of
frequencies (e.g., hourly, daily or seasonal scale) [144,145]. In the
case of a ‘daily cycle’ or ‘seasonality’, the absence of an
input-output data pre-/post-processing scheme can be challenging in
attaining an optimum model. In this paper we utilised LST without
identifying its underlying frequencies; therefore, a hybrid model with
a data pre-processing technique based on a wavelet transformation
algorithm can be used to obviate this shortcoming [15]. Decomposition
of the predictors into low-frequency (approximation) and high-frequency
(detail) subsets can enhance the model's responsiveness to predictive
features [146]. A similar observation was made in a study on solar
forecasting for metropolitan and regional sites [15], where a
wavelet-based hybrid support vector machine model was more precise
than a standalone support vector machine model. Research elsewhere
supports a similar deduction [132,143].
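To make the idea of wavelet pre-processing concrete, the Python sketch below applies a single-level Haar discrete wavelet transform, which splits a series into a low-frequency approximation and a high-frequency detail component. The Haar wavelet, the single decomposition level and the sample LST values are assumptions chosen for simplicity; the wavelet family and depth used in the cited hybrid models are not specified in this section.

```python
import math

def haar_dwt(signal):
    """Single-level Haar transform: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / s for a, b in pairs]  # low-frequency content
    detail = [(a - b) / s for a, b in pairs]  # high-frequency content
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original series."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal

# Hypothetical monthly LST values (illustrative only, not data from the study).
lst = [30.1, 31.4, 29.8, 26.5, 22.0, 18.9, 18.2, 20.7]
approx, detail = haar_dwt(lst)
rebuilt = haar_idwt(approx, detail)
print(approx, detail)
```

In a hybrid model, the approximation and detail series would replace (or supplement) the raw LST series as model inputs, so that slow seasonal variation and short-term fluctuations are presented to the learner separately.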
In order to broaden the scope, predictors from satellite-derived
products (e.g., cloud cover) and reanalysis products (e.g., wind fields)
related to solar radiation can be utilised with a non-linear feature
selection algorithm that identifies and retains the relevant features.
Statistical methods such as iterative input selection (IIS) [147] and
bootstrap rank-ordered conditional mutual information (broCMI) [148], and
evolutionary methods such as the grouping genetic algorithm (GGA) [149]
and coral reef optimisation (CRO) [150,151], can be applied to rank
predictor variables in the context of a wavelet-hybrid model. Although
hybrid models with a feature selection algorithm are known to yield good
accuracy in energy applications (e.g., [149–151]), this was beyond the
scope of this paper and could form the subject of another independent
study.
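As a rough illustration of ranking candidate predictors with a non-linear relevance score, the sketch below orders variables by a histogram estimate of their mutual information with the target. This is a deliberately simplified stand-in and is not an implementation of IIS, broCMI, GGA or CRO; the function names, the bin count and all data values are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Histogram estimate of the mutual information of x and y (in nats)."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i, j) * log( p(i, j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def rank_predictors(candidates, target, n_bins=4):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(vals, target, n_bins)
              for name, vals in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical monthly target and candidate predictors, illustration only.
g = [26.0, 24.0, 21.0, 17.0, 13.0, 11.0, 12.0, 15.0, 19.0, 23.0, 25.0, 27.0]
candidates = {
    "lst_lag0": [v + 5.0 for v in g],     # strongly related to the target
    "wind_speed": [3.0, 5.0] * 6,         # weakly related
    "constant_flag": [1.0] * 12,          # carries no information
}
ranking = rank_predictors(candidates, g)
```

With so few samples a histogram estimate is biased, so a ranking of this kind would only be indicative; the bootstrap-based and evolutionary methods cited in the text are designed to address that limitation.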
6. Summary
This paper established the preciseness of an artificial neural network
(ANN) coupled with satellite-derived land-surface temperature (LST) as a
predictor for forecasting solar radiation for regional Queensland. A
limited set of LST data from 2012 to 2014 for 21 stations was utilised,
partitioned into seven groups of three stations: in each group, two
stations (2012–2013) were used for model development and the third for
cross-validation over the 2014 period. To enhance the preciseness of the
ANN model, several training algorithms and hidden transfer functions were
trialled, and the Levenberg-Marquardt training algorithm and logarithmic
sigmoid function were finally adopted for forecasting G. To meet the
objectives (Section 1.0), the ANN model was benchmarked against multiple
linear regression (MLR) and autoregressive integrated moving average
(ARIMA) models. The findings are enumerated below.
1. For monthly horizons, different lagged combinations of LST as the
primary predictor variable were required to attain an accurate ANN
model, albeit with a significant degree of variability across the
study sites. By contrast, the MLR model required only the LST data
at zero lag, whereas the ARIMA model was developed using the G data
at the cross-validation study sites.
2. In the cross-validation period, the root mean square error (RMSE)
and mean absolute error (MAE) of the ANN model were considerably lower
than those of the MLR and ARIMA models, with average values of
RMSE ≈ 1.23 MJ m⁻², 1.90 MJ m⁻² and 1.94 MJ m⁻², and MAE of
1.02 MJ m⁻², 1.65 MJ m⁻² and 1.53 MJ m⁻², respectively. The
normalized performance metrics yielded a Willmott's Index of 0.954
(ANN) compared with 0.899 (MLR) and 0.848 (ARIMA), and the relative
RMSE was only 5.85% (ANN)
R.C. Deo, M. Şahin Renewable and Sustainable Energy Reviews 72 (2017) 828–848
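The error metrics quoted in the second finding can be computed along the following lines. This is a sketch that assumes the commonly used definitions (RMSE, MAE, Willmott's Index of agreement, and RMSE expressed as a percentage of the observed mean); the exact formulations used by the authors should be checked against the methods section. The observed and predicted values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def willmott_index(obs, pred):
    """Willmott's Index of agreement, d, bounded between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


def relative_rmse(obs, pred):
    """RMSE as a percentage of the mean observed value."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


# Hypothetical observed and forecasted G values (MJ m^-2), illustration only.
observed = [20.0, 22.0, 18.0, 24.0]
forecast = [21.0, 21.0, 19.0, 23.0]

metrics = {
    "RMSE": rmse(observed, forecast),
    "MAE": mae(observed, forecast),
    "d": willmott_index(observed, forecast),
    "RRMSE_percent": relative_rmse(observed, forecast),
}
```

Reporting the normalized metrics (d and the relative RMSE) alongside the absolute errors allows sites with different mean radiation levels to be compared on a common scale.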
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 

Recently uploaded (20)

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 

Forecasting long term global solar radiation with an ann algorithm

  • 1. Contents lists available at ScienceDirect
Renewable and Sustainable Energy Reviews
journal homepage: www.elsevier.com/locate/rser
Forecasting long-term global solar radiation with an ANN algorithm coupled with satellite-derived (MODIS) land surface temperature (LST) for regional locations in Queensland
Ravinesh C. Deo (a,c,⁎), Mehmet Şahin (b)
(a) School of Agricultural Computational and Environmental Sciences, Institute of Agriculture and Environment (IAg & E), International Centre for Applied Climate Sciences (ICACS), University of Southern Queensland, Springfield, QLD 4300, Australia
(b) Department of Electrical and Electronics Engineering, Siirt University, 56100 Siirt, Turkey
(c) Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, Gansu, China
ARTICLE INFO
Keywords: Satellite-based solar model; Neural network; Multi-linear regression; ARIMA model
ABSTRACT
Forecasting solar radiation (G) is crucial for engineering applications (e.g. design of solar furnaces and energy-efficient buildings, solar concentrators, photovoltaic systems and site selection for future power plants). To establish long-term sustainability of solar energy, energy practitioners utilize versatile predictive models of G as an indispensable decision-making tool. Notwithstanding this, sparsity of solar sites, instrument maintenance, policy and fiscal issues constrain the availability of model input data that must be used for forecasting the onsite value of G. To surmount these challenges, low-cost, readily available satellite products accessible over large spatial domains can provide viable alternatives. In this paper, the precision of an artificial neural network (ANN) for predictive modelling of G is evaluated for regional Queensland, employing Moderate Resolution Imaging Spectroradiometer (MODIS) land-surface temperature (LST) as an effective predictor.
To couple an ANN model with a satellite-derived variable, the LST data over 2012–2014 are acquired in seven groups, with three sites per group, where the data for the first two (2012–2013) are utilised for model development and the third (2014) group for cross-validation. For the monthly horizon, the ANN model is optimized by trialing 55 neuronal architectures, while for seasonal forecasting, nine neuronal architectures are trialed with time-lagged LST. The ANN coupled with zero-lagged LST utilised the scaled conjugate gradient algorithm, while the ANN with time-lagged LST utilised the Levenberg-Marquardt algorithm. To ascertain conclusive results, the objective model is evaluated against multiple linear regression (MLR) and autoregressive integrated moving average (ARIMA) algorithms. Results showed that the ANN model outperformed the MLR and ARIMA models: an analysis yielded 39% of cumulative errors in the smallest magnitude bracket for ANN, whereas MLR and ARIMA produced 15% and 25%. Superiority of the ANN model was demonstrated by a site-averaged (monthly) relative error of 5.85% compared with 10.23% (MLR) and 9.60% (ARIMA), with Willmott's Index of 0.954 (ANN), 0.899 (MLR) and 0.848 (ARIMA). This work ascertains that an ANN model coupled with satellite-derived LST data can be adopted as a qualified stratagem for the proliferation of solar energy applications in locations that have an appropriate satellite footprint.
1. Background review
Forecasting solar energy is an important research area for engineers, energy experts, policy-makers and climate advocates since the utilization of carbon-free energy with less environmental impact is a promising outlook for addressing climate change [1].
There is growing consensus, with healthy debate, on the adoption of solar as a substitute for carbon-based fuels, not only from a global perspective but also in Australia, where solar has immense potential due to high insolation, low rainfall and a small fraction of cloud cover, leading to less scattering of solar radiation over large spatial domains [2,3]. An estimated 58 million petajoules of solar radiation is received annually, which is about 10,000 times Australia's energy consumption [4]. However, photovoltaic systems contribute 7.6% of Australia's annual energy use [5], while modest utilizations are limited to rooftop water heating systems that barely exceed 4500 MW capacity [6,7].
http://dx.doi.org/10.1016/j.rser.2017.01.114
Received 22 August 2016; Received in revised form 4 December 2016; Accepted 17 January 2017
⁎ Corresponding author at: School of Agricultural Computational and Environmental Sciences, Institute of Agriculture and Environment (IAg & E), International Centre for Applied Climate Sciences (ICACS), University of Southern Queensland, Springfield, QLD 4300, Australia. E-mail address: ravinesh.deo@usq.edu.au (R.C. Deo).
Renewable and Sustainable Energy Reviews 72 (2017) 828–848
1364-0321/ © 2017 Elsevier Ltd. All rights reserved.
In spite of the current staggering adoption of solar as a
  • 2. renewable energy option, solar energy utilization is projected to increase from 7.0 (2007–08) to 24.0 petajoules (2029–30), with electricity generation from solar power projected to rise from 0.1 (2007–08) to 4.0 terawatt hours (2029–30) [8]. Besides, the Renewable Energy Target sanctioned by the Senate advocated that more than 23.5% of national electricity is to be derived from renewables by 2020 [9]. Government initiatives constantly support opportunities for scientific research [10], firstly, to model solar prospectivity in Australia's diverse spatial locations (metro and remote) with acceptable accuracy using high-performance models and secondly, to harness this energy where it is economically sustainable. This, of course, is a strategic measure to position Australia's clean energy demands and to contribute to grid supply in the future [11]. In light of this need, this paper undertakes the forecasting of long-term (monthly and seasonal) global solar radiation for regional sites in Queensland, Australia.
For practical utilization of freely available solar energy in photovoltaic systems, heaters and electro-mechanical devices that convert global solar radiation into a consumer-usable form, quantitative knowledge of solar characteristics (G) is paramount [12,13]. This is because the extracted amount of electricity is inherently proportional to the G values received on the earth's surface [6]. This knowledge can assist in the assessment of solar energy projects and their sustainability [14]. In remote locations (e.g. the study sites trialed in this paper) where ground-based meteorological networks are limited, an assessment of the viability of solar projects raises a serious need for an accurate, versatile and robust predictive model that is able to assess the sustainability of solar-fueled investments.
Methods applied for solar radiation prediction utilize two types of predictive models: deterministic (mathematical) and data-driven (black-box). Although deterministic approaches have been widely adopted, as described next, the use of data-driven models, especially for the present study region, has been limited, although research in this area has been gaining more attention (e.g. [15,16]).
Angstrom [17] demonstrated the usefulness of deterministic models for forecasting G using solar exposure and the ratio of terrestrial to extraterrestrial radiation. Liu and Jordan [18] modelled daily and hourly diffuse solar radiation, Orgill and Hollands [19] correlated hourly diffuse radiation with clearness index and Iqbal [20] correlated clearness index and solar altitude. Spencer [21] used regression analysis for estimating hourly diffuse solar radiation from global G values, while Boland et al. [22] used 15 min and hourly datasets to ascertain whether the smoothing introduced by hourly data made a difference to the forecasts. Boland et al. [23] used the logistic function for an estimation of G. Deterministic equations based on linear (Angström–Prescott), quadratic [24], cubic [25], logarithmic [26] and exponential [27] forms were applied for the prediction of G, including decomposition models and correlations between clearness index and diffuse fraction, diffuse coefficient and direct transmittance [19,28,29]. Huang et al. [30] combined an autoregressive (AR) and dynamical systems model (Coupled Autoregressive Dynamical System, CARDS) for one-hour solar radiation forecasting. Based on the required parametrisation for an AR(2) process, the Lucheroni model and a combination of the two, that work noted that the CARDS model was able to reduce the combination model error by about 33.4%.
Despite the usefulness of deterministic models, their capacity to extract pertinent features from predictors that are not incorporated in their original mathematical formulations is limited. The model's complexity is also expected to worsen with a larger number of predictors when the primitive equations used in deterministic models are modified to accommodate new predictors for forecasting. The Angström equation also cannot incorporate this type of change without a fundamental modification of its original form, which, of course, is achieved at the expense of increasing the model complexity [31]. Difficulties can also arise regarding the validity of the assumptions that are made; for example, the model parameters may be assumed to be invariant over time (e.g. as if the same sunshine duration persists on the same days or months), but in fact, this may not hold true. Other deterministic models, for example, the Iqbal, Gueymard and ASHRAE models [32–34], require information on atmospheric conditions to agree with assumptions of normality, linearity, distribution and homoscedasticity (variance constancy). As deterministic models rely on measured data, instrument errors in ground data compound the existing challenges or degrade the model's predictive accuracy [35,36], as revealed in a review on the merits and challenges of Angström–Prescott models [37].
Data-driven models, as utilised in the present study, require no information on the physical system related to solar dynamics and do not need complex mathematical equations, and they are gaining considerable attention [15,38–40]. In today's era, where large volumes of data are collected from measurements and physical models and are also enhanced through improved data products from satellites and reanalysis projects, soft-computing algorithms offer promise for modelling solar energy from known behaviors of solar variability. These models require little rationalization on the physics of solar variability.
The relationship between the input(s) and the objective variable is constructed with a machine learning algorithm that implements pattern recognition tools [36]. Besides simplicity, data-driven models make no assumption about the underlying data distribution, yet they demonstrate competitive performance relative to deterministic models [15,38,41–43]. In this paper, an artificial neural network (ANN) model [40,44] is coupled with satellite-derived land-surface temperature (LST) and is evaluated against multiple linear regression [45–47] and auto-regressive integrated moving average [48] models. ANN is a computational paradigm that mimics the neuronal structure of the brain, and learns and identifies complex data patterns [40,49]. It is noteworthy that the ANN model has recently captured significant attention in rainfall, streamflow, drought and temperature forecasting problems [50–53], but its application for global solar radiation forecasting is yet to be established, although recent work [40,54–57] has validated the utility of an ANN model elsewhere. Recent work [40] that reviewed theoretical, empirical, regression and ANN models for estimation of solar radiation noted the superiority of ANN models over the former methods.
A challenge for any forecasting model is the requirement of historical data (e.g. temperature) that must be related to an objective variable (G). Data may not be available in all spatial regions and, more importantly, in remote sites where meteorological stations are largely absent. For example, the Typical Meteorological Year (TMY) Databank for Australia provides long-term solar observations, but only for fewer than 40 primary stations [58], and the hourly solar observations cover about 18 locations [59], which are mostly limited to metropolitan cities. One reason is that it is not possible to set up experimental apparatus to acquire the data in all places, and even where it is, issues of instrument maintenance cannot be disregarded [43,49].
Fortunately, remotely sensed data, utilised in this paper, has been identified as a viable predictor for solar forecasting problems [54,56,60,61]. In this view, the coupling of an ANN model with satellite products is an improvement over station-based data, as the acquisition of satellite imagery is feasible as long as a footprint is identified. It is also easy to obtain this data for remote regions (e.g. mountainous terrain) where meteorological stations are not built or are inaccessible. Satellite data is in abundance over large spatial and temporal resolutions [14,62–64]. However, the coupling of satellite-derived data with a machine learning algorithm (e.g. ANN model) in Australia is yet to be undertaken.
Due to the better spatial relevance for solar mapping that can be achieved via remotely sensed data [57], researchers have integrated ANN algorithms with satellite-derived inputs. Şenkal and Kuleli [61] estimated G using an ANN model coupled with satellite data to show a mean square error between estimated and ground monthly average daily sums of 3.94 MJ m−2 (ANN) and 5.37 MJ m−2 (physical model), respectively. Şenkal [60] estimated G values using an ANN model with a mean square error of 6.59%, while the work of Rahimikhoob [65] validated an ANN model using satellite temperature data. That study demonstrated that ANN models of G yielded better results than
  • 3. empirical models. Linares-Rodriguez et al. [56] developed an ANN-ensemble model for daily G using Meteosat observations with a root mean square error (6.74%) and correlation coefficient (99%). Rahimikhoob et al. [57] used an ANN model for an estimation of G using Advanced Very High Resolution Radiometer data. Alternatively, multiple linear regression (MLR) models were utilised for forecasting G by examining cause-effect relationships between a set of dependent variables [49,66]. Despite vast attention on satellite-based solar forecasting, to the best of our knowledge, no study has developed an ANN model in this study region, although one study has validated a wavelet-coupled SVM model with ground-based predictor data [15].
Considering the aforesaid, the novelty of this paper is to establish the utility of an artificial neural network coupled with satellite-based land-surface temperature for forecasting G in the sunshine state of Queensland, which has abundant solar resources [3,6,67]. This motivation is driven by a state-wide surge in solar investments, evidenced by a 1.47 million project that integrated ground, satellite and atmospheric data [6,7,67]. Due to surging interest in renewable energy mapping [2,3,5,8,9,68–70], a quantitative model can be employed as a strategic tool by decision-makers (e.g. Department of Resources, Energy and Tourism, Geoscience Australia, Queensland Office of Clean Energy and Clean Energy Council) [71]. The aims of this paper are:
(1) To extract land surface temperature (LST) from the Moderate Resolution Imaging Spectroradiometer (MODIS) to be used as an effective predictor for stations in regional Queensland;
(2) To develop an artificial neural network model and evaluate its performance relative to multiple linear regression and autoregressive integrated moving average models;
(3) To cross-validate its predictive skill for monthly and seasonal forecasting.
The veracity of the ANN model coupled with LST data is established by utilizing a limited set of two years of data (2012–2013) and forecasting over the target 2014 period for 21 stations, where 14 are used in model development and the remainder in the cross-validation phase. In the next section, a theoretical overview of LST and the predictive models is presented; in Section 3 the materials and method are presented, while in Section 4 the results are presented. This is followed by Section 5 with discussion and opportunities for research, and Section 6 concludes the findings of this paper.
2. Theoretical framework
In this section an overview of the approach for extracting the primary predictor (LST), the predictive model (ANN) and its comparative counterparts (MLR and ARIMA) is presented. Considering a set of predictor (input) variables (e.g. original and/or time-lagged LST, month, altitude, latitude and longitude) that are related to the objective variable (global solar radiation, G), the datum points can be written as the set $(X, Y) = \{(x_i, y_i)\}_{i=1}^{N}$, where each $x_i$ holds k (= 6, 7, …, 10) predictor variables and $y_i \equiv G$ is the objective variable. The inputs for the ANN and MLR models are in the form of an N × k matrix, whereas the objective variable (G) is in the form of an N × 1 matrix. Note that $x_i(t)$ is the ith (i = 1, …, N) row of the input at time t acquired from the MODIS Terra satellite sensor, as described in Section 2.1.
2.1. Satellite-derived land-surface temperature
To integrate an ANN model with satellite-derived predictor data, the land surface temperature (LST) was extracted. LST can be conveniently deduced with remote sensing tools [43,49,60,72–74] that can investigate the radiative properties of the earth's surface through an atmospheric window without establishing a physical connection with the investigated objects [75].
In order to remotely sense the LST data, a number of satellite-derived sources are available, although their scanning intervals and the ranges of the electromagnetic spectrum they cover vary. Satellites that acquire remotely sensed data include: Geosynchronous Meteorological, Meteosat, INSAT, GOES, the National Oceanic and Atmospheric Administration (NOAA) series, Advanced Spaceborne Thermal Emission and Reflection Radiometer, Fengyun-1C and D and Metop satellites [76]. Among these, the NOAA series satellites are able to examine properties on the earth such as wind regime, cloudiness, humidity and sea surface temperature, fog or frost, glacier areas, rainfall, vegetation index, pressure and ozone concentration. Also, a number of algorithms have been developed that can utilize the Advanced Very High Resolution Radiometer (AVHRR) sensor on the NOAA satellite series to extract the remotely sensed data [76–80].
In this paper, the LST data were extracted from the Moderate Resolution Imaging Spectroradiometer (MODIS), a radiometer carried on NASA-built satellites. Note that MODIS instruments fly aboard two satellites: Terra and Aqua [81–83]. In general, MODIS operates within the visible light and infrared spectrum and has 36 spectral bands that span wavelengths between 0.4 and 14.4 µm. However, the horizontal resolution of the MODIS sensor changes according to the location where the data is to be acquired, and it is able to observe the earth's surface globally every 1–2 days. Due to its wide spatial coverage, the sensor can observe and measure meteorological variables such as cloud cover, energy budget, solar radiation, aerosol optical depth, chlorophyll density and ocean, land and atmospheric processes by calculating the radiation reflected from land and atmospheric particles [81].
In this paper, we have extracted the Terra sensor's channel 31 and channel 32 data, with spectral bands of 10.78–11.28 and 11.77–12.27 µm, respectively [84,85].
It is important to note that the atmospheric effect, the most important source of error in the extraction of the LST, was minimized by the use of the well-known split-window formulation [86] that has been used previously in solar modelling [72]. To reach an accurate estimation of the LST data, an error value lower than 1 K was set as a threshold for any calculation of LST between −10 and 50 °C. This was realized by using the Wan and Dozier [86] algorithm, which assumes that the values of the emissivity are known. When the LST value was checked for its validity, an error of less than 1 K was evident in regions of homogeneous land surface terrain [81,82,87]. The Wan and Dozier [86] algorithm is written as follows:
$$\mathrm{LST} = \left[A_1 + A_2\frac{1-\varepsilon}{\varepsilon} + A_3\frac{\Delta\varepsilon}{\varepsilon^{2}}\right]\frac{T_{31}+T_{32}}{2} + \left[B_1 + B_2\frac{1-\varepsilon}{\varepsilon} + B_3\frac{\Delta\varepsilon}{\varepsilon^{2}}\right]\left[T_{31}-T_{32}\right] + C \tag{1}$$
Note that the terms $T_{31}$ and $T_{32}$ represent the brightness temperatures obtained from MODIS Terra channels 31 and 32, respectively, and $A_i$, $B_i$ and $C$ are constants that depend on the viewing angle (between 0° and 65°). In Eq. (1), the constant terms are determined by regression analysis of MODIS simulation data and depend on the air-ground temperature and the amount of water vapor; $\varepsilon$ is the mean emissivity, while $\Delta\varepsilon$ is the emissivity difference. The mathematical equivalents of these statements are:
$$\varepsilon = 0.5\,(\varepsilon_{31} + \varepsilon_{32}) \tag{2}$$
$$\Delta\varepsilon = \varepsilon_{31} - \varepsilon_{32} \tag{3}$$
where $\varepsilon_{31}$ and $\varepsilon_{32}$ are the emissivities of the considered channels [85,86]. It is important to note that the accuracy of the LST data (in comparison with measured or ground-based temperature) is determined by simulation data in which emissivity, temperature, atmospheric temperature and water vapor must represent a wider range of conditions than the real physical space. To reduce the uncertainty and enhance the reliability of the LST data, simulated data must include the various conditions that affect its final value [81,88].
2.2.
Artificial neural network
In this paper an ANN model coupled with satellite-derived LST (x = LST)
  • 4. data as a primary predictor (with site-specific parameters defined by altitude, longitude, latitude and solar periodicity (month) as supplementary predictors) was adopted for forecasting the global solar radiation (G). The aim of the ANN model was to extract patterns (predictive features) contained in the x time-series in order to forecast the objective variable, G. Fig. 1 outlines a schematic view of the model.
An ANN model is a non-linear modelling technique with a network architecture that mimics the biological structure of our nervous system [44]. It has interconnected inputs that are related to G and is able to transmit information through weighted connections (i.e. functional neurons) to map the predictor data features nonlinearly to a high-dimensional hyper-plane.
This study employed a popular ANN algorithm, Feed-Forward Back-Propagation (FFBP), which contains multilayer perceptron neurons and has been applied in problems of forecasting solar radiation (e.g., [49,72]). The FFBP is superior to the other category of ANN models [89–93], where the neuronal architecture is designed to successively validate the model's parameters (i.e., weighted connections and neuron biases) to drive the empirical error to a set tolerance through each iteration (epoch) of the forward passing of the updated parameters and backward propagation of errors to fine-tune them. Mathematically, the ANN algorithm can be written as [50,93,94]:
$$y(x) = F\left(\sum_{i=1}^{L} w_i(t)\, x_i(t) + b\right) \tag{4}$$
where $x_i(t)$ = predictor (input) variable(s) in discrete time space t, $y(x)$ = forecasted G in the cross-validation (test) data set, L = number of hidden neurons determined iteratively, $w_i(t)$ = weight that connects the ith neuron in the input layer, b = neuronal bias and F(.) is the hidden transfer function.
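As a rough illustration of Eq. (4), the sketch below evaluates a weighted sum plus bias passed through a transfer function, then chains a small hidden layer into one linear output neuron. It is a minimal sketch in Python rather than the MATLAB toolbox used in the paper: the function names (`tansig`, `neuron`, `forward`), the single linear output neuron, the five scaled predictor values and all weights and biases are assumptions made for illustration, not values or design choices reported by the authors.

```python
import math


def tansig(z):
    """Tangent sigmoid transfer function, 2 / (1 + exp(-2z)) - 1 (same as tanh)."""
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0


def neuron(x, w, b, transfer=tansig):
    """Eq. (4) for one neuron: F(sum_i w_i * x_i + b)."""
    return transfer(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer of neurons feeding a single linear output neuron (assumed layout)."""
    hidden = [neuron(x, w, b) for w, b in zip(hidden_w, hidden_b)]
    return neuron(hidden, out_w, out_b, transfer=lambda z: z)


# Hypothetical, already-scaled predictors: LST, month, altitude, latitude, longitude.
x = [0.62, 0.50, 0.10, -0.35, 0.80]
# Illustrative weights and biases; in practice these come from training.
hidden_w = [[0.4, -0.2, 0.1, 0.3, -0.5],
            [-0.3, 0.6, 0.2, -0.1, 0.4]]
hidden_b = [0.05, -0.10]
out_w = [0.7, -0.4]
out_b = 0.2

g_scaled = forward(x, hidden_w, hidden_b, out_w, out_b)
print(f"scaled forecast of G: {g_scaled:.4f}")
```

Training (back-propagation of errors to adjust the weights and biases) is deliberately left out; the sketch only shows the forward pass that Eq. (4) describes.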
As an ANN model is a black-box and does not identify the training algorithm in an explicit manner without an iterative model identification process, this study has trialed several algorithms [93] whose performances were assessed to select the superior model. The MATLAB-based algorithms used in this paper are classified in three categories: quasi-Newton [95] (which utilizes the trainlm and trainbfg functions), gradient descent (traingdx) [96] and conjugate gradient [97,98] (trainscg, traincgf and traincgp). The quasi-Newton method is based on the Levenberg-Marquardt (LM) and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms [99,100] that minimize the mean square error, where the LM algorithm locates the minimum of a function that is expressed as the sum of squares of non-linear real-valued functions [101]. Both exhibit a memory overhead issue due to the gradient and Hessian matrix that need to be calculated [102], and this is especially a disadvantage for very large networks [103]. BFGS approximates Newton's method, a hill-climbing optimisation approach that seeks a stationary point of a (twice continuously differentiable) function. It has good performance for non-smooth optimizations [104], where the Hessian matrix is not evaluated directly but is instead approximated by low-rank updates specified by gradient evaluations. The scaled conjugate gradient algorithm updates weights/biases based on conjugate directions without performing a line search [105], while conjugate gradient backpropagation with Powell-Beale restarts [106] and with Fletcher-Reeves updates [96] is able to train the network as long as its weight, input and transfer functions have derivatives.
Resilient backpropagation [107] is generally fast, requires modest memory and does not store the updated values of weight/bias, while the gradient descent with momentum and adaptive learning rate (traingdx) [107] combines adaptive learning with momentum training, where the momentum coefficient is included as a training parameter [103]. Others use the one-step secant (trainoss) [108] and Bayesian regularization (trainbr) functions. One-step secant faces a smaller computation overhead, without storing the Hessian matrix but rather assuming it at each iteration as a compromise between the quasi-Newton and gradient algorithms [108], while Bayesian regularization (trainbr) [109,110] uses the Jacobian matrix to update weights/biases to attain the best generalization of the input/target dataset.

Considering the black-box nature of an ANN model, a primal task for modelers is to determine the appropriate transfer function, as this is not known a priori. A series of equations for F(.), available in the MATLAB toolbox, can be trialled [111]:

Tangent Sigmoid:
$$F(x) = \frac{2}{1 + \exp(-2x)} - 1 \tag{5.1}$$

Log Sigmoid:
$$F(x) = \frac{1}{1 + \exp(-x)} \tag{5.2}$$

Soft Max:
$$F(x) = \frac{\exp(x)}{\operatorname{sum}(\exp(x))} \tag{5.3}$$

Hard-Limit:
$$F(x) = 1, \ \text{if } x > 0 \tag{5.4}$$

Positive Linear:
$$F(x) = x \ \text{if } x \ge 0, \ \text{or } 0 \ \text{otherwise} \tag{5.5}$$

Triangular Basis:
$$F(x) = 1 - \operatorname{abs}(x), \ \text{if } -1 \le x \le 1, \ \text{or } 0 \ \text{otherwise} \tag{5.6}$$

Radial Basis:
$$F(x) = \exp(-x^2) \tag{5.7}$$

Symmetric Hardlimit:
$$F(x) = 1, \ \text{if } x \ge 0 \tag{5.8}$$

Saturating Linear:
$$F(x) = 0 \ \text{if } x \le 0; \quad F(x) = x \ \text{if } 0 < x \le 1; \quad F(x) = 1 \ \text{if } x \ge 1 \tag{5.9}$$

Symmetric Saturating Linear:
$$F(x) = -1 \ \text{if } x \le -1; \quad F(x) = x \ \text{if } -1 < x \le 1; \quad F(x) = 1 \ \text{if } x > 1 \tag{5.10}$$

where x is the predictor dataset analysed in accordance with the function F(x) that is able to map the predictive features to create a hidden layer weight for the suitable model.

2.3.
Multiple linear regression

To evaluate the veracity of the ANN model, multiple linear regression (MLR), a statistical technique that examines the cause and effect relationship between objective (y ≡ G) and predictor variables (x), was employed. MLR is an extension of the simple regression model to the case of multiple predictors, where the goal is to deduce a model that explains as much as possible of the variations in the predictor dataset and to determine the corresponding regression coefficients. MLR ensures that the predictive model leaves as little variation as possible due to the unexplained "noise" in the predictor data.

Fig. 1. ANN architecture adopted for forecasting monthly and seasonal global incident solar radiation (G).

R.C. Deo, M. Şahin, Renewable and Sustainable Energy Reviews 72 (2017) 828–848

For N observations of k predictor variables, an MLR model has a regression
equation of the form [112,113]:

$$Y = C + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \tag{6}$$

where Y (N × 1) is a vector of the objective variable (G), X (N × k) is a matrix of predictor variable(s), C is the y-intercept and β is the multiple regression coefficient for each regressor variable [49,114]. Note that the magnitude of β for each predictor variable is estimated through least squares (e.g., [115,116]). For forecasting purposes, the multiple linear equation is fitted to a model with a set of Y and X matrices in the data training period. Next, the fitted MLR model, by virtue of its coefficients and the y-intercept, is used to generate the forecasts of Y values with an additional set of X values in the cross-validation (or testing) period. For more details on the MLR modelling process, readers can refer to the work of Draper and Smith [113] and Montgomery and Peck [112].

2.4. Auto-Regressive Integrated Moving Average model

This study has also adopted the Auto-Regressive Integrated Moving Average (ARIMA) model, which operated via a set of (univariate) predictor data partitioned into an input/target subset, to validate the ANN model. As an ARIMA model uses its own time-lagged information and the respective model errors, it has an ability to identify the complex patterns in the original G data, and therefore can be advantageous when multiple predictor data in other models (e.g., ANN or MLR) are not available [117].

An ARIMA modelling process is governed by the parameters (p, d, q), with p as the number of autoregressive terms, d as the number of non-seasonal differences and q as the number of lagged errors. The steps in developing an ARIMA model involve model identification, estimation and forecasting.
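To make the role of the differencing order d concrete, the sketch below applies the non-seasonal differencing operator d times to a short series. The helper name `difference` and the sample values are hypothetical (they are not data from the study); the example only illustrates that a series with a quadratic trend becomes constant after two differences.

```python
def difference(series, d=1):
    """Apply the non-seasonal differencing operator (1 - B)^d to a list of values."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by the change between consecutive values.
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


# Hypothetical values with a quadratic trend (illustrative only).
g = [t * t for t in range(6)]
first = difference(g, d=1)
second = difference(g, d=2)
print(first, second)
```

Each differencing pass shortens the series by one value, which matters for short records such as two years of monthly data.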
An ARIMA (p, d, q) process is thus defined as [117]:

$$\psi_p(B)\,(1 - B)^d\, Y_t = \delta + \theta_q(B)\,\varepsilon_t \tag{7}$$

where Y_t is the original predictor dataset, ε_t is the random perturbation (white noise with a zero mean, zero covariance and constant variance), B is the backshift operator, δ is the constant value, ψ_p is the autoregressive parameter of order p, θ_q is the moving average parameter of order q and d is the differencing order used for the regular or non-seasonal part of the series.

In model identification, the differencing parameter (d) should be finalized by the autocorrelation and partial autocorrelation functions, whose 'tailing off' trend is checked to confirm whether differencing is necessary in the case of a non-stationary dataset. The p and q terms are identified for 'trial' models by analyzing maximum likelihood estimation (MLE), which determines the parameters that maximize the probability of obtaining the data using least squares. The maximum log likelihood, that is, the logarithm of the probability of the observed data coming from the estimated model, is used when finding parameter estimates. Akaike's Information Criterion (AIC) is used to establish the ARIMA model, considering the magnitude of MLE and the variance and correlation coefficients collectively assessed in the training data, viz. [118]:

$$AIC = -2\log(L) + 2(p + q + k + 1) \tag{8}$$

where L is the likelihood of the data, k = 1 if c ≠ 0 and k = 0 if c = 0. The last term in brackets is the number of parameters (including the variance of the residual).

3. Materials and methods

3.1. Study region and climate data

The study sites are located in the 'sunshine State' of Queensland, Australia's second largest state, which covers 1.9 million km² of land and is home to 4.5 million citizens. In this state, there is an urgent need for mapping the sustainability of solar energy projects [67,119]. A state level effort shows that solar energy is an integral focus of sustainable energy projects through the Queensland Office of Clean Energy (OEC).
The Solar Bonus Scheme, for example, contributed to 30,000 customers installing rooftop photovoltaic systems. Several initiatives require quantitative assessment of solar radiation (e.g., a package valued at A$115 million to fund a Virtual Solar Power Station of 500 MW in towns and communities, the A$60 million Solar Hot Water Rebate Scheme worth A$600–1000 per solar hot water installation, the A$5.8 million photovoltaic systems for 420 kindergartens, the A$9.9 million Solar Sport and Communities initiative to support sports clubs and community organizations and the A$35.4 million for the Kogan Creek Solar Thermal systems installation). The A$60 million Solar and Energy Efficiency in State Schools program and the State's provision of A$5 million for the Federal Government's Solar Cities Program are noteworthy investments in (non-metropolitan) regions [119]. Thus, the development of a solar model, especially in remote Queensland, is a justified research endeavour.

Fig. 2 shows the seven groups of training and cross-validation study sites (named Group 1; Group 2; Group 3; Group 4; Group 5; Group 6 and Group 7) that utilised data for forecasting the monthly and seasonal global solar radiation (G). As outlined in Section 2.1, the primary predictor dataset for these stations was acquired from the MODIS Terra sensor of the NASA satellites, while the cross-validation (test) data were acquired from the Scientific Information for Land Owners (SILO) archives [120]. It is noteworthy that, for data quality assurance purposes, the Australian Bureau of Meteorology (BOM), which has sourced the SILO data, continually assesses the reliability of its data networks to ensure that an accurate, effective and cost-efficient mechanism is in operation [121]. In principle, the SILO database was developed by the Queensland Department of Environment and Resource Management from observational records. The missing values were interpolated in accordance with statistical techniques [122–124].
Table 1 lists the (two) training and (one) cross-validation sites allocated to each of the seven groups, constructed from 21 sites spread across regional Queensland (Fig. 2). The LST data were acquired for the period 2012–2014 and were partitioned into the first two years (training) and the remaining one year for cross-validation (testing) purposes. In terms of the agreement between the LST and G values for training and cross-validation, the percentage difference was ≈0.41 (lowest) to 1.27% (highest) and ≈0.15 (lowest) to 5.41% (highest), respectively (Table 1), when individual groups were analysed.

[Fig. 2: map of the study sites; axes: Longitude (140°E to 152°E) and Latitude (12°S to 28°S); site groups labelled G1 to G7.]

Fig. 2. The seven groups of training and cross-validation study sites (Group 1; Group 2; Group 3; Group 4; Group 5; Group 6 and Group 7) for forecasted monthly and seasonal global solar radiation (G). Each group has 3 nearby sites: data for 2 study sites (2012–2013) are used for model development and data for 1 study site (2014) are used for cross-validation.

The
difference between the minimum, maximum, mean and standard deviation of the training (LST_T) and cross-validation (LST_CV) datasets for all seven groups of stations was ≈0.68%, 0.97%, 6.79% and 4.96%, respectively (Table 2).

Fig. 3(a) plots a monthly cycle of the LST data averaged for the 14 out of 21 study sites used in model development and the G at ground level for the seven study sites used for the model cross-validation. Fig. 3(b) shows a scatterplot of LST versus G. It is noticeable that the monthly pattern in the LST data closely follows the trends observed in the respective G values, where a correlation coefficient of r² = 0.8373 is evident. However, when only the LST data for cross-validation study sites are compared (Fig. 4a), r² = 0.787 is obtained, whereas the correlation of the G and LST data for cross-validation study sites yields an r² value of ≈0.836 (Fig. 4b). The statistics confirmed that there is a high degree of statistical agreement between satellite-derived and ground-based LST for training and cross-validation study sites, where ≈88.7% of the statistical variance observed in the G data is explained by the variance in satellite-derived (land-surface) LST. Although the comparison is grounded on linear assumptions, it does provide first order justification of the suitability of LST as a predictor variable for forecasting solar radiation.

3.2. Predictive model development

To develop a robust ANN-based forecasting model, a cardinal task was to optimise the architecture of the model to utilize the cause-and-effect relationships between the inputs and the objective variable [91]. Unlike previous works (e.g., [7,15,38,49]), where a predictive model was developed using data partitioned into input/target subsets for a site where the forecasting was also performed, in this paper the ANN and MLR models were trained from all data pooled in a single matrix (i.e.
for all 14 study sites over 2012–2013) but the final model was applied to simulate the G values independently of this set (over 2014) for seven cross-validation (test) sites located in close proximity to the two training sites in each cross-validation group (see Table 1). It is noteworthy that the use of only two years of predictor data allowed an evaluation of the parsimonious nature of the models to validate whether they accomplished a desired accuracy level. Applying a globally trained model, rather than individually trained models for each site, can be practically useful in forecasting G at a site where a satellite footprint may not be available but the predictive features can be identified from nearby site(s) of similar climate (Table 1). As a noteworthy point, the use of pooled data to achieve a universal model, rather than a series of models per site, also ensured a robust model for spatial application of the approach.

The inputs were also scaled to avoid numeric issues caused by data attributes or large fluctuations [125], using a normalization between 0 and 1:

$$x_{normalized} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{9}$$

where x = any datum (input or output), x_min = minimum value of the entire dataset, x_max = maximum value of the entire dataset, and x_normalized = normalized value of the datum point.

Table 1 Geographical characteristics of study sites with monthly mean satellite-derived land-surface temperature (LST) and solar radiation (G) for training (2012–2013) and cross-validation (2014). Distance between training/cross-validation sites in each group is indicated.
| Group | Site (training unless marked) | Station ID | Distance to cross-validation site (km) | Location | Altitude (m) | LST (K), training 2012–2013 | LST (K), cross-validation 2014 | G (MJ m−2 day−1), training 2012–2013 | G (MJ m−2 day−1), cross-validation 2014 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Gayndah Flume TM | 39323 | 1.3 | 151.60°E, 25.62°S | 85.0 | 302.43 | | 19.50 | |
| 1 | Gayndah P Office | 39039 | 0.5 | 151.61°E, 25.63°S | 107.5 | | | | |
| 1 | Cross-Validation: Gayndah | 39191 | | 151.61°E, 25.63°S | 85.0 | | 306.25 | | 19.33 |
| 2 | Emerald Radar AL | 35146 | 6.9 | 148.24°E, 23.55°S | 188.0 | 308.70 | | 20.60 | |
| 2 | Emerald Radar | 35277 | 6.9 | 148.24°E, 23.55°S | 187.0 | | | | |
| 2 | Cross-Validation: Emerald Airport | 35264 | | 148.18°E, 23.57°S | 189.4 | | 311.80 | | 20.57 |
| 3 | Tambo P Office | 35069 | 0.22 | 146.26°E, 24.88°S | 395.4 | 308.60 | | 21.02 | |
| 3 | Tambo Station | 35072 | 2.2 | 146.28°E, 24.89°S | 400.0 | | | | |
| 3 | Cross-Validation: Tambo | 35284 | | 146.26°E, 24.88°S | 387.0 | | 311.74 | | 20.75 |
| 4 | Blackall Airport | 36034 | 3.1 | 145.43°E, 24.43°S | 291.0 | 309.74 | | 21.40 | |
| 4 | Blackall Township | 36143 | 1.1 | 145.47°E, 24.42°S | 284.0 | | | | |
| 4 | Cross-Validation: Blackall | 36155 | | 145.46°E, 24.43°S | 276.0 | | 311.59 | | 21.44 |
| 5 | 27 Mile Garden | 44193 | 2.6 | 146.42°E, 26.08°S | 223.6 | 306.86 | | 20.95 | |
| 5 | 27 Mile Garden | 44201 | 34.1 | 146.42°E, 26.08°S | 232.6 | | | | |
| 5 | Cross-Validation: Barradeen | 44204 | | 146.42°E, 26.06°S | 339.0 | | 308.12 | | 20.33 |
| 6 | Charleville Aero | 44021 | 2.199 | 146.26°E, 26.41°S | 301.6 | 307.05 | | 20.92 | |
| 6 | Charleville | 44156 | 0 | 146.24°E, 26.40°S | 287.8 | | | | |
| 6 | Cross-Validation: Charleville TM | 44205 | | 146.24°E, 26.40°S | 288.0 | | 307.37 | | 20.15 |
| 7 | St George | 043053 | 2.263 | 148.56°E, 28.05°S | 200.0 | 306.70 | | 20.51 | |
| 7 | St George TM | 043104 | 1.545 | 148.57°E, 28.07°S | 186.0 | | | | |
| 7 | Cross-Validation: St George Airport | 043109 | | 148.59°E, 28.05°S | 198.5 | | 304.22 | | 19.46 |

Table 2 Descriptive statistics of monthly predictor (LST) and objective (solar radiation, G) variable averaged for all 7 groups.

| Statistical property | LST_T | LST_cv | G_cv |
|---|---|---|---|
| Minimum | 291.81 K | 293.82 K | 12.90 W m−2 day−1 |
| Maximum | 322.32 K | 325.47 K | 28.00 W m−2 day−1 |
| Mean (a) | 0.503 | 0.471 | 0.486 |
| Standard Deviation | 0.275 | 0.262 | 0.285 |
| Skewness | −0.167 | −0.076 | 0.020 |
| Flatness | −0.982 | −0.965 | −1.260 |

(a) denotes a normalized property between [0,1] to allow comparisons between LST and G_cv. LST_T: land surface temperature for training; LST_cv: land surface temperature for cross-validation; G_cv: global solar radiation for cross-validation.
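The min–max scaling of Eq. (9), and its inverse for converting a normalized forecast back to physical units, can be sketched as follows. The function names and the sample temperatures are hypothetical and serve only as an illustration; the minimum and maximum are taken over whatever list is passed in, which stands in for the "entire dataset" of Eq. (9).

```python
def minmax_normalise(values):
    """Scale values to [0, 1] with x_norm = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalise(scaled, lo, hi):
    """Invert the scaling so that outputs return to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical land-surface temperatures in kelvin (illustrative only).
lst = [295.0, 300.0, 310.0, 320.0]
lst_norm, lst_min, lst_max = minmax_normalise(lst)
lst_back = denormalise(lst_norm, lst_min, lst_max)
print(lst_norm)
```

Keeping the training minimum and maximum, as returned here, is what allows later data to be scaled consistently with the data the model was trained on.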
All predictive models were developed in MATLAB software using 'pooled' LST data (with latitude, longitude, altitude and month) for the training stations (i.e., Gayndah Flume TM, Gayndah Post Office, Emerald Radar Al, Emerald Radar, Tambo Post Office, Tambo Station, Blackall Airport, Blackall Township, 27 Mile Garden, 27 Mile Garden TM, Charleville Aero, Charleville, St George, St George TM). The models were cross-validated at the test sites (i.e., Gayndah, Emerald Airport, Tambo, Blackall, Barradeen, Charleville TM, St George Airport).

For the MLR model development, a similar process was adopted to acquire the regression coefficients and y-intercepts for the month, altitude, latitude, longitude and LST data. Then the model was applied at the seven cross-validation sites (i.e., Gayndah, Emerald Airport, Tambo, Blackall, Barradeen, Charleville TM, St George Airport), resulting in a single predictive model trained using all forecasting algorithms (Table 1).

Fig. 3. (a) Monthly cycle of main predictor (satellite-derived land-surface temperature, LST) averaged for the group of 14 study sites used in model development phase (right axis) and the measured G at the ground level for 7 cross-validation sites (left axis). (b) Scatterplot of LST versus G.

Fig. 4. Comparison of the normalized satellite-derived LST with ground-based G in training (T) and cross-validation (CV) sets. (a) LST_cv versus LST_T. (b) G_cv versus LST_CV. Note: Training data for stations in each group were averaged over the monthly period and then plotted with the respective cross-validation data; data were normalized within [0,1].

Fig. 5. Cross-correlation coefficient (r_cross) between G and predictor variable (LST) in training period (2012–2013) for 7 groups of stations. (a) Monthly, (b) Seasonal. The 95% confidence interval for r_cross is indicated in blue and only positive r_cross values are plotted. Note: Green circles indicate statistically significant lags. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

To determine the inputs for the ANN/MLR models, the cross correlation between the LST and G data for all 14 training stations was examined (Fig. 5). Specifically, the cross correlations measured the statistical similarity between the inputs (x) and a shifted (lagged) copy of the output (G) as a function of the particular lag. For any discrete signal, the correlation between x_i = (x_1, x_2, …, x_{M−1}) and y = (y_1, y_2, …, y_{N−1}) is:

$$\phi_{xy,k} = \sum_{j=\max(0,k)}^{\min(M-1+k,\,N-1)} x_{j-k}\, y_j, \qquad k = -(M+1), \dots, 0, \dots, (N-1) \tag{10}$$

where the cross correlation coefficient, r_cross, is:
$$r_{cross}(t) = \frac{\phi_{xy}(t)}{\sqrt{\phi_{xx}(0)\,\phi_{yy}(0)}} \tag{11}$$

r_cross(t) is bounded by [−1, 1], with 1 indicating that both time-series have an exact shape (although amplitudes can vary) while −1 indicates that both time-series have the same shape but possess opposite signs. N and M are the lengths of the predictor and predictand data, respectively. When r_cross(t) = 0, both signals are uncorrelated, but if r_cross(t) ≥ 0.70, a good match between the two signals can be identified. Evidently, the LST versus the G data at lags of zero, 1 and 2 were statistically significant at the 95% confidence interval (Fig. 5a) for the monthly horizon, and the LST versus the G data at zero lag were statistically significant for the seasonal forecasting horizon (Fig. 5b).

Subsequently, for the monthly forecasting horizon, the ANN model was developed using LST(t), month, latitude, longitude and altitude with the lagged combinations LST(t – 1), LST(t – 2), LST(t – 3) and LST(t – 4). It is noteworthy that, while a maximum lag of 2 was significant, two additional lagged predictors (LST(t – 3) and LST(t – 4)) were also considered as predictor variables to check if an improvement in the forecasting accuracy was attained (Table 3a–b). For seasonal forecasting, both models were developed using the LST(t) data and the respective supplementary variables (Table 3c). A total of 45 ANN models with the LST(t) predictor and seven models with the LST(t – n) predictors (where n varied from 1 to 4 for different lags) for monthly forecasting (Table 3a–b), and nine models for seasonal forecasting, were constructed. All models were optimized by alternately testing the training algorithm and hidden transfer function in different combinations, as detailed in Table 3(a–b); the transfer functions are also stated in Eqs. (5.1–5.10).
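A minimal sketch of the lagged cross-correlation in Eqs. (10) and (11) is given below. The function names and the short sample series are hypothetical; the code follows the summation limits of Eq. (10) as written and, like the equation, does not remove the series means beforehand (an analyst might choose to do so as a separate preprocessing step).

```python
import math


def phi(x, y, k):
    """Raw cross-correlation sum over j of x[j - k] * y[j] at lag k (Eq. 10)."""
    m, n = len(x), len(y)
    lo = max(0, k)
    hi = min(m - 1 + k, n - 1)
    return sum(x[j - k] * y[j] for j in range(lo, hi + 1))


def r_cross(x, y, k):
    """Normalised cross-correlation coefficient at lag k (Eq. 11)."""
    denom = math.sqrt(phi(x, x, 0) * phi(y, y, 0))
    if denom == 0.0:
        raise ValueError("a series of zeros has no defined correlation")
    return phi(x, y, k) / denom


# Hypothetical short series (illustrative only).
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]
for lag in range(0, 3):
    print(lag, r_cross(x, y, lag))
```

With identical series the coefficient equals 1 at zero lag and shrinks as fewer overlapping terms contribute at larger lags.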
The ANN model had a three-layer network designed with an input layer (where predictors were fed), a learning layer (where the transfer function was applied to extract features to formulate a model with the lowest mean square error) and the output space (with forecasted G) (Table 3). The number of input neurons (x) was 8 (denoted as x1, x2, x3, …, x8) for the monthly and 5 for the seasonal forecasting horizons, where 1 neuron was assigned for the month (for solar periodicity), between 5 and 8 neurons for the LST and the remaining 3 neurons for the latitude, longitude and altitude as the input variables. In data-driven models, there is no set rule to attest the best transfer function, as the predictive features are not known a priori.

Table 3 Development of ANN model.

(a) Monthly forecasting with LST at zero lag (n=0)

| Trial model | Algorithm | Transfer function | Model architecture (Input-Hidden-Output) | Correlation coefficient (r) | Root mean square error, RMSE (MJ m−2 day−1) |
|---|---|---|---|---|---|
| T1 | trainlm | logsig | 5–18–1 | 0.946 | 1.552 |
| T2 | trainlm | logsig | 5–24–1 | 0.942 | 1.530 |
| T3 | trainlm | logsig | 5–46–1 | 0.932 | 1.709 |
| T4 | trainlm | tansig | 5–8–1 | 0.947 | 1.583 |
| T5 | trainlm | tansig | 5–26–1 | 0.948 | 1.611 |
| T6 | trainlm | tansig | 5–28–1 | 0.948 | 1.575 |
| T7 | trainlm | tansig | 5–38–1 | 0.945 | 1.538 |
| T8 | trainlm | tansig | 5–50–1 | 0.940 | 1.634 |
| T9 | trainbfg | tansig | 5–2–1 | 0.953 | 1.584 |
| T10 | trainbfg | tansig | 5–8–1 | 0.956 | 1.539 |
| T11 | trainbfg | tansig | 5–30–1 | 0.956 | 1.640 |
| T12 | trainbfg | tansig | 5–48–1 | 0.954 | 1.641 |
| T13 | trainbfg | logsig | 5–12–1 | 0.950 | 1.634 |
| T14 | trainbfg | logsig | 5–46–1 | 0.952 | 1.634 |
| T15 | trainrp | logsig | 5–42–1 | 0.831 | 2.503 |
| T16 | trainrp | tansig | 5–28–1 | 0.778 | 2.694 |
| T17 | trainscg | tansig | 5–2–1 | 0.955 | 1.648 |
| T18 | trainscg | tansig | 5–18–1 | 0.940 | 1.674 |
| T19 | trainscg | tansig | 5–30–1 | 0.949 | 1.601 |
| T20 | trainscg | logsig | 5–14–1 | 0.955 | 1.674 |
| T21 | trainscg | logsig | 5–48–1 | 0.943 | 1.795 |
| T22 | traincgb | logsig | 5–6–1 | 0.946 | 1.735 |
| T23 | traincgb | logsig | 5–38–1 | 0.944 | 1.673 |
| T24 | traincgb | tansig | 5–4–1 | 0.951 | 1.623 |
| T25 | traincgb | tansig | 5–40–1 | 0.947 | 1.695 |
| T26 | traincgf | tansig | 5–6–1 | 0.952 | 1.555 |
| T27 | traincgf | tansig | 5–28–1 | 0.954 | 1.603 |
| T28 | traincgf | logsig | 5–18–1 | 0.944 | 1.653 |
| T29 | traincgf | logsig | 5–42–1 | 0.939 | 1.768 |
| T30 | traincgp | logsig | 5–2–1 | 0.954 | 1.508 |
| T31 | traincgp | logsig | 5–32–1 | 0.942 | 1.737 |
| T32 | traincgp | tansig | 5–6–1 | 0.950 | 1.615 |
| T33 | traincgp | tansig | 5–28–1 | 0.946 | 1.674 |
| T34 | traincgp | tansig | 5–42–1 | 0.936 | 1.679 |
| T35 | trainoss | tansig | 5–14–1 | 0.953 | 1.658 |
| T36 | trainoss | tansig | 5–18–1 | 0.946 | 1.634 |
| T37 | trainoss | tansig | 5–46–1 | 0.951 | 1.603 |
| T38 | trainoss | logsig | 5–4–1 | 0.944 | 1.682 |
| T39 | trainoss | logsig | 5–26–1 | 0.946 | 1.673 |
| T40 | traingdx | logsig | 5–12–1 | 0.931 | 1.860 |
| T41 | traingdx | logsig | 5–36–1 | 0.926 | 1.769 |
| T42 | traingdx | logsig | 5–48–1 | 0.948 | 1.699 |
| T43 | traingdx | tansig | 5–4–1 | 0.947 | 1.741 |
| T44 | traingdx | tansig | 5–6–1 | 0.941 | 1.688 |
| T45 | traingdx | tansig | 5–50–1 | 0.920 | 1.856 |

(b) Monthly forecasting with n-lagged LST combinations (n = 1, 2, 3, 4). Note that the '4' in (n + 4) represents the mandatory inputs defined by the altitude, latitude, longitude and the month (Fig. 1).

| Trial model | Algorithm | Transfer function | Model architecture | r | RMSE |
|---|---|---|---|---|---|
| T1 | trainlm | tansig | (n+4)–2–1 | 0.931 | 1.683 |
| T2 | trainlm | tansig | (n+4)–6–1 | 0.946 | 1.589 |
| T3 | trainlm | logsig | (n+4)–14–1 | 0.941 | 1.577 |
| T4 | trainscg | tansig | (n+4)–8–1 | 0.942 | 1.608 |
| T5 | traincgb | logsig | (n+4)–10–1 | 0.924 | 1.879 |
| T6 | trainoss | tansig | (n+4)–16–1 | 0.933 | 1.753 |
| T7 | trainoss | tansig | (n+4)–18–1 | 0.920 | 2.000 |

(c) Seasonal forecasting with LST at zero lag (n=0).

| Trial model | Algorithm | Transfer function | Model architecture | r | RMSE |
|---|---|---|---|---|---|
| T1 | trainlm | tansig | 5–2–1 | 0.985 | 1.191 |
| T2 | trainlm | tansig | 5–4–1 | 0.969 | 1.174 |
| T3 | trainlm | tansig | 5–10–1 | 0.981 | 1.040 |
| T4 | trainlm | tansig | 5–26–1 | 0.970 | 1.189 |
| T5 | trainlm | tansig | 5–38–1 | 0.969 | 1.289 |
| T6 | trainlm | logsig | 5–4–1 | 0.973 | 1.132 |
| T7 | trainlm | logsig | 5–8–1 | 0.969 | 1.162 |
| T8 | trainlm | logsig | 5–10–1 | 0.975 | 1.037 |
| T9 | trainlm | logsig | 5–42–1 | 0.964 | 1.166 |

Note: Optimum model is boldfaced (red); the lagged data applied for monthly forecasting are statistically significant based on cross correlation coefficients between the predictor (LST) and objective variable (G), while the periodicity (month), latitude, longitude and altitude are also used as supplementary predictors in each model. [trainbfg = BFGS quasi-Newton, trainrp = resilient, trainscg = scaled conjugate gradient, trainlm = Levenberg–Marquardt, traincgb = conjugate gradient BP with Powell-Beale restarts, traincgf = conjugate gradient BP with Fletcher-Reeves update, trainoss = one-step secant, trainbr = Bayesian regulation, traingdx = gradient descent with momentum and adaptive learning are the machine learning algorithms; tansig = tangent sigmoid and logsig = logarithmic sigmoid are the hidden transfer functions.]

Following an iterative modelling
process, we trialled several transfer functions (Eqs. (5.1–5.10)) and learning algorithms to optimise the ANN model's accuracy (Table 3a–c). In each trial, the number of neurons in the hidden layer was increased at an interval of one, and a subsequent adjustment in the model was considered. The nearly optimal number of neurons with the relevant transfer function, determined by the mean square error, which was intrinsically disparate for each combination, was adopted. As an additional measure, the correlation coefficient (r) of the trained models was noted to verify the model. In accordance with the lowest RMSE (≈1.577 MJ m−2 day−1) and a high r value (≈0.941), Model T3 with n-lagged combinations of the LST data, executed with the Levenberg–Marquardt (LM) algorithm, logarithmic sigmoid (Eq. (5.2)) and linear output function with an architecture of n–14–1, was adopted for the monthly forecasting horizon. For the seasonal forecasting horizon, the LM training algorithm, logarithmic sigmoid and linear functions with an architecture of 1–10–1 were established. For the MLR model, the regression coefficients and the y-intercept for the LST, month, latitude, longitude and altitude were also determined, as detailed in Table 4.

An ARIMA model was developed using the 'R' software, with its architecture established by an iterative modelling process [91,117] detailed in Section 2.4. Table 5 displays the ARIMA model's architecture and the respective goodness-of-fit tests performed to construct the forecasting model. The ground-based G data for two study sites in each of the seven groups over 2012–2013 were used in the model development. The first step was to examine the stationarity of G via the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Evidently, the monthly solar radiation for all seven groups was non-stationary.
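As a worked illustration of how the AIC of Eq. (8) relates to the goodness-of-fit values reported in Table 5, the sketch below evaluates the criterion from a log likelihood and the model orders. The function name is hypothetical, and k = 0 (no constant term) is an assumption made here; under that assumption the tabulated AIC values for the monthly Group 1 and Group 3 models are recovered, while other rows may differ.

```python
def aic(log_likelihood, p, q, k):
    """Akaike's Information Criterion, AIC = -2 log(L) + 2 (p + q + k + 1)."""
    return -2.0 * log_likelihood + 2.0 * (p + q + k + 1)


# Log likelihoods and orders as listed in Table 5 (monthly forecasting);
# k = 0 is an assumption, not a value stated in the source.
aic_group1 = aic(-49.48, p=3, q=3, k=0)   # ARIMA (3, 1, 3)
aic_group3 = aic(-41.63, p=2, q=3, k=0)   # ARIMA (2, 2, 3)
print(round(aic_group1, 2), round(aic_group3, 2))
```

Because the penalty term grows with p and q, a model with more terms must improve the log likelihood by more than one unit per added parameter to achieve a lower AIC.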
The ARIMA model required the input data to have a constant mean, variance and autocorrelation through time [91,117], so a differencing of the data was applied to make it stationary, which was confirmed by running the auto.arima() function in the R software to yield the lowest standard deviation/AIC and the highest likelihood (Eq. (8)). By contrast, the seasonal data, which were already stationary, did not require any differencing. The number of non-seasonal differences (d) was set to 2 (for monthly) and 0 (seasonal). Following this, an estimation of the autoregressive (p) and moving average (q) terms for each cross-validation site was performed by trialling a set of feasible values to satisfy the training performance criteria according to the log likelihood/AIC in conjunction with the lowest variance and highest correlation coefficient. The model was cross-validated by generating the monthly (12 values) and seasonal (4 values) forecasts. This required the trained model to be applied to the 2014 'test' data using the 'predict()' function. It is noteworthy that the training performance of ARIMA was unique for different sites based on the goodness-of-fit and the statistical test parameters.

3.3. Model performance criteria

Following a review of the model evaluation metrics by the American Society for Civil Engineers (ASCE) [126], two categories of model evaluation measures have surfaced: visual and descriptive statistics (i.e., observed and forecasted data to cross-check the difference in minimum, maximum, mean, variance, standard deviation, skewness, kurtosis) and the standardized metrics applied to the forecasts that are validated with observed data. To establish whether the data-driven models were qualified for forecasting global solar radiation, a range of statistical error criteria was adopted.
They were based on the root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (r), Willmott's Index (WI) and relative (%) error values based on the RMSE and MAE, with their mathematical equations written as follows [15,127,128]:

I. Correlation coefficient (r) expressed as

$$r = \left( \frac{\sum_{i=1}^{N} \left(G_{OBS,i} - \bar{G}_{OBS}\right)\left(G_{FOR,i} - \bar{G}_{FOR}\right)}{\sqrt{\sum_{i=1}^{N} \left(G_{OBS,i} - \bar{G}_{OBS}\right)^2}\ \sqrt{\sum_{i=1}^{N} \left(G_{FOR,i} - \bar{G}_{FOR}\right)^2}} \right) \tag{12}$$

II. Root mean square error (RMSE) expressed as

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(G_{FOR,i} - G_{OBS,i}\right)^2} \tag{13}$$

III. Mean absolute error (MAE) expressed as

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| G_{FOR,i} - G_{OBS,i} \right| \tag{14}$$

IV. Willmott's Index (WI) expressed as

$$WI = 1 - \left[ \frac{\sum_{i=1}^{N} \left(G_{FOR,i} - G_{OBS,i}\right)^2}{\sum_{i=1}^{N} \left( \left|G_{FOR,i} - \bar{G}_{OBS}\right| + \left|G_{OBS,i} - \bar{G}_{OBS}\right| \right)^2} \right], \qquad 0 \le WI \le 1 \tag{15}$$

Table 4 Development of MLR model for monthly and seasonal forecasting. For LST as predictor, lag (n) varies from 0 to 4.

| Model | Predictor variable (input), x | Coefficient name | Monthly forecasting magnitude | Seasonal forecasting magnitude |
|---|---|---|---|---|
| M1 | Periodicity (Month) | β1 | 0.002 | −0.191 |
| M1 | Altitude | β2 | 0.003 | 0.003 |
| M1 | Latitude | β3 | 0.332 | 0.326 |
| M1 | Longitude | β4 | 0.220 | 0.205 |
| M1 | Y-Intercept | C | −198.2 | −194.0 |
| M1 | LST(t) | β5 | 0.532 | 0.524 |
| M2 | Month | β1 | −0.041 | |
| M2 | Altitude | β2 | 0.003 | |
| M2 | Latitude | β3 | 0.331 | |
| M2 | Longitude | β4 | 0.220 | |
| M2 | Y-Intercept | C | −196.9 | |
| M2 | LST(t) | β5 | 0.572 | |
| M2 | LST(t – 1) | β6 | −0.042 | |
| M3 | Month | β1 | −0.024 | |
| M3 | Altitude | β2 | 0.003 | |
| M3 | Latitude | β3 | 0.307 | |
| M3 | Longitude | β4 | 0.228 | |
| M3 | Y-Intercept | C | −195.6 | |
| M3 | LST(t) | β5 | 0.585 | |
| M3 | LST(t – 1) | β6 | −0.074 | |
| M3 | LST(t – 2) | β7 | 0.024 | |
| M4 | Month | β1 | −0.102 | |
| M4 | Altitude | β2 | 0.003 | |
| M4 | Latitude | β3 | 0.245 | |
| M4 | Longitude | β4 | 0.218 | |
| M4 | Y-Intercept | C | −171.4 | |
| M4 | LST(t) | β5 | 0.584 | |
| M4 | LST(t – 1) | β6 | −0.121 | |
| M4 | LST(t – 2) | β7 | 0.136 | |
| M4 | LST(t – 3) | β8 | −0.110 | |
| M5 | Month | β1 | −0.257 | |
| M5 | Altitude | β2 | 0.002 | |
| M5 | Latitude | β3 | 0.054 | |
| M5 | Longitude | β4 | 0.187 | |
| M5 | Y-Intercept | C | −95.69 | |
| M5 | LST(t) | β5 | 0.562 | |
| M5 | LST(t – 1) | β6 | −0.152 | |
| M5 | LST(t – 2) | β7 | 0.070 | |
| M5 | LST(t – 3) | β8 | 0.107 | |
| M5 | LST(t – 4) | β9 | −0.245 | |

V. Relative root mean square error (RRMSE, %) expressed as
$$RRMSE = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(G_{FOR,i} - G_{OBS,i}\right)^2}}{\frac{1}{N} \sum_{i=1}^{N} G_{OBS,i}} \times 100 \tag{16}$$

VI. Mean absolute percentage error (MAPE; %), expressed as

$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{G_{FOR,i} - G_{OBS,i}}{G_{OBS,i}} \right| \times 100 \tag{17}$$

where $G_{OBS,i}$ and $G_{FOR,i}$ are the observed and forecasted ith values of G, $\bar{G}_{OBS}$ and $\bar{G}_{FOR}$ are the observed and forecasted mean G in the cross-validation (test) set and N is the number of datum points in the test set.

In terms of physical reasoning for the performance metrics, it is deducible that the correlation coefficient, bounded by [0,1] where 0 = relatively poor and 1.0 = perfect model, describes the proportion of variance in observed solar radiation that is explained by the ANN, MLR and ARIMA models [128]. The equation for r, however, is based on the consideration of a linear relationship between G_OBS and G_FOR and is therefore limited in its capacity to provide a robust assessment, since it standardizes the observed and forecasted means and variances. However, the RMSE and MAE are able to provide better information about the predictive skill, whereby the RMSE measures the goodness-of-fit relevant to high values whereas the MAE is not weighted towards high(er) magnitude or low(er) magnitude events, but instead evaluates all deviations from the observed values equally and regardless of sign. It is important to note that, while the RMSE can assess the model with a higher level of skill compared to the correlation coefficient, this metric is computed on squared differences. Thus, performance assessment is biased in favour of the peaks and high(er) magnitude events, which will in most cases exhibit the greatest error, and is insensitive to low(er) magnitude sequences [128]. Consequently, the RMSE can be more sensitive than other performance metrics to occasional large errors, as the squaring process can give disproportionate weight to the very large errors [129].
To overcome this issue, Willmott's Index (WI) [130] was computed by considering the ratio of the mean square error instead of the square of the differences [131], providing an advantage over the r, RMSE and MAE values. Considering the geographic differences between the present study sites (Table 1; Fig. 2), which can lead to differences in the distribution of solar radiation data, the relative root mean square error (RRMSE) was also computed [15,132] to compare and evaluate the model over geographically diverse study sites. According to [133], a model's precision level is excellent if RRMSE < 10%, good if 10% < RRMSE < 20%, fair if 20% < RRMSE < 30% and poor if RRMSE > 30%.

4. Results

In this paper, the results attained for appraising an ANN model coupled with satellite-derived LST data for forecasting monthly and seasonal solar radiation over a group of seven cross-validation sites in regional Queensland are presented. The ANN model is evaluated with respect to an MLR and an ARIMA model. To establish whether the ANN was a parsimonious model able to accomplish a desired level of accuracy with as few predictor variables as possible, an iterative modelling process was applied to optimise the model's parameters using input combinations, training algorithms and hidden transfer functions, where the lowest mean square error for the optimum model was sought (Section 3.2). In this section, we first provide the results for monthly forecasting, and then proceed with seasonal forecasting evaluation measures based on a number of statistical performance metrics described in Eqs. (12–17).

In order to compare directly the forecasted and observed monthly G, Fig. 6(a–g) plots the model's error, GFOR – GOBS, for each tested month in the cross-validation (2014) period. A scatterplot showing the goodness-of-fit line and its correlation coefficient (r) to depict the extent of agreement between GFOR and GOBS is included.
There is compelling evidence that the ANN outperforms the MLR and ARIMA models for all tested months. On a month-by-month basis, the forecasting error exhibits a much larger magnitude for the MLR model, especially for the forecasts generated in the months of August and September. Except for Gayndah (Fig. 6a), the ANN model does not result in significantly large errors relative to its comparative counterparts for the other six groups. It is important to note that the ARIMA model generates more accurate forecasts of solar radiation than the MLR model, although its performance remains inferior to the ANN for all tested months. Therefore, it is evident that the ANN model can simulate G-values with good accuracy, also verified by a higher level of agreement (i.e., r value).

Fig. 6. (Left) Time-series of ANN, MLR and ARIMA model monthly forecasting errors and (right) the scatterplot of forecasted and observed monthly G. (a) Group 1 Gayndah, (b) Group 2 Emerald Airport, (c) Group 3 Tambo, (d) Group 4 Blackall, (e) Group 5 Barradeen, (f) Group 6 Charleville, (g) Group 7 St George Airport. For each scatterplot, the least square fitting line and its respective correlation coefficient are shown. Cumulative frequency of ANN, MLR and ARIMA model monthly forecasting errors in G for the seven groups of stations pooled together. Note that the percentage for each bracket is shown in the respective error bracket.

Table 5
ARIMA with structure (p, d, q) with d = differencing, p and q = order of autoregressive (AR) and moving average (MA) term.

| Group | Training stations | ARIMA structure (p, d, q) | AIC | Log likelihood | Variance | Correlation coefficient, r | Parameters (AR1, AR2, AR3, MA1, MA2, MA3), as listed |
|---|---|---|---|---|---|---|---|
| Monthly forecasting | | | | | | | |
| 1 | Gayndah Post Office / Gayndah Flume TM | ARIMA (3, 1, 3) | 112.96 | −49.48 | 2.905 | 0.878 | 0.727, 0.681, −0.918, −1.193, −0.513, 0.770 |
| 2 | Emerald Radar / Emerald Radar AI | ARIMA (3, 2, 3) | 111.13 | −49.56 | 2.575 | 0.893 | 1.218, −0.108, −0.505, −2.874, 2.874, −1.000 |
| 3 | Tambo Post Office / Tambo Station | ARIMA (2, 2, 3) | 95.26 | −41.63 | 1.441 | 0.947 | 1.724, −0.990, −2.873, 2.873, −1.000 |
| 4 | Blackall Airport / Blackall Township | ARIMA (3, 2, 2) | 106.4 | −47.2 | 2.843 | 0.897 | 0.782, 0.553, −0.828, −1.957, 1.000 |
| 5 | 27 Mile Garden / 27 Mile Garden TM | ARIMA (3, 2, 3) | 108.55 | −47.27 | 2.464 | 0.921 | 1.373, −0.393, −0.351, −2.781, 2.778, −0.996 |
| 6 | Charleville Aero / Charleville | ARIMA (2, 2, 3) | 110.78 | −49.39 | 3.133 | 0.899 | 1.727, −0.981, −2.871, 2.871, −1.000 |
| 7 | St George Airport / St George TM | ARIMA (2, 2, 3) | 107.00 | −47.5 | 2.2546 | 0.929 | 1.727, −0.984, −2.880, 2.880, −1.000 |
| Seasonal forecasting | | | | | | | |
| 1 | As above | ARIMA (2, 0, 2) | 32.70 | −10.4 | 0.111 | 0.995 | 0.027, −0.988, −1.962, 0.999 |
| 2 | As above | ARIMA (3, 0, 3) | 13.60 | 1.20 | 0.008 | 0.999 | −0.886, −0.980, −0.908, −1.516, 0.799, 0.091 |
| 3 | As above | ARIMA (3, 0, 2) | −23.10 | 18.50 | 0.000 | 0.999 | −0.996, −0.997, −1.808, 0.998 |
| 4 | As above | ARIMA (3, 0, 2) | 21.90 | −4.00 | 0.009 | 0.998 | −0.734, −0.915, −0.808, −1.808, 1.000 |
| 5 | As above | ARIMA (3, 0, 3) | 18.20 | −1.10 | 0.018 | 0.997 | −0.916, −0.945, −0.973, −1.181, −0.460, 0.725 |
| 6 | As above | ARIMA (3, 0, 3) | 10.80 | 2.60 | 0.006 | 0.995 | −0.952, −0.956, −1.074, −0.561, 0.822 |
| 7 | As above | ARIMA (3, 0, 3) | 36.70 | −10.40 | 0.102 | 0.998 | −0.739, −0.953, −0.786, −1.337, 0.017, 0.526 |

Note: Akaike's Information Criterion (AIC) used to identify model in conjunction with log likelihood, variance and correlation coefficient.

In Table 6(a) we evaluate the preciseness of the ANN model in relation to the MLR model, where the results for a set of five trial performances over the 2014 period with a combination of predictor variables are shown. The number of inputs in each trial model is increased by one to yield a total of five unique models, and performances are validated using r, RMSE and MAE. For all study sites considered, the accuracy of the ANN model appears to have generally improved as the number of predictor variables with lagged input combinations of the LST data fed into the algorithm increases. This can be verified by the notable increase in the correlation coefficient computed between monthly observed and forecasted G and a corresponding decrease in the model's generalization error. However, for the ANN model, the improvement in forecasting accuracy appears to have attained an asymptotic state, with no further increase in the r value or further reduction in RMSE/MAE values with additional combinations of LST after the third lag.

In the case of an optimal ANN model (i.e., M4) applied at the Gayndah station, there is a gradual reduction in MAE (≈1.54 to 0.82 MJ m−2) and an increase in r (≈0.961–0.962) for predictors defined by LST(t), LST(t – 1), LST(t – 2) and LST(t – 3) relative to the single input, LST(t), as the predictor. A similar trend is also evident for the Barradeen, Charleville TM and St George stations, where a total of four predictor variables based on the original and lagged LST data are required to attain an accurate forecast. By contrast, when an ANN model is cross-validated for the Emerald Airport and Blackall stations, only three predictor variables, and for the Tambo site a total of five predictor variables, are necessary for accurate forecasting. In accordance with this, the ANN model for each study site appears to respond differently to the prescribed set of variables, which presumably provide different types of predictive features. In agreement with earlier studies (e.g., [15,132]), the findings suggest that the pertinent features in the prescribed data-driven predictive models must be harvested by trialing different combinations of appropriately chosen variables (e.g., Table 6) in order to optimise the forecasting model. It must also be noted that the different levels of accuracy indicate that the model performance is expected to scale differently for the geographically diverse sites [15] (e.g., Fig. 2; Table 1).

Table 6
Evaluation of monthly forecasting models using correlation coefficient (r), root mean square error (RMSE; MJ m−2 day−1) and mean absolute error (MAE; MJ m−2 day−1). Note that the best model is boldfaced.

(a) ANN and MLR

| Model | Input combinations | ANN r | ANN RMSE | ANN MAE | MLR r | MLR RMSE | MLR MAE |
|---|---|---|---|---|---|---|---|
| Group 1: Gayndah | | | | | | | |
| M1 | LST(t) | 0.961 | 1.73 | 1.54 | **0.957** | **2.07** | **1.80** |
| M2 | LST(t) + LST(t – 1) | 0.939 | 1.88 | 1.28 | 0.806 | 3.21 | 2.58 |
| M3 | LST(t) + LST(t – 1) + LST(t – 2) | 0.949 | 1.55 | 1.18 | 0.801 | 3.17 | 2.50 |
| M4 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) | **0.962** | **1.13** | **0.82** | 0.791 | 3.16 | 2.48 |
| M5 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) + LST(t – 4) | 0.964 | 1.38 | 1.12 | 0.790 | 3.02 | 2.53 |
| Group 2: Emerald Airport | | | | | | | |
| M1 | LST(t) | 0.968 | 1.28 | 1.05 | **0.945** | **2.53** | **2.32** |
| M2 | LST(t) + LST(t – 1) | 0.968 | 1.06 | 0.81 | 0.802 | 3.52 | 2.94 |
| M3 | LST(t) + LST(t – 1) + LST(t – 2) | **0.965** | **1.04** | **0.78** | 0.793 | 3.55 | 2.95 |
| M4 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) | 0.943 | 1.42 | 1.05 | 0.761 | 3.59 | 3.04 |
| M5 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) + LST(t – 4) | 0.969 | 1.11 | 0.82 | 0.799 | 3.00 | 2.60 |
| Group 3: Tambo | | | | | | | |
| M1 | LST(t) | 0.958 | 1.35 | 1.11 | **0.948** | **2.65** | **2.26** |
| M2 | LST(t) + LST(t – 1) | 0.926 | 1.70 | 1.37 | 0.868 | 3.10 | 2.51 |
| M3 | LST(t) + LST(t – 1) + LST(t – 2) | 0.939 | 1.69 | 1.37 | 0.856 | 3.20 | 2.60 |
| M4 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) | 0.900 | 1.85 | 1.35 | 0.814 | 3.33 | 2.76 |
| M5 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) + LST(t – 4) | **0.958** | **1.39** | **1.00** | 0.823 | 2.99 | 2.61 |
| Group 4: Blackall | | | | | | | |
| M1 | LST(t) | 0.968 | 1.13 | 0.91 | **0.965** | **1.57** | **1.18** |
| M2 | LST(t) + LST(t – 1) | 0.959 | 1.18 | 1.01 | 0.836 | 2.46 | 1.92 |
| M3 | LST(t) + LST(t – 1) + LST(t – 2) | **0.975** | **0.93** | **0.62** | 0.828 | 2.53 | 2.01 |
| M4 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) | 0.907 | 1.80 | 1.44 | 0.787 | 2.76 | 2.21 |
| M5 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) + LST(t – 4) | 0.969 | 1.16 | 0.95 | 0.787 | 2.760 | 2.12 |
| Group 5: Barradeen | | | | | | | |
| M1 | LST(t) | 0.949 | 1.53 | 1.24 | **0.938** | **1.78** | **1.371** |
| M2 | LST(t) + LST(t – 1) | 0.942 | 1.77 | 1.33 | 0.881 | 2.29 | 1.839 |
| M3 | LST(t) + LST(t – 1) + LST(t – 2) | 0.944 | 1.82 | 1.58 | 0.874 | 2.35 | 1.887 |
| M4 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) | **0.953** | **1.31** | **0.98** | 0.853 | 2.50 | 2.014 |
| M5 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) + LST(t – 4) | 0.954 | 1.36 | 1.06 | 0.847 | 2.49 | 2.016 |
| Group 6: Charleville TM | | | | | | | |
| M1 | LST(t) | 0.944 | 1.72 | 1.40 | **0.919** | **2.05** | **1.69** |
| M2 | LST(t) + LST(t – 1) | 0.940 | 1.83 | 1.43 | 0.814 | 2.78 | 2.31 |
| M3 | LST(t) + LST(t – 1) + LST(t – 2) | 0.938 | 1.66 | 1.41 | 0.806 | 2.83 | 2.35 |
| M4 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) | **0.961** | **1.42** | **0.95** | 0.773 | 3.03 | 2.46 |
| M5 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) + LST(t – 4) | 0.953 | 1.45 | 0.99 | 0.811 | 2.80 | 2.37 |
| Group 7: St George Airport | | | | | | | |
| M1 | LST(t) | 0.944 | 1.66 | 1.36 | **0.926** | **0.65** | **0.92** |
| M2 | LST(t) + LST(t – 1) | 0.963 | 1.36 | 1.12 | 0.880 | 1.66 | 0.923 |
| M3 | LST(t) + LST(t – 1) + LST(t – 2) | 0.960 | 1.44 | 1.20 | 0.922 | 1.70 | 1.32 |
| M4 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) | **0.967** | **1.38** | **0.96** | 0.895 | 1.98 | 1.56 |
| M5 | LST(t) + LST(t – 1) + LST(t – 2) + LST(t – 3) + LST(t – 4) | 0.952 | 1.16 | 1.01 | 0.899 | 2.01 | 1.52 |
| Overall Average | | 0.963 | 1.23 | 1.02 | 0.943 | 1.90 | 1.65 |

(b) ARIMA

| Cross-validation stations | r | RMSE | MAE |
|---|---|---|---|
| Group 1: Gayndah | 0.937 | 1.56 | 1.35 |
| Group 2: Emerald Airport | 0.961 | 1.13 | 0.87 |
| Group 3: Tambo | 0.945 | 1.57 | 1.35 |
| Group 4: Blackall | 0.823 | 2.81 | 2.29 |
| Group 5: Barradeen | 0.926 | 2.00 | 1.61 |
| Group 6: Charleville TM | 0.940 | 2.41 | 1.83 |
| Group 7: St George Airport | 0.933 | 2.09 | 1.40 |
| Overall Average | 0.924 | 1.94 | 1.53 |

As a noteworthy point, the MLR model did not yield a similar result: its performance for the cross-validation sites deteriorated with a larger number of predictors and a successive addition of the lagged LST data (Table 6). Take, for example, Gayndah: the correlation coefficient between the observed and forecasted monthly G decreased from ≈0.957 to 0.790, and the root mean square and mean absolute errors increased from ≈2.07 to 3.02 MJ m−2 and from ≈1.80 to 2.53 MJ m−2, respectively, between models M1 and M5 as the input variables were increased. While the exact cause of this is not known, it does indicate that the MLR model has a different mechanism to model the solar radiation data compared to an ANN model. In comparison to the ANN model, the MLR model exhibits an inferior performance for all tested study sites. In fact, the forecasting error generated by the MLR model is also significantly larger, with a smaller degree of statistical correlation between the forecasted and observed G, which confirms the superiority of the ANN over the MLR model. Table 6b lists the performance of the ARIMA model, which operates as a univariate (i.e., single-input) model.
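The lagged input combinations M1–M5 evaluated in Table 6 can be illustrated with a minimal sketch of how such predictor rows might be assembled from a monthly LST series. The helper name `lagged_inputs` and the LST values are hypothetical, and the static inputs (month, altitude, latitude, longitude) are omitted for brevity; this is not the authors' preprocessing code.

```python
def lagged_inputs(lst_series, max_lag):
    """Build rows [LST(t), LST(t-1), ..., LST(t-max_lag)].

    Only time steps with a full lag history are returned, so the first
    `max_lag` months of the series produce no row.
    """
    if max_lag < 0:
        raise ValueError("max_lag must be non-negative")
    return [
        [lst_series[t - k] for k in range(max_lag + 1)]
        for t in range(max_lag, len(lst_series))
    ]


if __name__ == "__main__":
    # Hypothetical monthly LST values (K), for illustration only.
    lst = [300.0, 302.0, 305.0, 301.0, 298.0, 296.0]

    # Model M1 uses lag 0 only; M5 uses lags 0 to 4.
    for model_number in range(1, 6):
        rows = lagged_inputs(lst, max_lag=model_number - 1)
        print(f"M{model_number}: {len(rows)} rows x {len(rows[0])} LST inputs")
```

One consequence visible in the sketch is that each additional lag removes one usable training row, which matters when the record is short (here, 2012–2013 for training).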
In our study, the ARIMA model has utilised the onsite G data in the training and evaluation phases. In this regard, the ARIMA model is developed by using the two training study sites in the respective group (Table 1) with only one predictor variable (G) over 2012–2013, and it is then applied to generate the forecasts of solar radiation for the remaining third (test) site in the particular cross-validation group. For the seven groups of study sites that are tested, the performance of the ANN model far exceeds that of the ARIMA model. More precisely, the metrics are clearly distinguishable, with r values that are larger by ≈2.25–18.46% and correspondingly lower RMSE and MAE values by ≈8.55–67.01% and ≈10.67–72.92%, respectively, across the different cross-validation sites. In fact, the best performance of an ARIMA model is recorded for the Emerald Airport station, where the smallest difference in the MAE and RMSE metrics with respect to the ANN model is evident. When averaged over all seven sites, the root mean square and the mean absolute errors generated by the ANN model are smaller by over 35% compared to the MLR and ARIMA models, which is also reflected in the statistical correlation between observed and forecasted global solar radiation (see Table 6).

Whilst the correlation coefficient and RMSE are useful goodness-of-fit statistics, these metrics are regarded as suboptimal measures because they cannot be used to compare models between geographically (and climatologically) diverse sites. This is because these metrics are based on the non-normalized values of the forecasting error, and hence may carry little meaning where different distributions of predictor and objective variable(s) are evident [132]. In evaluating a model using r and RMSE, the squaring of the relatively larger forecasting errors can outweigh the influence of the smaller errors, owing to the sum-of-squared errors used to calculate these metrics [134].
To overcome this insensitivity and the over-dominance of large over small errors, an assessment of the ANN model against its comparative models is undertaken using the normalized RMSE and MAE, where the respective percentages are used. Table 7 shows the Willmott's Index (WI) [134,135] and the relative percentage of RMSE and MAE [15,132]. In accordance with the earlier deduction (Table 6), in this case we only show the best performing ANN and MLR models with the respective combination of the LST data in the optimal case. For the ARIMA model, the zero-lag G values are employed as the predictor, with appropriate differencing and autoregressive and moving average terms added to create the optimal model (Table 5).

An interpretation of the foregoing results is relatively straightforward. The ANN model dramatically outperformed the MLR and ARIMA models, which is confirmed by the marked difference in the normalized metrics. For the Gayndah, Emerald Airport and Tambo stations, the optimal performance is attained by the ANN model followed by the ARIMA and then the MLR model (i.e., WI ≈0.959, 0.962 and 0.953 compared to 0.931, 0.960 and 0.941 (ARIMA) and 0.906, 0.834 and 0.878 (MLR)). Correspondingly, the relative RMSE values are ≈5.70%, 5.02% and 6.35% (ANN), ≈8.08%, 5.53% and 7.65% (ARIMA) and ≈10.72%, 12.33% and 12.78% (MLR). While the performance of the ANN model demonstrates a high level of superiority over the MLR and ARIMA models for the Blackall, Charleville TM and St George Airport stations, the MLR model is preferable over the ARIMA model at these sites. That is, the MLR outperforms the ARIMA model by a large margin with r≈0.877–0.927 compared with ≈0.798–0.894. The ANN model is also found to be superior when the relative percentage MAE values are compared (Table 7).
It is interesting to note that for several study sites (e.g., Gayndah, Emerald Airport and Tambo; Table 7), the univariate ARIMA model that utilised the G data at the respective sites is superior to the MLR model, indicating that it can be a preferable option over the multivariate (MLR) model, although the ANN by all performance measures remains superior to ARIMA. In terms of site-averaged performance, the ANN model is found to yield the highest Willmott's Index (≈0.954), contrasting by a remarkable margin with the mean value for the MLR model (≈0.899) and the ARIMA model (≈0.848). The site-averaged RRMSE for the ANN model is notably lower, at 5.85% compared to the MLR (10.23%) and ARIMA (9.60%) models. This shows that forecasts generated by the MLR model are significantly out of phase with the observations, as the relative error exceeded the recommended 10% threshold for an excellent model (e.g., [133]).

To establish a complete picture of the forecasting skills of the data-driven models, Fig. 6 shows a frequency plot in which the relative predictive errors over the monthly horizon are indicated in error brackets of ± 0.5 MJ m−2 on the abscissa. For every bar, the percentage of datum points within the cross-validation (test) set for the seven groups of study sites pooled together is indicated. There is sound evidence that the ANN model generated the largest proportion of forecasting errors (≈39%) in the smallest ( ± 0.5 MJ m−2) error range, whereas the ARIMA model recorded about 25% and the MLR model only 15% of all errors in this error bracket. Moving rightward along the abscissa, the next error bracket (0.50 < error ≤ 1.00) captured 27% of all ANN errors, compared with about 21% and 17% for the ARIMA and MLR models, respectively. As a result of the greater proportion of errors recorded in the smallest error brackets, the ANN model produces less than 35% of errors with a magnitude larger than 1.0 MJ m−2. By contrast, this proportion amounted to about 53% for ARIMA and about 69% for MLR.
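The error-bracket frequencies described above can be reproduced, in outline, by binning forecasting errors into brackets of width 0.5 MJ m−2. The sketch below assumes that brackets are formed on the absolute error (consistent with the ± 0.5 notation) and are closed on the right; the function name and the sample errors are hypothetical.

```python
import math


def error_bracket_percentages(errors, width=0.5):
    """Percentage of |error| values in brackets (0, w], (w, 2w], ...

    Returns a dict keyed by (lower, upper) bracket bounds. Brackets with
    no errors are omitted. A zero error is counted in the first bracket.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    counts = {}
    for e in errors:
        index = max(math.ceil(abs(e) / width) - 1, 0)
        counts[index] = counts.get(index, 0) + 1
    total = len(errors)
    return {
        (i * width, (i + 1) * width): 100.0 * c / total
        for i, c in sorted(counts.items())
    }


if __name__ == "__main__":
    # Hypothetical monthly errors G_FOR - G_OBS (MJ m-2), illustration only.
    sample = [0.2, -0.4, 0.7, 1.3, -0.1, 0.9, 2.1, -0.3]
    for (low, high), pct in error_bracket_percentages(sample).items():
        print(f"{low:.1f} < |error| <= {high:.1f}: {pct:.1f}%")
```

Summing the percentages of the first two brackets gives the share of errors at or below 1.0 MJ m−2; the remainder is the share of larger errors quoted for each model.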
In concurrence with earlier results, it is deducible that the ANN model produces a much smaller proportion of forecasting errors in the larger magnitude brackets, which indicates that a more accurate level of forecasting is achieved by this method.

Fig. 7 shows a boxplot of the models' forecasting errors for the seven groups pooled together, where the least and most accurate results (i.e., Group 5: Barradeen and Group 4: Blackall) are included. In each boxplot, the outliers (indicated by +) that represent extreme values of the forecasting error within the cross-validation set and their upper quartile, median and lower quartile values are also indicated. Overall, the boxplot provides justification that the distributed errors for the ANN model acquire a much smaller spread, with a correspondingly smaller magnitude of the quartile statistics and median values, compared to the MLR and the ARIMA models. In fact, the net shift in the monthly forecasting error values towards larger magnitudes for the ARIMA and MLR models is clearly consistent with previous results (e.g., Fig. 6; Table 7). While the ANN model remains the optimal model in terms of a clustered error distribution towards smaller magnitudes, the MLR model performance is significantly inferior to the ARIMA model for all errors pooled together and for errors recorded at the Barradeen study site. In the case of the most accurate forecasting results, obtained for the Blackall station, the ARIMA model is found to be similar to the MLR model in its ability to forecast the solar radiation, although the ANN model is seen to outweigh the performance of both forecasting models.

Table 7
Evaluation of monthly forecasting models using normalized errors: Willmott's Index (WI) and relative RMSE and MAE (%).

| Cross-validation site | ANN inputs | ANN WI | ANN RRMSE | ANN RMAE | MLR input | MLR WI | MLR RRMSE | MLR RMAE | ARIMA input | ARIMA WI | ARIMA RRMSE | ARIMA RMAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Group 1: Gayndah | LST(t), LST(t – 1), LST(t – 2), LST(t – 3) | 0.959 | 5.70 | 4.44 | LST(t) | 0.906 | 10.72 | 9.84 | G(t) | 0.931 | 8.08 | 7.51 |
| Group 2: Emerald Airport | LST(t), LST(t – 1), LST(t – 2) | 0.962 | 5.02 | 3.52 | LST(t) | 0.834 | 12.33 | 12.23 | G(t) | 0.960 | 5.53 | 4.09 |
| Group 3: Tambo | LST(t), LST(t – 1), LST(t – 2), LST(t – 3), LST(t – 4) | 0.953 | 6.35 | 4.49 | LST(t) | 0.878 | 12.78 | 11.40 | G(t) | 0.941 | 7.65 | 6.38 |
| Group 4: Blackall | LST(t), LST(t – 1), LST(t – 2) | 0.971 | 4.37 | 2.78 | LST(t) | 0.942 | 7.46 | 5.71 | G(t) | 0.382 | 13.3 | 9.92 |
| Group 5: Barradeen | LST(t), LST(t – 1), LST(t – 2), LST(t – 3) | 0.946 | 6.48 | 4.71 | LST(t) | 0.927 | 8.54 | 6.83 | G(t) | 0.914 | 9.86 | 7.66 |
| Group 6: Charleville TM | LST(t), LST(t – 1), LST(t – 2), LST(t – 3) | 0.945 | 7.29 | 4.54 | LST(t) | 0.888 | 10.45 | 8.88 | G(t) | 0.907 | 12.4 | 9.39 |
| Group 7: St George Airport | LST(t), LST(t – 1), LST(t – 2), LST(t – 3) | 0.942 | 5.79 | 4.73 | LST(t) | 0.917 | 9.33 | 8.41 | G(t) | 0.910 | 10.4 | 6.62 |
| Overall | | 0.954 | 5.85 | 4.17 | | 0.899 | 10.23 | 9.04 | | 0.848 | 9.60 | 7.36 |

Fig. 7. Boxplot of the distribution of monthly forecasting error generated by ANN, MLR and ARIMA model for monthly G-forecasting. (a) All seven groups of stations pooled together, (b) least accurate forecasting results (Group 5), (c) most accurate forecasting results (Group 4).

To check the predictive ability of the ANN model coupled with the LST data for seasonal forecasting of solar radiation, we show the normalized metrics in terms of the Willmott's Index, relative root mean square error and mean absolute error. The primary predictor used in seasonal forecasting was the LST(t) data (i.e., with zero lag), together with supplementary input parameters defined in terms of latitude, longitude, altitude and the periodicity for the ANN and the MLR models. This decision followed the cross-correlation results (Fig. 5), where lagged combinations of the LST values were found to be largely insignificant and were therefore not applied in seasonal forecasting.

In agreement with the monthly forecasting horizons, the ANN models tested for seasonal forecasting also demonstrated significantly better results than the MLR and ARIMA models for all cross-validation sites (Fig. 8a–c). This is exemplified in the scatterplots of the forecasted data for all stations pooled together, including the goodness-of-fit regression line, y = mx + C, based on the GOBS and GFOR data. Note that r² and m close to 1.00 and C close to 0 should be attained for a perfect forecasting model. Despite some degree of scatter between the observed and forecasted G data, a reasonably linear fit is evident, albeit with different levels of accuracy for the three models considered. That is, r² ≈0.951 for the ANN, 0.896 for the MLR and 0.688 for the ARIMA model, whereas the gradient of the regression line is ≈1.06 (ANN), 0.985 (ARIMA) and 0.962 (MLR). Consequently, the statistics provide first-order justification of the superior performance of the ANN over the MLR and ARIMA models applied for seasonal forecasting of global solar radiation.

In Table 8 we evaluate the seasonal forecasting models using normalized error metrics in terms of the Willmott's Index and the relative (percentage) RMSE and MAE values. It is immediately obvious that the WI for the ANN model is the highest, with a range of 0.944–0.976, and the relative RMSE and MAE are the lowest, over the ranges of 4.60–5.80% and 0.03–4.67%, respectively, compared with those of the MLR and ARIMA models.
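As a quick check on Table 8, the RRMSE values can be classed using the precision thresholds of [133] (excellent below 10%, good 10–20%, fair 20–30%, poor above 30%) and the ANN can be compared with ARIMA as a simple ratio. The sketch below is illustrative: the function name and the handling of values falling exactly on a threshold are assumptions, while the RRMSE values are those listed in Table 8.

```python
def rrmse_class(rrmse_percent):
    """Classify model precision from RRMSE (%) using the bands of [133].

    Values exactly on a boundary (10, 20, 30) are assigned to the upper
    band here; the source does not specify boundary handling.
    """
    if rrmse_percent < 10.0:
        return "excellent"
    if rrmse_percent < 20.0:
        return "good"
    if rrmse_percent < 30.0:
        return "fair"
    return "poor"


# Seasonal RRMSE (%) from Table 8: site -> (ANN, ARIMA)
seasonal_rrmse = {
    "Emerald Airport": (4.88, 26.5),
    "Blackall": (5.80, 9.54),
    "Barradeen": (4.76, 9.73),
}

if __name__ == "__main__":
    for site, (ann, arima) in seasonal_rrmse.items():
        ratio = arima / ann  # how many times larger the ARIMA error is
        print(f"{site}: ANN {rrmse_class(ann)}, ARIMA {rrmse_class(arima)}, "
              f"ARIMA/ANN = {ratio:.2f}")
```

Expressing the comparison as a ratio makes the site-to-site contrast easier to read than the raw percentages, since the ANN values cluster within a narrow band.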
In accordance with [133], an RRMSE of less than 10% corresponds to an excellent performance of the present ANN model. By contrast, the MLR model yielded an RRMSE of ≈5.23–11.65%, while that of the ARIMA model was between 2.93% and 26.50%. In the foregoing research context, the ARIMA model forecasts are significantly in error, especially for the Emerald Airport (Group 2), Blackall (Group 4) and Barradeen (Group 5) study sites, where relative RMSE values of ≈26.5%, 9.54% and 9.73%, respectively, are attained. While further exploration of this result is needed, it can be construed that the relatively poor performance of the ARIMA model is associated with the smaller number of datum points (i.e., 4 per year) utilised in the model development phase, which may lead to an insufficient number of predictive features. Nevertheless, the RRMSE generated by the ANN model is smaller by over fivefold for Emerald Airport and by about 1.6-fold and twofold for Blackall and Barradeen, respectively. A similar deduction is made when the correlation coefficient and RMAE values are checked. When the errors are averaged over all study sites, the performance of the ANN model remains remarkably superior to that of the MLR and ARIMA models (Table 8).

To draw more conclusive arguments about the forecasting skill of the ANN model, one can inspect Fig. 9, where a bar graph is used to depict the relative forecasting error for the most and the least accurate study sites on a season-by-season basis. With a few exceptions, the ANN model yields the smallest error in forecasting solar radiation. Importantly, there is a dramatic difference in the relative error for MAM and JJA for the Charleville station. Notwithstanding this, the ARIMA model performs better than the ANN and MLR models in the spring (SON) season for Charleville and in the autumn (MAM) season for Blackall.
However, for all other seasons, the ANN model remains the preferred model (with the lowest relative error). Fig. 10 displays a bar graph of seasonal data. Evidently, the relative percentage error accumulated across these seasons shows a dramatically superior result for the ANN model. However, the ANN model error encountered in predicting the global solar radiation is on par with the MLR model, and is the largest, for the summer (DJF) season. In spite of this, the forecasting errors recorded for the autumn (MAM), winter (JJA) and spring (SON) seasons remain significantly smaller, especially with respect to the MLR and ARIMA models for the autumn (MAM) season. Accordingly, it can be argued that the ANN model integrated with satellite-derived LST data as its predictor variable is a preferred model for seasonal forecasting of solar radiation.

Fig. 8. Scatterplot of ANN, MLR and ARIMA model performance for seasonal forecasting of G using all station data pooled together.

Table 8
Evaluation of seasonal forecasting models using normalized errors: Willmott's Index (WI) and relative RMSE and MAE (%).

| Cross-validation site | Station ID | ANN WI | ANN RRMSE | ANN RMAE | MLR WI | MLR RRMSE | MLR RMAE | ARIMA WI | ARIMA RRMSE | ARIMA RMAE |
|---|---|---|---|---|---|---|---|---|---|---|
| Group 1: Gayndah | 039191 | 0.959 | 4.92 | 4.67 | 0.844 | 9.14 | 9.06 | 0.943 | 7.41 | 0.03 |
| Group 2: Emerald Airport | 035264 | 0.957 | 4.88 | 2.14 | 0.622 | 10.93 | 11.0 | 0.075 | 26.5 | 3.74 |
| Group 3: Tambo | 035284 | 0.952 | 5.70 | 3.46 | 0.726 | 11.65 | 10.76 | 0.988 | 2.93 | 0.89 |
| Group 4: Blackall | 036155 | 0.944 | 5.80 | 0.03 | 0.956 | 5.23 | 4.00 | 0.884 | 9.54 | −0.90 |
| Group 5: Barradeen | 044204 | 0.971 | 4.76 | 2.90 | 0.920 | 7.09 | 5.18 | 0.883 | 9.73 | 4.41 |
| Group 6: Charleville TM | 044205 | 0.976 | 4.60 | 0.18 | 0.909 | 7.96 | 3.32 | 0.935 | 7.23 | 5.42 |
| Group 7: St George Airport | 043109 | 0.969 | 4.81 | 2.29 | 0.934 | 7.07 | 2.14 | 0.913 | 8.39 | 3.14 |
| Overall | | 0.961 | 5.07 | 2.24 | 0.844 | 8.44 | 6.49 | 0.803 | 10.25 | 2.39 |

5. Discussion: limitations and opportunity for further research

To address environmental, health and economic issues associated with climate shift by promoting more sustainable energy projects, there is a nationwide surge in solar investments [3,6,67,119,136], consistent with global trends. A desire to attain predictive models incorporated with remotely sensed data for forecasting global solar radiation is driving considerable research momentum among scientists, who are exploring satellite data-coupled predictive models [6,136] over deterministic models with ground-based (on-site) data [22,23]. The precise aim of this paper was thus to model global solar radiation at remote sites in Queensland, Australia using Moderate-Resolution Imaging Spectroradiometer land-surface temperature (LST). By incorporating the LST data over a relatively limited period (2012–2013), this study has designed and validated an artificial neural network (ANN) model and evaluated its forecasting performance with respect to multiple linear regression (MLR) and autoregressive integrated moving average (ARIMA) models. A validation of the simulations over the 2014 period showed the utility of the ANN model coupled with satellite-derived data as an advancement over deterministic models that wholly rely on ground-based variables to simulate the solar radiation [22,23,30,137]. This study, which can be adapted to any location where a satellite footprint is available, is an advancement over TMY datasets [58,59] that provide quantitative estimates of solar radiation mostly limited to certain ground-based measurement stations.
In spite of the superior performance, the ANN model based on satellite-derived data should be trialed over a much larger spatial area than investigated in this pilot study, as this would be of interest to solar engineers who require quantitative evidence for developing sustainable energy projects over continental scales. An application of satellite data with the potential to provide instantaneous pixel values of solar radiation with high resolution (up to 5 km) and hourly temporal coverage [67] can help expand the scope, relevance and practical utility of the solar forecasting models. As an additional merit, satellites also observe areas at different wavelengths within a small period and pixel resolution [60,61], and the estimated irradiation is more accurate than interpolation techniques where the distance to stations is greater than 34 km for hourly irradiation and 50 km for daily irradiation [138]. In this paper, the ANN model coupled with satellite-derived data was diagnostically evaluated to yield less than 6.0% uncertainty based on root mean square error and a high statistical correlation between measured and forecasted solar radiation (r≈0.961) averaged over cross-validation sites (Tables 7, 8). As the forecasting error achieved was less than 10%, this corresponds to excellent performance [133].

In a practical sense, an improved version of the ANN model can be adopted to justify whether or not solar energy investments should be made in a remote region where meteorological stations may be absent. Consequently, one may also explore the utility of an ANN model for site selection purposes in areas that are being considered for solar energy investments, and for addressing solar energy infrastructure and developmental constraints [6,7,136]. In spite of the aforesaid merits, our pilot study has shortcomings that create opportunities for follow-up work.
While we have adopted the MODIS data over a relatively small region in Queensland, a practical implementation of the ANN model requires its validation with alternative datasets for a wider spatial domain. Follow-up studies can also use the Australian Bureau of Meteorology's data from the Geostationary Meteorological Satellite and MTSAT satellites operated by the Japan Meteorological Agency and the GOES-9 satellite operated by the National Oceanic and Atmospheric Administration for the Japan Meteorological Agency [6,7]. In principle, satellites are able to yield hourly variables over a grid resolution of 0.05° × 0.05° [136]. Additionally, some research has shown that reanalysis data products (e.g., the ERA-interim reanalysis of the European Centre for Medium Range Weather Forecasts) can be utilised when satellite and station irradiance data are not available [139]. Generally, the ERA-interim data are of good temporal resolution since the variables are available 6-hourly from 1979 to the present at the highest grid resolution of 1.5° × 1.5° [140,141]. The identification of alternative predictor data sources provides an opportunity to combine them in such a way that the performance of the model is better than that of a model with just the meteorological, satellite or reanalysis data on its own [6,7]. The optimisation of models is important as most studies only rely on meteorological irradiance to generate solar estimates over vast topographical and non-homogenous terrains [67,71,124,136].

Fig. 9. Bar graph of the relative forecasting error (RRMSE, %) for most (Group 6, Charleville) and least accurate (Group 4, Blackall) stations.

Fig. 10. Bar graphs of relative root mean square error (RRMSE; %) for different seasons (DJF=summer; MAM=autumn; JJA=winter; SON=spring).

The scope of this paper was limited in terms of forecasting horizon, as we have validated the ANN model for long-term periods. A follow-up study could investigate the model's skill at finer temporal resolution (e.g., sub-hourly, hourly and daily) with improved pre-satellite irradiance estimates, sunshine durations and cloud cover [124]. Real-time implementation, especially for consumer power grids, requires quantitative data for sub-hourly, hourly and daily forecasts. Therefore, the ANN model coupled with satellite-derived data needs to be validated for smaller timescales. In spite of the coarse temporal resolution, the ANN model has set a promising pathway for alternative predictors with better resolution to be explored over larger continental scales than those investigated in this pilot study.

It is noted that we have employed a set of 21 regional sites in Queensland, of which 14 have been utilised in model development and/or training and the remainder (seven) for cross-validation (Fig. 2).
A ‘universal’ model was constructed to forecast G at the cross-validation sites. It is noteworthy that the LST data for the cross-validation sites were not used in developing the model (Table 1), yet good accuracy was attained (Tables 6–8). The notion of a ‘universal’ model for forecasting at the sites closest to the training site(s) shows the spatial relevance of our approach in terms of its application at nearby site(s) where predictor data are available. Indeed, some research (e.g., [138,142]) has compared hourly site/time-specific global irradiances obtained via extrapolation and interpolation of ground-site data with those obtained by processing satellite images with a statistical model. That research found that, for hourly data, satellite-estimated solar radiation becomes more accurate than a local ground station if the distance from the station exceeds 34 km, down from a previously reported 50 km range for daily irradiances; for a regularly spaced network, the breakeven distance is estimated to be 50 km. This shows that satellite-derived irradiance can surpass the site/time-specific accuracy achievable with geostationary satellites. Considering this, the accuracy of an ANN model applied to sites near those where it was trained is expected to scale with the distance to, and the predictive features at, the cross-validation sites. A follow-up study should investigate the spatial relevance of the model in terms of how the predictive features at a cross-validation site influence the accuracy. In doing so, one must consider the nature and source of errors at these sites arising from the satellite determination of cloud thickness, atmospheric turbidity, etc. Our paper found good accuracy with LST data, as evidenced by an error magnitude of less than 10% based on root mean square error and over 96% correlation between observed and forecasted G (Table 7).
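As an illustration only, the error measures quoted in this discussion (root mean square error, its relative form, the correlation between observed and forecasted G, and Willmott's Index) could be computed along the following lines. This is a minimal sketch and not the code used in this study: the function names and sample values are hypothetical, the relative RMSE is assumed to be the RMSE expressed as a percentage of the mean observed G, and Willmott's Index is assumed to take its standard 1981 form.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and forecasted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rrmse(obs, pred):
    """Relative RMSE (%), assumed here to be RMSE divided by the mean observation."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and forecasts."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def willmott_d(obs, pred):
    """Willmott's index of agreement (assumed standard form, ranging from 0 to 1)."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical monthly G values (MJ m^-2); not data from this study.
g_obs = [20.0, 22.0, 18.0, 16.0]
g_for = [19.0, 23.0, 18.0, 15.0]

print("RMSE :", rmse(g_obs, g_for))
print("RRMSE:", rrmse(g_obs, g_for))
print("r    :", pearson_r(g_obs, g_for))
print("d    :", willmott_d(g_obs, g_for))
```

Normalising the RMSE by the mean observation makes the error comparable across sites with different radiation regimes.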
However, a more exhaustive list of carefully screened predictors based on MODIS and other products (e.g., Geostationary Meteorological Satellite, MTSAT and GOES-9 satellite) could be incorporated in a follow-up study. Atmospheric properties such as cloud cover (including optical thickness, effective particle radius, thermodynamic phase, cloud top altitude, temperature and cirrus reflectance), precipitable water and temperature profiles can be incorporated to establish the relevant inputs [81,84,88]. To increase the number of predictors and add features or predictive patterns to the model, reanalysis products [140,141], which are yet to be explored for solar modelling in Australia, can also be considered.
It is acknowledged that, in this paper, we utilised a single predictor (i.e., LST) without a pre-processing algorithm to pre-select the predictive features. Data-driven models encounter challenges with non-stationarity, jumps, periodicity and stochastic behaviour in input variables [143]. Despite the flexibility of the ANN model in handling non-linear data, its accuracy is affected by chaotic behaviour in the training variables if features prevail over a wide range of frequencies (e.g., hourly, daily or seasonal scale) [144,145]. In the case of a ‘daily cycle’ or ‘seasonality’, the absence of an input-output data pre- and post-processing scheme can make it challenging to attain an optimum model. In this paper we utilised LST without identifying its underlying frequencies; therefore, a hybrid model with a data pre-processing technique based on a wavelet transformation algorithm could be used to obviate this shortcoming [15]. Decomposing the predictors into low-frequency (approximation) and high-frequency (detail) subsets can enhance the model's responsiveness to predictive features [146].
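To make the wavelet pre-processing idea concrete, a single-level Haar transform of a predictor series can be sketched as follows. This is an illustrative assumption, not the scheme of [15] or [146]: the choice of the Haar wavelet, the function names and the sample LST values are all hypothetical, and a practical study would more likely use a multi-level transform from a dedicated wavelet library.

```python
import math


def haar_dwt(signal):
    """One-level Haar transform of an even-length series.

    Returns (approximation, detail): scaled pairwise averages, which carry the
    smooth, slowly varying part, and scaled pairwise differences, which carry
    the rapid fluctuations.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original series from both subsets."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal


# Hypothetical monthly LST values (degrees C); not data from this study.
lst = [30.0, 28.0, 25.0, 21.0, 18.0, 17.0, 19.0, 22.0]
approx, detail = haar_dwt(lst)
print("approximation:", approx)
print("detail       :", detail)
print("reconstructed:", haar_idwt(approx, detail))
```

In a hybrid model, the approximation and detail series would be supplied to the ANN as separate inputs in place of the raw LST series.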
A similar observation was made in a study on solar forecasting for metropolitan and regional sites [15], where a wavelet-based hybrid support vector machine model was more precise than a standalone support vector machine model. Research elsewhere supports a similar deduction [132,143]. In order to broaden the scope, predictors from satellite-derived products (e.g., cloud cover) and reanalysis products (e.g., wind fields) related to solar radiation can be utilised with a non-linear feature selection algorithm that assimilates the relevant features. Statistical methods such as iterative input selection (IIS) [147] and bootstrap rank-ordered conditional mutual information (broCMI) [148], and evolutionary methods such as the grouping genetic algorithm (GGA) [149] and coral reef optimisation (CRO) [150,151], can be applied to rank predictor variables in the context of a wavelet-hybrid model. Although hybrid models with a feature selection algorithm are known to yield good accuracy in energy applications (e.g., [149–151]), this was beyond the scope of this paper and could form the subject of another independent study.
6. Summary
This paper established the preciseness of an artificial neural network (ANN) coupled with satellite-derived land-surface temperature (LST) as a predictor for forecasting solar radiation for regional Queensland. A limited set of LST data from 2012 to 2014 for 21 stations was utilised, partitioned into seven groups, with two stations in each group (2012–2013) used for model development and the third for cross-validation over the 2014 period. To enhance the preciseness of the ANN model, several training algorithms and hidden transfer functions were trialled, and the Levenberg-Marquardt training algorithm and logarithmic sigmoid function were finally adopted for forecasting G. To meet the objectives (Section 1.0), the ANN model was benchmarked against multiple linear regression (MLR) and autoregressive integrated moving average (ARIMA) models. The findings are enumerated below.
1. For monthly horizons, different lagged combinations of LST as the primary predictor variable were required to attain an accurate ANN model, albeit with a significant degree of variability across the study sites. By contrast, the MLR model required only the LST data with zero lag, whereas the ARIMA model was developed using the G data at the cross-validation study sites.
2. The root mean square error (RMSE) and mean absolute error (MAE) of the ANN model in the cross-validation period were dramatically lower than those of the MLR and ARIMA models, with average values of RMSE ≈ 1.23 MJ m⁻², 1.90 MJ m⁻² and 1.94 MJ m⁻², whereas the MAE was 1.02 MJ m⁻², 1.65 MJ m⁻² and 1.53 MJ m⁻², respectively. The normalized performance metrics yielded a Willmott's Index of 0.954 (ANN) compared with 0.899 (MLR) and 0.848 (ARIMA), and the relative RMSE was only 5.85% (ANN)